Hi,
We have been struggling with a really tricky issue with our DB2 v10 on z/OS.
For the last four weeks we have been trying to migrate from DB2 v8 to v10 on z/OS. After migration, the database becomes unstable during load testing. One table ends up with a long-running reader that has still not returned after more than 12 hours; the long-running reader warnings show up in the DB2 logs, and the DBA can't even kill the thread. The other tables in the same schema remain accessible. The application is a Java application, and its threads are stuck in a socket read waiting for the response. The whole DB2 subsystem has to be restarted to recover.
Once the DB2 subsystem has been recycled, we can run many successful load tests, but typically, after it has been left up overnight, it gets stuck again during the next day's load test. So far four tables have experienced this stuck-reader issue; they are the largest tables and have the most intensive inserts and queries in the application. The SQL statements that get stuck are just basic inserts and very simple queries with a single search parameter and an index hit. The query returns only the surrogate key. When DB2 is in good shape, it typically takes 2 or 3 ms to return. The problem seems to be related less to the intensity of the load than to how long the DB2 subsystem has been online.
There are three subsystems in total, but so far the sysplex has been configured to direct the load to only one of them.
We have tried tweaking some global buffer pool sizes, but the problem persists.
Could anybody point us in the right direction?
Thanks