Here's my proposed plan to use the logs from the above script to perform diagnosis -
Determining candidate workloads for termination: When information collected by the overall server monitoring script suggests a problem (based on thresholds defined), and decision is to terminate a workload, 2 successive snapshots of the edus log should help determination which thread and therefore application Id/workload (using the agents log) is consuming highest percentage CPU. The DB snapshot currently generated every half hour is NOT a good indicator for this information since it presents cumulative CPU information which might place a low % CPU consuming low priority workload ahead of a high % CPU high priority workload Determine issue with archival logging: Symptom: Applications report timeout-based commits
Suggested Review: One item to check is the health of the logging process.
Check if the Archival Status has reported failures - if so, work with storage to assess disk health in addition to reviewing the diagnostics for more detailed errors
Compare 'Next Log to Archive' value to 'Current Log Number'. If the former is lower, the log path is about to fill up reducing application throughput and possibly causing application failures
Determine sub-optimal access path:
Symptom: Applications report increased execution time for several workloads
Suggested Review: Examine the dbptnmem log. If a significant amount of memory is being consumed for FCM buffers in comparison with DB and APPL memory, consult the FCM logs to identify application Ids consuming the highest number of FCM buffers. Review these workloads for non-collocated joins and assess if MQTs or application changes could assist Examine at least 2 consecutive bufferpool logs. Determine bufferpool with the highest Data incremental reads and writes. Cross reference this with current workloads and first determine if the workload should even be allowed consumption of as much resources (possible, when workloads kicked off during the day continue well into night). If not, terminate the workload releasing resources for other workloads. If yes, determine if data reads seem to dominate - and subsequently check for missing indexes Selectively run tcbstats generating incremental snapshots to see which tables are being scanned the most, and then trace back to the workload (reported in tcbstats). Assess if indexes, or MQTs, or MDCs will help