DB2 Server Enterprise version 9.5 on Windows Server 2008 SP2, with four 64-bit Intel processors and 64 GB of memory.
The DB2 service is configured to restart in case of failure.
The DB keeps restarting over and over. There are periods when the feeding applications are idle and the DB is stable. These two conditions alternate intermittently.
The Windows logs show the following message each time the DB restarts:
Faulting application db2fmp64.exe, version 9.5.500.784, time stamp 0x4b0c1399, faulting module DB2APP64.dll, version 9.5.500.784, time stamp 0x4b0c0d47, exception code 0xc0000005, fault offset 0x0000000000633474, process id 0x15d8, application start time 0x01cd08811815919d.
Are there any errors logged in the db2diag.log prior to "Crash Recovery" messages? Do you see any trap files in the diagpath? The messages indicate that db2 performed crash recovery for db TOOLSDB. Did db2 perform crash recovery for other databases as well? Do you have scripts that execute db2pdcfg - https://www-304.ibm.com/support/docv...id=swg1IC67574
An instance crash needs to be addressed by DB2 support. I'd suggest opening a PMR and considering an upgrade to a more current fixpack.
Hi, indeed I had to get in touch with IBM guys to resolve this.
About your questions:
No, there were no errors prior to crash recovery that could indicate a cause.
Yes, every time the system restarted, a trap file was created.
Yes, there were many crash recoveries on our main database too. This instance holds only that database and the Toolsdb.
Things are still not totally resolved. I moved the DB to another machine, and it happened in just the same way (!!!).
So I was told to locate the table that was actually defective, create a new table, and import the data into it (some 15.000.000 rows). Support thinks the table became corrupt because of erratic disk access problems.
After all this, I still have crashes happening, but very rarely compared to last week.
I'm still in contact with the IBM guys so that we can finally stabilize this DB.
After struggling for days I've finally discovered the cause and the solution for the DB going off and on repeatedly.
There is a Java application responsible for populating the main table in my DB. It resides on an application server separate from the DB2 server. All data are collected from log transaction machines, consolidated and sent to DB2. The way the programmers developed this app, it checks the connection with DB2 by simply selecting a row from a control table called test before sending a whole set of rows (thousands, actually). This control table is supposed to have only one row. It turns out that something (or someone, I don't know yet) was deleting the contents of the table. So the Java app assumed that the connection was down and opened another one, and so on... In the end the DB2 service simply went down, probably because there were so many connections that it could not handle them any more.
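For what it's worth, here is a minimal sketch of how reconnect logic like the one described above could avoid piling up connections: the old connection is closed before a new one is opened. The class name SingleConnectionHolder and the opener factory are hypothetical, not taken from the actual application, and the sketch assumes one connection per sender.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Holds at most one connection; the old one is closed before a new one is opened. */
class SingleConnectionHolder {
    // Hypothetical factory, e.g. () -> DriverManager.getConnection(url, user, password)
    private final Callable<Connection> opener;
    private Connection current;

    SingleConnectionHolder(Callable<Connection> opener) {
        this.opener = opener;
    }

    /** Returns the held connection, opening one only if none is held yet. */
    synchronized Connection get() throws Exception {
        if (current == null) {
            current = opener.call();
        }
        return current;
    }

    /** Closes the held connection (if any) and opens a fresh one. */
    synchronized Connection reconnect() throws Exception {
        closeQuietly();
        current = opener.call();
        return current;
    }

    private void closeQuietly() {
        if (current != null) {
            try {
                current.close();
            } catch (SQLException ignored) {
                // The connection was probably already dead; nothing more to do.
            } finally {
                current = null;
            }
        }
    }
}
```

With something like this, a failed connection check would call reconnect() instead of opening yet another connection, so the number of connections from one sender stays at one.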
-Recreate the missing row (just in case, I added a few more rows).
-Change permissions so that the table's users can only select data, not delete it.
A whole day has passed and no problem so far. I think it might mean something good at this point ;-)
But now another question:
Is there a way to check whether a DB2 connection is still up in a Java program without having to select from a control table?
It would be good to give the Java programmers an alternative to the current method.
Fengsun2, I was just told that this was tried before and that the method really works only within the Java environment. However, DB2 simply kills the connection after some time in an idle state, and apparently Java does not notice. Even though the Java application "thinks" the connection is still valid and tries to use it, it in fact can't.
Why doesn't your dev team just catch the SQLException to handle the invalid-connection event?
It is simple and does not require selecting from an extra table.
Furthermore, you can use sysibm.sysdummy1 instead of your test table to prevent anybody from modifying the table unexpectedly.
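A rough sketch of such a check in Java, in case it helps. DB2's built-in one-row catalog table is named SYSIBM.SYSDUMMY1, which is presumably the table meant here. The sketch first tries the standard JDBC 4.0 call Connection.isValid and falls back to a query against that table; the class and method names are made up for illustration, and whether isValid is available depends on the JDBC driver in use.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class ConnectionCheck {
    // Built-in DB2 catalog table that always returns exactly one row.
    static final String PING_SQL = "SELECT 1 FROM SYSIBM.SYSDUMMY1";

    /** Returns true if the connection answers within timeoutSeconds. */
    static boolean isAlive(Connection conn, int timeoutSeconds) {
        if (conn == null) {
            return false;
        }
        try {
            // Standard JDBC 4.0 validity check.
            return conn.isValid(timeoutSeconds);
        } catch (SQLException | AbstractMethodError | UnsupportedOperationException e) {
            // Assumption: older drivers may not implement isValid; fall back to a query.
        }
        return pingWithQuery(conn, timeoutSeconds);
    }

    /** Runs the one-row query; any SQLException is treated as "connection down". */
    static boolean pingWithQuery(Connection conn, int timeoutSeconds) {
        try (Statement st = conn.createStatement()) {
            st.setQueryTimeout(timeoutSeconds);
            try (ResultSet rs = st.executeQuery(PING_SQL)) {
                return rs.next();
            }
        } catch (SQLException e) {
            return false;
        }
    }
}
```

Because SYSIBM.SYSDUMMY1 is a system object, ordinary users cannot empty it the way the test table was emptied.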
you can use sysibm.sysdummy1 instead of your test table
Hummm, I think that's what we needed!
The problem with catch is that Java thinks a DB2 connection exists when in fact it's gone. I don't know why this is so, but when it happens, the Java server tries to send rows to the DB and sometimes, since (for DB2) the connection is over, the rows are lost. It's very important that no rows are lost in the process. That's why the dummy table is critical: Java reads this table constantly to make sure the connection is up before sending any data.
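One possible way to combine the two ideas, sketched under assumptions: send each set of rows in a single transaction with autocommit off, and if a SQLException occurs, roll back, open a fresh connection and resend the same rows, which are still held in memory. The table MAIN_TABLE, the column PAYLOAD, the class name and the opener factory are all hypothetical, and opening a new connection for every attempt is a simplification.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import java.util.concurrent.Callable;

class ReliableBatchSender {
    // Hypothetical target table and column.
    static final String INSERT_SQL = "INSERT INTO MAIN_TABLE (PAYLOAD) VALUES (?)";

    /**
     * Sends all rows in one transaction and returns the attempt that succeeded.
     * If every attempt fails, the last SQLException is rethrown and the caller
     * still holds the rows.
     */
    static int send(Callable<Connection> opener, List<String> rows, int maxAttempts)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Connection conn = null;
            try {
                conn = opener.call();
                conn.setAutoCommit(false); // nothing is stored until commit()
                try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
                    for (String row : rows) {
                        ps.setString(1, row);
                        ps.addBatch();
                    }
                    ps.executeBatch();
                }
                conn.commit();
                return attempt;
            } catch (SQLException e) {
                last = e; // remember the failure, then retry with a new connection
                if (conn != null) {
                    try { conn.rollback(); } catch (SQLException ignored) { }
                }
            } finally {
                if (conn != null) {
                    try { conn.close(); } catch (SQLException ignored) { }
                }
            }
        }
        throw last;
    }
}
```

One caveat: if the connection drops during commit() itself, the client cannot tell whether the rows were stored, so a retry could insert them twice. A unique key on the target table would be one way to guard against that.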