I have a process that runs everyday, lots of short jobs usually back to back, loading data into tables. This process has been around since our business was tiny, but over the past few years we hacve grown alot and now from time to time I have seen the following error messages. Each time I see the error its always it seems like its when my data loads are larger than normal and the process is very long running (12+ hours) and that seems to be the only thing in common during the errors.
I know from experieince that I can usually just resubmit the same exact job and data and its likely to run fine on the second try. I know you dont know my code and table structures, but whats your knee jerk opinion of whats going on? I am a sysadmin and not a programmer, but my opinion is that the application lacks any lock maangement logic and the fact that it runs as well as it does normally might be the product of good luck/good timing?
Cannot read system catalog catalog-table.
The database server refers to the tables of the system catalog while it processes most statements. When it cannot read one of these important tables, a serious error results. Check the accompanying ISAM error code for more information. The effect of the error depends on the statement that is being executed and the particular table, as follows:
* CREATE TABLE statement, systabauth not read; the table is created, but PUBLIC is not granted authorization as it normally is.
* DROP TABLE statement, systables not read; no action taken.
* DROP TABLE statement, sysviews not read; the table is dropped but any views that depended on the table were not automatically dropped.
* DROP VIEW statement, sysviews not read; no action taken.
* DROP INDEX statement, sysindexes or systables not read; no action taken.
* DROP SYNONYM statement, systables or syssynonyms not read; no action taken.
* DROP DATABASE statement, systables not read; no action taken.
* START DATABASE statement, systables not read; no action taken.
* DATABASE statement, systables or sysusers not read; the database was not selected (no current database; for subsequent operations, see error -349).
Other statements may be partially complete before the error is detected. Roll back the current transaction and then investigate the cause of the error. Use the bcheck or secheck utility (tbcheck with INFORMIX-OnLine or oncheck with INFORMIX-OnLine Dynamic Server) to check and repair indexes. If necessary, restore the database
from backup and logical-log tapes.
ISAM error: key value locked.
The current operation inserts a row with a certain primary key value or updates a row with a certain primary key value, but a transaction that has not yet been committed has deleted that key value from the index. This error occurs only when the lock mode is set to NOT WAIT. Treat it the same as error -107 (record is locked). Roll back the current transaction, and re-execute it after a delay. Then, if the other transaction was committed, the lock no longer exists. If it was rolled back, the key exists, and this operation receives a duplicate-key error.
Your locking assessment "good luck or timing" is probibly correct. However, if these jobs are indeed "lots of short jobs usually back to back" vs. running concurrently I'm not sure you have a concurrency issue.
By chance are you getting into a situation where Informix forces a rollback recovery? Like long transaction abort? The single task itself might not be causing this but running in conjuction with other activities you could be hitting you high water mark. It's at that point where the rollback might tie up your system tables while it proceeds
Good morning! The thought had occurred to me, but it seems like everytime this happens I never have a chance to see whats going on until after the fact. Would a long transaction issue be logged in the online log?
One more thing, we do have lots of other apps running concurrently with this job. When I said back to back I was trying to say that this one job runs many times back to back - along side of other user jobs, queries, menu systems, lookup apps that run at the same time.