Unanswered: SQL1224N when max_log < total active utilisation?
A bit of a DB2 noob here. I've a "black box" DB2 v184.108.40.206 enterprise ed instance used by a TSM 220.127.116.11 server (on RHEL 5.5). Yep, I'm a storage admin who's keen to understand what's going on with a backup server.
My question may well boil down to: what is the definition of an "active log"?
The DB2 instance is configured in rollforward mode. logarchmeth1 is configured as a disk based method (i.e. bung 'em in a directory over *there*). Under normal operating conditions pretty much as soon as an active log extent fills it is copied out into the archive log directory...and the primary extent is recycled.
The application (TSM) is running a large transaction. The total transaction size exceeds the defined number of primary log extents (15x 512 MiB in my case, or 7680 MiB).
The application detects that the primary log space is nearly exhausted and runs what it terms to be an incremental database backup.
While this long-running transaction executes each active log extent is copied to the archive log directory as per normal operating procedure (the db2 log manager, I guess) - but the active log extent is not recycled (from what I understand it is still required for crash recovery).
However - once the application initiated incremental backup has finished the active log extents *have* been recycled. This was confirmed by looking at the snapdetaillog.
The ringer though - the long running transaction (which persisted through the incremental backup) terminates with SQL1224N. At this point only around half of the first (and current) active log extent is full. LOG_MAX is set to 90% of primary space (i.e. 13.5ish of 15 extents).
The double-ringer with cheese - although the transaction aborted...*it didn't rollback*.
Why is it that, in the absence of any db2 level incremental, the active log extents weren't getting recycled? According to my understanding this is because the transaction data was still active (i.e. not yet committed).
If these logs were still active, why did the incremental backup recycle them? Surely they were still needed in case a rollback was required?
Given the above - why did the unit of work abort with SQL1224N given that total primary log space utilisation at the point of the abort was under 10%, and LOG_MAX is set to 90%?
Sorry, I'm obviously failing to understand something fairly crucial here. Will someone help put me out of my misery?
Is TSM using a 'regular' Version of DB2. I always thought that TSM is using a specialized Version of DB2.
However, in DB2 the Backup of the Database has no impact on the size of the Transactionlogs. You could keep the initial Backup and all the archived transactionlogs and would be able to restore the database (beside load ... nonrecoverable and not logged initially). The resore would take ages since you have to run thru all the archive logs. The backups give you the option to delete older backups and archived transactionlogs.
Transaction logfiles are reused (or archived) when all transactions in the file are commited (beside inifinite logging).
I believe that it is a "regular" db2, albeit perhaps packaged a little differently. Under TSM v5 there were elements from the db2 engine built directly into the TSM server, but with TSM v6 it runs as a completely separate engine.
I hear what you mean re the active log extents - it may be that there's just something odd about the way TSM is using DB2. From the reading I've been doing since I posted my original query I'm tending towards this theory: The log buffer in DB2 is filling...write ahead causes the log pages to be written out into the active log directory...as active logs fill they get archived (again, write ahead - despite the fact that they are still active)...this backup task (whatever it is) is causing the files in the active log directory to be recycled, but the "source of truth" (i.e. the log buffer) still knows about the inflight transactions - and hence the particular UOW thats executing still aborts with an out of log space condition (despite the abundance of disk space in the active log dir).
Hope that makes sense. If you can offer any insight as to how/why the DB2 layer is (or isn't) reducing crash protection by permitting the files in the active log dir to be removed (recycled) before the UOW has completed I'd be very appreciative.
Some of your reasoning (re: log buffer and archiving of active logs) and terminology (log extents?) does not make much sense to me. Take a look at the number of log files configured in your database (LOGPRIMARY and LOGSECOND).
By the way, SQL1224N has nothing to do with log space.
logprimary = 15, logsecondary = 0. Apologies if my terminology is awry...I've been using the term "extent" to denote a log file on disk (i.e. logprimary = 15 "extents").
From my understanding of SQL1224N I was under the impression that reason code 3 (according to the inbuilt CLP help) was the issue...to whit:
"The application was forced off because it was using more transaction log space than allowed by the database configuration parameter max_log or num_log_span."
As previously mentioned max_log is set to 90%. The error I'm seeing is being interpreted by the application layer as "out of log space". I figured that 90% of the 15 primary extents were being utilised by a single UOW.
Hope that explains my thinking a little better - as you've spotted I might be a bit confused about it all though
The log space available for a transaction is limited by the lesser of disk space or the number of log files, which is 15 in your case. Depending on the size of one log file (LOGFILSIZ), that may be insufficient.
Aye, its certainly not sufficient for the unit of work that's being performed...that doesn't worry me so much.
What I'm curious about is the interaction between the log buffer and the representation of the log buffer contents within the active log directory...the problem that my system is exhibiting is that a backup task is clearing out the active log directory (i.e. recycling the file names) but the job is still terminating with an "out of log" condition despite the fact that there appears to be only about 5% utilisation in the directory at the time.
The only way I can think of for this to happen is if the log buffer and the contents of the active log directory are going out of sync. If that's the case I suspect it will negatively impact the crash recovery of the environment.
I have a decision to make - do I deliberately misconfigure the the triggered backup job within the application to preserve crash recovery integrity within the database layer?