Results 1 to 2 of 2
  1. #1
    Join Date
    Sep 2004
    Posts
    2

    Question Unanswered: DB2 Retry Logic , Deadlock handling & Simulating for Testing

    Hi ,

    I have a requirement of adding retry logic in Cobol-DB2 based application programs.
    The SQLCODEs that I need to handle are -911, -913 & -904 and for these SQLCODES I need to bring in a retry logic.

    Background information for these SQLCODEs.

    -911 -> When this SQLCODE is encountered , a deadlock has occurred on the transaction and an implicit rollback is performed by DB2
    -913 -> A deadlock has occurred on the transaction and an implicit rollback has failed / not occurred on the transaction.
    -904 -> A resource unavailable condition has occurred and no rollback will be ever committed by DB2.

    According to me, the different ways to do a retry when either of these SQLCODEs occur are:

    Retry during -911 ->

    For normal select queries in embedded cobol sql programs, the select query can be retried for a certain period of time (2 mins) and if retry is successful then proceed with normal execution. (It is assumed that there are not insert or updates happening before).

    If -911 deadlock happens in a program where there are inserts or updates happening , after a rollback everything from the tables into which data was inserted during this transaction would vanish. Hence there should be a commit happening after a certain number of records are Inserted/Updated.
    During rollback, which would happen till the last commit point , the processing needs to be resumed using a checkpoint restart method.

    If -911 occurs while using a cursor , the situation becomes really grim. All the cursors used & open till deadlock gets closed. Hence the fetch operation for the remaining records needs to be done again. Please let me know a better way of doing this.

    Retry during -913 & -904 ->

    Since no rollback occurs during this case, the corrsponding queries can be retried and continued is retry is successful.

    Can someone please suggest some efficient way of handling retry logic?
    It would be required to test the changes involving retry logic in the programs. This can be done by simulating the appropriate deadlock/resource unavailable condition.

    It is known that by locking or deleting tables, these conditons can be simulated. Is there a way to simulate deadlock -913 SQLCODE?

    Please let me know if you have any better methods for checking /handling & simulating deadlock conditions and using retry logic.

    Regards,
    Kishalay

  2. #2
    Join Date
    Feb 2005
    Location
    United States
    Posts
    22

    Unhappy -911 and rollbacks (We don't want a rollback !!!!)

    Recently, we have hit upon a -911/rollback production problem, that was indirectly caused by retry logic, so I am in a little bit of a position to answer your question about retry logic and simulating a -911 or -913. (forgive the lateness).
    The problem happened in our COBOL-DB2 batch processes, while running DB2 version 7 for Z/OS, and let me state that the problem was exacerbatted by the fact that we coded RETRY logic. While this may have been a good choice for a CICS DB2 application or ORACLE application, it is not always a good choice for DB2 batch. Because we were not aware that in batch, a
    -911 does an implicit rollback and closes all cursors as soon as it is encountered. Therefore, retrying is ill-advised, unless one has countered with sophisticated 'on-the-fly' recovery logic.
    If someone out there reading this, is aware of how to control a batch DB2 process so that a timeout does not cause a rollback, please advise. In our situation, we preferred that just the immediate transaction failes. A full rollback is not desired at all. In CICS you can control this with the ROLBE RCT parameter. In ORACLE, a timeout does not automatically cause a rollback. That is for the application to decide. So DB2 has truely dissappointed us with this drastic measure of doing a rollback when a timeout happens.

    As far as simulating a -911, we used a DB2 edit tool (such as PLATINUM RC) to lock the table; and then we ran our batch job during this edit session. To simulate a -913, the only way was inject some 1-time testing code into our program that based on the last digit of the seconds component of the current-time, it would move -913 into the SQLCODE. This 1-time code also contained an explicit rollback (which is what one is supposed to do anyway when one gets a -913)

    We applied retry logic by simply performing any SQL insert or update through a perform varying until the loop counter was greater than 3. (this would allow up to 3 attempts). I.E.
    PERFORM 5000-INSERT THRU 5000-INSERT-EXIT
    VARYING LOOP-CNTR FROM 1 BY 1
    UNTIL LOOP-CNTR > 3
    OR GOOD-INSERT OR BAD-INSERT.

    Within the actual routine that did the insert of update, we would have the following code after the insert or update attempt:

    5000-INSERT.

    IF (GOOD-INSERT OR BAD-INSERT)
    GO TO 5000-INSERT-EXIT
    END-IF.

    EXEC SQL
    INSERT etc.. etc..
    END-EXEC.


    EVALUATE SQLCODE
    WHEN 0 SET GOOD-INSERT TO TRUE
    GO TO 5000-INSERT-EXIT
    WHEN -911
    WHEN -913
    IF LOOP-CNTR = 3
    SET BAD-INSERT TO TRUE
    MOVE SQLCODE TO WS-HOLD-SQLCODE
    GO TO 5000-INSERT-EXIT
    ELSE CONTINUE
    END-IF
    WHEN OTHER
    SET BAD-INSERT TO TRUE
    MOVE SQLCODE TO WS-HOLD-SQLCODE
    GO TO 5000-INSERT-EXIT

    END-EVALUATE.

    5000-INSERT-EXIT.
    EXIT.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •