  1. #1
    Join Date
    Jan 2016
    Posts
    5

    Unanswered: DB2 HADR cannot be synced

    Hi all,
    I had DB2 HADR running for a few weeks, replicating the TSM database to a standby server, and last week we needed to restart the HADR node for a hardware reason. When we had previously tested a restart of the HADR node and the resync afterwards, everything went fine and it worked automatically.

    But now, after about a month of HADR running in sync, the HADR DB did not start after the HADR node restart, and when I tried to start HADR by hand with the command

    "db2 start hadr on db tsmdb1 as standby"

    it attempts to resync for a while but then goes down again. I caught the status during the attempt and got this:


    smbth04:/tsmb04/home > db2pd -hadr -db tsmdb1

    Database Member 0 -- Database TSMDB1 -- Standby -- Up 0 days 00:00:08 -- Date 2016-01-25-14.30.25.496917

    HADR_ROLE = STANDBY
    REPLAY_TYPE = PHYSICAL
    HADR_SYNCMODE = ASYNC
    STANDBY_ID = 0
    LOG_STREAM_ID = 0
    HADR_STATE = REMOTE_CATCHUP_PENDING
    HADR_FLAGS =
    PRIMARY_MEMBER_HOST = smbtsmb04.sap.skoda.vwg
    PRIMARY_INSTANCE = tsmb04
    PRIMARY_MEMBER = 0
    STANDBY_MEMBER_HOST = smbth04.sap.skoda.vwg
    STANDBY_INSTANCE = tsmb04
    STANDBY_MEMBER = 0
    HADR_CONNECT_STATUS = CONNECTED
    HADR_CONNECT_STATUS_TIME = 01/25/2016 14:30:25.350280 (1453728625)
    HEARTBEAT_INTERVAL(seconds) = 30
    HEARTBEAT_MISSED = 0
    HEARTBEAT_EXPECTED = 0
    HADR_TIMEOUT(seconds) = 120
    TIME_SINCE_LAST_RECV(seconds) = 0
    PEER_WAIT_LIMIT(seconds) = 0
    LOG_HADR_WAIT_CUR(seconds) = 0.000
    LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000000
    LOG_HADR_WAIT_ACCUMULATED(seconds) = 188.232
    LOG_HADR_WAIT_COUNT = 4911489
    SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 131400
    SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 65700
    PRIMARY_LOG_FILE,PAGE,POS = S0000124.LOG, 5891, 66308511328
    STANDBY_LOG_FILE,PAGE,POS = S0000118.LOG, 3795, 63094473860
    HADR_LOG_GAP(bytes) = 0
    STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000118.LOG, 3795, 63094473860
    STANDBY_RECV_REPLAY_GAP(bytes) = 0
    PRIMARY_LOG_TIME = 01/25/2016 14:30:09.000000 (1453728609)
    STANDBY_LOG_TIME = 01/19/2016 12:33:26.000000 (1453203206)
    STANDBY_REPLAY_LOG_TIME = 01/19/2016 12:33:26.000000 (1453203206)
    STANDBY_RECV_BUF_SIZE(pages) = 4096
    STANDBY_RECV_BUF_PERCENT = 0
    STANDBY_SPOOL_LIMIT(pages) = 4194304
    STANDBY_SPOOL_PERCENT = 3
    STANDBY_ERROR_TIME = NULL
    PEER_WINDOW(seconds) = 0
    READS_ON_STANDBY_ENABLED = N



    It can be seen that it is waiting to retrieve logs from the primary, but ....

    I noticed the line "STANDBY_LOG_FILE,PAGE,POS = S0000118.LOG, 3795, 63094473860", which refers to an old log from 12/2015, which I guess is the time of the first HADR init after the DB restore. It seems weird to me: why does it need this old log? Of course it is no longer available on the primary side; the oldest log still available is S0000125.LOG from Dec 27 09:00.

    In contrast, the line "STANDBY_LOG_TIME = 01/19/2016 12:33:26.000000 (1453203206)" corresponds to the time of the HADR node restart, since which HADR has not been working. Why does the standby database not use this timestamp to retrieve the logs that were not yet sent to the standby DB?

    Is it possible to somehow force the HADR DB to forget S0000118.LOG and resync only the missing data from the point 01/19/2016 12:33:26.000000 from the primary logs, without restoring the whole DB from the primary again?

    Thanks for any advice.
    Thomas
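
As a rough cross-check of the db2pd output in this post, the log positions and timestamps can simply be subtracted. The sketch below is illustrative Python, not a DB2 tool: the helper names (`parse_fields`, `log_pos`, `epoch`, `summarize`) are made up for this example, and it assumes the `KEY = VALUE` layout shown above. `SAMPLE` copies five lines from the posted output.

```python
import re

# Five lines copied from the db2pd -hadr output posted above.
SAMPLE = """\
PRIMARY_LOG_FILE,PAGE,POS = S0000124.LOG, 5891, 66308511328
STANDBY_LOG_FILE,PAGE,POS = S0000118.LOG, 3795, 63094473860
HADR_LOG_GAP(bytes) = 0
PRIMARY_LOG_TIME = 01/25/2016 14:30:09.000000 (1453728609)
STANDBY_LOG_TIME = 01/19/2016 12:33:26.000000 (1453203206)
"""


def parse_fields(text):
    """Split 'KEY = VALUE' lines into a dict; lines without ' = ' are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def log_pos(fields, key):
    """Return (log file number, page, byte position) for a *_LOG_FILE,PAGE,POS field."""
    name, page, pos = [part.strip() for part in fields[key].split(",")]
    return int(name[1:8]), int(page), int(pos)  # 'S0000124.LOG' -> 124


def epoch(fields, key):
    """Return the epoch seconds printed in parentheses after a *_LOG_TIME value."""
    return int(re.search(r"\((\d+)\)", fields[key]).group(1))


def summarize(text):
    """Compute how far the standby is behind the primary in files, bytes and seconds."""
    f = parse_fields(text)
    p_file, _, p_pos = log_pos(f, "PRIMARY_LOG_FILE,PAGE,POS")
    s_file, _, s_pos = log_pos(f, "STANDBY_LOG_FILE,PAGE,POS")
    return {
        "files_behind": p_file - s_file,
        "bytes_behind": p_pos - s_pos,
        "seconds_behind": epoch(f, "PRIMARY_LOG_TIME") - epoch(f, "STANDBY_LOG_TIME"),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the posted values this gives a standby that is 6 log files, 3214037468 bytes and 525403 seconds (about six days) behind the primary, even though HADR_LOG_GAP reports 0 while the state is REMOTE_CATCHUP_PENDING. This is only arithmetic on the numbers shown, not a diagnosis.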

  2. #2
    Join Date
    Jul 2013
    Location
    Moscow, Russia
    Posts
    666
    Provided Answers: 55
    Hi,

    What appears in the db2diag.log files on both servers when you try to activate the standby db?
    Regards,
    Mark.

  3. #3
    Join Date
    Jan 2016
    Posts
    5
    Hi Mark, the log from just one attempt to start HADR has 56000 lines (I tried it just now), and I don't know what exactly to search for in it. All I can do is attach it here for you to review.

    Could you be so kind, please?

    Many thanks, Thomas
    Attached Files

  4. #4
    Join Date
    Jul 2013
    Location
    Moscow, Russia
    Posts
    666
    Provided Answers: 55
    What's the db2 version and the fixpack?
    Regards,
    Mark.

  5. #5
    Join Date
    Sep 2012
    Posts
    11
    Provided Answers: 1
    Hi,

    Where are you archiving logs to on the primary?

    Based on what is seen in the provided log, DB2 encounters a 'bad log'.
    Ideally, if you can afford it, restoring the database on the standby from an online backup made on the primary will resolve this issue.

    Something I would test, if applicable:
    Disable the standby database's access to the primary's log archive media and try to activate HADR. DB2 might skip the log and 'assume' it has been applied.

  6. #6
    Join Date
    Jan 2016
    Posts
    5
    Hi all,

    DB2 version is:

    smbsbgw1p:/tsma04/home > db2 connect to tsmdb1

    Database Connection Information

    Database server = DB2/AIX64 10.5.5
    SQL authorization ID = TSMA04
    Local database alias = TSMDB1

    smbsbgw1p:/tsma04/home >


    It is the DB2 embedded in the Tivoli Storage Manager server on AIX.


    The problematic log is no longer available on the primary, nor in the archive logs, because they are cleared on every DB backup, which is performed daily. The problem is that, in my opinion, it should not request this log (S0000118.LOG) at all, because I guess it was the active log on some day in 12/2015, when HADR was activated for the first time.

    When I list the current active log directory on the primary, I see this:

    smbsbgw2p:/tsmb04/actl/NODE0000/LOGSTREAM0000 > ls -l
    total 33758216
    drwxr-x--- 2 tsmb04 tsmsrvrs 256 Jan 26 10:26 LOGSTREAM0000
    -rw------- 1 tsmb04 tsmsrvrs 104071168 Jan 26 09:00 S0000124.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 26 14:40 S0000125.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Dec 28 09:01 S0000126.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Dec 28 16:41 S0000127.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Dec 29 09:00 S0000128.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Dec 30 09:00 S0000129.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Dec 31 09:00 S0000130.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 1 09:01 S0000131.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 2 09:01 S0000132.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 3 09:00 S0000133.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 4 09:00 S0000134.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 5 09:00 S0000135.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 6 09:00 S0000136.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 7 09:00 S0000137.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 8 09:00 S0000138.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 9 09:01 S0000139.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 10 09:00 S0000140.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 11 09:00 S0000141.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 12 09:00 S0000142.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 13 09:00 S0000143.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 14 09:00 S0000144.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 15 09:01 S0000145.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 16 09:01 S0000146.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 17 09:00 S0000147.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 18 09:00 S0000148.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 19 09:00 S0000149.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 20 09:00 S0000150.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 21 09:01 S0000151.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 22 09:01 S0000152.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 23 09:00 S0000153.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 24 09:00 S0000154.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 25 09:00 S0000155.LOG
    -rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 26 09:00 S0000156.LOG
    -rw------- 1 tsmb04 tsmsrvrs 512 Oct 1 11:51 SQLLPATH.TAG


    ... so the oldest available log is S0000126.LOG from Dec 28, from which I guess that S0000118.LOG was written sometime around Dec 20, which corresponds to the date I initialized HADR.

    And I am sure that HADR was still in sync on Jan 25 before the HADR node restart (I checked it). So I thought the standby DB should start retrieving from S0000155.LOG, which is available, instead of S0000118.LOG .... this is what I don't understand.

    Thomass
    Last edited by thomas66; 01-26-16 at 10:05.
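
To make the comparison between the directory listing and the db2pd output explicit, one can list which log file numbers between the standby's position and the primary's position are actually present. The sketch below is a hypothetical helper in Python, not part of DB2; `log_numbers` and `missing_logs` are names invented for this example, and `LISTING` is a four-line excerpt of the `ls -l` output above. It only looks at the directory that was listed and says nothing about logs held in an archive location.

```python
import re

# Excerpt of the ls -l output posted above (first three and last log file).
LISTING = """\
-rw------- 1 tsmb04 tsmsrvrs 104071168 Jan 26 09:00 S0000124.LOG
-rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 26 14:40 S0000125.LOG
-rw------- 1 tsmb04 tsmsrvrs 536879104 Dec 28 09:01 S0000126.LOG
-rw------- 1 tsmb04 tsmsrvrs 536879104 Jan 26 09:00 S0000156.LOG
"""

# Log files in the listing are named S<7 digits>.LOG.
LOG_NAME = re.compile(r"\bS(\d{7})\.LOG\b")


def log_numbers(listing):
    """Return the sorted log file numbers found in a directory listing."""
    return sorted(int(m.group(1)) for m in LOG_NAME.finditer(listing))


def missing_logs(listing, first_needed, last_needed):
    """Return the log numbers in [first_needed, last_needed] absent from the listing."""
    present = set(log_numbers(listing))
    return [n for n in range(first_needed, last_needed + 1) if n not in present]


if __name__ == "__main__":
    # 118 is the standby's log file and 124 the primary's, per the db2pd output.
    print(missing_logs(LISTING, 118, 124))
```

For the range 118 to 124 taken from the db2pd output, this reports 118 through 123 as not present in the listed directory, which matches the observation that the standby asks for a file the primary's active log path no longer holds.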

  7. #7
    Join Date
    Jan 2016
    Posts
    5
    Hi,

    I want to note that I can take a new DB2 backup, restore it to the standby and reinitialize HADR; that is not a problem, so I can solve it. The question is why it happened; I need to know whether I am doing something wrong in the HADR configuration.

    I ask because I faced the same trouble some time ago on another TSM server with HADR on Solaris, and I never found the cause. I restored the DB to the standby and reinitialized HADR without knowing why it happened. It also wanted some old log no longer present on the primary.

    Regards, Tomas

  8. #8
    Join Date
    Jul 2013
    Location
    Moscow, Russia
    Posts
    666
    Provided Answers: 55
    You should open a PMR with IBM on this.
    They can try to analyze the situation from the very beginning (using db2diag.log files from both sides probably) and help you with this.
    Regards,
    Mark.

  9. #9
    Join Date
    Jan 2016
    Posts
    5
    Yes, that was my second option if nobody could give me quick, reasonable advice. You know how it works with IBM PMRs ... I have opened dozens of them in the last 2 years and it is always a long-distance run.

    Anyway, thanks for trying, Mark.

    Regards, Tomas
