  1. #1
    Join Date
    Mar 2011
    Posts
    9

    TSA HADR standby marked bad after trying to reduce a tablespace size on the primary?

    Hello dear colleagues;

    I'm a bit puzzled:
    Today I reduced the size of a tablespace on a primary db in a DB2 TSA HADR cluster,
    and this caused the standby db to become "marked bad" and stopped the Master replication.
    How can this happen?

    What I did exactly is this (all done on the primary, of course):
    1. I first dropped some tables which were no longer needed (all in the same tablespace).
    2. Then I did a reorg table on all remaining tables in the tablespace to get the high water mark down.
    3. Then I did an alter tablespace reduce... a couple of times. It worked 3 times and failed 2 times because there was not enough space.

    -> In the end the standby got "marked bad".
    My main question is: How can reducing the tablespace size kill a standby database?

    Info: database = highpeak
    servers:
    - primary = alx00005
    - standby = alx00006
    - TSA HADR cluster = RedHat linux servers, NO shared storage system.

    Below some more info via lssam, db2pd, and the db2diag.log of the standby:

    1. lssam output:
    Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HIGHPEAK-rg Request=Lock Nominal=Online
    '- Online IBM.Application:db2_db2inst1_db2inst1_HIGHPEAK-rs Control=SuspendedPropagated
    |- Online IBM.Application:db2_db2inst1_db2inst1_HIGHPEAK-rs:alx00005
    '- Offline IBM.Application:db2_db2inst1_db2inst1_HIGHPEAK-rs:alx00006

    2. db2pd output of the primary db:
    db2pd -hadr -d highpeak

    Database Partition 0 -- Database HIGHPEAK -- Active -- Up 18 days 03:00:48 -- Date 06/10/2011 20:44:43

    HADR Information:
    Role    State        SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Disconnected Nearsync 0                7674895

    ConnectStatus ConnectTime                           Timeout
    Disconnected  Fri Jun 10 20:11:03 2011 (1307729463) 30

    PeerWindowEnd                         PeerWindow
    Fri Jun 10 20:14:23 2011 (1307729663) 200

    LocalHost     LocalService
    10.104.116.25 db2_hadr_highpeak

    RemoteHost    RemoteService     RemoteInstance
    10.104.116.26 db2_hadr_highpeak db2inst1

    PrimaryFile  PrimaryPg PrimaryLSN
    S0007324.LOG 4987      0x0000002AA933314F

    StandByFile  StandByPg StandByLSN
    S0007324.LOG 2958      0x0000002AA8B46B73
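
The PrimaryLSN and StandByLSN values in this output give a rough idea of how far the standby had fallen behind when it disconnected. A minimal sketch using plain shell arithmetic, assuming the LSNs can be treated as byte offsets into the log stream (that depends on the DB2 version and is an assumption here); the variable names are illustrative:

```shell
# LSNs copied from the db2pd -hadr output of the primary.
primary_lsn=0x0000002AA933314F
standby_lsn=0x0000002AA8B46B73

# Assumption: LSNs are byte offsets, so their difference is the gap in bytes.
gap_bytes=$(( $primary_lsn - $standby_lsn ))
echo "standby is behind by ${gap_bytes} bytes"
```

The result is about 8.3 MB, which is in line with the difference between PrimaryPg and StandByPg (2029 log pages of 4 KB in the same log file).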

    3. Excerpt from the db2diag.log of the standby:

    2011-06-10-20.11.00.505903+120 I37115A501 LEVEL: Error
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:95
    MESSAGE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL
    "Cannot reduce tablespace size as requested"
    ...
    2011-06-10-20.11.00.529353+120 E37617A874 LEVEL: Error
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:135
    MESSAGE : ADM6083E An error occurred while redoing an alter tablespace
    operation against table space "TS_R_LEGAL_D_4K" (ID "5") This error
    will be temporarily ignored while the remainder of the transaction is
    replayed. If the alter operation is eventually rolled back then the
    error will be discarded. However, if the operation is committed then
    this error will be returned, stopping recovery against the table
    space.

    2011-06-10-20.11.00.569794+120 I38492A502 LEVEL: Error
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:135
    MESSAGE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL
    "Cannot reduce tablespace size as requested"

    2011-06-10-20.11.00.834191+120 E38995A666 LEVEL: Error
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPoolAct, probe:10
    MESSAGE : ADM6084E An attempt is being made to commit an alter operation
    against table space "TS_R_LEGAL_D_4K" (ID "5") but a previous error
    is preventing this from being done. Resolve the original error
    before attempting the recovery again.


    2011-06-10-20.11.00.835554+120 I41330A964 LEVEL: Severe
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, data management, sqldCriticalSectionEnd, probe:8661
    CALLED : DB2 UDB, data management, sqldmpnd
    RETCODE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL
    "Cannot reduce tablespace size as requested"
    ...
    2011-06-10-20.11.00.854046+120 E42295A946 LEVEL: Critical
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
    MESSAGE : ADM14001C An unexpected and critical error has occurred:
    "DBMarkedBad".
    The instance may have been shutdown as a result.
    "Automatic" FODC (First Occurrence Data Capture) has been invoked and
    diagnostic information has been recorded in directory
    "/home/db2inst1/sqllib/db2dump/FODC_DBMarkedBad_2011-06-10-20.11.00.8
    36005/". Please look in this directory for detailed evidence about
    what happened and contact IBM support if necessary to diagnose the
    problem.

    2011-06-10-20.11.00.854676+120 E43242A442 LEVEL: Severe
    PID : 19475 TID : 4401452216864PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
    FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
    MESSAGE : ADM7518C "HIGHPEAK" marked bad.

    2011-06-10-20.11.01.359061+120 I50646A434 LEVEL: Warning
    PID : 19475 TID : 4401460605472PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
    APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
    EDUID : 186 EDUNAME: db2agent (HIGHPEAK) 0
    FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:9500
    MESSAGE : Stopping Replay Master on standby.
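
To see where replay actually gave up in a log like this, it can help to list only the most serious entries. A minimal sketch using awk on the entry header lines from the excerpt; holding them in a shell variable is only for illustration, and on a real system the input would be the db2diag.log file itself:

```shell
# Entry header lines copied from the db2diag.log excerpt of the standby.
diag_headers='2011-06-10-20.11.00.505903+120 I37115A501 LEVEL: Error
2011-06-10-20.11.00.529353+120 E37617A874 LEVEL: Error
2011-06-10-20.11.00.569794+120 I38492A502 LEVEL: Error
2011-06-10-20.11.00.834191+120 E38995A666 LEVEL: Error
2011-06-10-20.11.00.835554+120 I41330A964 LEVEL: Severe
2011-06-10-20.11.00.854046+120 E42295A946 LEVEL: Critical
2011-06-10-20.11.00.854676+120 E43242A442 LEVEL: Severe
2011-06-10-20.11.01.359061+120 I50646A434 LEVEL: Warning'

# Keep only Severe and Critical entries; print timestamp and level.
fatal_entries=$(printf '%s\n' "$diag_headers" |
  awk '$3 == "LEVEL:" && ($4 == "Severe" || $4 == "Critical") { print $1, $4 }')
echo "$fatal_entries"
```

The first line printed is the point where the failed alter tablespace redo turned into a replay failure; everything before it is logged at Error level only.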

    My question remains:
    How can trying to reduce the size of a tablespace cause a standby to be "marked bad"?

    Reducing the size of a tablespace should not be a dangerous operation, right?
    2 more things:
    - The concerned database is quite active. (all other dbs in the cluster are much more static)
    - There are 14 databases in this TSA HADR cluster, isn't that a bit much?

    Thank you very much in advance for any comment or help;
    Piri

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    My guess would be that replaying the reorg operation(s) on the standby failed to bring the HWM low enough for the resize to work. I think it's obvious that the physical layout of data may not be the same on both databases.
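
This guess can be sketched as a simple check: a reduce only fits if the tablespace, after shrinking, is still at least as large as its high water mark. A minimal sketch with made-up page counts (none of these numbers come from the thread, and real container layouts are more involved), showing how the same request could pass on the primary and fail on a standby whose HWM ended up higher:

```shell
# can_reduce TOTAL_PAGES HWM_PAGES REDUCE_BY_PAGES
# Simplified model: the reduce fits only if the new size stays at or above the HWM.
can_reduce() {
  if [ $(( $1 - $3 )) -ge "$2" ]; then
    echo yes
  else
    echo no
  fi
}

# Hypothetical values for illustration only.
primary_result=$(can_reduce 10000 6000 3000)   # HWM lowered by the reorg
standby_result=$(can_reduce 10000 8000 3000)   # HWM still high after replay
echo "primary: $primary_result, standby: $standby_result"
```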

  3. #3
    Join Date
    Mar 2011
    Posts
    9
    Yes, but why does the cluster stop replication because reducing the tablespace on the standby did not work?
    Isn't that a bit drastic?

    Anyway, I'm restoring the standby now.

    Thank you for your comment:
    Piri.

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Originally posted by Piri:
    Yes, but why does the cluster stop replication because reducing the tablespace on the standby did not work?
    Isn't that a bit drastic?
    What does it have to do with resizing a tablespace?

    A transaction failed to be replayed. Subsequent transactions' integrity cannot be ensured anymore. End of story.
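
The ADM6083E and ADM6084E messages in the standby's log describe this rule: a redo error is tolerated until the transaction ends, and only a commit turns it into a replay failure. A toy model of that rule, not DB2's actual implementation; the function name and the returned strings are made up for illustration:

```shell
# replay_outcome REDO_RESULT TX_END
#   REDO_RESULT: ok | failed       (did redoing the logged operation work?)
#   TX_END:      commit | rollback (how the transaction ended on the primary)
replay_outcome() {
  if [ "$1" = failed ] && [ "$2" = commit ]; then
    echo stop-replay     # the standby can no longer match the primary
  else
    echo continue        # nothing failed, or the failed work was rolled back
  fi
}

replay_outcome failed rollback   # prints "continue"
replay_outcome failed commit     # prints "stop-replay"
```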

  5. #5
    Join Date
    Apr 2006
    Location
    Belgium
    Posts
    2,514
    Provided Answers: 11

    resize

    It does have to do with the tablespace resize: that is the statement that cannot be replayed. After this error message, just enter start hadr on db xx; this will succeed and replay will continue.
    So if replay cannot be done, the db should be marked bad.
    But why did start hadr continue the replay without problems (no restore needed)?
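
A minimal sketch of the restart suggested here, written as a dry run that only prints the commands unless DRY_RUN=0 is set. The helper names run and restart_standby are hypothetical, and in a TSA-managed cluster the cluster manager may also need attention, which this sketch ignores:

```shell
# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# restart_standby DBNAME: meant to be run on the standby host as the instance owner.
restart_standby() {
  run db2 start hadr on database "$1" as standby
  run db2pd -hadr -d "$1"    # check role, state and connect status afterwards
}

restart_standby highpeak
```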
    Best Regards, Guy Przytula
    Database Software Consultant
    Good DBAs are not formed in a week or a month. They are created little by little, day by day. Protracted and patient effort is needed to develop good DBAs.
    Spoon feeding : To treat (another) in a way that discourages independent thought or action, as by overindulgence.
    DB2 UDB LUW Certified V7-V8-V9-V9.7-V10.1-V10.5 DB Admin - Advanced DBA -Dprop..
    Information Server Datastage Certified
    http://www.infocura.be
