#1
06-10-11, 16:43
Piri
TSA HADR standby marked bad after trying to reduce a tablespace size on primary ????

Hello dear colleagues;

I'm a bit puzzled:
Today I reduced the size of a tablespace on the primary db in a DB2 TSA HADR cluster,
and this caused the standby db to become "marked bad" and stopped the Replay Master.
How can this happen?

What I did exactly is this (all on the primary, of course):
1. I first dropped some tables which were no longer needed (all in the same tablespace).
2. Then I ran a reorg table on all remaining tables in the tablespace to get the high water mark down.
3. Then I ran alter tablespace reduce... a couple of times. Three times it worked; twice it failed because there was not enough space.

-> In the end the standby got "marked bad".
My main question is: How can reducing the tablespace size kill a standby database?

Info:
database = highpeak
servers:
- primary = alx00005
- standby = alx00006
- TSA HADR cluster = Red Hat Linux servers, no shared storage system.

Below some more info via lssam, db2pd, and the db2diag.log of the standby:

1. lssam output:
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_HIGHPEAK-rg Request=Lock Nominal=Online
        '- Online IBM.Application:db2_db2inst1_db2inst1_HIGHPEAK-rs Control=SuspendedPropagated
                |- Online IBM.Application:db2_db2inst1_db2inst1_HIGHPEAK-rs:alx00005
                '- Offline IBM.Application:db2_db2inst1_db2inst1_HIGHPEAK-rs:alx00006

2. db2pd output of the primary db:
db2pd -hadr -d highpeak

Database Partition 0 -- Database HIGHPEAK -- Active -- Up 18 days 03:00:48 -- Date 06/10/2011 20:44:43

HADR Information:
Role    State                SyncMode HeartBeatsMissed   LogGapRunAvg (bytes)
Primary Disconnected         Nearsync 0                  7674895

ConnectStatus ConnectTime                           Timeout
Disconnected  Fri Jun 10 20:11:03 2011 (1307729463) 30

PeerWindowEnd                         PeerWindow
Fri Jun 10 20:14:23 2011 (1307729663) 200

LocalHost                                LocalService
10.104.116.25                            db2_hadr_highpeak

RemoteHost                               RemoteService      RemoteInstance
10.104.116.26                            db2_hadr_highpeak  db2inst1

PrimaryFile  PrimaryPg  PrimaryLSN
S0007324.LOG 4987       0x0000002AA933314F

StandByFile  StandByPg  StandByLSN
S0007324.LOG 2958       0x0000002AA8B46B73
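
As a rough cross-check, the db2pd figures above can be turned into an estimate of how far behind the standby was. This is only a sketch in Python. It assumes that the LSNs on this DB2 level are byte offsets into the log stream and that log pages are 4 KiB; neither is confirmed in this thread. The variable names are made up for illustration.

```python
# Values copied from the db2pd -hadr output above.
primary_lsn = 0x0000002AA933314F
standby_lsn = 0x0000002AA8B46B73
primary_pg, standby_pg = 4987, 2958  # both in S0007324.LOG

# Assumption: LSNs are byte offsets in the log stream, so the difference
# approximates how many bytes of log the standby had not yet replayed.
lsn_gap_bytes = primary_lsn - standby_lsn

# Assumption: log pages are 4 KiB each.
LOG_PAGE_BYTES = 4096
page_gap_bytes = (primary_pg - standby_pg) * LOG_PAGE_BYTES

# Epoch seconds that db2pd prints in parentheses.
connect_time_epoch = 1307729463
peer_window_end_epoch = 1307729663
peer_window_seconds = peer_window_end_epoch - connect_time_epoch

print(lsn_gap_bytes, page_gap_bytes, peer_window_seconds)
```

Both gap estimates come out at roughly 8 MB, the same order of magnitude as the reported LogGapRunAvg of 7674895 bytes, and the difference between the two epoch values equals the PeerWindow of 200 seconds.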

3. Excerpt from the db2diag.log of the standby:

2011-06-10-20.11.00.505903+120 I37115A501 LEVEL: Error
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:95
MESSAGE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL
"Cannot reduce tablespace size as requested"
...
2011-06-10-20.11.00.529353+120 E37617A874 LEVEL: Error
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:135
MESSAGE : ADM6083E An error occurred while redoing an alter tablespace
operation against table space "TS_R_LEGAL_D_4K" (ID "5") This error
will be temporarily ignored while the remainder of the transaction is
replayed. If the alter operation is eventually rolled back then the
error will be discarded. However, if the operation is committed then
this error will be returned, stopping recovery against the table
space.

2011-06-10-20.11.00.569794+120 I38492A502 LEVEL: Error
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:135
MESSAGE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL
"Cannot reduce tablespace size as requested"

2011-06-10-20.11.00.834191+120 E38995A666 LEVEL: Error
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPoolAct, probe:10
MESSAGE : ADM6084E An attempt is being made to commit an alter operation
against table space "TS_R_LEGAL_D_4K" (ID "5") but a previous error
is preventing this from being done. Resolve the original error
before attempting the recovery again.


2011-06-10-20.11.00.835554+120 I41330A964 LEVEL: Severe
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, data management, sqldCriticalSectionEnd, probe:8661
CALLED : DB2 UDB, data management, sqldmpnd
RETCODE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL
"Cannot reduce tablespace size as requested"
...
2011-06-10-20.11.00.854046+120 E42295A946 LEVEL: Critical
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM14001C An unexpected and critical error has occurred:
"DBMarkedBad".
The instance may have been shutdown as a result.
"Automatic" FODC (First Occurrence Data Capture) has been invoked and
diagnostic information has been recorded in directory
"/home/db2inst1/sqllib/db2dump/FODC_DBMarkedBad_2011-06-10-20.11.00.8
36005/". Please look in this directory for detailed evidence about
what happened and contact IBM support if necessary to diagnose the
problem.

2011-06-10-20.11.00.854676+120 E43242A442 LEVEL: Severe
PID : 19475 TID : 4401452216864 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 188 EDUNAME: db2redom (HIGHPEAK) 0
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM7518C "HIGHPEAK" marked bad.

2011-06-10-20.11.01.359061+120 I50646A434 LEVEL: Warning
PID : 19475 TID : 4401460605472 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000 DB : HIGHPEAK
APPHDL : 0-34 APPID: *LOCAL.DB2.110523165654
EDUID : 186 EDUNAME: db2agent (HIGHPEAK) 0
FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:9500
MESSAGE : Stopping Replay Master on standby.
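
To follow the chain of errors in an excerpt like the one above, it can help to boil each entry down to its timestamp, level, function and return code. The Python sketch below does that for a shortened copy of two entries. The regular expressions are guesses based only on the layout seen in this thread, not a complete db2diag.log parser, and the helper names `zrc_to_signed` and `summarize` are made up for illustration.

```python
import re

# A shortened copy of two entries from the excerpt above.
SAMPLE = """\
2011-06-10-20.11.00.505903+120 I37115A501 LEVEL: Error
FUNCTION: DB2 UDB, buffer pool services, sqlbAlterPool, probe:95
MESSAGE : ZRC=0x8002013A=-2147352262=SQLB_TBSPACE_TOO_SMALL

2011-06-10-20.11.00.854676+120 E43242A442 LEVEL: Severe
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::MarkDBBad, probe:10
MESSAGE : ADM7518C "HIGHPEAK" marked bad.
"""

# Assumed layout: timestamp, timezone offset, record id, then "LEVEL: <name>".
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})\S*\s+\S+\s+LEVEL:\s*(\w+)"
)
FUNCTION = re.compile(r"^FUNCTION:\s*(.+)$")
ZRC = re.compile(r"ZRC=0x([0-9A-Fa-f]{8})")


def zrc_to_signed(hex_digits):
    """Interpret an 8-digit hex ZRC as a signed 32-bit integer."""
    value = int(hex_digits, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


def summarize(text):
    """Return one dict per entry: time, level, function, signed ZRC (or None)."""
    entries = []
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            entries.append({"time": header.group(1), "level": header.group(2),
                            "function": None, "zrc": None})
            continue
        if not entries:
            continue  # ignore anything before the first entry header
        function = FUNCTION.match(line)
        if function:
            entries[-1]["function"] = function.group(1).strip()
        zrc = ZRC.search(line)
        if zrc and entries[-1]["zrc"] is None:
            entries[-1]["zrc"] = zrc_to_signed(zrc.group(1))
    return entries


for entry in summarize(SAMPLE):
    print(entry["time"], entry["level"], entry["function"], entry["zrc"])
```

The signed conversion reproduces the decimal value that DB2 prints next to the hex code: 0x8002013A read as a signed 32-bit integer is -2147352262.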

My question remains:
How can trying to reduce the size of a tablespace cause a standby to be "marked bad"?

Reducing the size of a tablespace should not be a dangerous operation, right?

Two more things:
- The database concerned is quite active (all other dbs in the cluster are much more static).
- There are 14 databases in this TSA HADR cluster. Isn't that a bit much?

Thank you very much in advance for any comment or help;
Piri

#2
06-10-11, 17:33
n_i
My guess would be that replaying the reorg operation(s) on the standby failed to bring the HWM low enough for the resize to work. I think it's obvious that the physical layout of data may not be the same on both databases.
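
The guess above can be pictured with a toy model. In this Python sketch a tablespace is just a set of used page numbers, and a reduce is refused when the new size falls below the high water mark. The `Tablespace` class and its rule are made up for illustration; they are not how DB2 tracks extents internally.

```python
class Tablespace:
    """Toy model: a tablespace is a set of used page numbers plus a total size."""

    def __init__(self, total_pages, used_pages):
        self.total_pages = total_pages
        self.used_pages = set(used_pages)

    @property
    def high_water_mark(self):
        # One past the highest page in use; 0 when nothing is in use.
        return max(self.used_pages) + 1 if self.used_pages else 0

    def reduce_to(self, new_total):
        # Hypothetical rule: every page at or above the new size must be free.
        if new_total < self.high_water_mark:
            raise ValueError("cannot reduce tablespace size as requested")
        self.total_pages = new_total


# Same amount of data (40 pages) on both sides, but laid out differently.
primary = Tablespace(total_pages=100, used_pages=range(0, 40))
standby = Tablespace(total_pages=100,
                     used_pages=list(range(0, 30)) + list(range(50, 60)))

primary.reduce_to(50)  # works: the primary's HWM is 40
try:
    standby.reduce_to(50)  # fails: the standby's HWM is 60
    standby_ok = True
except ValueError:
    standby_ok = False

print(primary.total_pages, standby.high_water_mark, standby_ok)
```

In this model the same reduce succeeds on the primary and fails on the standby, even though both hold the same number of pages.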

#3
06-11-11, 10:34
Piri
Yes, but why does the cluster stop replication because reducing the tablespace on the standby did not work?
Isn't that a bit drastic?

Anyway, I'm restoring the standby now.

Thank you for your comment,
Piri.

#4
06-11-11, 12:41
n_i
Quote:
Originally Posted by Piri
Yes, but why does the cluster stop replication because reducing the tablespace on the standby did not work?
Isn't that a bit drastic?

What does it have to do with resizing a tablespace?

A transaction failed to be replayed. Subsequent transactions' integrity cannot be ensured anymore. End of story.
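
The ADM6083E and ADM6084E messages in the log describe this behaviour: the error from the alter is held back while the rest of the transaction is replayed, dropped if the transaction rolls back, and fatal for recovery if it commits. A toy Python sketch of that rule follows. The record format and the `replay` function are made up for illustration and say nothing about how DB2 implements redo.

```python
def replay(records, standby_hwm):
    """Replay toy log records.

    Each record is (kind, txn_id, arg). Returns (status, processed), where
    status is "ok" or "marked bad" and processed counts the records handled
    before replay stopped.
    """
    pending_errors = {}  # txn_id -> deferred error text
    processed = 0
    for kind, txn, arg in records:
        if kind == "reduce":
            if arg < standby_hwm:
                # Deferred: the transaction may still roll back.
                pending_errors[txn] = "tablespace too small"
        elif kind == "rollback":
            # Rolled back: the deferred error is discarded.
            pending_errors.pop(txn, None)
        elif kind == "commit":
            if txn in pending_errors:
                # Committed with an unresolved error: stop replaying.
                return "marked bad", processed
        processed += 1
    return "ok", processed


committed = [("reduce", 1, 50), ("commit", 1, None),
             ("insert", 2, None), ("commit", 2, None)]
rolled_back = [("reduce", 1, 50), ("rollback", 1, None),
               ("insert", 2, None), ("commit", 2, None)]

print(replay(committed, standby_hwm=60))
print(replay(rolled_back, standby_hwm=60))
print(replay(committed, standby_hwm=40))
```

In the first call replay stops at the commit, so the later transaction is never applied. That is the point made above: once one transaction cannot be replayed, nothing after it can be trusted.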

#5
07-11-11, 02:07
przytula_guy
resize

It does have to do with the tablespace resize: that is the statement that cannot be replayed. After this error message, just enter start hadr on db xx. This will succeed and replay will continue.
So if replay cannot be done, the db should be marked bad. But then why does start hadr continue the replay without a problem (no restore needed)?
Best regards, Guy
__________________
Best Regards, Guy Przytula
Database Software Consultant
DB2 UDB LUW Certified V7-V8-V9-V9.7 DB Admin - Dprop..
Information Server Datastage Certified
http://www.infocura.be