  #1 (permalink)  
Old 03-19-09, 12:16
KarenKoekemoer KarenKoekemoer is offline
Registered User
 
Join Date: Mar 2009
Posts: 3
DB2 hadr on standby hangs after disconnect from primary

Hi,
We are running DB2 v8.2 FixPak 14 on Linux (SLES 10). HADR is configured in SYNC mode with a HADR_TIMEOUT value of 60 seconds. During periods of high activity, the primary database loses its connection to the standby database (we are not sure why), which blocks update transactions for 60 seconds. The only way to re-establish the HADR connection is to go to the standby server and issue a db2_kill, because the deactivate of the standby database "hangs". A db2start and activate of the standby database then re-establishes the connection, and all is well until the next time this happens.

Has anyone else experienced something similar? Is it correct that DB2 on the primary does not automatically re-establish the connection, and why does the standby database seem to be in this "hanging" state when we try to deactivate it?
  #2 (permalink)  
Old 03-19-09, 17:44
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
Very few people run in SYNC mode, because NEARSYNC provides basically the same level of redundancy with less overhead.

I would check the db2diag.log on both machines.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
  #3 (permalink)  
Old 03-19-09, 18:43
KarenKoekemoer KarenKoekemoer is offline
Registered User
 
Join Date: Mar 2009
Posts: 3
Thanks Marcus, unfortunately the db2diag.log files are not very helpful.

Primary log file after 60 second timeout
2009-03-19-12.42.57.525017+120 I2619992E513 LEVEL: Error
PID : 4052 TID : 47398647731840PROC : db2hadrp (BPRP) 0
INSTANCE: db2inst1 NODE : 000 DB : BPRP
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20200
MESSAGE : Did not receive anything through HADR connection for the duration of
HADR_TIMEOUT. Closing connection.
DATA #1 : Hexdump, 4 bytes
0x00007FFFD38B887C : 3D00 0000 =...

2009-03-19-12.42.57.525227+120 I2620506E338 LEVEL: Severe
PID : 4052 TID : 47398647731840PROC : db2hadrp (BPRP) 0
INSTANCE: db2inst1 NODE : 000 DB : BPRP
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20200
RETCODE : ZRC=0x00000000=0=PSM_OK "Unknown"

2009-03-19-12.42.57.525308+120 E2620845E354 LEVEL: Event
PID : 4052 TID : 47398647731840PROC : db2hadrp (BPRP) 0
INSTANCE: db2inst1 NODE : 000 DB : BPRP
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-RemoteCatchupPending (was P-Peer)


Status in the snapshot shows : Disconnected

Nothing appears in the standby db2diag.log until the database is deactivated.

Secondary log file after deactivate of database
2009-03-19-13.13.04.353260+120 I19685252E394 LEVEL: Warning
PID : 19375 TID : 47600804660864PROC : db2agent (BPRP) 0
INSTANCE: db2insta NODE : 000 DB : BPRP
APPHDL : 0-1632 APPID: *LOCAL.db2insta.090319111304
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21151
MESSAGE : Info: HADR Startup has begun.

2009-03-19-13.13.04.360058+120 I19685647E413 LEVEL: Error
PID : 2123 TID : 47600804660864PROC : db2redom (BPRP) 0
INSTANCE: db2insta NODE : 000 DB : BPRP
APPHDL : 0-1653
FUNCTION: DB2 UDB, recovery manager, sqlpshrScanNext, probe:1450
RETCODE : ZRC=0x80100003=-2146435069=SQLP_LINT "Interrupt from application"
DIA8003C The interrupt has been received.

2009-03-19-13.13.04.360729+120 I19686061E413 LEVEL: Error
PID : 2123 TID : 47600804660864PROC : db2redom (BPRP) 0
INSTANCE: db2insta NODE : 000 DB : BPRP
APPHDL : 0-1653
FUNCTION: DB2 UDB, recovery manager, sqlpPRecReadLog, probe:1275
RETCODE : ZRC=0x80100003=-2146435069=SQLP_LINT "Interrupt from application"
DIA8003C The interrupt has been received.

2009-03-19-13.13.06.272529+120 I19686475E413 LEVEL: Error
PID : 2123 TID : 47600804660864PROC : db2redom (BPRP) 0
INSTANCE: db2insta NODE : 000 DB : BPRP
APPHDL : 0-1653
FUNCTION: DB2 UDB, recovery manager, sqlpPRecReadLog, probe:1280
RETCODE : ZRC=0x80100003=-2146435069=SQLP_LINT "Interrupt from application"
DIA8003C The interrupt has been received.

Nothing else gets written to the log file until I do the db2_kill. In the past I have waited longer than 10 minutes for my deactivate command to respond.

I will try setting the mode to NEARSYNC before our change freeze for month-end and see what happens. During this period the HADR heartbeat breaks almost every day on the busy database. The less busy databases are not affected.
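
When disconnects recur daily, it can help to pull just the HADR timeouts and state changes out of db2diag.log instead of reading it by eye. The following is a minimal Python sketch, assuming records shaped like the excerpts quoted above; the function names `parse_db2diag` and `hadr_state_changes` are made up for illustration and are not part of any DB2 tooling.

```python
import re

# Header line of a record: timestamp, record id, then "LEVEL: <name>".
HEADER_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d+)\s+"
    r"(?P<rec>\S+)\s+LEVEL:\s*(?P<level>\w+)"
)
# Only the fields this sketch cares about.
FIELD_RE = re.compile(r"^(?P<key>FUNCTION|MESSAGE|CHANGE|RETCODE)\s*:\s*(?P<val>.*)$")
# Any other "KEY : value" style line (PID, INSTANCE, DATA #1, ...).
KEYLINE_RE = re.compile(r"^[A-Z][A-Z0-9 #]*\s*:\s")

SAMPLE = """\
2009-03-19-12.42.57.525017+120 I2619992E513 LEVEL: Error
PID : 4052 TID : 47398647731840PROC : db2hadrp (BPRP) 0
INSTANCE: db2inst1 NODE : 000 DB : BPRP
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20200
MESSAGE : Did not receive anything through HADR connection for the duration of
HADR_TIMEOUT. Closing connection.
DATA #1 : Hexdump, 4 bytes
0x00007FFFD38B887C : 3D00 0000 =...

2009-03-19-12.42.57.525308+120 E2620845E354 LEVEL: Event
PID : 4052 TID : 47398647731840PROC : db2hadrp (BPRP) 0
INSTANCE: db2inst1 NODE : 000 DB : BPRP
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-RemoteCatchupPending (was P-Peer)
"""


def parse_db2diag(text):
    """Split db2diag.log text into a list of dicts, one per record."""
    records = []
    current = None
    last_key = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            current = {"timestamp": header["ts"], "level": header["level"]}
            records.append(current)
            last_key = None
            continue
        if current is None or not line.strip():
            last_key = None
            continue
        field = FIELD_RE.match(line)
        if field:
            last_key = field["key"].lower()
            current[last_key] = field["val"].strip()
        elif last_key == "message" and not KEYLINE_RE.match(line):
            # A wrapped MESSAGE line: append it to the message text.
            current["message"] += " " + line.strip()
        else:
            last_key = None
    return records


def hadr_state_changes(records):
    """Return (timestamp, change text) for every record with a CHANGE field."""
    return [(r["timestamp"], r["change"]) for r in records if "change" in r]


for ts, change in hadr_state_changes(parse_db2diag(SAMPLE)):
    print(ts, change)
```

Running the same functions over the full primary log for a busy day would give a list of timestamps at which the primary left Peer state, which can then be compared with workload or network monitoring data for the same times.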
  #4 (permalink)  
Old 03-20-09, 02:04
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
You probably need to do some network tests between your primary and standby. If the machines are located too far apart (different buildings or cities) or go through too many network hops, you may need HADR asynchronous mode. Or perhaps there is just a problem with your particular network that could be fixed. Doing some file transfer tests (large and small files) to determine the speed may suffice, but I would also consult with your network staff.

For high-volume HADR applications I use a private Ethernet connection between the HADR primary and standby servers. This requires an extra NIC on each machine (we actually use a pair of bonded NICs on each machine for redundancy). The NICs are connected to each other without any other network connections (if the machines are close enough together, you can simply use a crossover cable without any switch, router, hub, etc.). That way the HADR log traffic between primary and standby has its own private network and cannot be slowed down by any other traffic on the network.
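
As a rough first check before involving network staff, the time to open a TCP connection from the primary to the standby can be sampled repeatedly. This is a minimal Python sketch, not a replacement for proper file transfer or throughput tests: it measures connection setup time only, and the host and port in the usage note are placeholders to be replaced with the standby's address and a port that is actually listening there.

```python
import socket
import statistics
import time


def tcp_connect_times(host, port, attempts=5, timeout=5.0):
    """Return the elapsed seconds for each of `attempts` TCP connects to host:port."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection raises OSError if the connect fails or times out.
        with socket.create_connection((host, port), timeout=timeout):
            samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to min, median, and max, in seconds."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

For example, `summarize(tcp_connect_times("standby-host", 50001, attempts=20))` would report the spread of connect times, where `standby-host` and `50001` are hypothetical values. Large gaps between the median and the maximum during busy periods would be one hint that the network path is worth a closer look.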