  1. #1
    Join Date
    May 2012
    Posts
    22

    Cannot set up DB2 HADR

    Hi all,

    I am trying to set up DB2 HADR between two databases. For that I prepared two virtual machines running on two different physical machines. Both run the same OS (Red Hat Linux 6) and DB2 9.7.

    Everything else seems to be working fine, but HADR doesn't start on the primary database when I issue START HADR ON DATABASE ABC AS PRIMARY;

    It gives the following error:


    SQL1768N Unable to start HADR. Reason code = "7".

    I have modified the parameters as the error message recommends, but nothing seems to work.

    Following are the parameters on both databases:

    HADR Configuration on Standby:

    HADR database role = STANDBY
    HADR local host name (HADR_LOCAL_HOST) = DB2_2host.DB2_2domain
    HADR local service name (HADR_LOCAL_SVC) = DB2_HADR_8
    HADR remote host name (HADR_REMOTE_HOST) = DB2host.DB2domain
    HADR remote service name (HADR_REMOTE_SVC) = DB2_HADR_7
    HADR instance name of remote server (HADR_REMOTE_INST) = db2inst1
    HADR timeout value (HADR_TIMEOUT) = 120
    HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
    HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 120


    HADR Configuration on Primary:

    HADR database role = STANDARD
    HADR local host name (HADR_LOCAL_HOST) = DB2host.DB2domain
    HADR local service name (HADR_LOCAL_SVC) = DB2_HADR_7
    HADR remote host name (HADR_REMOTE_HOST) = DB2_2host.DB2_2domain
    HADR remote service name (HADR_REMOTE_SVC) = DB2_HADR_8
    HADR instance name of remote server (HADR_REMOTE_INST) = db2inst1
    HADR timeout value (HADR_TIMEOUT) = 120
    HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC
    HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 120

  2. #2
    Join Date
    May 2012
    Posts
    22

    Continued

    Following is my db2diag file:





    2012-05-21-02.44.45.077659+300 E255250E481 LEVEL: Info
    PID : 2720 TID : 139935659058944PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-129 APPID: *LOCAL.DB2.120520214446
    EDUID : 334 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, recovery manager, sqlpReplayMaster, probe:1740
    MESSAGE : ADM1603I DB2 is invoking the forward phase of the database
    rollforward recovery.

    2012-05-21-02.44.45.087565+300 I255732E555 LEVEL: Warning
    PID : 2720 TID : 139935659058944PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-129 APPID: *LOCAL.DB2.120520214446
    EDUID : 334 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, recovery manager, sqlpForwardRecovery, probe:710
    DATA #1 : <preformatted>
    Invoking database rollforward forward recovery,
    lowtranlsn 0000000008328010 in log file number 6
    minbufflsn 0000000008328010 in log file number 6

    2012-05-21-02.44.45.078683+300 I256288E460 LEVEL: Severe
    PID : 2720 TID : 139935558395648PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 647 EDUNAME: db2hadrs (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
    MESSAGE : Failed to connect to primary. rc:
    DATA #1 : Hexdump, 4 bytes
    0x00007F45493FD100 : 1900 0F81 ....

    2012-05-21-02.44.45.089927+300 I256749E396 LEVEL: Severe
    PID : 2720 TID : 139935558395648PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 647 EDUNAME: db2hadrs (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
    RETCODE : ZRC=0x810F0019=-2129723367=SQLO_CONN_REFUSED "Connection refused"

    2012-05-21-02.44.45.121131+300 I257146E468 LEVEL: Warning
    PID : 2720 TID : 139935659058944PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-129 APPID: *LOCAL.DB2.120520214446
    EDUID : 334 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, recovery manager, sqlprecm, probe:2000
    DATA #1 : <preformatted>
    Using parallel recovery with 3 agents 9 QSets 27 queues and 68 chunks

    2012-05-21-02.44.45.191011+300 E257615E434 LEVEL: Error
    PID : 2720 TID : 139933062784768PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 638 EDUNAME: db2logmgr (IB) 0
    FUNCTION: DB2 UDB, data protection services, sqlpSearchForLogArchiveOnDisk, probe:4000
    MESSAGE : ZRC=0x860F000A=-2045837302=SQLO_FNEX "File not found."
    DIA8411C A file "" could not be found.

    2012-05-21-02.44.45.191158+300 I258050E373 LEVEL: Info
    PID : 2720 TID : 139932689491712PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 641 EDUNAME: db2lfr (IB) 0
    FUNCTION: DB2 UDB, data protection services, sqlpgPostLogMgrToRetrieve, probe:1050
    DATA #1 : <preformatted>
    RTStatus is in state 16 at index 0.

    2012-05-21-02.44.45.191240+300 I258424E415 LEVEL: Warning
    PID : 2720 TID : 139935612921600PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-129 APPID: *LOCAL.DB2.120520214446
    EDUID : 648 EDUNAME: db2shred (IB) 0
    FUNCTION: DB2 UDB, recovery manager, sqlpshrEdu, probe:18300
    MESSAGE : Maxing hdrLCUEndLsoRequested

    2012-05-21-02.44.45.305214+300 E258840E388 LEVEL: Event
    PID : 2720 TID : 139935558395648PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 647 EDUNAME: db2hadrs (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
    CHANGE : HADR state set to S-RemoteCatchupPending (was S-LocalCatchup)

    2012-05-21-02.48.11.004337+300 E259229E417 LEVEL: Warning
    PID : 2736 TID : 140541432276752PROC : db2acd 0
    INSTANCE: db2inst1 NODE : 000
    FUNCTION: DB2 UDB, Health Monitor, HealthIndicator::update, probe:500
    MESSAGE : ADM10502W Health indicator "HADR Operational status"
    ("db.hadr_op_status") is in state "Disconnected" on "database"
    "db2inst1.IB ".

    2012-05-21-02.54.46.161815+300 I259647E460 LEVEL: Severe
    PID : 2720 TID : 139935558395648PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 647 EDUNAME: db2hadrs (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
    MESSAGE : Failed to connect to primary. rc:
    DATA #1 : Hexdump, 4 bytes
    0x00007F45493FD100 : 1900 0F81 ....

    2012-05-21-02.54.46.161972+300 I260108E396 LEVEL: Severe
    PID : 2720 TID : 139935558395648PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000
    EDUID : 647 EDUNAME: db2hadrs (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduAcceptEvent, probe:20280
    RETCODE : ZRC=0x810F0019=-2129723367=SQLO_CONN_REFUSED "Connection refused"


    Please help me out. Will be really thankful.
    Last edited by Rabaail; 05-21-12 at 07:26.

  3. #3
    Join Date
    Apr 2006
    Location
    Belgium
    Posts
    2,514
    Provided Answers: 11
    Have you started the standby first (the listener)? Look at the message and reason code.
    Best Regards, Guy Przytula
    Database Software Consultant
    Good DBAs are not formed in a week or a month. They are created little by little, day by day. Protracted and patient effort is needed to develop good DBAs.
    Spoon feeding : To treat (another) in a way that discourages independent thought or action, as by overindulgence.
    DB2 UDB LUW Certified V7-V8-V9-V9.7-V10.1-V10.5 DB Admin - Advanced DBA -Dprop..
    Information Server Datastage Certified
    http://www.infocura.be

  4. #4
    Join Date
    May 2012
    Posts
    22
    Yes. The standby has been started. It's the primary that does not start.

  5. #5
    Join Date
    Apr 2006
    Location
    Belgium
    Posts
    2,514
    Provided Answers: 11
    If you don't tell us this, we cannot know...
    Firewall?
    HADR port? It must not be the same as the DB2 instance port.
    Try db2pd -db xx -hadr on the standby server to see if it has really started.
    Are the passwords identical on both servers?

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    How did you check connectivity from DB2_2host.DB2_2domain to DB2host.DB2domain on port DB2_HADR_7?

  7. #7
    Join Date
    Apr 2006
    Location
    Belgium
    Posts
    2,514
    Provided Answers: 11
    Are the logs in TSM?
    For the first startup on the standby, I deactivate TSM log archiving and switch it to disk.
    Once the databases are in sync, I change it back to TSM.
    There seem to be cases where the standby needs log files and wants to retrieve them from TSM.
    Have you configured the authorization for this machine with db2adutl?
    Check the Redbook about DB2, TSM, and HADR...

  8. #8
    Join Date
    May 2012
    Posts
    22
    1) Firewall is disabled.

    2) db2pd -db -hadr returns the following:

    Database Partition 0 -- Database IB -- Standby -- Up 0 days 01:32:38 -- Date 05/21/2012 04:17:22

    HADR Information:
    Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Standby RemoteCatchupPending Nearsync 0 1

    ConnectStatus ConnectTime Timeout
    Disconnected Mon May 21 02:44:44 2012 (1337550284) 120

    PeerWindowEnd PeerWindow
    Null (0) 120

    LocalHost LocalService
    DB2_2host.DB2_2domain DB2_HADR_8

    RemoteHost RemoteService RemoteInstance
    DB2host.DB2domain DB2_HADR_7 db2inst1

    PrimaryFile PrimaryPg PrimaryLSN
    S0000000.LOG 0 0x0000000008328010

    StandByFile StandByPg StandByLSN StandByRcvBufUsed
    S0000006.LOG 0 0x0000000008328010 0%

    3) Log archiving is done to disk.

    @n_i What do you mean by checking connectivity on that port? The service and port have been added to the /etc/services file, and the primary system replies to ping.

  9. #9
    Join Date
    May 2012
    Posts
    22
    I am not using TSM.

  10. #10
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Originally posted by Rabaail:
    What do you mean by checking connectivity on that port?
    In other words, you did not check connectivity on that port. The easiest way to do that is to use telnet <hostname> <port>.

    You have potentially five different firewalls between the two instances: standby guest OS, standby host OS, a dedicated network firewall, primary host OS, and primary guest OS. In at least one of those places the connection from standby to primary is being rejected, as indicated by the error message.
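
If telnet is not installed on the guests, the same check can be sketched in a few lines of Python. This is only an illustration of the idea: the function name `can_connect` and the 5-second timeout are my own choices, not anything DB2 provides. Note that ping only proves ICMP reachability; it says nothing about whether a TCP port accepts connections.

```python
import socket
from typing import Tuple


def can_connect(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, str]:
    """Attempt a plain TCP connection, roughly what `telnet <hostname> <port>` does.

    Returns (True, "connected") on success, otherwise (False, <reason>).
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False, f"{type(exc).__name__}: {exc}"
```

Run it from the standby against the primary's HADR host, for example `can_connect("DB2host.DB2domain", <port>)`, where `<port>` is whatever number DB2_HADR_7 maps to in /etc/services, and then in the other direction. A refused connection here would match the SQLO_CONN_REFUSED entries in db2diag.log.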

  11. #11
    Join Date
    May 2012
    Posts
    22
    Done all of that. Still no luck. I tried both IPs and hostnames, and firewalls are disabled on the host and guest systems. I tried the BY FORCE command on the primary and it worked, but its status is disconnected.


    HADR Information:
    Role State SyncMode HeartBeatsMissed LogGapRunAvg (bytes)
    Primary Disconnected Nearsync 0 0

    ConnectStatus ConnectTime Timeout
    Disconnected Wed May 23 01:03:25 2012 (1337717005) 180

    PeerWindowEnd PeerWindow
    Null (0) 120

    LocalHost LocalService
    DB2host.DB2domain DB2_HADR_7

    RemoteHost RemoteService RemoteInstance
    DB2_2host.DB2_2domain DB2_HADR_8 db2inst1

    PrimaryFile PrimaryPg PrimaryLSN
    S0000006.LOG 687 0x00000000085D7157

    StandByFile StandByPg StandByLSN
    S0000000.LOG 0 0x0000000000000000
    [db2inst1@DB2host ~]$

  12. #12
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    Did you try getting the HADR simulator to work first?

    The simulator needs no installation and since all the settings are on the command-line it is easy to verify end-to-end connectivity that way.

    Consider trying the simulator, and use IPv4 addresses (just to eliminate any DNS trouble).

    IBM developerWorks: Wikis - IBM Database Wiki - HADR_sim

    It remains most likely that ports are blocked somewhere along the path.

    Remember: you can make a test where primary and standby are on the same physical machine - just to show



    You posted part of db2diag.log from principal-standby (posted May21 @11:16). I see errors relating to the archive-path. Are you using a shared-archive device? What is it? Also the ECONNREFUSED probably results from the fact that you start the standby before the primary.

    Please post relevant bits of the db2diag.log from the PRIMARY that get logged when you try starting hadr on primary (temporarily set diaglevel 4 first on primary, revert back to 3 when test is complete)

  13. #13
    Join Date
    May 2012
    Posts
    22
    Alright. I just checked this whole thing with the simulator.

    Here's the output on the standby server:

    ./simhadr.Linux -role standby -lhost DB2host.DB2domain -lport 60000 -rhost DB2_2host.DB2_2domain -rport 60000
    + simhadr -role standby -lhost DB2host.DB2domain -lport 60000 -rhost DB2_2host.DB2_2domain -rport 60000

    Measured sleep overhead: 0.000814 second, using spin time 0.000976 second.

    Resolving local host DB2host.DB2domain via gethostbyname()
    hostname=DB2host.DB2domain
    address_type=2 address_length=4
    address: 192.168.7.132

    Resolving remote host DB2_2host.DB2_2domain via gethostbyname()
    hostname=DB2_2host.DB2_2domain
    alias: DB2_2host
    alias: DB2_2host
    alias: localhost6.localdomain6
    alias: localhost6
    address_type=2 address_length=4
    address: 192.168.7.133
    address: 127.0.0.1

    Socket property upon creation
    BlockingIO=true
    NAGLE=true
    SO_SNDBUF=16384
    SO_RCVBUF=87380
    SO_LINGER: onoff=0, length=0

    Connecting to remote host TCP port 60000

    Connected.

    Calling fcntl(O_NONBLOCK)
    Calling setsockopt(TCP_NODELAY)
    Socket property upon connection
    BlockingIO=false
    NAGLE=false
    SO_SNDBUF=50748
    SO_RCVBUF=87584
    SO_LINGER: onoff=0, length=0

    Here's the output on the primary server:

    [db2inst1@DB2host ~]$ ./simhadr.Linux -role primary -lhost DB2host.DB2domain -lport 60000 -rhost DB2_2host.DB2_2domain -rport 60000 -syncmode nearsync
    + simhadr -role primary -lhost DB2host.DB2domain -lport 60000 -rhost DB2_2host.DB2_2domain -rport 60000 -syncmode nearsync

    Measured sleep overhead: 0.011476 second, using spin time 0.013771 second.
    flushSize = 32 pages
    Simulation run time = 4 seconds

    Resolving local host DB2host.DB2domain via gethostbyname()
    hostname=DB2host.DB2domain
    alias: DB2host
    alias: DB2host
    alias: localhost6.localdomain6
    alias: localhost6
    address_type=2 address_length=4
    address: 192.168.7.132
    address: 127.0.0.1

    Resolving remote host DB2_2host.DB2_2domain via gethostbyname()
    hostname=DB2_2host.DB2_2domain
    address_type=2 address_length=4
    address: 192.168.7.133

    Socket property upon creation
    BlockingIO=true
    NAGLE=true
    SO_SNDBUF=16384
    SO_RCVBUF=87380
    SO_LINGER: onoff=0, length=0

    Binding socket to local address.
    bind() failed on local address. errno=98, Address already in use


    After running the simulator, I tried to start the primary again. db2diag gives the following:


    2012-05-23-03.36.04.408868+300 I4012364E573 LEVEL: Severe
    PID : 1926 TID : 140481526753024PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-2523 APPID: *LOCAL.db2inst1.120522223303
    AUTHID : DB2INST1
    EDUID : 113 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21300
    MESSAGE : Error: HADR EDU did not start up. HADR role:
    DATA #1 : Hexdump, 4 bytes
    0x00007FC4677F7088 : 0100 0000 ....

    2012-05-23-03.36.04.409116+300 I4012938E541 LEVEL: Error
    PID : 1926 TID : 140481526753024PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-2523 APPID: *LOCAL.db2inst1.120522223303
    AUTHID : DB2INST1
    EDUID : 113 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21300
    MESSAGE : HADR EDU zrc:
    DATA #1 : Hexdump, 4 bytes
    0x00007FC4677F7080 : 1A00 8082 ....

    2012-05-23-03.36.04.409219+300 I4013480E545 LEVEL: Error
    PID : 1926 TID : 140481526753024PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-2523 APPID: *LOCAL.db2inst1.120522223303
    AUTHID : DB2INST1
    EDUID : 113 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrEduStartup, probe:21300
    MESSAGE : HADR EDU sqlcode:
    DATA #1 : Hexdump, 4 bytes
    0x00000002011B5B84 : 18F9 FFFF ....

    2012-05-23-03.36.04.409306+300 I4014026E513 LEVEL: Severe
    PID : 1926 TID : 140481526753024PROC : db2sysc 0
    INSTANCE: db2inst1 NODE : 000 DB : IB
    APPHDL : 0-2523 APPID: *LOCAL.db2inst1.120522223303
    AUTHID : DB2INST1
    EDUID : 113 EDUNAME: db2agent (IB) 0
    FUNCTION: DB2 UDB, base sys utilities, sqeApplication::AppStartUsing, probe:17
    DATA #1 : Hexdump, 4 bytes
    0x00000002011B5B84 : 18F9 FFFF
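
A note on reading the 4-byte hexdumps in these entries: they appear to be 32-bit integers printed in little-endian byte order. The earlier standby log supports this reading, since the dump `1900 0F81` is followed by an entry reporting ZRC=0x810F0019=-2129723367. Assuming the same layout here, a minimal sketch for decoding them (the helper name is hypothetical):

```python
import struct
from typing import Tuple


def decode_hexdump_int32(hexdump: str) -> Tuple[int, str]:
    """Decode a db2diag-style 4-byte hexdump such as '18F9 FFFF'.

    Assumption: the bytes are a little-endian signed 32-bit integer.
    Returns (signed value, unsigned hex string).
    """
    raw = bytes.fromhex(hexdump.replace(" ", ""))
    if len(raw) != 4:
        raise ValueError("expected exactly 4 bytes")
    (signed,) = struct.unpack("<i", raw)  # '<i' = little-endian signed int32
    return signed, f"0x{signed & 0xFFFFFFFF:08X}"


print(decode_hexdump_int32("1900 0F81"))  # standby: the ZRC for "Connection refused"
print(decode_hexdump_int32("18F9 FFFF"))  # primary: the "HADR EDU sqlcode" value
print(decode_hexdump_int32("1A00 8082"))  # primary: the "HADR EDU zrc" value
```

Under this reading the sqlcode dump `18F9 FFFF` decodes to -1768, which lines up with the SQL1768N message from the first post, and the zrc dump `1A00 8082` decodes to 0x8280001A.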

  14. #14
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    Per my previous advice: "...and use IPv4 addresses (just to eliminate any DNS trouble)."

    Your lhost and rhost values on the command line should be mirrored on either side, but your post shows them the same...
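
To make "mirrored" concrete, here is a small sketch that compares the two sides. The function and the dictionary layout are hypothetical; the values are copied from the posts above, with the simulator's -lhost/-lport/-rhost/-rport arguments mapped onto the corresponding HADR parameter names purely for illustration.

```python
from typing import Dict, List

# Each local setting on one side must equal the matching remote setting on the other.
MIRRORED_KEYS = [
    ("HADR_LOCAL_HOST", "HADR_REMOTE_HOST"),
    ("HADR_LOCAL_SVC", "HADR_REMOTE_SVC"),
]


def mirror_problems(primary: Dict[str, str], standby: Dict[str, str]) -> List[str]:
    """Return a list of mismatches between the local and remote settings."""
    problems = []
    sides = (("primary", primary, standby), ("standby", standby, primary))
    for local_key, remote_key in MIRRORED_KEYS:
        for name, side, peer in sides:
            if side[local_key] != peer[remote_key]:
                problems.append(
                    f"{name} {local_key}={side[local_key]} "
                    f"but peer {remote_key}={peer[remote_key]}"
                )
    return problems


# Database configuration as posted in #1.
cfg_primary = {"HADR_LOCAL_HOST": "DB2host.DB2domain", "HADR_LOCAL_SVC": "DB2_HADR_7",
               "HADR_REMOTE_HOST": "DB2_2host.DB2_2domain", "HADR_REMOTE_SVC": "DB2_HADR_8"}
cfg_standby = {"HADR_LOCAL_HOST": "DB2_2host.DB2_2domain", "HADR_LOCAL_SVC": "DB2_HADR_8",
               "HADR_REMOTE_HOST": "DB2host.DB2domain", "HADR_REMOTE_SVC": "DB2_HADR_7"}

# Simulator arguments as posted in #13: both sides used the same lhost and rhost.
sim_primary = {"HADR_LOCAL_HOST": "DB2host.DB2domain", "HADR_LOCAL_SVC": "60000",
               "HADR_REMOTE_HOST": "DB2_2host.DB2_2domain", "HADR_REMOTE_SVC": "60000"}
sim_standby = dict(sim_primary)

print(mirror_problems(cfg_primary, cfg_standby))
print(mirror_problems(sim_primary, sim_standby))
```

The database configuration passes this check; the simulator invocations do not, because the standby was given the primary's host name as its own -lhost.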

  15. #15
    Join Date
    May 2012
    Posts
    22
    Just confirmed with our network guy that we're already using IPv4. Still no luck. Will keep trying, though. I don't understand what this socket warning is about.
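
On the socket message itself: errno 98 on Linux is EADDRINUSE, meaning the address and port passed to bind() were already taken by another socket when the simulator tried to bind them. The sketch below reproduces that condition on the loopback interface; it only demonstrates what the error means and does not identify which process held port 60000 in the run above. The helper name `try_bind` is made up for this example.

```python
import errno
import socket
from typing import Optional, Tuple


def try_bind(host: str, port: int) -> Tuple[Optional[socket.socket], int]:
    """Bind a TCP socket; return (socket, 0) on success or (None, errno) on failure."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        return None, exc.errno
    return sock, 0


# First socket takes an ephemeral port (port 0 lets the OS choose) and listens on it.
first, _ = try_bind("127.0.0.1", 0)
first.listen(1)
taken_port = first.getsockname()[1]

# A second bind to the same address and port fails while the first is still open.
second, err = try_bind("127.0.0.1", taken_port)
print(second, errno.errorcode.get(err))

first.close()
```

Checking which process is listening on the port before starting the simulator (for example with netstat or ss on the primary) would show what is holding it.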
