Thread: HADR problems

  1. #1
    Join Date
    Dec 2002
    Location
    Pakistan
    Posts
    31

    Unanswered: HADR problems

    Hello all

    We are implementing HADR by following the procedure recommended by Informix:

    1:- Take a level 0 backup of the primary instance.
    2:- On the primary instance, run "onmode -d primary sec_ser_name" while the secondary server is down.
    3:- Restore the backup on the secondary server with the "ontape -p" command.
    4:- When the restore completes, run "onmode -d secondary prim_ser_name" on the secondary server.
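
The four steps above can be sketched as a dry-run plan. This is only an illustration, not Informix's documented procedure: the function name `hdr_setup_plan` is hypothetical, `ontape -s -L 0` is an assumed way of taking the level 0 backup (the post does not say which command was used), and the commands are returned as strings rather than executed.

```python
def hdr_setup_plan(primary_name, secondary_name):
    """Return the HDR setup steps from the post as (host, command) pairs.

    Nothing is executed; the plan is meant to be printed and reviewed.
    """
    return [
        # Step 1: level 0 backup on the primary (exact ontape flags are an assumption).
        ("primary", "ontape -s -L 0"),
        # Step 2: declare the primary while the secondary is down.
        ("primary", f"onmode -d primary {secondary_name}"),
        # Step 3: physical restore of the backup on the secondary.
        ("secondary", "ontape -p"),
        # Step 4: declare the secondary and point it at the primary.
        ("secondary", f"onmode -d secondary {primary_name}"),
    ]


if __name__ == "__main__":
    for host, command in hdr_setup_plan("prim_ser_name", "sec_ser_name"):
        print(f"[{host}] {command}")
```

Keeping the host next to each command makes it harder to run a step on the wrong machine, which is an easy mistake when the same procedure is repeated for several instances.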

    Replication should then start, but in practice it rarely starts after performing only these steps.

    If a step fails, must I restart from step 1, or can I restart from step 3?
    Is it necessary to take a new level 0 backup every time? If so, why is it required?

    Is there any alternative?

    I found an alternative, but it works only some of the time:

    If HADR is not established after applying all the steps, then:
    1:- Shut down the primary server
    2:- Shut down the secondary server
    3:- Bring up the secondary server
    4:- Bring up the primary server

    Note: Do not bring up the primary server until the secondary server has written the following lines to its log file:

    DR: Cannot connect to primary server
    DR: Turned off on secondary server

    This way the primary can connect to the secondary server and replication starts. I established HADR this way once.
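
The "wait for the two log lines" condition in the note can be checked mechanically. A minimal sketch, assuming the caller passes only the log text written since the secondary's latest start (older entries would give a false positive); the function name `secondary_ready_for_primary` is hypothetical and the message strings are copied from the post.

```python
# Messages the secondary must have logged, in this order, before the primary is started.
WAIT_FOR = (
    "DR: Cannot connect to primary server",
    "DR: Turned off on secondary server",
)


def secondary_ready_for_primary(log_text):
    """Return True once both messages appear, in order, in the log text."""
    position = 0
    for message in WAIT_FOR:
        found = log_text.find(message, position)
        if found == -1:
            return False
        # Continue searching after this message so the order is enforced.
        position = found + len(message)
    return True
```

In the log shown later in this post, the secondary spent two minutes (02:34:05 to 02:36:05) trying to connect before writing these lines, so a check like this would be polled rather than run once.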

    Now I am trying again and I am facing problems. When I bring up the secondary server, it writes the following lines to its log file:

    02:33:59 Event alarms enabled. ALARMPROG = '/u/informix/etc/log_full.sh'
    02:34:04 DR: DRAUTO is 0 (Off)
    02:34:04 Requested shared memory segment size rounded from 1104KB to 4096KB
    02:34:04 Informix Dynamic Server Version 7.31.UC5 Software Serial Number AAC#J909566
    02:34:04 Assert Failed: chunk failed sanity check

    02:34:04 Informix Dynamic Server Version 7.31.UC5
    02:34:04 Who: Session(1, root@posaetp1, 0, 285728788)
    Thread(10, main_loop(), 11056014, 3)
    File: rspartn.c Line: 7355
    02:34:04 Results: Chunk 4 is being taken OFFLINE.
    02:34:04 Action: Restore chunk from archive. If this is a temporary dbspace
    chunk, drop and add the dbspace to enable it.
    02:34:04 See Also: /tmp/af.3f23bec
    02:34:05 I/O error, Primary Chunk '/u/informix/dbs/dbs_temp' -- Offline (sanity)
    02:34:05 Informix Dynamic Server Initialized -- Shared Memory Initialized.
    02:34:05 DR: Logs will be cleared during logical recovery
    02:34:05 DR: Trying to connect to primary server ...
    02:36:05 DR: Cannot connect to primary server
    02:36:05 DR: Turned off on secondary server
    02:36:06 Physical Recovery Started.
    02:36:06 Physical Recovery Complete: 0 Pages Restored.
    02:36:06 Dataskip is now OFF for all dbspaces
    02:36:06 Recovery Mode
    02:36:42 DR: Secondary server connected
    02:36:48 DR: Failure recovery from disk in progress ...
    02:36:49 DR: Failure during start of logical restore (1)
    02:36:49 DR: Failure recovery error (6)
    02:36:50 DR: Turned off on secondary server

    while the primary server writes the following lines to its log file:

    02:52:11 DR: Sending log 132 (current), size 40000 pages, 0.00 percent used
    02:52:11 DR: Receive error
    02:52:11 DR: Failure recovery error (14)
    02:52:13 DR: Turned off on primary server
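
When comparing the two logs it helps to pull out only the `DR:` lines and the numeric codes they end with. A minimal sketch, assuming the line layout seen in the excerpts above (`HH:MM:SS DR: message`, with an optional trailing `(number)`); the name `dr_events` is hypothetical, and the meaning of the codes is not decoded here.

```python
import re

# "HH:MM:SS DR: message", with optional leading indentation.
DR_LINE = re.compile(r"^\s*(\d{2}:\d{2}:\d{2})\s+DR:\s+(.*\S)\s*$")
# A trailing numeric code such as "(6)"; "(Off)" does not match.
ERROR_CODE = re.compile(r"\((\d+)\)\s*$")


def dr_events(log_text):
    """Yield (timestamp, message, code) for every 'DR:' line; code may be None."""
    for line in log_text.splitlines():
        match = DR_LINE.match(line)
        if not match:
            continue
        timestamp, message = match.groups()
        code = ERROR_CODE.search(message)
        yield timestamp, message, int(code.group(1)) if code else None
```

Run over both logs, this makes it easy to line up the secondary's codes (1 and 6) against the primary's (14). Note that the primary's timestamps (02:52) are about 15 minutes later than the secondary's (02:36), so the two excerpts are either from different attempts or the clocks differ.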

    Any comments, suggestions, or recommendations are welcome.

    I look forward to hearing from you soon.

    Best regards
    Wicky

  2. #2
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    Hi,

    You had an assert failure when restoring to the secondary server.
    Did it come online, I mean as far as fast recovery? What does 'onstat -' say when you run it on the secondary server?
    It was a problem with the temporary dbspace.

    Check with onstat -d whether the temporary dbspace (chunk 4) is marked down on the primary server.

    'onstat -g dri' will show you the status of the HDR pair.

    What version of IDS do you use?
    Did you reserve the space on your secondary machine?

    Is it a test instance? If so, are you able to do the restore on your primary machine?

    Normally you don't have to go all the way back to step 1. But you will have to make sure that your level 0 backup is correct!
    rws

  3. #3
    Join Date
    Dec 2002
    Location
    Pakistan
    Posts
    31
    Hi

    Thanks.

    You are right. The chunk issues have been resolved, but replication has still not started.
    I have two machines running IDS 7.31.UC5 for SCO UNIX.

    I want to implement HADR on five instances that run on the primary machine. So far I have successfully implemented HADR on three instances, but HADR stopped on the last of them (the third one) after a few days. I have been facing problems since I started implementing HADR on more instances.

    Does HADR support multiple instances?

    I was unable to implement HADR on the second instance the way Informix recommends: HADR did not start after I applied the four recommended steps. I then shut down the secondary server, followed by the primary server. I brought up the secondary first and did not bring up the primary until the secondary had written "DR: Turned off on secondary server" to its log. Then I brought up the primary server and the HADR process started.

    Is that the right approach?

    Any comments, suggestions, or recommendations are warmly welcome.

    I look forward to hearing from you soon.

    Best regards
    Wicky

  4. #4
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    I don't know whether multiple HDR instances on one machine are supported; I have never tested it. You might find an answer in the admin guide.
    Did you try it with the latest version of 7.31?
    Maybe test with IDS 9.3 or, in a few months, 9.4, in which lots of HDR enhancements are implemented.

    In that case the right way is indeed to start the secondary first, wait for the log message, and then start the primary. HDR must already have been configured and run before you do this.

    I would advise you to at least test it with a 7.31.UDx version. 9.x already has lots of new features for HDR, but 9.4 will be a bomb!
    rws

  5. #5
    Join Date
    Dec 2002
    Location
    Pakistan
    Posts
    31
    Hi

    So far I have made many attempts, but I am able to run HADR on only two of the five instances.

    Whenever I try to implement HADR on another instance, the following lines are written to the servers' log files.

    On the secondary server:
    02:36:42 DR: Secondary server connected
    02:36:48 DR: Failure recovery from disk in progress ...
    02:36:49 DR: Failure during start of logical restore (1)
    02:36:49 DR: Failure recovery error (6)
    02:36:50 DR: Turned off on secondary server

    while the primary server writes the following lines to its log file:

    02:52:11 DR: Sending log 132 (current), size 40000 pages, 0.00 percent used
    02:52:11 DR: Receive error
    02:52:11 DR: Failure recovery error (14)
    02:52:13 DR: Turned off on primary server

    Any comments or suggestions?

    Is it possible that HADR does not allow more than two instances to run, and that I need to raise some UNIX or Informix parameter limit to do so?

    If yes, can you guide me as to which parameter I need to tune?

    Many thanks

    wicky

  6. #6
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    HDR WILL NOT RESYNC IF ONTAPE IS NOT PRESENT ON THE SECONDARY; YOU RECEIVE THE FOLLOWING ERROR: FAILURE RECOVERY ERROR (6)

    Did you set your INFORMIXDIR correctly?
    If so maybe the OS locked the ontape file?

    Do you use ontape or onbar to setup HDR?
    Check your temp spaces too. There might be a problem with them.
    rws
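
The INFORMIXDIR and ontape questions above can be checked on the secondary with a few lines. A minimal sketch, assuming `ontape` lives in `$INFORMIXDIR/bin` (the usual layout, but an assumption here); the function name `ontape_problems` is hypothetical. It only tests presence and execute permission, so it cannot show whether the OS has the file locked.

```python
import os


def ontape_problems(informixdir):
    """Return a list of reasons why ontape may be unusable under informixdir."""
    if not informixdir:
        return ["INFORMIXDIR is not set"]
    problems = []
    ontape = os.path.join(informixdir, "bin", "ontape")
    if not os.path.isfile(ontape):
        problems.append(f"{ontape} does not exist")
    elif not os.access(ontape, os.X_OK):
        problems.append(f"{ontape} is not executable by this user")
    return problems


if __name__ == "__main__":
    # Run as the user that starts the secondary server, with its environment.
    for problem in ontape_problems(os.environ.get("INFORMIXDIR")):
        print(problem)
```

An empty result only rules out the simplest causes; it says nothing about whether the server itself runs with the same environment.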

  7. #7
    Join Date
    Dec 2002
    Location
    Pakistan
    Posts
    31
    My answers are as follows.

    Did you set your INFORMIXDIR correctly?
    Yes.

    Do you use ontape or onbar to setup HDR?
    ontape

    If so maybe the OS locked the ontape file?
    Why would the OS do that?
    How can I check it?
    Is there any workaround?

    Check your temp spaces too. There might be a problem with them.
    What kind of problem should I check for on the primary and secondary servers?

  8. #8
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    You could always try to install informix in another directory.
    Point INFORMIXDIR to that directory.
    Don't forget to set INFORMIXSQLHOSTS in that case.
    rws

  9. #9
    Join Date
    Dec 2002
    Location
    Pakistan
    Posts
    31
    I have installed Informix on both servers (primary and secondary) in the following location:

    /u/informix

    Can you give me more information about the OS locking the ontape file?

    The Informix errors I found (1, 6, 14) have the following descriptions:

    Error Description
    1 Not owner. (secondary site)
    6 No such device or address. (secondary site)
    14 Bad address. (Primary site)
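
The descriptions in this table match the classic Unix errno texts for 1, 6 and 14. The thread does not confirm that the numbers in the DR messages are OS errno values, so treat the mapping as the poster's reading rather than an established fact. Under that assumption, a minimal sketch for looking up the symbolic names (the function name `describe_errno` is hypothetical):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, local OS message) for an errno-style number."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


if __name__ == "__main__":
    for code in (1, 6, 14):
        print(code, *describe_errno(code))
```

The message text varies by operating system, so the wording printed on another platform may differ from the SCO descriptions in the table; the symbolic names are the more portable thing to search for.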

    These errors occurred only when HADR was not established after the four steps provided by Informix and I then shut down both servers and brought them up one at a time (secondary first).

    Kindly guide me in this regard; I have been stuck for many days. I am looking for an alternative, but I am not sure whether it will work. I might get stuck again if I implement ER on more instances.


    wicky
