  1. #1
    Join Date
    Dec 2013
    Posts
    4

    Unanswered: db2start is very slow after failover from node 1 to node 2

    Hi DB2 Gurus,

    We are having problems with our DB2 failover solution for a customer. Failover from node 1 to node 2 is very slow: the db2start command takes about 20 minutes to complete, whereas failing over from node 2 to node 1 takes 5 minutes at most. Our applications run on SUSE Linux Enterprise Server, and SLES HAE is our failover software. The DB2 version is 9.1 FP12. The server configuration is identical on both nodes.

    The description of IY92336 is similar to our scenario.

    Any suggestions/advice would be highly appreciated.

    Thanks in advance!

  2. #2
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    DB2 V9.1 went out of service quite a while ago. Maybe you should look at upgrading to a more recent version and see if that fixes your problem.

    Andy

  3. #3
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Also, what was the difference in your workload when you failed over from one to the other and back? Were there a lot of inflight transactions the one way and none the other?

  4. #4
    Join Date
    Dec 2013
    Posts
    4
    Quote: Originally posted by dav1mo
    Also, what was the difference in your workload when you failed over from one to the other and back? Were there a lot of inflight transactions the one way and none the other?

    There were basically no transactions running, since the system hasn't gone live yet. We've performed a number of failover tests by rebooting each node, and each time node1 fails over to node2 we see the same behavior (20+ minutes before db2start completes on node2).

    A few things I noticed during the failover tests:
    - When moving from node1 to node2, the db2mtrk -i -v -d command hangs once db2start has been issued. On node1, by contrast, db2mtrk quickly shows the instance memory consumption at instance startup.
    - The top command shows db2sysc as the top process during startup on node1, but not on node2. It looks like the server's memory allocation for DB2 is slow.
    - On node2, once the HA software triggers the db2start command, opening a new session and connecting to the database works. db2diag.log also says that the instance has been started, and crash recovery completes in a few seconds.
    - A graceful db2start on node2 (not as part of a failover) finishes very quickly.
    - Disabling/enabling STMM did not help either.

    The servers are quite powerful, with many CPUs and a large amount of RAM. I'm starting to think that there could be a memory leak on the node2 server.
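
    Since both servers are supposed to be configured identically, one way to test that claim is to dump the kernel settings on each node (for example with `sysctl -a > node1.txt`) and compare the two dumps. Below is a minimal Python sketch, assuming the usual `key = value` line format. The function names and file names are hypothetical, and nothing here is DB2-specific.

```python
def parse_settings(text):
    """Parse 'key = value' lines (the format printed by `sysctl -a`) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(node1_text, node2_text):
    """Return {key: (node1_value, node2_value)} for every key whose values differ.

    A key that is missing on one node is reported with None for that side.
    """
    a = parse_settings(node1_text)
    b = parse_settings(node2_text)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

    To use it, read the two dump files into strings and print whatever `diff_settings` returns. An empty result only means the dumped settings match; it says nothing about storage or cluster configuration.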

  5. #5
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Did you check your db2diag log? Do you see any crash recoveries? The start and end times will explain a lot.
    Somewhere between " too small" and " too large" lies the size that is just right.
    - Scott Hayes

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote: Originally posted by dirk18
    we've performed a number of failover tests by rebooting each node, and each time node1 fails over to node2, we see the same behavior (20+ minutes before db2start completes on node2).
    Can you please explain in detail your environment configuration and what exactly you mean by "failover"? Since you mention "db2start on node2", I'm assuming it's not HADR we're talking about. If so, how is the database kept in sync between the nodes?
    ---
    "It does not work" is not a valid problem statement.

  7. #7
    Join Date
    Dec 2013
    Posts
    4
    Quote: Originally posted by n_i
    Can you please explain in detail your environment configuration and what exactly you mean by "failover"? Since you mention "db2start on node2", I'm assuming it's not HADR we're talking about. If so, how is the database kept in sync between the nodes?
    It is an active-passive setup with two identical Linux servers managed by the HA cluster. The cluster handles the volume groups, the file system mount points used by the application and DB2 (binary path, installation path, etc.), and the IP addresses for the virtual hostname. Should the node running the application and DB2 go down (we reboot the server as a test), the HA software moves the volume groups to the other node, mounts the file systems, automatically starts DB2, and then starts the application. Failing over from node1 to node2 takes 20+ minutes before db2start completes, while the opposite direction takes only 5 minutes. There is a moment of downtime in this setup; the idea is that in case of disaster, the cluster should bring the system up on its own.

    Note: DB2 was installed natively on node2.
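
    The failover sequence above (move volume groups, mount file systems, start DB2, start the application) can also be walked through by hand with a stopwatch around each stage, to confirm that db2start itself is where the time goes. A minimal Python sketch follows; `time_steps` is a hypothetical helper, and the steps you pass in are whatever commands your own cluster uses.

```python
import time


def time_steps(steps):
    """Run each (name, callable) pair in order and return [(name, seconds)]."""
    timings = []
    for name, action in steps:
        start = time.monotonic()  # monotonic clock: unaffected by wall-clock changes
        action()
        timings.append((name, time.monotonic() - start))
    return timings
```

    For example, `time_steps([("db2start", lambda: subprocess.run(["db2start"], check=True))])` (after `import subprocess`) times a manual db2start run as the instance owner. In a real failover the cluster's resource agents run these stages, so this is only useful for a manual reproduction.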

  8. #8
    Join Date
    Dec 2013
    Posts
    4
    Quote: Originally posted by dr_te_z
    Did you check your db2diag log? See any crash recovery's? Start time, end time will explain a lot.
    Yes, db2diag.log shows crash recovery being performed, but it completes very quickly and does not explain the long db2start time.

  9. #9
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote: Originally posted by dirk18
    yes, crash recovery is being performed as per db2diag.log but it completes very quick and does not explain the long db2start time.
    Okay. But you can see when crash recovery starts and ends. Once crash recovery has finished, DB2 should be ready for action. Is that the case, or do you have to wait another 19 minutes? Are there any other entries in db2diag within that 20-minute timeframe? Could it be that replication is going on and it takes that long until the active log files arrive?

    I think you must focus on the OS/cluster/storage area. If that is all okay, there is NO WAY DB2 takes 20 minutes to start. The only delay is the crash-recovery phase, and that is very quick, you say.
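
    To see where the 20 minutes go, it can help to list the longest silent stretches between consecutive db2diag.log records. Here is a minimal Python sketch, assuming each record header starts with a timestamp such as `2013-12-10-14.23.45.123456+480`. That format is an assumption to check against your own log, and `largest_gaps` is a hypothetical helper, not a DB2 tool.

```python
import re
from datetime import datetime

# Assumed record-header timestamp, e.g. "2013-12-10-14.23.45.123456+480".
# The trailing offset is ignored because all records come from the same file.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6})[+-]\d+")


def largest_gaps(lines, top=3):
    """Return the `top` longest silences as (seconds, gap_start, gap_end)."""
    stamps = []
    for line in lines:
        match = TS.match(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d-%H.%M.%S.%f"))
    gaps = [
        ((later - earlier).total_seconds(), earlier, later)
        for earlier, later in zip(stamps, stamps[1:])
    ]
    return sorted(gaps, reverse=True)[:top]
```

    Pass it an open file object for your db2diag.log and print the result. A gap of many minutes right after the "instance started" entry would point away from crash recovery and toward whatever db2start is waiting on.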
