Thread: HADR Congestion

  1. #1
    Join Date
    Jul 2004
    Posts
    306

    Unanswered: HADR Congestion

    8.2 & 9.5 on AIX

    Hey All,

    I've been running ASYNC HADR for almost a year now, and I've been getting slowdowns where the Primary DBs report congestion messages.

    The only remedy I have at the moment is to turn off HADR for a bit and let it resync later, which is not ideal. It also shows as congested (from a db2pd -hadr) when it's in remote catchup, which strikes me as weird!

    So I'm wondering 3 things...

    Is the congested message during catchup real? It's getting the logs from TSM so there is no reliance on the primary

    How can I determine the root cause of the congestion? It looks like memory and CPU are OK, and the network guys tell me the link is not maxed out.

    So are there any strategies to tackle this?
    I've increased the HADR buffer on the standby, but that just buys time. I've tried NEARSYNC, but that added too much overhead.
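
Editor's sketch: one way to put numbers on "how often is it congested" is to sample `db2pd -hadr` at intervals and count the samples that mention congestion. The Python below is a minimal sketch under stated assumptions, not a supported tool. It assumes `db2pd` is on the PATH of the instance owner, and it only searches the output for the word "congested" (case-insensitive) because the exact output layout differs between DB2 versions. The function names and the sampling interval are hypothetical.

```python
import subprocess
import time


def is_congested(db2pd_output: str) -> bool:
    """Return True if the word 'congested' appears anywhere in the text.

    Assumption: the HADR status is reported with that word, as described
    in the post above. The exact field layout is deliberately not parsed.
    """
    return "congested" in db2pd_output.lower()


def congested_fraction(samples) -> float:
    """Fraction of sampled outputs that mention congestion (0.0 if empty)."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(1 for s in samples if is_congested(s)) / len(samples)


def sample_hadr(db_name: str, count: int = 10, interval_s: float = 30.0):
    """Run 'db2pd -db <db_name> -hadr' count times and collect the output."""
    outputs = []
    for i in range(count):
        result = subprocess.run(
            ["db2pd", "-db", db_name, "-hadr"],
            capture_output=True,
            text=True,
            check=False,  # keep sampling even if one call fails
        )
        outputs.append(result.stdout)
        if i < count - 1:
            time.sleep(interval_s)
    return outputs
```

Running `congested_fraction(sample_hadr("MYDB"))` during a busy period and again during a quiet one (with `MYDB` replaced by a real database name) would show whether congestion tracks workload, which helps separate a capacity problem from a configuration one.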

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by meehange
    It's getting the logs from TSM ...
    How can I determine the root cause of the congestion?
    Hmmm. Maybe the problem is...
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Jul 2004
    Posts
    306
    Heh, no. It happens with the live transfer of the log data as well as when it's getting the logs from TSM during catchup.

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    For high volume transaction systems running HADR, I asked that they set up a private network between the two servers just to handle the HADR traffic.

  5. #5
    Join Date
    Jul 2004
    Posts
    306
    Quote Originally Posted by Marcus_A
    For high volume transaction systems running HADR, I asked that they set up a private network between the two servers just to handle the HADR traffic.

    Well, we were using a 10 Mbit wireless connection to link the two sites (HADR for 16 systems of varying size and workload), and during problem determination we switched to a 10 Mbit fiber link. We're still seeing a similar amount of congestion. The network guys tell me that the link never reaches the full 10 Mbit, but I get the feeling that 10 Mbit just isn't quite enough bandwidth for what we're doing. All the options are expensive, though, whether it's buying more CPU or RAM at the standby site (it's about 50% of the spec of the Primary machine) or upping the bandwidth, which is why I need to be able to figure out what the bottleneck is.

    I mean, it'd be pretty bad to spend $50k on that stuff only to find out later it was because I'd poorly configured something.
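
Editor's sketch: the "is 10 Mbit enough?" question can be checked with back-of-envelope arithmetic before any money is spent. A 10 Mbit/s link carries at most 1.25 MB/s, or about 4.5 GB per hour, shared by all 16 HADR pairs. The Python below compares that ceiling with the combined log volume the primaries generate in a peak window. The per-database figures are invented for illustration only; real values would have to come from the log volume each database actually archives.

```python
def link_capacity_bytes_per_s(link_mbit: float, usable_fraction: float = 1.0) -> float:
    """Bytes per second a link can carry.

    usable_fraction is a hypothetical allowance for protocol overhead and
    other traffic on the link (1.0 means the whole link is available).
    """
    return link_mbit * 1_000_000 / 8 * usable_fraction


def required_bytes_per_s(log_bytes_per_db, window_s: float) -> float:
    """Combined log shipping rate needed for all databases over a window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(log_bytes_per_db) / window_s


def utilization(log_bytes_per_db, window_s: float, link_mbit: float) -> float:
    """Required rate as a fraction of link capacity (above 1.0 = cannot keep up)."""
    return required_bytes_per_s(log_bytes_per_db, window_s) / link_capacity_bytes_per_s(link_mbit)


if __name__ == "__main__":
    # Hypothetical example: 16 databases, each writing 300 MB of log
    # during the busiest hour. These numbers are NOT from the thread.
    peak_hour_logs = [300_000_000] * 16
    print(f"utilization: {utilization(peak_hour_logs, 3600, 10):.2f}")
```

Note that an hourly average can hide short bursts: a link that "never reaches the full 10 Mbit" on a graph averaged over minutes may still saturate for seconds at a time, so repeating the calculation over a shorter window around the commit hangs is likely to be more telling.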

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    I would run some tests with and without HADR turned on to see what your application throughput is, and whether it is actually affected by the claimed "congestion". It is possible that there is nothing to worry about. I would also open a problem with IBM for a better explanation as to what the message means.
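
Editor's sketch: the with/without comparison suggested here comes down to measuring committed transactions over the same elapsed time in both configurations and looking at the relative change. The Python below is a minimal sketch of that arithmetic. The function names are hypothetical, and how the commit counts are collected is left open.

```python
def tps(commits: int, elapsed_s: float) -> float:
    """Transactions per second over a measurement interval."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return commits / elapsed_s


def slowdown_pct(tps_without_hadr: float, tps_with_hadr: float) -> float:
    """Percentage drop in throughput when HADR is on (negative = faster)."""
    if tps_without_hadr <= 0:
        raise ValueError("baseline throughput must be positive")
    return 100.0 * (tps_without_hadr - tps_with_hadr) / tps_without_hadr
```

For the comparison to mean anything, both runs need the same workload and a similar time of day. A drop that only appears while the HADR status reports congestion points at the log shipping path rather than at the application.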

  7. #7
    Join Date
    Jul 2004
    Posts
    306
    Quote Originally Posted by Marcus_A
    I would run some tests with and without HADR turned on to see what your application throughput is, and whether it is actually affected by the claimed "congestion". It is possible that there is nothing to worry about. I would also open a problem with IBM for a better explanation as to what the message means.
    There's definitely an effect on throughput (at least when HADR is peered) as we can see commit hangs on the primary which immediately disappear when HADR is turned off.

    I'll try my hand at the PMR, but previous HADR-related PMRs I've raised don't seem to get much expert support. I reckon it's an area of DB2 where there just isn't that much practical experience...

  8. #8
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by meehange
    There's definitely an effect on throughput (at least when HADR is peered) as we can see commit hangs on the primary which immediately disappear when HADR is turned off.

    I'll try my hand at the PMR, but previous HADR-related PMRs I've raised don't seem to get much expert support. I reckon it's an area of DB2 where there just isn't that much practical experience...
    I don't agree with your comments about HADR and support, but you have to be persistent. If you just have a question, then that is not what they are really there for, but if there is a problem then they should help you. Anyway, there have been some APARs relating to congestion. But you should also have someone check out your network.

  9. #9
    Join Date
    Jul 2004
    Posts
    306
    Quote Originally Posted by Marcus_A
    I don't agree with your comments about HADR and support, but you have to be persistent. If you just have a question, then that is not what they are really there for, but if there is a problem then they should help you. Anyway, there have been some APARs relating to congestion. But you should also have someone check out your network.
    I should probably point out that I'm in Asia-Pacific, so I think perhaps the support team here isn't as large or as battle-hardened as their US colleagues.

    I am having the externals checked (network etc.), but I want to be sure I've done all that I can to rule out a problem in my own yard.

  10. #10
    Join Date
    Nov 2008
    Posts
    1
    You may want to check out the HADR wiki area on developerWorks, and in particular this page: http://www.ibm.com/developerworks/wi...data/HADR_tune

    It helps explain "congestion" (there are several potential causes) and gives some suggestions on how to make sure your comms setup is well tuned for HADR use.

    Regards,
    - Steve P.
    --
    Steve Pearson, DB2 for Linux, UNIX, and Windows, IBM Software Group
    "Portland" Development Team, IBM Beaverton Lab, Beaverton, OR, USA

  11. #11
    Join Date
    Jul 2004
    Posts
    306
    That's ace Steve... thanks!

