  1. #1
    Join Date
    May 2008
    Location
    Cambridge
    Posts
    26

    Unanswered: Replication slow to a certain server

    Hi folks,

    I'm wondering if anyone can provide any insight.

    We've got an ordinary MSA replication setup, one repserver replicating to two replicates.

    What we've been seeing for some time now is that replication to one of the two replicates is much, much slower than to the other. Ironically, the one that's always behind is more powerful than the other. Now, we do run reports on this box, but even when there are no reports running, if data builds up in the stable queues, the offending server still seems to process it more slowly.

    The configs of the replicate servers, both within Sybase and at the Unix level, are very similar, except that the slow server is more powerful, with more memory, and has a slightly 'increased' config to suit.

    I've run a trace on the repserver_maint user process and it appears to be processing SQL fairly quickly, usually with around 2-3 seconds between command executions.

    Thinking about it, I should do the same on the other replicate and see how that one is performing. I'll have to wait for another opportunity when we're seeing a backlog in the queues.
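    To compare the two traces on the same terms, something like the following minimal sketch could help. It assumes you have pulled the start time of each command out of the maint user trace as "HH:MM:SS.fff" strings; that format is an assumption, so adjust `fmt` to whatever your trace actually records. It only reports the gaps between commands, it does not read the trace itself.

```python
from datetime import datetime
from statistics import mean, median
from typing import Dict, List


def command_gaps(timestamps: List[str], fmt: str = "%H:%M:%S.%f") -> List[float]:
    """Seconds between consecutive command start times.

    Assumes the timestamps are already in order and do not cross midnight.
    """
    times = [datetime.strptime(t, fmt) for t in timestamps]
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def gap_summary(gaps: List[float]) -> Dict[str, float]:
    """Mean, median and worst gap, so two replicates can be compared side by side."""
    if not gaps:
        raise ValueError("need at least two timestamps to compute a gap")
    return {"mean_s": mean(gaps), "median_s": median(gaps), "max_s": max(gaps)}
```

    Running `gap_summary(command_gaps(...))` once per replicate during the same backlog would show whether the slow box is spending longer per command or simply receiving commands less often.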

    Otherwise, if anyone has any other ideas or suggestions then I'd love to hear about them.

    Repserver box is a Sun-Fire V440, 4GB RAM, dual 1062MHz US-IIIi's (Sol 8) (Repserver 12.6)
    Good Replicate is a Sun-Fire 880, 16GB RAM, 8-way 1200MHz US-III+'s (Sol 8) (ASE 12.5.4)
    Bad Replicate is a Netra-1280 (T12), 24GB RAM, 12-way 1200MHz US-III+'s (Sol 8) (ASE 12.5.4)
    ASE's have the same ESD level.

    Thanks/Regards

    Bob
    Last edited by harq; 09-11-08 at 08:25.
    Bob Holmes
    Sybase ASE/Replication Server Administrator
    Digital Data Safe - managed database services
    Email: bob.holmes@ddsafe.co.uk

  2. #2
    Join Date
    Apr 2007
    Location
    hyderabad
    Posts
    4

    Hi

    Can you check what the slow server was executing at that time? It might be a case where, for a particular table, an index exists on the fast replicate but not on the slow one. (Usually this shouldn't happen; I'm assuming it because I once faced an issue where the replicate-side index had been dropped by mistake.) Also check that the indexes on the slow side are not corrupted.

    Can you get the showplan of the maint users applying the transactions on both sides?
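    A minimal sketch of the index comparison suggested above, assuming you have already captured each replicate's index definitions (for example from sp_helpindex output on each table) into a dictionary of table name to {index name: key columns}. The dictionary layout and the table and index names are hypothetical; the point is just to list every index that is missing or defined differently on one side.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: {table_name: {index_name: "col1, col2"}}
IndexMap = Dict[str, Dict[str, str]]


def diff_indexes(fast: IndexMap, slow: IndexMap) -> List[Tuple[str, str, str]]:
    """Return (table, index, problem) for every index that differs between replicates."""
    problems: List[Tuple[str, str, str]] = []
    for table in sorted(set(fast) | set(slow)):
        fast_idx = fast.get(table, {})
        slow_idx = slow.get(table, {})
        for name in sorted(set(fast_idx) | set(slow_idx)):
            if name not in slow_idx:
                problems.append((table, name, "missing on slow replicate"))
            elif name not in fast_idx:
                problems.append((table, name, "missing on fast replicate"))
            elif fast_idx[name] != slow_idx[name]:
                problems.append((table, name, "different key columns"))
    return problems
```

    An empty result means both replicates have the same index definitions, which is what Bob later reports finding.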

  3. #3
    Join Date
    May 2008
    Location
    Cambridge
    Posts
    26
    Hi Ashish,

    Many thanks for your input. At times when it's been going slow I've examined the SQL running, the query plans (where possible), indexes, and the last update on the statistics and all looks fine (with the exception of certain queries which have no suitable index and are updated on all columns, but both replicates have to deal with the same thing anyway). I've even tried injecting object statistics directly from the primary using optdiag and it's made no difference. Not only that, but I've also seen this happen immediately after both replicates have been refreshed (dump & load) from the primary. So it's not indexes, and it's not statistics. Either way, both replicates are processing the same thing with recent copies of the primary db. This is why I'm so stumped.

    We've also monitored the repserver using admin who,sqt and tuned the value for sqt_max_cachesize. All to no avail.

    We've rebuilt replication several times for one reason or another and we do this the same way for each server, so the connections and setup are exactly the same. So I guess maybe it's some quirk with repserver, or perhaps the Netra architecture doesn't perform as well as the Sun-Fire 880?

    The only other thing I can add, more for information, is that we've added around 30 table rep defs with replicate minimal columns to help improve performance and this has given a tremendous improvement (but yes, we're still seeing the one server lagging behind).

    At this stage I'm thinking I'll need to write a script to monitor the actual throughput for each replicate...
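    A minimal sketch of what such a throughput script might compute. It assumes you already poll some cumulative "commands applied" counter for each replicate at regular intervals; where that number comes from (a heartbeat table, repserver admin output, or something else) is left open and is an assumption here. The code only turns those samples into rates so the two replicates can be compared over the same window.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    epoch_s: float  # when the sample was taken, in seconds
    applied: int    # cumulative commands applied so far (hypothetical counter)


def interval_rates(samples: List[Sample]) -> List[float]:
    """Commands per second between each pair of consecutive samples."""
    rates: List[float] = []
    for prev, cur in zip(samples, samples[1:]):
        elapsed = cur.epoch_s - prev.epoch_s
        if elapsed > 0:  # skip duplicate or out-of-order samples
            rates.append((cur.applied - prev.applied) / elapsed)
    return rates


def average_rates(per_replicate: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Mean commands per second for each replicate over its whole sampling window."""
    summary: Dict[str, float] = {}
    for name, samples in per_replicate.items():
        if len(samples) < 2:
            continue  # need at least two samples to compute a rate
        elapsed = samples[-1].epoch_s - samples[0].epoch_s
        if elapsed > 0:
            summary[name] = (samples[-1].applied - samples[0].applied) / elapsed
    return summary
```

    Looking at `interval_rates` as well as the overall average helps separate a replicate that is uniformly slow from one that stalls periodically (for example while reports are running).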

    Thanks/Regards

    Bob

  4. #4
    Join Date
    Apr 2007
    Location
    hyderabad
    Posts
    4

    Hi Bob

    Sorry Bob, no idea about it.
    I am stumped now too. I would like to hear from anyone what the other possibilities might be.

    WAITING FOR SOMEONE'S RESPONSE ON IT ... Thanks.

  5. #5
    Join Date
    Aug 2004
    Posts
    38

    Network Related?

    I'm sure you've already considered it, but I take it that both replicates are in the same datacentre and/or on the same network?

  6. #6
    Join Date
    Sep 2003
    Location
    Switzerland
    Posts
    443
    Is it right that it's a single RepServer for both the primary and the replicates?

  7. #7
    Join Date
    Sep 2003
    Location
    Switzerland
    Posts
    443
    Since you mentioned that you have changed repdefs to minimal columns, check whether autocorrection is enabled at one site for any of the subscribed tables but not at the other. That could kill your performance for sure.

  8. #8
    Join Date
    Sep 2003
    Location
    Switzerland
    Posts
    443
    Quote Originally Posted by harq
    At this stage I'm thinking I'll need to write a script to monitor the actual throughput for each replicate...
    Bob
    You should. If you can't find anything on the web, let me know and I can get you some stuff.

  9. #9
    Join Date
    May 2008
    Location
    Cambridge
    Posts
    26
    KevR/trvishi, many thanks for your input. Very useful!

    Yes, there's only the one repserver.
    Auto correction is definitely off, we simply haven't used it at all.
    With regard to the datacentres, that's a very good point! In fact, they're in different datacentres, but surprisingly it's the server that's lagging which is actually local to the primary. That's not to say that network isn't the problem; it could be that there's a faulty or overloaded switch local to the primary which is causing the problem, whereas the route to the other site might be OK. I'll look into that and do some benchmarking to each replicate.

    trvishi - thanks for the offer, I'll have a look around the web and will certainly get back to you if I can't find anything.

    Bob

  10. #10
    Join Date
    Aug 2004
    Posts
    38
    As they're in different datacentres, the network is certainly worth a look. I'd start by ftping a largish file from the repserver host to both the replicate hosts and timing the transfer. If there's a big difference between the two, maybe it's not the ASE or repserver config at all, and there may not be much you can do at the database level. In that case it could be the network or SAN config.
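    If you want the timing to be repeatable, a small wrapper like this minimal sketch works: it times any transfer function that returns the number of bytes moved and reports MB/s (1 MB taken as 1024*1024 bytes). `local_copy` is only a stand-in so the sketch runs anywhere; `scp_transfer` shows the real-world variant, where the target such as "replicate-host:/tmp/" is a placeholder you would replace with each replicate host.

```python
import os
import shutil
import subprocess
import tempfile
import time
from typing import Callable


def measure_rate_mb_s(transfer: Callable[[], int]) -> float:
    """Run transfer(), which must return bytes moved, and return the rate in MB/s."""
    start = time.perf_counter()
    nbytes = transfer()
    elapsed = time.perf_counter() - start
    return nbytes / (1024 * 1024) / max(elapsed, 1e-9)


def local_copy(src: str, dst: str) -> Callable[[], int]:
    """Stand-in transfer: copy a file locally so the sketch runs without a network."""
    def _run() -> int:
        shutil.copyfile(src, dst)
        return os.path.getsize(dst)
    return _run


def scp_transfer(src: str, target: str) -> Callable[[], int]:
    """Real-world variant: scp the file to 'host:/path' (host and path are placeholders)."""
    def _run() -> int:
        subprocess.run(["scp", "-q", src, target], check=True)
        return os.path.getsize(src)
    return _run


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "payload.bin")
        with open(src, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))  # 8 MB of random test data
        rate = measure_rate_mb_s(local_copy(src, os.path.join(tmp, "copy.bin")))
        print(f"local copy: {rate:.1f} MB/s")
```

    Random data is used so that any compression on the link (scp can compress if configured to) does not flatter the result.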

    I've worked at sites that use NetBackup, and we've used the backup network for heavily replicated systems. It's obviously busy at night, but where most of the rep work was done during the day, we would utilise the quiet network during business hours. We'd set up an additional listener on the alternative interface, give that one to the repserver in its interfaces file and replicate without touching the app-facing network. Don't know if you have a separate backup network at your site, but maybe worth a try if you do?

    Regards,
    Kevin

  11. #11
    Join Date
    Sep 2003
    Location
    Switzerland
    Posts
    443
    Quote Originally Posted by KevR
    As they're in different datacentres the network is certainly worth a look. I'd start by ftping a largish file from the repserver host to both the replicate hosts and timing the transfer.
    Could be the winner.

    Also, just compare the two ASE configs on the target servers if you haven't already.

  12. #12
    Join Date
    May 2008
    Location
    Cambridge
    Posts
    26
    Ok, I'm back - sorry for the delay.

    Transfer rate to the DR site is 6.0MB/s
    Transfer rate to the local (slow replicate) site is 12.6MB/s

    Repserver box is also on the same subnet as the slow replicate.

    Still a mystery.

    On the bright side, I've written a script to monitor throughput, so maybe that will give some clues as to what's going on.

    I've already compared the two ASE configs - all looks good to me. They are not configured the same, but they are properly configured for their respective hardware environments - number of processors, memory, etc. I might revisit it for good measure.
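    For the revisit, a mechanical diff can be easier than reading the two files side by side. This minimal sketch assumes the "[section]" header and "name = value" line layout of the ASE server config (.cfg) file; check that against your own files before relying on it. It lists every parameter whose value differs, or that appears on only one server.

```python
from typing import Dict, Iterable, List, Tuple

Config = Dict[Tuple[str, str], str]


def parse_cfg(lines: Iterable[str]) -> Config:
    """Parse '[section]' headers and 'name = value' lines into {(section, name): value}."""
    section = ""
    cfg: Config = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif "=" in line:
            name, value = line.split("=", 1)
            cfg[(section, name.strip())] = value.strip()
    return cfg


def diff_cfg(a: Config, b: Config) -> List[Tuple[str, str, str, str]]:
    """Return (section, parameter, value_a, value_b) wherever the two configs disagree."""
    rows: List[Tuple[str, str, str, str]] = []
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key, "<absent>"), b.get(key, "<absent>")
        if va != vb:
            rows.append((key[0], key[1], va, vb))
    return rows
```

    Differences in memory and engine settings are expected here, since the two boxes are sized differently; the interesting rows are the ones that are not explained by the hardware.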

    Thanks for all input so far.

    BR
    Bob

  13. #13
    Join Date
    Sep 2003
    Location
    Switzerland
    Posts
    443
    Quote Originally Posted by harq
    Transfer rate to the DR site is 6.0MB/s
    Transfer rate to the local (slow replicate) site is 12.6MB/s
    Bob
    What does this mean? I thought you were going to post the transfer rates between repserver --> fast and slow replicates.

  14. #14
    Join Date
    May 2008
    Location
    Cambridge
    Posts
    26
    Sorry, I wasn't very clear with that, here's what I should have put:

    repserver to fast replicate: 6.0MB/s
    repserver to slow replicate: 12.6MB/s

    That was a straight scp of a 200MB file.
    Last edited by harq; 09-24-08 at 05:30.

  15. #15
    Join Date
    Sep 2003
    Location
    Switzerland
    Posts
    443
    Quote Originally Posted by harq
    Sorry, I wasn't very clear with that, here's what I should have put:

    repserver to fast replicate: 6.0MB/s
    repserver to slow replicate: 12.6MB/s

    That was a straight scp of a 200mb file.
    OK. So, that doesn't help.

    I guess then we have to go back to the basics.

    Is there any difference in the type of replication between the two replicates? I.e., is one a warm-standby replication and the other a table-to-table replication? If so, post which one is warm-standby and which one is table-to-table (fast/slow).
