  1. #1
    Join Date
    May 2003
    Location
    UK
    Posts
    220

    Unanswered: Is SQL 2000 clustering robust ??

    Howdy,

    I have had questions from management about whether SQL 2000 clustering on Windows 2000 is any good.

    We are looking at spending *quite* a bit of money to implement it, but I want a "from the trenches" opinion of what it's like from the people who actually use it and look after it.

    e.g.

    DOES IT WORK LIKE IT SHOULD????????
    Is it reliable?
    Is it resource hungry?
    Can I trust it to do what it's supposed to do?

    All responses very welcome. No response too small.

    Thanks for your help.

    Cheers,

    SG.
    Last edited by sqlguy7777; 01-11-04 at 23:49.

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322

    Re: Is SQL 2000 clustering robust ??

    Originally posted by sqlguy7777
    All responses very welcome. No response too small.

    SELECT LEN('What are you going to use it for?')
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Jul 2002
    Posts
    58

    Re: Is SQL 2000 clustering robust ??

    I support a SQL 2000 cluster, and I'm quite happy with it.

    Before spending a bunch of money on a cluster, the first question to ask yourself is what type of disasters are you expecting it to protect you from?

    Clustering is primarily protection from hardware and operating system failure. And with RAID disks, you're probably already protected from most disk failures. Depending on your server, you may also be protected from single failures of network cards and processors (though with some performance degradation until you resolve the problem).

    Opinions on the stability of Windows are all over the map. Your mileage may vary.

    Whether or not clustering will protect your application is a murkier question. Since there will only be one copy of your databases, you will *not* be protected from DB corruption unless you implement further protections beyond basic clustering. Whether or not any subsidiary services you write will be protected depends on their implementation.

    If you have any specific questions, shoot.

  4. #4
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    I have been watching 3 SQL 2000 clusters for around 2 years, now. Only advice I have for you is to put the quorum on its own physical device. I have had one of my clusters fail over when the transaction log got hit hard one day. After a day on the phone with MS, they pointed me to one sentence in one article in one section of technet, and promptly washed their hands. I am not sure if you can move a quorum after it has been created.

    Oh, and one other thing. May you never face a quorum corruption problem. Quorum disks can still go bad.

  5. #5
    Join Date
    Jul 2002
    Posts
    58
    One thing they don't tell you (at least not that I could ever find) is to be sure to turn off write caching on your quorum drive. We had many uncommanded failovers until we did this. I can't claim to know for a fact that this is the cause, but it seems reasonable that the node owning the quorum writes some data to it, then signals the other node via the heartbeat network that it can acquire the quorum. The only problem is that the controller hasn't really put the data on disk yet, so the other node either can't read the quorum quickly enough (the controller won't let it have it) or doesn't see what it's expecting to see. Either way, it then forces a failover. Whether or not this scenario is what is really going on, as soon as we turned off write caching on the quorum drive, our uncommanded failovers ceased.
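
    The suspected race can be illustrated with a toy model. This is only a sketch of the guess described above, not how the cluster service or any real controller works: the class names, the "quorum_state" key, and the assumption that the second node reads the disk without seeing the owner's cache are all hypothetical.

```python
# Toy model only: every name here is hypothetical, not a real cluster
# or disk controller API.

class SharedDisk:
    """Stands in for the physical quorum disk that both nodes can read."""
    def __init__(self):
        self.blocks = {}


class Controller:
    """Controller on the owning node, with an optional write-back cache."""
    def __init__(self, disk, write_cache_enabled):
        self.disk = disk
        self.write_cache_enabled = write_cache_enabled
        self.pending = {}  # writes acknowledged but not yet on disk

    def write(self, key, value):
        if self.write_cache_enabled:
            # Write-back: acknowledged immediately, lands on disk later.
            self.pending[key] = value
        else:
            # Write-through: data is on disk before the write returns.
            self.disk.blocks[key] = value

    def flush(self):
        """Push cached writes to the disk (not called before the handover)."""
        self.disk.blocks.update(self.pending)
        self.pending.clear()


def handover_succeeds(write_cache_enabled):
    disk = SharedDisk()
    owner = Controller(disk, write_cache_enabled)
    owner.write("quorum_state", "release-to-node-B")
    # Assumed step: the owner now signals node B over the heartbeat network,
    # and node B reads the shared disk without seeing the owner's cache.
    seen_by_node_b = disk.blocks.get("quorum_state")
    return seen_by_node_b == "release-to-node-B"


print(handover_succeeds(write_cache_enabled=True))   # False
print(handover_succeeds(write_cache_enabled=False))  # True
```

    With the cache enabled, the second node looks at the disk before the write has landed; with write-through, the data is there by the time the signal goes out.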

  6. #6
    Join Date
    May 2003
    Location
    UK
    Posts
    220
    Howdy,

    Thanks everyone so far for your thoughts.

    Brett - it's mainly for supporting multiple databases (direct access from desktop) and some mission-critical apps (access via web system).

    We wanted to split the cluster (and mirror the quorum drive using the SAN) across 2 computer rooms such that, should one computer room die (as has happened in the past), the cluster will keep running.

    What physical architecture do you run, and what problems (if any) have you had with the cluster, and why? Is it worth the money?

    I'm interested in whether we are making a rod for our own backs, but we need resilience for our systems.
    Log shipping is out of the question, by the way.

    Thanks,

    SG.

  7. #7
    Join Date
    Jul 2002
    Posts
    58
    Keep in mind there is only ONE copy of each DB. You won't get a copy of each DB in each computer room. The databases go on shared DASD, which is pretty much going to be in one room or the other. In our case, our DASD (a SAN-attached IBM FAStT 500 storage controller) has redundant power supplies on separate power circuits; you might be able to power your own DASD similarly from each room. But clustering does not give you two mirrored copies of your DB, a la one on each node.
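
    As a rough sketch of the point about a single copy, here is a toy model in which two nodes share one storage object. The class names, node names, and the "orders" database are made up for illustration, and real failover involves far more than swapping an owner.

```python
# Toy model only: names are hypothetical, not a real clustering API.

class SharedStorage:
    """The single shared copy of the databases (the DASD in the post)."""
    def __init__(self):
        self.databases = {}


class Cluster:
    """Two or more nodes, exactly one active, all using the same storage."""
    def __init__(self, nodes, storage):
        self.nodes = list(nodes)
        self.storage = storage
        self.active = self.nodes[0]

    def fail_over(self):
        # Ownership moves to another node; the data itself does not move.
        standby = [n for n in self.nodes if n != self.active]
        self.active = standby[0]

    def read(self, db_name):
        return self.storage.databases[db_name]


storage = SharedStorage()
storage.databases["orders"] = "ok"
cluster = Cluster(["node1", "node2"], storage)

# A node failure is survivable: the other node picks up the same data.
cluster.fail_over()
state_after_node_failure = (cluster.active, cluster.read("orders"))

# Damage to the one shared copy is not: every node sees the same damage.
storage.databases["orders"] = "corrupt"
cluster.fail_over()
state_after_corruption = (cluster.active, cluster.read("orders"))

print(state_after_node_failure)  # ('node2', 'ok')
print(state_after_corruption)    # ('node1', 'corrupt')
```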

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    I'd keep a warm standby...how long can you be out...or can't you?

    Are you dealing with trades?
    Brett
    8-)

  9. #9
    Join Date
    May 2003
    Location
    UK
    Posts
    220
    Howdy,


    We are running 24x7 apps that need to be up.

    I thought (possibly naively) that we could have a server cluster but also physically split the SAN so half of it is in one computer room, the other half in the other computer room, and each part of the SAN is an exact mirror of the other (using hardware mirroring).

    Then, if we lost one computer room, the cluster would just flick over to the part of the SAN and a server in the other room.

    Sounds simple in theory......is it possible??


    Thanks for your help so far

    SG.

  10. #10
    Join Date
    Jul 2002
    Posts
    58
    The trick is going to be finding a disk controller that will allow you to do this. You'll need a pair of such controllers, one in each room, that will either communicate with each other in lock-step, or share managing a set of RAID arrays that are built in RAID 10 with one set in one room and the mirror pair in the other.

    Some hardware genius is going to have to locate that for you. I'm not sure it exists in the Wintel world, but it might.

    What's the failure mode whereby you lose an entire computer room?
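
    To make the RAID 10 layout concrete, here is a toy model in which every mirrored pair keeps one copy in each room and blocks are striped across the pairs. It is a sketch under simplifying assumptions: the class names and room labels are invented, writes are treated as synchronous to both rooms, and resynchronising a room after it comes back is ignored entirely.

```python
# Toy model only: names are hypothetical, not a real controller or SAN API.

class MirroredPair:
    """One RAID 1 pair with a copy of each block in each room."""
    def __init__(self):
        self.copies = {"room_a": {}, "room_b": {}}

    def write(self, block_no, data, rooms_up):
        # Assumed synchronous mirroring: every reachable room gets the write.
        for room in sorted(rooms_up):
            self.copies[room][block_no] = data

    def read(self, block_no, rooms_up):
        for room in sorted(rooms_up):
            if block_no in self.copies[room]:
                return self.copies[room][block_no]
        raise RuntimeError("block %d unavailable" % block_no)


class Raid10AcrossRooms:
    """Stripe (RAID 0) over mirrored pairs (RAID 1) that span two rooms."""
    def __init__(self, pair_count):
        self.pairs = [MirroredPair() for _ in range(pair_count)]
        self.rooms_up = {"room_a", "room_b"}

    def _pair_for(self, block_no):
        return self.pairs[block_no % len(self.pairs)]

    def write(self, block_no, data):
        self._pair_for(block_no).write(block_no, data, self.rooms_up)

    def read(self, block_no):
        return self._pair_for(block_no).read(block_no, self.rooms_up)

    def lose_room(self, room):
        self.rooms_up.discard(room)


array = Raid10AcrossRooms(pair_count=2)
for n in range(4):
    array.write(n, "data-%d" % n)

array.lose_room("room_a")  # one whole computer room goes dark
recovered = [array.read(n) for n in range(4)]
print(recovered)  # ['data-0', 'data-1', 'data-2', 'data-3']
```

    The model says nothing about whether hardware that behaves this way across two rooms actually exists; that is still the open question.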


  11. #11
    Join Date
    May 2003
    Location
    UK
    Posts
    220
    Howdy,

    Well, we seem to push boundaries on most things so...*sigh*

    Q. What is the failure mode called whereby we lose an entire computer room?

    A. Rapidly locate nearest place that serves alcohol. Stay there AT LEAST 2 days or until they find you. Deny all knowledge....

    I guess the concept is born from the fact that our computer room has COMPLETELY been taken off the air in the past.

    For 3 hours.

    Ouch.......

    Cheers,

    SG.

  12. #12
    Join Date
    Nov 2003
    Location
    Christchurch, New Zealand
    Posts
    1,618
    If you are making plans for losing a computer room you may as well make plans for losing the building.... the extra cost involved probably wouldn't be very significant considering the value that it would add.

    I work for a few banks and they all have offsite failover systems.

  13. #13
    Join Date
    May 2003
    Location
    UK
    Posts
    220
    Howdy

    Sadly, no offsite capability (hey, nuts I know, but I'm just the hired help...) so we have to assume one computer room gets nailed and the other one will take over......

    Like I mentioned, as usual....pushing the boundaries.......

    Have you seen any SANs where we could use hardware mirroring to replicate all data (including the quorum)?

    Cheers,

    SG.

  14. #14
    Join Date
    Nov 2003
    Location
    Christchurch, New Zealand
    Posts
    1,618
    I know the banks that I have worked for have hot-hot swap over, so if one room is lost they don't lose anything; the applications don't even pause as far as the users are concerned.

    I have no idea how it is done though, I just write the code and let the network guys and the system admins take care of all that.

  15. #15
    Join Date
    May 2003
    Location
    UK
    Posts
    220
    Howdy

    Any chance of finding out how they do it please?

    Cheers,

    SG
