We are running a clustered SQL Server 2000 Enterprise Edition instance on two Windows 2000 Advanced Server machines.
The data is held on a Compaq StorageWorks SAN, and each server has two fiber connections to the SAN for resilience. Our cluster failover tests all worked well until we simulated a loss of connection to the StorageWorks disks by disconnecting both fiber connections on one of the servers.
Instead of failing over to the other server, the node logged a series of hard disk write failure messages. When we manually moved the cluster resources to the other server, SQL Server would not start, and the master database turned out to be corrupted.
When questioned, our reseller theorised that when we pulled the fiber connections from the server there was still traffic in the fiber cable, in the form of photons of light, and that this caused the corruption.
I am not convinced by this answer, as I would have thought that the error-checking algorithms in the fiber comms would prevent this from happening.
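To show the kind of check I have in mind, here is a rough Python sketch. It is only an illustration, not the actual Fibre Channel framing: the `make_frame` and `frame_is_valid` helpers and the payload are made up for this post, and I am using `zlib.crc32` as a stand-in for whatever checksum the link really applies. As far as I understand it, a frame damaged in transit should fail a check like this and be discarded rather than written to disk.

```python
import zlib

def make_frame(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload (hypothetical stand-in for a link-level frame check)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def frame_is_valid(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailing 4 bytes."""
    if len(frame) < 4:
        return False
    payload, received_crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received_crc

# Made-up payload standing in for a block of database data.
frame = make_frame(b"block of master database data")

# Simulate damage in transit by flipping a single bit in the first byte.
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]

print("intact frame accepted: ", frame_is_valid(frame))
print("damaged frame accepted:", frame_is_valid(damaged))
```

If the link behaves anything like this, stray light in the cable might cause frames to be dropped, but I would not expect it to get bad data written into the database files.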
I would like to know:
Has anyone successfully failed over a SQL Server 2000 cluster when the server’s connection to the storage is cut?
Does anyone have a better theory on why the master database was corrupted?