The company I work for has been having serious problems with MSCS clusters logging Event ID 1038 (reservation lost) errors. Most of our production environments are now on Windows 2000 SP4; some are still on SP3. We have had this issue since SP2.
The environments that I support run Oracle and SAP (different versions). Other production Win2k MSCS environments connected to the SAN are having the same issue. No development environments at my company have seen this issue yet. We are using EMC storage. Our problems have become worse lately: we are now getting reservation lost errors on our Oracle drives, and the system hangs. The cluster.log file shows an error 170 (along with the 1038), and the cluster resources do not fail over.
We are forced to power off the node, and then recover the database when it comes back up.
We are considering de-clustering our environments at this point. We have been working with the storage vendor (EMC) and Microsoft daily for about a year, with no resolution yet.
Has anyone else out there run into this type of issue? Any input or suggestions would be appreciated.