Unanswered: SQL just stops allowing new connections
I manage a trading system running SQL2K on a clustered pair of huge boxes. Performance never above 20% or so. About 3 times a week, suddenly nothing can create new ODBC connections to it.
Nothing in the logs (as usual). All I get is 'timeout expired' errors in the client applications (IIS/MDAC2.6). If I wait long enough, the system seems to wake up again, but I need to find the source of the problem. Anyone got any ideas? There seem to be about 100 active SPIDs at any one time, and the spid numbers don't increase very much, so I guess I don't have a connection leak?? (anyway the number of connections to sql is 'unlimited').
Any help, or suggestions about what to log, much appreciated.
You should see messages in the SQL Server log, and probably the NT Event log like these:
Login failed. The maximum workstation licensing limit for SQL Server access has been exceeded.
18458 Login failed. The maximum simultaneous user count of %d licenses for this server has been exceeded. Additional licenses should be obtained and registered through the Licensing application in the Windows NT Control Panel.
18459 Login failed. The maximum workstation licensing limit for SQL Server access has been exceeded.
18460 Login failed. The maximum simultaneous user count of %d licenses for this '%ls' server has been exceeded. Additional licenses should be obtained and installed or you should upgrade to a full version.
"Unlimited" refers to total number of connections, not the total number of users. Your number of users is limited to the number of seat licenses, so once you hit 10, nobody else will be able to connect.
No - nothing like that! Literally the only message I get is from the client applications (including applications on the same machine) reporting ODBC timeouts.
I sort of suspect that the cluster service is managing to lose its virtual IP/machine name, but I've been assured that this isn't possible (and anyway I'd get some sort of error like SQL server does not exist)
When this problem occurs, can you make an OSQL connection to the database from the database server? If so, then problem is in your connection. If not, then the problem is on the database server (most likely out of seats).
no - it froze when i tried to view 'current activity' in Enterprise manager. Query analyzer worked ok.
What seems to be happening is I've got about 100 SPIDs with a status of RUNNABLE, a command of EXECUTE and a BLKBY of another SPID (this is running sp_who2). So I think what's happening is not a total blackout but an apparent one, because loads of SPIDs are blocked.
I guess what happens then is the ODBC queries time out, so the impression is that SQL has stopped responding. What I don't understand is once the query times out, why is the SPID not destroyed (because it's blocked?)
Each of the blocked SPIDs will indicate the SPID that is blocking it, and that SPID will indicated the SPID that is blocking it, etc. Track back through until you find a blocking SPID that is not itself being blocked, and then check the Properties of that SPID to find out what SQL it is running. That is what is causing your deadlock and your timeouts.
Query Analyzer makes a more direct connection to the database than Enterprise manager, so use it do analyze the problem.