  1. #1
    Join Date
    Sep 2004
    Posts
    9

    Random communication link failures

    We have been seeing random inexplicable communication link failures when communicating with a Win2K SQL server for a while now. After a very detailed analysis of the various causes of the problem (network, name lookups, etc.), we've narrowed it down to possibly the ODBC driver. We are using TCP/IP.

    I've stuck a packet sniffer on the connection between the SQL server and the client and in almost all cases, the connection suddenly terminates with the client sending a TCP reset to the server.

    Looking at the packet traces further, about 60% of the cases follow the same pattern: a period of activity on the TCP connection, then a stretch of inactivity with a constant stream of TCP keepalives between the client and server, and then the client suddenly resets the TCP connection.
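
    One way to put numbers on that pattern across a large capture is a small script that walks each connection and reports who sent the reset and how long the link had carried no real data beforehand. This is only a sketch: the `Packet` record and its fields are a made-up layout for rows exported from the sniffer, not any tool's real format, and it assumes keepalive probes carry at most one byte of payload.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    """One row from a simplified trace export (hypothetical layout)."""
    time: float       # seconds since the start of the capture
    sender: str       # "client" or "server"
    flags: str        # comma-separated TCP flags, e.g. "PSH,ACK" or "RST"
    payload_len: int  # TCP payload bytes


def idle_before_reset(packets: List[Packet]) -> Optional[dict]:
    """Describe the first RST on a connection, or return None if there is none."""
    last_data_time = None  # time of the last segment that carried real data
    empty_segments = 0     # keepalive probes and bare ACKs seen since then
    for p in packets:
        if "RST" in p.flags.split(","):
            idle = None if last_data_time is None else p.time - last_data_time
            return {
                "reset_by": p.sender,
                "idle_seconds": idle,
                "empty_segments": empty_segments,
            }
        if p.payload_len > 1:
            # Assumption: anything above one byte is application data.
            last_data_time = p.time
            empty_segments = 0
        else:
            empty_segments += 1
    return None
```

    Grouping the results by `idle_seconds` should show whether the resets cluster around a particular idle time, which would point at a timer somewhere rather than at random network trouble.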

    Now, we can usually correlate this TCP reset with some new activity initiated by the client application, so could this be related to ODBC connection pooling? That's the only inference I can draw.

    We are running Win2K SP3a on the server.

    Any ideas on what else to look for or how to debug this further? I have 10GB of packet traces and can provide more details on the connection traces if necessary. The problem also is that we have ~100 clients constantly communicating with the SQL server and we will see anywhere from 10-20 random CLFs in a day.

    I've searched the archives extensively and this does seem to be a problem for many people, but a few of them seem to have had genuine network problems and we've pretty much ruled that out since there are other simultaneous TCP connections between the client and the server and they seem to be okay.

    Thanks,

    TN

  2. #2
    Join Date
    Nov 2004
    Posts
    1
    I am having the same random communication link errors (using an ODBC DSN) with one of our applications running on a client (MDAC 2.8, verified with Component Checker) against a SQL 2000 SP3a server (8 processors and 16GB of RAM). I am at my wits' end trying to solve this, and I know it is not network related because we have already attacked the problem from that angle. Any ideas out there, guys?

  3. #3
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    What are the users getting? If they are not getting errors, it may be code-related.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  4. #4
    Join Date
    Feb 2005
    Posts
    2

    work around?

    I've been searching Google on "odbc sql server driver communication link failure" and have run across a lot of messages like this one. I'm having the same connection problems, but my case is different in one key way: I *know* why I'm getting the disconnection problem, but since I can't seem to fix it, I'm looking for a temporary workaround until I can. In my situation, I have 25 users roaming a building with wireless tablet PCs, and about 1 time in 10 when a tablet roams between WAPs, it momentarily loses its connection to the network. The drop is very brief, but it causes the ODBC disconnection error to pop up, and since the application being used is apparently hyper-sensitive, it has to be completely exited and restarted before it works again. Talk about angry users. We've been in a long process to correct the network problems and have made improvements, but it's slow going getting it completely fixed.

    I was hoping for a temporary fix via SQL Server or within the client ODBC settings. I'm a SQL dummy, so I don't even know where to begin. Are there timeout settings in the registry on the client, or settings within Enterprise Manager on the server, that could be tweaked or increased to allow for these brief disconnection periods? Our application vendor hinted that there were, but was unwilling to give any additional information. I don't want to do anything "experimental" that could cause me additional headaches, but I figure there might be tweaks that I just don't know about, being that I'm such a SQL ignoramus.
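
    For whoever maintains the application code (which may only be the vendor), the usual way to survive a brief drop without a full restart is to reconnect and retry the unit of work. Below is a rough sketch of that idea, not anything taken from the application in question: `LinkFailure`, `connect` and `work` are placeholder names, and a real version would catch whatever error the ODBC driver actually raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class LinkFailure(Exception):
    """Stand-in for the driver's communication link failure (hypothetical)."""


def run_with_reconnect(connect: Callable[[], object],
                       work: Callable[[object], T],
                       attempts: int = 3,
                       delay_seconds: float = 0.5) -> T:
    """Run work(conn) on a fresh connection, reconnecting if the link drops."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        conn = None
        try:
            conn = connect()       # open a brand-new connection every attempt
            return work(conn)      # success: hand the result back
        except LinkFailure as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # give the tablet time to re-associate
        finally:
            close = getattr(conn, "close", None)
            if close is not None:
                close()            # never keep a possibly dead connection around
    raise last_error
```

    This is only safe when `work` can be repeated without harm, for example a read, or a write wrapped in a transaction that was rolled back when the link dropped.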

  5. #5
    Join Date
    Jan 2005
    Location
    TempDb
    Posts
    228
    Why does your client try to maintain a connection to SQL Server? This error should only happen during a transaction, and it appears your client is holding results open with your server - that is a poor application design.

    Look at how web applications are designed - you cannot possibly expect (allow) a user to hold a connection from their browser. Every transaction must run to completion or be gracefully aborted.

    If you are unfamiliar with NT's connection keep-alive, you may want to read up on it. NT has what I call 'obnoxious' connections, but what are formally known as 'persistent' connections - NT (Win 2K) tries to buffer a client when the client goes away, and that can be a bad thing for database connections. This is a typical NOS 'feature', and is what you are seeing on your sniff.

    Search on KeepAliveInterval and KeepAliveTime. Microsoft KB Article
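
    To make those two names concrete: as far as I know, KeepAliveTime is how long a connection sits idle before the first probe is sent, and KeepAliveInterval is the gap between probes once the other side stops answering. The sketch below sets the per-socket equivalents from Python. It is an illustration only, since an ODBC driver owns its own socket and the registry values apply system-wide; the function name and default values are mine.

```python
import socket


def enable_keepalive(sock: socket.socket,
                     idle_s: int = 60,
                     interval_s: int = 5) -> None:
    """Turn on TCP keepalive and, where the platform allows, tune its timers."""
    # Portable part: ask the stack to send keepalive probes at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    if hasattr(socket, "SIO_KEEPALIVE_VALS"):
        # Windows: (on/off, idle time in ms, probe interval in ms).
        sock.ioctl(socket.SIO_KEEPALIVE_VALS,
                   (1, idle_s * 1000, interval_s * 1000))
    elif hasattr(socket, "TCP_KEEPIDLE") and hasattr(socket, "TCP_KEEPINTVL"):
        # Linux and similar: the same two timers, expressed in seconds.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Other platforms keep their system defaults for the timers.
```

    Shorter timers make a dead peer show up sooner, which cuts both ways on a flaky wireless link: the stack also gives up sooner on a client that was only gone for a moment.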
    Last edited by MaxA; 02-02-05 at 04:09.

  6. #6
    Join Date
    Feb 2005
    Posts
    2
    Quote Originally Posted by MaxA
    Why does your client try to maintain a connection to SQL Server? This error should only happen during a transaction, and it appears your client is holding results open with your server - that is a poor application design.
    Are you replying to me or to the original post (which is from September)?

  7. #7
    Join Date
    Jan 2005
    Location
    TempDb
    Posts
    228
    Pardon me, didn't pay any attention to the dates. Made the wrong assumption that this was current.

    Either way, my observation stands and if you are having similar problems, I would guess they are for similar reasons. There is no ODBC silver bullet for you, because it is not the cause of the problem. It is a symptom.

    I assure you what you are seeing is due to the need to hold connections that should not be held. Were I developing a WAP application for an unreliable network, I would either ensure my transactions were extremely fast and eliminate the headaches the network could present, or do what you are doing and beef up the network. On second thought, I'd do both.
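
    To show what I mean by not holding connections, here is a minimal sketch of the open, work, commit, close pattern. It uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a server; the `readings` table and both function names are invented for the example, and an ODBC version would swap in its own connect call.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def short_lived_connection(database: str):
    """Open a connection for one unit of work, then always close it."""
    conn = sqlite3.connect(database)
    try:
        yield conn
        conn.commit()      # the unit of work ran to completion
    except Exception:
        conn.rollback()    # or it is aborted cleanly
        raise
    finally:
        conn.close()       # nothing is left open between user actions


def record_reading(database: str, tablet: str, value: float) -> None:
    """Example unit of work: one quick insert per connection."""
    with short_lived_connection(database) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS readings (tablet TEXT, value REAL)"
        )
        conn.execute("INSERT INTO readings VALUES (?, ?)", (tablet, value))
```

    The window in which a roaming client can lose the link mid-operation shrinks to the duration of a single call, and a failure leaves nothing half-open on the server.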

    Your problem is not due to SQL Server timeouts, it is due to SQL Server disconnects. Significant difference. If your clients were experiencing SQL Server timeouts, the path to the resolution would be different.

    You may want to search on finding a way to avoid the network disconnect (simulated, because it is gonna happen) when the clients switch access points. That is a network problem - not a SQL Server problem. SQL Server expects to have a reliable network on which to communicate, thus it errors when it cannot.
