There are 3 people on my team and we all work remote in varying parts of the U.S. We have been trying to run queries in SAS that have run with no issues for years. Some take 15 hours and up to 5 days. We connect to a DB2 using VPN. The error we get is:
Error: CLI open cursor error: [ibm][cli driver] SQL30081N A communication error has been detected. Communication protocol being used: "TCP/IP". Communication API being used: "SOCKETS". Location where the error was detected: "10.114.160.30". Communication function detecting the error: "recv". Protocol specific error code(s): "10054", "*", "0". SQLSTATE=08001
Error: CLI close cursor error: [ibm][cli driver] CLI1080E Communication link failure SQLSTATE=40003
disconnect from db2;
note: the sas system stopped processing this step because of errors.
note: Procedure SQL used (total process time):
All 3 systems were getting this error and we were told to put in: querytimeoutinterval =0 and lightweightpooling=yes
After putting this in our db2cli.ini file one of our virtual agents no longer gets the error and everything runs for her. The only difference is she has a single processor on her system and she has a local phone company that handles her Internet package. Her upload speed is less than mine and my managers.
My manager and I have dual processors and both have Comcast and have good upload speeds.
My manager and I also have this strange error in our db2.log which the girl that gets this to work does not. Our error is:
If anyone can assist us with trying to figure this out. Our helpdesk is clueless. They cannot figure out what to do about it. So we are hoping maybe a tech on this forum can make some heads or tails out of this. Thanks
You need to provide your DB2 version and OS (server and client).
I suspect it has nothing to do with DB2 but with your network. I think something is killing your connection after it detects no activity for a period of time. You probably need to set a keep alive interval to be smaller than it is.
If I was managing that environment, I would fix the multi-day queries but that's not what you are asking, although it is the "elephant in the room", and if those multi-day queries could be made to complete significantly faster then you would not experience this issue...
Your posted symptom is SQL30081N(10054), this is not a timeout - it is a hard-reset/abort of the DB2 tcp connection...
(if it was a timeout I would expect SQL30081N (10060)).
Your symptom (10054) is that the db2 tcp connection(s) are getting reset by "the other side" (i.e. something between the workstations and the remote DB2 server). This can be gateways or firewalls or even something at the DB2-server forcing those agents servicing the multi-day queries...
You should ask your network people to identify the host with address 10.114.160.30 - is it the DB2-server? If it is the DB2-server, then it suggests that the connections are being reset *there*. If this machine is not the DB2 server, then your support people should be able to determine what has changed on that machine. By default, Windows has a 2-hour keepalive heartbeat on tcp connections and this can be changed (by a registry setting), but if the connections are being reset/forced on the DB2-server itself then changing keepalive will not help you. So that's why you need to identify 10.114.160.30.
You should ask the DBAs at your site to monitor the connections from your IP address to see how those connections terminate.
You did not give any precise DB2-version + fixpack information (on the Windows clients, and on the DB2 server) but this information is often critical for problem resolution.