Unanswered: Weird Access 2003 multi-user performance problem
(I'm sorry this post is so long, but I wanted to be sure to mention the sorts of things that get asked in follow-up posts, so as not to waste anyone's time)
I have a perplexing performance problem in a multi-user Access application that has me stumped. I have done exhaustive research on this topic, and I believe all the usual "best practices" from the many previous posts on this topic (especially Tony Toew's tips at granite.ab.ca) were already implemented and have been double-checked recently. I've been developing and supporting multi-user Access applications for many years, and I have never seen anything like this.
Here's the setup:
An unsecured Access 2000 MDE (split FE/BE), developed in Access 2000 and deployed as an Access 2000 MDE on each client PC; it is decompiled, compacted and repaired prior to each release. The app maintains a persistent connection the BE. Name autocorrect is off, record-level locking is off, default open mode is shared. In modules and classes, ADO is used extensively, DAO 3.6 only rarely. All ADO connections use the CurrentProject.Connection object. The data structure is highly normalized with approximately 65 tables all participating in RI. Subdatasheets are off in the BE. The application is well-indexed, and is not a heavy data entry application (probably less than 100 records a day) - mostly users browse data, make periodic edits and inserts, and run reports. All forms are bound, but none are linked directly to the full table - every form opens only after the user has requested a specific record or small set of records.
This app is currently installed at 13 different organizations, most of whom use XP Pro and Access 2000 or Access 2003, with the BE on a LAN shared drive. Most sites have between 5 and 15 simultaneous users. All sites except ONE have perfectly acceptable performance running the app all day long, and have had for years.
The problem is that at only one client organization, the application launches and runs with acceptable performance in a multi-user setting for between 10 and 30 minutes, after which point the performance becomes "unacceptable" for at least one user, and eventually for all users, until they exit and re-launch the app. ("Fast" is seeing the primary forms opening in 1-3 seconds, and "slow" is between 6 and 10 seconds.)
The "problem" site has 4 users with new PCs with 1 gig RAM, all running Windows XP Pro SP2 with Office 2003 11.6566.8122 SP 2, msjet40.dll version 4.0.8618.0. The data file is on a shared drive on a DL 370 server with 2 gig of RAM. The data file is named in 8.3 format (QB891176) and is located just one level down the directory tree (e.g. R\AppName\AppData.mdb). All users have R/W/D permission on that folder. The IT staff at the client site has added the registry key for "shared file access delay" (QB150384). We are not seeing any data corruption, nor are we getting any locking conflicts. IT staff has said that no other major processes are running on this server.
To check for any hidden corruption in the data file, we created a new empty database, imported all the tables, and re-linked. The BE data file is approximately 280 meg and is regularly compacted and repaired. This client's data file is larger than any other organization using the app, but not by enormously so.
When multiple user are running the app and one has slow performance, the other users are not instantly affected; it may be another 10 or 15 minutes before any of the other users experience the slowdown. When users exit the Access and restart, they have acceptable performance again, until the next slowdown occurs. We saw a user regain performance by just exiting the MDE and staying in Access, then re-opening the MDE, but we have not extensively tested this scenario. We have not asked users to stay in the slow mode indefinitely to see if they spontaneously regain performance.
This organization has a large physical plant with an extensive network. In order to rule out network infrastructure as a factor, the IT staff created a discrete LAN by moving the current server to the office where the users are located and hooking them up to the server directly with a switch, but we still saw the same performance drag after 10-15 minutes.
I am completely stumped - if there is a serious problem in how my application is designed or coded, I would be hearing complaints from other organizations, but we've checked, and the other 12 sites are seeing acceptable performance, so logically it seems like it must be something site-specific, or a problem in my app that is only apparent because of some site-specific setting or condition.
The fact that performance goes back up (temporarily) after users exit makes me suspect some sort of resource is being used up and then released, but how can I determine what it would be specific to this site? Access memory usage in Task Manager starts out at about 30,000k and is at only about 50,000k when the user experiences the slowdown, so it doesn't seem like it is specifically Access that is using up resources. I saw some posts in 2002 about a memory leak with Jet, but it looked like it had been fixed in recent SPs.
I believe that every ADO and DAO object that I open is closed and set to nothing, of course it is possible that somewhere in the many, many lines of code I've missed a few, but again, wouldn't I be seeing the same performance problem at other sites?
I don't think it is related the absolute size of the BE because when the app is fast, it's very fast. We also transferred a scrambled version to another client site and ran it there and performance for two users was fine (although we couldn't really take up their time to do extensive testing). I'm going to take it to a colleague's office and try testing on his server, but I get very good performance using their data on my peer-to-peer setup.
We are not seeing any corruption, so I haven't asked the IT people to mess around with OpsLocks, but maybe I should try that next?
This client is running enterprise-wide Symantec real-time virus scan - could this be a factor? Does anyone know what sort of settings I should be looking at or directing the IT people to look at?
I have been collecting user-specific data for 2 days (my central error handler has been writing time in and out of every function) and I will be receiving those files Monday for analysis, but I'm not sure what I should be looking for.
Three things, no four things have gone through my head while reading this. Here is the fourth: Very good job explaining, thanks! Now the first three.
1. Is there any way to archive some of this data to bring the BE size below the biggest size of your other locations? Just for a day, for testing?
2. Call Symantec and see what they have to say about their real-time virus scan and how that could effect an Access app.
3. For $245 Microsoft will work with you to get this app running faster. No, I'm not on commission, but have used it a couple of times and they have been extremely good.
Have you considered SQL Server for the BE? I really don't think this would be the answer as it seems to be something with the FE, or the network.
I would look at the PC with the performance problems. Try using another PC to see if the performance problem persists. Check the duplexing on the PC's Network Card, anti-virus this PC even try some the AntiSpy software or even a new ethernet card. Does the PC see all its installed RAM. If the problem PC has been re-located and it still has a problem, I would strongly suspect the PC. I have only once see something like this and it was the Network Card. I changed the duplexing and the problem was resolved.
GolferGuy - thanks for your comments and suggetions. Let's see - archiving is not really a possibility, but if modifying the virus scan to skip our MDE/MDB fiels doesn't solve the problem, I will take this app and the client's large BE to a colleague's office and test on his server. I can also deposit a smaller sample BE database on the client's site for them to test, but I'm a little hesitant to do that because this is a busy time for them, and they would have to stop work and "mess around" in something that it not "live". We might try that later this week.
SQL Server for the BE is not really feasible - this is not a custom app, and 12 other organizations are using it, many of whom can't or won't use SQL Server. And it will be a lot of time and money to optimize the app for SQL Server, neither of which are readily available right now. Someone on another forum suggested upsizing to SQL Server for the query analysis tools alone, but I sure hope it doesn't come to that.
Calling is MS tech support is on the list, once I get some clarity from the IT people on the virus scan . Thanks for the suggestion.
Simon - I appreciate your suggestions. All the PCs are experience the same performance problems, but not all at the same time. I will ask my IT contact to check the NIC settings - the PCs are all new and were purchased at the same time - it could be that all of them have the same setting that is causing this.
Also, I have requested that they configure the real-time antivirus software to skip our MDE and MBD files.
As of this afternoon I am logging memory usage in each workstation while they use the app, and I have geven them a shortcut key that allows then to mark that they have experienced the slowdown, which I then log with a date/time stamp and all the memory stats. Once I gather all the logs I hope to have enough data to look for a pattern. If the problem lies within my code, I shoudl be able to identfy it. if not, at least I should be able to see if the PCs are running out of memory.
What Simon said about the network card on one of the PCs gave me the idea to suggest you look at the network hookup on the server. Seeing as how all the PCs are experiencing the slow down, at one time or another, the server and it's network card is just one of the "common" things to all the PCs. (I know this is a shot in the dark, but I have found brainstorming very helpful in the past.)
What is the O/S and the file system (NTFS, Novell, etc. ) on the remote server?
I do know that Access db's are somewhat sensitive to O/S and/or file system format in shared database environments.
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
- turning off Symantec real-time virus scanning of MDB files on the
server made no difference;
- turning off Symantec real-time virus scanning of the local PCs also
made no difference (according to the users - I am looking into this
assertion using my logs but it takes a while to go through them).
- I've been logging all kids of stats - memory and page file useage,
and ISAM stas, and now I have a ton of data to wade through. I'm not
really sure what I am looking for, so right now I am just looking for
- from what I can see so far from the logs, the slowdowns seem not to
occur inside many of my procedures, but rather between them. I'm not
sure what to make of this - either it is Access form and control
processing that is bogging down, or something totally outside of
Access. I can only log Access actions - I have no way of knowing or
tracking what else is going on.
- I have sped up the opening and loading of some app forms by using
subselects in some places instead of custom functions in queries, and
other tweaks. The users are running this version today. This will be
good in the long run, but the issue is not that the app IS too slow,
it's that it BECOMES too slow.
- when these users start my app, they are already running with a
memory load of 50% or more. This seems high - my PCs all run with a
load of about 30% when starting Access. I am going to run a
controlled experiment today with some colleagues to see what our
memory usage is like compared to these users.
What I need is a hypothesis to test - but I don't know much, if
anything, abut memory management, pages files and caching, etc.
What can cause multiple PCs sharing an Access BE database to become
slower over time? (When other PCs using the same app in other
locations o not experience the slowdown?)
Any ideas? We're ready to bring in anyone who feels like that have
some ideas of how to identify the problem.
Here's what I'm planning today:
1. Use a recipe of actions and have multiple users here in my office
simultaneously run through the steps (e.g. start at time marker 1:
open form A, at time marker 2: find record, At time marker 3: do
x,y,z, at time marker 4: go to form B, etc.) - log files will be
2. Have the users in the problem site follow the same script
3. Ask some of our other nice clients at another site to do the same
One the IT staff at the site uninstalled the desktop AV software and resintalled a later version. Since then, (after an inexplicable brief delay?) all the users have been in the application for a week now with uniformly and consistently fast (e.g. normal) performance.
For the record, the version of Symantec AV was 126.96.36.1990 (according to the version for the "running modules" from msinfo on a user PC) before the reinstall and now it is 10.1.5.5000. There are other versions for other Symantec modules, but these are the main ones, as far as I can tell, and every AV component's version number went up.
None of the changes I made to the application to improve performance in specific activities made any noticable difference whatsoever, and in fact, they were using the same release of my app before the AV was reinstalled and afterwards, when they finally experienced acceptable performance.
In retropsect, I wish I had pushed a little harder to get the environmental issues addresses first, instead of digging around so much (and ultimately, with so little gain) in my app. However, I did learn a great deal about Jet performance on large indexed tables (it's fast!), and I have much more confidence in the solidity and performance of my app. I also now have a set of nifty memory and ISAM logging functions, in case I ever need to go through this again, so that's good.