Hello!
I have some performance issues with our DB2 8.2 production environment.
We are running a 3d party software (medical journal system) that's utilizing db2 as it's backend on windows 2003 servers.
The system is designed to use 1 central database which holds all user information plus patient information that is shared with the clinics. Then there is 1 database per clinic that holds the actual journal of the patient (we have 36 clinics so a total of 37 databases).
The databases is then connected to each other using FEDERATION. So mostly the client's just connects to 1 database then can access the others through federation.
This system was designed in the early 90's when "general" hardware was really slow, and servers couldn't hold that many simultaneous connections so the developers line was to minimize the time each client was connected to the database. This led to how the system functions today, very high connect rate but low simultaneous connected users.
So the client only connects to the database(s) when it needs new information or when the user saves information other wise it's disconnected.
The authentication works by having an "encrypted table" in the central database where the users and passwords resides.
And then using a static local dummy user on the server which the client utilize to do the initial connection to the database (all clients uses the same dummy account to make the first contact).
This dummy user is a windows local account that only have connect rights to the database.
When the client have made the initial connection using the dummy account it then sends the user and password that the user typed in, to verify it against the "encrypted table".
If they get a match the server sends the "right" user and password that the client will use to make a second connection to the database.
This "right" user is a windows local account that have more rights in the database and is the user which the client will use when speaking to the database from now on.
(Yeah i know it's a strange way to authenticate users. But it works and it allows the customers to administrate users and passwords through the client. Probably not very secure though.)
On the windows 2003 server there are only 2 local accounts (except builtin).
"dummy" which is used to make initial connect by the client to authenticate user.
"right" which is used to make the real connect by the client to transfer data after successful authentication .
So all clients are using the same account to communicate with the database.
The servers that hosts the database's are all configured the same.
C: where the system resides consists of 2 local disks in raid1, 15K SCSI.
D: where database's resides are fiberchannel SAN disks in a RAID 5
E: where database active logs resides are fiberchannel SAN disks in a RAID 10
F: where backups resides are fiberchannel SAN disks (slow ATA)
The instances are created with db2icrt, so alla "instance profiles" are located under C:/.../SQLLIB/.
The databases are created by "db2 create db blah on D:" so they are located under D:
Then the active logs are moved from D: to E: with newlogpath.
The problem we are facing is that our "system disk" is not coping the pressure.
Even though we don't have that many simultaneous connected users (around 110 ) the C: raid have an average disk idle time below 20%. I have filemon'ed C: and it looks like the majority of all writes go to SAM and SAM.LOG.
Is this system disk load to be expected with db2 or do we have a seriously flawed configuration.
The funny thing is that C: have about 8 times more "TOTAL IOs per seconds" than D: which is the database volume. I had understand if it was the opposite but this is just crazy.
Can it be that we have so many short connects so db2's authentication rutin don't coupe with the speed. Or maybe db2 does coupe but it's windows that can't handle all user query's from db2.
Anyone able to shed som light on how db2 interact with windows authentication when a user connects?
Should every db2 connect generate disk IO on the system disk?
The reason i know this is generated by user connects is that after work hours when there is low user activity the IOs to C: drops radicly. And even when all db backups run att fullspeed (So that D: is saturated) there is hardly any IOs to C: at all.
Here is a link to som nagios graphs i have taken on one of the server.
Represents: DISK IDLE TIME, DISK IO READS, DISK IO WRITES and DISK IO TOTALS. Over the period of 07.00 to 16.00.
http://img528.imageshack.us/img528/9179/tmp6oo0.jpg
Anyone able to figureout why this happens?
And what loads do you have on your system disk when db2 operates?
What system disk configuration do you have?
I haven't seen any recommendations on system disk configuration from IBM to operate DB2, anyone know a good site or book or could share som knowledge?