Results 1 to 5 of 5
  1. #1
    Join Date
    Sep 2006
    Posts
    5

    Post Unanswered: Is lsass.exe a bottleneck in DB2 performance? (Solved)

    Hello!

    I have some performance issues with our DB2 8.2 production environment.

    We are running a 3d party software (medical journal system) that's utilizing db2 as it's backend on windows 2003 servers.

    The system is designed to use 1 central database which holds all user information plus patient information that is shared with the clinics. Then there is 1 database per clinic that holds the actual journal of the patient (we have 36 clinics so a total of 37 databases).
    The databases is then connected to each other using FEDERATION. So mostly the client's just connects to 1 database then can access the others through federation.

    This system was designed in the early 90's when "general" hardware was really slow, and servers couldn't hold that many simultaneous connections so the developers line was to minimize the time each client was connected to the database. This led to how the system functions today, very high connect rate but low simultaneous connected users.
    So the client only connects to the database(s) when it needs new information or when the user saves information other wise it's disconnected.

    The authentication works by having an "encrypted table" in the central database where the users and passwords resides.
    And then using a static local dummy user on the server which the client utilize to do the initial connection to the database (all clients uses the same dummy account to make the first contact).
    This dummy user is a windows local account that only have connect rights to the database.

    When the client have made the initial connection using the dummy account it then sends the user and password that the user typed in, to verify it against the "encrypted table".
    If they get a match the server sends the "right" user and password that the client will use to make a second connection to the database.
    This "right" user is a windows local account that have more rights in the database and is the user which the client will use when speaking to the database from now on.

    (Yeah i know it's a strange way to authenticate users. But it works and it allows the customers to administrate users and passwords through the client. Probably not very secure though.)

    On the windows 2003 server there are only 2 local accounts (except builtin).
    "dummy" which is used to make initial connect by the client to authenticate user.
    "right" which is used to make the real connect by the client to transfer data after successful authentication .
    So all clients are using the same account to communicate with the database.

    The servers that hosts the database's are all configured the same.
    C: where the system resides consists of 2 local disks in raid1, 15K SCSI.
    D: where database's resides are fiberchannel SAN disks in a RAID 5
    E: where database active logs resides are fiberchannel SAN disks in a RAID 10
    F: where backups resides are fiberchannel SAN disks (slow ATA)

    The instances are created with db2icrt, so alla "instance profiles" are located under C:/.../SQLLIB/.
    The databases are created by "db2 create db blah on D:" so they are located under D:
    Then the active logs are moved from D: to E: with newlogpath.

    The problem we are facing is that our "system disk" is not coping the pressure.
    Even though we don't have that many simultaneous connected users (around 110 ) the C: raid have an average disk idle time below 20%. I have filemon'ed C: and it looks like the majority of all writes go to SAM and SAM.LOG.

    Is this system disk load to be expected with db2 or do we have a seriously flawed configuration.

    The funny thing is that C: have about 8 times more "TOTAL IOs per seconds" than D: which is the database volume. I had understand if it was the opposite but this is just crazy.

    Can it be that we have so many short connects so db2's authentication rutin don't coupe with the speed. Or maybe db2 does coupe but it's windows that can't handle all user query's from db2.

    Anyone able to shed som light on how db2 interact with windows authentication when a user connects?

    Should every db2 connect generate disk IO on the system disk?

    The reason i know this is generated by user connects is that after work hours when there is low user activity the IOs to C: drops radicly. And even when all db backups run att fullspeed (So that D: is saturated) there is hardly any IOs to C: at all.

    Here is a link to som nagios graphs i have taken on one of the server.
    Represents: DISK IDLE TIME, DISK IO READS, DISK IO WRITES and DISK IO TOTALS. Over the period of 07.00 to 16.00.
    http://img528.imageshack.us/img528/9179/tmp6oo0.jpg

    Anyone able to figureout why this happens?
    And what loads do you have on your system disk when db2 operates?
    What system disk configuration do you have?

    I haven't seen any recommendations on system disk configuration from IBM to operate DB2, anyone know a good site or book or could share som knowledge?
    Last edited by Palle Kula; 09-17-08 at 04:57.

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    I'd say this has little to do with DB2, but rather with the Windows Active Directory performance. Every time you connect to the database the database manager redirects the authentication request to the operating system.
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Sep 2006
    Posts
    5
    Yeah i see your point, maybe i should rename the thread to Is Windows 2003 a bottleneck in DB2 performance

    Have done some more investigation with snapshots. I made a vbs script that's using ibmdadb2 ADODB object to connect to my central database and retrieving a db snapshot every 5 mins. The script measures the time it takes to connect to the database and how long it takes to retrieve 2 db snapshot's containing TOTAL_CONS but with a sleep(1000) command between them.

    So i have TOTAL_CONS and TOTAL_CONS one second later. Subtracting those in the right order gives me the number of connections per second. Then i load this data back to nagios and get a nice graph witch updates every 5 mins.

    When inspecting this data i was shocked to see that the average connections per second to the database was about 15-20 connects. That's not a whole lot.

    Can it relay be that windows 2003 local SAM only can handle about 15-20 authentications per second? This can't be right can it?

    I did a little test. As i said in the previous post i knew that the system disk load dictated the connection performance to the database.
    So I thought to my self, if i copy a file to C: then connections per sec should drop like a stone because lsass.exe will have to share some of its write IO's.
    But i didn't predict that it would come to a grinding halt. My monitor script actually timed out when the file copy was in progress. And it has a timeout value of 20 sec.

    I can't be the only one with this problem?
    How do those gigantic company's utilizing DB2 to there E-commerce with thousands upon thousands connections work? maybe they don't use windows.

    20 connections per second thats nothing in the DB world. Something is truly wrong.

    Anyone have any suggestions?

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Most companies use some sort of connection pooling on the application tier so there are not so many new DB2 connections to the database being requested.

    But you are also correct in that most do not use DB2 for Windows.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  5. #5
    Join Date
    Sep 2006
    Posts
    5

    Smile

    Actually i solved the problem.

    It turns out that using the db2 variable DB2_GRP_LOOKUP=LOCAL,TOKENLOCAL cuts db2 connect times from our avg of 75-300ms to 15-16ms
    And it totally cured our IO problems on the system disk. Going from a avg of 170 IO's to 0-5 IO's per second
    I find this rather strange because the DB2_GRP_LOOKUP=LOCAL is supposed to limit group enumeration to only local groups. So why does hindering a DB2 server to enumerate groups in AD result in less IO's to C:.
    I can understand that the connection times drops by not having to make an AD query , but IO's i have no clue.

    And why does DB2 enumerate groups in AD on a user that only exists on the local computer and is not a member of a AD group?

    Changing topic to solved for anyone that might have the same problem.

    EDIT:
    Hmm looks like the topic won't change on the forum board.
    Last edited by Palle Kula; 09-17-08 at 05:00.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •