#1 - pwl256 (UK)

    Latch waits on a problem database

    Hi,
    I have a problem database that I am dealing with.
    So far I have captured some latch statistics by doing the following:

    create table odg_begin_latch_details as
    (select
    name,
    gets, misses
    from v$latch
    where gets > 0
    and misses > 0
    );

    ------------ Waiting 2 minutes

    create table odg_after_latch_details as
    (select
    name,
    gets, misses
    from v$latch
    where gets > 0
    and misses > 0
    );


    select T1.name
    , (T2.gets-T1.gets) "Delta Gets"
    , (T2.misses-T1.misses) "Delta Misses"
    from odg_begin_latch_details T1
    , odg_after_latch_details T2
    where T1.name = T2.name
    order by 2 desc
    ;
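
    The miss percentage shown in the results below was worked out separately; a sketch of folding it into the same query (the extra predicate guards against a zero delta in the divisor):

    select T1.name
    , (T2.gets-T1.gets) "Delta Gets"
    , (T2.misses-T1.misses) "Delta Misses"
    , round(100*(T2.misses-T1.misses)/(T2.gets-T1.gets), 2) "Miss %"
    from odg_begin_latch_details T1
    , odg_after_latch_details T2
    where T1.name = T2.name
    and T2.gets > T1.gets
    order by 2 desc
    ;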

    I get the following results.
    Am I right in assuming from these (since they are supported by the
    StatsPack report we take each Monday morning at peak time) that the
    following 2 points are to be noted:

    The library cache is probably too large
    - misses on the library cache latch imply that it is
    taking too long to look for SQL

    The buffer cache could be larger
    - cache buffers lru chain misses imply
    misses when entering items into the
    buffer cache.

    NAME                      Delta Gets   Delta Misses   Miss %
    -----------------------   ----------   ------------   ------
    library cache                1325191           9840    0.74%
    cache buffers lru chain      1167962           8586    0.74%
    row cache objects            1072756           4216    0.39%
    redo writing                    6878             23    0.33%
    shared pool                   150162            220    0.15%
    multiblock read objects       410510            406    0.10%
    messages                        7258              5    0.07%
    session idle bit              427124             96    0.02%
    checkpoint queue latch          6446              1    0.02%
    redo allocation                16834              2    0.01%
    cache buffers chains         6144406            603    0.01%
    enqueues                        8035              0    0.00%

    I believe the system has a high number of table scans and a lot of literal SQL,
    running on Oracle 8i with cursor_sharing=FORCE.

    Can anyone comment or help me to move this forward to some kind
    of 'what do I look for now?'

#2 - anacedent (Del Mar, CA)
    It appears it is past time to RTFM, where the Fine Manual is "Oracle Performance Tuning 101" (ISBN 0-07-213145-4). Also learn and use SQL_TRACE and TKPROF to actually identify poorly performing SQL and quantify the improvement after the bad SQL is rewritten to be better-tuned SQL. Better performance is achieved one SQL statement at a time; no magic bullet exists which suddenly makes the whole DB and application run faster.
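
    For example, a rough sketch of that workflow (the trace file name is hypothetical; the real one lands in user_dump_dest):

    -- in the session you want to trace
    alter session set timed_statistics = true;
    alter session set sql_trace = true;
    -- run the suspect workload, then:
    alter session set sql_trace = false;

    -- then format the trace from the OS prompt, worst offenders first:
    -- tkprof ora_12345.trc ora_12345.txt sys=no sort=exeela,fchela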

#3 - pwl256 (UK)

    Originally posted by anacedent
    It appears it is past time to RTFM ... [snipped - see post #2]
    Actually, it's by working from that very book, and from a discussion with the author, that I have got to this point.
    My problem is that I know the SQL is c**p, but it's a supplied package, so that's it. In addition, I also know that I am playing in the 20% or less where I can make a difference - BUT, if possible, I need to understand what my results are telling me, because real soon now the supplier is going to start pointing fingers and I had better be covered.

#4 - gbrabham (Greenville, SC)
    I did not write this, but I have done it and seen an improvement:

    Cache Buffers LRU Chain - The "cache buffers lru chain" latch is used to scan the LRU (least recently used) chain containing all of the blocks in the buffer cache. A small buffer cache, excessive buffer cache throughput, many cache-based sorts, and DBWR not keeping up with the workload are all culprits that can cause this issue. Try to fix the queries that are causing the excessive logical reads. You can increase the initialization parameter DB_BLOCK_LRU_LATCHES to provide multiple LRU latches, which will reduce contention. Generally, non-SMP (symmetric multi-processor) machines only need a single LRU latch. Oracle automatically sets this to 1/2 the number of CPUs on SMP machines. You must have at least one LRU latch for each database writer; make sure that you increase this if you add database writers.
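
    To see where you stand first (a sketch; on 8i this parameter is static, so changing it means editing init.ora and restarting the instance):

    select name, value
    from v$parameter
    where name = 'db_block_lru_latches';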

    As for the library cache ... I might look at:
    ... increasing the SGA
    ... enlarging the KEEP pool
    ... finding the objects that have been reloaded and possibly pinning
    some in memory (a sketch follows)
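
    For that last point, a rough sketch (owner and package names are hypothetical; dbms_shared_pool may first need installing via dbmspool.sql):

    -- objects loaded more than once; reloads mean extra library cache work
    select owner, name, type, loads, kept
    from v$db_object_cache
    where loads > 1
    and type in ('PACKAGE', 'PACKAGE BODY', 'PROCEDURE', 'FUNCTION', 'TRIGGER')
    order by loads desc;

    -- pin a frequently reloaded package in the shared pool
    exec dbms_shared_pool.keep('APPOWNER.SOME_HOT_PACKAGE');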

    HTH
    Gregg

#5 - pwl256 (UK)
    Originally posted by gbrabham
    I did not write this but have done this and seen an improvement ... [snipped - see post #4]
    Thanks Gregg, that sort of fits with the other stuff I have found, esp. DB_BLOCK_LRU_LATCHES, which on an 8-CPU system has been set to 1???????
    Thanks again
    pwl

#6 (Rotterdam, Netherlands)
    I think DB_BLOCK_LRU_LATCHES should be set to at least 4 in your case and then gradually increased, with a check on performance at each step.

    I have gained performance with this process.

#7 - pwl256 (UK)

    Some further info

    On finally trapping some SQL, and talking to the supplier's staff, we have found that there are 2 tables of about 50 rows, held in 2 blocks. In order to generate a key for other tables, a counter is being updated within a row of these tables.

    - somehow the idea that a sequence might be useful has bypassed the designers - but no!
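
    For contrast, the sequence-based version would be something like this (names are hypothetical):

    -- what the app does today (paraphrased):
    --   update key_counters set counter = counter + 1 where key_name = 'ORDERS';
    -- what a sequence would look like instead, with no row-level serialization:
    create sequence order_key_seq cache 100;
    select order_key_seq.nextval from dual;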

    The system was purchased because it was not RDBMS-specific, so it could be easily switched from Oracle to DB2 UDB or whatever.

    I think we could have a lot of work here, as this is currently running with 400 users, and within 2 weeks approximately 3400 extra users are going to be added.
    And that does not address the lack of stats (we have been told no analyze is to be run); or the table scans...
    Ho hum....

#8 - The_Duck
    Use STATSPACK and get some info from that.

    Who's to stop you from executing dbms_stats.gather_schema_stats????
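
    For example (schema name and sample size are placeholders):

    exec dbms_stats.gather_schema_stats(ownname => 'APPOWNER', estimate_percent => 10);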
    - The_Duck
    you can lead someone to something but they will never learn anything ...

#9 - gbrabham (Greenville, SC)
    I FEEL your pain!!!! Did they give any reason whatsoever for not wanting statistics on the objects???

    I agree with DUCK ... run it anyway !!!! Remember .... It is always easier to beg forgiveness than to ask permission ....

    Gregg

#10 - The_Duck
    "Whatever you do, don't make the database run FASTER!"

    I say:
    Take charge! You are the DBA!
    They wouldn't know statistics if they hit them in the head, so just run them as a late-night job.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

#11 - billm
    Hi,

    but be prepared, and make sure you are in a position to drop the statistics afterwards.
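
    A sketch of keeping that escape route open (schema and stat-table names are placeholders):

    -- save whatever statistics exist before gathering new ones
    exec dbms_stats.create_stat_table('APPOWNER', 'STATS_BACKUP');
    exec dbms_stats.export_schema_stats(ownname => 'APPOWNER', stattab => 'STATS_BACKUP');

    -- if things go wrong, drop the new statistics ...
    exec dbms_stats.delete_schema_stats(ownname => 'APPOWNER');
    -- ... or restore the saved set
    exec dbms_stats.import_schema_stats(ownname => 'APPOWNER', stattab => 'STATS_BACKUP');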

    If the App was designed on 7, it may be relying on the RBO (not the CBO), and gathering statistics could cause you no end of problems. Anyone analysed SYS recently on 8? Google or search here for experiences.

    Are you sure they haven't already created generic plans, or that their systems don't run jobs to produce statistics with extra column (viz. histogram) analysis? If you ran a straight analyse on some systems of mine, you'd kill them. That's why those particular systems of mine provide and run their own analysis routines.
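
    By "extra column analysis" I mean something like this (a sketch; table and column names are hypothetical):

    -- build a 75-bucket histogram on a skewed column; a blanket
    -- schema analyse would not reproduce this
    analyze table app_orders compute statistics for columns status size 75;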

    Someone might one day ask why there are FTS (full table scans) on numerous tables - and analyse the schema anyway. The FTS will continue (because they're on tiny tables), but the execution plans on the large tables could become awful - to the point that the application would be unusable.

    I would suggest you

    a: Determine (ask) why they insist that you don't gather statistics - there might be a simple answer.

    b: Determine where and what exactly is causing the problem and report it to the vendor. Once you can show them exactly where your big-bucks hardware's performance is being wasted, *then* they have no excuse not to fix it.

    c: If someone decided to analyse my schemas after they'd been told specifically not to *and why*, and the application became unusable, they'd lose all possibility of support from me.

    Hth
    Bill
    Please don't email me directly with questions. I've probably just got home from the pub and cannot guarantee the sanity of my answers. In fact, I can't believe I actually made it home.

#12 - Al (Milan, Italy)

    Re: Some further info

    Originally posted by pwl256
    ... in order to generate a key for other tables a counter is being updated within a row of these tables ... The system was purchased because it was not RDBMS-specific ... [snipped - see post #7]

    If the problem is this (serialization on the counter rows), you're not going to solve it without rewriting the application to use sequences. Whatever you do, the first process to update the row will lock it, and if someone else tries to update the same row, it will wait until the first one commits - and in the meanwhile a queue of processes will form if a lot of processes need a new value from the same counter. There's nothing you can do to prevent it. Nothing.

    But perhaps it's just a problem of serialization on the 2 blocks containing the 50 rows - what are INITRANS and PCTFREE set to on these tables? Me, I would set INITRANS to 50 and rebuild the table (otherwise you will see no effect on existing rows) to allow an ITL slot per counter.
    I would also set INITRANS to 50 on all indexes of these tables.

    To rebuild the table: ALTER TABLE xxx MOVE; also remember to rebuild all the indexes on the table afterwards, since a move leaves them unusable.
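
    Putting it together, a sketch (table and index names are hypothetical):

    -- raise INITRANS and rebuild in one statement, so existing blocks
    -- pick up the new setting
    alter table app_counters move initrans 50;
    -- the move invalidates the indexes, so rebuild them to match
    alter index app_counters_pk rebuild initrans 50;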

    We had the very same problem with a $1,000,000 product (no kidding) and the solution was ... to ditch the product. You may imagine the political struggle with the managers who chose that product (from a Very Big Name), but in the end it was the only thing to do.
    Let me say that it's quite typical of so-called "database independent" products. They are normally so database independent that they can't work well with any of them!

    But - are you sure that this is really the problem? I would follow the_duck's suggestion - run StatsPack (on a 15-minute interval) and look at the "most waited events" section. You may be surprised by the results.
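
    A minimal sketch (assuming StatsPack is already installed; on 8.1.7 the install and report scripts are spcreate.sql and spreport.sql):

    exec statspack.snap;
    -- wait 15 minutes while the workload runs, then:
    exec statspack.snap;
    -- generate the report and read the top wait events section:
    -- @?/rdbms/admin/spreport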

    HTH
    Al

#13 - Bhavin (Minneapolis, MN)

    Re: Latch waits on a problem database

    Originally posted by pwl256
    [latch delta results snipped - see the table in post #1]

    I think that anything under 1% is fine. But in your case, you should not take decisions based on this benchmark, as it was taken only once and over only 2 minutes. Take more samples and then decide.
    Bhavin

    MS Computer Science
    OCP DBA 9i/8i
