Results 1 to 3 of 3
  1. #1
    Join Date
    Feb 2004
    Location
    Lisbon, Portugal
    Posts
    5

    Unanswered: Cache replacement policy considered harmful (for union statements)

    Folks, it looks like DB2 (UDB 8.2.2/Linux) cache replacement policy is not "good" when multiple performing full table scans are involved.

    Here's what I'm trying to do: I'm transforming columns into rows of a source table loanevt(loanno char(7),evtyp char(6),captl decimal(9,2),tax decimal(5,2),expns decimal(5,2),bonus decimal(5,2)) into a target table named events using the following query:

    insert into events (loanno, evtyp, amtyp, amt)
    select loanno, evtyp, 'CAPTL' as AMTYP, captl
    from loanevt
    where captl > 0
    union all
    select loanno, evtyp, 'TAX' as AMTYP, tax
    from loanevt
    where tax > 0
    union all
    select loanno, evtyp, 'EXPNS' as AMTYP, expns
    from loanevt
    where expns > 0
    union all
    select loanno, evtyp, 'BONUS' as AMTYP, bonus
    from loanevt
    where bonus > 0

    The issue here is that this statement scans the loanevt table multiple times. However, I was expecting that as I increase the size of the buffer pool I would observe gradually lesser Phys. Reads. But this does not happen. Instead, the number of physical reads only drops (abruptly) when the input table can be totally fitted in the buffer pool.

    I'm testing this with 5M records, so loanevt ends up with 26640 blocks (8K) read through TBSCAN.

    With a buffer pool of 25600 blocks, I get this:
    Buffer pool data logical reads = 110490
    Buffer pool data physical reads = 104795
    ...
    Total execution time (sec.ms) = 246.635994

    With a buffer pool of 27000 blocks, I get this:
    Buffer pool data logical reads = 110490
    Buffer pool data physical reads = 26640
    ...
    Total execution time (sec.ms) = 80.343795

    It looks to me that DB2 is reading the scanned pages and marking them as MRU, this appears ok, I think this is how it should work. When it starts scanning the relation again to build up the result of the union, it will need the first block. Now, even if the buffer is one block smaller than the input relation, when the last block of loanevt is read-in, by age, block #1 is select as a victim. When block #1 of loanevt is read-in, then block #2 is the victim, when block #2 is read-in block #3 is the victim and on it goes.

    I'm still making sense?

    Anyway I could change this? I cannot afford a 1GB readpool to treat a real-world version of loanevt.
    Any pointers would be greatly apreciated.

    Thanks.

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    It looks to me like you are conducting some sort of test of the bufferpool LRU algorithm and not really trying to address a real-world problem with a SQL statement (maybe the "PhD" is a hint). If the query was a real-world problem, you could eliminate the UNION ALL and put all the predicates in the same query and only scan the table one time. UNION ALL is valuable when each individual query can be optimized to a different index, but since each of your UNION ALL queries do table scans, then it seems like a poor way to do it.

    The way you described the LRU algorithm, while maybe not optimal in the case of your specific query, it does seem to be working as designed. In most systems there are multiple users at the same time and referencing different tables, so your scenario seems a bit artificial to me.

    In general, bufferpools are not optimized for queries which do complete table scans on large tables. That is why large tables should usually be put in a separate bufferpool, so that the indexes and small tables can stay resident in their own bufferpool without getting flushed out by a table scan on a large table.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Feb 2004
    Location
    Lisbon, Portugal
    Posts
    5
    Marcus, the query I have presented is a simplifyed version of several rw probs. I have ran into. I don't see how I could eliminate the union all as you sugest. Note that, if I pull all the predicates to the same query, then each input record will result in atmost one output record. Herein, our query needs to produce atmost 4 output records for each input record, hence, the union all.

    As I see it, more users accessing the different tables would worsen the problem even more if this load competes for the same cache as the union all. But, true, the LRU algo appears to be working as designed and though I would find a most exciting task messing arround with the LRU algo, I'm afraid I'm not that lucky yet. You can check out topic on my home page. -- BTW, perhaps I should consider ajusting my signature :-).

    Indeed, large table scans hinder the effectiveness of buffer pool caches. If the tables unioned are different little can be done. Howewer I would expect the plan generator to do better for a simple case like T union T.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •