Results 1 to 6 of 6
  1. #1
    Join Date
    Jul 2004
    Posts
    306

    Unanswered: Indexes - low card skewed dist

    9.1 on Win

    Hey guys,

    I've got a fairly awful query (that typically I have virtually no power to change) the worst part of the Access Plan shows a tablescan on a 410k row table which plugs into a join.

    The predicates are TYPEID = 1 and STATUS <> 4
    The distro/card for these are as follows

    STATUS
    ------------ -----------
    1 457
    2 89
    3 440
    4 229
    5 401530
    7 23
    8 7277

    TYPEID count
    ------------- -----------
    1 360998
    2 33609
    3 14065
    4 5
    5 1368


    The Card of their combination is:
    -----------
    360770

    I know low card indexes are not desireable but I'm wondering if the fact the data is so skewed can be used to advantage somehow (even though ~80% of the rows fulfil the criteria).

    I've tried a few different indexing options and runstats with dist and sampling etc. to no avail... I can't avoid the table scan...

    Any tips?

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    ... a 410k row table which plugs into a join.
    So, there must be column(s) of the table other than status or typeid in ON condition to join with other table(s).

    Try an index including the join column(s), typeid, and status in this sequence.

  3. #3
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Unless the data in the table is in physical sequence by TYPEID, STATUS (or at least TYPEID by itself, or STATUS by itself), then that means it is very likely that there will be one qualifying row in every single table page. Remember that DB2 does I/O at the page level, not the row level, so if DB2 determines that every page will have at least one qualifying row, it is usually faster for DB2 to do a table scan.

    To get the table in a particular physical sequence, you could define a clustering index that matches the sequence you want. DB2 "tries" to maintain the sequence when inserting if there is room on the correct page or a nearby page, but the sequence is only guaranteed after a reorg if you specify the clustering index in the reorg command (although if you do an inplace reorg, you don't have to specify the clustering index in the reorg command). But if you have a clustering index defined, it can put extra overhead on the system for inserts, and the physical order is usually not exactly maintained if you have a lot of inserts.

    Another option to avoid a table scan would be if you had one index with all the columns that are needed by the query, then DB2 could possibly do an complete index scan (sort of like a table scan, but of the index) not using the b-tree, and that might be slightly faster than a table scan. This will be indicated with index-only access in the explain.

    Table scans are not always bad, and can be the fastest access for a particular situation, which may be why DB2 has chosen it.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  4. #4
    Join Date
    Jul 2004
    Posts
    306
    @Tonkuma - Yea there's another ID field used for the join that has a much better cardinality but including that in the index doesn't help. I think Marcus is right about there being rows on all the pages... I was thinking about a clustering index ... need to figure out how often this table gets written to...
    The main reason that it's such a bad query is that the final predicate is a wildcard comparison agains about 20 VARCHAR fields concatenated together... awful stuff.

  5. #5
    Join Date
    Jul 2004
    Posts
    306
    Quote Originally Posted by Marcus_A View Post
    Table scans are not always bad, and can be the fastest access for a particular situation, which may be why DB2 has chosen it.
    For sure, I trust the optimiser
    This query is the main search people use and it takes several seconds to run each time... dropping that down would have a big psychological impact with the users... but if it can't be done...
    I'll try a cluster but I reckon it'll only shave a fraction off it... we'll see

  6. #6
    Join Date
    May 2009
    Posts
    508
    Provided Answers: 1
    Meehange, when you mentioned you tried Indexing some of the columns, what where they and what order were they in? Composite Indexes only match equality predicates until they get to a Range predicate. If you Indexed with TYPEID, STATUS, ID then DB2 would match on 1 column (TYPEID) and do an index scan on STATUS. It wouldn't match on ID (but it may still use it). Instead, an Index of TYPEID, ID, STATUS may work better.
    ====
    After looking at the first post a little closer, I see you are matching on TYPEID = 1 which is 360,998 out of 410,00 rows. Even with STATUS, you are still accessing 360,770 out of 410,000 rows. With those types of numbers, you will get a Tablespace scan pretty much all the time.

    The clustering index should help, but I think the index should be on ID (the joined column), TYPEID and STATUS.
    Last edited by Stealth_DBA; 09-30-11 at 01:05.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •