Results 1 to 8 of 8
  1. #1
    Join Date
    Jan 2004
    Posts
    71

    Red face Unanswered: head column uniqueness on composite index?

    Hello all. I read an interesting article on an IBM website about evaluating the uniqueness of indexed columns. One part that was not clear to me was how a composite indexes columns should be ordered on creation and how that effects the uniqueness measurement. For example, I have a table called tblRecords that has 93416076 rows.

    Amongst other indexexs, I have one index in particular thats got me questioning its formation based on IBMs instructions on how to measure uniqueness. The index looks like this:

    create index "informix".cl_tapekey_v2 on "informix".tblRecords
    (subclient_key,claim_no,claim_line_key) using btree ;

    Now in order to measure an indexes uniqueness, I use the following SQL:
    select tabname, idxname, nrows, nunique
    from systables t, sysindexes i
    where t.tabid = i.tabid
    and t.tabid >99
    and nrows >0
    and nunique >0
    and tabname = 'tblRecords'
    and idxname = 'tblRecords_pk_v2'

    And my results look like this:
    tabname tblRecords
    idxname tblRecords_pk_v2
    nrows 92177086
    nunique 90

    And again, per the IBM aritcle to get the percentage of uniqueness I use <percentage of uniqueness> = ((nunique/nrows)*100) which yields a pretty low percentage, something like 9.7nnnnnnnnnnn!

    I think this is because the number of unique values in my "head column" (first column in index creation statement) of the composite index is a lowly 90. But, I am not sure I am interpretting this correctly or if the "head column" is actually the most important part of the composite index? If the head column is the heavy hitter, should the this always be a column with the most unique data?

  2. #2
    Join Date
    Jan 2004
    Posts
    71
    thanks in advance!
    -wc

  3. #3
    Join Date
    Mar 2004
    Location
    Netherlands
    Posts
    183
    Hi,

    What they are trybing to tell in this story, is that you should avoid highly duplicate indexes. And with thgis query you can find those indexes.

    First off all why avoid highly duplicate indexes, for this there aretwo reasons:
    1. If you still find, let's say, 30% of the rows with an index scan. Then the
    filtering is pretty low and still a lot rows have to be scanned.
    2. IDS is optimized to work with small index entries. Deleting a complete entry
    (= 1 key value) can be done by the b-tree cleaner/scanner in the
    background. But deleting a row from a list of rows with this key, must be
    done on the foreground. All rowids after the deleted row must be shifted
    one position. This is really bad.

    Especially to prevent the second problem IBM recommend to avoid highly duplicate indexes. You can do this by adding an extra field to the index (at the end) which makes the key nore unique. This can be done for exapmple by adding the primary key. This looks ridiculous at first sight, but it achieves two goals:
    1. (More) unique keys. So easier/faster deleting of rows (and in the
    background)
    2. Same search result and performance


    Remember the nunique is not for the head column but for the whole index.

    Hope this helps a bit,

    Rob Prop

  4. #4
    Join Date
    Jan 2004
    Posts
    71

    ..

    "Remember the nunique is not for the head column but for the whole index."

    I did not know that, thanks! So by looking at the figures I had in my post what do you think about the index I was analyzing? I think I have become totally lost in it and other opinions might help.

  5. #5
    Join Date
    Mar 2004
    Location
    Netherlands
    Posts
    183
    Hi,

    I hope I don't offend you but the index looks terrible.

    Some explanation:
    - There are over 92 million rows
    - There are only 90 unique values

    Which means over a million values for every key. Looking at the fields it looks prettty impossible. Do you agree?

    Can you run an update statistics low for this table and re-select the info, because I wonder if the given data is correct.

    If it is correct, i think you should add extra fields to the index to make it more unique.

    Hope this helps,


    Rob Prop

  6. #6
    Join Date
    Jan 2004
    Posts
    71

    hooray...I'm not crazy!

    Your not offending me, thats exactly what I thought when I looked at it but the number seemed so OUTRAGEOUS that I automatically assumed I was misinterpreting!

    This is an index I inherited from previous "administrations" and so I dont feel so bad that it sucks. I will clean it up and be a heroin!

    Thanks for the input.
    -wc

  7. #7
    Join Date
    Nov 2003
    Location
    Mumbai, India
    Posts
    92
    Hi RobP and WiccaChic,

    I would differ and disagree on RobP's statement:
    "Remember the nunique is not for the head column but for the whole index."

    I would like to point-out that nunique values are separate for each indices. That means if a table has 3 indices it would have 3 different nunique values. But the nunique value is referred and holds valid for the first column of the index *only* and not to all the columns that makeup a composite index.

    Update statistics statement updates nunique column value and optimizer uses this value for internal purpose of getting the best out of a table & run a query as efficiently as possible.

    Regards,
    Shriyan

  8. #8
    Join Date
    Mar 2004
    Location
    Netherlands
    Posts
    183

    Cool

    Hi there,

    Gladly Shriyan corrected me, the nunique indeed only holds the number of unique values that heads the index (1st column). I'm terribly sorry for this mistake.

    Must make sure I'm awake next time I'll try to answer some questions,

    Rob Prop

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •