Results 1 to 5 of 5
  1. #1
    Join Date
    Feb 2005
    Posts
    3

    Unanswered: CBO estimates Bitmap Index retrieval at 100 times Btree cost.

    The actual problem we have is that the bitmap indexes we have defined are not being used
    when they should be. I have investigated further and found the reason to be that the CBO
    is saying that the 'index rowid table access' for a bitmap is estimated at (typically) 100
    times the cost of retrieval for a btree. This is clearly rediculous and running the query
    proves this as the bitmap access is roughly the same actual cost as the btree.

    I want to point out various possibilities that I have explored.

    1. Statistics have been collected. The example uses stats at table, index and column level
    to prove that they do not make a difference.
    2. The data is skewed which affects performance. My example deliberately chooses a uniform
    distribution of the search key.
    3. The clustering of the data affects performance making index retrieval ineffective. My
    example deliberately has perfect data clustering (all rows with the same search key are
    next to eachother).
    4. Multiblock read count means superior io for tablescans, so the CBO is deliberately
    avoiding use of indexes. The question is not tablescan v bitmap index, but btree v bitmap
    index. I have given our system statistics on multiblock reads only so the example can be
    reproduced.

    This code was run on VMS ORACLE 9.2.0.5 and should be reproducable on any oracle system
    that supports bitmap indexes.


    Part 1 Example Table and Index Creation
    =======================================

    A table is created with 500,000 rows. 2 identical columns are defined having values
    between 1 and 500. Values that are the same are clustered together in the file, making the
    column a good candidate for an index..


    SQL> set autotrace off
    SQL> CREATE TABLE t1
    2 NOLOGGING
    3 AS
    4 SELECT
    5 ROWNUM ID,
    6 TRUNC(ROWNUM/1000)+1 btree_col,
    7 TRUNC(ROWNUM/1000)+1 bitmap_col,
    8 RPAD('x',80) padding
    9 FROM all_objects ao1, all_objects ao2
    10 WHERE ROWNUM <= 500000;

    Table created.

    The table is analysed to ensure the cbo (cost based optimizer) comes into play and indexes
    are built on the identical columns, one btree the other bitmap.


    SQL> analyze table t1 compute statistics;

    Table analyzed.

    SQL> create index t1_btree on t1(btree_col);

    Index created.

    SQL> create bitmap index t1_bitmap on t1(bitmap_col);

    Index created.

    SQL> spool off

    Part 2 Index retrieval costed with Table stats only and Actual Costs of Retrieval
    ================================================== ===============================


    We now use the CBO to cost 2 very simple queries to retrieve values. The hints are
    used to ensure that the CBO is used, and the index is chosen, in case the cost of a
    tablescan read is very cheap in relation to an index read. On our system (see later) the
    cost of a tablescan block read is approximately 1/4 of an index driven block read.

    Through the Btree the estimated cost is 4.

    SQL> set autotrace traceonly explain
    SQL> SELECT /*+ choose index(t1_btree,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE btree_col = 1;

    Execution Plan
    ----------------------------------------------------------
    0 SELECT STATEMENT Optimizer=HINT: CHOOSE (Cost=4 Card=1 Bytes
    =7)

    1 0 SORT (AGGREGATE)
    2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=4 Card=998 B
    ytes=6986)

    3 2 INDEX (RANGE SCAN) OF 'T1_BTREE' (NON-UNIQUE) (Cost=2
    Card=998)

    Through the Bitmap index the estimated cost is a staggering 320 80 TIMES THE BTREE ESTIMATE!


    SQL> SELECT /*+ choose index(t1_bitmap,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE bitmap_col = 1;

    Execution Plan
    ----------------------------------------------------------
    0 SELECT STATEMENT Optimizer=HINT: CHOOSE (Cost=320 Card=1 Byt
    es=7)

    1 0 SORT (AGGREGATE)
    2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=320 Card=998
    Bytes=6986)

    3 2 BITMAP CONVERSION (TO ROWIDS)
    4 3 BITMAP INDEX (SINGLE VALUE) OF 'T1_BITMAP'

    Lets now look at the actual cost of these queries. Both run very quickly and have been
    run twice to show the effect of caching. The crucial figure that the CBO has been trying
    to estimate is the number of consistent gets.

    For the btree retrieval actual cost is 18 consistent gets.


    SQL> set autotrace on stat
    SQL> SELECT /*+ choose index(t1_btree,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE btree_col = 1;

    MAX(ID)
    ----------
    999


    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    18 consistent gets
    3 physical reads
    0 redo size
    307 bytes sent via SQL*Net to client
    426 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL> SELECT /*+ choose index(t1_btree,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE btree_col = 1;

    MAX(ID)
    ----------
    999


    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    18 consistent gets
    0 physical reads
    0 redo size
    307 bytes sent via SQL*Net to client
    426 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    For the bitmap retrieval the actual cost is actually 16 consistent gets ONE TWENTIETH
    OF THE CBO ESTIMATE. (and actually less than the btree retrieval).

    SQL> SELECT /*+ choose index(t1_bitmap,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE bitmap_col = 1;

    MAX(ID)
    ----------
    999


    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    16 consistent gets
    1 physical reads
    0 redo size
    307 bytes sent via SQL*Net to client
    426 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL> SELECT /*+ choose index(t1_bitmap,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE bitmap_col = 1;

    MAX(ID)
    ----------
    999


    Statistics
    ----------------------------------------------------------
    0 recursive calls
    0 db block gets
    16 consistent gets
    0 physical reads
    0 redo size
    307 bytes sent via SQL*Net to client
    426 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    1 rows processed

    SQL> spool off
    Part 3 - The effect of Index Statistics on Cost Estimates
    ================================================== =======

    We Analyze both indexes


    SQL> analyze index t1_btree compute statistics
    2 ;

    Index analyzed.

    SQL> analyze index t1_bitmap compute statistics;

    Index analyzed.

    The bitmap index analysis has some statistics that are clearly unusual. It is saying that
    only one data block has to be read in order to retrieve all the values for a particular
    search key. This is clearly rediculous when you compare it with the sensible btree value.

    These stats would tend to make bitmap retrieval more favourable. In fact the CBO completely
    ignores stats generated on the bitmap index. I tried setting the values to the sensible ones
    for the btree using DBMS_STATS.set_index_stats, but that made no difference.



    SQL> SELECT ui.index_name, ui.avg_data_blocks_per_key, ui.num_rows
    2 FROM user_indexes ui
    3 WHERE ui.index_name LIKE 'T1%';

    INDEX_NAME AVG_DATA_BLOCKS_PER_KEY NUM_ROWS
    ------------------------------ ----------------------- ----------
    T1_BITMAP 1 501
    T1_BTREE 13 500000


    Here we can see the btree cost estimate has become closer to the true estimate thanks to
    the index stats.


    SQL> set autotrace traceonly explain
    SQL> /* Formatted on 2005/02/19 17:13 (Formatter Plus v4.8.5) */
    SQL> SELECT /*+ choose index(t1_btree,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE btree_col = 1;

    Execution Plan
    ----------------------------------------------------------
    0 SELECT STATEMENT Optimizer=HINT: CHOOSE (Cost=20 Card=1 Byte
    s=7)

    1 0 SORT (AGGREGATE)
    2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=20 Card=998
    Bytes=6986)

    3 2 INDEX (RANGE SCAN) OF 'T1_BTREE' (NON-UNIQUE) (Cost=6
    Card=998)

    The estimated cost of Bitmap is UNAFFECTED BY INDEX STATS.


    SQL> SELECT /*+ choose index(t1_bitmap,t1) */
    2 MAX (ID)
    3 FROM t1
    4 WHERE bitmap_col = 1;

    Execution Plan
    ----------------------------------------------------------
    0 SELECT STATEMENT Optimizer=HINT: CHOOSE (Cost=320 Card=1 Byt
    es=7)

    1 0 SORT (AGGREGATE)
    2 1 TABLE ACCESS (BY INDEX ROWID) OF 'T1' (Cost=320 Card=998
    Bytes=6986)

    3 2 BITMAP CONVERSION (TO ROWIDS)
    4 3 BITMAP INDEX (SINGLE VALUE) OF 'T1_BITMAP'

    More stuff in attached file like parameter settings.

  2. #2
    Join Date
    Dec 2004
    Location
    vienna, at
    Posts
    27
    hi

    when you look at your create statement:

    Code:
    SQL> CREATE TABLE t1
    2 NOLOGGING
    3 AS
    4 SELECT
    5 ROWNUM ID,
    6 TRUNC(ROWNUM/1000)+1 btree_col,
    7 TRUNC(ROWNUM/1000)+1 bitmap_col,
    8 RPAD('x',80) padding
    9 FROM all_objects ao1, all_objects ao2
    10 WHERE ROWNUM <= 500000;
    the btree_col contains all values from 1 - 501!!
    so there are 500 bitmaps for 1 column
    you only should use a bitmap index where you have a low number
    of possible values!
    phh

  3. #3
    Join Date
    Feb 2005
    Posts
    3
    Thanks phh for your reply. Here's details of points raised so far. Gonna go down 10053 route. Will keep you posted...


    Replies to Points Raised so far
    ===============================

    1. Cost estimate is not measuring consistent gets

    Cost estimate is approximately number of logical reads (consistent gets in this case). There is
    a CPU element, but in this io based example it is negligible. The optimizer does not know the

    effect of caching so in effect assumes the worst case that each logical read results in a physical

    read.

    2. The two queries are not identical.

    The two queries are identical in all respects bar the point in question: one is using a btree and
    the other a bitmap.

    3. The hints are not the best I could use.

    I have made a syntax mistake in that the hints should be INDEX(t1 t1_btree) etc (got the order

    reversed). Sorry about that. However the CBO shows it has done what I intended it to in the
    output explaining its plan.

    The index_combine hint has been suggested as a better hint for bitmap indexes and certainly it is
    recommended by Oracle. This is much appreciated and I will use it in future. However I used the
    hint and got the same CBO cost on the bitmap retrieval.

    4. I should be using DBMS_STATS instead of ANALYZE.

    I have tried this to see if it makes any difference. DBMS_STATS calculates exactly the same stats
    as ANALYZE and there is no difference in the results. I used ANALYZE because more people are
    familiar with it. DBMS_STATS is recommended by Oracle because it is more efficient, as it performs
    the same tasks as ANALYZE but in parallel.

    5. The search key in my bitmap index has 501 distinct values and is not of low enough cardinality
    to be a bitmap index.

    The bitmap index works fine with 501 distinct values (it outperforms the btree in the query, even
    though the CBO estimates it at 80 times slower).

    The point is what is low cardinality? 2 values? 100 values? I would refer anyone interested in this

    point to Jonathan Lewis' excellent article on this matter.
    http://www.dbazine.com/jlewis3.shtml

    6. I should investigate using 10053 event.

    Good point! I shall be making this my next investigation.

  4. #4
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    This might be useful from metalink

    The bitmap index costing algorithms are changed recently. Use event 10170 to enable old costing method (which will be less) which assumes all the rows are available in the required number of blocks.

    New model assumes 80% rows are in the calculated blocks and rest of the 20% rows are split across all the blocks (which inflates the cost). Bitmap index may be used if you use the event 10170.

    Alan

  5. #5
    Join Date
    Feb 2005
    Posts
    3
    Thank you Alan for your reply the first which has made a change. Setting
    10170 on, approximately halved the cost of the CBO estimate to 166. Unfortunately this still
    means bitmap estimated cost is 40 to 50 times what it should be.

    It looks like what Oracle are trying to do is incorporate an idea of clustering
    factor into their bitmap estimate formula.

    I honestly do not know why after all this time Oracle are having such a problem with their CBO
    since the formulae are relatively simple and mathematically immutable.

    For both bitmap and btree

    Cost = Index Read Cost + Data Read Cost

    For the bitmap, used appropriately, the index read cost will be less than the btree.
    Now look at the data read cost.

    If we only have table statistics:
    ---------------------------------

    For both bitmap and btree, assuming we have ordered rowids within each index value, the average

    predicted data read cost will be between;

    lower limit = no of blocks in table / no of distinct values (assumes perfect clustering)

    and

    upper limit = no of blocks in table. (assumes worst case clustering)

    In the example I have provided the number of blocks is 6829. So an average indexed retrieval
    should be estimated at between 6829/501 = 13.6 and 6829 according to how the data is clustered.

    Note the estimates actually are 320 for bitmap and 4 for the btree. The bitmap is absurdly high
    and the btree is MATHEMATICALLY impossibly low. (There may be skewed data but we are still trying
    to find an average result, so the skewing will balance itself out on estimates made across all
    values).

    If we have index statistics:
    -----------------------------

    The index statistics actually count the average number of blocks per value by going through the
    index and looking at the rowids the values point to.

    Average Predicted Data Read cost = index stat of average number of blocks per value.(we now know
    the clustering!)

    This value will be somewhere between the lower and upper limit above.

    In the example index stats are 13 (reflects the perfect clustering I deliberately chose). But they
    are complete nonsense for the bitmap index (this needs to be fixed Oracle!).

    Note the new estimate for the btree data retrieval is 14 and the overall estimate 20 is very close
    to the actual cost of 18. The CBO simply ignores index stats for bitmaps.

    COME ON ORACLE GET YOUR ACT TOGETHER. LARRY SORT IT OUT! ITS NOT ROCKET SCIENCE.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •