| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

09-29-11, 20:28
|
|
Registered User
|
|
Join Date: Jul 2004
Posts: 256
|
|
|
Indexes - low card skewed dist
|
|
9.1 on Win
Hey guys,
I've got a fairly awful query (that typically I have virtually no power to change) the worst part of the Access Plan shows a tablescan on a 410k row table which plugs into a join.
The predicates are TYPEID = 1 and STATUS <> 4
The distro/card for these are as follows
STATUS
------------ -----------
1 457
2 89
3 440
4 229
5 401530
7 23
8 7277
TYPEID count
------------- -----------
1 360998
2 33609
3 14065
4 5
5 1368
The Card of their combination is:
-----------
360770
I know low card indexes are not desireable but I'm wondering if the fact the data is so skewed can be used to advantage somehow (even though ~80% of the rows fulfil the criteria).
I've tried a few different indexing options and runstats with dist and sampling etc. to no avail... I can't avoid the table scan...
Any tips?
|
|

09-29-11, 21:06
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
Quote:
|
... a 410k row table which plugs into a join.
|
So, there must be column(s) of the table other than status or typeid in ON condition to join with other table(s).
Try an index including the join column(s), typeid, and status in this sequence.
|
|

09-29-11, 21:16
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,196
|
|
|
|
Unless the data in the table is in physical sequence by TYPEID, STATUS (or at least TYPEID by itself, or STATUS by itself), then that means it is very likely that there will be one qualifying row in every single table page. Remember that DB2 does I/O at the page level, not the row level, so if DB2 determines that every page will have at least one qualifying row, it is usually faster for DB2 to do a table scan.
To get the table in a particular physical sequence, you could define a clustering index that matches the sequence you want. DB2 "tries" to maintain the sequence when inserting if there is room on the correct page or a nearby page, but the sequence is only guaranteed after a reorg if you specify the clustering index in the reorg command (although if you do an inplace reorg, you don't have to specify the clustering index in the reorg command). But if you have a clustering index defined, it can put extra overhead on the system for inserts, and the physical order is usually not exactly maintained if you have a lot of inserts.
Another option to avoid a table scan would be if you had one index with all the columns that are needed by the query, then DB2 could possibly do an complete index scan (sort of like a table scan, but of the index) not using the b-tree, and that might be slightly faster than a table scan. This will be indicated with index-only access in the explain.
Table scans are not always bad, and can be the fastest access for a particular situation, which may be why DB2 has chosen it.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
|
|

09-29-11, 21:49
|
|
Registered User
|
|
Join Date: Jul 2004
Posts: 256
|
|
@Tonkuma - Yea there's another ID field used for the join that has a much better cardinality but including that in the index doesn't help. I think Marcus is right about there being rows on all the pages... I was thinking about a clustering index ... need to figure out how often this table gets written to...
The main reason that it's such a bad query is that the final predicate is a wildcard comparison agains about 20 VARCHAR fields concatenated together... awful stuff.
|
|

09-29-11, 21:52
|
|
Registered User
|
|
Join Date: Jul 2004
Posts: 256
|
|
Quote:
Originally Posted by Marcus_A
Table scans are not always bad, and can be the fastest access for a particular situation, which may be why DB2 has chosen it.
|
For sure, I trust the optimiser
This query is the main search people use and it takes several seconds to run each time... dropping that down would have a big psychological impact with the users... but if it can't be done...
I'll try a cluster but I reckon it'll only shave a fraction off it... we'll see
|
|

09-29-11, 23:58
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
Meehange, when you mentioned you tried Indexing some of the columns, what where they and what order were they in? Composite Indexes only match equality predicates until they get to a Range predicate. If you Indexed with TYPEID, STATUS, ID then DB2 would match on 1 column (TYPEID) and do an index scan on STATUS. It wouldn't match on ID (but it may still use it). Instead, an Index of TYPEID, ID, STATUS may work better.
====
After looking at the first post a little closer, I see you are matching on TYPEID = 1 which is 360,998 out of 410,00 rows. Even with STATUS, you are still accessing 360,770 out of 410,000 rows. With those types of numbers, you will get a Tablespace scan pretty much all the time.
The clustering index should help, but I think the index should be on ID (the joined column), TYPEID and STATUS.
|
Last edited by Stealth_DBA; 09-30-11 at 00:05.
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|