If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Clustering indexes

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 09-22-05, 17:14
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Clustering indexes

What would be a good choice for a clustering index in the following setup:

Code:
Table OFFICE

Column		Comment
==============================================
OFFICE_ID	PK, populated by GENERATE_UNIQUE()
FIRM_ID		FK refferring to the FIRM table
BIZ_STATE	OPEN, CLOSED, etc.
...		Some other columns

Table ADDRESS 

Column		Comment
==============================================
ADDRESS_ID	PK, populated by GENERATE_UNIQUE()
OFFICE_ID	FK referring to OFFICE
BIZ_STATE	ACTIVE, HISTORICAL, etc.
CITY
STREET
...		Some other columns
There can few entires in ADDRESS for each OFFICE_ID (e.g. if an office moves to another address) but only one will be ACTIVE at a time.

Both tables have less than 100K rows and are growing at a rate of 10-15% a year.

The following indexes are defined on ADDRESS by my predecessors:

ADDRESS_ID - primary key (clustering index)
OFFICE_ID, STATE
CITY, STREET - for searching

Typical queries, in the order of decreasing frequency:
- show all information for an office, including its current address
- find an office by its city/street/...
- find all OPEN (CLOSED, etc) offices for a firm in a city

Declaring the primary key in ADDRESS as a clustering index doesn't make any sense to me. Since it's an artificial value there are no queries that would fetch multiple addresses by the primary key. I don't think that table is EVER accessed by PK. The only thing that the clustering index does is that it clumps the most recently updated addresses, regardless of their OFFICE_IDs, at the end of the table; however, we never search for "addresses updated within a specific date range" so this isn't of any use either.

What would have been the reason for declaring the primary key as a clustering index in this case? May be I'm just not seeing it?

What do you think would be a better clustering key? The one on CITY, STREET? May be a new one on STATE only? Given the table sizes, is it going to make any difference at all?

Thanks

Nick

PS. It's DB2 v8.1.5 on AIX
Reply With Quote
  #2 (permalink)  
Old 09-22-05, 22:27
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
1. - show all information for an office, including its current address

If the OFFICE_ID is supplied for the above query, then I assume it should be unique. I would probably cluster OFFICE by FIRM, assuming that there are multiple offices for a firm.

2. - find an office by its city/street/...
- find all OPEN (CLOSED, etc) offices for a firm in a city

Both of these suggest that ADDRESS should be clustered by a CITY, STREET index.

Unfortunately, I don't know what you mean by STATE. Is that state like California or STATE like ACTIVE. In neither case would you not want a single column index for STATE becasue the cardinality is too low. If STATE is like California and it is always used in a query with the city, the clustering index for ADDRESS should probably be STATE, CITY, STREET. If sometimes the STATE is not supplied in the query, then leave it off the index.

No table should be clustered by a single column PK that is populated by GENERATE_UNIQUE() or any other similar non-meaningful number. The reason it was defined as the clustering index is probably that the previous DBA was incompetent. The only exception would be DB2 for z/OS and where there was no need for a clustering index, but DB2 for z/OS uses the first index created by default as the clustering index (this is not the case for DB2 for Linux, UNIX, and Windows).
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #3 (permalink)  
Old 09-23-05, 00:27
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by Marcus_A

Unfortunately, I don't know what you mean by STATE. Is that state like California or STATE like ACTIVE. In neither case would you not want a single column index for STATE becasue the cardinality is too low.
It's STATE like ACTIVE, and we're typically looking for ACTIVE, ignoring other states. My thinking was that if they were all stored together physically we'd minimize IO by not having to sort through other states. May be it should be included with the city/street columns.

Quote:
No table should be clustered by a single column PK that is populated by GENERATE_UNIQUE() or any other similar non-meaningful number.
Well I guess there may be cases where it would make some sense but not likely in our setup. Thanks for confirming my suspicions.
Reply With Quote
  #4 (permalink)  
Old 09-23-05, 01:56
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
I would not bother separating the data by STATE unless the active ones are less than 10% of the table.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On