What would be a good choice for a clustering index in the following setup:
Code:
Table OFFICE
Column Comment
==============================================
OFFICE_ID PK, populated by GENERATE_UNIQUE()
FIRM_ID FK refferring to the FIRM table
BIZ_STATE OPEN, CLOSED, etc.
... Some other columns
Table ADDRESS
Column Comment
==============================================
ADDRESS_ID PK, populated by GENERATE_UNIQUE()
OFFICE_ID FK referring to OFFICE
BIZ_STATE ACTIVE, HISTORICAL, etc.
CITY
STREET
... Some other columns
There can few entires in ADDRESS for each OFFICE_ID (e.g. if an office moves to another address) but only one will be ACTIVE at a time.
Both tables have less than 100K rows and are growing at a rate of 10-15% a year.
The following indexes are defined on ADDRESS by my predecessors:
ADDRESS_ID - primary key (clustering index)
OFFICE_ID, STATE
CITY, STREET - for searching
Typical queries, in the order of decreasing frequency:
- show all information for an office, including its current address
- find an office by its city/street/...
- find all OPEN (CLOSED, etc) offices for a firm in a city
Declaring the primary key in ADDRESS as a clustering index doesn't make any sense to me. Since it's an artificial value there are no queries that would fetch multiple addresses by the primary key. I don't think that table is EVER accessed by PK. The only thing that the clustering index does is that it clumps the most recently updated addresses, regardless of their OFFICE_IDs, at the end of the table; however, we never search for "addresses updated within a specific date range" so this isn't of any use either.
What would have been the reason for declaring the primary key as a clustering index in this case? May be I'm just not seeing it?
What do you think would be a better clustering key? The one on CITY, STREET? May be a new one on STATE only? Given the table sizes, is it going to make any difference at all?
Thanks
Nick
PS. It's DB2 v8.1.5 on AIX