Results 1 to 4 of 4
  1. #1
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1

    Unanswered: Clustering indexes

    What would be a good choice for a clustering index in the following setup:

    Code:
    Table OFFICE
    
    Column		Comment
    ==============================================
    OFFICE_ID	PK, populated by GENERATE_UNIQUE()
    FIRM_ID		FK refferring to the FIRM table
    BIZ_STATE	OPEN, CLOSED, etc.
    ...		Some other columns
    
    Table ADDRESS 
    
    Column		Comment
    ==============================================
    ADDRESS_ID	PK, populated by GENERATE_UNIQUE()
    OFFICE_ID	FK referring to OFFICE
    BIZ_STATE	ACTIVE, HISTORICAL, etc.
    CITY
    STREET
    ...		Some other columns
    There can few entires in ADDRESS for each OFFICE_ID (e.g. if an office moves to another address) but only one will be ACTIVE at a time.

    Both tables have less than 100K rows and are growing at a rate of 10-15% a year.

    The following indexes are defined on ADDRESS by my predecessors:

    ADDRESS_ID - primary key (clustering index)
    OFFICE_ID, STATE
    CITY, STREET - for searching

    Typical queries, in the order of decreasing frequency:
    - show all information for an office, including its current address
    - find an office by its city/street/...
    - find all OPEN (CLOSED, etc) offices for a firm in a city

    Declaring the primary key in ADDRESS as a clustering index doesn't make any sense to me. Since it's an artificial value there are no queries that would fetch multiple addresses by the primary key. I don't think that table is EVER accessed by PK. The only thing that the clustering index does is that it clumps the most recently updated addresses, regardless of their OFFICE_IDs, at the end of the table; however, we never search for "addresses updated within a specific date range" so this isn't of any use either.

    What would have been the reason for declaring the primary key as a clustering index in this case? May be I'm just not seeing it?

    What do you think would be a better clustering key? The one on CITY, STREET? May be a new one on STATE only? Given the table sizes, is it going to make any difference at all?

    Thanks

    Nick

    PS. It's DB2 v8.1.5 on AIX

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    1. - show all information for an office, including its current address

    If the OFFICE_ID is supplied for the above query, then I assume it should be unique. I would probably cluster OFFICE by FIRM, assuming that there are multiple offices for a firm.

    2. - find an office by its city/street/...
    - find all OPEN (CLOSED, etc) offices for a firm in a city

    Both of these suggest that ADDRESS should be clustered by a CITY, STREET index.

    Unfortunately, I don't know what you mean by STATE. Is that state like California or STATE like ACTIVE. In neither case would you not want a single column index for STATE becasue the cardinality is too low. If STATE is like California and it is always used in a query with the city, the clustering index for ADDRESS should probably be STATE, CITY, STREET. If sometimes the STATE is not supplied in the query, then leave it off the index.

    No table should be clustered by a single column PK that is populated by GENERATE_UNIQUE() or any other similar non-meaningful number. The reason it was defined as the clustering index is probably that the previous DBA was incompetent. The only exception would be DB2 for z/OS and where there was no need for a clustering index, but DB2 for z/OS uses the first index created by default as the clustering index (this is not the case for DB2 for Linux, UNIX, and Windows).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Marcus_A

    Unfortunately, I don't know what you mean by STATE. Is that state like California or STATE like ACTIVE. In neither case would you not want a single column index for STATE becasue the cardinality is too low.
    It's STATE like ACTIVE, and we're typically looking for ACTIVE, ignoring other states. My thinking was that if they were all stored together physically we'd minimize IO by not having to sort through other states. May be it should be included with the city/street columns.

    No table should be clustered by a single column PK that is populated by GENERATE_UNIQUE() or any other similar non-meaningful number.
    Well I guess there may be cases where it would make some sense but not likely in our setup. Thanks for confirming my suspicions.

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    I would not bother separating the data by STATE unless the active ones are less than 10% of the table.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •