Results 1 to 5 of 5
  1. #1
    Join Date
    May 2013
    Posts
    8

    Unanswered: How to cluster an why

    Consider the following table with columns and potential column cardinality:

    Address

    Country (card - 10)
    State (card - 300)
    City (card - 1000)
    Street (card - 100,000)
    House number (card - 10,000)
    House description

    The logical, unique composite key is Country thru House number. Allow me some liberty with the definition of these columns.

    State is meaningless without Country
    City is meaningless without State
    Street is meaningless without City
    House number is meaningless without Street

    In other words, any query that qualifies on a particular key column, in order to make sense from a business perspective, requires qualification on the column above it.

    Without knowing the specifics for embedded/static queries, what should the clustering index look like? - what column sequence? Why?

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    You might want to cluster such a table "if" there was a high likelihood that a SQL query would want to retrieve multiple rows of addresses that were located near each other on a map. If the rows were clustered, then you would reduce the number of pages that DB2 needs to get to satisfy the query (DB2 does logical and physical I/O at the page level).

    One example would be if you had a travel website and wanted to show a list of hotels that were located near a certain location.

    However, if your application almost always retrieves only one specific address at at time, then having a clustering index may not be beneficial and will incur extra overhead during inserts and reorgs.

    Although it is true that "a query that qualifies on a particular key column, in order to make sense from a business perspective, requires qualification on the column above it," when talking about unique keys used in business rule logic, a country code and zip code/postal code might suffice for clustering purposes (non unique clustering index). But I don't see zip code/postal code as one of the columns. Of course this depends on the granularity of addresses and whether your database had every single address in it (and if so, then zip code might be too broad for clustering).

    So the bottom line is that a clustering index may not be same as a business logical key (unique key), but still may be useful. In other cases, a unique key (usually PK) might be the best choice for a clustering index (assuming there are multiple columns in the PK). It is highly unlikely (in a good database design) that a PK with a single column would be a clustering index.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    May 2013
    Posts
    8
    Marcus,
    Thank your for the reply.

    What I am getting at is that often times you hear that it is a good idea to cluster on an index that has a high firstkey card. I dont necessarily agree with this especially if there will be only 1 index (the clustering index) to start with. I believe the argument for using a key with high cardinality first is that there may be 1 or 2 more non-leaf pages to be retrieved than would be retrieved if a low card column were first in the index. So, in a case like this, my belief is that a primary/clustering index in a sequence that matches the logical hierarchy is the correct design - even if Country has only a card of 10.

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by mikerexx View Post
    Marcus,
    Thank your for the reply.

    What I am getting at is that often times you hear that it is a good idea to cluster on an index that has a high firstkey card. I dont necessarily agree with this especially if there will be only 1 index (the clustering index) to start with. I believe the argument for using a key with high cardinality first is that there may be 1 or 2 more non-leaf pages to be retrieved than would be retrieved if a low card column were first in the index. So, in a case like this, my belief is that a primary/clustering index in a sequence that matches the logical hierarchy is the correct design - even if Country has only a card of 10.
    I don't know who told you that a clustering index should always have a high firstkey card, since that is obviously not true. The purpose of designating an index as clustering is to group rows close together on a data page that would likely be retrieved together in order to reduce the number of get pages. For that purpose, firstkey card (in a multi-column index) is irrelevant.

    For DB2 LUW, designation of a clustering index is optional. For DB2 z/OS if no clustering index is designated, the first index created is the clustering index (unless DB2 z/OS has changed in recent releases, since I admit I am a bit out of date on that platform).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  5. #5
    Join Date
    May 2013
    Posts
    8
    Sorry for not specifying - my platform is Z/OS.

    I agree with what you are saying. I just wanted another perspective. Thanks.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •