Results 1 to 14 of 14
  1. #1
    Join Date
    Jan 2003
    Posts
    74

    Unanswered: About cluster index

    Hello everyone!

    I know that the DB2 will automatically create a unique index for the columns that I use primary key statement on when I create a new table. But I want DB2 to automatically create a CLUSTER index instead of a unique index when I use primary key statement on creating a table. Who can tell me how I can do that?

  2. #2
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650

    Re: About cluster index

    You will have to manually create the clustered unique index and then alter the table to have a primary key ...

    Cheers

    Sathyaram

    Originally posted by zhouhaiming
    Hello everyone!

    I know that the DB2 will automatically create a unique index for the columns that I use primary key statement on when I create a new table. But I want DB2 to automatically create a CLUSTER index instead of a unique index when I use primary key statement on creating a table. Who can tell me how I can do that?

  3. #3
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    Isn't it good practice to always create the index for a primary key prior to assigning the key? At least this way you can have a sensible reference to your primary index rather than some irrelevant, internally generated value.

  4. #4
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    I agree, Damian ...



    Originally posted by Damian Ibbotson
    Isn't it good practice to always create the index for a primary key prior to assigning the key? At least this way you can have a sensible reference to your primary index rather than some irrelevant, internally generated value.

  5. #5
    Join Date
    Sep 2003
    Posts
    85
    To sathyaram_s:
    You will have to manually create the clustered unique index and then alter the table to have a primary key ...

    Is this really so. If I've initially defined e PK but then discover the issue about the requirement to issue a CREATE UNIQUE INDEX xxx ON table (column(s)) CLUSTER, then can I really issue afterwards ALTER TABLE table ADD CONSTRAINT constraint-name PRIMARY KEY (same above columns)? Won't that try to create an index which I assume we have already done via the CREATE UNIQUE INDEX ... CLUSTER?
    Ruby

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    If a unique index already exists on the primary key columns, DB2 will not try to create another index when defining the primary key with alter.

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    "I want DB2 to automatically create a CLUSTER index instead of a unique index when I use primary key statement on creating a table."

    Just remember that CLUSTER and UNIQUE are not mutually exclusive. An index can be both clustered and unique, although only one index for a table can be defined as the clustering index (since it determines the order of the table rows).

    Care should be taken in choosing the clustering index, and one should not always choose the primary key. Here are some very rough guidelines:

    Choosing a clustering index
    Generally speaking a primary key which consists of a single unique number should not be the clustering index. For example, on the sample employee table, employee_id should not be the clustering index. Either last_name, or dept_no would be better choices.

    However, on tables where the primary key consists of multiple columns, then the primary key is often a good candidate for clustering. For example, on the order-item table (primary key is order_no, item_no), then the primary key would be a good candidate for the clustering index to keep the rows of the table in that order during INSERTs or after a REORG.

  8. #8
    Join Date
    Mar 2003
    Posts
    343
    It is a good idea to understand what purpose the clustering index serves. What you are attempting to do is get more pages into the bufferpool, each having more rows that serves queries. Say, the pagesize is 8k and there are 40 rows per page. If the data is well clustered and 200 rows serve the query, as a best case, 5 pages should serve this purpose. If each row that serves the query is on a different page, then the bufferpool is flooded with 200 8k pages and depending on the number of concurrent queries running, that may be an issue. By clustering, you may reduce the number of pages which are needed to maybe 10-20 pages which is much better than 200. Thus there is more space in the BP to serve other queries, also the BP isn;'t flushed of other data which could result in more logical reads and hence better BP hit ratios.

    How good the clustering will be will be will depend on the PCTFREE of the table and also data purging - if data is only added, then the PCTFREE on each page will eventually fill up and the clusterratio will go down. Then you must reorg to get the clustering back in.

    I have heard that even dates are not good clustering indexes because
    a] they are unending and
    b] they do not empty out with purging

    In Marcus' eg, I would choose dept_no because employees will be spread across departments - but I guess you also need to know how the data is queried.

    Hope this helps.

  9. #9
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    "In Marcus' eg, I would choose dept_no because employees will be spread across departments - but I guess you also need to know how the data is queried."

    At most places I have worked, the employee data is queried both ways, by department or by last name. Managers and admin people use the department-employee browse often, but everyone uses the last name search (just supplying the first few characters) to look up phone numbers. So it is not unusual to have to decide which is most important or more frequently used.

  10. #10
    Join Date
    Mar 2003
    Posts
    343
    To me that would make employee_name a good candidate for a non-unique index - but again, I have not done much work on that side so I kind of extrapolated from cust_nm and region. I completely agree with Marcus - one needs to know the business and the different ways people query the data. However, keep in mind that every purpose cannot be served - if there's a huge vol of data, I tend to try and serve 60-70% usage. I am interested in what other people do. Since the volumes I deal with are enormous (a small table ~100 GB) frequent reorgs are not an option since the outage turns out to be too large.

  11. #11
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    1. Are there any recommendations for parameters "Percentage of free space to be left on index pages (PCTFREE)" and "Percentage on minimum amount of used space to be left on index pages (MINPCTUSED)"?
    2. Is there any formule to estimate this two values?
    3. What is the difference with above two parameters?

    Thanks,
    Grofaty
    Last edited by grofaty; 09-29-03 at 07:36.

  12. #12
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    I have one more question:
    4. I have three indexes on table. How can I found out with one is clustered? Is there any command?

    I tried to use:
    DESCRIBE INDEXES FOR TABLE tabname
    but this does not help.

    I have also try to see the sysibm.sysindexes table but I can't find the required information.
    I tried to use:
    DESCRIBE TABLE SYSIBM.SYSINDEXES.

    My system:
    db2 v7.2 fixpack 5 on Windows 2000

    Thanks,
    Grofaty

  13. #13
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    1. Percent Free - This should be determined by examining the insert, update, and delete activity on the indexes along with the REORG frequency. Index rows are always kept in sequence of the index column(s). Room needs to be reserved on each index page to allow for new entries. If an index page is completely full and DB2 needs to add another entry on that page, DB2 will perform an index page split by creating a new page and putting half the entries from the old page on the new one. Page splits are not good for performance and they create half empty pages.

    But another consideration is whether the index entries will be added at the end of the index or added randomly within the range of existing values. This will not always be the same for each index on the table. If the index entries are almost always added at the end of the range of index values, then little or no freespace needs to be defined.

    MINPCTUSED is used for requesting online reorgs of indexes. Here is the quote from the SQL Guide:

    “Indicates whether indexes are reorganized online and the threshold for the minimum percentage of space used on an index leaf page If after a key is deleted from an index leaf page, the percentage of space used on the page is at or below integer percentage, an attempt is made to merge the remaining keys on this page with those of a neighboring page. If there is sufficient space on one of these pages, the merge is performed and one of the pages is deleted. The value of integer can be from 0 to 99. However, a value of 50 or below is recommended for performance reasons.”

    2. Here is an example for calculating percent free for an index. If a table increases size by inserting 10% of the existing number of rows each week (and 10% more index entries are required in the index), and your REORG frequency is weekly, then 10% freespace is required to avoid index page splits to make room for new index entries (lets ignore updates and deletes for simplicity). I practical terms, it is best to allow a margin of error and use 12-15% freespace for this example. But if the REORG were only done monthly, then about 45% of freespace would be required.

    Specify MINPCTUSED if you want to use online reorgs. A value of 50 or below is recommended for performance reasons.

    3. I am not sure about your question. But I don’t think the two parameters are directly related except that if you set the percent free properly and can do off-line reorgs at a reasonable interval, then you probably don't need on-line index reorgs. If you are getting a lot of index page splits (or lots of deletes), then on-line index reorgs will clean these up, but there will be a lot of DB2 activity going on to do this.

    4. There is a column on syscat.indexes called INDEXTYPE CHAR(4) that has the type of index. ‘CLUS’ = Clustering, ‘REG ’ = Regular

  14. #14
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    Marcus_A, thanks a lot. It helps me very much.

    4. The INDEXTYPE column also exist in syibm.sysindexes. I have just overlooked it.

    Grofaty
    Last edited by grofaty; 09-29-03 at 07:48.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •