Results 1 to 6 of 6
  1. #1
    Join Date
    Jul 2002
    Location
    ATL
    Posts
    170

    Unanswered: Partition Key Design

    What are the procedures to be followed when you design a partition key in EEE.
    Is there any document for the design procedures.


    Thanks
    Prasad

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    The "Administration Guide: Planning" has a discussion of this topic.

    It is a good idea to force related data that is frequently joined to be on the same partition.

    For example, on an order table, the primary key would be order_mumber. This would make a good partitioning key for this table.

    The order_item table has a primary key of (order_number, item_number). But the partitioning key of this table should be order_number (not the full primary key). This forces the related rows of the order table and the order_item table to be on the same partition, which allows them to be joined efficiently (without sending data to another partition to be joined).

    The fact that the primary key is not always the best partitioning key (although sometimes it is) is the hardest concept for people to understand, and by far the biggest mistake made in parallel database implementation.

  3. #3
    Join Date
    Jul 2002
    Location
    ATL
    Posts
    170

    Partition Key

    Thank You Marcus.
    Administration Guide really does not provide much details in terms of picking up a Partition key.
    Iam also looking at the performance issues when picking up a partion key.
    I was wondering what situations you might pick up primary key as partition key and in what situations you will pick up non primary key and what situations you will pick up multiple columns for a partion key and what are its advantages and disadvantages.
    Does the partition key design be different in OLAP and OLTP environments


    Thanks
    Prasad

  4. #4
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650

    Re: Partition Key

    As Marcus has explained, the primary consideration when choosing a partioning key should be to avoid/minimize data shipping ...

    There are no hard and fast rules for choosing the partitioning key ..

    For example, if for two large tables , say 90%, of theie workload join on 3 columns , make these three columns as the partitioning key , irrespective of what the primary key column(s) is ....

    From my point of view, OLTP or OLAP, the considerations remain the same ... Ask yourself which queries are critical,important and design your partitioning key based on this ..


    HTH

    Sathyaram

    Originally posted by koganti
    Thank You Marcus.
    Administration Guide really does not provide much details in terms of picking up a Partition key.
    Iam also looking at the performance issues when picking up a partion key.
    I was wondering what situations you might pick up primary key as partition key and in what situations you will pick up non primary key and what situations you will pick up multiple columns for a partion key and what are its advantages and disadvantages.
    Does the partition key design be different in OLAP and OLTP environments


    Thanks
    Prasad
    Visit the new-look IDUG Website , register to gain access to the excellent content.

  5. #5
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    If you have an OLAP environment, you should consider using DB2 ESE V8 with the Partitioning Option to have the option of using MDC (Multi Dimensional Clustering) that is available in version 8. Since you mentioned EEE in your original post, I was assuming you were using DB2 EEE version 7.

    As mentioned in the “DB2 Administration Guide: Planning” (Version 7) page 104, one of the main factors is to use a column with high cardinality that will distribute the data evenly across the partitions when that column is hashed. As previously mentioned, getting frequently joined tables on the same partition (for related rows) is also important. These are the two main considerations.

    But to answer your question requires looking at the entire database design in total and not just any one table. But if you happen to define a primary key through declared RI, the partitioning index must be a column that is contained in the primary (but not necessarily the whole primary key).

    If I had to pick one rule of thumb, without seeing the database design, it would be to use one column of the primary key (whether or not RI is explicitly defined), and that column would be the one with the most unique values (usually the first column if the primary key has multiple columns).

    Keep in mind that other indexes can be created on the table. Also there are decisions about clustering (or MDC with DB2 version 8) that need to be decided.

  6. #6
    Join Date
    Jul 2002
    Location
    ATL
    Posts
    170
    Thanks A Lot for you all
    That was valuble information.
    Iam trying to understand how the data be distributed across the partions if the partition key has multiple columns.


    Thanks
    Prasad

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •