Page 1 of 2 12 LastLast
Results 1 to 15 of 23
  1. #1
    Join Date
    Jan 2008
    Posts
    45

    Unanswered: composite index and btree physical

    Hi friends,


    Could some one tell me as to how the composite index is stored in a btree..

    Suppose there is an index on (employeeid,firstname,lastname); what will be the keys in each node of the index.I read somewhere that only employeeid will be stored in nonleaf nodes and the whole combination is stored in leaf nodes...Is that true??


    What is the significance of order of the columns in a composite index??

  2. #2
    Join Date
    Apr 2008
    Posts
    51
    I believe the entire key is stored in both leaf and non-leaf pages. As for order of columns, this is very important for performance. Can make a difference between matching and non-matching index scan. More specific predicates(equal, between, greater than, less than) should be targeted to columns on the front of the key, whereas less specific predicates (LIKE '%ABC%') should be more likely to occur toward the back of the key. A lot has been written on this topic. Search IBM documentation. Start with DB2 Admin - Performance.
    db2topgun.com

  3. #3
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    SuperKuper is correct. The entire key is stored in both leaf and non-leaf pages.

    The order of the columns depends on the number of predicates supplied in the SQL that match the index columns. If there are 3 columns in the index and the SQL supplies all 3 in the predicate, then it does not matter which column is first in the index.

    If you have a 3 column index and the SQL supplies columns 1 and 3 in the predicate, that is probably not as good as supplying columns 1 and 2 (although it depends on the cardinality of each of the columns).

    If you have a 3 column index and the SQL supplies columns 2 and 3 in the predicate, then DB2 cannot use the b-tree and will be forced to scan all the leaf pages (in a similar way that a table scan works).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  4. #4
    Join Date
    Jan 2008
    Posts
    45
    Thanks for the info..So, if there is 3 column index on (empid,lastname,firstname) and the predicate refers to empid and firstname how does the scan happen??

  5. #5
    Join Date
    Apr 2008
    Posts
    51
    In this case it is matching index scan on one column (emplid) and non-matching thereafter. Will find all leaf pages corresponding to given emplid and scan them to match on the 3rd column of the index.
    db2topgun.com

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by db2rocks
    Thanks for the info..So, if there is 3 column index on (empid,lastname,firstname) and the predicate refers to empid and firstname how does the scan happen??
    Since empid is unique, it does not matter that firstname is non-matching.

    For the employee table, you will want the following indexes:

    1. empid - unique index (primary key)
    2. last name, first name
    3. deptno
    4. social security number - not in the sample table, but would exist in the real world for a US company. This will not be unique if you employ illegal aliens (some large companies have incredibly claimed that they did not know multiple employees were using the same social security number).

    I doubt that it would be a common situation where someone would search the employee table with only a first name and they did not know at least the first few characters of the last name. If I am wrong, then a separate index on first name might be warranted, but it is not a good idea to reward sloppy user procedures with fast response time.

    In the above case, you would want to define one of the above indexes as clustered. You have two options (index 2 or index 3). It depends on which one you want to give the best query performance to (a search of employees by department, or a search by last name--including leftmost partial last name). It makes no sense to cluster on index 1 or 4, since one employee number (or social security number) is not logically related to another number.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  7. #7
    Join Date
    Apr 2008
    Posts
    51
    Somehow I missed in the original statement of the problem (by db2rocks today at 15:33) that employeeid is unique (oops!).
    Or do we automatically assume that if a column name ends on two letters "ID" - it is unique?
    Even if that were the case, that would not make it "primary" because primary key only has meaning in the context of RI.
    db2topgun.com

  8. #8
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by SuperKuper
    Or do we automatically assume that if a column name ends on two letters "ID" - it is unique?
    I don't think we automatically assume that. But in the case of an employee table, if the empid is not unique, then there are serious problems with the design.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  9. #9
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by SuperKuper
    Even if that were the case, that would not make it "primary" because primary key only has meaning in the context of RI.
    What do you mean with that?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  10. #10
    Join Date
    Apr 2008
    Posts
    51
    The concept of primary/foreign keys was introduced in DB2 MVS 2.1 (late 1980s) only in connection with initial implementation of referential integrity (RI). Outside of RI the term "primary key" has no meaning, regardless of what most DB2 application programmers out there think (a DBA should know better). On the other hand, you are welcome to define a unique index (or several unique indices) on a table or even a unique index that only includes columns that do not allow null values. You can even ALTER TABLE ADD PRIMARY KEY, but such action will be superfluous unless and until RI is further specified.
    db2topgun.com

  11. #11
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I look at this the other way around: unique indexes are an abomination.

    There should be a clear separation between external schema, conceptual schema, and internal schema.

    At the internal schema, the DBA manages things like tablespaces, containers, indexes, MQTs, ... In other words: everything that has no impact on the semantics of the data. The conceptual level contains things like tables, triggers, routines. A constraint (unique constraint, check constraint, or referential constraint) defines the semantics of the data to be stored in a table. Therefore, it belongs to the conceptual level. The external level specifies dedication views on the database for each application and is comprised of a set of views.

    The fundamental idea behind this is that you can change things at one level and don't impact any other. Of course, this is not always possible if you think about views. But most DBMS are pretty good at separating internal and conceptual level (if not necessarily at the syntax level).

    Also, implementing a unique constraint does by no means require an index. It is usually done this way because it is the most efficient approach, but you could also guarantee uniqueness with triggers, for example, and then create an index on the column for performance reasons.

    p.s: The concept of referential constraints originates from the relational model (see Codd's 12 rules). DB2 MVS may just have been the first product to implement them.
    p.p.s: A primary key is just a unique constraint with a special name. But there is nothing special about it, compared to other unique constraints.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  12. #12
    Join Date
    Apr 2008
    Posts
    51
    Quote Originally Posted by stolze
    p.p.s: A primary key is just a unique constraint with a special name. But there is nothing special about it, compared to other unique constraints.
    Actually, not. You can have as many UNIQUE constraints as you need, but only one PRIMARY KEY.

    => db2 alter table a.b add constraint c14 primary key"(c1,c2)"
    DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned:
    SQL0624N Table "A.B" already has a "PRIMARY" key. SQLSTATE=42889
    db2topgun.com

  13. #13
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Ok, besides the name "primary key" vs. "unique", that's the only difference. For consistency reasons, it would make much more sense to use unique constraints only and avoid the term primary key. But that's a historical problem already...
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  14. #14
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    A table should (and does) have only one primary key. So by that token, a primary key is different than a unique contraint. This is a basic rule of normalization.

    For example, the primary key in a employee table should be a non-meaningful unique number, whereas there may exist other unique constraints defined (such as social security number in the USA) that would not be ideal for the primary key.

    One diffference between a unique constraint and unique index (alone without a unique constraint) in DB2 is that a unique constraint can be the parent of foreign key.

    Also, in DB2 for LUW, where we cannont alter an index, the index (even for PK or Unique Constraint) should be defined before the constraint is defined to allow for customization of the index properties.
    Last edited by Marcus_A; 04-03-08 at 01:31.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  15. #15
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by Marcus_A
    A table should (and does) have only one primary key. So by that token, a primary key is different than a unique contraint. This is a basic rule of normalization.

    For example, the primary key in a employee table should be a non-meaningful unique number, whereas there may exist other unique constraints defined (such as social security number in the USA) that would not be ideal for the primary key.
    Why would the SSN "not be ideal"? Anyway, that's a discussion about surrogate keys and natural keys. A completely different issue, mostly bordering on religion...

    Here I just wanted to express that a primary key is a unique key and that there is no technical reason why the term "primary key" is needed. Just because it is named "primary" and because only one such constraint can exist on a table, it doesn't make it any different from other unique constraints, does it?

    If you go on to higher normalization levels, you will only have a single unique constraint per table. (But no one is using those levels anyway because it is typically impractical and not well performing.)

    One diffference between a unique constraint and unique index (alone without a unique constraint) in DB2 is that a unique constraint can be the parent of foreign key.
    Right. That's because an index is at the "internal level" whereas constraints are at the "conceptual level". Unique indexes merely exist to enforce unique constraints.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •