If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > composite index and btree physical

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 04-01-08, 16:33
db2rocks db2rocks is offline
Registered User
 
Join Date: Jan 2008
Posts: 45
composite index and btree physical

Hi friends,


Could some one tell me as to how the composite index is stored in a btree..

Suppose there is an index on (employeeid,firstname,lastname); what will be the keys in each node of the index.I read somewhere that only employeeid will be stored in nonleaf nodes and the whole combination is stored in leaf nodes...Is that true??


What is the significance of order of the columns in a composite index??
Reply With Quote
  #2 (permalink)  
Old 04-01-08, 16:49
SuperKuper SuperKuper is offline
Registered User
 
Join Date: Apr 2008
Posts: 51
I believe the entire key is stored in both leaf and non-leaf pages. As for order of columns, this is very important for performance. Can make a difference between matching and non-matching index scan. More specific predicates(equal, between, greater than, less than) should be targeted to columns on the front of the key, whereas less specific predicates (LIKE '%ABC%') should be more likely to occur toward the back of the key. A lot has been written on this topic. Search IBM documentation. Start with DB2 Admin - Performance.
__________________
db2topgun.com
Reply With Quote
  #3 (permalink)  
Old 04-01-08, 18:00
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
SuperKuper is correct. The entire key is stored in both leaf and non-leaf pages.

The order of the columns depends on the number of predicates supplied in the SQL that match the index columns. If there are 3 columns in the index and the SQL supplies all 3 in the predicate, then it does not matter which column is first in the index.

If you have a 3 column index and the SQL supplies columns 1 and 3 in the predicate, that is probably not as good as supplying columns 1 and 2 (although it depends on the cardinality of each of the columns).

If you have a 3 column index and the SQL supplies columns 2 and 3 in the predicate, then DB2 cannot use the b-tree and will be forced to scan all the leaf pages (in a similar way that a table scan works).
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #4 (permalink)  
Old 04-01-08, 18:22
db2rocks db2rocks is offline
Registered User
 
Join Date: Jan 2008
Posts: 45
Thanks for the info..So, if there is 3 column index on (empid,lastname,firstname) and the predicate refers to empid and firstname how does the scan happen??
Reply With Quote
  #5 (permalink)  
Old 04-01-08, 20:49
SuperKuper SuperKuper is offline
Registered User
 
Join Date: Apr 2008
Posts: 51
In this case it is matching index scan on one column (emplid) and non-matching thereafter. Will find all leaf pages corresponding to given emplid and scan them to match on the 3rd column of the index.
__________________
db2topgun.com
Reply With Quote
  #6 (permalink)  
Old 04-01-08, 23:20
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
Quote:
Originally Posted by db2rocks
Thanks for the info..So, if there is 3 column index on (empid,lastname,firstname) and the predicate refers to empid and firstname how does the scan happen??
Since empid is unique, it does not matter that firstname is non-matching.

For the employee table, you will want the following indexes:

1. empid - unique index (primary key)
2. last name, first name
3. deptno
4. social security number - not in the sample table, but would exist in the real world for a US company. This will not be unique if you employ illegal aliens (some large companies have incredibly claimed that they did not know multiple employees were using the same social security number).

I doubt that it would be a common situation where someone would search the employee table with only a first name and they did not know at least the first few characters of the last name. If I am wrong, then a separate index on first name might be warranted, but it is not a good idea to reward sloppy user procedures with fast response time.

In the above case, you would want to define one of the above indexes as clustered. You have two options (index 2 or index 3). It depends on which one you want to give the best query performance to (a search of employees by department, or a search by last name--including leftmost partial last name). It makes no sense to cluster on index 1 or 4, since one employee number (or social security number) is not logically related to another number.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #7 (permalink)  
Old 04-01-08, 23:59
SuperKuper SuperKuper is offline
Registered User
 
Join Date: Apr 2008
Posts: 51
Somehow I missed in the original statement of the problem (by db2rocks today at 15:33) that employeeid is unique (oops!).
Or do we automatically assume that if a column name ends on two letters "ID" - it is unique?
Even if that were the case, that would not make it "primary" because primary key only has meaning in the context of RI.
__________________
db2topgun.com
Reply With Quote
  #8 (permalink)  
Old 04-02-08, 00:03
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
Quote:
Originally Posted by SuperKuper
Or do we automatically assume that if a column name ends on two letters "ID" - it is unique?
I don't think we automatically assume that. But in the case of an employee table, if the empid is not unique, then there are serious problems with the design.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #9 (permalink)  
Old 04-02-08, 05:03
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by SuperKuper
Even if that were the case, that would not make it "primary" because primary key only has meaning in the context of RI.
What do you mean with that?
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #10 (permalink)  
Old 04-02-08, 09:38
SuperKuper SuperKuper is offline
Registered User
 
Join Date: Apr 2008
Posts: 51
The concept of primary/foreign keys was introduced in DB2 MVS 2.1 (late 1980s) only in connection with initial implementation of referential integrity (RI). Outside of RI the term "primary key" has no meaning, regardless of what most DB2 application programmers out there think (a DBA should know better). On the other hand, you are welcome to define a unique index (or several unique indices) on a table or even a unique index that only includes columns that do not allow null values. You can even ALTER TABLE ADD PRIMARY KEY, but such action will be superfluous unless and until RI is further specified.
__________________
db2topgun.com
Reply With Quote
  #11 (permalink)  
Old 04-02-08, 13:46
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
I look at this the other way around: unique indexes are an abomination.

There should be a clear separation between external schema, conceptual schema, and internal schema.

At the internal schema, the DBA manages things like tablespaces, containers, indexes, MQTs, ... In other words: everything that has no impact on the semantics of the data. The conceptual level contains things like tables, triggers, routines. A constraint (unique constraint, check constraint, or referential constraint) defines the semantics of the data to be stored in a table. Therefore, it belongs to the conceptual level. The external level specifies dedication views on the database for each application and is comprised of a set of views.

The fundamental idea behind this is that you can change things at one level and don't impact any other. Of course, this is not always possible if you think about views. But most DBMS are pretty good at separating internal and conceptual level (if not necessarily at the syntax level).

Also, implementing a unique constraint does by no means require an index. It is usually done this way because it is the most efficient approach, but you could also guarantee uniqueness with triggers, for example, and then create an index on the column for performance reasons.

p.s: The concept of referential constraints originates from the relational model (see Codd's 12 rules). DB2 MVS may just have been the first product to implement them.
p.p.s: A primary key is just a unique constraint with a special name. But there is nothing special about it, compared to other unique constraints.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #12 (permalink)  
Old 04-02-08, 14:07
SuperKuper SuperKuper is offline
Registered User
 
Join Date: Apr 2008
Posts: 51
Quote:
Originally Posted by stolze
p.p.s: A primary key is just a unique constraint with a special name. But there is nothing special about it, compared to other unique constraints.
Actually, not. You can have as many UNIQUE constraints as you need, but only one PRIMARY KEY.

=> db2 alter table a.b add constraint c14 primary key"(c1,c2)"
DB21034E The command was processed as an SQL statement because it was not a valid Command Line Processor command. During SQL processing it returned:
SQL0624N Table "A.B" already has a "PRIMARY" key. SQLSTATE=42889
__________________
db2topgun.com
Reply With Quote
  #13 (permalink)  
Old 04-02-08, 14:15
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Ok, besides the name "primary key" vs. "unique", that's the only difference. For consistency reasons, it would make much more sense to use unique constraints only and avoid the term primary key. But that's a historical problem already...
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #14 (permalink)  
Old 04-02-08, 18:52
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
A table should (and does) have only one primary key. So by that token, a primary key is different than a unique contraint. This is a basic rule of normalization.

For example, the primary key in a employee table should be a non-meaningful unique number, whereas there may exist other unique constraints defined (such as social security number in the USA) that would not be ideal for the primary key.

One diffference between a unique constraint and unique index (alone without a unique constraint) in DB2 is that a unique constraint can be the parent of foreign key.

Also, in DB2 for LUW, where we cannont alter an index, the index (even for PK or Unique Constraint) should be defined before the constraint is defined to allow for customization of the index properties.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390

Last edited by Marcus_A; 04-03-08 at 00:31.
Reply With Quote
  #15 (permalink)  
Old 04-03-08, 04:50
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by Marcus_A
A table should (and does) have only one primary key. So by that token, a primary key is different than a unique contraint. This is a basic rule of normalization.

For example, the primary key in a employee table should be a non-meaningful unique number, whereas there may exist other unique constraints defined (such as social security number in the USA) that would not be ideal for the primary key.
Why would the SSN "not be ideal"? Anyway, that's a discussion about surrogate keys and natural keys. A completely different issue, mostly bordering on religion...

Here I just wanted to express that a primary key is a unique key and that there is no technical reason why the term "primary key" is needed. Just because it is named "primary" and because only one such constraint can exist on a table, it doesn't make it any different from other unique constraints, does it?

If you go on to higher normalization levels, you will only have a single unique constraint per table. (But no one is using those levels anyway because it is typically impractical and not well performing.)

Quote:
One diffference between a unique constraint and unique index (alone without a unique constraint) in DB2 is that a unique constraint can be the parent of foreign key.
Right. That's because an index is at the "internal level" whereas constraints are at the "conceptual level". Unique indexes merely exist to enforce unique constraints.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On