| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

10-26-09, 00:25
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 42
|
|
|
create index on which column?
|
|
Hi, have a question about how to create index on which column?
for example, below sql:
Code:
select
c1
,d1
,c4
,d5
from
tableC
inner join tableD
on( c1 = d1
and c2 = d2
and c3 = d3 )
where
c4 = ?
and d4 = ?
want to build index(s) on tableC or tableD to improve performance.
Suppose:
1) tableC lots of records, tabldD few records
I think it is definitely to create a index on tableC.
But on which column?
all columns in the join predicate and in where clause?
like:
a)create index idx.c on tableC( c1, c2, c3, c4 )
b)create index idx.c on tableC( c2, c3, c1, c4 )
would a) and b) be the same?
As I know, decide which column to be the first column in the index creation command is very important. But the 2nd place, 3rd place... dont matter.
That means, a) and b) is different.
IS MY UNDERSTANDING CORRECT?
So, how to decide which column to be the 1st place in the index creation command?
In my understanding, if a column contains the most different values, then this column shoud be the 1st in index.
for example, if c1 is containing the customer number, c2 only containning the values of male or female.
I think the 1st column shoule be c1, right?
Please advise if I am correct on this!
2)tableC lots of records, tableD lots of records.
I believe i need to build two indexs, one on tableC and one tableD following previous assumption.
If my previous assumption is correct, I think its quite easy to build these two indexes.
|
|

10-26-09, 06:40
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
wilsonsv, When I create Indexes I usually start with the Equal predicates in the Where Clause. These will get exact matches with an index lookup. I also consider Sort columns (ORDER BY, GROUP BY, DISTINCT). Join predicates are also important.
Note: the order of C1, C2, C3 would be from most distinct values to least distinct values. Example, a 1,000 row table with 1,000 distinct values would be the most distinct while 1 distinct value would be least distinct.
I would try indexes on:
C4, C1, C2, C3
D4, D1, D2, D3
If C1, C2, C3 and/or D1, D2, D3 are Unique, I would try:
C1, C2, C3, C4
D1, D2, D3, D4
You can try 2 indexes on each table with:
C4
C1, C2, C3
D4
D1, D2, D3
This may take advantage of RID anding / oring to filter rows.
One thing you can try, is to create many different indexes, run an Explain and see what DB2 uses.
|
|

10-26-09, 09:37
|
|
Registered User
|
|
Join Date: Dec 2005
Posts: 273
|
|
|
|
I'd suggest indexes on
C4, C1, C2, C3
D4, D1, D2, D3, D5
that allows DB2 index-only-access on both tables.
Column C4 resp. D4 in first place.
Because for one of the two tables only
c4 = ? resp.
d4 = ?
can be used to match the index.
Indexes in both tables allows the optimizer to choose either of the tables as the outer table and still have matching index access.
|
|

10-26-09, 10:49
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
And I suggest that you run you workload through the Design Advisor.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

10-26-09, 13:20
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 42
|
|
Quote:
|
Originally Posted by stolze
And I suggest that you run you workload through the Design Advisor.
|
I agree Design Advisor is a great help to newbie.
But I doubt it would give the most efficient indexes.
Cause I am the one who knows exactly which column would store the most distinct values and which is not.
So...here is my post asking how to build the indexes.
Anyway, thx very much for all your sharing.
Suppose:
Code:
select
c1
,d1
,c4
,d5
from
tableC
inner join tableD
on( c2 = d2
and c3 = d3 )
where
c1 = ?
and c4 = ?
and d1 = ?
and d4 = ?
There are two predicates for each table.
As I know, DB2 will ALWAYS filter the rows using the where clause predicates before joining tables. Thus, I believe below sequence it's correct.
c1, c4, c2, c3 (considering c1 the most distinct and c4 the least)
|
|

10-26-09, 13:34
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
Quote:
|
As I know, DB2 will ALWAYS filter the rows using the where clause predicates before joining tables.
|
wilsonfv, I would not put ALWAYS in that statement. In general, what DB2 does is:
Filter rows on one table.
Join to the other table (probably with a Nested Loop Join)
Check filtering on other table.
It uses Statistics on the tables / columns to determine the first table.
There are some cases where DB2 will:
Filter on on table
Filter on the other table
Do a Merge Scan (or Hybrid) Join of the two filtered results.
The Nested Loop Join is more common than the Merge Scan Join (from what I have experienced). There are other types of joins (especially in the LUW world) that may filter on both tables first and then Join.
|
|

10-26-09, 14:05
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
Quote:
|
Originally Posted by wilsonfv
I agree Design Advisor is a great help to newbie.
But I doubt it would give the most efficient indexes.
|
I agree. However, such a question from the OP suggests that the basics in understanding how indexes work, are not there yet. So it is good to have a starting point _and_ then understand how indexes work.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

10-27-09, 12:09
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 42
|
|
Quote:
|
Originally Posted by Stealth_DBA
The Nested Loop Join is more common than the Merge Scan Join (from what I have experienced). There are other types of joins (especially in the LUW world) that may filter on both tables first and then Join.
|
Ya, as I know, there are three kinds of operation for joining tables.
Code:
Nested-Loop
Join A nested-loop join is performed in one of the following two ways: - Scanning the inner table for each accessed row of the outer table
- Performing an index lookup on the inner table for each accessed row of the outer table
Merge Join
Merge join, sometimes known as merge scan join or sort merge join, requires a predicate of the form table1.column = table2.column. This is called an equality join predicate. Merge join requires ordered input on the joining columns, either through index access or by sorting. A merge join cannot be used if the join column is a LONG field column or a large object (LOB) column.
Hash Join
A hash join requires one or more predicates of the form table1.columnX = table2.columnY, for which the column types are the same. For columns of type CHAR, the length must be the same. For columns of type DECIMAL, the precision and scale must be the same. The column type cannot be a LONG field column, or a large object (LOB) column.
Which is better, i cant tell.
But as I know, better to avoid hash join.
(PS: the color for hash join in SQL Explain is brown, I think DB2 is giving a message that this operation is bad.  )
Now, I have another question.
Once DB2 have filtered records and joined the tables with indexes scan, before return the result, I think DB2 still need to access the tables (consider not all data is in the index) to retrieve the corresponding records' columns, RIGHT?
Could this access be called as 'table scan'?
If this is true, could I say that table space is unavoidable?
(Why I am asking such a question is because my project has to follow a SQL coding standard which says that all table scan must eliminate in SQL Explain. But I think table scan is unavoidable, DB2 dont care about this is a table scan or index scan as long as this operation is faster, then DB2 will use it.)
|
Last edited by wilsonfv; 10-27-09 at 12:27.
|

10-27-09, 12:43
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
wilsonfv, If an Index is used, you don't have a table space scan. If access to the table is needed, the Index contains the RID of the row and gets the page the row is on.
You only have a table space scan when an index on that table is not used.
By the way, while it is a good practice to eliminate as many table space scans as possible, there are times when you can't (and shouldn't) eliminate them.
And example of a can't is a table with a large number of rows and you are going to return a very large percentage of the rows. In this case, DB2 can decide that using an index would be more work since most of the rows are being accessed anyway. (this is a very general statement and there are other factors involved. Just trying to keep it simple for the explanation.)
An example of a shouldn't is a very small table. If there is only one page that contains rows and there is an Index on this table (probably just to enforce uniqueness), it would require 2 or 3 get pages to use the Index (Index pages and table page) while a table space scan (of 1 page) only requires 1 get page.
|
|

10-27-09, 17:10
|
|
Registered User
|
|
Join Date: Feb 2009
Posts: 114
|
|
Quote:
|
Originally Posted by stolze
And I suggest that you run you workload through the Design Advisor.
|
Does BLINK use Design Advisor?
|
|

10-28-09, 08:14
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
Quote:
|
Originally Posted by db2dummy1
Does BLINK use Design Advisor?
|
No need for that. IBM Smart Analytics Optimizer (aka DWA aka BLINK) doesn't have indexes or MQTs or different tablespaces or ... You merely identify the tables and the references between those tables (references are comparable to foreign keys and define the join paths). You create the mart with those basic definitions, load the data and that's it.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|