If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > ORGANIZE BY DIMENSIONS performance problem

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-28-07, 08:48
Wim Wim is offline
Registered User
 
Join Date: Nov 2004
Posts: 1,279
ORGANIZE BY DIMENSIONS performance problem

Hi,

I have used the ORGANIZE BY DIMENSIONS option for the first time in our data warehouse.
Code:
:CREATE TABLE myTable ( 
   ID_COMPANY INTEGER NOT NULL, 
   YEAR_FISCAL SMALLINT NOT NULL, 
   .... ) 
ORGANIZE BY DIMENSIONS (YEAR_FISCAL) ;
With our current data warehouse set-up, all the OLTP production database tables are restored in the data warehouse database. In multiple steps those tables are processed and denormalised.

We already noticed considerable performance problems in our first step, due only to the introduction of the ORGANIZE BY DIMENSIONS option. The first step went from 2 hours to process all the tables to being FORCEd while still running after 24 hours.

Who has observed something similar? What can be done about it ?
__________________
With kind regards . . . . . SQL Server 2000/2005/2008/2008 R2 Earned beers: 16
Wim
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages
Reply With Quote
  #2 (permalink)  
Old 03-28-07, 10:09
dbamota dbamota is offline
Registered User
 
Join Date: Sep 2003
Posts: 237
My understanding is in MULTIdimesionalClustering, when you organize by(a,b,c) , for each unique abc comination you get a block of contiguous pages; when you query, values of a,b,c point you to the right block;every record in that block has the same value for a,b,c; while designing you have to choose a,b,c such that any combination of abc will have quite a few records to fill the block; in general MDCs are sparsely populated; they waste a lot of space; but make the queries fast; your processing time has probably increased because of this; how many records do you have for each fiscal-year? you will be better off with partitioned table(V9) or UNION ALL VIEW in db2 V8 since you really do not have MULTIdimensions.
__________________
mota
Reply With Quote
  #3 (permalink)  
Old 03-28-07, 19:03
samlightstone samlightstone is offline
Registered User
 
Join Date: Nov 2005
Posts: 4
You wern;t explcit but are the performance problems showing up in queryexecution time? If so, I'd recommend:

1. Make sure statistics are up to date foir all the tables
2. If the problem persists with updated statistics, what is your optiization level? Try increasing or decreasing it.
3. If the problem is still around, check the query EXPLAIN to see if there has been a radical query execution plan change.
4. The implication in your post is that you are clustering by year. Double check thatthe MDC tabkle size is only very slightly larger than it's non-MDC version. If the MDC version is much larger then you have too many MDC cels, and need to pick a caorser dimension for clustering.
Reply With Quote
  #4 (permalink)  
Old 03-29-07, 08:18
Wim Wim is offline
Registered User
 
Join Date: Nov 2004
Posts: 1,279
I don't want to create a real MULTI dimensional cluster. If I would combine it with another column, I would end up with 1 or a few records per block, speaking of a LARGE data warehouse. And most of the queries afterwards involve calculations grouped by year. That's why I made the decision to include only the YEAR in the dimension.

The performance problems occur early on, in the very first step of the transform phase, when we want to copy + denormalise the content from the original OLTP-tables (without ORGANIZE BY DIMENSIONS option) to the data warehouse tables (with ORGANIZE BY DIMENSIONS option).
It seems the overhead of determining the YEAR and assigning a free block for that record is the origin of the performance problem, I think. The first transform phase normally finishes within 2 hours, now we stopped it after 24 hours and it still was not finished.

The OLTP-tables contain up to 2 million rows.
__________________
With kind regards . . . . . SQL Server 2000/2005/2008/2008 R2 Earned beers: 16
Wim
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages
Reply With Quote
  #5 (permalink)  
Old 03-29-07, 08:41
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
How do you 'copy' ... LOAD or INSERTs ?

Sathyaram
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #6 (permalink)  
Old 03-29-07, 11:10
Wim Wim is offline
Registered User
 
Join Date: Nov 2004
Posts: 1,279
The data warehouse tables are populated with INSERT FROM the OLTP-tables.
__________________
With kind regards . . . . . SQL Server 2000/2005/2008/2008 R2 Earned beers: 16
Wim
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages
Reply With Quote
  #7 (permalink)  
Old 03-31-07, 12:24
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
You may want to try a LOAD from a cursor. LOAD writes directly to the data pages. It's much faster because you avoid all the overhead associated with searching for free space in pages, firing of triggers, etc.

Another alternative is to deactivate logging for the target tables. That should also improve performance significantly.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On