Results 1 to 7 of 7
  1. #1
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4

    Unanswered: ORGANIZE BY DIMENSIONS performance problem

    Hi,

    I have used the ORGANIZE BY DIMENSIONS option for the first time in our data warehouse.
    Code:
    :CREATE TABLE myTable ( 
       ID_COMPANY INTEGER NOT NULL, 
       YEAR_FISCAL SMALLINT NOT NULL, 
       .... ) 
    ORGANIZE BY DIMENSIONS (YEAR_FISCAL) ;
    With our current data warehouse set-up, all the OLTP production database tables are restored in the data warehouse database. In multiple steps those tables are processed and denormalised.

    We already noticed considerable performance problems in our first step, due only to the introduction of the ORGANIZE BY DIMENSIONS option. The first step went from 2 hours to process all the tables to being FORCEd while still running after 24 hours.

    Who has observed something similar? What can be done about it ?
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  2. #2
    Join Date
    Sep 2003
    Posts
    237
    My understanding is in MULTIdimesionalClustering, when you organize by(a,b,c) , for each unique abc comination you get a block of contiguous pages; when you query, values of a,b,c point you to the right block;every record in that block has the same value for a,b,c; while designing you have to choose a,b,c such that any combination of abc will have quite a few records to fill the block; in general MDCs are sparsely populated; they waste a lot of space; but make the queries fast; your processing time has probably increased because of this; how many records do you have for each fiscal-year? you will be better off with partitioned table(V9) or UNION ALL VIEW in db2 V8 since you really do not have MULTIdimensions.
    mota

  3. #3
    Join Date
    Nov 2005
    Posts
    4
    You wern;t explcit but are the performance problems showing up in queryexecution time? If so, I'd recommend:

    1. Make sure statistics are up to date foir all the tables
    2. If the problem persists with updated statistics, what is your optiization level? Try increasing or decreasing it.
    3. If the problem is still around, check the query EXPLAIN to see if there has been a radical query execution plan change.
    4. The implication in your post is that you are clustering by year. Double check thatthe MDC tabkle size is only very slightly larger than it's non-MDC version. If the MDC version is much larger then you have too many MDC cels, and need to pick a caorser dimension for clustering.

  4. #4
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    I don't want to create a real MULTI dimensional cluster. If I would combine it with another column, I would end up with 1 or a few records per block, speaking of a LARGE data warehouse. And most of the queries afterwards involve calculations grouped by year. That's why I made the decision to include only the YEAR in the dimension.

    The performance problems occur early on, in the very first step of the transform phase, when we want to copy + denormalise the content from the original OLTP-tables (without ORGANIZE BY DIMENSIONS option) to the data warehouse tables (with ORGANIZE BY DIMENSIONS option).
    It seems the overhead of determining the YEAR and assigning a free block for that record is the origin of the performance problem, I think. The first transform phase normally finishes within 2 hours, now we stopped it after 24 hours and it still was not finished.

    The OLTP-tables contain up to 2 million rows.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  5. #5
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    How do you 'copy' ... LOAD or INSERTs ?

    Sathyaram
    Visit the new-look IDUG Website , register to gain access to the excellent content.

  6. #6
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    The data warehouse tables are populated with INSERT FROM the OLTP-tables.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    You may want to try a LOAD from a cursor. LOAD writes directly to the data pages. It's much faster because you avoid all the overhead associated with searching for free space in pages, firing of triggers, etc.

    Another alternative is to deactivate logging for the target tables. That should also improve performance significantly.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •