Results 1 to 5 of 5
  1. #1
    Join Date
    Nov 2007
    Posts
    72

    Unanswered: insert slow in DPF environment

    Hi ,

    i created a table in tablespace spread across 24 partitions with range partitioning done only for the month of january in that tablespacce i am getting a speed of 1000 records / per minute .before that i created the same table on node num 0 that time i was getting speed of 1 million records/per minute ..i am inserting 400 million records for 1st january 2nd and 3rd.

    i am calling a stored procedure to insert records .Please tell me how i can speed up the process..here is my table ddl .

    Create table Maxrecon.Test_load (subscriber_arrangement_id Bigint, Start_date date, Event_id smallint, COl1 smallint, Col2 Smallint) DISTRIBUTE BY HASH("SUBSCRIBER_ARRANGEMENT_ID") PARTITION BY RANGE("START_DATE") (PART "JANUARY10" STARTING('2010-01-01') ENDING('2010-01-31') IN "MAXTEMP") ORGANIZE BY ( START_DATE,EVENT_ID)

    Regards,
    Saurabh Narula

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    The main problem is probably that you are not using the Load Command, which is important with DPF to get maximum performance (parallel bulk loads). If you could do a load from cursor in the stored procedure, I would try that, otherwise probably try to format a flat file to be loaded using the Load command.

    I notice that you are using the following types of partitioning:

    • DPF (DISTRIBUTE BY HASH)
    • Table Partitioning (PARTITION BY RANGE)
    • MDC (ORGANIZE BY)


    I can't see your other Keys and Indexes, but if you want partitioned indexes, the Table Partitioning column must be included in all unique indexes (PK's, etc).

    Do you really need the MDC? My gut feel is that you have way too much data for each date to make that part of a MDC, and since you are already doing Table Partitioning by Date, what exactly is the point of the MDC?.

    Instead, I would create a PK of (Start_date, subscriber_arrangement_id) in that order (you need start_date in the PK to have partitioned indexes). Then create another index on Event_id (if it has sufficient cardinality), and a third (non-unique) index on subscriber_arrangement_id (cannot be unique if you want partitioned indexes). Forget about MDC for this table.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Aug 2008
    Location
    Toronto, Canada
    Posts
    2,367
    If you can't use load, maybe import with buffered inserts? Take a look at the insert related slides at the end of this presentation:
    http://public.dhe.ibm.com/software/d...203-slides.pdf

  4. #4
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote Originally Posted by BELLO4KA View Post
    If you can't use load, maybe import with buffered inserts? Take a look at the insert related slides at the end of this presentation:
    http://public.dhe.ibm.com/software/d...203-slides.pdf
    Nice oopsie in that presentation on slide #22.
    Like Canada belgium (Belgium - Wikipedia, the free encyclopedia) does not have its own language. Unlike Canada this is a very sensible and painful issue causing a lot of political trouble for decades now So IBM: I would block that presentation for belguim IP numbers

  5. #5
    Join Date
    Oct 2004
    Location
    DELHI INDIA
    Posts
    338
    Saurav,
    you are giving a comparison figure between the INSERTs on the table on Node-0 with inserting into the same table on multi-node. Did you try to see similar difference in the throughput rate while you were trying to LOAD a data file into the same table (partitioned and then to the non-partitioned one) ? If you observe that even LOAD command is also running too slow on the partitioned table compared to the non-partitioned one, it means first the bottleneck in your DPF set-up (related to the table) needs to be fixed. Its possible that the single partition table is on a different tablespace and bufferpool where only that table exists, whereas the partitioned table is on a Bufferpool which is being used by multiple tables at the sametime etc. Other stuffs which you need to check is whether your data is evenly distributed across all partition, or its skewed on a partitcular node due to wrong selection of the partitioning key, similarly check for page-cleaners and related values like FCM, container distribution across file-system etc. Let us know with your input so that others can help you on this with further suggestion after understanding the real situation here.
    Last edited by JAYANTA_DATTA; 12-27-10 at 14:58. Reason: typo

    Jayanta Datta
    DB2 UDB DBA
    IBM India, Global Delivery
    New Delhi

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •