Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Aug 2010
    Posts
    17

    Unanswered: How to speed up "Load"

    Hi All,

    I have following load script, which takes about 6 hours to load 200M rows. Can we optimize it?

    load from myTable.DAT of
    del modified by usedefaults nochardel keepblanks totalfreespace=0 coldel~ decpt. savecount
    1000000 messages load.log replace into myDB.myTable NONRECOVERABLE

    Any suggestion on load script or database buffer pool, DB config, Thanks in advance!

    Mike

  2. #2
    Join Date
    Aug 2008
    Location
    Toronto, Canada
    Posts
    2,367
    Take a look at:
    IBM DB2 9.5 Information Center for Linux, UNIX, and Windows


    If a table has indexes, some of the things you can do is to tune sortheap / sheapthres, increase tempsapce's bufferpool (in case db2 need to spill sort when building indexes), use alternate page cleaning, enable intra_parallel (not needed if using v9.5 + , check: IBM DB2 9.5 Information Center for Linux, UNIX, and Windows)


    The following white paper has a lot of good info. It was written for v8 but most of the info can still be used for v9.x :
    ftp://ftp.software.ibm.com/software/...loaderperf.pdf

  3. #3
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    why is it you are copying to another schema? are you making some change to the table structure which requires you to recreate the table? Are you going to update both of them to keep them in synch? What about using an alias to point to original table?
    Dave

  4. #4
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    need more info. how about the DDL of the new table: is there a cluster index? or a MDC definition? If so, is the inputfile sorted? how many indexes are defined on that table?
    6 hours for 200.000.000 rows sound reasonable for an intel platform (PC). Is that the case here?

    b.t.w. how long did the export take to create the inputfile?

  5. #5
    Join Date
    Aug 2010
    Posts
    17
    Here is the DDL for the table:
    CREATE TABLE "TEST "."TestTable" (
    "DOC_ID" BIGINT NOT NULL ,
    "INIT_ID1" CHAR(4) NOT NULL ,
    "INIT_ID2" DATE NOT NULL ,
    "INIT_ID3" SMALLINT NOT NULL ,
    "INIT_ID4" INTEGER NOT NULL ,
    ....

    "RECTTM" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP )
    IN "tbs1234" INDEX IN "tbs1234" ;

    CREATE INDEX "TEST "."INDEX1" ON "TEST "."TestTable"
    ("INIT_ID1" ASC,
    "INIT_ID2" ASC,
    "INIT_ID3" ASC,
    "INIT_ID4" ASC ) PCTFREE 0 ALLOW REVERSE SCANS;


    CREATE UNIQUE INDEX "TEST "."INDEXP" ON "TEST "."TestTable"
    ("DOC_ID" ASC) PCTFREE 0 CLUSTER ALLOW REVERSE SCANS;

    And I have following SQL for set integraty, which takes even more than 10 hours to completed. Any suggestion to improve the performance on this SQL?

    SET INTEGRITY FOR TEST.TestTable IMMEDIATE CHECKED NOT INCREMENTAL FOR EXCEPTION IN TEST.TestTable USE TEST.TestTableExcep

    Thanks any your help!

    Mike

  6. #6
    Join Date
    Oct 2004
    Location
    DELHI INDIA
    Posts
    338
    Hi,

    Saw the problem you are facing here. Having CLUSTERed index can always slow down the INDEX building process after the LOAD. But to know exactly what is going on, can you please help to enclose the following here:

    1> Reset all your monitors. Take snapshots during the beginning, middle and end of execution of your load command on DATABASE and BUFFERPOOL. Also mention us here the bufferpool to which the particular table in question is associated.

    2> Take a snapshot of your DBCFG parameters and post it.

    Looking into the above two, it can be easier to achieve a faster resolution here. I am assuming your operating system and hardware is healthy ( you have already ensured by looking into CPU & IO utilization).

    Jayanta Datta
    DB2 UDB DBA
    IBM India, Global Delivery
    New Delhi

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by JAYANTA_DATTA View Post
    Saw the problem you are facing here. Having CLUSTERed index can always slow down the INDEX building process after the LOAD.
    Can you explain why this is so?

    I believe that a clustered index specifies the order of rows in the associated table, not the order of the index. Index rows are always in exactly the correct sequence regardless of whether the index is defined as clustered. So I don't understand how having a clustered index makes the index build process any more time/resource consuming.

    But even if you were concerned about the table build process during a LOAD for a table with a a clustered index, DB2 does not try to maintain table row sequence anyway during a LOAD (unlike INSERT or IMPORT). Even when there is a clustering index during an INSERT or IMPORT, the exact order is not maintained, only the correct page is attempted, and only if there is enough room on that page (or a nearby page) for the new table row (does not do page splits like with index pages). Exact sequenceing of table rows (per the clustering index) happens during a reorg.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  8. #8
    Join Date
    Oct 2004
    Location
    DELHI INDIA
    Posts
    338
    Regarding CLUSTERED Index, yes, the negative impact is mainly discussed always when people are doing Bulk Insert, regarding INDEX building time, its experienced in few environments to drop it first, load the data and then recreate your clustered index with sufficient freespace ( >= 10% if possible) makes the load faster (for large tables). The point referred to clustered index in my last post was referring to this fact.
    Last edited by JAYANTA_DATTA; 12-19-10 at 00:33.

    Jayanta Datta
    DB2 UDB DBA
    IBM India, Global Delivery
    New Delhi

  9. #9
    Join Date
    Aug 2008
    Location
    Toronto, Canada
    Posts
    2,367
    Quote Originally Posted by mail4mz View Post
    Here is the DDL for the table:
    CREATE TABLE "TEST "."TestTable" (
    "DOC_ID" BIGINT NOT NULL ,
    "INIT_ID1" CHAR(4) NOT NULL ,
    "INIT_ID2" DATE NOT NULL ,
    "INIT_ID3" SMALLINT NOT NULL ,
    "INIT_ID4" INTEGER NOT NULL ,
    ....

    "RECTTM" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP )
    IN "tbs1234" INDEX IN "tbs1234" ;

    CREATE INDEX "TEST "."INDEX1" ON "TEST "."TestTable"
    ("INIT_ID1" ASC,
    "INIT_ID2" ASC,
    "INIT_ID3" ASC,
    "INIT_ID4" ASC ) PCTFREE 0 ALLOW REVERSE SCANS;


    CREATE UNIQUE INDEX "TEST "."INDEXP" ON "TEST "."TestTable"
    ("DOC_ID" ASC) PCTFREE 0 CLUSTER ALLOW REVERSE SCANS;

    And I have following SQL for set integraty, which takes even more than 10 hours to completed. Any suggestion to improve the performance on this SQL?

    SET INTEGRITY FOR TEST.TestTable IMMEDIATE CHECKED NOT INCREMENTAL FOR EXCEPTION IN TEST.TestTable USE TEST.TestTableExcep

    Thanks any your help!

    Mike


    Check if there are any warnings in the db2diag.log during load / index build execution (ie. messages about reducing sortheap). Does db2 perform a delete phase during this load operation? Delete can be slow. How long does it take to perform: load without the indexes and then creating the indexes?


    For the set integrity statement, if you're sure that that data is valid (ie. not violations), then you can try "immediate unchecked". See the warnings in the manual about this option before using it.

  10. #10
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by JAYANTA_DATTA View Post
    Regarding CLUSTERED Index, yes, the negative impact is mainly discussed always when people are doing Bulk Insert, regarding INDEX building time, its experienced in few environments to drop it first, load the data and then recreate your clustered index with sufficient freespace ( >= 10% if possible) makes the load faster (for large tables). The point referred to clustered index in my last post was referring to this fact.
    I thought we are talking about LOAD command, and not sure what you mean by "Bulk Insert".

    I still do not understand what a clustered index has to do with it. Can you explain this?

    Yes, it is faster to do a LOAD without any indexes (regardless of whether any of the indexes are clustered).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  11. #11
    Join Date
    Oct 2004
    Location
    DELHI INDIA
    Posts
    338
    Hi M.A.,

    Some quote from DB2 library, bulk insert means inserts of million of records in a single UOW. The context discussed here after its observed an Clustered Index with 0 PCTFREE.

    "If you have clustered index, the performance of the insert will be very negative impact, this is not surprising, because the purpose of using the clustered index is inserted through the extra work to do to improve the query (ie select) performance.If you do need a clustered index, you can be sure there is enough free space to minimize the impact on the insert: Use ALTER TABLE to increase PCTFREE, and then use the REORG reserved free space.However, if you allow too much free space exists, it may lead to additional queries need to read the page, this fact greatly in violation of the intention of using the clustered index.Another option is to remove the bulk insert clustered index before, and then re-create the clustered index, and perhaps this is the best method (the cost of creating a cluster index with the cost of creating the index is almost routine, not great, justinsert extra overhead). "

    I will be more interested to see the output of snapshots and configuration parameter during the Load to really find any potential bottleneck.

    Jayanta Datta
    DB2 UDB DBA
    IBM India, Global Delivery
    New Delhi

  12. #12
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by JAYANTA_DATTA View Post
    Hi M.A.,

    Some quote from DB2 library, bulk insert means inserts of million of records in a single UOW. The context discussed here after its observed an Clustered Index with 0 PCTFREE.

    "If you have clustered index, the performance of the insert will be very negative impact, this is not surprising, because the purpose of using the clustered index is inserted through the extra work to do to improve the query (ie select) performance.If you do need a clustered index, you can be sure there is enough free space to minimize the impact on the insert: Use ALTER TABLE to increase PCTFREE, and then use the REORG reserved free space.However, if you allow too much free space exists, it may lead to additional queries need to read the page, this fact greatly in violation of the intention of using the clustered index.Another option is to remove the bulk insert clustered index before, and then re-create the clustered index, and perhaps this is the best method (the cost of creating a cluster index with the cost of creating the index is almost routine, not great, justinsert extra overhead). "

    I will be more interested to see the output of snapshots and configuration parameter during the Load to really find any potential bottleneck.
    This discussion is about the LOAD command, not bulk inserts in a single UOW.

    If you read my first post in this thread carefully, you will see that I explained how a clustered index affects where the table row is placed for INSERTs or with the IMPORT command, but a clustered index does not affect LOADs.

    So the conclusion is (unless you provide further evidence) that having a clustered index (versus a non-clustered index) has no impact on performance of a LOAD. Obviously if there are no indexes, a LOAD will run faster because no Index Build Phase is required. However some indexes could be faster than others if the new rows loaded have higher index values than the old index values and if they are loaded in sequence (such as an index on current_timestamp), but this has nothing to do with clustered status since that affects table row order (which is ignored by LOAD command) and not index row order (indexes are always maintained by DB2 in the correct sequence, although for a LOAD it is done during Index Build Phase and not as the rows are loaded).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  13. #13
    Join Date
    Oct 2004
    Location
    DELHI INDIA
    Posts
    338
    whatever you have mentioned in your conclusion is quite good and there should n't be any double opinion.

    About slowness while loading with clustered index, its not generic or nowhere documented, for bulk-insert it's a fact. In few of the environments we have experienced similar situation slowness with load while having clustered index, and faster througput without them ( I have mentioned above about the * environment * part). So, I specifically asked him the snapshots and parameter details in my first post itself so that we can do some quick comparison and could come to some kind of conclusion.

    I hope, we are not mixing up bulk-insert with load anymore (which was brought up while discussing clustering). Lets see some input from the thread requester here so that we can get into some detail of the problem here. It's going to help many other people around as this is one of the most common issues people face while doing LOAD with longer duration of execution.

    Jayanta Datta
    DB2 UDB DBA
    IBM India, Global Delivery
    New Delhi

  14. #14
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    When you have a cluster-index, you should consider a PCTFREE percentage greater than 0. Not only on the INDEX but also on the table itself to reserve space. Also, as I mentioned before: if the input is sorted on the same key-value as the cluster, the LOAD does not have to sort. If not.......

  15. #15
    Join Date
    Aug 2010
    Posts
    17
    Hi All,

    Here is the log of the LOAD command. The load took about 7 hours, the rebuild index took about 2 hours. Any idea to speed it up?

    Thanks

    Mike

    select current timestamp AS Start_Time from sysibm.sysdummy1

    START_TIME
    --------------------------
    2010-12-19-09.00.25.510050

    1 record(s) selected.


    load from /temp/TestTable.DAT of del modified by usedefaults no
    chardel keepblanks totalfreespace=0 coldel~ decpt. savecount 1000000 messages /temp/load.log replace into test.testTable NONRECOVERABLE

    Number of rows read = 167829642
    Number of rows skipped = 0
    Number of rows loaded = 167829642
    Number of rows rejected = 0
    Number of rows deleted = 1997
    Number of rows committed = 167829642


    select current timestamp AS End_Time from sysibm.sysdummy1

    END_TIME
    --------------------------
    2010-12-19-18.40.13.335969

    1 record(s) selected.
    SQL3500W The utility is beginning the "LOAD" phase at time "12/19/2010
    09:00:30.793157".

    SQL3519W Begin Load Consistency Point. Input record count = "0".

    SQL3520W Load Consistency Point was successful.
    "PGIRA36L.log" [Read only] 739 lines, 22921 characters

    SQL3520W Load Consistency Point was successful.

    SQL3519W Begin Load Consistency Point. Input record count = "1000834".

    SQL3520W Load Consistency Point was successful.

    SQL3519W Begin Load Consistency Point. Input record count = "2001736".

    ....

    SQL3500W The utility is beginning the "BUILD" phase at time "12/19/2010
    16:29:41.928483".

    SQL3213I The indexing mode is "REBUILD".

    SQL3515W The utility has finished the "BUILD" phase at time "12/19/2010
    18:38:16.842199".

    SQL3500W The utility is beginning the "DELETE" phase at time "12/19/2010
    18:38:17.582209".

    SQL3509W The utility has deleted "1997" rows from the table.

    SQL3515W The utility has finished the "DELETE" phase at time "12/19/2010
    18:40:03.153953".


    Number of rows read = 167829642
    Number of rows skipped = 0
    Number of rows loaded = 167829642
    Number of rows rejected = 0
    Number of rows deleted = 1997

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •