| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |
|

12-12-10, 00:13
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 17
|
|
|
How to speed up "Load"
|
|
Hi All,
I have following load script, which takes about 6 hours to load 200M rows. Can we optimize it?
load from myTable.DAT of
del modified by usedefaults nochardel keepblanks totalfreespace=0 coldel~ decpt. savecount
1000000 messages load.log replace into myDB.myTable NONRECOVERABLE
Any suggestion on load script or database buffer pool, DB config, Thanks in advance!
Mike
|
|

12-12-10, 11:06
|
|
∞∞∞∞∞∞
|
|
Join Date: Aug 2008
Location: Toronto, Canada
Posts: 1,816
|
|
|
|

12-13-10, 09:53
|
|
Registered User
|
|
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
|
|
|
|
why is it you are copying to another schema? are you making some change to the table structure which requires you to recreate the table? Are you going to update both of them to keep them in synch? What about using an alias to point to original table?
Dave
|
|

12-14-10, 06:18
|
|
Registered User
|
|
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
|
|
need more info. how about the DDL of the new table: is there a cluster index? or a MDC definition? If so, is the inputfile sorted? how many indexes are defined on that table?
6 hours for 200.000.000 rows sound reasonable for an intel platform (PC). Is that the case here?
b.t.w. how long did the export take to create the inputfile?
|
|

12-17-10, 11:05
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 17
|
|
Here is the DDL for the table:
CREATE TABLE "TEST "."TestTable" (
"DOC_ID" BIGINT NOT NULL ,
"INIT_ID1" CHAR(4) NOT NULL ,
"INIT_ID2" DATE NOT NULL ,
"INIT_ID3" SMALLINT NOT NULL ,
"INIT_ID4" INTEGER NOT NULL ,
....
"RECTTM" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP )
IN "tbs1234" INDEX IN "tbs1234" ;
CREATE INDEX "TEST "."INDEX1" ON "TEST "."TestTable"
("INIT_ID1" ASC,
"INIT_ID2" ASC,
"INIT_ID3" ASC,
"INIT_ID4" ASC ) PCTFREE 0 ALLOW REVERSE SCANS;
CREATE UNIQUE INDEX "TEST "."INDEXP" ON "TEST "."TestTable"
("DOC_ID" ASC) PCTFREE 0 CLUSTER ALLOW REVERSE SCANS;
And I have following SQL for set integraty, which takes even more than 10 hours to completed. Any suggestion to improve the performance on this SQL?
SET INTEGRITY FOR TEST.TestTable IMMEDIATE CHECKED NOT INCREMENTAL FOR EXCEPTION IN TEST.TestTable USE TEST.TestTableExcep
Thanks any your help!
Mike
|
|

12-18-10, 09:10
|
|
Registered User
|
|
Join Date: Oct 2004
Location: DELHI INDIA
Posts: 336
|
|
Hi,
Saw the problem you are facing here. Having CLUSTERed index can always slow down the INDEX building process after the LOAD. But to know exactly what is going on, can you please help to enclose the following here:
1> Reset all your monitors. Take snapshots during the beginning, middle and end of execution of your load command on DATABASE and BUFFERPOOL. Also mention us here the bufferpool to which the particular table in question is associated.
2> Take a snapshot of your DBCFG parameters and post it.
Looking into the above two, it can be easier to achieve a faster resolution here. I am assuming your operating system and hardware is healthy ( you have already ensured by looking into CPU & IO utilization).
__________________
Jayanta Datta
DB2 UDB DBA
IBM India, Global Delivery
New Delhi
|
|

12-18-10, 10:41
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,198
|
|
Quote:
Originally Posted by JAYANTA_DATTA
Saw the problem you are facing here. Having CLUSTERed index can always slow down the INDEX building process after the LOAD.
|
Can you explain why this is so?
I believe that a clustered index specifies the order of rows in the associated table, not the order of the index. Index rows are always in exactly the correct sequence regardless of whether the index is defined as clustered. So I don't understand how having a clustered index makes the index build process any more time/resource consuming.
But even if you were concerned about the table build process during a LOAD for a table with a a clustered index, DB2 does not try to maintain table row sequence anyway during a LOAD (unlike INSERT or IMPORT). Even when there is a clustering index during an INSERT or IMPORT, the exact order is not maintained, only the correct page is attempted, and only if there is enough room on that page (or a nearby page) for the new table row (does not do page splits like with index pages). Exact sequenceing of table rows (per the clustering index) happens during a reorg.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
|
|

12-18-10, 14:24
|
|
Registered User
|
|
Join Date: Oct 2004
Location: DELHI INDIA
Posts: 336
|
|
Regarding CLUSTERED Index, yes, the negative impact is mainly discussed always when people are doing Bulk Insert, regarding INDEX building time, its experienced in few environments to drop it first, load the data and then recreate your clustered index with sufficient freespace ( >= 10% if possible) makes the load faster (for large tables). The point referred to clustered index in my last post was referring to this fact.
__________________
Jayanta Datta
DB2 UDB DBA
IBM India, Global Delivery
New Delhi
|
Last edited by JAYANTA_DATTA; 12-18-10 at 23:33.
|

12-18-10, 21:42
|
|
∞∞∞∞∞∞
|
|
Join Date: Aug 2008
Location: Toronto, Canada
Posts: 1,816
|
|
Quote:
Originally Posted by mail4mz
Here is the DDL for the table:
CREATE TABLE "TEST "."TestTable" (
"DOC_ID" BIGINT NOT NULL ,
"INIT_ID1" CHAR(4) NOT NULL ,
"INIT_ID2" DATE NOT NULL ,
"INIT_ID3" SMALLINT NOT NULL ,
"INIT_ID4" INTEGER NOT NULL ,
....
"RECTTM" TIMESTAMP NOT NULL WITH DEFAULT CURRENT TIMESTAMP )
IN "tbs1234" INDEX IN "tbs1234" ;
CREATE INDEX "TEST "."INDEX1" ON "TEST "."TestTable"
("INIT_ID1" ASC,
"INIT_ID2" ASC,
"INIT_ID3" ASC,
"INIT_ID4" ASC ) PCTFREE 0 ALLOW REVERSE SCANS;
CREATE UNIQUE INDEX "TEST "."INDEXP" ON "TEST "."TestTable"
("DOC_ID" ASC) PCTFREE 0 CLUSTER ALLOW REVERSE SCANS;
And I have following SQL for set integraty, which takes even more than 10 hours to completed. Any suggestion to improve the performance on this SQL?
SET INTEGRITY FOR TEST.TestTable IMMEDIATE CHECKED NOT INCREMENTAL FOR EXCEPTION IN TEST.TestTable USE TEST.TestTableExcep
Thanks any your help!
Mike
|
Check if there are any warnings in the db2diag.log during load / index build execution (ie. messages about reducing sortheap). Does db2 perform a delete phase during this load operation? Delete can be slow. How long does it take to perform: load without the indexes and then creating the indexes?
For the set integrity statement, if you're sure that that data is valid (ie. not violations), then you can try "immediate unchecked". See the warnings in the manual about this option before using it.
|
|

12-18-10, 23:39
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,198
|
|
Quote:
Originally Posted by JAYANTA_DATTA
Regarding CLUSTERED Index, yes, the negative impact is mainly discussed always when people are doing Bulk Insert, regarding INDEX building time, its experienced in few environments to drop it first, load the data and then recreate your clustered index with sufficient freespace ( >= 10% if possible) makes the load faster (for large tables). The point referred to clustered index in my last post was referring to this fact.
|
I thought we are talking about LOAD command, and not sure what you mean by "Bulk Insert".
I still do not understand what a clustered index has to do with it. Can you explain this?
Yes, it is faster to do a LOAD without any indexes (regardless of whether any of the indexes are clustered).
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
|
|

12-19-10, 06:08
|
|
Registered User
|
|
Join Date: Oct 2004
Location: DELHI INDIA
Posts: 336
|
|
Hi M.A.,
Some quote from DB2 library, bulk insert means inserts of million of records in a single UOW. The context discussed here after its observed an Clustered Index with 0 PCTFREE.
"If you have clustered index, the performance of the insert will be very negative impact, this is not surprising, because the purpose of using the clustered index is inserted through the extra work to do to improve the query (ie select) performance.If you do need a clustered index, you can be sure there is enough free space to minimize the impact on the insert: Use ALTER TABLE to increase PCTFREE, and then use the REORG reserved free space.However, if you allow too much free space exists, it may lead to additional queries need to read the page, this fact greatly in violation of the intention of using the clustered index.Another option is to remove the bulk insert clustered index before, and then re-create the clustered index, and perhaps this is the best method (the cost of creating a cluster index with the cost of creating the index is almost routine, not great, justinsert extra overhead). "
I will be more interested to see the output of snapshots and configuration parameter during the Load to really find any potential bottleneck.
__________________
Jayanta Datta
DB2 UDB DBA
IBM India, Global Delivery
New Delhi
|
|

12-19-10, 09:49
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,198
|
|
Quote:
Originally Posted by JAYANTA_DATTA
Hi M.A.,
Some quote from DB2 library, bulk insert means inserts of million of records in a single UOW. The context discussed here after its observed an Clustered Index with 0 PCTFREE.
"If you have clustered index, the performance of the insert will be very negative impact, this is not surprising, because the purpose of using the clustered index is inserted through the extra work to do to improve the query (ie select) performance.If you do need a clustered index, you can be sure there is enough free space to minimize the impact on the insert: Use ALTER TABLE to increase PCTFREE, and then use the REORG reserved free space.However, if you allow too much free space exists, it may lead to additional queries need to read the page, this fact greatly in violation of the intention of using the clustered index.Another option is to remove the bulk insert clustered index before, and then re-create the clustered index, and perhaps this is the best method (the cost of creating a cluster index with the cost of creating the index is almost routine, not great, justinsert extra overhead). "
I will be more interested to see the output of snapshots and configuration parameter during the Load to really find any potential bottleneck.
|
This discussion is about the LOAD command, not bulk inserts in a single UOW.
If you read my first post in this thread carefully, you will see that I explained how a clustered index affects where the table row is placed for INSERTs or with the IMPORT command, but a clustered index does not affect LOADs.
So the conclusion is (unless you provide further evidence) that having a clustered index (versus a non-clustered index) has no impact on performance of a LOAD. Obviously if there are no indexes, a LOAD will run faster because no Index Build Phase is required. However some indexes could be faster than others if the new rows loaded have higher index values than the old index values and if they are loaded in sequence (such as an index on current_timestamp), but this has nothing to do with clustered status since that affects table row order (which is ignored by LOAD command) and not index row order (indexes are always maintained by DB2 in the correct sequence, although for a LOAD it is done during Index Build Phase and not as the rows are loaded).
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
|
|

12-19-10, 10:43
|
|
Registered User
|
|
Join Date: Oct 2004
Location: DELHI INDIA
Posts: 336
|
|
whatever you have mentioned in your conclusion is quite good and there should n't be any double opinion.
About slowness while loading with clustered index, its not generic or nowhere documented, for bulk-insert it's a fact. In few of the environments we have experienced similar situation slowness with load while having clustered index, and faster througput without them ( I have mentioned above about the * environment * part). So, I specifically asked him the snapshots and parameter details in my first post itself so that we can do some quick comparison and could come to some kind of conclusion.
I hope, we are not mixing up bulk-insert with load anymore (which was brought up while discussing clustering). Lets see some input from the thread requester here so that we can get into some detail of the problem here. It's going to help many other people around as this is one of the most common issues people face while doing LOAD with longer duration of execution.
__________________
Jayanta Datta
DB2 UDB DBA
IBM India, Global Delivery
New Delhi
|
|

12-20-10, 07:39
|
|
Registered User
|
|
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
|
|
When you have a cluster-index, you should consider a PCTFREE percentage greater than 0. Not only on the INDEX but also on the table itself to reserve space. Also, as I mentioned before: if the input is sorted on the same key-value as the cluster, the LOAD does not have to sort. If not.......
|
|

12-21-10, 14:51
|
|
Registered User
|
|
Join Date: Aug 2010
Posts: 17
|
|
Hi All,
Here is the log of the LOAD command. The load took about 7 hours, the rebuild index took about 2 hours. Any idea to speed it up?
Thanks
Mike
select current timestamp AS Start_Time from sysibm.sysdummy1
START_TIME
--------------------------
2010-12-19-09.00.25.510050
1 record(s) selected.
load from /temp/TestTable.DAT of del modified by usedefaults no
chardel keepblanks totalfreespace=0 coldel~ decpt. savecount 1000000 messages /temp/load.log replace into test.testTable NONRECOVERABLE
Number of rows read = 167829642
Number of rows skipped = 0
Number of rows loaded = 167829642
Number of rows rejected = 0
Number of rows deleted = 1997
Number of rows committed = 167829642
select current timestamp AS End_Time from sysibm.sysdummy1
END_TIME
--------------------------
2010-12-19-18.40.13.335969
1 record(s) selected.
SQL3500W The utility is beginning the "LOAD" phase at time "12/19/2010
09:00:30.793157".
SQL3519W Begin Load Consistency Point. Input record count = "0".
SQL3520W Load Consistency Point was successful.
"PGIRA36L.log" [Read only] 739 lines, 22921 characters
SQL3520W Load Consistency Point was successful.
SQL3519W Begin Load Consistency Point. Input record count = "1000834".
SQL3520W Load Consistency Point was successful.
SQL3519W Begin Load Consistency Point. Input record count = "2001736".
....
SQL3500W The utility is beginning the "BUILD" phase at time "12/19/2010
16:29:41.928483".
SQL3213I The indexing mode is "REBUILD".
SQL3515W The utility has finished the "BUILD" phase at time "12/19/2010
18:38:16.842199".
SQL3500W The utility is beginning the "DELETE" phase at time "12/19/2010
18:38:17.582209".
SQL3509W The utility has deleted "1997" rows from the table.
SQL3515W The utility has finished the "DELETE" phase at time "12/19/2010
18:40:03.153953".
Number of rows read = 167829642
Number of rows skipped = 0
Number of rows loaded = 167829642
Number of rows rejected = 0
Number of rows deleted = 1997
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|