#1
Registered User
 
Join Date: Mar 2003
Posts: 69
db2 delete -- Interesting Question

Hi All,
I have a very interesting question. I have a table with around 50 million records, of which about 30 million are deletable. My DELETE against that table always fails with a timeout, and no records get deleted at all. So I wrote a Visual Basic program that runs at night and deletes 10 thousand records per statement (I cannot run it during the day in production). This way I can delete some 50-80 thousand records in a whole night, but the next day another 40 to 60 thousand records get added. What is the best way to go about this deletion?

- Thanks,
Anto
#2
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 5,399
Re: db2 delete -- Interesting Question

Quote:
Originally posted by antodomnic
I have a table with around 50 million records, of which about 30 million are deletable. My DELETE against that table always fails with a timeout, and no records get deleted at all.
Some options that come to mind:

- define the table as NOT LOGGED INITIALLY. Deactivate logging before you start deleting records;
- export the records you would like to keep; then LOAD ... REPLACE them back into the table;
- drop indexes on the table before deleting records; re-create indexes when you're done;

Also, if you have any ON DELETE CASCADE constraints on that table removing them will greatly improve performance.
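A rough sketch of what the first two options might look like from the DB2 CLP (table, column and file names here are invented for illustration; check the syntax against your DB2 version and test on a non-production copy first):

```sql
-- Option 1: suspend logging for the unit of work. This only works if the
-- table was created with (or altered to have) the NOT LOGGED INITIALLY
-- attribute, and the ALTER and DELETE must be in the same unit of work.
-- Caution: if the transaction fails, the table can be left unusable.
ALTER TABLE myschema.big_table ACTIVATE NOT LOGGED INITIALLY;
DELETE FROM myschema.big_table WHERE status = 'OLD';
COMMIT;

-- Option 2: export the ~20 million rows you want to keep, then replace
-- the table contents with just those rows.
EXPORT TO keep.ixf OF IXF
  SELECT * FROM myschema.big_table WHERE status <> 'OLD';
LOAD FROM keep.ixf OF IXF REPLACE INTO myschema.big_table;
```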
#3
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,649
Re: db2 delete -- Interesting Question

IMHO, n_i's second option is the best for your scenario ... If you are deleting 60% of a 50 million row table, there will be a huge number of empty or partially filled pages, which will be a big hit on performance ...

It depends on the referential constraints you have on the table ... If you don't have any, my option would be to create another table with a similar structure, export the 20 million records you want to keep and load them into the new table, then test the data ... When you are happy, drop the original table and rename the new table ... You might need to re-create aliases, views, triggers, etc. ... This will be easy to accomplish, and post-change performance will also be good ...
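That copy-and-swap approach might look roughly like this in the CLP (all names are placeholders; note that CREATE TABLE ... LIKE does not copy indexes or constraints, which is what you want here until the swap is done):

```sql
-- Build a structurally similar empty table
CREATE TABLE myschema.big_table_new LIKE myschema.big_table;

-- Export the survivors and load them into the new table
EXPORT TO keep.ixf OF IXF
  SELECT * FROM myschema.big_table WHERE keep_flag = 'Y';
LOAD FROM keep.ixf OF IXF INSERT INTO myschema.big_table_new;

-- After verifying the data, swap the tables (both in the same schema)
RENAME TABLE myschema.big_table TO big_table_old;
RENAME TABLE myschema.big_table_new TO big_table;
-- then re-create indexes, views, triggers, aliases, etc. on the new table
```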

BTW, 50-80K records in a night (how many hours?) seems to be a small number ...

Alternatively, you can write a stored procedure to delete the records, committing, maybe, every 10000 records ... Here's a sample :
http://dbforums.com/showthread.php?t...17#post1715617
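A minimal SQL procedure along those lines might look like this (table name and WHERE predicate are placeholders, and you would run it from the CLP with a non-default statement terminator, e.g. db2 -td@). It deletes in chunks of 10,000 and commits after each chunk, so locks and log usage stay small:

```sql
CREATE PROCEDURE myschema.purge_old_rows ()
LANGUAGE SQL
BEGIN
  DECLARE v_deleted INTEGER DEFAULT 1;
  WHILE v_deleted > 0 DO
    -- delete at most 10,000 qualifying rows per iteration
    DELETE FROM (SELECT 1 FROM myschema.big_table
                  WHERE status = 'OLD'
                  FETCH FIRST 10000 ROWS ONLY);
    GET DIAGNOSTICS v_deleted = ROW_COUNT;
    COMMIT;
  END WHILE;
END
```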

Cheers

Sathyaram



Quote:
Originally posted by n_i
Some options that come to mind:

- define the table as NOT LOGGED INITIALLY. Deactivate logging before you start deleting records;
- export the records you would like to keep; then LOAD ... REPLACE them back into the table;
- drop indexes on the table before deleting records; re-create indexes when you're done;

Also, if you have any ON DELETE CASCADE constraints on that table removing them will greatly improve performance.
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
#4
Registered User
 
Join Date: Mar 2003
Posts: 343
Alternatively, create another table with the same partitioning key, but without any constraints or indexes (including the PK), in the same schema, and defined as NOT LOGGED INITIALLY. In your session, activate NOT LOGGED INITIALLY and insert into the new table the rows you want to keep. Since this is a collocated, not-logged, non-constrained insert, it should be very fast. Then rename the current table to, say, _old, and the _new table to the current name, and follow Sathyaram's other instructions. For the renames, ensure that the tables are in the same schema.
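A sketch of that not-logged copy (names invented; in a partitioned database you may also need to specify the tablespace and partitioning key explicitly rather than relying on LIKE, and remember that an NLI failure can leave the new table unusable):

```sql
-- New table: same structure, no indexes/constraints, not logged initially
CREATE TABLE myschema.big_table_new LIKE myschema.big_table
  NOT LOGGED INITIALLY;

-- In the same unit of work, activate NLI and copy the rows to keep
ALTER TABLE myschema.big_table_new ACTIVATE NOT LOGGED INITIALLY;
INSERT INTO myschema.big_table_new
  SELECT * FROM myschema.big_table WHERE keep_flag = 'Y';
COMMIT;

-- Swap: keep the original around for verification
RENAME TABLE myschema.big_table TO big_table_old;
RENAME TABLE myschema.big_table_new TO big_table;
```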

We have done this and moved hundreds of millions of rows in 2-5 hours. It also reorgs the table. You get to keep the original for verification, and you do not have to wait for the export.

Hope this helps
#5
Registered User
 
Join Date: Jun 2003
Location: Canada
Posts: 37
Re: db2 delete -- Interesting Question

If you want to avoid having to run a separate RUNSTATS, collect some inline stats from LOAD/REORG. This tends to be faster than running RUNSTATS as an add-on job.
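For example, on the LOAD side the inline statistics collection might look like this (v7/v8-era syntax, and note it requires the REPLACE option; table and file names are placeholders):

```sql
-- Replace the table contents and gather table + index statistics
-- in the same pass, avoiding a separate RUNSTATS job afterwards
LOAD FROM keep.ixf OF IXF REPLACE INTO myschema.big_table
  STATISTICS YES WITH DISTRIBUTION AND DETAILED INDEXES ALL;
```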

Cheers,

Julius
#6
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,649
Re: db2 delete -- Interesting Question

Remember, collecting statistics is possible only if you use the REPLACE option ...

Also, you can create indexes on the table before loading the data (if you are using LOAD) ... This will perform better than adding indexes after the LOAD

Cheers

Sathyaram


Quote:
Originally posted by jsasvari
If you want to avoid having to run a separate RUNSTATS, collect some inline stats from LOAD/REORG. This tends to be faster than running RUNSTATS as an add-on job.

Cheers,

Julius
#7
Registered User
 
Join Date: Mar 2003
Posts: 343
If this is a large table, I disagree that indexes should be built before loading the data. It is true that the index is physically created after the load, but what I have noticed is that the sorting required to build the index takes place while the load is in progress. And for a large table with multiple indexes, it is a significant overhead. One example: we have a table which currently has 3.8 billion rows across 34 logical nodes. In September 2002 it had approx. 2.3 billion rows, and we needed to migrate the rows over to a new table to reorg it. I thought it would be good to keep the PK on the new table for the reasons mentioned above. It took extremely long to load (on the order of days). Following some issues in May we had to export and load the whole DW - at that time this table had about 3.4 billion rows. This time we dropped all indexes, including the PK, and the table took between 6 and 8 hours to load. I subsequently learnt that it was the index sort which affects the performance.

Hope this helps.
#8
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,649
I'm quite surprised that the sort for index building 'starts' before the load phase completes ... (though I have never checked whether there is a sort during the LOAD phase) Maybe that is how it behaves for loads of the size you mention ... I have used LOAD for a table with about 15 million records, and I observed that using LOAD on a table with indexes was more efficient than LOAD followed by index creation ...

As you will know, index building in either case (in the LOAD or in CREATE INDEX) involves sorting the data ...

Cheers

Sathyaram


Quote:
Originally posted by cchattoraj
If this is a large table, I disagree that indexes should be built before loading the data. It is true that the index is physically created after the load, but what I have noticed is that the sorting required to build the index takes place while the load is in progress. And for a large table with multiple indexes, it is a significant overhead. One example: we have a table which currently has 3.8 billion rows across 34 logical nodes. In September 2002 it had approx. 2.3 billion rows, and we needed to migrate the rows over to a new table to reorg it. I thought it would be good to keep the PK on the new table for the reasons mentioned above. It took extremely long to load (on the order of days). Following some issues in May we had to export and load the whole DW - at that time this table had about 3.4 billion rows. This time we dropped all indexes, including the PK, and the table took between 6 and 8 hours to load. I subsequently learnt that it was the index sort which affects the performance.

Hope this helps.
#9
Registered User
 
Join Date: Mar 2003
Posts: 343
It sorts while the load takes place - I agree that while loading 15 million rows it may not be a significant overhead (we do that every day), especially if the load is the only process running at the time.