If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Chunked deletes?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 02-06-11, 16:08
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
Chunked deletes?

Hi,
I'm trying to implement chunking in one of our delete statements and am currently stuck. Here's what my tables look like:

Table A
id....name


Table A1.......................TableA2.................. ..TableA3
id......name.................. id.....name................id......name


So there are things happening over the course of a process which delete rows from Tables A1,A2,A3. And now my job is to delete those rows in table A, whose ids are not present in any of the table A1-A3.
To do it in chunks and avoid the transaction logs from exploding, here's the SQL I wrote:


delete from A where id not in(
select id from A where id not in(
( select id from A1 union
select id from A2 union
select id from A3 union
select id from A4) fetch first 1000 rows only)


This query however is taking ages to run a few hundred thousand rows which need deletion. Performance wise, is there scope for re-doing the SQL in a more efficient way?
Would really appreciate some help
Thanks.

Last edited by rocker86; 02-06-11 at 16:35.
Reply With Quote
  #2 (permalink)  
Old 02-06-11, 17:03
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
Also, would replacing the 'not in' clause with a minus help here? As far as I know, if the inner most select is an uncorrelated subquery, it still is only executed once, which is exactly how a minus would run.
Reply With Quote
  #3 (permalink)  
Old 02-06-11, 18:24
Stealth_DBA Stealth_DBA is offline
Registered User
 
Join Date: May 2009
Posts: 472
rocker86, As a guess, I think it is fetching all rows from A1, then all rows from A2, doing a sort to remove duplicates (because of UNION), then all rows from A3, doing a sort to remove duplicates, then A4 and doing a sort to remove duplicates. Depending on the total number of rows in A1, A2, A3, and A4, this could be using up a lot of sort memory (and may be overflowing). This is all before the SELECT...FIRST 1000 ROWS ONLY is processed.

Besides, I don't think you will get the results you are looking for. If A contains ID values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and A1-A4 contains 2, 4, 6, 8, 10 and you change the Fetch to 3 rows, (to match the small set of data), what will happen is:

Select A1-A4 Union Subquery will return 2, 4, 6, 8, 10.
Select ID From A where not in ... Fetch First 3 Rows Only will return (probably) 2, 4, 6.
Delete from A where not in this last list of 3 rows will deleted 1, 3, 5, 7, 8, 9, 10. (since none of those numbers are in the list of 3 values).

However, 8 and 10 ARE in A1-A4. But they will still be deleted from A.
Reply With Quote
  #4 (permalink)  
Old 02-06-11, 18:41
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
Thanks for the reply.
And thanks for the correction as well! I think the intended SQL was this:

delete from A where id in(
select id from A where id not in(
select id from A1 union
select id from A2 union
select id from A3 union
select id from A4) fetch first 1000 rows only)

So based on the data you mentioned, the inner most unions would give 2,4,6,8
the 'not in' subselect would then give me 1,3,5,7,9. With a fetch size of 3, the delete would first remove 1,3,5 followed by 7,9

Last edited by rocker86; 02-06-11 at 19:53.
Reply With Quote
  #5 (permalink)  
Old 02-06-11, 20:04
Stealth_DBA Stealth_DBA is offline
Registered User
 
Join Date: May 2009
Posts: 472
You need to make sure you look at the same range of data in all the tables (so you don't delete something outside the range).

If ID has to not exists in all 4 A1-A4 tables this could work.

You will need to come up with the X and Y values. For example, if ID is a Numeric, you could start with 1 and 1000. Then use 1001 and 2000 for the next execution.
Code:
DELETE
FROM A
WHERE ID BETWEEN X AND Y
  AND ID NOT IN
               (SELECT ID
                FROM A1
                WHERE ID BETWEEN X AND Y
                  UNION
                SELECT ID
                FROM A2
                WHERE ID BETWEEN X AND Y
                  UNION
                SELECT ID
                FROM A3
                WHERE ID BETWEEN X AND Y
                  UNION
                SELECT ID
                FROM A4
                WHERE ID BETWEEN X AND Y
               )

OR

DELETE
FROM A
WHERE ID BETWEEN X AND Y
  AND ID NOT IN
               (SELECT ID
                FROM A1
                WHERE ID BETWEEN X AND Y
               )
  AND ID NOT IN
               (SELECT ID
                FROM A2
                WHERE ID BETWEEN X AND Y
               )
  AND ID NOT IN
               (SELECT ID
                FROM A3
                WHERE ID BETWEEN X AND Y
               )
  AND ID NOT IN
               (SELECT ID
                FROM A4
                WHERE ID BETWEEN X AND Y
               )
Reply With Quote
  #6 (permalink)  
Old 02-07-11, 05:44
rocker86 rocker86 is offline
Registered User
 
Join Date: Jul 2009
Posts: 37
Stealth_DBA,
your solution looks good, only except the fact that I would like to utilize the most out of the single transaction chunk of 1-1000. Only in your case, there might be chance that within that range of thousand, I might only be deleting a few handful of rows. Isn't there a independent paged way to delete without doing self joins and then using a fetch first or row_number?
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On