| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

02-06-11, 16:08
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 37
|
|
|
Chunked deletes?
|
|
Hi,
I'm trying to implement chunking in one of our delete statements and am currently stuck. Here's what my tables look like:
Table A
id....name
Table A1.......................TableA2.................. ..TableA3
id......name.................. id.....name................id......name
So there are things happening over the course of a process which delete rows from Tables A1,A2,A3. And now my job is to delete those rows in table A, whose ids are not present in any of the table A1-A3.
To do it in chunks and avoid the transaction logs from exploding, here's the SQL I wrote:
delete from A where id not in(
select id from A where id not in(
( select id from A1 union
select id from A2 union
select id from A3 union
select id from A4) fetch first 1000 rows only)
This query however is taking ages to run a few hundred thousand rows which need deletion. Performance wise, is there scope for re-doing the SQL in a more efficient way?
Would really appreciate some help
Thanks.
|
Last edited by rocker86; 02-06-11 at 16:35.
|

02-06-11, 17:03
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 37
|
|
Also, would replacing the 'not in' clause with a minus help here? As far as I know, if the inner most select is an uncorrelated subquery, it still is only executed once, which is exactly how a minus would run.
|
|

02-06-11, 18:24
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
|
|
rocker86, As a guess, I think it is fetching all rows from A1, then all rows from A2, doing a sort to remove duplicates (because of UNION), then all rows from A3, doing a sort to remove duplicates, then A4 and doing a sort to remove duplicates. Depending on the total number of rows in A1, A2, A3, and A4, this could be using up a lot of sort memory (and may be overflowing). This is all before the SELECT...FIRST 1000 ROWS ONLY is processed.
Besides, I don't think you will get the results you are looking for. If A contains ID values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and A1-A4 contains 2, 4, 6, 8, 10 and you change the Fetch to 3 rows, (to match the small set of data), what will happen is:
Select A1-A4 Union Subquery will return 2, 4, 6, 8, 10.
Select ID From A where not in ... Fetch First 3 Rows Only will return (probably) 2, 4, 6.
Delete from A where not in this last list of 3 rows will deleted 1, 3, 5, 7, 8, 9, 10. (since none of those numbers are in the list of 3 values).
However, 8 and 10 ARE in A1-A4. But they will still be deleted from A.
|
|

02-06-11, 18:41
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 37
|
|
Thanks for the reply.
And thanks for the correction as well! I think the intended SQL was this:
delete from A where id in(
select id from A where id not in(
select id from A1 union
select id from A2 union
select id from A3 union
select id from A4) fetch first 1000 rows only)
So based on the data you mentioned, the inner most unions would give 2,4,6,8
the 'not in' subselect would then give me 1,3,5,7,9. With a fetch size of 3, the delete would first remove 1,3,5 followed by 7,9
|
Last edited by rocker86; 02-06-11 at 19:53.
|

02-06-11, 20:04
|
|
Registered User
|
|
Join Date: May 2009
Posts: 472
|
|
You need to make sure you look at the same range of data in all the tables (so you don't delete something outside the range).
If ID has to not exists in all 4 A1-A4 tables this could work.
You will need to come up with the X and Y values. For example, if ID is a Numeric, you could start with 1 and 1000. Then use 1001 and 2000 for the next execution.
Code:
DELETE
FROM A
WHERE ID BETWEEN X AND Y
AND ID NOT IN
(SELECT ID
FROM A1
WHERE ID BETWEEN X AND Y
UNION
SELECT ID
FROM A2
WHERE ID BETWEEN X AND Y
UNION
SELECT ID
FROM A3
WHERE ID BETWEEN X AND Y
UNION
SELECT ID
FROM A4
WHERE ID BETWEEN X AND Y
)
OR
DELETE
FROM A
WHERE ID BETWEEN X AND Y
AND ID NOT IN
(SELECT ID
FROM A1
WHERE ID BETWEEN X AND Y
)
AND ID NOT IN
(SELECT ID
FROM A2
WHERE ID BETWEEN X AND Y
)
AND ID NOT IN
(SELECT ID
FROM A3
WHERE ID BETWEEN X AND Y
)
AND ID NOT IN
(SELECT ID
FROM A4
WHERE ID BETWEEN X AND Y
)
|
|

02-07-11, 05:44
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 37
|
|
Stealth_DBA,
your solution looks good, only except the fact that I would like to utilize the most out of the single transaction chunk of 1-1000. Only in your case, there might be chance that within that range of thousand, I might only be deleting a few handful of rows. Isn't there a independent paged way to delete without doing self joins and then using a fetch first or row_number?
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|