Results 1 to 6 of 6
  1. #1
    Join Date
    Jul 2009
    Posts
    44

    Unanswered: Chunked deletes?

    Hi,
    I'm trying to implement chunking in one of our delete statements and am currently stuck. Here's what my tables look like:

    Table A
    id....name


    Table A1.......................TableA2.................. ..TableA3
    id......name.................. id.....name................id......name


    So there are things happening over the course of a process which delete rows from Tables A1,A2,A3. And now my job is to delete those rows in table A, whose ids are not present in any of the table A1-A3.
    To do it in chunks and avoid the transaction logs from exploding, here's the SQL I wrote:


    delete from A where id not in(
    select id from A where id not in(
    ( select id from A1 union
    select id from A2 union
    select id from A3 union
    select id from A4) fetch first 1000 rows only)


    This query however is taking ages to run a few hundred thousand rows which need deletion. Performance wise, is there scope for re-doing the SQL in a more efficient way?
    Would really appreciate some help
    Thanks.
    Last edited by rocker86; 02-06-11 at 17:35.

  2. #2
    Join Date
    Jul 2009
    Posts
    44
    Also, would replacing the 'not in' clause with a minus help here? As far as I know, if the inner most select is an uncorrelated subquery, it still is only executed once, which is exactly how a minus would run.

  3. #3
    Join Date
    May 2009
    Posts
    508
    Provided Answers: 1
    rocker86, As a guess, I think it is fetching all rows from A1, then all rows from A2, doing a sort to remove duplicates (because of UNION), then all rows from A3, doing a sort to remove duplicates, then A4 and doing a sort to remove duplicates. Depending on the total number of rows in A1, A2, A3, and A4, this could be using up a lot of sort memory (and may be overflowing). This is all before the SELECT...FIRST 1000 ROWS ONLY is processed.

    Besides, I don't think you will get the results you are looking for. If A contains ID values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 and A1-A4 contains 2, 4, 6, 8, 10 and you change the Fetch to 3 rows, (to match the small set of data), what will happen is:

    Select A1-A4 Union Subquery will return 2, 4, 6, 8, 10.
    Select ID From A where not in ... Fetch First 3 Rows Only will return (probably) 2, 4, 6.
    Delete from A where not in this last list of 3 rows will deleted 1, 3, 5, 7, 8, 9, 10. (since none of those numbers are in the list of 3 values).

    However, 8 and 10 ARE in A1-A4. But they will still be deleted from A.

  4. #4
    Join Date
    Jul 2009
    Posts
    44
    Thanks for the reply.
    And thanks for the correction as well! I think the intended SQL was this:

    delete from A where id in(
    select id from A where id not in(
    select id from A1 union
    select id from A2 union
    select id from A3 union
    select id from A4) fetch first 1000 rows only)

    So based on the data you mentioned, the inner most unions would give 2,4,6,8
    the 'not in' subselect would then give me 1,3,5,7,9. With a fetch size of 3, the delete would first remove 1,3,5 followed by 7,9
    Last edited by rocker86; 02-06-11 at 20:53.

  5. #5
    Join Date
    May 2009
    Posts
    508
    Provided Answers: 1
    You need to make sure you look at the same range of data in all the tables (so you don't delete something outside the range).

    If ID has to not exists in all 4 A1-A4 tables this could work.

    You will need to come up with the X and Y values. For example, if ID is a Numeric, you could start with 1 and 1000. Then use 1001 and 2000 for the next execution.
    Code:
    DELETE
    FROM A
    WHERE ID BETWEEN X AND Y
      AND ID NOT IN
                   (SELECT ID
                    FROM A1
                    WHERE ID BETWEEN X AND Y
                      UNION
                    SELECT ID
                    FROM A2
                    WHERE ID BETWEEN X AND Y
                      UNION
                    SELECT ID
                    FROM A3
                    WHERE ID BETWEEN X AND Y
                      UNION
                    SELECT ID
                    FROM A4
                    WHERE ID BETWEEN X AND Y
                   )
    
    OR
    
    DELETE
    FROM A
    WHERE ID BETWEEN X AND Y
      AND ID NOT IN
                   (SELECT ID
                    FROM A1
                    WHERE ID BETWEEN X AND Y
                   )
      AND ID NOT IN
                   (SELECT ID
                    FROM A2
                    WHERE ID BETWEEN X AND Y
                   )
      AND ID NOT IN
                   (SELECT ID
                    FROM A3
                    WHERE ID BETWEEN X AND Y
                   )
      AND ID NOT IN
                   (SELECT ID
                    FROM A4
                    WHERE ID BETWEEN X AND Y
                   )

  6. #6
    Join Date
    Jul 2009
    Posts
    44
    Stealth_DBA,
    your solution looks good, only except the fact that I would like to utilize the most out of the single transaction chunk of 1-1000. Only in your case, there might be chance that within that range of thousand, I might only be deleting a few handful of rows. Isn't there a independent paged way to delete without doing self joins and then using a fetch first or row_number?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •