Results 1 to 11 of 11
  1. #1
    Join Date
    May 2002
    Posts
    62

    Unanswered: Delete query to remove duplicate rows

    Hi,

    I have a query which is taking hours to complete. It is a self join on one table. Can some one please help me to tune this, so that it is completed in an efficient time. The table has only 1000 rows. This query runs on SQL server 7.0 (Windows) from a stored procedure.

    The query is below.

    delete a
    from wtn a, wtn b
    where a.code = b.code
    and a.num = b.num
    and a.unit = 'DLMMLT'
    and b.unit != 'DLMMLT'

    The data is like the following:

    Table wtn
    ---------

    code num unit
    ---- ---- ----
    123456 2345671990 ifty
    123456 2345671990 DLMMLT
    456123 4567891230 jkty
    456123 4567891230 DLMMLT

    All I need to do is to delete the duplicate rows with "unit = DLMMLT". The duplicate rows should be identified with "code" and "num" columns. The primary key includes all the three columns. One index exists for all the three columns.

    I need this urgent. Any help is highly appreciated.
    Thanx..
    -Bheemsen

  2. #2
    Join Date
    Dec 2001
    Location
    Toronto, Canada
    Posts
    335
    Try this
    Code:
    DELETE a
    FROM wtn a
    WHERE EXISTS
          (
          SELECT  *
          FROM    wtn b
          WHERE   a.code = b.code 
          AND     a.num  = b.num 
          AND     a.unit = 'DLMMLT' 
          AND     b.unit != 'DLMMLT'
          )
    MCDBA

  3. #3
    Join Date
    May 2002
    Posts
    62
    Thanx I will give it a try. In my post, I was worng on number of rows in the table. The table has 15 million rows. But the rows to be deleted are around 1000.

    Thanx..
    -Bheemsen

  4. #4
    Join Date
    May 2002
    Posts
    62
    achorozy,

    It did not help. But, I had created a clustered index on "unit" column now. So the query is using index scan. But I am still thinking that the query can be rewritten in a good way.

    thanx..
    -Bheemsen

  5. #5
    Join Date
    Feb 2002
    Posts
    2,232
    How long does it take to delete 1 record from this table using explicit values of a row that you know needs to be deleted ?

  6. #6
    Join Date
    May 2002
    Posts
    62
    I did not try deleting just one record. It is a production query. The no. of rows to be deleted varies every day. There may be around 1000 to be deleted every day as I said in my previuos notes.

    Thanx..
    -Bheemsen

  7. #7
    Join Date
    Feb 2002
    Posts
    2,232
    I understand that you want to delete x number of rows, but to establish a baseline, a good start is to see how long 1 record takes to delete -- It may not be that you are deleting 1000 records that only takes a long time to complete - 1 record could take a long(respectively) time as well - which would mean that your query is ok (especially if you use explicit values of the record that you know needs to be deleted and not try to pull the information by trying to delete all the records).

  8. #8
    Join Date
    May 2002
    Posts
    62
    rnealejr,

    Sorry to get back to you so late. The reason was that the query tooks so long time to run. I ran the query on a mirror database to production with 18 millim rows. To delete 6 rows, the query took
    21 hrs 31 min 34 sec.

    I badly need help in tuning this query to reduce the time of execution.
    I am going to rerun the same query with a clustered index created on unit.

    Thanx..
    -Bheemsen

  9. #9
    Join Date
    Feb 2002
    Posts
    2,232
    I am still curious to see what happens when you delete 1 row explicity - with hard-coded values in your where clause:

    e.g.

    delete from wtn
    where wtn.code = 123456 and
    wtn.num = 2345671990 and
    wtn.unit = 'ifty'

    What is your table structure (include name, type, size) and your indexes (type, columns) and any other dependencies ?

  10. #10
    Join Date
    May 2002
    Posts
    62
    One row got deleted with hard coded values in less than one second.
    I tested the original delete query after creating the clustered index on 'unit'. The original delete query tihis time surprisingly finished in just 54 secs. But this may not happen always since there is lot of update/insert/delete activity on the table.

    That's why I wanted to tune the query too, besides having a clustered index.

    The table strucutre is as follows:

    name type size
    ------- -------- -------
    code char 6
    num decimal 9
    unit varchar 8

    one full-text index exists on all the three columns.
    Primary key includes all three columns.
    No other dependencies.

    thanx..
    -Bheemsen

  11. #11
    Join Date
    Jan 2002
    Posts
    12
    TRY THIS OUT


    DELETE FROM WTN
    WHERE UNIT IN
    (SELECT UNIT FROM WTN GROUP BY UNIT HAVING COUNT(UNIT)>1)
    NivasKhan

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •