  1. #1
    Join Date
    Mar 2007
    Posts
    25

    Unanswered: Deleting duplicate rows from a table

    Hi Experts,

    I want to delete duplicate rows from a table. I can do it with the help of a temporary or an intermediate table.

    Is there any other way to accomplish the same task?

    Thanks in advance.

    Regards,
    Poornima.

  2. #2
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    If you only have a few duplicates, use
    set rowcount 1
    and then delete duplicates one at a time until none remain, e.g.
    Code:
    -- build a small test table with some duplicate values
    select 1 c1 into #t1
    union all select 1
    union all select 2
    union all select 3
    union all select 3
    union all select 3
    union all select 3
    
    -- show which values are duplicated, and how many times
    select c1, count(*) cnt
    from #t1
    group by c1
    having count(*)>1
    
    set rowcount 1  -- every delete below removes at most one row
    select convert(varchar(26),getdate(),109),"Delete started" -- returns one row, so @@rowcount=1 and the loop starts
    while @@rowcount=1
    begin
      -- delete one row belonging to any value that still has duplicates
      delete #t1
      from #t1 a
      where exists 
      (select 1 from #t1 b
       where a.c1=b.c1
       group by b.c1
       having count(*)>1
      )
    end
    set rowcount 0  -- restore normal row limits
    select convert(varchar(26),getdate(),109),"Delete completed"
    
    select * from #t1  -- one row per value remains
    drop table #t1

  3. #3
    Join Date
    Mar 2007
    Posts
    25
    Hi,

    Thanks Pdreyer.

    I can also do the following with the help of temporary tables.

    Select distinct * into #temp from employee

    truncate table employee

    insert into employee select * from #temp

    But can you help me to come up with a solution which does not make use of a temporary table?

    Regards,
    Poornima.

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Poornima.

    What's the problem with using temporary tables? Is it because the original table is too large to fit in the temp db?

    If you don't have too many dups then you could implement the first solution with variables: loop round selecting the key of an item that is duplicated, set the rowcount to 1, delete a row with that key, and repeat until no dups are found, as in the sketch below.
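
    A rough sketch, assuming a table called employee with a key column emp_key (both names are just examples):
    Code:
    declare @key int
    
    set rowcount 1                  -- every statement below touches at most one row
    
    select @key = emp_key           -- grab any key that still has duplicate rows
    from employee
    group by emp_key
    having count(*) > 1
    
    while @@rowcount > 0
    begin
      delete employee where emp_key = @key   -- removes a single copy (rowcount is 1)
    
      select @key = emp_key         -- look for the next duplicated key
      from employee
      group by emp_key
      having count(*) > 1
    end
    
    set rowcount 0                  -- restore normal row limits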

    Another solution like your own would be to select distinct all the rows into a new permanent table, then drop the original table, add the indexes and finally rename the new table to the same name as your old table (outlined below).
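
    In outline, assuming the table is called employee and the de-duplicated copy employee_dedup (names are just examples):
    Code:
    select distinct * into employee_dedup from employee   -- needs the 'select into/bulkcopy' db option enabled
    go
    drop table employee
    go
    -- recreate the indexes (and any triggers/permissions) on employee_dedup here, then:
    exec sp_rename 'employee_dedup', 'employee'
    go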

    Mike

  5. #5
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    If you have a lot of duplicates then select distinct into a new table followed by a rename will be the best approach,
    but since you insist on no temporary or intermediate table, here's another (slow) way:
    Code:
    declare c1 cursor for select c1 from #t1  -- relies on duplicate values being adjacent in fetch order
    go
    declare @c1 int, @c1prev int
    set nocount on
    select @c1prev=min(c1)-1 from #t1  -- seed with a value that cannot match the first row
    open c1
    fetch c1 into @c1
    while @@sqlstatus=0                -- 0 means the fetch returned a row
    begin
      if @c1=@c1prev                   -- same value as the previous row => duplicate
        delete #t1 where current of c1
      select @c1prev=@c1
      fetch c1 into @c1
    end
    set nocount off
    select c1 from #t1                 -- show what is left
    close c1
    deallocate cursor c1

  6. #6
    Join Date
    Mar 2007
    Posts
    25
    Hi Pdreyer,

    Thanks for the piece of information.

    But if the duplicate rows are not stored next to each other, the above procedure will again need a temporary table to order the rows before the duplicates can be deleted. I mean, if the rows are like

    id id1
    1 1
    2 2
    3 3
    1 1
    2 2

    Regards,
    Poornima

  7. #7
    Join Date
    May 2005
    Location
    South Africa
    Posts
    1,365
    Provided Answers: 1
    Yes, I should have included an order by on the query.
    But your tempdb needs to be big enough to hold a sorted copy of your table.

    Something else you can try:
    bcp the table out to a fifo (named pipe) that gzips the data (I assume you won't have enough space for an uncompressed copy otherwise).
    Recreate the table and add an index with ignore_dup_row or ignore_dup_key.
    Then bcp the data back in from the gzip pipe.
    The ignore_dup_row option eliminates duplicates from a batch of data: it cancels any insert or update that would create a duplicate row, but does not roll back the entire transaction.
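
    The index part of that recipe might look something like this (table, column and index names are just examples):
    Code:
    -- create the index on the recreated, empty table before loading the data;
    -- with ignore_dup_row, duplicate rows are discarded as bcp inserts them
    create clustered index employee_dedup_idx
        on employee (col1, col2)
        with ignore_dup_row
    
    -- or, if the key must be unique, discard rows with duplicate key values instead:
    -- create unique clustered index employee_pk
    --     on employee (emp_key)
    --     with ignore_dup_key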

    I guess you also never reorg your tables or rebuild clustered indexes for performance, as that would also require free space equal to the largest table in your database.

  8. #8
    Join Date
    Mar 2007
    Posts
    25
    Hi Pdreyer,

    The bcp idea sounds good if I can manage to have enough space in the database.

    Regards,
    Poornima.

  9. #9
    Join Date
    Apr 2003
    Posts
    64
    Is there an identity column in the table (generally a good idea)? If so, you can remove all rows that differ in nothing but the identity value. Assuming the identity column is named MYID and the equivalence columns are COL1, COL2, ... COLN, try:
    Code:
            -- relies on the Transact-SQL extension that lets MYID appear in the
            -- select list and having clause without being in the group by
            delete from TABLE where MYID in
                    (select MYID from TABLE
                            group by COL1, COL2, ... COLN
                            having MYID > min(MYID))
    The above will remove the records whose identity value is larger than the minimum among the rows whose other columns are the same.

    The disadvantage is that you need to list all the columns explicitly. The advantage is that you can skip columns which might differ but which should not prevent a row from being deleted anyway (because its other fields match).
