Results 1 to 12 of 12
  1. #1
    Join Date
    Aug 2006
    Posts
    39

    Unanswered: delete duplicates

    Hi,
    Trying to delete duplicated records.
    Not sure why this query does not work. It just gives the message "Processing query" and nothing seems to happen. Even left it for a few mins but nothing seem to have happened. Please note that only one of the records should remains.
    P.S. There may be up to several duplicated records and i need to only keep one of them.
    Thanks

    DELETE FROM Test.tbleTest WHERE rowid not in
    (SELECT MIN(rowid)
    FROM Test.tbleTest
    GROUP BY TCOUNTRY, TDATATYPE, DTEFFECTIVEDATE, IFAMILYSIZE, TGRADE, DAMOUNT;
    Last edited by fmardani; 11-13-06 at 06:10.

  2. #2
    Join Date
    Aug 2003
    Location
    Guwahati, India
    Posts
    33

    How big is your table !!

    Hi there,

    You have written group by a lot of columns. If the no of records are more then oracle will take a lot of time !! you should always use as miminum as possible in grouping queries !!
    G J Shankar Nath
    IT Consultant, Vedswasti Services Pvt. Ltd
    Guwahati-781007
    India

  3. #3
    Join Date
    Jan 2004
    Location
    Croatia, Europe
    Posts
    4,094
    Provided Answers: 4
    I don't know why wouldn't your query delete duplicate records. Here's an example; first, create working table and add a duplicate record into it:
    Code:
    SQL> create table brisime as select * From dept;
    
    Table created.
    
    SQL> select * from brisime;
    
        DEPTNO DNAME          DLOKAC
    ---------- -------------- -------------
            50 MARKETING      ZAGREB
            10 PRODAJA        BOSTON
            20 NABAVA         NEW YORK
            30 ERC            PARIS
    
    SQL> insert into brisime values (30, 'ERC', 'PARIS');
    
    1 row created.
    
    SQL> commit;
    
    Commit complete.
    
    SQL> select * From brisime;
    
        DEPTNO DNAME          DLOKAC
    ---------- -------------- -------------
            50 MARKETING      ZAGREB
            10 PRODAJA        BOSTON
            20 NABAVA         NEW YORK
            30 ERC            PARIS
            30 ERC            PARIS
    Delete one of "30-ERC-PARIS' records:
    Code:
    SQL> delete from brisime
      2  where rowid not in (select min(rowid) from brisime
      3                      group by deptno, dname, dlokac);
    
    1 row deleted.
    
    SQL> select * from brisime;
    
        DEPTNO DNAME          DLOKAC
    ---------- -------------- -------------
            50 MARKETING      ZAGREB
            10 PRODAJA        BOSTON
            20 NABAVA         NEW YORK
            30 ERC            PARIS
    
    SQL> rollback;
    
    Rollback complete.
    Here's another way to do that:
    Code:
    SQL> delete from brisime a
      2  where rowid > (select min(rowid) from brisime b
      3                 where b.deptno = a.deptno);
    
    1 row deleted.
    
    SQL> select * From brisime;
    
        DEPTNO DNAME          DLOKAC
    ---------- -------------- -------------
            50 MARKETING      ZAGREB
            10 PRODAJA        BOSTON
            20 NABAVA         NEW YORK
            30 ERC            PARIS
    
    SQL>
    There are different approaches to this problem; see some of them here.

  4. #4
    Join Date
    May 2004
    Posts
    95
    I usually use this:

    Code:
    DELETE FROM Test.tbleTest 
    WHERE Test.tbleTest.column_id IN
    (SELECT MAX(Test.tbleTest.column_id)
    FROM Test.tbleTest
    HAVING COUNT(1) > 1
    GROUP BY TCOUNTRY, TDATATYPE, DTEFFECTIVEDATE, IFAMILYSIZE, TGRADE, DAMOUNT;

  5. #5
    Join Date
    Jun 2003
    Location
    West Palm Beach, FL
    Posts
    2,713

    Talking


    Using GROUP BY is futile and wastefull in a query where you return MIN(ROWID)!

    Also you need to specify the INDEX (or PRIMARY KEY) that defines the duplicate.

    Replace the GROUP BY with the "key" columns kinda like this:
    Code:
    Delete From Test.Tbletest D
            Where Rowid Not In (
               Select Min(Rowid) From Test.Tbletest B
                Where B.Tcountry	       = D.Tcountry
    	      And B.Tdatatype	       = D.Tdatatype
    	      And B.Dteffectivedate    = D.Dteffectivedate
    	      And B.Ifamilysize        = D.Ifamilysize
    	      And B.Tgrade	       = D.Tgrade )

    PS: It may work faster if you create a temporary index on those columns.


    The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

  6. #6
    Join Date
    Jun 2004
    Location
    Liverpool, NY USA
    Posts
    2,509
    Use the second method.

    Code:
    delete from brisime a2
    where rowid > (select min(rowid) 
                         from brisime b 
                         where b.deptno = a.deptno);
    If you have a lot of rows, and the testing column is indexed, this will normally run a lot faster. comparing a value (rowid) using in against a selection that could contain millions of rows takes a long time for each row tested.
    Bill
    You do not need a parachute to skydive. You only need a parachute to skydive twice.

  7. #7
    Join Date
    Jun 2003
    Location
    West Palm Beach, FL
    Posts
    2,713

    Talking


    Yes, true the best would be like this:
    Code:
    Delete From Test.Tbletest D
            Where Rowid > (
               Select Min(Rowid) From Test.Tbletest B
                Where B.Tcountry	       = D.Tcountry
    	      And B.Tdatatype	       = D.Tdatatype
    	      And B.Dteffectivedate    = D.Dteffectivedate
    	      And B.Ifamilysize        = D.Ifamilysize
    	      And B.Tgrade	       = D.Tgrade )


    The person who says it can't be done should not interrupt the person doing it. -- Chinese proverb

  8. #8
    Join Date
    Jan 2004
    Location
    Croatia, Europe
    Posts
    4,094
    Provided Answers: 4
    A Croatian proverb: He who reads posts #3 sees the same queries he posts a month or so later.

  9. #9
    Join Date
    Sep 2004
    Location
    London, UK
    Posts
    565
    Well this is a month late, but shouldn't it just have been:

    Code:
    DELETE tbletest
    WHERE  rowid IN
           ( SELECT LEAD(rowid) OVER
                    ( PARTITION BY tcountry, tdatatype, dteffectivedate, ifamilysize, tgrade, damount
                      ORDER BY NULL )
             FROM   tbletest );
    Last edited by WilliamR; 12-27-06 at 09:36.

  10. #10
    Join Date
    May 2004
    Location
    Dominican Republic
    Posts
    721
    WilliamR, that doesn't even compile.

  11. #11
    Join Date
    Jan 2004
    Location
    Croatia, Europe
    Posts
    4,094
    Provided Answers: 4
    Oh come on, don't be so cruel! All it misses is a FROM keyword, but once you have it, it deletes duplicate record.

  12. #12
    Join Date
    Sep 2004
    Location
    London, UK
    Posts
    565
    Thanks Littlefoot, yes I forgot the FROM clause.

    My point was really about using analytics.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •