  1. #1
    Join Date
    Jan 2012

    Unanswered: Deleting duplicates - has my predecessor done wrong?

    Hi, first post, and also new to Oracle, so double-scary...

    I've inherited an Oracle database and on looking at one procedure I'm seeing SQL which is intended to remove duplicates from a table.

    Effectively it:
    - declares a CURSOR to find the duplicates: it selects four fields and a COUNT(*), with a GROUP BY on the same four fields and a HAVING COUNT(*) > 1
    - loops over the cursor, copying the first instance of each duplicate (using "rownum < 2") into a temp table
    - deletes ALL duplicates from the source table
    - re-INSERTs the uniques/originals back into the source table from the temp table
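
    For anyone who wants to see the shape of it, the steps above can be sketched roughly as follows. This is a Python/sqlite3 stand-in, not the actual PL/SQL procedure: the function name, the temp table name dedupe_tmp, the assumption that the table holds only the four grouped columns, and LIMIT 1 in place of "rownum < 2" are all made up for illustration.

```python
import sqlite3

def dedupe_via_temp_table(conn, table, cols):
    """Copy-out / delete / copy-back dedupe, one duplicate group at a time.

    Hypothetical sketch: SQLite stands in for Oracle, and `table` is assumed
    to contain only the columns listed in `cols` (none of them NULL).
    """
    col_list = ", ".join(cols)
    where = " AND ".join(f"{c} = ?" for c in cols)
    cur = conn.cursor()

    # Empty temp table with the same columns as the grouped fields.
    cur.execute(
        f"CREATE TEMP TABLE dedupe_tmp AS SELECT {col_list} FROM {table} WHERE 0"
    )

    # Step 1: the "cursor" that finds every duplicated combination.
    groups = cur.execute(
        f"SELECT {col_list}, COUNT(*) FROM {table} "
        f"GROUP BY {col_list} HAVING COUNT(*) > 1"
    ).fetchall()
    keys = [row[:-1] for row in groups]  # drop the trailing COUNT(*)

    # Step 2: copy the first instance of each duplicate into the temp table.
    for key in keys:
        cur.execute(
            f"INSERT INTO dedupe_tmp ({col_list}) "
            f"SELECT {col_list} FROM {table} WHERE {where} LIMIT 1",
            key,
        )

    # Step 3: delete ALL rows belonging to a duplicated combination.
    for key in keys:
        cur.execute(f"DELETE FROM {table} WHERE {where}", key)

    # Step 4: put one copy of each back into the source table.
    cur.execute(
        f"INSERT INTO {table} ({col_list}) SELECT {col_list} FROM dedupe_tmp"
    )
    cur.execute("DROP TABLE dedupe_tmp")
    conn.commit()
    return len(keys)  # number of duplicate groups processed
```

    Each duplicate group costs one INSERT and one DELETE on top of the initial GROUP BY scan, which is the part that looks wasteful to me.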

    To me (MS SQL experience) this looks quite inefficient: a quick Google search turns up lots of single-statement methods of reducing duplicates down to unique records (e.g. the article "Delete duplicate rows from Oracle tables").

    I guess my question to the Oracle community is, since I'm a beginner to Oracle, am I looking at a really inefficient bit of SQL, or did the developer before me know some secret performance reason why this "select, copy out, delete, copy back again" method was adopted?


  2. #2
    Join Date
    Oct 2002
    Cape Town, South Africa

    RBAR (row-by-agonizing-row) methods are almost always less efficient than set-based SQL. Yes, your predecessor may have produced some inefficient code. I think the Burleson implementations you linked to will perform far better. You should do it that way in future.

    Should you change the current implementation? It's probably quite a significant risk. If it is in fact doing what it is supposed to and the performance hit is not hurting anything badly, then I would say leave it. There may be some sneaky reason why the other guy did it that way (or maybe not).

    The change will require a lot of testing to make sure your new way does not bring in bugs.
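
    As a starting point for that testing, here is a minimal sketch of a before/after check. It is plain Python over lists of row tuples, and the function name check_dedupe is invented for this example; it only checks that no duplicates remain and that no distinct row was lost or added, not performance or locking behaviour.

```python
from collections import Counter

def check_dedupe(before, after):
    """Compare table contents before and after a dedupe run.

    `before` and `after` are lists of row tuples. Returns a list of
    problem descriptions; an empty list means both checks passed.
    """
    problems = []

    # Check 1: every row should now appear exactly once.
    leftover = [row for row, n in Counter(after).items() if n > 1]
    if leftover:
        problems.append(f"duplicates remain: {leftover}")

    # Check 2: the set of distinct rows must be unchanged.
    if set(before) != set(after):
        problems.append("distinct rows changed")

    return problems
```

    Running the old procedure and the new statement against copies of the same data and feeding both results through a check like this would catch the most obvious regressions.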

  3. #3
    Join Date
    Nov 2003
    Provided Answers: 23
    Originally posted by clumsybiker:
    "am I looking at a really inefficient bit of SQL"

    Yes, you are.
    That approach is slow: it makes several passes over the table and handles the duplicate groups one at a time.

    The examples in the link you have shown are way better than the cursor approach.

    The "Use self-join to delete duplicate rows" example is probably the best one.
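
    To show the general idea, here is a minimal sketch of a single-statement self-join delete. It runs against SQLite through Python's sqlite3 module as a stand-in for Oracle, and the exact statement in the linked article may well differ; the function name and the choice to keep the row with the lowest rowid in each group are assumptions for illustration. In Oracle the same shape would compare ROWID values in a correlated subquery.

```python
import sqlite3

def dedupe_single_statement(conn, table, cols):
    """Delete every row that has a matching row with a smaller rowid.

    Hypothetical sketch: one set-based DELETE, no cursor loop and no temp
    table. Rows whose compared columns contain NULL are never treated as
    duplicates, because NULL = NULL is not true in SQL.
    """
    # Correlate the inner copy of the table (alias b) with the outer row.
    match = " AND ".join(f"b.{c} = {table}.{c}" for c in cols)
    sql = (
        f"DELETE FROM {table} WHERE EXISTS ("
        f"SELECT 1 FROM {table} AS b "
        f"WHERE {match} AND b.rowid < {table}.rowid)"
    )
    cur = conn.execute(sql)
    conn.commit()
    return cur.rowcount  # number of rows removed
```

    The row with the lowest rowid in each group has no smaller match, so it is the one that survives, and rows that were never duplicated are not touched at all.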
