Unanswered: Deleting duplicates - has my predecessor done wrong?
Hi, first post, and also new to Oracle, so double-scary...
I've inherited an Oracle database, and in one procedure I've found SQL that is intended to remove duplicates from a table. It works like this:
- Declares a CURSOR to find the duplicates, which selects four fields and a COUNT(*), with a GROUP BY on the same four fields and a HAVING COUNT(*) > 1
- Loops over the cursor, copying the first instance of each duplicate into a temp table using "rownum < 2"
- Deletes ALL duplicates from the source table
- Re-INSERTs the uniques/originals into the source table from the temp table
To me (coming from MS SQL) this looks quite inefficient: a quick Google search turns up lots of single-statement methods for reducing duplicates down to unique records (e.g. "Delete duplicate rows from Oracle tables").
I guess my question to the Oracle community is, since I'm a beginner to Oracle, am I looking at a really inefficient bit of SQL, or did the developer before me know some secret performance reason why this "select, copy out, delete, copy back again" method was adopted?
RBAR (row-by-agonizing-row) methods are almost always less efficient than a set-based statement. So yes, your predecessor may well have produced some inefficient code. I think the Burleson implementations you linked to should perform far better. You should do it that way in future.
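For what it's worth, here is a minimal sketch of the single-statement style, so you can compare it with the cursor loop. The names my_table and col1..col4 are placeholders I made up; substitute your real table and the four fields from the cursor's GROUP BY. I haven't seen your schema, so treat this as a starting point rather than a drop-in replacement.

```sql
-- Sketch only: my_table and col1..col4 are hypothetical names standing in
-- for the real table and the four columns the cursor groups on.
-- Keeps one row per group (the one with the lowest ROWID) and deletes the rest.
DELETE FROM my_table
 WHERE ROWID NOT IN (
         SELECT MIN(ROWID)
           FROM my_table
          GROUP BY col1, col2, col3, col4
       );
```

One thing to watch: if the table has more columns than those four, the row this keeps may not be the same row that "rownum < 2" happened to pick in the old procedure, so check whether it matters which copy survives.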
Should you change the current implementation? It's probably quite a significant risk. If it is in fact doing what it is supposed to and the performance hit is not hurting anything badly, then I would say leave it. There may be some sneaky reason why the other guy did it that way (or maybe not).
The change will require a lot of testing to make sure your new way does not bring in bugs.
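As one part of that testing, a check along these lines could be run after either version of the clean-up, on a copy of the data. Again, my_table and col1..col4 are placeholder names, not your real ones.

```sql
-- Sketch only: hypothetical table and column names.
-- Lists any groups that still have more than one row;
-- after a successful de-duplication it should return no rows.
SELECT col1, col2, col3, col4, COUNT(*) AS copies
  FROM my_table
 GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1;
```

Comparing the total row count left behind by the old procedure and by the new statement on the same starting data would be another cheap sanity check.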