clumsybiker, 01-13-12, 06:38
Deleting duplicates - has my predecessor done wrong?

Hi, first post, and also new to Oracle, so double-scary...

I've inherited an Oracle database and on looking at one procedure I'm seeing SQL which is intended to remove duplicates from a table.

Effectively it:
- declares a CURSOR to find the duplicates, which selects four fields and a COUNT(*), with a GROUP BY on those four fields and a HAVING COUNT(*) > 1
- loops over the cursor, copying the first instance of each duplicate (using "rownum < 2") into a temp table
- deletes ALL duplicates in the source table
- re-INSERTs the uniques/originals back into the source table from the temp table
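
For illustration only, the four steps above could be sketched roughly as follows. This is not the actual procedure: it uses Python's sqlite3 module as a stand-in for Oracle PL/SQL, the table names (src, dedup_tmp) and columns (a, b, c, d) are made up, and LIMIT 1 plays the role of "rownum < 2".

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (a, b, c, d)")
conn.execute("CREATE TEMP TABLE dedup_tmp (a, b, c, d)")
conn.executemany(
    "INSERT INTO src VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 1), (2, 2, 2, 2),
     (3, 3, 3, 3), (3, 3, 3, 3), (3, 3, 3, 3)],
)

# Step 1: "cursor" over every four-field combination that occurs more than once.
dupes = conn.execute(
    "SELECT a, b, c, d, COUNT(*) FROM src "
    "GROUP BY a, b, c, d HAVING COUNT(*) > 1"
).fetchall()

# Step 2: row by row, copy one instance of each duplicate into the temp table.
for a, b, c, d, _count in dupes:
    conn.execute(
        "INSERT INTO dedup_tmp "
        "SELECT a, b, c, d FROM src "
        "WHERE a = ? AND b = ? AND c = ? AND d = ? LIMIT 1",
        (a, b, c, d),
    )

# Step 3: delete ALL rows in the source table that belong to a duplicate group.
conn.execute(
    "DELETE FROM src WHERE EXISTS ("
    "  SELECT 1 FROM dedup_tmp t"
    "  WHERE t.a = src.a AND t.b = src.b AND t.c = src.c AND t.d = src.d)"
)

# Step 4: put the saved single copies back into the source table.
conn.execute("INSERT INTO src SELECT a, b, c, d FROM dedup_tmp")

remaining = conn.execute("SELECT a, b, c, d FROM src ORDER BY a").fetchall()
print(remaining)
```

Even in this small sketch, every duplicate group costs one extra statement inside the loop, and the surviving rows are written twice (once to the temp table, once back to the source).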

To me (coming from MS SQL) this looks quite inefficient: a quick Google search yields lots of single-statement methods for reducing duplicates down to unique records (e.g. "Delete duplicate rows from Oracle tables").

I guess my question to the Oracle community is, since I'm a beginner to Oracle, am I looking at a really inefficient bit of SQL, or did the developer before me know some secret performance reason why this "select, copy out, delete, copy back again" method was adopted?

Thanks,
Clumsy

dayneo, 01-13-12, 06:50

RBAR (row-by-agonizing-row) methods are always inefficient. Yes, your predecessor may have produced some inefficient code. I think the Burleson implementations you linked to are far more performant. You should do it that way in future.

Should you change the current implementation? It's probably quite a significant risk. If it is in fact doing what it is supposed to and the performance hit is not hurting anything badly, then I would say leave it. There may be some sneaky reason why the other guy did it that way (or maybe not).

The change will require a lot of testing to make sure your new way does not bring in bugs.

shammat, 01-13-12, 06:52
clumsybiker wrote: "am I looking at a really inefficient bit of SQL"
Yes you are.
That approach is extremely bad and slow.

The examples in the link you have shown are way better than the cursor approach.

The "Use self-join to delete duplicate rows" method is probably the best one.
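
As a rough idea of what a single-statement, self-join style delete looks like, here is a minimal sketch. It is not the linked article's exact statement: it runs against SQLite through Python's sqlite3 module rather than Oracle, the table (src) and columns (a, b, c, d) are invented, and SQLite's rowid is used in the way Oracle's ROWID pseudocolumn would be.

```python
import sqlite3

# Hypothetical schema and data; SQLite stands in for Oracle here.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE src (a, b, c, d)")
db.executemany(
    "INSERT INTO src VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 1), (1, 1, 1, 1), (2, 2, 2, 2),
     (3, 3, 3, 3), (3, 3, 3, 3), (3, 3, 3, 3)],
)

# One statement: delete every row for which another row with the same
# four field values and a smaller rowid exists. The row with the lowest
# rowid in each group is the one that survives.
db.execute(
    "DELETE FROM src WHERE EXISTS ("
    "  SELECT 1 FROM src AS s2"
    "  WHERE s2.a = src.a AND s2.b = src.b"
    "    AND s2.c = src.c AND s2.d = src.d"
    "    AND s2.rowid < src.rowid)"
)

survivors = db.execute("SELECT a, b, c, d FROM src ORDER BY a").fetchall()
print(survivors)
```

No temp table and no loop are needed, and rows that were never duplicated are not touched. Note that plain equality does not match NULLs, so groups containing NULL values in the compared fields would need extra handling.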