  #1 (permalink)  
Old 02-09-12, 11:39
rkrell
Registered User
 
Join Date: Feb 2012
Location: Sears Tower, Chicago, Illinois
Posts: 3
locating duplicate records in a table

I have a table containing records that are duplicates of each other except for the ID, and I am trying to write a WHILE loop that goes through the table and locates all duplicate records based on a field called LASTNAME.
I am fairly new to T-SQL and found this forum while looking for an answer.
Can anyone point me in the right direction please?

Regards
Richard Krell
  #2 (permalink)  
Old 02-09-12, 13:35
PracticalProgram
Registered User
 
Join Date: Sep 2001
Location: Chicago, Illinois, USA
Posts: 551
This will give you a list of all of the LASTNAME values that appear more than once, along with how many times each one appears.

Code:
select  LASTNAME
        ,count(*) CountOfLASTNAME
from    YourTable
group
by      LASTNAME
having  count(*)<>1
__________________
Ken

Maverick Software Design

(847) 864-3600 x2
  #3 (permalink)  
Old 02-09-12, 15:22
rkrell
Better Explanation

Thanks for that script, Ken, but I guess I didn't describe my problem well enough. My bad!!

Someone on my team took a copy of a table and merged it into the original, duplicating every record in the original table. Many more updates were then made to the table before this error was discovered. So my problem is that I have the original table with a copy merged into it, plus many new records.

I am trying to find a way to search the table for all duplicate records, which can be told apart only by their ID, and then drop the duplicates with the higher IDs. See the example below:

C:\Users\rkrell>SQLCMD
1> USE WEBDATA
2> GO
Changed database context to 'webdata'.
1> SELECT ID, LASTNAME, FIRSTNAME, USERKEY
2> FROM USER_ACCOUNT
3> WHERE USERKEY='RKRELL'
4> GO
ID          LASTNAME             FIRSTNAME            USERKEY
----------- -------------------- -------------------- --------------
        637 KRELL                RICHARD              RKRELL
       1316 KRELL                RICHARD              RKRELL

(2 rows affected)

I do apologize for wasting your time, and I hope this explains my issue better. If not, please let me know.
  #4 (permalink)  
Old 02-09-12, 15:39
PracticalProgram
Something like this? Grouping on LASTNAME, FIRSTNAME and USERKEY together may be redundant, but that will be for you to figure out.

Don't run this without testing it. First run the inner-most SELECT and see if the results are reasonable. Then run the next-higher SELECT and check those results. Then, run the SELECT with the * and see if those are the records you are expecting to delete. If so, you can then run the whole thing.

Code:
delete  YT
--select  *
from    YourTable YT
inner
join    (
        select  Step1.LASTNAME
                ,Step1.FIRSTNAME
                ,Step1.USERKEY
                ,max(Step1.MaxOfID) MaxOfID
        from    (
                select  LASTNAME
                        ,FIRSTNAME
                        ,USERKEY
                        ,count(*) CountOfLASTNAME
                        ,max(ID) MaxOfID    -- aggregate ID here; Step1 exposes no ID column otherwise
                from    YourTable
                group
                by      LASTNAME
                        ,FIRSTNAME
                        ,USERKEY
                having  count(*)<>1
                ) Step1
        group
        by      Step1.LASTNAME
                ,Step1.FIRSTNAME
                ,Step1.USERKEY
        ) Step2 on
            YT.ID=Step2.MaxOfID
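
One way to follow the "don't run this without testing it" advice is to run the DELETE inside a transaction, look at what is left, and roll back. The following is only a sketch under assumptions: it uses a made-up temp table #UA_Demo instead of the real table, the JDOE row is invented for illustration (only the two RKRELL rows come from this thread), and the nested SELECTs are collapsed into a single GROUP BY for brevity.

```sql
-- Hypothetical demo table; the column sizes mirror the SQLCMD output in post #3.
CREATE TABLE #UA_Demo (
    ID        int         NOT NULL PRIMARY KEY,
    LASTNAME  varchar(20) NOT NULL,
    FIRSTNAME varchar(20) NOT NULL,
    USERKEY   varchar(14) NOT NULL
);
INSERT INTO #UA_Demo (ID, LASTNAME, FIRSTNAME, USERKEY) VALUES (637,  'KRELL', 'RICHARD', 'RKRELL');
INSERT INTO #UA_Demo (ID, LASTNAME, FIRSTNAME, USERKEY) VALUES (1316, 'KRELL', 'RICHARD', 'RKRELL');
INSERT INTO #UA_Demo (ID, LASTNAME, FIRSTNAME, USERKEY) VALUES (1400, 'DOE',   'JANE',    'JDOE');  -- made-up row, no duplicate

BEGIN TRANSACTION;

-- Delete the highest ID in every group that has more than one row.
DELETE YT
FROM   #UA_Demo AS YT
INNER JOIN (
        SELECT LASTNAME, FIRSTNAME, USERKEY, MAX(ID) AS MaxOfID
        FROM   #UA_Demo
        GROUP BY LASTNAME, FIRSTNAME, USERKEY
        HAVING COUNT(*) > 1
       ) AS Dup
       ON YT.ID = Dup.MaxOfID;

-- @@ROWCOUNT must be read immediately after the DELETE.
SELECT @@ROWCOUNT AS RowsDeleted;

-- Inspect the survivors while the transaction is still open.
SELECT ID, LASTNAME, FIRSTNAME, USERKEY
FROM   #UA_Demo
ORDER BY ID;

-- Undo everything. Swap this for COMMIT TRANSACTION once the results look right.
ROLLBACK TRANSACTION;

DROP TABLE #UA_Demo;
```

With the demo rows above, the DELETE should remove only ID 1316, and the ROLLBACK puts it back. Note that this pattern removes one row per group, so it fits the "every record was doubled exactly once" situation described in post #3; a group with three or more copies would need the statement run again.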
  #5 (permalink)  
Old 02-09-12, 16:11
Wim
Registered User
 
Join Date: Nov 2004
Posts: 1,280
Here is another way to delete those duplicates:
Code:
WITH CTE AS
(SELECT USER_ACCOUNT.ID, 
	USER_ACCOUNT.LASTNAME,
	ROW_NUMBER() OVER (PARTITION BY USER_ACCOUNT.LASTNAME ORDER BY USER_ACCOUNT.ID ASC) AS RowNum
FROM USER_ACCOUNT
	INNER JOIN (SELECT LASTNAME
			FROM USER_ACCOUNT
			GROUP BY LASTNAME
			HAVING count(*) > 1) AS T ON
		USER_ACCOUNT.LASTNAME = T.LASTNAME
)
DELETE UA
FROM USER_ACCOUNT AS UA
	INNER JOIN CTE ON
		CTE.ID = UA.ID
WHERE CTE.RowNum > 1
__________________
With kind regards . . . . . SQL Server 2000/2005/2008/2008 R2 Earned beers: 16
Wim
Beware of bugs in the above code; I have only proved it correct, not tried it. -- Donald Knuth
Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages
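
A caveat on the CTE above: it partitions by LASTNAME alone, so two different people who happen to share a last name would be treated as duplicates of each other. The sample output in post #3 suggests that FIRSTNAME and USERKEY match as well, so a more cautious variant might partition on all three columns and preview the rows before deleting them. This is a sketch only, assuming SQL Server 2005 or later; the temp table #UA_Demo and the AKRELL row are hypothetical and exist purely to show the difference.

```sql
-- Hypothetical demo table standing in for USER_ACCOUNT.
CREATE TABLE #UA_Demo (
    ID        int         NOT NULL PRIMARY KEY,
    LASTNAME  varchar(20) NOT NULL,
    FIRSTNAME varchar(20) NOT NULL,
    USERKEY   varchar(14) NOT NULL
);
INSERT INTO #UA_Demo (ID, LASTNAME, FIRSTNAME, USERKEY) VALUES (637,  'KRELL', 'RICHARD', 'RKRELL');
INSERT INTO #UA_Demo (ID, LASTNAME, FIRSTNAME, USERKEY) VALUES (1316, 'KRELL', 'RICHARD', 'RKRELL');
INSERT INTO #UA_Demo (ID, LASTNAME, FIRSTNAME, USERKEY) VALUES (1500, 'KRELL', 'ANNA',    'AKRELL');  -- made-up: different person, same last name

-- Step 1: preview the rows that would be deleted (RowNum > 1 within each group).
WITH Ranked AS (
    SELECT ID, LASTNAME, FIRSTNAME, USERKEY,
           ROW_NUMBER() OVER (PARTITION BY LASTNAME, FIRSTNAME, USERKEY
                              ORDER BY ID ASC) AS RowNum
    FROM   #UA_Demo
)
SELECT ID, LASTNAME, FIRSTNAME, USERKEY, RowNum
FROM   Ranked
WHERE  RowNum > 1;

-- Step 2: delete through the CTE; the lowest ID in each group is kept.
WITH Ranked AS (
    SELECT ID,
           ROW_NUMBER() OVER (PARTITION BY LASTNAME, FIRSTNAME, USERKEY
                              ORDER BY ID ASC) AS RowNum
    FROM   #UA_Demo
)
DELETE FROM Ranked
WHERE  RowNum > 1;

-- Check what is left.
SELECT ID, LASTNAME, FIRSTNAME, USERKEY
FROM   #UA_Demo
ORDER BY ID;

DROP TABLE #UA_Demo;
```

With these demo rows, only ID 1316 should be flagged and deleted, leaving 637 and 1500. Partitioning by LASTNAME alone would also have removed 1500. Unlike the MAX(ID) approach, ROW_NUMBER handles groups with three or more copies in a single pass.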
  #6 (permalink)  
Old 02-09-12, 16:43
rkrell
Resolved

I want to thank everyone for your assistance with this problem. I used Wim's example and it worked great, so thank you, Wim.

Again thanks to both of you.