If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Microsoft SQL Server > Delete Duplicate Records-without primary key

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old
Registered User
 
Join Date: Feb 2012
Posts: 7
Delete Duplicate Records-without primary key

Hi All ,

I want to delete the duplicate record from a table keeping 1 record aside.

my base table is-info

id name class
2 abc 6a
3 abc 6a
4 abc 6a
1 abc 6a
2 abc 6a
4 abc 6a
4 abc 6a
3 abc 6a
3 abc 6a
1 abc 6a
2 abc 6a
5 abc 6a

id-int
name-text
class-varchar
(there is no primary key in this table)

Now i want the result in following way:
id name class
2 abc 6a
3 abc 6a
4 abc 6a
1 abc 6a

I have tried the following query and its running fine but its not a dynamic stuff.

DELETE top (SELECT COUNT(*)-1 FROM aaa WHERE id ='3') --or put some number
FROM aaa
WHERE id ='3'-- or put some number

So i was wondering if some one could help me in this.

Many thanks,
Preetpal kapoor
Reply With Quote
  #2 (permalink)  
Old
Registered User
 
Join Date: Nov 2003
Posts: 2,780
If you have enough disk space and then you could do the following:
Code:
SELECT DISTINCT id, name, class
   into #temp
FROM aaa;
TRUNCATE TABLE aaa;
INSERT INTO aaa (id, name, class)
SELECT id, name, class
FROM #temp;
COMMIT;
Reply With Quote
  #3 (permalink)  
Old
Registered User
 
Join Date: Feb 2012
Posts: 7
Sorry I cant make any temp table.
I know this method but this should be done through a single query.
Reply With Quote
  #4 (permalink)  
Old
Registered User
 
Join Date: Nov 2003
Posts: 2,780
Quote:
Originally Posted by preetpalkpaoor View Post
Sorry I cant make any temp table.
Why is that so?
Reply With Quote
  #5 (permalink)  
Old
Registered User
 
Join Date: Feb 2012
Posts: 7
Because i have to optimize the query execution time.
And by creating a temp table and then deleting it will slow down the processing of a query.

This query will be used on a real time system and if it hampers the performance then it is of no use.
Reply With Quote
  #6 (permalink)  
Old
Registered User
 
Join Date: Nov 2003
Posts: 2,780
Quote:
Originally Posted by preetpalkpaoor View Post
Because i have to optimize the query execution time.
And by creating a temp table and then deleting it will slow down the processing of a query.

This query will be used on a real time system and if it hampers the performance then it is of no use.
So did you test if if the execution time was too slow?
And if this "hampers" the performance?
Reply With Quote
  #7 (permalink)  
Old
Registered User
 
Join Date: Feb 2012
Posts: 7
At present the table which i have provided you in my query is a fake one (or just for practice) and with this much data it will not hamper .

But on my real time data it is hampering because i have tried this earlier and sued by my senior.
Reply With Quote
  #8 (permalink)  
Old
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 14,450
With no PK, no index, and billions of rows there is no answer. That schema design is so careless that it makes a solution impossible.

There is so much that you haven't told us about the business rules that affect your table that we can only play guessing games. You really need to hire a professional consultant to help you resolve this issue or you need to work on giving us a better description of what you want.

What is the time window that you can work within? What kind of hardware are you using? How many rows are there in the table you are trying to de-dupe? How many of those rows are duplicates? What are the business rules that govern this table? What kind of processing environment does this server work within? What other things might we need to know? What have you tried, and why weren't your attempts acceptable?

Give us the information that we need to solve your problem for you, you might be surprised at what we can do!

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
Reply With Quote
  #9 (permalink)  
Old
Registered User
 
Join Date: Feb 2012
Location: DUBAI
Posts: 6
Deleting Duplicate Records From MSSQL

-- Change the #Duplicate_Records to your own table name
-- Change the field as your own tables field
-- Take backup of the original table and test before applying on live


select * from #Duplicate_Records

--Show Duplicate Records in Duplicate_Records
SELECT id, COUNT(*) FROM #Duplicate_Records
GROUP BY id HAVING COUNT(*) > 1
ORDER BY COUNT(*) DESC

SET ROWCOUNT 1
SELECT @@rowcount
WHILE @@rowcount > 0
DELETE DupRec FROM #Duplicate_Records as DupRec
INNER JOIN
(SELECT id FROM #Duplicate_Records GROUP BY id HAVING count(*) > 1)
AS c ON c.id = DupRec.id
SET ROWCOUNT 0

SELECT * FROM #Duplicate_Records

-- akhilesh
Reply With Quote
  #10 (permalink)  
Old
Registered User
 
Join Date: Dec 2006
Posts: 24
do a select distinct or reduce doubletts by grouping correctly

with the result do a select into on a new table beeing identic plus a primary key ( int autoincrement or whatever)

rename the old table to garbage_without keye
rename the new table to the name of the old one.

...and never ever create a new table without an identity column
__________________
an Apple a day keeps Dr. Watson away !
Reply With Quote
  #11 (permalink)  
Old
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 20,000
Quote:
Originally Posted by wegus View Post
...and never ever create a new table without an identity column
you probably meant to say without a primary key

having an identitly column is no guarantee that "duplicates" won't be entered

(of course, they aren't true duplicates, they'ss all have different identity column values, but the rest of the columns can easily be duplicated)
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #12 (permalink)  
Old
Registered User
 
Join Date: Dec 2006
Posts: 24
Quote:
Originally Posted by r937
having an identitly column is no guarantee that "duplicates" won't be entered
in some cases duplicates might be possible in those cases it is good to have an identity column. In other cases you might not want to save duplicates, then you ougt to ad a primary key. Of course the latter is what mostly happens, but the first case might also occur. So i did mean "at least identities" or even better primary keys. That is what i intendet to say.
__________________
an Apple a day keeps Dr. Watson away !
Reply With Quote
  #13 (permalink)  
Old
Registered User
 
Join Date: Nov 2003
Posts: 2,780
Quote:
Originally Posted by preetpalkpaoor View Post
At present the table which i have provided you in my query is a fake one (or just for practice) and with this much data it will not hamper .

But on my real time data it is hampering because i have tried this earlier and sued by my senior
I don't believe that any statement that is able to delete those duplicates without using a temp table will be more efficient (= will hamper the server less) than the solution with a temp table
Reply With Quote
  #14 (permalink)  
Old
Registered User
 
Join Date: Feb 2012
Posts: 7
Thanks every one for your support.

i got 3 methods to do that.
1. To make temp table with i will check with my real time DB and is the easiest method.
2. With CTE. Actually i dnt have any knowledge about CTE.
3.It was my way but a lengthy one.

DELETE top (SELECT COUNT(*)-1 FROM aaa WHERE id ='3') --or put some number
FROM aaa
WHERE id ='3'

here i need to always change the ID.
Reply With Quote
  #15 (permalink)  
Old
Registered User
 
Join Date: Jan 2012
Posts: 84
Quote:
Originally Posted by preetpalkpaoor View Post
2. With CTE. Actually i dnt have any knowledge about CTE.
Try this (works on 2008 R2):
Code:
with aaa1 as (
select
    row_number() over
       (partition by id, name, class order by id) rn,
    aaa.* 
from aaa
)
delete from aaa1
where rn >1;
But I bet that you will be "sued by your manager" again with this query
- this query requires a sort of the whole table, without proper indexes
it wont perform very well. The method with a temp table is the fastests.

Last edited by kordirko; 02-09-12 at 02:42.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On