Results 1 to 15 of 15
  1. #1
    Join Date
    Feb 2012
    Posts
    7

    Delete Duplicate Records-without primary key

    Hi All ,

    I want to delete the duplicate record from a table keeping 1 record aside.

    my base table is-info

    id name class
    2 abc 6a
    3 abc 6a
    4 abc 6a
    1 abc 6a
    2 abc 6a
    4 abc 6a
    4 abc 6a
    3 abc 6a
    3 abc 6a
    1 abc 6a
    2 abc 6a
    5 abc 6a

    id-int
    name-text
    class-varchar
    (there is no primary key in this table)

    Now i want the result in following way:
    id name class
    2 abc 6a
    3 abc 6a
    4 abc 6a
    1 abc 6a

    I have tried the following query and its running fine but its not a dynamic stuff.

    DELETE top (SELECT COUNT(*)-1 FROM aaa WHERE id ='3') --or put some number
    FROM aaa
    WHERE id ='3'-- or put some number

    So i was wondering if some one could help me in this.

    Many thanks,
    Preetpal kapoor

  2. #2
    Join Date
    Nov 2003
    Posts
    2,817
    If you have enough disk space and then you could do the following:
    Code:
    SELECT DISTINCT id, name, class
       into #temp
    FROM aaa;
    TRUNCATE TABLE aaa;
    INSERT INTO aaa (id, name, class)
    SELECT id, name, class
    FROM #temp;
    COMMIT;

  3. #3
    Join Date
    Feb 2012
    Posts
    7
    Sorry I cant make any temp table.
    I know this method but this should be done through a single query.

  4. #4
    Join Date
    Nov 2003
    Posts
    2,817
    Quote Originally Posted by preetpalkpaoor View Post
    Sorry I cant make any temp table.
    Why is that so?

  5. #5
    Join Date
    Feb 2012
    Posts
    7
    Because i have to optimize the query execution time.
    And by creating a temp table and then deleting it will slow down the processing of a query.

    This query will be used on a real time system and if it hampers the performance then it is of no use.

  6. #6
    Join Date
    Nov 2003
    Posts
    2,817
    Quote Originally Posted by preetpalkpaoor View Post
    Because i have to optimize the query execution time.
    And by creating a temp table and then deleting it will slow down the processing of a query.

    This query will be used on a real time system and if it hampers the performance then it is of no use.
    So did you test if if the execution time was too slow?
    And if this "hampers" the performance?

  7. #7
    Join Date
    Feb 2012
    Posts
    7
    At present the table which i have provided you in my query is a fake one (or just for practice) and with this much data it will not hamper .

    But on my real time data it is hampering because i have tried this earlier and sued by my senior.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    14,914
    With no PK, no index, and billions of rows there is no answer. That schema design is so careless that it makes a solution impossible.

    There is so much that you haven't told us about the business rules that affect your table that we can only play guessing games. You really need to hire a professional consultant to help you resolve this issue or you need to work on giving us a better description of what you want.

    What is the time window that you can work within? What kind of hardware are you using? How many rows are there in the table you are trying to de-dupe? How many of those rows are duplicates? What are the business rules that govern this table? What kind of processing environment does this server work within? What other things might we need to know? What have you tried, and why weren't your attempts acceptable?

    Give us the information that we need to solve your problem for you, you might be surprised at what we can do!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Feb 2012
    Location
    DUBAI
    Posts
    6

    Deleting Duplicate Records From MSSQL

    -- Change the #Duplicate_Records to your own table name
    -- Change the field as your own tables field
    -- Take backup of the original table and test before applying on live


    select * from #Duplicate_Records

    --Show Duplicate Records in Duplicate_Records
    SELECT id, COUNT(*) FROM #Duplicate_Records
    GROUP BY id HAVING COUNT(*) > 1
    ORDER BY COUNT(*) DESC

    SET ROWCOUNT 1
    SELECT @@rowcount
    WHILE @@rowcount > 0
    DELETE DupRec FROM #Duplicate_Records as DupRec
    INNER JOIN
    (SELECT id FROM #Duplicate_Records GROUP BY id HAVING count(*) > 1)
    AS c ON c.id = DupRec.id
    SET ROWCOUNT 0

    SELECT * FROM #Duplicate_Records

    -- akhilesh

  10. #10
    Join Date
    Dec 2006
    Posts
    24
    do a select distinct or reduce doubletts by grouping correctly

    with the result do a select into on a new table beeing identic plus a primary key ( int autoincrement or whatever)

    rename the old table to garbage_without keye
    rename the new table to the name of the old one.

    ...and never ever create a new table without an identity column
    an Apple a day keeps Dr. Watson away !

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by wegus View Post
    ...and never ever create a new table without an identity column
    you probably meant to say without a primary key

    having an identitly column is no guarantee that "duplicates" won't be entered

    (of course, they aren't true duplicates, they'ss all have different identity column values, but the rest of the columns can easily be duplicated)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Dec 2006
    Posts
    24
    Quote Originally Posted by r937
    having an identitly column is no guarantee that "duplicates" won't be entered
    in some cases duplicates might be possible in those cases it is good to have an identity column. In other cases you might not want to save duplicates, then you ougt to ad a primary key. Of course the latter is what mostly happens, but the first case might also occur. So i did mean "at least identities" or even better primary keys. That is what i intendet to say.
    an Apple a day keeps Dr. Watson away !

  13. #13
    Join Date
    Nov 2003
    Posts
    2,817
    Quote Originally Posted by preetpalkpaoor View Post
    At present the table which i have provided you in my query is a fake one (or just for practice) and with this much data it will not hamper .

    But on my real time data it is hampering because i have tried this earlier and sued by my senior
    I don't believe that any statement that is able to delete those duplicates without using a temp table will be more efficient (= will hamper the server less) than the solution with a temp table

  14. #14
    Join Date
    Feb 2012
    Posts
    7
    Thanks every one for your support.

    i got 3 methods to do that.
    1. To make temp table with i will check with my real time DB and is the easiest method.
    2. With CTE. Actually i dnt have any knowledge about CTE.
    3.It was my way but a lengthy one.

    DELETE top (SELECT COUNT(*)-1 FROM aaa WHERE id ='3') --or put some number
    FROM aaa
    WHERE id ='3'

    here i need to always change the ID.

  15. #15
    Join Date
    Jan 2012
    Posts
    84
    Quote Originally Posted by preetpalkpaoor View Post
    2. With CTE. Actually i dnt have any knowledge about CTE.
    Try this (works on 2008 R2):
    Code:
    with aaa1 as (
    select
        row_number() over
           (partition by id, name, class order by id) rn,
        aaa.* 
    from aaa
    )
    delete from aaa1
    where rn >1;
    But I bet that you will be "sued by your manager" again with this query
    - this query requires a sort of the whole table, without proper indexes
    it wont perform very well. The method with a temp table is the fastests.
    Last edited by kordirko; 02-09-12 at 02:42.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •