Results 1 to 5 of 5
  1. #1
    Join Date
    Jun 2003
    Posts
    3

    Unanswered: duplicate records

    I am having a headache working on a huge database which contains 120000 records. the big problem is that the database contains duplicates of the "serial no" of products. Another important field is the "date_purchased". How do I remove the duplicates and retain the serial no with the latest date purchased? I have managed to remove duplicates but the program does not remove on the basis of the date_purchased.

  2. #2
    Join Date
    Feb 2003
    Location
    Germany
    Posts
    53

    Re: duplicate records

    Hi,

    i found some info's about duplicates on planet-source-code.com as i had same problems:

    see attached file (too big to quote here)
    Attached Files Attached Files

  3. #3
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364
    How about doing it in 2 stages ie:-

    SELECT DISTINCT SerialNo,Max(DatePurchased) From tbl GROUP BY SerialNo

    Then inner Join it to

    SELECT * FROM tbl

    U could create 1 Statement as a Permanent View or Do it all in a stored procedure & #tmp table

    Should work -

    U'l prob find it can all be done in one Statement - Some1 Post & I'll C which way is faster

    GW
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

  4. #4
    Join Date
    Sep 2002
    Location
    Mancos, CO
    Posts
    73
    Or a refinement on GWilley's code

    SELECT DISTINCT SerialNo, Max (DISTINCT DatePurchased)
    From tbl??
    Group By SerialNo

    I ran this on something similar on my db (table with over 700,000 records) and it seemed to work correctly.

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you do not need DISTINCT when you use GROUP BY -- groups are distinct by definition

    save the serial numbers and max dates in a working table, then delete all the rows that don't have a match in the working table

    select serial, max(date_purchased) as maxdate
    into keepthese
    from yourtable
    group by serial

    delete from yourtable
    where not exists
    ( select 1
    from keepthese
    where serialno = yourtable.serial
    and maxdate = yourtable.date_purchased )


    rudy
    http://r937.com/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •