Results 1 to 6 of 6
  1. #1
    Join Date
    Nov 2014
    Posts
    4

    Unanswered: mark duplicate records

    hi guys,

    first of all, I apologize for my bad english!


    I have a Postgres (9.3) Database with about 3 million car records and my java code is to slow - so i try to fix my problem with a simple PSQL statement.

    the cars table looks like this:
    id(int), firstread(date), lastread(date), url(varchar), model (foreign key), horsepower(int), kilometer(int), color(foreign key), price(double), seller(foreign key), duplicate(boolean)

    and there are some duplicate records (same model, horsepower, kilometer, color, price and seller).

    now, I will search for duplicate records like this:

    select model, horsepower, kilometer, color, price, seller, count(url) from cars group by model, horsepower, kilometer, color, price, seller having count(*) > 1

    AND mark all duplicate records (except the car with the newest firstread date!) in one statement. is this possible?

    Info: the count(url) in my psql statement is between 2 and 16!

    I hope you can help me!

    best regards, steamy

  2. #2
    Join Date
    Nov 2014
    Posts
    4
    today, this is my first version:


    UPDATE cars
    SET duplicate = true
    FROM
    (
    select model, horsepower,kilometer, price, color,seller, min (id) OldestId, Count (*) NumRows
    from cars
    group by model, horsepower,kilometer, price, color, seller
    having count(*) >1
    ) DuplicateRows
    WHERE
    DuplicateRows.OldestId = cars.id


    but it only mark one of the duplicate records - so i need to run this about 20 times.
    is there any easier(faster!) solution?


    best regards, steamy

  3. #3
    Join Date
    Dec 2014
    Posts
    3
    Steamy

    Can you please tell me the logic of using min(id) function? Is it because duplicate cars have different ids?

  4. #4
    Join Date
    Nov 2014
    Posts
    4
    Quote Originally Posted by ch_farhan_nisar View Post
    Steamy

    Can you please tell me the logic of using min(id) function? Is it because duplicate cars have different ids?
    Yes!


    Different IDs but same kilometers, kw, modell, production-date, price,...

  5. #5
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Why nit store the VIN of the vehicle. Its guaranteed to be globally unique https://www.google.co.uk/search?q=vi...nition+vehicle

    However be very wary of websites tgat 'guarantee' they can decode all VIN's. Most if 'em nake a good fist if US vehicles but fail spectacularly with non US
    I'd rather be riding on the Tiger 800 or the Norton

  6. #6
    Join Date
    Nov 2014
    Posts
    4
    Quote Originally Posted by healdem View Post
    Why nit store the VIN of the vehicle. Its guaranteed to be globally unique https://www.google.co.uk/search?q=vi...nition+vehicle

    However be very wary of websites tgat 'guarantee' they can decode all VIN's. Most if 'em nake a good fist if US vehicles but fail spectacularly with non US
    because i crawl the vehicles from other website and there is no VIN to crawl

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •