Results 1 to 6 of 6
  1. #1
    Join Date
    Jul 2006
    Posts
    3

    Unanswered: Big Database - Massive Duplication, Please Help.

    Hi,

    I have an access database that I created from importing several excel files. I just started this job and previously several excel files were used on different systems, containing a lot of the same data.

    After doing all of the imports, I have 400,000+ records. Running the built-in find duplicates query returns 236,000 records. I don't want to delete all 236,000, only the ones that are duplicated (I'm guessing I need to keep around 118,000).

    This duplicate query is returning results based on records with the same Name at the same Address in the same City and State.

    I have searched for a solution, but it seems that if I set the address to be the primary key on a new table, and imported the data there that Access would just keep the FIRST record for that address, not necessarily the CORRECT record. Also I imagine that this would cause problems for streets that appear in multiple cities ie - 100 3rd Avenue.

    The difference between the duplicates are dollar amounts and origination dates. One record may have been older, and thus contain a lower amount than the newer record. I want to delete duplicate records based on the dollar amount - i want to keep only the record with the highest dollar amount(some records are in the db 3 times, so I dont want to just delete the lowest). It seems that the best way to do this would be some sort of SQL delete query, but I'm no pro at writing script, and given the type of data I don't want to destroy it trying.

    Any thoughts on the best way to go about getting rid of the older 'duplicates' and possibly a sample Delete Query would be greatly appreciated.

    Thank You in advance!

  2. #2
    Join Date
    Jun 2005
    Location
    Richmond, Virginia USA
    Posts
    2,763
    Provided Answers: 19
    This problem is way out of my area of experience, but just be aware that you can assign multiple fields in your table as a Primary Key, i.e. Name + Address + City + State. In the Design View of your table press and hold down the <Shift> key, right click your mouse and click on Primary Key. Continue holding down the <Shift> key and repeat for each field.

    Good luck in this massive undertaking!
    Hope this helps!

    The problem with making anything foolproof...is that fools are so darn ingenious!

    All posts/responses based on Access 2003/2007

  3. #3
    Join Date
    Dec 2005
    Posts
    45
    How about this:

    1. Copy the structure of the main table to a new db.
    2. Copy the data as a recordset into the new table using VB. Use a group query as your record source.

  4. #4
    Join Date
    Jul 2006
    Posts
    3
    Well, I took a break from it and a simple solution magically appeared in my head, I'll describe what I did in case someone else ever runs into this problem, and to see if anyone finds any flawed logic in my solution.

    I made a new table, using the structure of the old. Set the House number, Address, City, and State as primary keys.

    I made an append query that read data from the old table.. With the amount field sorted in decending order. So as the query ran, it was pulling the larger amounts first. Any duplicate addresses it found had amounts equal to or less than record that was already in place in the new table. Thus, the duplicates are no longer in the DB, AND I was able to keep the larger amounts.

    Now I have 281,000 UNIQUE records in my database. Thanks for the tip about multiple primary keys, thats what got the wheels turning. I'm just a little embarassed that I didn't think of this solution sooner!
    Last edited by StoopidNoobie; 07-12-06 at 14:43.

  5. #5
    Join Date
    Jul 2003
    Location
    Michigan
    Posts
    1,941
    I haven't had to do this for a while. Your table needs to have a unique
    field (like an autonumber) for this to work. You create a subquery that pulls
    unique records, then use that to delete everything else:

    Code:
    DELETE FROM [YOURTABLE]
    WHERE RECID NOT IN
    (SELECT FIRST(RECID)
    FROM [YOURTABLE]
    GROUP BY FIELD1, FIELD2, ....)
    As always, make sure you back up your data before testing out a delete query on it!
    Inspiration Through Fermentation

  6. #6
    Join Date
    Jun 2005
    Location
    Richmond, Virginia USA
    Posts
    2,763
    Provided Answers: 19
    A break will do that sometimes, StoopidNoobie! Of course, a glass of bourbon and a good cigar will too!
    Hope this helps!

    The problem with making anything foolproof...is that fools are so darn ingenious!

    All posts/responses based on Access 2003/2007

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •