Results 1 to 7 of 7
  1. #1
    Join Date
    Aug 2004
    Posts
    7

    Unanswered: Deleting, grouping duplicates

    Hi,

    I've a database import procedure that fails to catch duplicate records and creates a new transaction ID for every record imported into Access. I'm stuck with this process by the way.

    I am trying to delete duplicate records but a Select Distinct doesn't work because this Transaction_ID is unique!!

    Anyone know how I can gather just the duplicates in such a case and use a delete query? The fields that if containing equal values, determines duplicates are:
    Patron_ID
    Event_Code
    Transaction_Sale_Date
    Seats.

    (Transaction_ID would still be unique but I need it to use in my delete query).

    Ideally, it would be nice to be able to 'group' duplicate records together with a unique new primary autonumber key in cases where I don't want to delete them but store them for future reference. This is a different but similar question.

    Thanks, Leon

  2. #2
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    Hi,

    First of all, the table needs to be corrected.

    Alter table tableA
    add unique (patron_ID, Event_Code, Transaction_Sale_Date, Seats)

    To select and remove duplicates:

    Code:
    Select * Into duplicatesTable
    from tableA ta1
    where 1 <
      (select count(*)
       from tableA ta
       where ta.patron_ID = ta1.patron_ID AND
             ta.Event_Code = ta1.Event_Code AND
             ta.Transaction_Sale_Date = ta1.Transaction_Sale_Date AND
             ta.Seats = ta1.Seats)
    Code:
    Delete
    from tableA ta1
    where ta1.id <
      (select max(ta.id)
       from tableA ta
       group by ta.patron_ID, ta.Event_Code, ta.Transaction_Sale_Date, ta.Seats)
    Last edited by r123456; 11-17-04 at 01:57.

  3. #3
    Join Date
    Aug 2004
    Posts
    7

    Delete duplicates

    Thanks so much for your reply. I can't alter this production table.

    Question - you create a duplicatestable in the Select query but in the Delete query you don't reference it. Am I reading this correctly?

  4. #4
    Join Date
    Mar 2003
    Location
    The Bottom of The Barrel
    Posts
    6,102
    Provided Answers: 1
    You can still use select distinct... you just have to use it a little different.

    Use a SELECT INTO query to create a new table of distinct values, then rekey your new table. I am assuming you do not have referential integrity to consider...
    oh yeah... documentation... I have heard of that.

    *** What Do You Want In The MS Access Forum? ***

  5. #5
    Join Date
    Aug 2004
    Posts
    7
    Teddy, Thanks for the reply. Transaction ID is the primary key, so given that I need this column in my result, I don't see how a select distinct will do me any good - will just give me every row into the new table. Unless, I'm misunderstanding you.

  6. #6
    Join Date
    Mar 2003
    Location
    The Bottom of The Barrel
    Posts
    6,102
    Provided Answers: 1
    Don't include transaction id in the select into query. You then re-key with an autonumber field in your target table.

    If transaction id is tied to other records in other tables, you can't just go around deleting duplicates, you'll need to address that by hand.
    oh yeah... documentation... I have heard of that.

    *** What Do You Want In The MS Access Forum? ***

  7. #7
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    Quote Originally Posted by itrain8535
    Thanks so much for your reply. I can't alter this production table.

    Question - you create a duplicatestable in the Select query but in the Delete query you don't reference it. Am I reading this correctly?
    Yes, that is correct. The new table is created because you stated you wish to keep a table of duplicate records.

    If you are using SQL Server as the backend, ADP for example, then you can enforce the unique constraint through the corresponding stored procedure to avoid altering the table structure.

    Although, there is no requirement to reference the duplicates table to perform the delete operation, I assume it would be faster to do so.
    Code:
    Delete
    from tableA ta1
    where ta1.id <
      (select max(ta.id)
       from duplicatesTable dt
       group by dt.patron_ID, dt.Event_Code, dt.Transaction_Sale_Date, dt.Seats)
    Last edited by r123456; 11-17-04 at 19:55.
    Bessie Braddock: Winston, you are drunk!
    Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •