Results 1 to 8 of 8
  1. #1
    Join Date
    Aug 2005
    Location
    Delaware
    Posts
    139

    Unanswered: Delete duplicate records using newest date or Auto-number

    Hello All;

    Can anyone suggest, or point me to, a working VBA solution to deleting duplicates from a table using the newest date as criteria.

    My table is populated with new job orders from a spreadsheet every morning. These job orders contain workorder statuses, and therefore due to updated statuses, there are duplicates in the main table. I need to delete any duplicates, keeping only the latest date/time record.

    The table has an auto-generated number field (AutoNum), and unique workorder number(WorkorkorderNum) and a date the workorder was last updated (dDateUpdated) (among numerous other fields) . I added the Auto Number field thinking it would be easier to locate the newest record, but I have been searching for hours trying to find a workable solution, VBA preferred, but SQL will do. Many have been found, but I can't seem to get any to function properly. Locating the duplicated records seems fairly straightforward but deleting them based on either the most recent date, or the Autonumber seems to be the difficult part.

    Any help would be appreciated.

    Thanks...

    Access 2013

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Is there a VBA solution?
    ...Possibly
    However id argue the approach is wrong.
    Lets put to one side the idea that a spreadsheet is however remotely a good idea as a prime source of data capture or storages (frankly it isnt)

    You already have a unique value, the work order, so use that instead if complicating things by adding an autonumber column. Read up on when to use 'surrogate keys'...

    Two approaches.
    A) keep ghe existing design
    A.1 Import the spreadsheet into a temporary table
    A.2 run a query which updates the pre existing data, using a join clause on the respective work orders
    A.3 run a query that attempts to insert rows, reject any duplicates
    ...the tidier approach is after A.2 run a query which deletes riws in your temporary table that have a workorder in the temp table.

    Option B
    Redesign the table so that the stuff that changes is stored in a sub table so yoh have tbe history of the status changes

    Well Option C also exists but its really a good hoysekeeping policy:-
    Take the geniuses who insist on using spreadsheets as a prime source of data out behind the works shed and have them quietly put down, they are dangerous to the company's well being

    For me option B, ive yet to see anywhere where you dont want to see the pathway and timescale of the job is unimportant.

    You cant do this in vba alone, as at some point you have to manipulate the data, that means interacting with the DB, and that means SQL. On top of which this sort of process is right up SQL's street. Yes it will be painfull to develop the queries from scratch, but thats what places like dBForums exist for.

    As to eliminating existing duplicates, again run a query
    Its going to be sonething like
    Code:
    delete * from mytable as p
    Join mytable as d on p.workorder = s.workorder
    Where p.dDateuodated < s.dDateUpdated
    Its a simple query. Delete all rows where there is another row in the same table with the same workorder and a more recent status change date. The syntax of the join may need refining, Access / JET has its own way of firming a join
    Ive aliassed the table as we use it twice to p (primary) s (secondary)
    You will need to maje certain the table name and any referred columns are changed to yours
    When developing the SQL work on copies of tables not the original data. Once youve proved it works then move the queries, changing the table names to the live values.
    Take frequent backups BEFORE running update / delete queries

    You should/may be able to use the query designer for this, I don't use the designer so dont 'support' it here

    The answer yo your probkem irrespective of what approach is goibg to be using SQL. If you are not familiar with SQL or worried about it then now is a very good time to start learning about SQL, what it can do for you. If you are going to develop inAccess then you will need to dabble in SQL. Ints not some arcane mumbo jumbo chanted in the data centre, its a vital tool for handling data
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    Aug 2005
    Location
    Delaware
    Posts
    139
    healdem... thank you for the well thought out reply. Yes, I agree, a spreadsheet is not the ideal way to go, but one needs to work with what one needs to work with. Some just LOVE their spreadsheets.

    It would be simple to delete the dups in the main table, but the problem I face is that the newly arriving data is what causes the dups. A workorder may change statuses several times over multiple days (and there is no foreseeable reason to keep historical status changes, as such is available elsewhere in addition to the workorder itself if needed). The same workorder may come in day after day and I only need to keep the most recent version. To prevent entering dups into the table in the first place, I would need to check the complete table each time prior to insertion of the new record in order to see if the workorder is already there, and if the status had changed. Possible yes, but I think would significantly slow down the process of loading the data. The process is already slow due to loading the spreadsheet data, further calculations, and adding fields to the newly arriving data, then inserting into the table.

    I'll study your response and work through your suggestions. Again, thanks so much for your reply. Although there are multiple questions when Googling on a process like this, there appear to be very few coherent responses. I guess each situation is a little different and therefore what work for one may not work for another.

    Thanks...

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Why would a query slow down... Assuming the table is properly indexed it should run fast. Try the suggested query, get some metrics before discounting queries.
    Hint they are a feck of a lot quicker than vba.
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Aug 2005
    Location
    Delaware
    Posts
    139
    Why would a query slow down... Pulling in 200-300 records daily, requiring a table lookup for each record to check for a duplicate prior to writing the record, in a table with several thousand records, across a network.

    With your help, I got it working. Initial testing appears its working as desired, and the speed is acceptable. Will post the working code (SQL) asap.

    Thank you...

  6. #6
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    Ok so i dont think you grasp the power of SQL queries. SQL works on sets of data

    200..300 row daily insert is trivial
    If you must persist with your cack handed design do so. But leverage to power of SQL dont use VBA or loop processing
    I'd rather be riding on the Tiger 800 or the Norton

  7. #7
    Join Date
    Aug 2005
    Location
    Delaware
    Posts
    139
    Quote Originally Posted by healdem View Post
    Ok so i dont think you grasp the power of SQL queries. SQL works on sets of data

    200..300 row daily insert is trivial
    If you must persist with your cack handed design do so. But leverage to power of SQL dont use VBA or loop processing
    healdem... you are correct, I don't understand the power of SQL. There is a lot I don't understand (including Access), but I keep working at it. Never stop learning, and not be ashamed of what I don't know, it's the only way I can function.

    I finally ended up at a SQL solution:
    DELETE MyTableName.*, MyTableName.AutoNum
    FROM MyTableName
    WHERE (((MyTableName.AutoNum) Not In (SELECT MAX(AutoNum)
    FROM MyTableName mt
    WHERE dDateUpdated=(
    SELECT MAX(dDateUpdated)
    FROM MyTableName
    WHERE WorkOrderNum=mt.WorkOrderNum)
    GROUP BY WorkOrderNum)));

    Initial testing indicates this is working as needed, however I don't understand the what and why behind it. For my benefit, and the benefit of others who may find this thread, could you explain what the "mt" does?

  8. #8
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Provided Answers: 59
    That looks awfy complicated two subselects, uses the max function twice (make certain the columns are indexed)

    mt in that query is an alias for mytable, hence 'mt'. Its a valid shorthand used to shorten complex or longwinded names to make queries easier to read. OR its required if the table is used more than once. Hence why i refer to p. And s. (p: primary s:secondary).
    Last edited by healdem; 08-09-16 at 15:35.
    I'd rather be riding on the Tiger 800 or the Norton

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •