Unanswered: Finding duplicates when no obvious field
I have been asked to take a look at a large database which is used to store over 300,000 scientific statistical records. My first task is to try and slueth out if some of the records have been entered twice (some have). Unfortunatlely, The records are not completely the same: in many cases, records were entered at two different times with additional information the second time.
The "key field" in the table is just a number (not autonumber). The users just looked at the last number in the table and added incrementing numbers onto the new data.
I have SEVERAL questions for the forum (sorry and thank you!):
Is there a typical way that developers use to sniff out duplicate records? The user is now using the Find Duplicates wizard, which I intend to investigate, but I don't know in this case how reliable any wizard is.
Is there a typical way that developers generate key fields from existing data, for example taking four fields that together may generate a unique record and automatically generating a field:
Here is an example:
Area sample taken: WS0098
Year sample taken: 1976
Generated key field is these two fields plus incremental numbers:
Ive had a similar situation in a large DB in my enviroment. The same issues were involved and required the duplicate to be removed and like you some were made with more information or better information logged against the record point.
I pondered over using some sql and the duplicate wizard. In the end I was forced to use a dulicate finder so that "a qualified" user could check the information and delete the lesser useful.
This can be a long pain in the rear process but when you have stats that have to be kept then there is a danger that you will lose the better of the two (or three +) duplications and have a distorted reaction when wanting to use the data
to to answer one of your queries - consider tracking down the dulpicates a having a "qualified" individual sign them off so that you can flag them and then delete those flagged later