Results 1 to 2 of 2
  1. #1
    Join Date
    Feb 2004
    Posts
    15

    Unanswered: A better mousetrap: Your best method for matching records.

    I frequently run into multiple lists of people that all have valuable bits of information, but without any sort of unifying ID field.

    The current method I use for matching these lists is a join via a created
    "matchfield." Such as this

    Matchfield : Left([Last Name], 3) & Left([First Name], 3) & Left([Street
    Address], 3) & [ZipCode]

    I then use this as an ID to find matches in the two lists. My match rate
    with this method is sometimes very low, and I was wondering what sort of
    methods other people use to match up lists.

    I'm running Access 2003 on a Windows platform, FYI.

  2. #2
    Join Date
    Oct 2004
    Location
    Oxfordshire, UK
    Posts
    89
    Clean up text data first using action queries or code:
    1 Remove leading and trailing spaces e.g. Trim([ZipCode])
    2 Replace double spaces and non-alpha numeric characters e.g. Replace ([Street Address],", ", " ")

    Then:
    1 Create an ID field in the original table, tbData_Original. Assign unique IDs to all records.
    2 Create a duplicate of tbData_Original, tbData_Copy and make sure the tbData_Copy.ID isn't a primary key or autonumber.
    3 Create a scoring table (tblScore) with ID1, ID2, score fields. Create relationships between ID1 and tbData_Original.ID, and ID2 and tblData_Copy.ID respectively.

    Scoring will depend on the number of fields, but in this case all 4 fields matching gives you 100%. You can score the records by:
    1 Compare the records in two different recordsets based on tbData_Original and tblData_Copy. Records with 100% match can be updated to an existing ID in tblData_Copy. Delete the duplicates in tblData_Copy.
    2 Compare tblData_Original with tblData_Copy and create records in the linking table, tblScore. tblScore.Value can be calculated on the number of matching fields.

    Now you can create queries based on tblData_Original and tblScore. If you specify a threshold tblScore.Value in the criteria, inspecting records should be easier.

    A further refinement would be to parse comparison fields using split([Field], " ") and count the number of component string matches giving you a score for individual fields. Add these up to give the final score. You could even go further by counting the number of characters matched in same position string fragments, scoring the fragments. It depends on how often you need to do this and how much work you want to do! Do this in passes (excluding previous 100% matches) to speed up the process and don't forget to add DoEvents to code...
    Last edited by MyNewFlavour; 08-14-06 at 09:27.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •