mmm. that's not very nice!
my immediate reaction is to reject the data - it is unacceptably corrupt
otherwise... (and avoiding your original question because any approach like that will take decades to run)
soundexing each (all) parts of the name might crack the names issue.
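
a rough sketch of what "soundex each part" could look like, in python just for illustration. the names `soundex` and `name_key` are made up here, and sorting the codes (so "Smith, John" and "John Smith" give the same key) is my own assumption. sql server has a built-in SOUNDEX function you would use in practice, and its output may differ from this sketch in edge cases.

```python
import re

# Standard American Soundex letter groups.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of one word ('' if no letters)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def name_key(full_name):
    """Soundex every part of a name; sort so part order does not matter."""
    parts = [p for p in re.split(r"[^A-Za-z]+", full_name) if p]
    return tuple(sorted(soundex(p) for p in parts))


print(name_key("Smith, John") == name_key("Jon Smyth"))
```

two rows whose keys match are only candidates for being the same person, they still need a second look.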
the numbers part is more difficult: it is hard to imagine a "soundex" mechanism for numbers. which pair is more similar??
unlike words, each number is absolutely unique, and each digit (byte, bit) is equally meaningful.
one idea?? sum the digits of each number and compare the sums
- handles simple transposition and some other less likely typing errors but will produce many false hits:
digitsum(1111111111) = digitsum(73)
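
the digit-sum idea in a few lines of python, as a sketch only. `digit_sum` is a made-up helper name, and ignoring non-digit characters (spaces, dashes) is my assumption about how the numbers might be stored.

```python
def digit_sum(value):
    """Sum the decimal digits of a number (or numeric string)."""
    return sum(int(ch) for ch in str(value) if ch in "0123456789")


# A transposition keeps the sum, so the typo is still matched.
print(digit_sum(12345) == digit_sum(12435))

# But completely unrelated numbers collide too (the false hit above).
print(digit_sum(1111111111) == digit_sum(73))
```

both lines print True, which shows the good side (transpositions survive) and the bad side (unrelated numbers match) of comparing sums.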
nothing meaningful comes to mind for matching two "similar" many-digit integers with typing errors.
currently using SS 2008R2