07-04-14, 08:52 #1www.gvee.co.uk
Provided Answers: 10
- Join Date
- Jan 2007
Unanswered: Bit of Friday Fun - Proper Casing Surnames
Have an interesting side project that I've just had a play with and thought I'd share what I've kludged together so far for comment, critique and to see if anyone else has a better way of skinning this cat!
Basically, we have a load of crap customer data that comes down from the web and users cannot be trusted with typing their own name in any sane casing. This affects us when we try to send them emails and they complain that we've started the correspondence with "Dear Mrs SMITH", "Dear Mr smith" or worse.
So I thought I'd take a punt at something reasonable that aims to resolve these casing issues as best as possible.
This isn't the best solution, but it's a start. It doesn't work correctly in cases where the name is intentionally lower case (e.g. "van der Vaart") but covers other more common bases.
; WITH names (name) AS ( SELECT 'O''Brien' UNION ALL SELECT 'smith' UNION ALL SELECT 'jaden-smith' UNION ALL SELECT 'van der Vaart' UNION ALL SELECT 'mcdonald' UNION ALL SELECT 'mccarthy' UNION ALL SELECT 'o''grady' ) , letters (letter) AS ( SELECT Char(number) As letter FROM dbo.numbers WHERE number BETWEEN 65 AND 90 ) , modifiers (modifier) AS ( SELECT '''' UNION ALL SELECT '-' UNION ALL SELECT 'Mc' UNION ALL SELECT ' ' ) , replacements AS ( SELECT Row_Number() OVER (ORDER BY letters.letter, modifiers.modifier) As sequence , modifiers.modifier + letters.letter As replacement FROM letters CROSS JOIN modifiers ) , do AS ( SELECT sequence.start As sequence , name As original , Cast(' ' + Lower(name) As varchar(50)) As modified FROM names CROSS JOIN ( SELECT Max(sequence) As start FROM replacements ) As sequence UNION ALL SELECT do.sequence - 1 , do.original , Cast(Replace(do.modified, replacements.replacement, replacements.replacement) As varchar(50)) FROM do INNER JOIN replacements ON replacements.sequence = do.sequence ) SELECT original , LTrim(modified) As propered FROM do WHERE sequence = 0 OPTION (MAXRECURSION 200) ;
original propered ------------- -------------- o'grady O'Grady mccarthy McCarthy mcdonald McDonald van der Vaart Van Der Vaart jaden-smith Jaden-Smith smith Smith O'Brien O'Brien
I'm not a big fan of the recursive method (loops feel dirty) but it gets the job done. I doubt this method will scale well though but we can happily batch things up and roll through a dataset even if it takes a while.
07-07-14, 17:21 #2Annie's Dog Walker
Provided Answers: 6
- Join Date
- Nov 2004
- on the wrong server
Bruce has a big old name formatting SQL function.If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry. Earnest Hemingway, A Farewell To Arms.