Results 1 to 2 of 2
  1. #1
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10

    Unanswered: Bit of Friday Fun - Proper Casing Surnames

    Have an interesting side project that I've just had a play with and thought I'd share what I've kludged together so far for comment, critique and to see if anyone else has a better way of skinning this cat!

    Basically, we have a load of crap customer data that comes down from the web and users cannot be trusted with typing their own name in any sane casing. This affects us when we try to send them emails and they complain that we've started the correspondence with "Dear Mrs SMITH", "Dear Mr smith" or worse.

    So I thought I'd take a punt at something reasonable that aims to resolve these casing issues as best as possible.

    This isn't the best solution, but it's a start. It doesn't work correctly in cases where the name is intentionally lower case (e.g. "van der Vaart") but covers other more common bases.

    Code:
    ; WITH names (name) AS (
      SELECT 'O''Brien'
      UNION ALL SELECT 'smith'
      UNION ALL SELECT 'jaden-smith'
      UNION ALL SELECT 'van der Vaart'
      UNION ALL SELECT 'mcdonald'
      UNION ALL SELECT 'mccarthy'
      UNION ALL SELECT 'o''grady'
    )
    , letters (letter) AS (
      SELECT Char(number) As letter
      FROM   dbo.numbers
      WHERE  number BETWEEN 65 AND 90
    )
    , modifiers (modifier) AS (
      SELECT ''''
      UNION ALL SELECT '-'
      UNION ALL SELECT 'Mc'
      UNION ALL SELECT ' '
    )
    , replacements AS (
      SELECT Row_Number() OVER (ORDER BY letters.letter, modifiers.modifier) As sequence
           , modifiers.modifier + letters.letter As replacement
      FROM   letters
       CROSS
        JOIN modifiers
    )
    , do AS (
      SELECT sequence.start As sequence
           , name As original
           , Cast(' ' + Lower(name) As varchar(50)) As modified
      FROM   names
       CROSS
        JOIN (
               SELECT Max(sequence) As start
               FROM   replacements
             ) As sequence
      UNION ALL
        SELECT do.sequence - 1
             , do.original
             , Cast(Replace(do.modified, replacements.replacement, replacements.replacement) As varchar(50))
        FROM   do
         INNER
          JOIN replacements
            ON replacements.sequence = do.sequence
    )
    SELECT original
         , LTrim(modified) As propered
    FROM   do
    WHERE  sequence = 0
    OPTION (MAXRECURSION 200)
    ;
    Results:
    Code:
    original      propered
    ------------- --------------
    o'grady       O'Grady
    mccarthy      McCarthy
    mcdonald      McDonald
    van der Vaart Van Der Vaart
    jaden-smith   Jaden-Smith
    smith         Smith
    O'Brien       O'Brien
    Whaddya reckon?

    I'm not a big fan of the recursive method (loops feel dirty) but it gets the job done. I doubt this method will scale well though but we can happily batch things up and roll through a dataset even if it takes a while.
    George
    Home | Blog

  2. #2
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    Bruce has a big old name formatting SQL function.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •