Page 1 of 2 12 LastLast
Results 1 to 15 of 18
  1. #1
    Join Date
    Jun 2003
    Location
    Longmont, CO
    Posts
    258

    Unanswered: COOK-O'NEIL vs. Cook-o'neil

    SQL Server 2000

    Does anybody know of a good T-SQL engine for de-mangling names?

    We're getting names in all caps and need to put them into Proper Case. This is, of course, trivial until you meet the real world where Miss Cook marries Mr. O'Neill and decides to hyphenate. Then of course there was Ms. MacArthur who just shouldn't exist at all, even before she marries Mr. O'Flanahan and has a brood of MacArthur-O'Flanahans.

    Surely there is something out there that fixes this?

    Thanks for any recommendations!

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    As nonstandard capitialization is by definition non-standard, the only definitive answer is a whole flock of secretaries that either know the answers or will ask the appropriate people to get them. Names are like laws, they are arbitrary, so no logic in the universe will help you reason them out.

    That being said, there are a number of packages that do a good job at guessing. It isn't even hard to write a good "case fixer" that will get about 98% of the names right. The problem still goes back to the arbitrary nature of names, so you'll never know which ones are wrong without manual intervention!

    -PatP

  3. #3
    Join Date
    Jun 2003
    Location
    Longmont, CO
    Posts
    258
    It may be the case that it isn't very hard to write something in TSql that will do the job, 98% of the time. However, the reason I was asking it that I have to have it complete and tested (as part of a whole pile of other work) by Monday.

    Hence, as a vb programmer slogging through TSQL, I was hoping for hyperlinks.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    For what it is worth, here is an Access VB function I wrote some time back to deal with this issue.

    You can either convert the logic to SQL, or put it in an Access ADP file linked to your database for a one-time run.

    ---------------------------------------------
    Function PROPERCAP(TEXT_STRING)
    'B. Lindman
    '1997
    'Returns the name of a person or business with proper capitalization.

    Dim TRIMMED_STRING As String
    Dim PROPER_STRING As String
    Dim CHARACTER_NUMBER As Integer
    Dim STRING_LENGTH As Integer
    Dim EXTRA_SPACE
    Dim NEXT_CAP

    If Len(TEXT_STRING) > 0 Then
    TRIMMED_STRING = Trim(TEXT_STRING)
    End If
    STRING_LENGTH = Len(TRIMMED_STRING)
    NEXT_CAP = True


    If STRING_LENGTH > 0 Then
    For CHARACTER_NUMBER = 1 To STRING_LENGTH

    'CHECK FOR EXTRA SPACE CHARACTERS
    EXTRA_SPACE = False
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = " " Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 1, 1) = " " Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 2, 1) <> "." Then EXTRA_SPACE = True
    End If
    End If

    'BUILD PROPER STRING
    If Not EXTRA_SPACE Then
    Select Case NEXT_CAP
    Case True
    PROPER_STRING = PROPER_STRING & UCase(Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1))
    Case False
    PROPER_STRING = PROPER_STRING & LCase(Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1))
    End Select
    End If

    'DETERMINE CAPITALIZATION STATUS OF NEXT CHARACTER
    NEXT_CAP = False
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = "." Then NEXT_CAP = True
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = " " Then NEXT_CAP = True
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = "-" Then NEXT_CAP = True
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = "(" Then NEXT_CAP = True
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = "/" Then NEXT_CAP = True
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = "I" Then
    If CHARACTER_NUMBER > 3 Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 1, 1) = " " Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER + 1, 1) = "I" Then
    NEXT_CAP = True
    End If
    Else
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 1, 1) = "I" Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 2, 1) = " " Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER + 1, 1) = "I" Then
    NEXT_CAP = True
    End If
    End If
    End If
    End If
    End If
    End If
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER, 1) = "c" Then
    If CHARACTER_NUMBER > 1 Then
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 1, 1) = "M" Then
    If CHARACTER_NUMBER - 2 = 0 Then
    NEXT_CAP = True
    Else
    If Mid(TRIMMED_STRING, CHARACTER_NUMBER - 2, 1) = " " Then
    NEXT_CAP = True
    End If
    End If
    End If
    End If
    End If

    Next CHARACTER_NUMBER

    End If

    If Len(PROPER_STRING) > 0 Then
    PROPERCAP = PROPER_STRING
    Else
    PROPERCAP = Null
    End If

    End Function
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Hey, does B. stand for Bob?

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    No! I am the ORIGINAL blindman!
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Oh, and I am an airplane

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Originally posted by rdjabarov
    Oh, and I am an airplane
    I'd always wondered about that. Thanks for clearing it up.

    -PatP

  9. #9
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    If an 85-90% solution will do, you can also use:
    PHP Code:
    CREATE TABLE names (
       
    name            VARCHAR(50)
       )

    INSERT INTO names ([name])
       
    SELECT 'TILLEY O''TOOL'
       
    UNION ALL SELECT 'PHRED PHARQUAR'
       
    UNION ALL SELECT 'JOHN KISS-BUTTS'
       
    UNION ALL SELECT 'RENE MACDONALD'
       
    UNION ALL SELECT 'CRAIG MCDERMOT'

    UPDATE names
       SET 
    [name] = Upper(Left(LTrim([name]), 1)) 
    +     
    Lower(SubString(LTrim(Rtrim([name])), 28000))

    DECLARE @
    cName    VARCHAR(50)
    ,  @
    i        INT

    DECLARE zFixes CURSOR FOR SELECT [name]
       
    FROM names
       WHERE  
    [nameLIKE '%[^A-Za-z][a-z]%'

    OPEN zFixes
    FETCH zFixes INTO 
    @cName

    WHILE = @@fetch_status
       BEGIN
          SET 
    @PatIndex('%[^A-Za-z][a-z]%', @cName)
          WHILE 
    < @i
             BEGIN
                SET 
    @cName SubString(@cName1, @1)
    +              
    Upper(SubString(@cName, @i2)) 
    +              
    SubString(@cName, @28000)

                
    SET @PatIndex('%[^A-Za-z][a-z]%'
    ,              @cName COLLATE Latin1_General_BIN )
             
    END

          UPDATE names SET 
    [name] = @cName WHERE CURRENT OF zFixes
          FETCH zFixes INTO 
    @cName
       END

    CLOSE zFixes
    DEALLOCATE zFixes

    SELECT 
    FROM names

    DROP TABLE names 
    It doesn't deal with exceptions, although you can add them several ways at the end. The biggest problem with figuring out how you want to process the exceptions is that nearly all of the methods are mutually exclusive, using one precludes the use of others. You have to know the data you want to fix (the exceptions you want to handle) in order to pick the right algorithm for coping with them.

    -PatP

  10. #10
    Join Date
    Jun 2003
    Location
    Longmont, CO
    Posts
    258
    Thanks Pat,

    I'll have to look at it tomorrow. I spent all afternoon converting/writing a vb fix in vb that used only functions that also work in vbScript. Worked well in vb.

    When I moved it into my vbScript module it flopped. What becomes 'O'Neill - McRoberts' in vb ends up as O'neil-mcroberts' in vbScript. Grrrrrrr!

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Fly djabarov.

    Do you offer frequent flier miles?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    Hmmm, I think you already know my response, so let's keep it out of here...unless you want a piece of me?

  13. #13
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Originally posted by rdjabarov
    Hmmm, I think you already know my response, so let's keep it out of here...unless you want a piece of me?
    Great zot! Did you mean to leave that one hanging wide open like that??? Talk about a perfect opportunity for multiple straight lines!

    I suppose I have to behave, but it is hard to pass up temptation like that!

    -PatP

  14. #14
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Ain't it amazing how fast we end up in the gutter....

    Nice code pat....I got one here or abother forummm...
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  15. #15
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •