Results 1 to 7 of 7

Thread: Soundex issue

  1. #1
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483

    Unanswered: Soundex issue

    While soundex('Ashcraft') returns A226, it should be A261 according to
    Soundex - Wikipedia, the free encyclopedia
    and
    Soundex Indexing
    I found the issue in a thread in Oracle forum:
    OTN Discussion Forums : Soundex Issue ...

    It seems same on some other languages by looking into External links in Wikipedia.

    I thought that the issue was came by ignoring following paragraph in Soundex Indexing
    If "H" or "W" separate two consonants that have the same soundex code, the consonant to the right of the vowel is not coded. Example:
    Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.
    In fact, there was a description in "Text::Soundex Perl module from CPAN" in External links in Wikipedia.
    To use Text::Soundex to generate codes that can be used to search one of the publically available US Censuses, a variant of the soundex algorithm must be used:
    Code:
        use Text::Soundex;
        $code = soundex_nara($name);
    An example of where these algorithm differ follows:
    Code:
        use Text::Soundex;
        print soundex("Ashcraft"), "\n";       # prints: A226
        print soundex_nara("Ashcraft"), "\n";  # prints: A261
    Following code is a test to check my guess.
    Result:
    Code:
    ------------------------------------------------------------------------------
    
    SURNAME                         SOUNDEX HANDLE_HW_AS_VOWELS SPECIAL_HANDLING_FOR_HW
    ------------------------------- ------- ------------------- -----------------------
    Washington                      W252    W252                W252                   
    Lee                             L000    L000                L000                   
    Gutierrez                       G362    G362                G362                   
    Pfister                         P236    P236                P236                   
    Jackson                         J250    J250                J250                   
    Tymczak                         T522    T522                T522                   
    VanDeusen                       V532    V532                V532                   
    Ashcraft                        A226    A226                A261   <-----          
    Robert                          R163    R163                R163                   
    Rubin                           R150    R150                R150                   
    Knuth Donald                    K533    K533                K535   <-----          
    Mac'Donald                      M235    M235                M235                   
    von Neumann                     V555    V555                V555                   
    Ho                              H000    H000                H000                   
    *****                           Z000    Z000                Z000                   
    Hello World!                    H464    H464                H464                   
    <!--file:///D|/db2s1e951.pdf--> F431    F431                F431                   
    
      17 record(s) selected.
    Query which generated the result:
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
     test_data(surname) AS (VALUES
     'Washington'
    ,'Lee'
    ,'Gutierrez'
    ,'Pfister'
    ,'Jackson'
    ,'Tymczak'
    ,'VanDeusen'
    ,'Ashcraft'
    ,'Robert'
    ,'Rubin'
    ,'Knuth Donald'
    ,'Mac''Donald'
    ,'von Neumann'
    ,'Ho'
    ,'*****'
    ,'Hello World!'
    ,'<!--file:///D|/db2s1e951.pdf-->'
    )
    SELECT surname
         , SOUNDEX(surname) soundex
         , (SELECT 
                   SUBSTR(alphabetic,1,1) ||
                   SUBSTR(
                   REPLACE(
                   REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
                   REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
                     TRANSLATE(SUBSTR(alphabetic,1,1),'111122222222334556'        ,'BFPVCGJKQSXZDTLMNR') ||
    --                 REPLACE( REPLACE(
                       TRANSLATE(SUBSTR(alphabetic,2),'111122222222334556--------','BFPVCGJKQSXZDTLMNRAEIOUYHW')
    --                 ,'H','') ,'W','')
                   ,'111','1') ,'222','2') ,'333','3') ,'444','4') ,'555','5') ,'666','6')
                   ,'11' ,'1') ,'22' ,'2') ,'33' ,'3') ,'44' ,'4') ,'55' ,'5') ,'66' ,'6')
                   ,'-','') || '000'
                   ,2,3)
              FROM LATERAL
                   (VALUES
                      COALESCE( NULLIF(
                      REPLACE(
                      TRANSLATE(
                        UPPER(surname),'',TRANSLATE(
                                            UPPER(surname),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                                          )
                      )
                      ,' ','')
                      ,'') ,'Z')
                   ) q(alphabetic)
           )  handle_hw_as_vowels
         , (SELECT 
                   SUBSTR(alphabetic,1,1) ||
                   SUBSTR(
                   REPLACE(
                   REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
                   REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
                     TRANSLATE(SUBSTR(alphabetic,1,1),'111122222222334556'        ,'BFPVCGJKQSXZDTLMNR') ||
                     REPLACE( REPLACE(
                       TRANSLATE(SUBSTR(alphabetic,2),'111122222222334556------'  ,'BFPVCGJKQSXZDTLMNRAEIOUY')
                     ,'H','') ,'W','')
                   ,'111','1') ,'222','2') ,'333','3') ,'444','4') ,'555','5') ,'666','6')
                   ,'11' ,'1') ,'22' ,'2') ,'33' ,'3') ,'44' ,'4') ,'55' ,'5') ,'66' ,'6')
                   ,'-','') || '000'
                   ,2,3)
              FROM LATERAL
                   (VALUES
                      COALESCE( NULLIF(
                      REPLACE(
                      TRANSLATE(
                        UPPER(surname),'',TRANSLATE(
                                            UPPER(surname),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                                          )
                      )
                      ,' ','')
                      ,'') ,'Z')
                   ) q(alphabetic)
           )  special_handling_for_hw
      FROM test_data
    ;
    Last edited by tonkuma; 04-27-09 at 03:01.

  2. #2
    Join Date
    Jul 2009
    Posts
    150
    I am sorry, tonkuma, but why you are interesting about this problem ?

    Who are using this code in our world ?

    Kara S.

  3. #3
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I don't know who are using soundex.

    But, Wikipedia was updated recently.
    "This page was last modified on 12 September 2009 at 12:30."
    So, at least, someone(other than me) may be interested in soundex, now.

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by DB2Plus
    I am sorry, tonkuma, but why you are interesting about this problem ?

    Who are using this code in our world ?
    I bet there are more people using soundex than those solving sudokus using SQL...
    ---
    "It does not work" is not a valid problem statement.

  5. #5
    Join Date
    Jul 2009
    Posts
    150

    Thumbs down

    Quote Originally Posted by n_i
    I bet there are more people using soundex than those solving sudokus using SQL...
    Exactly, Kolja, exactly !
    But this complain not to me, but to lenny77.

    I prefer code of agent Zigzag Eddie Chapman, because I can code some messages and someone can decode them later.

    Soundex working in one way and I don't know what reason to use it.
    Also this code generate duplicates.

    Sincerely Kara S.

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by DB2Plus
    I don't know what reason to use it.
    In such circumstances many people avoid making blanket statements.
    ---
    "It does not work" is not a valid problem statement.

  7. #7
    Join Date
    Jul 2009
    Posts
    150

    Post SOUNDEX function. V9.

    I found something !

    SOUNDEX

    | The SOUNDEX function returns a 4 character code that represents the sound
    | of the words in the argument. The result can be used to compare with the
    | sound of other strings.
    SOUNDEX(expression)

    | The data type of the result is CHAR(4). If the argument can be null, the
    | result can be null; if the argument is null, the result is the null value.


    | The CCSID of the result is the Unicode SBCS CCSID.


    | The SOUNDEX function is useful for finding strings for which the sound is
    | known but the precise spelling is not. It makes assumptions about the way
    | that letters and combinations of letters sound that can help to search for
    | words with similar sounds. The comparison of words can be done directly or
    | by passing the strings as arguments to the DIFFERENCE function.


    | Example 1: Use the SOUNDEX function to find a row where the sound of the
    | LASTNAME value closely matches the phonetic spelling of 'Loucesy':

    |
    Code:
    SELECT EMPNO, LASTNAME
     |         FROM DSN910.EMPLOYEE
     |         WHERE SOUNDEX(LASTNAME) = SOUNDEX('Loucesy');
    | This example returns the following row:

    |
    000110 LUCCHESSI
    3.2.124 "DB2 V9.1 for z/OS SQL Reference" IBM Library Server

    Kara S.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •