If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Soundex issue

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 04-27-09, 01:43
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Soundex issue

While soundex('Ashcraft') returns A226, it should be A261 according to
Soundex - Wikipedia, the free encyclopedia
and
Soundex Indexing
I found the issue in a thread in Oracle forum:
OTN Discussion Forums : Soundex Issue ...

It seems same on some other languages by looking into External links in Wikipedia.

I thought that the issue was came by ignoring following paragraph in Soundex Indexing
Quote:
If "H" or "W" separate two consonants that have the same soundex code, the consonant to the right of the vowel is not coded. Example:
Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.
In fact, there was a description in "Text::Soundex Perl module from CPAN" in External links in Wikipedia.
Quote:
To use Text::Soundex to generate codes that can be used to search one of the publically available US Censuses, a variant of the soundex algorithm must be used:
Code:
    use Text::Soundex;
    $code = soundex_nara($name);
An example of where these algorithm differ follows:
Code:
    use Text::Soundex;
    print soundex("Ashcraft"), "\n";       # prints: A226
    print soundex_nara("Ashcraft"), "\n";  # prints: A261
Following code is a test to check my guess.
Result:
Code:
------------------------------------------------------------------------------

SURNAME                         SOUNDEX HANDLE_HW_AS_VOWELS SPECIAL_HANDLING_FOR_HW
------------------------------- ------- ------------------- -----------------------
Washington                      W252    W252                W252                   
Lee                             L000    L000                L000                   
Gutierrez                       G362    G362                G362                   
Pfister                         P236    P236                P236                   
Jackson                         J250    J250                J250                   
Tymczak                         T522    T522                T522                   
VanDeusen                       V532    V532                V532                   
Ashcraft                        A226    A226                A261   <-----          
Robert                          R163    R163                R163                   
Rubin                           R150    R150                R150                   
Knuth Donald                    K533    K533                K535   <-----          
Mac'Donald                      M235    M235                M235                   
von Neumann                     V555    V555                V555                   
Ho                              H000    H000                H000                   
*****                           Z000    Z000                Z000                   
Hello World!                    H464    H464                H464                   
<!--file:///D|/db2s1e951.pdf--> F431    F431                F431                   

  17 record(s) selected.
Query which generated the result:
Code:
------------------------------ Commands Entered ------------------------------
WITH
 test_data(surname) AS (VALUES
 'Washington'
,'Lee'
,'Gutierrez'
,'Pfister'
,'Jackson'
,'Tymczak'
,'VanDeusen'
,'Ashcraft'
,'Robert'
,'Rubin'
,'Knuth Donald'
,'Mac''Donald'
,'von Neumann'
,'Ho'
,'*****'
,'Hello World!'
,'<!--file:///D|/db2s1e951.pdf-->'
)
SELECT surname
     , SOUNDEX(surname) soundex
     , (SELECT 
               SUBSTR(alphabetic,1,1) ||
               SUBSTR(
               REPLACE(
               REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
               REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
                 TRANSLATE(SUBSTR(alphabetic,1,1),'111122222222334556'        ,'BFPVCGJKQSXZDTLMNR') ||
--                 REPLACE( REPLACE(
                   TRANSLATE(SUBSTR(alphabetic,2),'111122222222334556--------','BFPVCGJKQSXZDTLMNRAEIOUYHW')
--                 ,'H','') ,'W','')
               ,'111','1') ,'222','2') ,'333','3') ,'444','4') ,'555','5') ,'666','6')
               ,'11' ,'1') ,'22' ,'2') ,'33' ,'3') ,'44' ,'4') ,'55' ,'5') ,'66' ,'6')
               ,'-','') || '000'
               ,2,3)
          FROM LATERAL
               (VALUES
                  COALESCE( NULLIF(
                  REPLACE(
                  TRANSLATE(
                    UPPER(surname),'',TRANSLATE(
                                        UPPER(surname),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                                      )
                  )
                  ,' ','')
                  ,'') ,'Z')
               ) q(alphabetic)
       )  handle_hw_as_vowels
     , (SELECT 
               SUBSTR(alphabetic,1,1) ||
               SUBSTR(
               REPLACE(
               REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
               REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
                 TRANSLATE(SUBSTR(alphabetic,1,1),'111122222222334556'        ,'BFPVCGJKQSXZDTLMNR') ||
                 REPLACE( REPLACE(
                   TRANSLATE(SUBSTR(alphabetic,2),'111122222222334556------'  ,'BFPVCGJKQSXZDTLMNRAEIOUY')
                 ,'H','') ,'W','')
               ,'111','1') ,'222','2') ,'333','3') ,'444','4') ,'555','5') ,'666','6')
               ,'11' ,'1') ,'22' ,'2') ,'33' ,'3') ,'44' ,'4') ,'55' ,'5') ,'66' ,'6')
               ,'-','') || '000'
               ,2,3)
          FROM LATERAL
               (VALUES
                  COALESCE( NULLIF(
                  REPLACE(
                  TRANSLATE(
                    UPPER(surname),'',TRANSLATE(
                                        UPPER(surname),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                                      )
                  )
                  ,' ','')
                  ,'') ,'Z')
               ) q(alphabetic)
       )  special_handling_for_hw
  FROM test_data
;

Last edited by tonkuma; 04-27-09 at 02:01.
Reply With Quote
  #2 (permalink)  
Old 10-11-09, 00:52
DB2Plus DB2Plus is offline
Registered User
 
Join Date: Jul 2009
Posts: 150
I am sorry, tonkuma, but why you are interesting about this problem ?

Who are using this code in our world ?

Kara S.
Reply With Quote
  #3 (permalink)  
Old 10-11-09, 01:42
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
I don't know who are using soundex.

But, Wikipedia was updated recently.
Quote:
"This page was last modified on 12 September 2009 at 12:30."
So, at least, someone(other than me) may be interested in soundex, now.
Reply With Quote
  #4 (permalink)  
Old 10-11-09, 10:09
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by DB2Plus
I am sorry, tonkuma, but why you are interesting about this problem ?

Who are using this code in our world ?
I bet there are more people using soundex than those solving sudokus using SQL...
Reply With Quote
  #5 (permalink)  
Old 10-11-09, 11:17
DB2Plus DB2Plus is offline
Registered User
 
Join Date: Jul 2009
Posts: 150
Thumbs down

Quote:
Originally Posted by n_i
I bet there are more people using soundex than those solving sudokus using SQL...
Exactly, Kolja, exactly !
But this complain not to me, but to lenny77.

I prefer code of agent Zigzag Eddie Chapman, because I can code some messages and someone can decode them later.

Soundex working in one way and I don't know what reason to use it.
Also this code generate duplicates.

Sincerely Kara S.
Reply With Quote
  #6 (permalink)  
Old 10-12-09, 11:00
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by DB2Plus
I don't know what reason to use it.
In such circumstances many people avoid making blanket statements.
Reply With Quote
  #7 (permalink)  
Old 10-17-09, 17:47
DB2Plus DB2Plus is offline
Registered User
 
Join Date: Jul 2009
Posts: 150
Post SOUNDEX function. V9.

I found something !

SOUNDEX

| The SOUNDEX function returns a 4 character code that represents the sound
| of the words in the argument. The result can be used to compare with the
| sound of other strings.
SOUNDEX(expression)

| The data type of the result is CHAR(4). If the argument can be null, the
| result can be null; if the argument is null, the result is the null value.


| The CCSID of the result is the Unicode SBCS CCSID.


| The SOUNDEX function is useful for finding strings for which the sound is
| known but the precise spelling is not. It makes assumptions about the way
| that letters and combinations of letters sound that can help to search for
| words with similar sounds. The comparison of words can be done directly or
| by passing the strings as arguments to the DIFFERENCE function.


| Example 1: Use the SOUNDEX function to find a row where the sound of the
| LASTNAME value closely matches the phonetic spelling of 'Loucesy':

|
Code:
SELECT EMPNO, LASTNAME
 |         FROM DSN910.EMPLOYEE
 |         WHERE SOUNDEX(LASTNAME) = SOUNDEX('Loucesy');
| This example returns the following row:

|
Quote:
000110 LUCCHESSI
3.2.124 "DB2 V9.1 for z/OS SQL Reference" IBM Library Server

Kara S.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On