| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

04-27-09, 01:43
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
|
Soundex issue
|
|
While soundex('Ashcraft') returns A226, it should be A261 according to
Soundex - Wikipedia, the free encyclopedia
and
Soundex Indexing
I found the issue in a thread in Oracle forum:
OTN Discussion Forums : Soundex Issue ...
It seems same on some other languages by looking into External links in Wikipedia.
I thought that the issue was came by ignoring following paragraph in Soundex Indexing
Quote:
If "H" or "W" separate two consonants that have the same soundex code, the consonant to the right of the vowel is not coded. Example:
Ashcraft is coded A-261 (A, 2 for the S, C ignored, 6 for the R, 1 for the F). It is not coded A-226.
|
In fact, there was a description in "Text::Soundex Perl module from CPAN" in External links in Wikipedia.
Quote:
To use Text::Soundex to generate codes that can be used to search one of the publically available US Censuses, a variant of the soundex algorithm must be used:
Code:
use Text::Soundex;
$code = soundex_nara($name);
An example of where these algorithm differ follows:
Code:
use Text::Soundex;
print soundex("Ashcraft"), "\n"; # prints: A226
print soundex_nara("Ashcraft"), "\n"; # prints: A261
|
Following code is a test to check my guess.
Result:
Code:
------------------------------------------------------------------------------
SURNAME SOUNDEX HANDLE_HW_AS_VOWELS SPECIAL_HANDLING_FOR_HW
------------------------------- ------- ------------------- -----------------------
Washington W252 W252 W252
Lee L000 L000 L000
Gutierrez G362 G362 G362
Pfister P236 P236 P236
Jackson J250 J250 J250
Tymczak T522 T522 T522
VanDeusen V532 V532 V532
Ashcraft A226 A226 A261 <-----
Robert R163 R163 R163
Rubin R150 R150 R150
Knuth Donald K533 K533 K535 <-----
Mac'Donald M235 M235 M235
von Neumann V555 V555 V555
Ho H000 H000 H000
***** Z000 Z000 Z000
Hello World! H464 H464 H464
<!--file:///D|/db2s1e951.pdf--> F431 F431 F431
17 record(s) selected.
Query which generated the result:
Code:
------------------------------ Commands Entered ------------------------------
WITH
test_data(surname) AS (VALUES
'Washington'
,'Lee'
,'Gutierrez'
,'Pfister'
,'Jackson'
,'Tymczak'
,'VanDeusen'
,'Ashcraft'
,'Robert'
,'Rubin'
,'Knuth Donald'
,'Mac''Donald'
,'von Neumann'
,'Ho'
,'*****'
,'Hello World!'
,'<!--file:///D|/db2s1e951.pdf-->'
)
SELECT surname
, SOUNDEX(surname) soundex
, (SELECT
SUBSTR(alphabetic,1,1) ||
SUBSTR(
REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
TRANSLATE(SUBSTR(alphabetic,1,1),'111122222222334556' ,'BFPVCGJKQSXZDTLMNR') ||
-- REPLACE( REPLACE(
TRANSLATE(SUBSTR(alphabetic,2),'111122222222334556--------','BFPVCGJKQSXZDTLMNRAEIOUYHW')
-- ,'H','') ,'W','')
,'111','1') ,'222','2') ,'333','3') ,'444','4') ,'555','5') ,'666','6')
,'11' ,'1') ,'22' ,'2') ,'33' ,'3') ,'44' ,'4') ,'55' ,'5') ,'66' ,'6')
,'-','') || '000'
,2,3)
FROM LATERAL
(VALUES
COALESCE( NULLIF(
REPLACE(
TRANSLATE(
UPPER(surname),'',TRANSLATE(
UPPER(surname),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ'
)
)
,' ','')
,'') ,'Z')
) q(alphabetic)
) handle_hw_as_vowels
, (SELECT
SUBSTR(alphabetic,1,1) ||
SUBSTR(
REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
REPLACE( REPLACE( REPLACE( REPLACE( REPLACE( REPLACE(
TRANSLATE(SUBSTR(alphabetic,1,1),'111122222222334556' ,'BFPVCGJKQSXZDTLMNR') ||
REPLACE( REPLACE(
TRANSLATE(SUBSTR(alphabetic,2),'111122222222334556------' ,'BFPVCGJKQSXZDTLMNRAEIOUY')
,'H','') ,'W','')
,'111','1') ,'222','2') ,'333','3') ,'444','4') ,'555','5') ,'666','6')
,'11' ,'1') ,'22' ,'2') ,'33' ,'3') ,'44' ,'4') ,'55' ,'5') ,'66' ,'6')
,'-','') || '000'
,2,3)
FROM LATERAL
(VALUES
COALESCE( NULLIF(
REPLACE(
TRANSLATE(
UPPER(surname),'',TRANSLATE(
UPPER(surname),'','ABCDEFGHIJKLMNOPQRSTUVWXYZ'
)
)
,' ','')
,'') ,'Z')
) q(alphabetic)
) special_handling_for_hw
FROM test_data
;
|
Last edited by tonkuma; 04-27-09 at 02:01.
|

10-11-09, 00:52
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 150
|
|
I am sorry, tonkuma, but why you are interesting about this problem ?
Who are using this code in our world ?
Kara S.
|
|

10-11-09, 01:42
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
|
|
I don't know who are using soundex.
But, Wikipedia was updated recently.
Quote:
|
"This page was last modified on 12 September 2009 at 12:30."
|
So, at least, someone(other than me) may be interested in soundex, now.
|
|

10-11-09, 10:09
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
Quote:
|
Originally Posted by DB2Plus
I am sorry, tonkuma, but why you are interesting about this problem ?
Who are using this code in our world ?
|
I bet there are more people using soundex than those solving sudokus using SQL...
|
|

10-11-09, 11:17
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 150
|
|
Quote:
|
Originally Posted by n_i
I bet there are more people using soundex than those solving sudokus using SQL...
|
Exactly, Kolja, exactly !
But this complain not to me, but to lenny77.
I prefer code of agent Zigzag Eddie Chapman, because I can code some messages and someone can decode them later.
Soundex working in one way and I don't know what reason to use it.
Also this code generate duplicates.
Sincerely Kara S.
|
|

10-12-09, 11:00
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
Quote:
|
Originally Posted by DB2Plus
I don't know what reason to use it.
|
In such circumstances many people avoid making blanket statements.
|
|

10-17-09, 17:47
|
|
Registered User
|
|
Join Date: Jul 2009
Posts: 150
|
|
SOUNDEX function. V9.
I found something !
SOUNDEX
| The SOUNDEX function returns a 4 character code that represents the sound
| of the words in the argument. The result can be used to compare with the
| sound of other strings.
SOUNDEX(expression)
| The data type of the result is CHAR(4). If the argument can be null, the
| result can be null; if the argument is null, the result is the null value.
| The CCSID of the result is the Unicode SBCS CCSID.
| The SOUNDEX function is useful for finding strings for which the sound is
| known but the precise spelling is not. It makes assumptions about the way
| that letters and combinations of letters sound that can help to search for
| words with similar sounds. The comparison of words can be done directly or
| by passing the strings as arguments to the DIFFERENCE function.
| Example 1: Use the SOUNDEX function to find a row where the sound of the
| LASTNAME value closely matches the phonetic spelling of 'Loucesy':
|
Code:
SELECT EMPNO, LASTNAME
| FROM DSN910.EMPLOYEE
| WHERE SOUNDEX(LASTNAME) = SOUNDEX('Loucesy');
| This example returns the following row:
| 3.2.124 "DB2 V9.1 for z/OS SQL Reference" IBM Library Server
Kara S.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|