If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > PostCode: Iterative partial match

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 07-01-10, 04:00
sigmazen sigmazen is offline
Registered User
 
Join Date: Jan 2010
Posts: 6
PostCode: Iterative partial match

Hi there
I've got an interesting problem with postcode searches. The question is how to find the nearest match if the exact isn't found using SQL
(ie in a program it would be IF +100 then remove a trailing char and try again)

Example Table
==========
PostCode Long_Lat_Value
--------- --------------
DE4 Value#1
DE45 Value#2
DE45UA Value#3
DE45UM Value#4
DE45UZ Value#5

So if I search for DE45UZ I get Value#5
If I search for DE45UG then I would want to iterate by removing trailing chars until I found a match ... in this case a match would be on DE45 and give me Value#2
And finally if I search for DE49AJ I iterate again and the match would return Value#1

I'm sure this will have been done before using CRTs and fancy SQL so apologies if this is a repeat.
Thanks in advance to any help
Cheers
Simon
Reply With Quote
  #2 (permalink)  
Old 07-01-10, 07:36
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,195
The sample query returned this result:
Code:
SEARCH_WORD POSTCODE   LONG_LAT_VALUE
----------- ---------- --------------
BUZ         No matched               
DE45UG      DE45       Value#2       
DE45UZ      DE45UZ     Value#5       
DE49AJ      DE4        Value#1       

  4 record(s) selected.
sample query:
Code:
------------------------------ Commands Entered ------------------------------
WITH sample_data
(postcode , long_lat_value) AS (
VALUES
  ('DE4'    , 'Value#1')
, ('DE45'   , 'Value#2')
, ('DE45UA' , 'Value#3')
, ('DE45UM' , 'Value#4')
, ('DE45UZ' , 'Value#5')
)
, search_words(search_word) AS (
VALUES 'DE45UZ' , 'DE45UG' , 'DE49AJ' , 'BUZ'
)
SELECT search_word
     , CASE
       WHEN matched_rank >= 0 THEN
            postcode
       ELSE 'No matched'
       END  AS postcode
     , CASE
       WHEN matched_rank >= 0 THEN
            long_lat_value
       ELSE ''
       END  AS long_lat_value
  FROM (SELECT search_word
             , postcode , long_lat_value
             , matched_rank
             , ROW_NUMBER() OVER(PARTITION BY search_word
                                     ORDER BY matched_rank DESC NULLS LAST) AS rn
          FROM (SELECT search_word
                     , postcode , long_lat_value
                     , MIN(CASE
                           WHEN LEFT(search_word , rank) = postcode
                           THEN rank
                           END
                          ) AS matched_rank
                 FROM  search_words s
                 CROSS JOIN
                       sample_data  t
                 INNER JOIN
                       (VALUES 1,2,3,4,5,6,7,8,9,10) r(rank)
                   ON  LENGTH(search_word) >= rank
                 GROUP BY
                       search_word
                     , postcode , long_lat_value
               )
       )
 WHERE rn = 1
;
------------------------------------------------------------------------------

SEARCH_WORD POSTCODE   LONG_LAT_VALUE
----------- ---------- --------------
BUZ         No matched               
DE45UG      DE45       Value#2       
DE45UZ      DE45UZ     Value#5       
DE49AJ      DE4        Value#1       

  4 record(s) selected.

Last edited by tonkuma; 07-01-10 at 08:28. Reason: Changed 3 phrase(markd bold, red)
Reply With Quote
  #3 (permalink)  
Old 07-01-10, 08:57
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,195
Nested table expressions could be reduced to one and join r(rank) was not necessary.

Another example:
Code:
------------------------------ Commands Entered ------------------------------
WITH sample_data
(postcode , long_lat_value) AS (
VALUES
  ('DE4'    , 'Value#1')
, ('DE45'   , 'Value#2')
, ('DE45UA' , 'Value#3')
, ('DE45UM' , 'Value#4')
, ('DE45UZ' , 'Value#5')
)
, search_words(search_word) AS (
VALUES 'DE45UZ' , 'DE45UG' , 'DE49AJ' , 'BUZ'
)
SELECT search_word
     , COALESCE(postcode , 'No matched') AS postcode
     , COALESCE(long_lat_value , '')     AS long_lat_value
  FROM (SELECT s.* , t.*
             , ROW_NUMBER()
                 OVER(PARTITION BY search_word
                          ORDER BY postcode DESC) AS rn
          FROM search_words s
          LEFT OUTER JOIN
               sample_data  t
           ON  LEFT(search_word , LENGTH(postcode) ) = postcode
       )
 WHERE rn = 1
;
------------------------------------------------------------------------------

SEARCH_WORD POSTCODE   LONG_LAT_VALUE
----------- ---------- --------------
BUZ         No matched               
DE45UG      DE45       Value#2       
DE45UZ      DE45UZ     Value#5       
DE49AJ      DE4        Value#1       

  4 record(s) selected.

Last edited by tonkuma; 07-01-10 at 22:41. Reason: Replace with a simple way using LEFT OUTER JOIN.
Reply With Quote
  #4 (permalink)  
Old 07-02-10, 05:17
sigmazen sigmazen is offline
Registered User
 
Join Date: Jan 2010
Posts: 6
Smile

Hi tonkuma
Thanks a million for this :-) Utterly superb and elegant.
I've tweaked it for zOS SQL and this is the result:


WITH SAMPLE_DATA(POSTCODE, LONG_LAT_VALUE) AS (
SELECT 'DE4' , 'Value#1' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 'DE45' , 'Value#2' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 'DE45UA', 'Value#3' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 'DE45UM', 'Value#4' FROM SYSIBM.SYSDUMMY1 UNION ALL
SELECT 'DE45UZ', 'Value#5' FROM SYSIBM.SYSDUMMY1
)

SELECT COALESCE(X.POSTCODE, 'No match'), COALESCE(X.LONG_LAT_VALUE, '')
FROM (
SELECT T.POSTCODE, T.LONG_LAT_VALUE
, ROW_NUMBER()
OVER(PARTITION BY SEARCH_WORD
ORDER BY POSTCODE DESC) AS RN
FROM (SELECT RPAD(REPLACE(UCASE('De4 5uQ'),' ', ''),10,' ') AS SEARCH_WORD FROM SYSIBM.SYSDUMMY1) AS S
LEFT OUTER JOIN SAMPLE_DATA AS T
ON LEFT(S.SEARCH_WORD, LENGTH(T.POSTCODE) ) = POSTCODE
) AS X
WHERE RN = 1


Works a treat
Thanks again for your ultra-quick, excellent response
Cheers
Simon
Reply With Quote
Reply

Tags
iterative, partial, postcode, search, text

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On