If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Using a tokenizer function possible in SQL?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-29-12, 08:04
Shooza Shooza is offline
Registered User
 
Join Date: Mar 2012
Posts: 3
Using a tokenizer function possible in SQL?

Hi Everyone,

I've just started working with DB2 and my first major problem is being able to tokenize a string and extracting the N'th element of it.

So suppose I've got a table called "Person" and it contains fields FirstName,LastName, Address.

The Address is stored "DoorNumber:RoadName:Town:Postcode" as one long string with colon to separate each part of the string.

I need query result showing:

FirstName
LastName
DoorNumber
RoadName
Town
PostCode

Is this possible? Would be very handy if someone could post a solution but I'm more than happy to just be pointed in the right direction as I've had no success thus far.

Thanks.
Reply With Quote
  #2 (permalink)  
Old 03-29-12, 08:57
Shooza Shooza is offline
Registered User
 
Join Date: Mar 2012
Posts: 3
I think the solution may be to write a UDF that takes a string, the separator and the placement of the substring to return as parameters, and then it can be called like this:

tokenizer(Address, ":", 3)

which should return the "Town" part of the address.

Am I on the right lines here?

So then the final SQL code would look like:

SELECT FirstName, LastName, tokenizer(Address,":",1) as DoorNum, tokenizer(Address,":",2) as RoadName, tokenizer(Address,":",3) as TownName, tokenizer(Address,":",4) as PostCode
FROM Table_Person

If this is the case, can anyone point me to where I can get any info on writing a UDF please?
Reply With Quote
  #3 (permalink)  
Old 03-29-12, 13:13
lelle12 lelle12 is offline
Registered User
 
Join Date: Mar 2003
Posts: 202
Quote:
Originally Posted by Shooza View Post
I think the solution may be to write a UDF that takes a string, the separator and the placement of the substring to return as parameters, and then it can be called like this:

tokenizer(Address, ":", 3)

which should return the "Town" part of the address.

Am I on the right lines here?

So then the final SQL code would look like:

SELECT FirstName, LastName, tokenizer(Address,":",1) as DoorNum, tokenizer(Address,":",2) as RoadName, tokenizer(Address,":",3) as TownName, tokenizer(Address,":",4) as PostCode
FROM Table_Person

If this is the case, can anyone point me to where I can get any info on writing a UDF please?

There's an old article by Knut Stoltze here:

Parsing Strings in SQL

I assume normalizing the table is not an option?
__________________
--
Lennart
Reply With Quote
  #4 (permalink)  
Old 03-29-12, 14:17
tonkuma tonkuma is online now
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,814
Two examples.

Tested on DB2 Express-C 9.7.5 on Windows/XP.

Example 1(tokenizer) may be similar to your required.
Code:
/*
Description:
 Returns "placement"th element in "in_string" separated by "separator".
Note:
(1) If "placement" was zero or negative
    Then "placement" = 1 is assumed.
(2) If number of elements was less than "placement"
    Then returns zero length string(i.e. '').
(3) If no "separator" was found and "placement" was 1 (including assumed)
    Then returns whole "in_string".
*/
Exampe 2(extract_elements) may be similar to stolze's elements function, but all in one function.
Code:
/*
Description:
 Returns elements in "in_string" separated by "separator" with placement.
Note: If no "separator" was found Then returns whole "in_string" with placement = 1.
*/
Example 1:
Code:
------------------------------ Commands Entered ------------------------------
CREATE FUNCTION tokenizer
( in_string VARCHAR(4000)
, separator VARCHAR(1)
, placement INTEGER
)
  RETURNS VARCHAR(4000)
  READS SQL DATA
  DETERMINISTIC
  NO EXTERNAL ACTION
/*
Description: Returns "placement"th element in "in_string" separated by "separator".
Note:
(1) If "placement" was zero or negative
    Then "placement" = 1 is assumed.
(2) If number of elements was less than "placement"
    Then returns zero length string(i.e. '').
(3) If no "separator" was found and "placement" was 1 (including assumed)
    Then returns whole "in_string".
*/
RETURN
SELECT COALESCE(
          SUBSTR(
             in_string
           , start_position
           , next_separator - start_position
          )
        , ''
       )
 FROM (VALUES MAX(placement , 1) ) f(norm_place)
     , LATERAL
      (VALUES (  NULLIF(
                    INSTR(separator || in_string , separator , 1 , norm_place)
                  , 0
                 )
               , COALESCE(
                    NULLIF(
                       INSTR(in_string , separator , 1 , norm_place)
                     , 0
                    )
                  , LENGTH(in_string) + 1
                 )
              )
      ) f(start_position , next_separator)
;
------------------------------------------------------------------------------
DB20000I  The SQL command completed successfully.
Example 1a: Sample usage of tokenizer function
Code:
------------------------------ Commands Entered ------------------------------
SELECT SUBSTR(tokenizer(Address ,':' , 1) , 1 , 10) AS Door_Number
     , SUBSTR(tokenizer(Address ,':' , 2) , 1 , 10) AS Road_Name
     , SUBSTR(tokenizer(Address ,':' , 3) , 1 , 10) AS Town_Name
     , SUBSTR(tokenizer(Address ,':' , 4) , 1 , 10) AS Post_Code
     , SUBSTR(tokenizer(Address ,'*' , 1) , 1 , 35) AS invalid_separator_placement_1
     , SUBSTR(tokenizer(Address ,'*' , 2) , 1 , 35) AS invalid_separator_placement_2
 FROM (VALUES 'DoorNumber:RoadName:Town:Postcode' ) test(address);
------------------------------------------------------------------------------

DOOR_NUMBER ROAD_NAME  TOWN_NAME  POST_CODE  INVALID_SEPARATOR_PLACEMENT_1       INVALID_SEPARATOR_PLACEMENT_2      
----------- ---------- ---------- ---------- ----------------------------------- -----------------------------------
DoorNumber  RoadName   Town       Postcode   DoorNumber:RoadName:Town:Postcode                                      

  1 record(s) selected.

Example 2:
Code:
------------------------------ Commands Entered ------------------------------
CREATE FUNCTION extract_elements
( in_string VARCHAR(4000)
, separator VARCHAR(1)
)
  RETURNS TABLE
( placement INTEGER
, element   VARCHAR(4000)
)
  READS SQL DATA
  DETERMINISTIC
  NO EXTERNAL ACTION
/*
Description: Returns elements in "in_string" separated by "separator" with placement.
Note: If no "separator" was found Then returns whole "in_string" with placement = 1.
*/
RETURN
WITH
  search_separator(placement , position) AS (
VALUES (0 , 0)
UNION ALL
SELECT placement + 1
     , LOCATE(separator , in_string , position + 1)
 FROM  search_separator
 WHERE placement < 4000
   AND
   (   position  > 0
    OR placement = 0
   )
)
SELECT placement + 1
     , SUBSTR(
          in_string
        , position + 1
        , LEAD(position , 1 , LENGTH(in_string) + 1)
             OVER(ORDER BY placement)
          - position - 1
       )
 FROM  search_separator
 WHERE position  > 0
   OR  placement = 0;
------------------------------------------------------------------------------
DB20000I  The SQL command completed successfully.
Example 2a: sample usage of extract_elements function.
Code:
------------------------------ Commands Entered ------------------------------
SELECT placement
     , SUBSTR(element , 1 , 35) AS element
 FROM TABLE( extract_elements('DoorNumber:RoadName:Town:Postcode' , ':') )t
;
------------------------------------------------------------------------------

PLACEMENT   ELEMENT                            
----------- -----------------------------------
          1 DoorNumber                         
          2 RoadName                           
          3 Town                               
          4 Postcode                           

  4 record(s) selected.

Last edited by tonkuma; 03-29-12 at 14:55. Reason: Replace function name LOCATE_IN_STRING with synonym INSTR. Remove "AS placement" in final SELECT in Example 2.
Reply With Quote
  #5 (permalink)  
Old 03-29-12, 15:58
Lenny77 Lenny77 is offline
Registered User
 
Join Date: Jul 2009
Location: NY
Posts: 960
Question

Gentelmen, why nobody try to use the recursive for this ?

Lenny

Last edited by Lenny77; 03-29-12 at 16:41.
Reply With Quote
  #6 (permalink)  
Old 03-29-12, 16:15
tonkuma tonkuma is online now
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,814
Lenny,

I used recursive query in the function-body of my Example 2.
Stolze also used it in the function-body of elemIdx in his article which Lennart(lelle12) mentioned.
Reply With Quote
  #7 (permalink)  
Old 03-29-12, 16:43
Lenny77 Lenny77 is offline
Registered User
 
Join Date: Jul 2009
Location: NY
Posts: 960
Quote:
Originally Posted by tonkuma View Post
Lenny,

I used recursive query in the function-body of my Example 2.
Stolze also used it in the function-body of elemIdx in his article which Lennart(lelle12) mentioned.
I am asking because in my opinion we don't need to create the function for solve this problem.
Maybe I'm wrong ?
Reply With Quote
  #8 (permalink)  
Old 03-29-12, 17:59
Lenny77 Lenny77 is offline
Registered User
 
Join Date: Jul 2009
Location: NY
Posts: 960
Cool DB2 made easy

Tonkuma, sometimes something more readable also make sense:

Code:
With input as
(
select varchar('DoorNumber:RoadName:Town:Postcode', 255) instr
, char(':', 1) separ 
from sysibm.sysdummy1
)
,
tokens (inst, token, separ) as
(select instr, varchar('', 255) token, separ
   from input
union all
select 
substr(inst, 2)
,case when left(inst, 1) <> separ then token || left(inst, 1)
      else '' 
 end  
,separ
from tokens
where  substr(inst, 1) > ' '
 ) select token from tokens
where left(inst, 1) in (separ, space(1))
Result:

Quote:
TOKEN
DoorNumber
RoadName
Town
Postcode
Lenny

Last edited by Lenny77; 03-30-12 at 08:28.
Reply With Quote
  #9 (permalink)  
Old 03-30-12, 08:21
tonkuma tonkuma is online now
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,814
Quote:
I am asking because in my opinion we don't need to create the function for solve this problem.
If considered only the extraction of elements from a string, you would be right.
But, this was used in a query and results include other columns.
So, UDF would make whole query shorter and readable.

Quote:
... sometimes something more readable also make sense:
I considered execution efficiency at the time of coding of my Example 2, same as readability.

Main considerations were
(a) Reduce number of iterations by utilizing built-in functions(SYSIBM schema).
(b) String handling(substr, concat, so on...) may be heavier than integer calculations.
(c) Make shorter row size for intermediate result of recursive query.
Reply With Quote
  #10 (permalink)  
Old 03-30-12, 08:37
Lenny77 Lenny77 is offline
Registered User
 
Join Date: Jul 2009
Location: NY
Posts: 960
Arrow Recursive VS Table Function

Quote:
Originally Posted by tonkuma View Post
If considered only the extraction of elements from a string, you would be right.
But, this was used in a query and results include other columns.
So, UDF would make whole query shorter and readable.


I considered execution efficiency at the time of coding of my Example 2, same as readability.

Main considerations were
(a) Reduce number of iterations by utilizing built-in functions(SYSIBM schema).
(b) String handling(substr, concat, so on...) may be heavier than integer calculations.
(c) Make shorter row size for intermediate result of recursive query.
I agree with you, tonkuma.
But it has to be The Table Function and you have to make Join Table to the main table which will reduce the performance.

Lenny
Reply With Quote
  #11 (permalink)  
Old 03-30-12, 09:40
tonkuma tonkuma is online now
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,814
"To use UDF or not" might be balance of programming productivity, readability, execution efficiency, so on...

The UDF will hide(or encapsulate?) complexity of recursion for compensation of some performance degradation.

For example: Consider to get the same result without using the UDF.
Code:
------------------------------ Commands Entered ------------------------------
WITH
 person(first_name , last_name , address) AS (
SELECT 'Shooza'  , 'Shoo' , 'DoorNumber:RoadName:Town:Postcode' FROM sysibm.sysdummy1 UNION ALL
SELECT 'Tonkuma' , 'Ton'  , '1-2-3:Village st.:Miyaji:876-5432' FROM sysibm.sysdummy1
)
SELECT CASE placement WHEN 1 THEN first_name ELSE '' END AS first_name
     , CASE placement WHEN 1 THEN last_name  ELSE '' END AS last_name
     , VARCHAR(element , 15) AS address_element
 FROM  person                                   p
     , TABLE( extract_elements(address , ':') ) t
 ORDER BY
       p.first_name
     , placement
;
------------------------------------------------------------------------------

FIRST_NAME LAST_NAME ADDRESS_ELEMENT
---------- --------- ---------------
Shooza     Shoo      DoorNumber     
                     RoadName       
                     Town           
                     Postcode       
Tonkuma    Ton       1-2-3          
                     Village st.    
                     Miyaji         
                     876-5432       

  8 record(s) selected.

Last edited by tonkuma; 03-30-12 at 09:45.
Reply With Quote
  #12 (permalink)  
Old 03-30-12, 10:36
Shooza Shooza is offline
Registered User
 
Join Date: Mar 2012
Posts: 3
Thank you

Thank you all for taking the time out to help me with my problem.

I very much appreciate it.

Thanks to your help, I have now resolved the problem, which I wouldn't have been able to do on my own.
Reply With Quote
  #13 (permalink)  
Old 03-30-12, 11:01
Lenny77 Lenny77 is offline
Registered User
 
Join Date: Jul 2009
Location: NY
Posts: 960
It could be more interesting if we have the different separators in a single string:

'DoorNumber;RoadName,Town:Postcode'.

Lenny
Reply With Quote
  #14 (permalink)  
Old 03-30-12, 11:02
Lenny77 Lenny77 is offline
Registered User
 
Join Date: Jul 2009
Location: NY
Posts: 960
Question How

Quote:
Originally Posted by Shooza View Post
Thank you all for taking the time out to help me with my problem.

I very much appreciate it.

Thanks to your help, I have now resolved the problem, which I wouldn't have been able to do on my own.
What method did you use ?

Lenny
Reply With Quote
  #15 (permalink)  
Old 03-31-12, 00:06
tonkuma tonkuma is online now
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,814
Quote:
Originally Posted by Lenny77 View Post
It could be more interesting if we have the different separators in a single string:

'DoorNumber;RoadName,Town:Postcode'.

Lenny
Step 1: Replace all separators in the string with a separator(e.g. leftmost character in the separators).
Step 2: Apply the technique for single separator.

Example:
Code:
------------------------------ Commands Entered ------------------------------
WITH
  person(first_name , last_name , address) AS (
SELECT 'Shooza'  , 'Shoo' , 'DoorNumber;RoadName,Town:Postcode' FROM sysibm.sysdummy1 UNION ALL
SELECT 'Tonkuma' , 'Ton'  , '1-2-3@Village st.,Miyaji/876-5432' FROM sysibm.sysdummy1
)
, parameter(separators) AS (
SELECT ';:,@/' FROM sysibm.sysdummy1
)
SELECT CASE placement WHEN 1 THEN first_name ELSE '' END AS first_name
     , CASE placement WHEN 1 THEN last_name  ELSE '' END AS last_name
     , VARCHAR(element , 15) AS address_element
 FROM  person    p
     , parameter
     , TABLE(
         extract_elements(
            TRANSLATE(address , '' , separators , LEFT(separators , 1) )
          , LEFT(separators , 1)
         )
       ) t
 ORDER BY
       p.first_name
     , placement
;
------------------------------------------------------------------------------

FIRST_NAME LAST_NAME ADDRESS_ELEMENT
---------- --------- ---------------
Shooza     Shoo      DoorNumber     
                     RoadName       
                     Town           
                     Postcode       
Tonkuma    Ton       1-2-3          
                     Village st.    
                     Miyaji         
                     876-5432       

  8 record(s) selected.
Reply With Quote
Reply

Tags
db2, string function

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On