Results 1 to 15 of 15
  1. #1
    Join Date
    Mar 2012
    Posts
    3

    Unanswered: Using a tokenizer function possible in SQL?

    Hi Everyone,

    I've just started working with DB2 and my first major problem is being able to tokenize a string and extracting the N'th element of it.

    So suppose I've got a table called "Person" and it contains fields FirstName,LastName, Address.

    The Address is stored "DoorNumber:RoadName:Town:Postcode" as one long string with colon to separate each part of the string.

    I need query result showing:

    FirstName
    LastName
    DoorNumber
    RoadName
    Town
    PostCode

    Is this possible? Would be very handy if someone could post a solution but I'm more than happy to just be pointed in the right direction as I've had no success thus far.

    Thanks.

  2. #2
    Join Date
    Mar 2012
    Posts
    3
    I think the solution may be to write a UDF that takes a string, the separator and the placement of the substring to return as parameters, and then it can be called like this:

    tokenizer(Address, ":", 3)

    which should return the "Town" part of the address.

    Am I on the right lines here?

    So then the final SQL code would look like:

    SELECT FirstName, LastName, tokenizer(Address,":",1) as DoorNum, tokenizer(Address,":",2) as RoadName, tokenizer(Address,":",3) as TownName, tokenizer(Address,":",4) as PostCode
    FROM Table_Person

    If this is the case, can anyone point me to where I can get any info on writing a UDF please?

  3. #3
    Join Date
    Mar 2003
    Posts
    280
    Quote Originally Posted by Shooza View Post
    I think the solution may be to write a UDF that takes a string, the separator and the placement of the substring to return as parameters, and then it can be called like this:

    tokenizer(Address, ":", 3)

    which should return the "Town" part of the address.

    Am I on the right lines here?

    So then the final SQL code would look like:

    SELECT FirstName, LastName, tokenizer(Address,":",1) as DoorNum, tokenizer(Address,":",2) as RoadName, tokenizer(Address,":",3) as TownName, tokenizer(Address,":",4) as PostCode
    FROM Table_Person

    If this is the case, can anyone point me to where I can get any info on writing a UDF please?

    There's an old article by Knut Stoltze here:

    Parsing Strings in SQL

    I assume normalizing the table is not an option?
    --
    Lennart

  4. #4
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Two examples.

    Tested on DB2 Express-C 9.7.5 on Windows/XP.

    Example 1(tokenizer) may be similar to your required.
    Code:
    /*
    Description:
     Returns "placement"th element in "in_string" separated by "separator".
    Note:
    (1) If "placement" was zero or negative
        Then "placement" = 1 is assumed.
    (2) If number of elements was less than "placement"
        Then returns zero length string(i.e. '').
    (3) If no "separator" was found and "placement" was 1 (including assumed)
        Then returns whole "in_string".
    */
    Exampe 2(extract_elements) may be similar to stolze's elements function, but all in one function.
    Code:
    /*
    Description:
     Returns elements in "in_string" separated by "separator" with placement.
    Note: If no "separator" was found Then returns whole "in_string" with placement = 1.
    */
    Example 1:
    Code:
    ------------------------------ Commands Entered ------------------------------
    CREATE FUNCTION tokenizer
    ( in_string VARCHAR(4000)
    , separator VARCHAR(1)
    , placement INTEGER
    )
      RETURNS VARCHAR(4000)
      READS SQL DATA
      DETERMINISTIC
      NO EXTERNAL ACTION
    /*
    Description: Returns "placement"th element in "in_string" separated by "separator".
    Note:
    (1) If "placement" was zero or negative
        Then "placement" = 1 is assumed.
    (2) If number of elements was less than "placement"
        Then returns zero length string(i.e. '').
    (3) If no "separator" was found and "placement" was 1 (including assumed)
        Then returns whole "in_string".
    */
    RETURN
    SELECT COALESCE(
              SUBSTR(
                 in_string
               , start_position
               , next_separator - start_position
              )
            , ''
           )
     FROM (VALUES MAX(placement , 1) ) f(norm_place)
         , LATERAL
          (VALUES (  NULLIF(
                        INSTR(separator || in_string , separator , 1 , norm_place)
                      , 0
                     )
                   , COALESCE(
                        NULLIF(
                           INSTR(in_string , separator , 1 , norm_place)
                         , 0
                        )
                      , LENGTH(in_string) + 1
                     )
                  )
          ) f(start_position , next_separator)
    ;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Example 1a: Sample usage of tokenizer function
    Code:
    ------------------------------ Commands Entered ------------------------------
    SELECT SUBSTR(tokenizer(Address ,':' , 1) , 1 , 10) AS Door_Number
         , SUBSTR(tokenizer(Address ,':' , 2) , 1 , 10) AS Road_Name
         , SUBSTR(tokenizer(Address ,':' , 3) , 1 , 10) AS Town_Name
         , SUBSTR(tokenizer(Address ,':' , 4) , 1 , 10) AS Post_Code
         , SUBSTR(tokenizer(Address ,'*' , 1) , 1 , 35) AS invalid_separator_placement_1
         , SUBSTR(tokenizer(Address ,'*' , 2) , 1 , 35) AS invalid_separator_placement_2
     FROM (VALUES 'DoorNumber:RoadName:Town:Postcode' ) test(address);
    ------------------------------------------------------------------------------
    
    DOOR_NUMBER ROAD_NAME  TOWN_NAME  POST_CODE  INVALID_SEPARATOR_PLACEMENT_1       INVALID_SEPARATOR_PLACEMENT_2      
    ----------- ---------- ---------- ---------- ----------------------------------- -----------------------------------
    DoorNumber  RoadName   Town       Postcode   DoorNumber:RoadName:Town:Postcode                                      
    
      1 record(s) selected.

    Example 2:
    Code:
    ------------------------------ Commands Entered ------------------------------
    CREATE FUNCTION extract_elements
    ( in_string VARCHAR(4000)
    , separator VARCHAR(1)
    )
      RETURNS TABLE
    ( placement INTEGER
    , element   VARCHAR(4000)
    )
      READS SQL DATA
      DETERMINISTIC
      NO EXTERNAL ACTION
    /*
    Description: Returns elements in "in_string" separated by "separator" with placement.
    Note: If no "separator" was found Then returns whole "in_string" with placement = 1.
    */
    RETURN
    WITH
      search_separator(placement , position) AS (
    VALUES (0 , 0)
    UNION ALL
    SELECT placement + 1
         , LOCATE(separator , in_string , position + 1)
     FROM  search_separator
     WHERE placement < 4000
       AND
       (   position  > 0
        OR placement = 0
       )
    )
    SELECT placement + 1
         , SUBSTR(
              in_string
            , position + 1
            , LEAD(position , 1 , LENGTH(in_string) + 1)
                 OVER(ORDER BY placement)
              - position - 1
           )
     FROM  search_separator
     WHERE position  > 0
       OR  placement = 0;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Example 2a: sample usage of extract_elements function.
    Code:
    ------------------------------ Commands Entered ------------------------------
    SELECT placement
         , SUBSTR(element , 1 , 35) AS element
     FROM TABLE( extract_elements('DoorNumber:RoadName:Town:Postcode' , ':') )t
    ;
    ------------------------------------------------------------------------------
    
    PLACEMENT   ELEMENT                            
    ----------- -----------------------------------
              1 DoorNumber                         
              2 RoadName                           
              3 Town                               
              4 Postcode                           
    
      4 record(s) selected.
    Last edited by tonkuma; 03-29-12 at 15:55. Reason: Replace function name LOCATE_IN_STRING with synonym INSTR. Remove "AS placement" in final SELECT in Example 2.

  5. #5
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963

    Question

    Gentelmen, why nobody try to use the recursive for this ?

    Lenny
    Last edited by Lenny77; 03-29-12 at 17:41.

  6. #6
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Lenny,

    I used recursive query in the function-body of my Example 2.
    Stolze also used it in the function-body of elemIdx in his article which Lennart(lelle12) mentioned.

  7. #7
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963
    Quote Originally Posted by tonkuma View Post
    Lenny,

    I used recursive query in the function-body of my Example 2.
    Stolze also used it in the function-body of elemIdx in his article which Lennart(lelle12) mentioned.
    I am asking because in my opinion we don't need to create the function for solve this problem.
    Maybe I'm wrong ?

  8. #8
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963

    Cool DB2 made easy

    Tonkuma, sometimes something more readable also make sense:

    Code:
    With input as
    (
    select varchar('DoorNumber:RoadName:Town:Postcode', 255) instr
    , char(':', 1) separ 
    from sysibm.sysdummy1
    )
    ,
    tokens (inst, token, separ) as
    (select instr, varchar('', 255) token, separ
       from input
    union all
    select 
    substr(inst, 2)
    ,case when left(inst, 1) <> separ then token || left(inst, 1)
          else '' 
     end  
    ,separ
    from tokens
    where  substr(inst, 1) > ' '
     ) select token from tokens
    where left(inst, 1) in (separ, space(1))
    Result:

    TOKEN
    DoorNumber
    RoadName
    Town
    Postcode
    Lenny
    Last edited by Lenny77; 03-30-12 at 09:28.

  9. #9
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I am asking because in my opinion we don't need to create the function for solve this problem.
    If considered only the extraction of elements from a string, you would be right.
    But, this was used in a query and results include other columns.
    So, UDF would make whole query shorter and readable.

    ... sometimes something more readable also make sense:
    I considered execution efficiency at the time of coding of my Example 2, same as readability.

    Main considerations were
    (a) Reduce number of iterations by utilizing built-in functions(SYSIBM schema).
    (b) String handling(substr, concat, so on...) may be heavier than integer calculations.
    (c) Make shorter row size for intermediate result of recursive query.

  10. #10
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963

    Arrow Recursive VS Table Function

    Quote Originally Posted by tonkuma View Post
    If considered only the extraction of elements from a string, you would be right.
    But, this was used in a query and results include other columns.
    So, UDF would make whole query shorter and readable.


    I considered execution efficiency at the time of coding of my Example 2, same as readability.

    Main considerations were
    (a) Reduce number of iterations by utilizing built-in functions(SYSIBM schema).
    (b) String handling(substr, concat, so on...) may be heavier than integer calculations.
    (c) Make shorter row size for intermediate result of recursive query.
    I agree with you, tonkuma.
    But it has to be The Table Function and you have to make Join Table to the main table which will reduce the performance.

    Lenny

  11. #11
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    "To use UDF or not" might be balance of programming productivity, readability, execution efficiency, so on...

    The UDF will hide(or encapsulate?) complexity of recursion for compensation of some performance degradation.

    For example: Consider to get the same result without using the UDF.
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
     person(first_name , last_name , address) AS (
    SELECT 'Shooza'  , 'Shoo' , 'DoorNumber:RoadName:Town:Postcode' FROM sysibm.sysdummy1 UNION ALL
    SELECT 'Tonkuma' , 'Ton'  , '1-2-3:Village st.:Miyaji:876-5432' FROM sysibm.sysdummy1
    )
    SELECT CASE placement WHEN 1 THEN first_name ELSE '' END AS first_name
         , CASE placement WHEN 1 THEN last_name  ELSE '' END AS last_name
         , VARCHAR(element , 15) AS address_element
     FROM  person                                   p
         , TABLE( extract_elements(address , ':') ) t
     ORDER BY
           p.first_name
         , placement
    ;
    ------------------------------------------------------------------------------
    
    FIRST_NAME LAST_NAME ADDRESS_ELEMENT
    ---------- --------- ---------------
    Shooza     Shoo      DoorNumber     
                         RoadName       
                         Town           
                         Postcode       
    Tonkuma    Ton       1-2-3          
                         Village st.    
                         Miyaji         
                         876-5432       
    
      8 record(s) selected.
    Last edited by tonkuma; 03-30-12 at 10:45.

  12. #12
    Join Date
    Mar 2012
    Posts
    3

    Thank you

    Thank you all for taking the time out to help me with my problem.

    I very much appreciate it.

    Thanks to your help, I have now resolved the problem, which I wouldn't have been able to do on my own.

  13. #13
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963
    It could be more interesting if we have the different separators in a single string:

    'DoorNumber;RoadName,Town:Postcode'.

    Lenny

  14. #14
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963

    Question How

    Quote Originally Posted by Shooza View Post
    Thank you all for taking the time out to help me with my problem.

    I very much appreciate it.

    Thanks to your help, I have now resolved the problem, which I wouldn't have been able to do on my own.
    What method did you use ?

    Lenny

  15. #15
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Quote Originally Posted by Lenny77 View Post
    It could be more interesting if we have the different separators in a single string:

    'DoorNumber;RoadName,Town:Postcode'.

    Lenny
    Step 1: Replace all separators in the string with a separator(e.g. leftmost character in the separators).
    Step 2: Apply the technique for single separator.

    Example:
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
      person(first_name , last_name , address) AS (
    SELECT 'Shooza'  , 'Shoo' , 'DoorNumber;RoadName,Town:Postcode' FROM sysibm.sysdummy1 UNION ALL
    SELECT 'Tonkuma' , 'Ton'  , '1-2-3@Village st.,Miyaji/876-5432' FROM sysibm.sysdummy1
    )
    , parameter(separators) AS (
    SELECT ';:,@/' FROM sysibm.sysdummy1
    )
    SELECT CASE placement WHEN 1 THEN first_name ELSE '' END AS first_name
         , CASE placement WHEN 1 THEN last_name  ELSE '' END AS last_name
         , VARCHAR(element , 15) AS address_element
     FROM  person    p
         , parameter
         , TABLE(
             extract_elements(
                TRANSLATE(address , '' , separators , LEFT(separators , 1) )
              , LEFT(separators , 1)
             )
           ) t
     ORDER BY
           p.first_name
         , placement
    ;
    ------------------------------------------------------------------------------
    
    FIRST_NAME LAST_NAME ADDRESS_ELEMENT
    ---------- --------- ---------------
    Shooza     Shoo      DoorNumber     
                         RoadName       
                         Town           
                         Postcode       
    Tonkuma    Ton       1-2-3          
                         Village st.    
                         Miyaji         
                         876-5432       
    
      8 record(s) selected.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •