| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

03-29-12, 08:04
|
|
Registered User
|
|
Join Date: Mar 2012
Posts: 3
|
|
|
Using a tokenizer function possible in SQL?
|
|
Hi Everyone,
I've just started working with DB2 and my first major problem is being able to tokenize a string and extracting the N'th element of it.
So suppose I've got a table called "Person" and it contains fields FirstName,LastName, Address.
The Address is stored "DoorNumber:RoadName:Town:Postcode" as one long string with colon to separate each part of the string.
I need query result showing:
FirstName
LastName
DoorNumber
RoadName
Town
PostCode
Is this possible? Would be very handy if someone could post a solution but I'm more than happy to just be pointed in the right direction as I've had no success thus far.
Thanks.
|
|

03-29-12, 08:57
|
|
Registered User
|
|
Join Date: Mar 2012
Posts: 3
|
|
I think the solution may be to write a UDF that takes a string, the separator and the placement of the substring to return as parameters, and then it can be called like this:
tokenizer(Address, ":", 3)
which should return the "Town" part of the address.
Am I on the right lines here?
So then the final SQL code would look like:
SELECT FirstName, LastName, tokenizer(Address,":",1) as DoorNum, tokenizer(Address,":",2) as RoadName, tokenizer(Address,":",3) as TownName, tokenizer(Address,":",4) as PostCode
FROM Table_Person
If this is the case, can anyone point me to where I can get any info on writing a UDF please?
|
|

03-29-12, 13:13
|
|
Registered User
|
|
Join Date: Mar 2003
Posts: 202
|
|
|
|
Quote:
Originally Posted by Shooza
I think the solution may be to write a UDF that takes a string, the separator and the placement of the substring to return as parameters, and then it can be called like this:
tokenizer(Address, ":", 3)
which should return the "Town" part of the address.
Am I on the right lines here?
So then the final SQL code would look like:
SELECT FirstName, LastName, tokenizer(Address,":",1) as DoorNum, tokenizer(Address,":",2) as RoadName, tokenizer(Address,":",3) as TownName, tokenizer(Address,":",4) as PostCode
FROM Table_Person
If this is the case, can anyone point me to where I can get any info on writing a UDF please?
|
There's an old article by Knut Stoltze here:
Parsing Strings in SQL
I assume normalizing the table is not an option?
__________________
--
Lennart
|
|

03-29-12, 14:17
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,814
|
|
Two examples.
Tested on DB2 Express-C 9.7.5 on Windows/XP.
Example 1(tokenizer) may be similar to your required.
Code:
/*
Description:
Returns "placement"th element in "in_string" separated by "separator".
Note:
(1) If "placement" was zero or negative
Then "placement" = 1 is assumed.
(2) If number of elements was less than "placement"
Then returns zero length string(i.e. '').
(3) If no "separator" was found and "placement" was 1 (including assumed)
Then returns whole "in_string".
*/
Exampe 2(extract_elements) may be similar to stolze's elements function, but all in one function.
Code:
/*
Description:
Returns elements in "in_string" separated by "separator" with placement.
Note: If no "separator" was found Then returns whole "in_string" with placement = 1.
*/
Example 1:
Code:
------------------------------ Commands Entered ------------------------------
CREATE FUNCTION tokenizer
( in_string VARCHAR(4000)
, separator VARCHAR(1)
, placement INTEGER
)
RETURNS VARCHAR(4000)
READS SQL DATA
DETERMINISTIC
NO EXTERNAL ACTION
/*
Description: Returns "placement"th element in "in_string" separated by "separator".
Note:
(1) If "placement" was zero or negative
Then "placement" = 1 is assumed.
(2) If number of elements was less than "placement"
Then returns zero length string(i.e. '').
(3) If no "separator" was found and "placement" was 1 (including assumed)
Then returns whole "in_string".
*/
RETURN
SELECT COALESCE(
SUBSTR(
in_string
, start_position
, next_separator - start_position
)
, ''
)
FROM (VALUES MAX(placement , 1) ) f(norm_place)
, LATERAL
(VALUES ( NULLIF(
INSTR(separator || in_string , separator , 1 , norm_place)
, 0
)
, COALESCE(
NULLIF(
INSTR(in_string , separator , 1 , norm_place)
, 0
)
, LENGTH(in_string) + 1
)
)
) f(start_position , next_separator)
;
------------------------------------------------------------------------------
DB20000I The SQL command completed successfully.
Example 1a: Sample usage of tokenizer function
Code:
------------------------------ Commands Entered ------------------------------
SELECT SUBSTR(tokenizer(Address ,':' , 1) , 1 , 10) AS Door_Number
, SUBSTR(tokenizer(Address ,':' , 2) , 1 , 10) AS Road_Name
, SUBSTR(tokenizer(Address ,':' , 3) , 1 , 10) AS Town_Name
, SUBSTR(tokenizer(Address ,':' , 4) , 1 , 10) AS Post_Code
, SUBSTR(tokenizer(Address ,'*' , 1) , 1 , 35) AS invalid_separator_placement_1
, SUBSTR(tokenizer(Address ,'*' , 2) , 1 , 35) AS invalid_separator_placement_2
FROM (VALUES 'DoorNumber:RoadName:Town:Postcode' ) test(address);
------------------------------------------------------------------------------
DOOR_NUMBER ROAD_NAME TOWN_NAME POST_CODE INVALID_SEPARATOR_PLACEMENT_1 INVALID_SEPARATOR_PLACEMENT_2
----------- ---------- ---------- ---------- ----------------------------------- -----------------------------------
DoorNumber RoadName Town Postcode DoorNumber:RoadName:Town:Postcode
1 record(s) selected.
Example 2:
Code:
------------------------------ Commands Entered ------------------------------
CREATE FUNCTION extract_elements
( in_string VARCHAR(4000)
, separator VARCHAR(1)
)
RETURNS TABLE
( placement INTEGER
, element VARCHAR(4000)
)
READS SQL DATA
DETERMINISTIC
NO EXTERNAL ACTION
/*
Description: Returns elements in "in_string" separated by "separator" with placement.
Note: If no "separator" was found Then returns whole "in_string" with placement = 1.
*/
RETURN
WITH
search_separator(placement , position) AS (
VALUES (0 , 0)
UNION ALL
SELECT placement + 1
, LOCATE(separator , in_string , position + 1)
FROM search_separator
WHERE placement < 4000
AND
( position > 0
OR placement = 0
)
)
SELECT placement + 1
, SUBSTR(
in_string
, position + 1
, LEAD(position , 1 , LENGTH(in_string) + 1)
OVER(ORDER BY placement)
- position - 1
)
FROM search_separator
WHERE position > 0
OR placement = 0;
------------------------------------------------------------------------------
DB20000I The SQL command completed successfully.
Example 2a: sample usage of extract_elements function.
Code:
------------------------------ Commands Entered ------------------------------
SELECT placement
, SUBSTR(element , 1 , 35) AS element
FROM TABLE( extract_elements('DoorNumber:RoadName:Town:Postcode' , ':') )t
;
------------------------------------------------------------------------------
PLACEMENT ELEMENT
----------- -----------------------------------
1 DoorNumber
2 RoadName
3 Town
4 Postcode
4 record(s) selected.
|
Last edited by tonkuma; 03-29-12 at 14:55.
Reason: Replace function name LOCATE_IN_STRING with synonym INSTR. Remove "AS placement" in final SELECT in Example 2.
|

03-29-12, 15:58
|
|
Registered User
|
|
Join Date: Jul 2009
Location: NY
Posts: 960
|
|
Gentelmen, why nobody try to use the recursive for this ?
Lenny
|
Last edited by Lenny77; 03-29-12 at 16:41.
|

03-29-12, 16:15
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,814
|
|
Lenny,
I used recursive query in the function-body of my Example 2.
Stolze also used it in the function-body of elemIdx in his article which Lennart(lelle12) mentioned.
|
|

03-29-12, 16:43
|
|
Registered User
|
|
Join Date: Jul 2009
Location: NY
Posts: 960
|
|
Quote:
Originally Posted by tonkuma
Lenny,
I used recursive query in the function-body of my Example 2.
Stolze also used it in the function-body of elemIdx in his article which Lennart(lelle12) mentioned.
|
I am asking because in my opinion we don't need to create the function for solve this problem.
Maybe I'm wrong ?
|
|

03-29-12, 17:59
|
|
Registered User
|
|
Join Date: Jul 2009
Location: NY
Posts: 960
|
|
DB2 made easy
Tonkuma, sometimes something more readable also make sense:
Code:
With input as
(
select varchar('DoorNumber:RoadName:Town:Postcode', 255) instr
, char(':', 1) separ
from sysibm.sysdummy1
)
,
tokens (inst, token, separ) as
(select instr, varchar('', 255) token, separ
from input
union all
select
substr(inst, 2)
,case when left(inst, 1) <> separ then token || left(inst, 1)
else ''
end
,separ
from tokens
where substr(inst, 1) > ' '
) select token from tokens
where left(inst, 1) in (separ, space(1))
Result:
Quote:
TOKEN
DoorNumber
RoadName
Town
Postcode
|
Lenny
|
Last edited by Lenny77; 03-30-12 at 08:28.
|

03-30-12, 08:21
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,814
|
|
Quote:
|
I am asking because in my opinion we don't need to create the function for solve this problem.
|
If considered only the extraction of elements from a string, you would be right.
But, this was used in a query and results include other columns.
So, UDF would make whole query shorter and readable.
Quote:
|
... sometimes something more readable also make sense:
|
I considered execution efficiency at the time of coding of my Example 2, same as readability.
Main considerations were
(a) Reduce number of iterations by utilizing built-in functions(SYSIBM schema).
(b) String handling(substr, concat, so on...) may be heavier than integer calculations.
(c) Make shorter row size for intermediate result of recursive query.
|
|

03-30-12, 08:37
|
|
Registered User
|
|
Join Date: Jul 2009
Location: NY
Posts: 960
|
|
Recursive VS Table Function
Quote:
Originally Posted by tonkuma
If considered only the extraction of elements from a string, you would be right.
But, this was used in a query and results include other columns.
So, UDF would make whole query shorter and readable.
I considered execution efficiency at the time of coding of my Example 2, same as readability.
Main considerations were
(a) Reduce number of iterations by utilizing built-in functions(SYSIBM schema).
(b) String handling(substr, concat, so on...) may be heavier than integer calculations.
(c) Make shorter row size for intermediate result of recursive query.
|
I agree with you, tonkuma.
But it has to be The Table Function and you have to make Join Table to the main table which will reduce the performance.
Lenny
|
|

03-30-12, 09:40
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,814
|
|
"To use UDF or not" might be balance of programming productivity, readability, execution efficiency, so on...
The UDF will hide(or encapsulate?) complexity of recursion for compensation of some performance degradation.
For example: Consider to get the same result without using the UDF.
Code:
------------------------------ Commands Entered ------------------------------
WITH
person(first_name , last_name , address) AS (
SELECT 'Shooza' , 'Shoo' , 'DoorNumber:RoadName:Town:Postcode' FROM sysibm.sysdummy1 UNION ALL
SELECT 'Tonkuma' , 'Ton' , '1-2-3:Village st.:Miyaji:876-5432' FROM sysibm.sysdummy1
)
SELECT CASE placement WHEN 1 THEN first_name ELSE '' END AS first_name
, CASE placement WHEN 1 THEN last_name ELSE '' END AS last_name
, VARCHAR(element , 15) AS address_element
FROM person p
, TABLE( extract_elements(address , ':') ) t
ORDER BY
p.first_name
, placement
;
------------------------------------------------------------------------------
FIRST_NAME LAST_NAME ADDRESS_ELEMENT
---------- --------- ---------------
Shooza Shoo DoorNumber
RoadName
Town
Postcode
Tonkuma Ton 1-2-3
Village st.
Miyaji
876-5432
8 record(s) selected.
|
Last edited by tonkuma; 03-30-12 at 09:45.
|

03-30-12, 10:36
|
|
Registered User
|
|
Join Date: Mar 2012
Posts: 3
|
|
|
Thank you
Thank you all for taking the time out to help me with my problem.
I very much appreciate it.
Thanks to your help, I have now resolved the problem, which I wouldn't have been able to do on my own.
|
|

03-30-12, 11:01
|
|
Registered User
|
|
Join Date: Jul 2009
Location: NY
Posts: 960
|
|
It could be more interesting if we have the different separators in a single string:
'DoorNumber;RoadName,Town:Postcode'.
Lenny
|
|

03-30-12, 11:02
|
|
Registered User
|
|
Join Date: Jul 2009
Location: NY
Posts: 960
|
|
How
Quote:
Originally Posted by Shooza
Thank you all for taking the time out to help me with my problem.
I very much appreciate it.
Thanks to your help, I have now resolved the problem, which I wouldn't have been able to do on my own.
|
What method did you use ?
Lenny
|
|

03-31-12, 00:06
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,814
|
|
Quote:
Originally Posted by Lenny77
It could be more interesting if we have the different separators in a single string:
'DoorNumber;RoadName,Town:Postcode'.
Lenny
|
Step 1: Replace all separators in the string with a separator(e.g. leftmost character in the separators).
Step 2: Apply the technique for single separator.
Example:
Code:
------------------------------ Commands Entered ------------------------------
WITH
person(first_name , last_name , address) AS (
SELECT 'Shooza' , 'Shoo' , 'DoorNumber;RoadName,Town:Postcode' FROM sysibm.sysdummy1 UNION ALL
SELECT 'Tonkuma' , 'Ton' , '1-2-3@Village st.,Miyaji/876-5432' FROM sysibm.sysdummy1
)
, parameter(separators) AS (
SELECT ';:,@/' FROM sysibm.sysdummy1
)
SELECT CASE placement WHEN 1 THEN first_name ELSE '' END AS first_name
, CASE placement WHEN 1 THEN last_name ELSE '' END AS last_name
, VARCHAR(element , 15) AS address_element
FROM person p
, parameter
, TABLE(
extract_elements(
TRANSLATE(address , '' , separators , LEFT(separators , 1) )
, LEFT(separators , 1)
)
) t
ORDER BY
p.first_name
, placement
;
------------------------------------------------------------------------------
FIRST_NAME LAST_NAME ADDRESS_ELEMENT
---------- --------- ---------------
Shooza Shoo DoorNumber
RoadName
Town
Postcode
Tonkuma Ton 1-2-3
Village st.
Miyaji
876-5432
8 record(s) selected.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|