| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

10-20-08, 18:08
|
|
Registered User
|
|
Join Date: Oct 2008
Posts: 19
|
|
|
Table UDF/Stored Procedure - output any data as returned rows
|
|
I'm not sure if my subject was clear but here's the issue I'm having.
I'm working on a project where I take some data that is on a single row and parse it out onto separate rows using recursion.
My colleagues are concerned that the recursion is what is making it slow which might be the case.
The reason I Went with recursion is that I know that the union all is going to allow the rows to union together into nice, generic row output. All that is happening is basically a bunch of unions.
I don't know how to do this without recursion but I was told that it is likely doable with a table UDF written in C or some other language.
Before I pursue figuring it out, I figured I'd ask if anyone knows if this is true or if there's a better way.
Basically, lets say I want to take a row with 2 columns:
c1 | c2
-----------
123 | abc
And I want to turn it into:
c1 | c2
------------
1 | a
2 | b
3 | c
With recursion, this is very simple. The question is whether or not this is doable without recursion in such a way where the data is returned as regular db2 rows. By regular rows, I mean that I could say:
select * from my.table a where (a.key, a.val) in
(
select c1, c2 from table(my.udf()) b
)
I just have no idea how to do this without unions. I know I could prepare a giant SQL statement but that would not work with the vast amounts of data I'd be playing with.
Any help is appreciated.
Thanks in advance!
|
|

10-20-08, 19:25
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
You can write a table function that does the parsing and returns 1 row on each FETCH call. You can then call the function like this:
Code:
SELECT *
FROM TABLE ( youTableUDF(inputString) ) AS t
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

10-20-08, 19:48
|
|
Registered User
|
|
Join Date: Sep 2008
Posts: 6
|
|
|
|
I looked at the "Fetch" function and I'm not sure how it would work. I'll have to look at it more.
So what you're saying is that every time I fetch some values, I can somehow output it. For instance, in my example:
Basically, lets say I want to take a row with 2 columns:
c1 | c2
-----------
123 | abc
And I want to turn it into:
c1 | c2
------------
1 | a
2 | b
3 | c
(below, column separator is | )
For the row 123 | abc, I could take the values 1 | a and then return them as a row, then 2 | b and return them as a row, and 3 | c and return them as a row, all from the same call? I assume that's what you're saying. I'll have to spend more time on it in the morning.
But would that be considered more efficient than having a single recursive query that outputs everything without literally millions of fetches?
|
|

10-20-08, 19:53
|
|
Registered User
|
|
Join Date: Sep 2008
Posts: 6
|
|
I forgot to mention that when I looked at fetch, it seemed that you fetch into host variables. Is there something in fetch that makes it just output these variables into a row? That's the part I'm having trouble with. I understand I can take the values into the fetch, but I'm not sure how to output the variables as a row.
I'm reading but its a lot of documentation to sift thru.
|
|

10-21-08, 00:03
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
The CTE col_to_row in the following example would give you some idea.
Code:
------------------------------ Commands Entered ------------------------------
WITH
table(key, val) AS (
VALUES
('1', 'a')
,('2', 'a')
,('3', 'c')
,('4', 'b')
)
,data(c1, c2) AS (
VALUES ('123', 'abc')
)
,col_to_row(c1, c2) AS (
SELECT SUBSTR(c1, n, 1)
, SUBSTR(c2, n, 1)
FROM data
, LATERAL
(SELECT 1 + n1 + n2*10 + n3*100
/* or
INTEGER( ROW_NUMBER() OVER() )
*/
FROM (VALUES 0,1,2,3,4,5,6,7,8,9) N1(n1)
INNER JOIN
(VALUES 0,1,2,3,4,5,6,7,8,9) N2(n2)
ON LENGTH(c1) / 10 >= n2
INNER JOIN
(VALUES 0,1,2,3,4,5,6,7,8,9) N3(n3)
ON LENGTH(c1) / 100 >= n3
WHERE n1 + n2*10 + n3*100 < LENGTH(c1)
) N(n)
)
SELECT *
FROM table
WHERE (key, val)
IN (
SELECT c1, c2
FROM col_to_row
)
;
------------------------------------------------------------------------------
KEY VAL
--- ---
1 a
3 c
2 record(s) selected.
|
|

10-21-08, 04:13
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
I was talking about the FETCH call made to a table UDF - not the FETCH SQL statement - and there is no FETCH function at all in DB2.
Table UDFs work like a cursor/iterator:
- first you get a single OPEN call. During that, you do all necessary preparations and precomputations on the input parameters
- then come 1 ore more FETCH calls. On each FETCH call, you return the next row - or SQLSTATE 02000 to indicate you are done
- finally, DB2 makes a CLOSE call to the UDF so that you can clean up whatever needs to be cleaned up
Additionally, you can use a SCRATCHPAD (for C/C++ UDFs) to store temporary information like the position in the string that you have processed last. For Java UDFs, you can put such information into class members. The class members or SCRATCHPAD information is carried from one call to the UDF to the next.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

10-21-08, 10:03
|
|
Registered User
|
|
Join Date: Oct 2008
Posts: 19
|
|
Stolze:
This is the fetch that I Was looking at in DB2. Is it because I called it a function instead of a statement?
Forgive me but I'm still not sure what you're talking about.
I currently have a table udf with a recursive function in it that does all the work. I want to get rid of the recursion that does the work. What I need to do is to have my UDF output my computed rows 1 by 1 in the UDF. I don't know what command, statement, or function I need to do that or if it is possible.
I know that I can put a query into a table function, parameterize it using the table function parameters, and then output the query's output. What needs to happen, however, is I select on a table, perform a while loop on each row to split the data until the row runs out (kinda like my example), and then output each split. Then move onto the next row in the primary select and do the same.
Is that possible with a PL SQL UDF table function? Or do I need to do it in another language? Do I need a stored procedure instead?
----------------------------------------
Tonkuma, I saw a similar thing you commented on on the IBM developer forums last night. Thanks for the thought.
I have to learn more about the "lateral" function and whether or not it will work with stuff. What makes me think that it might not is that I'm not just pivoting columns, but I'm splitting individual columns and pulling their pieces out which makes me think this has to be more iterative.
If anything, I'll have learned a new function that I've never seen :-) .
|
|

10-21-08, 15:23
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
I was talking about an external UDF, i.e. one implemented in C/C++ or Java.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

10-23-08, 09:09
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
Quote:
|
..., but I'm splitting individual columns and pulling their pieces out which makes me think this has to be more iterative.
|
Which way to split columns?
If each pieces in columns are separated by a separater character(comma, etc.), it must be possible without using iterative ways.
Here is an example:
Although this is very long and troublesome to read, (I think) the logics of each CTEs are simple.
Code:
------------------------------ Commands Entered ------------------------------
WITH testdata(c1, c2) AS (
VALUES ('aaa,b,cccc', '11,22222,333')
)
,position_of_separators(n, f1, f2) AS (
SELECT n
, CASE
WHEN n BETWEEN 1 AND LENGTH(c1) THEN
CASE
WHEN SUBSTR(c1, n, 1) = ',' THEN
'Y'
ELSE NULL
END
WHEN n IN (0, LENGTH(c1) + 1) THEN
'Y'
ELSE NULL
END
, CASE
WHEN n BETWEEN 1 AND LENGTH(c2) THEN
CASE
WHEN SUBSTR(c2, n, 1) = ',' THEN
'Y'
ELSE NULL
END
WHEN n IN (0, LENGTH(c2) + 1) THEN
'Y'
ELSE NULL
END
FROM testdata
INNER JOIN
LATERAL
(SELECT n1 + n2*10 + n3*100
FROM (VALUES 0,1,2,3,4,5,6,7,8,9) N1(n1)
INNER JOIN
(VALUES 0,1,2,3,4,5,6,7,8,9) N2(n2)
ON (LENGTH(c1) + 1) / 10 >= n2
OR (LENGTH(c2) + 1) / 10 >= n2
INNER JOIN
(VALUES 0,1,2,3,4,5,6,7,8,9) N3(n3)
ON (LENGTH(c1) + 1) / 100 >= n3
OR (LENGTH(c2) + 1) / 100 >= n3
) N(n)
ON n <= LENGTH(c1) + 1
OR n <= LENGTH(c2) + 1
)
,numbered_position_of_separators(n, f1, k1, f2, k2) AS (
SELECT n
, f1
, INTEGER( ROW_NUMBER() OVER(ORDER BY f1 NULLS LAST, n) )
, f2
, INTEGER( ROW_NUMBER() OVER(ORDER BY f2 NULLS LAST, n) )
FROM position_of_separators
WHERE f1 IS NOT NULL
OR f2 IS NOT NULL
)
SELECT n1b.k1 AS k
, SUBSTR(c1, n1b.n+1, n1e.n-n1b.n-1) AS c1
, SUBSTR(c2, n2b.n+1, n2e.n-n2b.n-1) AS c2
FROM testdata
INNER JOIN
numbered_position_of_separators n1b
ON n1b.f1 IS NOT NULL
INNER JOIN
numbered_position_of_separators n1e
ON n1e.f1 IS NOT NULL
AND n1e.k1 = n1b.k1 + 1
INNER JOIN
numbered_position_of_separators n2b
ON n2b.k2 = n1b.k1
AND n2b.f2 IS NOT NULL
INNER JOIN
numbered_position_of_separators n2e
ON n2e.f2 IS NOT NULL
AND n2e.k2 = n2b.k2 + 1
ORDER BY k
;
------------------------------------------------------------------------------
K C1 C2
----------- ---------- ------------
1 aaa 11
2 b 22222
3 cccc 333
3 record(s) selected.
|
|

10-23-08, 16:41
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
1) LATERAL is a keyword to enable reference to columns of previous table references in the follwing subquery(nested table expression).
There is no extra functionarities(I think).
It's a same JOIN without LATERAL keyword.
2) Followings are the results of each CTEs and final select.
Code:
------------------------------------------------------------------------------
SELECT * FROM testdata;
------------------------------------------------------------------------------
C1 C2
---------- ------------
aaa,b,cccc 11,22222,333
1 record(s) selected.
Code:
------------------------------------------------------------------------------
SELECT * FROM position_of_separators
ORDER BY n;
------------------------------------------------------------------------------
N F1 F2
----------- -- --
0 Y Y
1 - -
2 - -
3 - Y
4 Y -
5 - -
6 Y -
7 - -
8 - -
9 - Y
10 - -
11 Y -
12 - -
13 - Y
14 record(s) selected.
Code:
------------------------------------------------------------------------------
SELECT * FROM numbered_position_of_separators
ORDER BY n;
------------------------------------------------------------------------------
N F1 K1 F2 K2
----------- -- ----------- -- -----------
0 Y 1 Y 1
3 - 5 Y 2
4 Y 2 - 5
6 Y 3 - 6
9 - 6 Y 3
11 Y 4 - 7
13 - 7 Y 4
7 record(s) selected.
Code:
------------------------------------------------------------------------------
SELECT n1b.k1 AS k
, SUBSTR(c1, n1b.n+1, n1e.n-n1b.n-1) AS c1
, SMALLINT(n1b.n) AS "n1b.n"
, SMALLINT(n1b.k1) AS "n1b.k1"
, SMALLINT(n1e.n) AS "n1e.n"
, SMALLINT(n1e.k1) AS "n1e.k1"
, SUBSTR(c2, n2b.n+1, n2e.n-n2b.n-1) AS c2
, SMALLINT(n2b.n) AS "n2b.n"
, SMALLINT(n2b.k2) AS "n2b.k2"
, SMALLINT(n2e.n) AS "n2e.n"
, SMALLINT(n2e.k2) AS "n2e.k2"
FROM testdata
INNER JOIN
numbered_position_of_separators n1b
ON n1b.f1 IS NOT NULL
INNER JOIN
numbered_position_of_separators n1e
ON n1e.f1 IS NOT NULL
AND n1e.k1 = n1b.k1 + 1
INNER JOIN
numbered_position_of_separators n2b
ON n2b.k2 = n1b.k1
AND n2b.f2 IS NOT NULL
INNER JOIN
numbered_position_of_separators n2e
ON n2e.f2 IS NOT NULL
AND n2e.k2 = n2b.k2 + 1
ORDER BY k
;
------------------------------------------------------------------------------
K C1 n1b.n n1b.k1 n1e.n n1e.k1 C2 n2b.n n2b.k2 n2e.n n2e.k2
----------- ---------- ------ ------ ------ ------ ------------ ------ ------ ------ ------
1 aaa 0 1 4 2 11 0 1 3 2
2 b 4 2 6 3 22222 3 2 9 3
3 cccc 6 3 11 4 333 9 3 13 4
3 record(s) selected.
|
|

06-04-10, 14:58
|
|
Registered User
|
|
Join Date: Jun 2010
Posts: 1
|
|
|
|

06-06-10, 14:47
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
You are aware that this is completely irrelevant because T-SQL is not the SQL dialect implemented by DB2?
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

06-08-10, 07:43
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
Although, I showed a way without using iteration(or recursion),
it would be shorter and easy to understand by using recursion.
Hore is a sample UDF:
(You can specify DEFAULT for the parameters of a function on DB2 for LUW 9.7 fixpack 2.)
Code:
------------------------------ Commands Entered ------------------------------
connect to SAMPLE ;
------------------------------------------------------------------------------
Database Connection Information
Database server = DB2/NT 9.7.2
SQL authorization ID = DB2ADMIN
Local database alias = SAMPLE
A JDBC connection to the target has succeeded.
------------------------------ Commands Entered ------------------------------
CREATE FUNCTION splitString
( inString VARCHAR(4000)
, inDelimiter VARCHAR(1) DEFAULT ','
)
RETURNS
TABLE( element VARCHAR(100)
, seq INTEGER
, nbr INTEGER
, pos INTEGER
)
LANGUAGE SQL
READS SQL DATA
DETERMINISTIC
NO EXTERNAL ACTION
RETURN
WITH
find_delemiters(k , pos) AS (
VALUES (0 , 0 )
UNION ALL
SELECT k + 1
, LOCATE(inDelimiter , inString , pos + 1)
FROM find_delemiters
WHERE k < 10000
AND (k = 0 OR pos > 0)
)
SELECT SUBSTR(inString , pos1 + 1 , pos2 - pos1 - 1)
, ROW_NUMBER() OVER()
, k2
, pos1 + 1
FROM find_delemiters
AS s1(k1 , pos1)
, (SELECT k
, COALESCE( NULLIF(pos , 0)
, LENGTH(inString) + 1
)
FROM find_delemiters
) AS s2(k2 , pos2)
WHERE k2 = k1 + 1
AND pos2 - pos1 > 1
;
------------------------------------------------------------------------------
DB20000I The SQL command completed successfully.
An example using the UDF:
Code:
------------------------------ Commands Entered ------------------------------
WITH testdata(k , c1 , c2) AS (
VALUES
(1 , 'aaa,b,cccc' , ',,11,22222,,,333,4444,')
, (2 , 'd,eeeee,,fff,gg,h' , '5555555')
)
SELECT k
, COALESCE(s1.seq , s2.seq) AS seq
, CAST(s1.element AS VARCHAR(10) ) AS c1_element
, s1.nbr , s1.pos
, CAST(s2.element AS VARCHAR(10) ) AS c2_element
, s2.nbr , s2.pos
FROM testdata
LEFT OUTER JOIN
TABLE( splitString(c1) ) AS s1
FULL OUTER JOIN
TABLE( splitString(c2) ) AS s2
ON s2.seq = s1.seq
ON 0=0
ORDER BY
k , seq
;
------------------------------------------------------------------------------
K SEQ C1_ELEMENT NBR POS C2_ELEMENT NBR POS
----------- ----------- ---------- ----------- ----------- ---------- ----------- -----------
1 1 aaa 1 1 11 3 3
1 2 b 2 5 22222 4 6
1 3 cccc 3 7 333 7 14
1 4 - - - 4444 8 18
2 1 d 1 1 5555555 1 1
2 2 eeeee 2 3 - - -
2 3 fff 4 10 - - -
2 4 gg 5 14 - - -
2 5 h 6 17 - - -
9 record(s) selected.
If you are not using DB2 for LUW 9.7 fixpack 2,
remove DEFAULT clause from inDelimiter in CREATE FUNCTION statement
and specify the delimiter(',') at the time of invoking the UDF.
|
Last edited by tonkuma; 06-08-10 at 07:48.
Reason: Change testdata a little.
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|