If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Table UDF/Stored Procedure - output any data as returned rows

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-20-08, 18:08
hifferyj hifferyj is offline
Registered User
 
Join Date: Oct 2008
Posts: 19
Table UDF/Stored Procedure - output any data as returned rows

I'm not sure if my subject was clear but here's the issue I'm having.

I'm working on a project where I take some data that is on a single row and parse it out onto separate rows using recursion.

My colleagues are concerned that the recursion is what is making it slow which might be the case.

The reason I Went with recursion is that I know that the union all is going to allow the rows to union together into nice, generic row output. All that is happening is basically a bunch of unions.

I don't know how to do this without recursion but I was told that it is likely doable with a table UDF written in C or some other language.

Before I pursue figuring it out, I figured I'd ask if anyone knows if this is true or if there's a better way.

Basically, lets say I want to take a row with 2 columns:

c1 | c2
-----------
123 | abc

And I want to turn it into:

c1 | c2
------------
1 | a
2 | b
3 | c

With recursion, this is very simple. The question is whether or not this is doable without recursion in such a way where the data is returned as regular db2 rows. By regular rows, I mean that I could say:

select * from my.table a where (a.key, a.val) in
(
select c1, c2 from table(my.udf()) b
)

I just have no idea how to do this without unions. I know I could prepare a giant SQL statement but that would not work with the vast amounts of data I'd be playing with.

Any help is appreciated.

Thanks in advance!
Reply With Quote
  #2 (permalink)  
Old 10-20-08, 19:25
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
You can write a table function that does the parsing and returns 1 row on each FETCH call. You can then call the function like this:
Code:
SELECT *
FROM TABLE ( youTableUDF(inputString) ) AS t
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #3 (permalink)  
Old 10-20-08, 19:48
jercole jercole is offline
Registered User
 
Join Date: Sep 2008
Posts: 6
I looked at the "Fetch" function and I'm not sure how it would work. I'll have to look at it more.

So what you're saying is that every time I fetch some values, I can somehow output it. For instance, in my example:

Basically, lets say I want to take a row with 2 columns:

c1 | c2
-----------
123 | abc

And I want to turn it into:

c1 | c2
------------
1 | a
2 | b
3 | c

(below, column separator is | )
For the row 123 | abc, I could take the values 1 | a and then return them as a row, then 2 | b and return them as a row, and 3 | c and return them as a row, all from the same call? I assume that's what you're saying. I'll have to spend more time on it in the morning.

But would that be considered more efficient than having a single recursive query that outputs everything without literally millions of fetches?
Reply With Quote
  #4 (permalink)  
Old 10-20-08, 19:53
jercole jercole is offline
Registered User
 
Join Date: Sep 2008
Posts: 6
I forgot to mention that when I looked at fetch, it seemed that you fetch into host variables. Is there something in fetch that makes it just output these variables into a row? That's the part I'm having trouble with. I understand I can take the values into the fetch, but I'm not sure how to output the variables as a row.

I'm reading but its a lot of documentation to sift thru.
Reply With Quote
  #5 (permalink)  
Old 10-21-08, 00:03
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
The CTE col_to_row in the following example would give you some idea.
Code:
------------------------------ Commands Entered ------------------------------
WITH
 table(key, val) AS (
VALUES
 ('1', 'a')
,('2', 'a')
,('3', 'c')
,('4', 'b')
)
,data(c1, c2) AS (
VALUES ('123', 'abc')
)
,col_to_row(c1, c2) AS (
SELECT SUBSTR(c1, n, 1)
     , SUBSTR(c2, n, 1)
  FROM data
     , LATERAL
       (SELECT 1 + n1 + n2*10 + n3*100
            /* or
               INTEGER( ROW_NUMBER() OVER() )
            */
          FROM (VALUES 0,1,2,3,4,5,6,7,8,9) N1(n1)
         INNER JOIN
               (VALUES 0,1,2,3,4,5,6,7,8,9) N2(n2)
           ON  LENGTH(c1) / 10 >= n2
         INNER JOIN
               (VALUES 0,1,2,3,4,5,6,7,8,9) N3(n3)
           ON  LENGTH(c1) / 100 >= n3
         WHERE n1 + n2*10 + n3*100 < LENGTH(c1)
       ) N(n)
)
SELECT *
  FROM table
 WHERE (key, val)
       IN (
       SELECT c1, c2
         FROM col_to_row
       )
;
------------------------------------------------------------------------------

KEY VAL
--- ---
1   a  
3   c  

  2 record(s) selected.
Reply With Quote
  #6 (permalink)  
Old 10-21-08, 04:13
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
I was talking about the FETCH call made to a table UDF - not the FETCH SQL statement - and there is no FETCH function at all in DB2.

Table UDFs work like a cursor/iterator:
- first you get a single OPEN call. During that, you do all necessary preparations and precomputations on the input parameters
- then come 1 ore more FETCH calls. On each FETCH call, you return the next row - or SQLSTATE 02000 to indicate you are done
- finally, DB2 makes a CLOSE call to the UDF so that you can clean up whatever needs to be cleaned up

Additionally, you can use a SCRATCHPAD (for C/C++ UDFs) to store temporary information like the position in the string that you have processed last. For Java UDFs, you can put such information into class members. The class members or SCRATCHPAD information is carried from one call to the UDF to the next.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #7 (permalink)  
Old 10-21-08, 10:03
hifferyj hifferyj is offline
Registered User
 
Join Date: Oct 2008
Posts: 19
Stolze:

This is the fetch that I Was looking at in DB2. Is it because I called it a function instead of a statement?

Forgive me but I'm still not sure what you're talking about.

I currently have a table udf with a recursive function in it that does all the work. I want to get rid of the recursion that does the work. What I need to do is to have my UDF output my computed rows 1 by 1 in the UDF. I don't know what command, statement, or function I need to do that or if it is possible.

I know that I can put a query into a table function, parameterize it using the table function parameters, and then output the query's output. What needs to happen, however, is I select on a table, perform a while loop on each row to split the data until the row runs out (kinda like my example), and then output each split. Then move onto the next row in the primary select and do the same.

Is that possible with a PL SQL UDF table function? Or do I need to do it in another language? Do I need a stored procedure instead?


----------------------------------------


Tonkuma, I saw a similar thing you commented on on the IBM developer forums last night. Thanks for the thought.

I have to learn more about the "lateral" function and whether or not it will work with stuff. What makes me think that it might not is that I'm not just pivoting columns, but I'm splitting individual columns and pulling their pieces out which makes me think this has to be more iterative.

If anything, I'll have learned a new function that I've never seen :-) .
Reply With Quote
  #8 (permalink)  
Old 10-21-08, 15:23
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
I was talking about an external UDF, i.e. one implemented in C/C++ or Java.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #9 (permalink)  
Old 10-23-08, 09:09
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
..., but I'm splitting individual columns and pulling their pieces out which makes me think this has to be more iterative.
Which way to split columns?
If each pieces in columns are separated by a separater character(comma, etc.), it must be possible without using iterative ways.

Here is an example:
Although this is very long and troublesome to read, (I think) the logics of each CTEs are simple.

Code:
------------------------------ Commands Entered ------------------------------
WITH testdata(c1, c2) AS (
VALUES ('aaa,b,cccc', '11,22222,333')
)
,position_of_separators(n, f1, f2) AS (
SELECT n
     , CASE
       WHEN n BETWEEN 1 AND LENGTH(c1) THEN
            CASE 
            WHEN SUBSTR(c1, n, 1) = ',' THEN
                 'Y'
            ELSE NULL
            END
       WHEN n IN (0, LENGTH(c1) + 1) THEN
            'Y'
       ELSE NULL
       END
     , CASE
       WHEN n BETWEEN 1 AND LENGTH(c2) THEN
            CASE 
            WHEN SUBSTR(c2, n, 1) = ',' THEN
                 'Y'
            ELSE NULL
            END
       WHEN n IN (0, LENGTH(c2) + 1) THEN
            'Y'
       ELSE NULL
       END
  FROM testdata
 INNER JOIN
       LATERAL
       (SELECT n1 + n2*10 + n3*100
          FROM (VALUES 0,1,2,3,4,5,6,7,8,9) N1(n1)
         INNER JOIN
               (VALUES 0,1,2,3,4,5,6,7,8,9) N2(n2)
           ON  (LENGTH(c1) + 1) / 10 >= n2
           OR  (LENGTH(c2) + 1) / 10 >= n2
         INNER JOIN
               (VALUES 0,1,2,3,4,5,6,7,8,9) N3(n3)
           ON  (LENGTH(c1) + 1) / 100 >= n3
           OR  (LENGTH(c2) + 1) / 100 >= n3
       ) N(n)
   ON  n <= LENGTH(c1) + 1
   OR  n <= LENGTH(c2) + 1
)
,numbered_position_of_separators(n, f1, k1, f2, k2) AS (
SELECT n
     , f1
     , INTEGER( ROW_NUMBER() OVER(ORDER BY f1 NULLS LAST, n) )
     , f2
     , INTEGER( ROW_NUMBER() OVER(ORDER BY f2 NULLS LAST, n) )
  FROM position_of_separators
 WHERE f1 IS NOT NULL
   OR  f2 IS NOT NULL
)
SELECT n1b.k1                             AS k
     , SUBSTR(c1, n1b.n+1, n1e.n-n1b.n-1) AS c1
     , SUBSTR(c2, n2b.n+1, n2e.n-n2b.n-1) AS c2
  FROM testdata
 INNER JOIN
       numbered_position_of_separators n1b
   ON  n1b.f1 IS NOT NULL
 INNER JOIN
       numbered_position_of_separators n1e
   ON  n1e.f1 IS NOT NULL
   AND n1e.k1 = n1b.k1 + 1
 INNER JOIN
       numbered_position_of_separators n2b
   ON  n2b.k2 = n1b.k1
   AND n2b.f2 IS NOT NULL
 INNER JOIN
       numbered_position_of_separators n2e
   ON  n2e.f2 IS NOT NULL
   AND n2e.k2 = n2b.k2 + 1
 ORDER BY k
;
------------------------------------------------------------------------------

K           C1         C2          
----------- ---------- ------------
          1 aaa        11          
          2 b          22222       
          3 cccc       333         

  3 record(s) selected.
Reply With Quote
  #10 (permalink)  
Old 10-23-08, 16:41
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
1) LATERAL is a keyword to enable reference to columns of previous table references in the follwing subquery(nested table expression).
There is no extra functionarities(I think).
It's a same JOIN without LATERAL keyword.


2) Followings are the results of each CTEs and final select.
Code:
------------------------------------------------------------------------------
SELECT * FROM testdata;
------------------------------------------------------------------------------

C1         C2          
---------- ------------
aaa,b,cccc 11,22222,333

  1 record(s) selected.

Code:
------------------------------------------------------------------------------
SELECT * FROM position_of_separators
 ORDER BY n;
------------------------------------------------------------------------------

N           F1 F2
----------- -- --
          0 Y  Y 
          1 -  - 
          2 -  - 
          3 -  Y 
          4 Y  - 
          5 -  - 
          6 Y  - 
          7 -  - 
          8 -  - 
          9 -  Y 
         10 -  - 
         11 Y  - 
         12 -  - 
         13 -  Y 

  14 record(s) selected.

Code:
------------------------------------------------------------------------------
SELECT * FROM numbered_position_of_separators
 ORDER BY n;
------------------------------------------------------------------------------

N           F1 K1          F2 K2         
----------- -- ----------- -- -----------
          0 Y            1 Y            1
          3 -            5 Y            2
          4 Y            2 -            5
          6 Y            3 -            6
          9 -            6 Y            3
         11 Y            4 -            7
         13 -            7 Y            4

  7 record(s) selected.

Code:
------------------------------------------------------------------------------
SELECT n1b.k1                             AS k
     , SUBSTR(c1, n1b.n+1, n1e.n-n1b.n-1) AS c1
     , SMALLINT(n1b.n)                    AS "n1b.n"
     , SMALLINT(n1b.k1)                   AS "n1b.k1"
     , SMALLINT(n1e.n)                    AS "n1e.n"
     , SMALLINT(n1e.k1)                   AS "n1e.k1"
     , SUBSTR(c2, n2b.n+1, n2e.n-n2b.n-1) AS c2
     , SMALLINT(n2b.n)                    AS "n2b.n"
     , SMALLINT(n2b.k2)                   AS "n2b.k2"
     , SMALLINT(n2e.n)                    AS "n2e.n"
     , SMALLINT(n2e.k2)                   AS "n2e.k2"
  FROM testdata
 INNER JOIN
       numbered_position_of_separators n1b
   ON  n1b.f1 IS NOT NULL
 INNER JOIN
       numbered_position_of_separators n1e
   ON  n1e.f1 IS NOT NULL
   AND n1e.k1 = n1b.k1 + 1
 INNER JOIN
       numbered_position_of_separators n2b
   ON  n2b.k2 = n1b.k1
   AND n2b.f2 IS NOT NULL
 INNER JOIN
       numbered_position_of_separators n2e
   ON  n2e.f2 IS NOT NULL
   AND n2e.k2 = n2b.k2 + 1
 ORDER BY k
;
------------------------------------------------------------------------------

K           C1         n1b.n  n1b.k1 n1e.n  n1e.k1 C2           n2b.n  n2b.k2 n2e.n  n2e.k2
----------- ---------- ------ ------ ------ ------ ------------ ------ ------ ------ ------
          1 aaa             0      1      4      2 11                0      1      3      2
          2 b               4      2      6      3 22222             3      2      9      3
          3 cccc            6      3     11      4 333               9      3     13      4

  3 record(s) selected.
Reply With Quote
  #11 (permalink)  
Old 06-04-10, 14:58
praveenbattula praveenbattula is offline
Registered User
 
Join Date: Jun 2010
Posts: 1
Reply With Quote
  #12 (permalink)  
Old 06-06-10, 14:47
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
You are aware that this is completely irrelevant because T-SQL is not the SQL dialect implemented by DB2?
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #13 (permalink)  
Old 06-08-10, 07:43
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Although, I showed a way without using iteration(or recursion),
it would be shorter and easy to understand by using recursion.

Hore is a sample UDF:
(You can specify DEFAULT for the parameters of a function on DB2 for LUW 9.7 fixpack 2.)

Code:
------------------------------ Commands Entered ------------------------------
connect to SAMPLE ;
------------------------------------------------------------------------------

   Database Connection Information

 Database server        = DB2/NT 9.7.2
 SQL authorization ID   = DB2ADMIN
 Local database alias   = SAMPLE


A JDBC connection to the target has succeeded.
------------------------------ Commands Entered ------------------------------
CREATE FUNCTION splitString
( inString    VARCHAR(4000)
, inDelimiter VARCHAR(1)    DEFAULT ','
)
 RETURNS
 TABLE( element VARCHAR(100)
      , seq     INTEGER
      , nbr     INTEGER
      , pos     INTEGER
      )
 LANGUAGE SQL
 READS SQL DATA
 DETERMINISTIC
 NO EXTERNAL ACTION
RETURN
WITH
 find_delemiters(k , pos) AS (
VALUES (0 , 0 )
UNION ALL
SELECT k + 1
     , LOCATE(inDelimiter , inString , pos + 1)
  FROM find_delemiters
 WHERE k < 10000
   AND (k = 0 OR pos > 0)
)
SELECT SUBSTR(inString , pos1 + 1 , pos2 - pos1 - 1)
     , ROW_NUMBER() OVER()
     , k2
     , pos1 + 1
  FROM find_delemiters
         AS s1(k1 , pos1)
     , (SELECT k
             , COALESCE( NULLIF(pos , 0)
                       , LENGTH(inString) + 1
                       )
          FROM find_delemiters
       ) AS s2(k2 , pos2)
 WHERE k2 = k1 + 1
   AND pos2 - pos1 > 1
;
------------------------------------------------------------------------------
DB20000I  The SQL command completed successfully.
An example using the UDF:
Code:
------------------------------ Commands Entered ------------------------------
WITH testdata(k , c1 , c2) AS (
VALUES
  (1 , 'aaa,b,cccc'        , ',,11,22222,,,333,4444,')
, (2 , 'd,eeeee,,fff,gg,h' , '5555555')
)
SELECT k
     , COALESCE(s1.seq , s2.seq) AS seq
     , CAST(s1.element AS VARCHAR(10) ) AS c1_element
     , s1.nbr , s1.pos
     , CAST(s2.element AS VARCHAR(10) ) AS c2_element
     , s2.nbr , s2.pos
  FROM testdata
  LEFT OUTER JOIN
       TABLE( splitString(c1) ) AS s1
  FULL OUTER JOIN
       TABLE( splitString(c2) ) AS s2
   ON  s2.seq = s1.seq
   ON  0=0
 ORDER BY
       k , seq
;
------------------------------------------------------------------------------

K           SEQ         C1_ELEMENT NBR         POS         C2_ELEMENT NBR         POS        
----------- ----------- ---------- ----------- ----------- ---------- ----------- -----------
          1           1 aaa                  1           1 11                   3           3
          1           2 b                    2           5 22222                4           6
          1           3 cccc                 3           7 333                  7          14
          1           4 -                    -           - 4444                 8          18
          2           1 d                    1           1 5555555              1           1
          2           2 eeeee                2           3 -                    -           -
          2           3 fff                  4          10 -                    -           -
          2           4 gg                   5          14 -                    -           -
          2           5 h                    6          17 -                    -           -

  9 record(s) selected.
If you are not using DB2 for LUW 9.7 fixpack 2,
remove DEFAULT clause from inDelimiter in CREATE FUNCTION statement
and specify the delimiter(',') at the time of invoking the UDF.

Last edited by tonkuma; 06-08-10 at 07:48. Reason: Change testdata a little.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On