Results 1 to 13 of 13
  1. #1
    Join Date
    Oct 2008
    Posts
    23

    Unanswered: Table UDF/Stored Procedure - output any data as returned rows

    I'm not sure if my subject was clear but here's the issue I'm having.

    I'm working on a project where I take some data that is on a single row and parse it out onto separate rows using recursion.

    My colleagues are concerned that the recursion is what is making it slow which might be the case.

    The reason I Went with recursion is that I know that the union all is going to allow the rows to union together into nice, generic row output. All that is happening is basically a bunch of unions.

    I don't know how to do this without recursion but I was told that it is likely doable with a table UDF written in C or some other language.

    Before I pursue figuring it out, I figured I'd ask if anyone knows if this is true or if there's a better way.

    Basically, lets say I want to take a row with 2 columns:

    c1 | c2
    -----------
    123 | abc

    And I want to turn it into:

    c1 | c2
    ------------
    1 | a
    2 | b
    3 | c

    With recursion, this is very simple. The question is whether or not this is doable without recursion in such a way where the data is returned as regular db2 rows. By regular rows, I mean that I could say:

    select * from my.table a where (a.key, a.val) in
    (
    select c1, c2 from table(my.udf()) b
    )

    I just have no idea how to do this without unions. I know I could prepare a giant SQL statement but that would not work with the vast amounts of data I'd be playing with.

    Any help is appreciated.

    Thanks in advance!

  2. #2
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    You can write a table function that does the parsing and returns 1 row on each FETCH call. You can then call the function like this:
    Code:
    SELECT *
    FROM TABLE ( youTableUDF(inputString) ) AS t
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  3. #3
    Join Date
    Sep 2008
    Posts
    6
    I looked at the "Fetch" function and I'm not sure how it would work. I'll have to look at it more.

    So what you're saying is that every time I fetch some values, I can somehow output it. For instance, in my example:

    Basically, lets say I want to take a row with 2 columns:

    c1 | c2
    -----------
    123 | abc

    And I want to turn it into:

    c1 | c2
    ------------
    1 | a
    2 | b
    3 | c

    (below, column separator is | )
    For the row 123 | abc, I could take the values 1 | a and then return them as a row, then 2 | b and return them as a row, and 3 | c and return them as a row, all from the same call? I assume that's what you're saying. I'll have to spend more time on it in the morning.

    But would that be considered more efficient than having a single recursive query that outputs everything without literally millions of fetches?

  4. #4
    Join Date
    Sep 2008
    Posts
    6
    I forgot to mention that when I looked at fetch, it seemed that you fetch into host variables. Is there something in fetch that makes it just output these variables into a row? That's the part I'm having trouble with. I understand I can take the values into the fetch, but I'm not sure how to output the variables as a row.

    I'm reading but its a lot of documentation to sift thru.

  5. #5
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    The CTE col_to_row in the following example would give you some idea.
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH
     table(key, val) AS (
    VALUES
     ('1', 'a')
    ,('2', 'a')
    ,('3', 'c')
    ,('4', 'b')
    )
    ,data(c1, c2) AS (
    VALUES ('123', 'abc')
    )
    ,col_to_row(c1, c2) AS (
    SELECT SUBSTR(c1, n, 1)
         , SUBSTR(c2, n, 1)
      FROM data
         , LATERAL
           (SELECT 1 + n1 + n2*10 + n3*100
                /* or
                   INTEGER( ROW_NUMBER() OVER() )
                */
              FROM (VALUES 0,1,2,3,4,5,6,7,8,9) N1(n1)
             INNER JOIN
                   (VALUES 0,1,2,3,4,5,6,7,8,9) N2(n2)
               ON  LENGTH(c1) / 10 >= n2
             INNER JOIN
                   (VALUES 0,1,2,3,4,5,6,7,8,9) N3(n3)
               ON  LENGTH(c1) / 100 >= n3
             WHERE n1 + n2*10 + n3*100 < LENGTH(c1)
           ) N(n)
    )
    SELECT *
      FROM table
     WHERE (key, val)
           IN (
           SELECT c1, c2
             FROM col_to_row
           )
    ;
    ------------------------------------------------------------------------------
    
    KEY VAL
    --- ---
    1   a  
    3   c  
    
      2 record(s) selected.

  6. #6
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I was talking about the FETCH call made to a table UDF - not the FETCH SQL statement - and there is no FETCH function at all in DB2.

    Table UDFs work like a cursor/iterator:
    - first you get a single OPEN call. During that, you do all necessary preparations and precomputations on the input parameters
    - then come 1 ore more FETCH calls. On each FETCH call, you return the next row - or SQLSTATE 02000 to indicate you are done
    - finally, DB2 makes a CLOSE call to the UDF so that you can clean up whatever needs to be cleaned up

    Additionally, you can use a SCRATCHPAD (for C/C++ UDFs) to store temporary information like the position in the string that you have processed last. For Java UDFs, you can put such information into class members. The class members or SCRATCHPAD information is carried from one call to the UDF to the next.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  7. #7
    Join Date
    Oct 2008
    Posts
    23
    Stolze:

    This is the fetch that I Was looking at in DB2. Is it because I called it a function instead of a statement?

    Forgive me but I'm still not sure what you're talking about.

    I currently have a table udf with a recursive function in it that does all the work. I want to get rid of the recursion that does the work. What I need to do is to have my UDF output my computed rows 1 by 1 in the UDF. I don't know what command, statement, or function I need to do that or if it is possible.

    I know that I can put a query into a table function, parameterize it using the table function parameters, and then output the query's output. What needs to happen, however, is I select on a table, perform a while loop on each row to split the data until the row runs out (kinda like my example), and then output each split. Then move onto the next row in the primary select and do the same.

    Is that possible with a PL SQL UDF table function? Or do I need to do it in another language? Do I need a stored procedure instead?


    ----------------------------------------


    Tonkuma, I saw a similar thing you commented on on the IBM developer forums last night. Thanks for the thought.

    I have to learn more about the "lateral" function and whether or not it will work with stuff. What makes me think that it might not is that I'm not just pivoting columns, but I'm splitting individual columns and pulling their pieces out which makes me think this has to be more iterative.

    If anything, I'll have learned a new function that I've never seen :-) .

  8. #8
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I was talking about an external UDF, i.e. one implemented in C/C++ or Java.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  9. #9
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    ..., but I'm splitting individual columns and pulling their pieces out which makes me think this has to be more iterative.
    Which way to split columns?
    If each pieces in columns are separated by a separater character(comma, etc.), it must be possible without using iterative ways.

    Here is an example:
    Although this is very long and troublesome to read, (I think) the logics of each CTEs are simple.

    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH testdata(c1, c2) AS (
    VALUES ('aaa,b,cccc', '11,22222,333')
    )
    ,position_of_separators(n, f1, f2) AS (
    SELECT n
         , CASE
           WHEN n BETWEEN 1 AND LENGTH(c1) THEN
                CASE 
                WHEN SUBSTR(c1, n, 1) = ',' THEN
                     'Y'
                ELSE NULL
                END
           WHEN n IN (0, LENGTH(c1) + 1) THEN
                'Y'
           ELSE NULL
           END
         , CASE
           WHEN n BETWEEN 1 AND LENGTH(c2) THEN
                CASE 
                WHEN SUBSTR(c2, n, 1) = ',' THEN
                     'Y'
                ELSE NULL
                END
           WHEN n IN (0, LENGTH(c2) + 1) THEN
                'Y'
           ELSE NULL
           END
      FROM testdata
     INNER JOIN
           LATERAL
           (SELECT n1 + n2*10 + n3*100
              FROM (VALUES 0,1,2,3,4,5,6,7,8,9) N1(n1)
             INNER JOIN
                   (VALUES 0,1,2,3,4,5,6,7,8,9) N2(n2)
               ON  (LENGTH(c1) + 1) / 10 >= n2
               OR  (LENGTH(c2) + 1) / 10 >= n2
             INNER JOIN
                   (VALUES 0,1,2,3,4,5,6,7,8,9) N3(n3)
               ON  (LENGTH(c1) + 1) / 100 >= n3
               OR  (LENGTH(c2) + 1) / 100 >= n3
           ) N(n)
       ON  n <= LENGTH(c1) + 1
       OR  n <= LENGTH(c2) + 1
    )
    ,numbered_position_of_separators(n, f1, k1, f2, k2) AS (
    SELECT n
         , f1
         , INTEGER( ROW_NUMBER() OVER(ORDER BY f1 NULLS LAST, n) )
         , f2
         , INTEGER( ROW_NUMBER() OVER(ORDER BY f2 NULLS LAST, n) )
      FROM position_of_separators
     WHERE f1 IS NOT NULL
       OR  f2 IS NOT NULL
    )
    SELECT n1b.k1                             AS k
         , SUBSTR(c1, n1b.n+1, n1e.n-n1b.n-1) AS c1
         , SUBSTR(c2, n2b.n+1, n2e.n-n2b.n-1) AS c2
      FROM testdata
     INNER JOIN
           numbered_position_of_separators n1b
       ON  n1b.f1 IS NOT NULL
     INNER JOIN
           numbered_position_of_separators n1e
       ON  n1e.f1 IS NOT NULL
       AND n1e.k1 = n1b.k1 + 1
     INNER JOIN
           numbered_position_of_separators n2b
       ON  n2b.k2 = n1b.k1
       AND n2b.f2 IS NOT NULL
     INNER JOIN
           numbered_position_of_separators n2e
       ON  n2e.f2 IS NOT NULL
       AND n2e.k2 = n2b.k2 + 1
     ORDER BY k
    ;
    ------------------------------------------------------------------------------
    
    K           C1         C2          
    ----------- ---------- ------------
              1 aaa        11          
              2 b          22222       
              3 cccc       333         
    
      3 record(s) selected.

  10. #10
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    1) LATERAL is a keyword to enable reference to columns of previous table references in the follwing subquery(nested table expression).
    There is no extra functionarities(I think).
    It's a same JOIN without LATERAL keyword.


    2) Followings are the results of each CTEs and final select.
    Code:
    ------------------------------------------------------------------------------
    SELECT * FROM testdata;
    ------------------------------------------------------------------------------
    
    C1         C2          
    ---------- ------------
    aaa,b,cccc 11,22222,333
    
      1 record(s) selected.

    Code:
    ------------------------------------------------------------------------------
    SELECT * FROM position_of_separators
     ORDER BY n;
    ------------------------------------------------------------------------------
    
    N           F1 F2
    ----------- -- --
              0 Y  Y 
              1 -  - 
              2 -  - 
              3 -  Y 
              4 Y  - 
              5 -  - 
              6 Y  - 
              7 -  - 
              8 -  - 
              9 -  Y 
             10 -  - 
             11 Y  - 
             12 -  - 
             13 -  Y 
    
      14 record(s) selected.

    Code:
    ------------------------------------------------------------------------------
    SELECT * FROM numbered_position_of_separators
     ORDER BY n;
    ------------------------------------------------------------------------------
    
    N           F1 K1          F2 K2         
    ----------- -- ----------- -- -----------
              0 Y            1 Y            1
              3 -            5 Y            2
              4 Y            2 -            5
              6 Y            3 -            6
              9 -            6 Y            3
             11 Y            4 -            7
             13 -            7 Y            4
    
      7 record(s) selected.

    Code:
    ------------------------------------------------------------------------------
    SELECT n1b.k1                             AS k
         , SUBSTR(c1, n1b.n+1, n1e.n-n1b.n-1) AS c1
         , SMALLINT(n1b.n)                    AS "n1b.n"
         , SMALLINT(n1b.k1)                   AS "n1b.k1"
         , SMALLINT(n1e.n)                    AS "n1e.n"
         , SMALLINT(n1e.k1)                   AS "n1e.k1"
         , SUBSTR(c2, n2b.n+1, n2e.n-n2b.n-1) AS c2
         , SMALLINT(n2b.n)                    AS "n2b.n"
         , SMALLINT(n2b.k2)                   AS "n2b.k2"
         , SMALLINT(n2e.n)                    AS "n2e.n"
         , SMALLINT(n2e.k2)                   AS "n2e.k2"
      FROM testdata
     INNER JOIN
           numbered_position_of_separators n1b
       ON  n1b.f1 IS NOT NULL
     INNER JOIN
           numbered_position_of_separators n1e
       ON  n1e.f1 IS NOT NULL
       AND n1e.k1 = n1b.k1 + 1
     INNER JOIN
           numbered_position_of_separators n2b
       ON  n2b.k2 = n1b.k1
       AND n2b.f2 IS NOT NULL
     INNER JOIN
           numbered_position_of_separators n2e
       ON  n2e.f2 IS NOT NULL
       AND n2e.k2 = n2b.k2 + 1
     ORDER BY k
    ;
    ------------------------------------------------------------------------------
    
    K           C1         n1b.n  n1b.k1 n1e.n  n1e.k1 C2           n2b.n  n2b.k2 n2e.n  n2e.k2
    ----------- ---------- ------ ------ ------ ------ ------------ ------ ------ ------ ------
              1 aaa             0      1      4      2 11                0      1      3      2
              2 b               4      2      6      3 22222             3      2      9      3
              3 cccc            6      3     11      4 333               9      3     13      4
    
      3 record(s) selected.

  11. #11
    Join Date
    Jun 2010
    Posts
    1

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    You are aware that this is completely irrelevant because T-SQL is not the SQL dialect implemented by DB2?
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  13. #13
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Although, I showed a way without using iteration(or recursion),
    it would be shorter and easy to understand by using recursion.

    Hore is a sample UDF:
    (You can specify DEFAULT for the parameters of a function on DB2 for LUW 9.7 fixpack 2.)

    Code:
    ------------------------------ Commands Entered ------------------------------
    connect to SAMPLE ;
    ------------------------------------------------------------------------------
    
       Database Connection Information
    
     Database server        = DB2/NT 9.7.2
     SQL authorization ID   = DB2ADMIN
     Local database alias   = SAMPLE
    
    
    A JDBC connection to the target has succeeded.
    ------------------------------ Commands Entered ------------------------------
    CREATE FUNCTION splitString
    ( inString    VARCHAR(4000)
    , inDelimiter VARCHAR(1)    DEFAULT ','
    )
     RETURNS
     TABLE( element VARCHAR(100)
          , seq     INTEGER
          , nbr     INTEGER
          , pos     INTEGER
          )
     LANGUAGE SQL
     READS SQL DATA
     DETERMINISTIC
     NO EXTERNAL ACTION
    RETURN
    WITH
     find_delemiters(k , pos) AS (
    VALUES (0 , 0 )
    UNION ALL
    SELECT k + 1
         , LOCATE(inDelimiter , inString , pos + 1)
      FROM find_delemiters
     WHERE k < 10000
       AND (k = 0 OR pos > 0)
    )
    SELECT SUBSTR(inString , pos1 + 1 , pos2 - pos1 - 1)
         , ROW_NUMBER() OVER()
         , k2
         , pos1 + 1
      FROM find_delemiters
             AS s1(k1 , pos1)
         , (SELECT k
                 , COALESCE( NULLIF(pos , 0)
                           , LENGTH(inString) + 1
                           )
              FROM find_delemiters
           ) AS s2(k2 , pos2)
     WHERE k2 = k1 + 1
       AND pos2 - pos1 > 1
    ;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    An example using the UDF:
    Code:
    ------------------------------ Commands Entered ------------------------------
    WITH testdata(k , c1 , c2) AS (
    VALUES
      (1 , 'aaa,b,cccc'        , ',,11,22222,,,333,4444,')
    , (2 , 'd,eeeee,,fff,gg,h' , '5555555')
    )
    SELECT k
         , COALESCE(s1.seq , s2.seq) AS seq
         , CAST(s1.element AS VARCHAR(10) ) AS c1_element
         , s1.nbr , s1.pos
         , CAST(s2.element AS VARCHAR(10) ) AS c2_element
         , s2.nbr , s2.pos
      FROM testdata
      LEFT OUTER JOIN
           TABLE( splitString(c1) ) AS s1
      FULL OUTER JOIN
           TABLE( splitString(c2) ) AS s2
       ON  s2.seq = s1.seq
       ON  0=0
     ORDER BY
           k , seq
    ;
    ------------------------------------------------------------------------------
    
    K           SEQ         C1_ELEMENT NBR         POS         C2_ELEMENT NBR         POS        
    ----------- ----------- ---------- ----------- ----------- ---------- ----------- -----------
              1           1 aaa                  1           1 11                   3           3
              1           2 b                    2           5 22222                4           6
              1           3 cccc                 3           7 333                  7          14
              1           4 -                    -           - 4444                 8          18
              2           1 d                    1           1 5555555              1           1
              2           2 eeeee                2           3 -                    -           -
              2           3 fff                  4          10 -                    -           -
              2           4 gg                   5          14 -                    -           -
              2           5 h                    6          17 -                    -           -
    
      9 record(s) selected.
    If you are not using DB2 for LUW 9.7 fixpack 2,
    remove DEFAULT clause from inDelimiter in CREATE FUNCTION statement
    and specify the delimiter(',') at the time of invoking the UDF.
    Last edited by tonkuma; 06-08-10 at 08:48. Reason: Change testdata a little.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •