Page 1 of 2 12 LastLast
Results 1 to 15 of 16
  1. #1
    Join Date
    Nov 2011
    Posts
    31

    Smile Unanswered: Help with long running UPDATE statement

    Hi, I'm hoping to get some advice on how best to optimize an update statement in DB2 LUW 9.5

    We have a table that holds approximately 1.5 billion rows. An ETL process I've just started to look at is broken up into 3 steps:

    Step 1 inserts rows into a table. This table has a surrogate primary key (rec_seq_nbr) and a composite index consisting of 8 columns. These columns are the Natural business key. When the rows are inserted into the table a column (curr_ind) takes on a value of 'X'. This steps itself run relatively quick.

    Step 2 is where I require some advice. It takes 11+ hours to complete. This step consists an Update statement that looks like this:

    update t1 set curr_ind = '0'
    where rec_seq_nbr in
    (select b.rec_seq_nbr
    from (select rec_seq_nbr, col1,col2,col3,col4,col5,col6,col7,col8 from T1
    where curr_ind = 'X') as A, T1 AS B
    where a.col1 = b.col1
    and a.col2 = b.col2
    and a.col3 = b.col3
    and a.col4 = b.col4
    and a.col5 = b.col5
    and a.col6 = b.col6
    and a.col7 = b.col7
    and a.col8 = b.col8
    and b.rec_seq_nbr < a.rec_seq_nbr)

    (as you deduced there can be duplicates and they want to set the first duplicate with the smaller REC_SEQ_NBR to '0')

    Step 3 takes one more pass and updates all rows where curr_ind = 'X' and sets it to '1'.

    My question, step 2's Explain plan is ugly as at least 3 tablescans are performed. I've thought about a 'GROUP BY' with a MIN function but the 1.5 billion rows frightens me. Also, why couldn't I perform steps 2 & 3 in one sql job?

    Additional info, CURR_IND doesn't belong to a Index. As this column can have 4 states I was pondering about making it a BITMAP index.

    Greatly appreciate any help.....

  2. #2
    Join Date
    May 2010
    Location
    India
    Posts
    84
    Provided Answers: 2
    If you can modify ETL process, it might be better to update existing records to '0' using the composite key as step 1 and insert new records with value '1' as step 2.

    If you can not change ETL process, see whether following modification to sql helps your process.

    update t1 B set curr_ind = '0'
    where exists
    (select 1
    from T1 as A
    where a.col1 = b.col1
    and a.col2 = b.col2
    and a.col3 = b.col3
    and a.col4 = b.col4
    and a.col5 = b.col5
    and a.col6 = b.col6
    and a.col7 = b.col7
    and a.col8 = b.col8
    and a.curr_ind = 'X'
    and b.rec_seq_nbr < a.rec_seq_nbr)
    and b.curr_ind = '1'


    ....Satya

  3. #3
    Join Date
    Nov 2011
    Posts
    31

    Help with long running UPDATE statment

    Thanks Satya, the Exists makes sense, but I'm not sure what the last statement "and b.curr_ind = '1'" is for?

  4. #4
    Join Date
    May 2010
    Location
    India
    Posts
    84
    Provided Answers: 2
    From your first post, it looks like for same composite key, you might have one record with curr_ind '1' and other older records with curr_ind '0'. By adding the condition for curr_ind ='1', you will update only one record.


    Satya...

  5. #5
    Join Date
    Nov 2011
    Posts
    31

    Help with long running Update statement

    Hi again, after the first pass I could possible have duplicate rows with the same natural key (composite index) but the PK will be different. I want the PK with the lower value (but same business key) to take on a value of '0' for CURR_IND. This implies it's not current. The next pass through will set CURR_IND to '1' (most current) for all remaining 'X'.

  6. #6
    Join Date
    May 2010
    Location
    India
    Posts
    84
    Provided Answers: 2
    Ok, In that case, you might want to change it to b.curr_ind <> '0' to avoid updating older records which are already set '0'.

    update t1 B set curr_ind = '0'
    where exists
    (select 1
    from T1 as A
    where a.col1 = b.col1
    and a.col2 = b.col2
    and a.col3 = b.col3
    and a.col4 = b.col4
    and a.col5 = b.col5
    and a.col6 = b.col6
    and a.col7 = b.col7
    and a.col8 = b.col8
    and a.curr_ind = 'X'
    and b.rec_seq_nbr < a.rec_seq_nbr)
    and b.curr_ind <> '0'

  7. #7
    Join Date
    Nov 2011
    Posts
    334
    Please post the access plan for sql of step 2 here(db2exfmt output)。
    thx

  8. #8
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    1) Step 2 and step 3 may be conbined into a MERGE statement.

    2) I thought the following indexes might be usefull.
    (col1 , col2 , col3 , col4 , col5 , col6 , col7 , col8 , rec_seq_nbr DESC, curr_ind)
    or
    (curr_ind , col1 , col2 , col3 , col4 , col5 , col6 , col7 , col8 , rec_seq_nbr DESC)

    But, I didn't check performance by those ideas.


    Example 1: MERGE statement for Step 2 and Step 3.
    Code:
    MERGE INTO
    (SELECT curr_ind
          , ROW_NUMBER()
              OVER(PARTITION BY col1 , col2 , col3 , col4
                              , col5 , col6 , col7 , col8
                       ORDER BY rec_seq_nbr DESC
                  ) AS rnum
      FROM  t1
      WHERE curr_ind = 'X'
    ) t1
    USING (VALUES 0) dummy(d)
      ON  d = d
    WHEN MATCHED
     AND rnum = 1 THEN
    UPDATE
       SET curr_ind = '1'
    WHEN MATCHED  THEN
    UPDATE
       SET curr_ind = '0'
    ;
    Last edited by tonkuma; 11-29-11 at 06:41. Reason: Change sample code made by my misunderstanding of requirements.

  9. #9
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    The UPDATE statement may be better than MERGE in Example 1.

    Example 2:
    Code:
    UPDATE
    (SELECT *
      FROM  (SELECT curr_ind
                  , ROW_NUMBER()
                       OVER(PARTITION BY col1 , col2 , col3 , col4
                                       , col5 , col6 , col7 , col8
                                ORDER BY rec_seq_nbr DESC
                           ) AS rnum
              FROM  t1
            )
      WHERE curr_ind = 'X'
    )
    SET curr_ind
      = CASE rnum
        WHEN 1 THEN '1'
        ELSE        '0'
        END
    ;
    Last edited by tonkuma; 11-29-11 at 06:41. Reason: Change sample code made by my misunderstanding of requirements.

  10. #10
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Removed.
    Perhaps, my poor English capability made me misunderstanding of requirements.
    Last edited by tonkuma; 11-29-11 at 06:38.

  11. #11
    Join Date
    Nov 2011
    Posts
    31

    Help with long running UPDATE statement

    Actually your 'English' is perfect. I will test out both suggestions. thanks

  12. #12
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    This seems better performance with index
    (col1 , col2 , col3 , col4 , col5 , col6 , col7 , col8 , curr_ind , rec_seq_nbr DESC)

    Example 3: for Step 2 and Step 3.
    Code:
    UPDATE t1 t
       SET curr_ind
         = COALESCE(
              (SELECT LEFT( '0' , SIGN(MAX(s.rec_seq_nbr)) )
                FROM  t1 s
                WHERE s.col1        = t.col1
                  AND s.col2        = t.col2
                  AND s.col3        = t.col3
                  AND s.col4        = t.col4
                  AND s.col5        = t.col5
                  AND s.col6        = t.col6
                  AND s.col7        = t.col7
                  AND s.col8        = t.col8
                  AND s.rec_seq_nbr > t.rec_seq_nbr
                  AND s.curr_ind    = 'X'
              )
            , '1'
           )
     WHERE curr_ind    = 'X'
    ;
    Last edited by tonkuma; 11-29-11 at 11:09. Reason: Revise code(remove qualifier of t1, add conditions for co5 to col8)

  13. #13
    Join Date
    Nov 2011
    Posts
    31

    Help with long running UPDATE statement

    hi, not sure if an aggreagate fuction is allowed in the SET clause of an UPDATE statement

    Getting the following message:

    SQL0120N
    Invalid use of an aggregate function or OLAP function.

    Explanation
    An aggregate function or OLAP function can only be used in the select list of a fullselect, the having clause, or, with restrictions, in a WHERE clause or GROUP BY clause.

    A WHERE clause can contain an aggregate function or OLAP function only if that clause appears within a subquery of a HAVING clause and the argument of the function is a correlated reference to a group.

    A GROUP BY clause can contain an aggregate function or OLAP function only if the argument of the function is a correlated reference to a column in a different subselect than the one containing the GROUP BY clause.

    An OLAP function cannot be used within the argument list of an XMLQUERY or XMLEXISTS expression.

    The statement cannot be processed.

  14. #14
    Join Date
    Nov 2011
    Posts
    31

    Help with long running UPDATE statement

    My last post was incorrect, my aplogies. I get the following error when I attempt to Explain the SQL

    No authorized routine named 'LEFT' of type 'Function' having compaitble arguments was found. SQLcode = -440, SQLSTATE = 42884

  15. #15
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    It was successful on my DB2 Express-C 9.7.5 for Windows,
    by using small test data(and simplified DDL).

    Did you used exactly same expression?
    (SELECT LEFT( '0' , SIGN(MAX(s.rec_seq_nbr)) )

    If error was still returned, please try
    (SELECT MAX('0')

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •