Page 1 of 2 12 LastLast
Results 1 to 15 of 24
  1. #1
    Join Date
    Nov 2011
    Posts
    31

    Unanswered: Help in Optimizing long running query

    Table 1 has 3 columns (REC_SEQ_NBR, BUS_KEY, CURR_IND)

    Last 2 columns of table below represents value of CURR_IND after pass #1 and pass #2

    1 A 1 1 1
    2 B 1 0 0
    3 C 1 0 0
    4 D 1 1 1
    --- --- -- --- --
    1600000000 G 1 1 1
    1600000001 C X 0 0
    1600000002 C X X 1
    1600000003 B X X 1
    1600000004 E X X 1
    1600000005 F X 0 0
    1600000006 F X X 1



    Background

    1.Table contains approximately 2 billion rows

    2. Process 1 inserts approximately 3 million rows with a value of ‘X’ for column CURR_IND. In table above this would be represented by PK REC_SEQ_NBR rows 1600000001 thru 1600000006. This executes fast and isn’t a bottleneck.

    3. Table is partitioned on a date column (1 partition/month). Clustering key is this same column

    4. Non PK index is on BUS_KEY column only.

    5. CURR_IND is not indexed. It has a possibility of 3 values (‘0’, ‘1’, ‘X’). After process 1 completes, the cardinality is as follows (‘0’ 300,000 rows, ‘1’ 1.6 billion rows, ‘X’ 3.5 million rows).

    6. It is the rows with a CURR_IND = ‘X’ that I’m primarily concerned with

    7. While the business key consists of 8 columns, for simplicity I have only included one (i.e. BUS_KEY).

    8. Note from above table we have duplicate rows having the same business key (BUS_KEY column).

    9. 1st pass SQL that reads every row and takes 6+ hours to complete.

    update t1 set curr_ind = '0'
    where rec_seq_nbr in
    (select b.rec_seq_nbr
    from (select rec_seq_nbr from t1 where curr_ind = 'X') as a, t1 as b
    where a.bus_key = b.bus_key
    and b.rec_seq_nbr < a.rec_seq_nbr)

    10. 2nd pass (which is very fast) simply scans the entire table and changes all rows that have a CURR_IND = ‘X’ to ‘1’


    Challenge
    1. How can I avoid reading every row as part of the UPDATE statement? I’m only interested in rows that have a CURR_IND = ‘X’ and related BUS_KEY rows (see REC_SEQ_NBR 2 & 3 in table above). Essentially out of a 1.6 billion row table I’ll only ever update < 2% of these rows. It seems like such a waste of processing power to read the entire table.

    2. Also, how could I accomplish pass #1 and pass #2 in a single pass?

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    The question seems similar to the thread
    http://www.dbforums.com/db2/1672196-...statement.html

    If so, PLEASE try the idea in the thread, like...
    (Combined some of my posts and modified.)
    1) Create an unique index (CURR_IND , BUS_KEY , REC_SEQ_NBR DESC)

    2) Update statement to accomplish pass #1 and pass #2.
    Example 1:
    Code:
    UPDATE t1 t
       SET curr_ind
         = COALESCE(
              (SELECT MAX( '0' )
                FROM  t1 s
                WHERE s.bus_key     = t.bus_key
                  AND s.rec_seq_nbr > t.rec_seq_nbr
                  AND s.curr_ind    = 'X'
              )
            , '1'
           )
     WHERE curr_ind    = 'X'
    ;
    Last edited by tonkuma; 12-10-11 at 15:33. Reason: Add unique for a index.

  3. #3
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    The Example 2 was showed better performance than Example 1 on my simplified test with 100000 row.

    Example 2: unique index (CURR_IND , BUS_KEY , REC_SEQ_NBR DESC) was used.
    Code:
    UPDATE
    (SELECT *
      FROM  (SELECT curr_ind
                  , ROW_NUMBER()
                       OVER(PARTITION BY bus_key
                                ORDER BY rec_seq_nbr
                           ) AS rnum
              FROM  t1 t
              WHERE curr_ind = 'X'
            )
    )
    SET curr_ind
        = CASE rnum
          WHEN 1 THEN '0'
          ELSE        '1'
          END
    ;

  4. #4
    Join Date
    Nov 2011
    Posts
    31

    help in optimizing long running query

    Thanks for your help!! Sorry for the confusion I've caused but I don't see how your first example will update rows other than CURR_IND = 'X'.
    Please look at output column which represents value of CURR_IND after 1st pass.

    Note in the sample table output that I may also have to update related rows that have the same business key.

    For example, REC_SEQ_NBR 1600000003 has a BUS_KEY of 'B' and a CURR_IND of 'X'. Also note that REC_SEQ_NBR 2 also has the same BUS_KEY of 'B'
    where it's CURR_IND is set to '1',meaning most current. So when processing REC_SEQ_NBR 1600000003 I also need to reset the CURR_IND of
    REC_SEQ_NBR 2 from '1' to '0'. This implies not current.

    My aplogies for the confusion. In essence the driving force is rows that have a CURR_IND = 'X' but I also need to reset the previous related BUS_KEY(s)
    from ('1' or 'X') to '0'.

  5. #5
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    1.
    Table 1 has 3 columns (REC_SEQ_NBR, BUS_KEY, CURR_IND)

    Last 2 columns of table below represents value of CURR_IND after pass #1 and pass #2

    1 A 1 1 1
    2 B 1 0 0
    3 C 1 0 0
    4 D 1 1 1
    --- --- -- --- --
    1600000000 G 1 1 1
    1600000001 C X 0 0
    1600000002 C X X 1
    1600000003 B X X 1
    1600000004 E X X 1
    1600000005 F X 0 0
    1600000006 F X X 1
    1-1) Your example has contradictions.
    "has 3 columns" vs "2 B 1 0 0" vs "1600000000 G 1 1 1"

    1-2) I couldn't find the reason of marked bold.


    2.
    You wrote in first(original) post
    I’m only interested in rows that have a CURR_IND = ‘X’ and related BUS_KEY rows (see REC_SEQ_NBR 2 & 3 in table above). Essentially out of a 1.6 billion row table I’ll only ever update < 2% of these rows. It seems like such a waste of processing power to read the entire table.
    and wrote in last post
    In essence the driving force is rows that have a CURR_IND = 'X' but I also need to reset the previous related BUS_KEY(s)
    from ('1' or 'X') to '0'.
    I thought that both are completely different.
    "only interested in rows that have a CURR_IND = ‘X’"
    vs
    "also need to reset the previous related BUS_KEY(s)
    from ('1' or 'X') to '0'"


    3.
    but I don't see how your first example will update rows other than CURR_IND = 'X'.
    Anyhow, PLEASE TRY the examples(Example 1 and Example 2) with small test data,
    if you couldn't understand them fully.
    And see the result wheather it worked without error message, wheather the gotton result meet your required result or not.

    If the result was not your expected result.
    Please write
    3-1) your test data
    3-2) executed statements
    Because you might changed table name or column name(for rxample bus_key).
    3-3) If you got error message, copy and paste the error message.
    3-4) expected result
    3-5) actually gotton result
    Last edited by tonkuma; 12-10-11 at 22:29. Reason: Change 1-1)

  6. #6
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    If you supplied test data and expected final result(I don't need intemediate results),
    it would help greatly to understand your requirements
    and test my examples by myself before show in this forum.

    The test data should include at least...
    (bus_key group with one, two, three or more rows)
    *
    (bus_key group having only curr_ind = 'X', only curr_ind = '0', only curr_ind = '1', two combination of current_ind values, all possible values of current_ind)

    = 3group(one row in a group) * 1 + 6group(two rows in a group) * 2 + 7group(three rows in a group) * 3 = 36 rows.
    If some combinations are not present in the table, write explicitly the fact.
    Last edited by tonkuma; 12-10-11 at 22:27. Reason: Change calculation of number of rows in a test data.

  7. #7
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Here, I prepared test data(assumed just before applying required update).

    Would you please copy and modify the result of the select statement to meet required final result.
    (I'm not interested in intermediate result.)

    If some combination(or sequence) of curr_ind in a bus_key group should not exists in your table,
    please comment out the rows corresponding to the bus_key.
    Code:
    ------------------------------ Commands Entered ------------------------------
    CREATE TABLE chippib.table1
    ( rec_seq_nbr INTEGER  NOT NULL PRIMARY KEY
    , bus_key     CHAR( 8) NOT NULL
    , curr_ind    CHAR( 1) NOT NULL
                  CHECK( curr_ind IN ('0' , '1' , 'X') )
    );
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Code:
    ------------------------------ Commands Entered ------------------------------
    SELECT * FROM chippib.table1;
    ------------------------------------------------------------------------------
    
    REC_SEQ_NBR BUS_KEY  CURR_IND
    ----------- -------- --------
              1 A        0       
              2 B        1       
              3 C        X       
              4 D        0       
              5 D        0       
              6 E        1       
              7 E        1       
              8 F        X       
              9 F        X       
             10 G        1       
             11 G        X       
             12 H        0       
             13 H        X       
             14 I        0       
             15 I        1       
             16 J        0       
             17 J        0       
             18 J        0       
             19 K        1       
             20 K        1       
             21 K        1       
             22 L        X       
             23 L        X       
             24 L        X       
             25 M        1       
             26 M        1       
             27 M        X       
             28 N        0       
             29 N        0       
             30 N        X       
             31 O        0       
             32 O        0       
             33 O        1       
             34 P        0       
             35 P        1       
             36 P        X       
             37 Q        X       
             38 Q        1       
             39 Q        1       
             40 R        X       
             41 R        0       
             42 R        0       
             43 S        1       
             44 S        0       
             45 S        0       
             46 T        1       
             47 T        X       
             48 T        1       
             49 U        0       
             50 U        X       
             51 U        0       
             52 V        0       
             53 V        1       
             54 V        0       
             55 W        1       
             56 W        X       
             57 W        X       
             58 X        0       
             59 X        X       
             60 X        X       
             61 Y        X       
             62 Y        X       
             63 Y        1       
             64 Z        X       
             65 Z        X       
             66 Z        0       
    
      66 record(s) selected.
    Last edited by tonkuma; 12-11-11 at 03:00. Reason: Add test data(rec_seq_nbr = 55 to 66, bus_key = 'W' to 'Z')

  8. #8
    Join Date
    Nov 2011
    Posts
    334

    hi

    Plz post the exactly access plan for your 1st update here。
    I guess there may be two point in the SQL we must concern
    first , (select rec_seq_nbr from t1 where curr_ind = 'X') as a ,
    if there is no index key on curr_ind , DB2 must use entire table scan to build table a。SO you can create a index on column ( curr_ind ) but ,if the the cluster ratio of this index is low enough, db2 will still use table scan in stead of using this index ), or you can build a temorary tables to hold the inserted rows manually (it will be better ) . fisrt you insert your rows into both temorary table and table t1 then update t1 use tempary table ........

    second (I dont know which method DB2 chose to join table a and b while you
    did not put access plan here。) If it is NL join,maybe a index on column ( bus_key, rec_seq_nbr) will give some help for your update statement .....

    also your can run db2advis on your update sql to see whether there are some helpful indexes......

  9. #9
    Join Date
    Nov 2011
    Posts
    31

    Help in Optimizing long running query

    Thank you for your interest. The cardinality on CURR_IND is as follows:

    0 (not current)- 300,000
    1 (current) - 1.8 billion
    X (rows to be processed along with related business keys) - 3 million



    Note, I've updated curr_ind on some of your test data.
    #1 It is not possible to have rows that have a identical business key
    where all similar business key rows are considered current '1'. It is however possible to have
    non unique values of '0' and 'X' values (CURR_IND) for the same business key.
    #2 Rows where CURR_IND = 'X' will always have a REC_SEQ_NBR greater than similar rows having same business key.

    Test data before script executes

    SELECT * FROM chippib.table1;
    -------------------------------

    REC_SEQ_NBR BUS_KEY CURR_IND
    ----------- -------- --------
    1 A 0
    2 B 1
    3 C X
    4 D 0
    5 D 0
    6 E 0
    7 E 1
    8 F X
    9 F X
    10 G 1
    11 G X
    12 H 0
    13 H X
    14 I 0
    15 I 1
    16 J 0
    17 J 0
    18 J 0
    19 K 0
    20 K 1
    21 K X
    22 L X
    23 L X
    24 L X
    25 M 0
    26 M 1
    27 M X
    28 N 0
    29 N 0
    30 N X
    31 O 0
    32 O 0
    33 O 1
    34 P 0
    35 P 1
    36 P X
    37 Q 0
    38 Q 0
    39 Q 1
    40 R 0
    41 R 0
    42 R X
    43 S 0
    44 S 0
    45 S 0
    46 T 1
    47 T X
    48 T X
    49 U 0
    50 U 0
    51 U 0
    52 V 0
    53 V 0
    54 V 0
    55 W 1
    56 W X
    57 W X
    58 X 0
    59 X X
    60 X X
    61 Y 1
    62 Y X
    63 Y X
    64 Z X
    65 Z X
    66 Z X

    66 record(s) selected.

    ----------------------------------------------------------------------------------
    Test data AFTER script executes

    SELECT * FROM chippib.table1;
    -------------------------------

    REC_SEQ_NBR BUS_KEY CURR_IND
    ----------- -------- --------
    1 A 0
    2 B 1
    3 C 1
    4 D 0
    5 D 0
    6 E 0
    7 E 1
    8 F 0
    9 F 1
    10 G 0
    11 G 1
    12 H 0
    13 H 1
    14 I 0
    15 I 1
    16 J 0
    17 J 0
    18 J 0
    19 K 0
    20 K 0
    21 K 1
    22 L 0
    23 L 0
    24 L 1
    25 M 0
    26 M 0
    27 M 1
    28 N 0
    29 N 0
    30 N 1
    31 O 0
    32 O 0
    33 O 1
    34 P 0
    35 P 0
    36 P 1
    37 Q 0
    38 Q 0
    39 Q 1
    40 R 0
    41 R 0
    42 R 0
    43 S 0
    44 S 0
    45 S 0
    46 T 0
    47 T 0
    48 T 1
    49 U 0
    50 U 0
    51 U 0
    52 V 0
    53 V 0
    54 V 0
    55 W 0
    56 W 0
    57 W 1
    58 X 0
    59 X 0
    60 X 1
    61 Y 0
    62 Y 0
    63 Y 1
    64 Z 0
    65 Z 0
    66 Z 1

    66 record(s) selected.

  10. #10
    Join Date
    Nov 2011
    Posts
    31

    Help in optimizing long running query

    Results of db2advis

    --
    --
    -- LIST OF RECOMMENDED INDEXES
    -- ===========================
    -- index[1], 2563.122MB
    CREATE INDEX "CHIPPIB "."IDX1112121650040" ON "CHIPPIB"."T1"
    ("CURR_IND" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
    COMMIT WORK ;
    -- index[2], 6409.372MB
    CREATE UNIQUE INDEX "CHIPPIB "."IDX1112121651510"
    ON "CHIPPIB"."T1" ("REC_SEQ_NBR" ASC) INCLUDE
    ("CURR_IND") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
    COMMIT WORK ;


    --
    --
    -- RECOMMENDED EXISTING INDEXES
    -- ============================
    -- RUNSTATS ON TABLE "CHIPPIB"."T1" FOR SAMPLED DETAILED INDEX "CHIPPIB"."XT101P" ;
    -- COMMIT WORK ;
    -- RUNSTATS ON TABLE "CHIPPIB"."T1" FOR SAMPLED DETAILED INDEX "CHIPPIB"."XT102" ;
    -- COMMIT WORK ;

  11. #11
    Join Date
    Nov 2011
    Posts
    31

    Help in Optimizing long running query

    I've implemented recommended indexes (db2advis) with no discernible reduction in overall cost. Regarding the Explain plan, I don't see any elegant way to get the Explain results pasted into this thread (using both Control Center and Data Studio). Please advise.

  12. #12
    Join Date
    Nov 2011
    Posts
    334
    You can use clp command to get the plan,,,,
    db2 connect to <DBNAME>
    db2 set current explain mode explain
    run your sql
    db2 set current explain mode no
    db2exfmt -d <dbname> -1 > exfmt.out

    and post the exfmt.out as attachements.....

  13. #13
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Your AFTER data has a contradiction.

    Before: N and R are same pattern
    sequence of curr_ind : 0 0 X

    After : N and R are different
    sequence of curr_ind of N: 0 0 1
    sequence of curr_ind of R: 0 0 0


    Test data before script executes

    SELECT * FROM chippib.table1;
    -------------------------------

    REC_SEQ_NBR BUS_KEY CURR_IND
    ----------- -------- --------
    28 N 0
    29 N 0
    30 N X

    40 R 0
    41 R 0
    42 R X

    Test data AFTER script executes

    SELECT * FROM chippib.table1;
    -------------------------------

    REC_SEQ_NBR BUS_KEY CURR_IND
    ----------- -------- --------
    28 N 0
    29 N 0
    30 N 1

    40 R 0
    41 R 0
    42 R 0
    I wrote
    Quote Originally Posted by tonkuma View Post
    If some combination(or sequence) of curr_ind in a bus_key group should not exists in your table,
    please comment out the rows corresponding to the bus_key.
    I expected like...

    Test data before script executes

    SELECT * FROM chippib.table1;
    -------------------------------

    REC_SEQ_NBR BUS_KEY CURR_IND
    ----------- -------- --------
    1 A 0
    2 B 1
    3 C X
    4 D 0
    5 D 0
    /*
    6 E 1
    7 E 1
    */
    8 F X
    9 F X
    10 G 1
    11 G X
    12 H 0
    13 H X
    14 I 0
    15 I 1
    16 J 0
    17 J 0
    18 J 0
    /*
    19 K 1
    20 K 1
    21 K 1
    */
    22 L X
    23 L X
    24 L X
    /*
    25 M 1
    26 M 1
    27 M X
    */
    28 N 0
    29 N 0
    30 N X
    31 O 0
    32 O 0
    33 O 1
    34 P 0
    35 P 1
    36 P X
    /*
    37 Q X
    38 Q 1
    39 Q 1
    */
    /*
    40 R X
    41 R 0
    42 R 0
    */
    /*
    43 S 1
    44 S 0
    45 S 0
    */
    /*
    46 T 1
    47 T X
    48 T 1
    */
    /*
    49 U 0
    50 U X
    51 U 0
    */
    /*
    52 V 0
    53 V 1
    54 V 0
    */
    55 W 1
    56 W X
    57 W X
    58 X 0
    59 X X
    60 X X
    /*
    61 Y X
    62 Y X
    63 Y 1
    */
    /*
    64 Z X
    65 Z X
    66 Z 0
    */



    Anyhow, if ignored bus_key = 'R', your requirements may be
    For all groups of bus_key which include a row with curr_ind = 'X',
    Set curr_ind of last row(based on sequence of REC_SEQ_NBR) in a bus_key group to '1'
    and
    set curr_ind of other rows to '0'
    End

    Is it right?
    Last edited by tonkuma; 12-13-11 at 06:47. Reason: Comment out rows with bus_key = 'E'

  14. #14
    Join Date
    Nov 2011
    Posts
    31

    Help in optimizing long running query

    My apologies.

    Output for row 42 (rec_seq_nbr) after script executes should be 42 R 1. Output for rows 28,29 & 30 after execution is correct.

    As for your assumption:

    Anyhow, if ignored bus_key = 'R', your requirements may be For all groups of bus_key which include a row with curr_ind = 'X', Set curr_ind of last row(based on sequence of REC_SEQ_NBR) in a bus_key group to '1'
    and
    set curr_ind of other rows to '0'
    End

    Is it right?
    ***************
    -------------------
    Yes, you have it.

    thanks for your interest

  15. #15
    Join Date
    Nov 2011
    Posts
    31

    Help in Optimizing long running query

    Logically speaking I want to do the following more efficiently:

    1. First pass I want to select all rows where the CURR_IND = 'X'. I've already built an index on this column. This column is a good index candidate as less than 3% of the 2 billion rows have this value. So at the most, the result set should be around 6 million rows.

    2. The second pass thru, using the business keys associated with the rows extracted in step 1 above join on the same table. The result set should still be relatively small (let's say 10 million tops). They're already is an index out there that will satisfy the business key.

    3. Lastly using the result set from step 2, set CURR_IND to '1' on the max REC_SEQ_NBR of all related business keys and the remaining rows set to '0'.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •