If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Help in Optimizing long running query

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 12-10-11, 11:26
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
Help in Optimizing long running query

Table 1 has 3 columns (REC_SEQ_NBR, BUS_KEY, CURR_IND)

Last 2 columns of table below represents value of CURR_IND after pass #1 and pass #2

1 A 1 1 1
2 B 1 0 0
3 C 1 0 0
4 D 1 1 1
--- --- -- --- --
1600000000 G 1 1 1
1600000001 C X 0 0
1600000002 C X X 1
1600000003 B X X 1
1600000004 E X X 1
1600000005 F X 0 0
1600000006 F X X 1



Background

1.Table contains approximately 2 billion rows

2. Process 1 inserts approximately 3 million rows with a value of ‘X’ for column CURR_IND. In table above this would be represented by PK REC_SEQ_NBR rows 1600000001 thru 1600000006. This executes fast and isn’t a bottleneck.

3. Table is partitioned on a date column (1 partition/month). Clustering key is this same column

4. Non PK index is on BUS_KEY column only.

5. CURR_IND is not indexed. It has a possibility of 3 values (‘0’, ‘1’, ‘X’). After process 1 completes, the cardinality is as follows (‘0’ 300,000 rows, ‘1’ 1.6 billion rows, ‘X’ 3.5 million rows).

6. It is the rows with a CURR_IND = ‘X’ that I’m primarily concerned with

7. While the business key consists of 8 columns, for simplicity I have only included one (i.e. BUS_KEY).

8. Note from above table we have duplicate rows having the same business key (BUS_KEY column).

9. 1st pass SQL that reads every row and takes 6+ hours to complete.

update t1 set curr_ind = '0'
where rec_seq_nbr in
(select b.rec_seq_nbr
from (select rec_seq_nbr from t1 where curr_ind = 'X') as a, t1 as b
where a.bus_key = b.bus_key
and b.rec_seq_nbr < a.rec_seq_nbr)

10. 2nd pass (which is very fast) simply scans the entire table and changes all rows that have a CURR_IND = ‘X’ to ‘1’


Challenge
1. How can I avoid reading every row as part of the UPDATE statement? I’m only interested in rows that have a CURR_IND = ‘X’ and related BUS_KEY rows (see REC_SEQ_NBR 2 & 3 in table above). Essentially out of a 1.6 billion row table I’ll only ever update < 2% of these rows. It seems like such a waste of processing power to read the entire table.

2. Also, how could I accomplish pass #1 and pass #2 in a single pass?
Reply With Quote
  #2 (permalink)  
Old 12-10-11, 13:28
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
The question seems similar to the thread
Help with long running UPDATE statement

If so, PLEASE try the idea in the thread, like...
(Combined some of my posts and modified.)
1) Create an unique index (CURR_IND , BUS_KEY , REC_SEQ_NBR DESC)

2) Update statement to accomplish pass #1 and pass #2.
Example 1:
Code:
UPDATE t1 t
   SET curr_ind
     = COALESCE(
          (SELECT MAX( '0' )
            FROM  t1 s
            WHERE s.bus_key     = t.bus_key
              AND s.rec_seq_nbr > t.rec_seq_nbr
              AND s.curr_ind    = 'X'
          )
        , '1'
       )
 WHERE curr_ind    = 'X'
;

Last edited by tonkuma; 12-10-11 at 14:33. Reason: Add unique for a index.
Reply With Quote
  #3 (permalink)  
Old 12-10-11, 14:42
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
The Example 2 was showed better performance than Example 1 on my simplified test with 100000 row.

Example 2: unique index (CURR_IND , BUS_KEY , REC_SEQ_NBR DESC) was used.
Code:
UPDATE
(SELECT *
  FROM  (SELECT curr_ind
              , ROW_NUMBER()
                   OVER(PARTITION BY bus_key
                            ORDER BY rec_seq_nbr
                       ) AS rnum
          FROM  t1 t
          WHERE curr_ind = 'X'
        )
)
SET curr_ind
    = CASE rnum
      WHEN 1 THEN '0'
      ELSE        '1'
      END
;
Reply With Quote
  #4 (permalink)  
Old 12-10-11, 16:02
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
help in optimizing long running query

Thanks for your help!! Sorry for the confusion I've caused but I don't see how your first example will update rows other than CURR_IND = 'X'.
Please look at output column which represents value of CURR_IND after 1st pass.

Note in the sample table output that I may also have to update related rows that have the same business key.

For example, REC_SEQ_NBR 1600000003 has a BUS_KEY of 'B' and a CURR_IND of 'X'. Also note that REC_SEQ_NBR 2 also has the same BUS_KEY of 'B'
where it's CURR_IND is set to '1',meaning most current. So when processing REC_SEQ_NBR 1600000003 I also need to reset the CURR_IND of
REC_SEQ_NBR 2 from '1' to '0'. This implies not current.

My aplogies for the confusion. In essence the driving force is rows that have a CURR_IND = 'X' but I also need to reset the previous related BUS_KEY(s)
from ('1' or 'X') to '0'.
Reply With Quote
  #5 (permalink)  
Old 12-10-11, 20:53
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
1.
Quote:
Table 1 has 3 columns (REC_SEQ_NBR, BUS_KEY, CURR_IND)

Last 2 columns of table below represents value of CURR_IND after pass #1 and pass #2

1 A 1 1 1
2 B 1 0 0
3 C 1 0 0
4 D 1 1 1
--- --- -- --- --
1600000000 G 1 1 1
1600000001 C X 0 0
1600000002 C X X 1
1600000003 B X X 1
1600000004 E X X 1
1600000005 F X 0 0
1600000006 F X X 1
1-1) Your example has contradictions.
"has 3 columns" vs "2 B 1 0 0" vs "1600000000 G 1 1 1"

1-2) I couldn't find the reason of marked bold.


2.
You wrote in first(original) post
Quote:
I’m only interested in rows that have a CURR_IND = ‘X’ and related BUS_KEY rows (see REC_SEQ_NBR 2 & 3 in table above). Essentially out of a 1.6 billion row table I’ll only ever update < 2% of these rows. It seems like such a waste of processing power to read the entire table.
and wrote in last post
Quote:
In essence the driving force is rows that have a CURR_IND = 'X' but I also need to reset the previous related BUS_KEY(s)
from ('1' or 'X') to '0'.
I thought that both are completely different.
"only interested in rows that have a CURR_IND = ‘X’"
vs
"also need to reset the previous related BUS_KEY(s)
from ('1' or 'X') to '0'"


3.
Quote:
but I don't see how your first example will update rows other than CURR_IND = 'X'.
Anyhow, PLEASE TRY the examples(Example 1 and Example 2) with small test data,
if you couldn't understand them fully.
And see the result wheather it worked without error message, wheather the gotton result meet your required result or not.

If the result was not your expected result.
Please write
3-1) your test data
3-2) executed statements
Because you might changed table name or column name(for rxample bus_key).
3-3) If you got error message, copy and paste the error message.
3-4) expected result
3-5) actually gotton result

Last edited by tonkuma; 12-10-11 at 21:29. Reason: Change 1-1)
Reply With Quote
  #6 (permalink)  
Old 12-10-11, 21:13
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
If you supplied test data and expected final result(I don't need intemediate results),
it would help greatly to understand your requirements
and test my examples by myself before show in this forum.

The test data should include at least...
(bus_key group with one, two, three or more rows)
*
(bus_key group having only curr_ind = 'X', only curr_ind = '0', only curr_ind = '1', two combination of current_ind values, all possible values of current_ind)

= 3group(one row in a group) * 1 + 6group(two rows in a group) * 2 + 7group(three rows in a group) * 3 = 36 rows.
If some combinations are not present in the table, write explicitly the fact.

Last edited by tonkuma; 12-10-11 at 21:27. Reason: Change calculation of number of rows in a test data.
Reply With Quote
  #7 (permalink)  
Old 12-11-11, 01:34
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Here, I prepared test data(assumed just before applying required update).

Would you please copy and modify the result of the select statement to meet required final result.
(I'm not interested in intermediate result.)

If some combination(or sequence) of curr_ind in a bus_key group should not exists in your table,
please comment out the rows corresponding to the bus_key.
Code:
------------------------------ Commands Entered ------------------------------
CREATE TABLE chippib.table1
( rec_seq_nbr INTEGER  NOT NULL PRIMARY KEY
, bus_key     CHAR( 8) NOT NULL
, curr_ind    CHAR( 1) NOT NULL
              CHECK( curr_ind IN ('0' , '1' , 'X') )
);
------------------------------------------------------------------------------
DB20000I  The SQL command completed successfully.
Code:
------------------------------ Commands Entered ------------------------------
SELECT * FROM chippib.table1;
------------------------------------------------------------------------------

REC_SEQ_NBR BUS_KEY  CURR_IND
----------- -------- --------
          1 A        0       
          2 B        1       
          3 C        X       
          4 D        0       
          5 D        0       
          6 E        1       
          7 E        1       
          8 F        X       
          9 F        X       
         10 G        1       
         11 G        X       
         12 H        0       
         13 H        X       
         14 I        0       
         15 I        1       
         16 J        0       
         17 J        0       
         18 J        0       
         19 K        1       
         20 K        1       
         21 K        1       
         22 L        X       
         23 L        X       
         24 L        X       
         25 M        1       
         26 M        1       
         27 M        X       
         28 N        0       
         29 N        0       
         30 N        X       
         31 O        0       
         32 O        0       
         33 O        1       
         34 P        0       
         35 P        1       
         36 P        X       
         37 Q        X       
         38 Q        1       
         39 Q        1       
         40 R        X       
         41 R        0       
         42 R        0       
         43 S        1       
         44 S        0       
         45 S        0       
         46 T        1       
         47 T        X       
         48 T        1       
         49 U        0       
         50 U        X       
         51 U        0       
         52 V        0       
         53 V        1       
         54 V        0       
         55 W        1       
         56 W        X       
         57 W        X       
         58 X        0       
         59 X        X       
         60 X        X       
         61 Y        X       
         62 Y        X       
         63 Y        1       
         64 Z        X       
         65 Z        X       
         66 Z        0       

  66 record(s) selected.

Last edited by tonkuma; 12-11-11 at 02:00. Reason: Add test data(rec_seq_nbr = 55 to 66, bus_key = 'W' to 'Z')
Reply With Quote
  #8 (permalink)  
Old 12-11-11, 22:09
fengsun2 fengsun2 is offline
Registered User
 
Join Date: Nov 2011
Posts: 124
hi

Plz post the exactly access plan for your 1st update here。
I guess there may be two point in the SQL we must concern
first , (select rec_seq_nbr from t1 where curr_ind = 'X') as a ,
if there is no index key on curr_ind , DB2 must use entire table scan to build table a。SO you can create a index on column ( curr_ind ) but ,if the the cluster ratio of this index is low enough, db2 will still use table scan in stead of using this index ), or you can build a temorary tables to hold the inserted rows manually (it will be better ) . fisrt you insert your rows into both temorary table and table t1 then update t1 use tempary table ........

second (I dont know which method DB2 chose to join table a and b while you
did not put access plan here。) If it is NL join,maybe a index on column ( bus_key, rec_seq_nbr) will give some help for your update statement .....

also your can run db2advis on your update sql to see whether there are some helpful indexes......
Reply With Quote
  #9 (permalink)  
Old 12-12-11, 10:32
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
Help in Optimizing long running query

Thank you for your interest. The cardinality on CURR_IND is as follows:

0 (not current)- 300,000
1 (current) - 1.8 billion
X (rows to be processed along with related business keys) - 3 million



Note, I've updated curr_ind on some of your test data.
#1 It is not possible to have rows that have a identical business key
where all similar business key rows are considered current '1'. It is however possible to have
non unique values of '0' and 'X' values (CURR_IND) for the same business key.
#2 Rows where CURR_IND = 'X' will always have a REC_SEQ_NBR greater than similar rows having same business key.

Test data before script executes

SELECT * FROM chippib.table1;
-------------------------------

REC_SEQ_NBR BUS_KEY CURR_IND
----------- -------- --------
1 A 0
2 B 1
3 C X
4 D 0
5 D 0
6 E 0
7 E 1
8 F X
9 F X
10 G 1
11 G X
12 H 0
13 H X
14 I 0
15 I 1
16 J 0
17 J 0
18 J 0
19 K 0
20 K 1
21 K X
22 L X
23 L X
24 L X
25 M 0
26 M 1
27 M X
28 N 0
29 N 0
30 N X
31 O 0
32 O 0
33 O 1
34 P 0
35 P 1
36 P X
37 Q 0
38 Q 0
39 Q 1
40 R 0
41 R 0
42 R X
43 S 0
44 S 0
45 S 0
46 T 1
47 T X
48 T X
49 U 0
50 U 0
51 U 0
52 V 0
53 V 0
54 V 0
55 W 1
56 W X
57 W X
58 X 0
59 X X
60 X X
61 Y 1
62 Y X
63 Y X
64 Z X
65 Z X
66 Z X

66 record(s) selected.

----------------------------------------------------------------------------------
Test data AFTER script executes

SELECT * FROM chippib.table1;
-------------------------------

REC_SEQ_NBR BUS_KEY CURR_IND
----------- -------- --------
1 A 0
2 B 1
3 C 1
4 D 0
5 D 0
6 E 0
7 E 1
8 F 0
9 F 1
10 G 0
11 G 1
12 H 0
13 H 1
14 I 0
15 I 1
16 J 0
17 J 0
18 J 0
19 K 0
20 K 0
21 K 1
22 L 0
23 L 0
24 L 1
25 M 0
26 M 0
27 M 1
28 N 0
29 N 0
30 N 1
31 O 0
32 O 0
33 O 1
34 P 0
35 P 0
36 P 1
37 Q 0
38 Q 0
39 Q 1
40 R 0
41 R 0
42 R 0
43 S 0
44 S 0
45 S 0
46 T 0
47 T 0
48 T 1
49 U 0
50 U 0
51 U 0
52 V 0
53 V 0
54 V 0
55 W 0
56 W 0
57 W 1
58 X 0
59 X 0
60 X 1
61 Y 0
62 Y 0
63 Y 1
64 Z 0
65 Z 0
66 Z 1

66 record(s) selected.
Reply With Quote
  #10 (permalink)  
Old 12-12-11, 13:10
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
Help in optimizing long running query

Results of db2advis

--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- index[1], 2563.122MB
CREATE INDEX "CHIPPIB "."IDX1112121650040" ON "CHIPPIB"."T1"
("CURR_IND" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
COMMIT WORK ;
-- index[2], 6409.372MB
CREATE UNIQUE INDEX "CHIPPIB "."IDX1112121651510"
ON "CHIPPIB"."T1" ("REC_SEQ_NBR" ASC) INCLUDE
("CURR_IND") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
COMMIT WORK ;


--
--
-- RECOMMENDED EXISTING INDEXES
-- ============================
-- RUNSTATS ON TABLE "CHIPPIB"."T1" FOR SAMPLED DETAILED INDEX "CHIPPIB"."XT101P" ;
-- COMMIT WORK ;
-- RUNSTATS ON TABLE "CHIPPIB"."T1" FOR SAMPLED DETAILED INDEX "CHIPPIB"."XT102" ;
-- COMMIT WORK ;
Reply With Quote
  #11 (permalink)  
Old 12-12-11, 15:13
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
Help in Optimizing long running query

I've implemented recommended indexes (db2advis) with no discernible reduction in overall cost. Regarding the Explain plan, I don't see any elegant way to get the Explain results pasted into this thread (using both Control Center and Data Studio). Please advise.
Reply With Quote
  #12 (permalink)  
Old 12-12-11, 20:15
fengsun2 fengsun2 is offline
Registered User
 
Join Date: Nov 2011
Posts: 124
You can use clp command to get the plan,,,,
db2 connect to <DBNAME>
db2 set current explain mode explain
run your sql
db2 set current explain mode no
db2exfmt -d <dbname> -1 > exfmt.out

and post the exfmt.out as attachements.....
Reply With Quote
  #13 (permalink)  
Old 12-13-11, 03:55
tonkuma tonkuma is offline
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Your AFTER data has a contradiction.

Before: N and R are same pattern
sequence of curr_ind : 0 0 X

After : N and R are different
sequence of curr_ind of N: 0 0 1
sequence of curr_ind of R: 0 0 0


Quote:
Test data before script executes

SELECT * FROM chippib.table1;
-------------------------------

REC_SEQ_NBR BUS_KEY CURR_IND
----------- -------- --------
28 N 0
29 N 0
30 N X

40 R 0
41 R 0
42 R X

Test data AFTER script executes

SELECT * FROM chippib.table1;
-------------------------------

REC_SEQ_NBR BUS_KEY CURR_IND
----------- -------- --------
28 N 0
29 N 0
30 N 1

40 R 0
41 R 0
42 R 0
I wrote
Quote:
Originally Posted by tonkuma View Post
If some combination(or sequence) of curr_ind in a bus_key group should not exists in your table,
please comment out the rows corresponding to the bus_key.
I expected like...

Test data before script executes

SELECT * FROM chippib.table1;
-------------------------------

REC_SEQ_NBR BUS_KEY CURR_IND
----------- -------- --------
1 A 0
2 B 1
3 C X
4 D 0
5 D 0
/*
6 E 1
7 E 1
*/
8 F X
9 F X
10 G 1
11 G X
12 H 0
13 H X
14 I 0
15 I 1
16 J 0
17 J 0
18 J 0
/*
19 K 1
20 K 1
21 K 1
*/
22 L X
23 L X
24 L X
/*
25 M 1
26 M 1
27 M X
*/
28 N 0
29 N 0
30 N X
31 O 0
32 O 0
33 O 1
34 P 0
35 P 1
36 P X
/*
37 Q X
38 Q 1
39 Q 1
*/
/*
40 R X
41 R 0
42 R 0
*/
/*
43 S 1
44 S 0
45 S 0
*/
/*
46 T 1
47 T X
48 T 1
*/
/*
49 U 0
50 U X
51 U 0
*/
/*
52 V 0
53 V 1
54 V 0
*/
55 W 1
56 W X
57 W X
58 X 0
59 X X
60 X X
/*
61 Y X
62 Y X
63 Y 1
*/
/*
64 Z X
65 Z X
66 Z 0
*/



Anyhow, if ignored bus_key = 'R', your requirements may be
For all groups of bus_key which include a row with curr_ind = 'X',
Set curr_ind of last row(based on sequence of REC_SEQ_NBR) in a bus_key group to '1'
and
set curr_ind of other rows to '0'
End

Is it right?

Last edited by tonkuma; 12-13-11 at 05:47. Reason: Comment out rows with bus_key = 'E'
Reply With Quote
  #14 (permalink)  
Old 12-13-11, 06:00
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
Help in optimizing long running query

My apologies.

Output for row 42 (rec_seq_nbr) after script executes should be 42 R 1. Output for rows 28,29 & 30 after execution is correct.

As for your assumption:

Anyhow, if ignored bus_key = 'R', your requirements may be For all groups of bus_key which include a row with curr_ind = 'X', Set curr_ind of last row(based on sequence of REC_SEQ_NBR) in a bus_key group to '1'
and
set curr_ind of other rows to '0'
End

Is it right?
***************
-------------------
Yes, you have it.

thanks for your interest
Reply With Quote
  #15 (permalink)  
Old 12-13-11, 10:37
chippib chippib is offline
Registered User
 
Join Date: Nov 2011
Posts: 21
Help in Optimizing long running query

Logically speaking I want to do the following more efficiently:

1. First pass I want to select all rows where the CURR_IND = 'X'. I've already built an index on this column. This column is a good index candidate as less than 3% of the 2 billion rows have this value. So at the most, the result set should be around 6 million rows.

2. The second pass thru, using the business keys associated with the rows extracted in step 1 above join on the same table. The result set should still be relatively small (let's say 10 million tops). They're already is an index out there that will satisfy the business key.

3. Lastly using the result set from step 2, set CURR_IND to '1' on the max REC_SEQ_NBR of all related business keys and the remaining rows set to '0'.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On