Results 1 to 15 of 24

121011, 12:26 #1Registered User
 Join Date
 Nov 2011
 Posts
 31
Unanswered: Help in Optimizing long running query
Table 1 has 3 columns (REC_SEQ_NBR, BUS_KEY, CURR_IND)
Last 2 columns of table below represents value of CURR_IND after pass #1 and pass #2
1 A 1 1 1
2 B 1 0 0
3 C 1 0 0
4 D 1 1 1
    
1600000000 G 1 1 1
1600000001 C X 0 0
1600000002 C X X 1
1600000003 B X X 1
1600000004 E X X 1
1600000005 F X 0 0
1600000006 F X X 1
Background
1.Table contains approximately 2 billion rows
2. Process 1 inserts approximately 3 million rows with a value of X for column CURR_IND. In table above this would be represented by PK REC_SEQ_NBR rows 1600000001 thru 1600000006. This executes fast and isnt a bottleneck.
3. Table is partitioned on a date column (1 partition/month). Clustering key is this same column
4. Non PK index is on BUS_KEY column only.
5. CURR_IND is not indexed. It has a possibility of 3 values (0, 1, X). After process 1 completes, the cardinality is as follows (0 300,000 rows, 1 1.6 billion rows, X 3.5 million rows).
6. It is the rows with a CURR_IND = X that Im primarily concerned with
7. While the business key consists of 8 columns, for simplicity I have only included one (i.e. BUS_KEY).
8. Note from above table we have duplicate rows having the same business key (BUS_KEY column).
9. 1st pass SQL that reads every row and takes 6+ hours to complete.
update t1 set curr_ind = '0'
where rec_seq_nbr in
(select b.rec_seq_nbr
from (select rec_seq_nbr from t1 where curr_ind = 'X') as a, t1 as b
where a.bus_key = b.bus_key
and b.rec_seq_nbr < a.rec_seq_nbr)
10. 2nd pass (which is very fast) simply scans the entire table and changes all rows that have a CURR_IND = X to 1
Challenge
1. How can I avoid reading every row as part of the UPDATE statement? Im only interested in rows that have a CURR_IND = X and related BUS_KEY rows (see REC_SEQ_NBR 2 & 3 in table above). Essentially out of a 1.6 billion row table Ill only ever update < 2% of these rows. It seems like such a waste of processing power to read the entire table.
2. Also, how could I accomplish pass #1 and pass #2 in a single pass?

121011, 14:28 #2Registered User
 Join Date
 Feb 2008
 Location
 Japan
 Posts
 3,483
The question seems similar to the thread
http://www.dbforums.com/db2/1672196...statement.html
If so, PLEASE try the idea in the thread, like...
(Combined some of my posts and modified.)
1) Create an unique index (CURR_IND , BUS_KEY , REC_SEQ_NBR DESC)
2) Update statement to accomplish pass #1 and pass #2.
Example 1:
Code:UPDATE t1 t SET curr_ind = COALESCE( (SELECT MAX( '0' ) FROM t1 s WHERE s.bus_key = t.bus_key AND s.rec_seq_nbr > t.rec_seq_nbr AND s.curr_ind = 'X' ) , '1' ) WHERE curr_ind = 'X' ;
Last edited by tonkuma; 121011 at 15:33. Reason: Add unique for a index.

121011, 15:42 #3Registered User
 Join Date
 Feb 2008
 Location
 Japan
 Posts
 3,483
The Example 2 was showed better performance than Example 1 on my simplified test with 100000 row.
Example 2: unique index (CURR_IND , BUS_KEY , REC_SEQ_NBR DESC) was used.
Code:UPDATE (SELECT * FROM (SELECT curr_ind , ROW_NUMBER() OVER(PARTITION BY bus_key ORDER BY rec_seq_nbr ) AS rnum FROM t1 t WHERE curr_ind = 'X' ) ) SET curr_ind = CASE rnum WHEN 1 THEN '0' ELSE '1' END ;

121011, 17:02 #4Registered User
 Join Date
 Nov 2011
 Posts
 31
help in optimizing long running query
Thanks for your help!! Sorry for the confusion I've caused but I don't see how your first example will update rows other than CURR_IND = 'X'.
Please look at output column which represents value of CURR_IND after 1st pass.
Note in the sample table output that I may also have to update related rows that have the same business key.
For example, REC_SEQ_NBR 1600000003 has a BUS_KEY of 'B' and a CURR_IND of 'X'. Also note that REC_SEQ_NBR 2 also has the same BUS_KEY of 'B'
where it's CURR_IND is set to '1',meaning most current. So when processing REC_SEQ_NBR 1600000003 I also need to reset the CURR_IND of
REC_SEQ_NBR 2 from '1' to '0'. This implies not current.
My aplogies for the confusion. In essence the driving force is rows that have a CURR_IND = 'X' but I also need to reset the previous related BUS_KEY(s)
from ('1' or 'X') to '0'.

121011, 21:53 #5Registered User
 Join Date
 Feb 2008
 Location
 Japan
 Posts
 3,483
1.
Table 1 has 3 columns (REC_SEQ_NBR, BUS_KEY, CURR_IND)
Last 2 columns of table below represents value of CURR_IND after pass #1 and pass #2
1 A 1 1 1
2 B 1 0 0
3 C 1 0 0
4 D 1 1 1
    
1600000000 G 1 1 1
1600000001 C X 0 0
1600000002 C X X 1
1600000003 B X X 1
1600000004 E X X 1
1600000005 F X 0 0
1600000006 F X X 1
"has 3 columns" vs "2 B 1 0 0" vs "1600000000 G 1 1 1"
12) I couldn't find the reason of marked bold.
2.
You wrote in first(original) post
I’m only interested in rows that have a CURR_IND = ‘X’ and related BUS_KEY rows (see REC_SEQ_NBR 2 & 3 in table above). Essentially out of a 1.6 billion row table I’ll only ever update < 2% of these rows. It seems like such a waste of processing power to read the entire table.
In essence the driving force is rows that have a CURR_IND = 'X' but I also need to reset the previous related BUS_KEY(s)
from ('1' or 'X') to '0'.
"only interested in rows that have a CURR_IND = ‘X’"
vs
"also need to reset the previous related BUS_KEY(s)
from ('1' or 'X') to '0'"
3.
but I don't see how your first example will update rows other than CURR_IND = 'X'.
if you couldn't understand them fully.
And see the result wheather it worked without error message, wheather the gotton result meet your required result or not.
If the result was not your expected result.
Please write
31) your test data
32) executed statements
Because you might changed table name or column name(for rxample bus_key).
33) If you got error message, copy and paste the error message.
34) expected result
35) actually gotton resultLast edited by tonkuma; 121011 at 22:29. Reason: Change 11)

121011, 22:13 #6Registered User
 Join Date
 Feb 2008
 Location
 Japan
 Posts
 3,483
If you supplied test data and expected final result(I don't need intemediate results),
it would help greatly to understand your requirements
and test my examples by myself before show in this forum.
The test data should include at least...
(bus_key group with one, two, three or more rows)
*
(bus_key group having only curr_ind = 'X', only curr_ind = '0', only curr_ind = '1', two combination of current_ind values, all possible values of current_ind)
= 3group(one row in a group) * 1 + 6group(two rows in a group) * 2 + 7group(three rows in a group) * 3 = 36 rows.
If some combinations are not present in the table, write explicitly the fact.Last edited by tonkuma; 121011 at 22:27. Reason: Change calculation of number of rows in a test data.

121111, 02:34 #7Registered User
 Join Date
 Feb 2008
 Location
 Japan
 Posts
 3,483
Here, I prepared test data(assumed just before applying required update).
Would you please copy and modify the result of the select statement to meet required final result.
(I'm not interested in intermediate result.)
If some combination(or sequence) of curr_ind in a bus_key group should not exists in your table,
please comment out the rows corresponding to the bus_key.
Code: Commands Entered  CREATE TABLE chippib.table1 ( rec_seq_nbr INTEGER NOT NULL PRIMARY KEY , bus_key CHAR( 8) NOT NULL , curr_ind CHAR( 1) NOT NULL CHECK( curr_ind IN ('0' , '1' , 'X') ) );  DB20000I The SQL command completed successfully.
Code: Commands Entered  SELECT * FROM chippib.table1;  REC_SEQ_NBR BUS_KEY CURR_IND    1 A 0 2 B 1 3 C X 4 D 0 5 D 0 6 E 1 7 E 1 8 F X 9 F X 10 G 1 11 G X 12 H 0 13 H X 14 I 0 15 I 1 16 J 0 17 J 0 18 J 0 19 K 1 20 K 1 21 K 1 22 L X 23 L X 24 L X 25 M 1 26 M 1 27 M X 28 N 0 29 N 0 30 N X 31 O 0 32 O 0 33 O 1 34 P 0 35 P 1 36 P X 37 Q X 38 Q 1 39 Q 1 40 R X 41 R 0 42 R 0 43 S 1 44 S 0 45 S 0 46 T 1 47 T X 48 T 1 49 U 0 50 U X 51 U 0 52 V 0 53 V 1 54 V 0 55 W 1 56 W X 57 W X 58 X 0 59 X X 60 X X 61 Y X 62 Y X 63 Y 1 64 Z X 65 Z X 66 Z 0 66 record(s) selected.
Last edited by tonkuma; 121111 at 03:00. Reason: Add test data(rec_seq_nbr = 55 to 66, bus_key = 'W' to 'Z')

121111, 23:09 #8Registered User
 Join Date
 Nov 2011
 Posts
 334
hi
Plz post the exactly access plan for your 1st update here。
I guess there may be two point in the SQL we must concern
first , (select rec_seq_nbr from t1 where curr_ind = 'X') as a ，
if there is no index key on curr_ind , DB2 must use entire table scan to build table a。SO you can create a index on column ( curr_ind ) but ,if the the cluster ratio of this index is low enough, db2 will still use table scan in stead of using this index ), or you can build a temorary tables to hold the inserted rows manually (it will be better ) . fisrt you insert your rows into both temorary table and table t1 then update t1 use tempary table ........
second (I dont know which method DB2 chose to join table a and b while you
did not put access plan here。） If it is NL join，maybe a index on column ( bus_key, rec_seq_nbr) will give some help for your update statement .....
also your can run db2advis on your update sql to see whether there are some helpful indexes......

121211, 11:32 #9Registered User
 Join Date
 Nov 2011
 Posts
 31
Help in Optimizing long running query
Thank you for your interest. The cardinality on CURR_IND is as follows:
0 (not current) 300,000
1 (current)  1.8 billion
X (rows to be processed along with related business keys)  3 million
Note, I've updated curr_ind on some of your test data.
#1 It is not possible to have rows that have a identical business key
where all similar business key rows are considered current '1'. It is however possible to have
non unique values of '0' and 'X' values (CURR_IND) for the same business key.
#2 Rows where CURR_IND = 'X' will always have a REC_SEQ_NBR greater than similar rows having same business key.
Test data before script executes
SELECT * FROM chippib.table1;

REC_SEQ_NBR BUS_KEY CURR_IND
  
1 A 0
2 B 1
3 C X
4 D 0
5 D 0
6 E 0
7 E 1
8 F X
9 F X
10 G 1
11 G X
12 H 0
13 H X
14 I 0
15 I 1
16 J 0
17 J 0
18 J 0
19 K 0
20 K 1
21 K X
22 L X
23 L X
24 L X
25 M 0
26 M 1
27 M X
28 N 0
29 N 0
30 N X
31 O 0
32 O 0
33 O 1
34 P 0
35 P 1
36 P X
37 Q 0
38 Q 0
39 Q 1
40 R 0
41 R 0
42 R X
43 S 0
44 S 0
45 S 0
46 T 1
47 T X
48 T X
49 U 0
50 U 0
51 U 0
52 V 0
53 V 0
54 V 0
55 W 1
56 W X
57 W X
58 X 0
59 X X
60 X X
61 Y 1
62 Y X
63 Y X
64 Z X
65 Z X
66 Z X
66 record(s) selected.

Test data AFTER script executes
SELECT * FROM chippib.table1;

REC_SEQ_NBR BUS_KEY CURR_IND
  
1 A 0
2 B 1
3 C 1
4 D 0
5 D 0
6 E 0
7 E 1
8 F 0
9 F 1
10 G 0
11 G 1
12 H 0
13 H 1
14 I 0
15 I 1
16 J 0
17 J 0
18 J 0
19 K 0
20 K 0
21 K 1
22 L 0
23 L 0
24 L 1
25 M 0
26 M 0
27 M 1
28 N 0
29 N 0
30 N 1
31 O 0
32 O 0
33 O 1
34 P 0
35 P 0
36 P 1
37 Q 0
38 Q 0
39 Q 1
40 R 0
41 R 0
42 R 0
43 S 0
44 S 0
45 S 0
46 T 0
47 T 0
48 T 1
49 U 0
50 U 0
51 U 0
52 V 0
53 V 0
54 V 0
55 W 0
56 W 0
57 W 1
58 X 0
59 X 0
60 X 1
61 Y 0
62 Y 0
63 Y 1
64 Z 0
65 Z 0
66 Z 1
66 record(s) selected.

121211, 14:10 #10Registered User
 Join Date
 Nov 2011
 Posts
 31
Help in optimizing long running query
Results of db2advis


 LIST OF RECOMMENDED INDEXES
 ===========================
 index[1], 2563.122MB
CREATE INDEX "CHIPPIB "."IDX1112121650040" ON "CHIPPIB"."T1"
("CURR_IND" ASC) ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
COMMIT WORK ;
 index[2], 6409.372MB
CREATE UNIQUE INDEX "CHIPPIB "."IDX1112121651510"
ON "CHIPPIB"."T1" ("REC_SEQ_NBR" ASC) INCLUDE
("CURR_IND") ALLOW REVERSE SCANS COLLECT SAMPLED DETAILED STATISTICS;
COMMIT WORK ;


 RECOMMENDED EXISTING INDEXES
 ============================
 RUNSTATS ON TABLE "CHIPPIB"."T1" FOR SAMPLED DETAILED INDEX "CHIPPIB"."XT101P" ;
 COMMIT WORK ;
 RUNSTATS ON TABLE "CHIPPIB"."T1" FOR SAMPLED DETAILED INDEX "CHIPPIB"."XT102" ;
 COMMIT WORK ;

121211, 16:13 #11Registered User
 Join Date
 Nov 2011
 Posts
 31
Help in Optimizing long running query
I've implemented recommended indexes (db2advis) with no discernible reduction in overall cost. Regarding the Explain plan, I don't see any elegant way to get the Explain results pasted into this thread (using both Control Center and Data Studio). Please advise.

121211, 21:15 #12Registered User
 Join Date
 Nov 2011
 Posts
 334
You can use clp command to get the plan,,,,
db2 connect to <DBNAME>
db2 set current explain mode explain
run your sql
db2 set current explain mode no
db2exfmt d <dbname> 1 > exfmt.out
and post the exfmt.out as attachements.....

121311, 04:55 #13Registered User
 Join Date
 Feb 2008
 Location
 Japan
 Posts
 3,483
Your AFTER data has a contradiction.
Before: N and R are same pattern
sequence of curr_ind : 0 0 X
After : N and R are different
sequence of curr_ind of N: 0 0 1
sequence of curr_ind of R: 0 0 0
Test data before script executes
SELECT * FROM chippib.table1;

REC_SEQ_NBR BUS_KEY CURR_IND
  
28 N 0
29 N 0
30 N X
40 R 0
41 R 0
42 R X
Test data AFTER script executes
SELECT * FROM chippib.table1;

REC_SEQ_NBR BUS_KEY CURR_IND
  
28 N 0
29 N 0
30 N 1
40 R 0
41 R 0
42 R 0
I expected like...
Test data before script executes
SELECT * FROM chippib.table1;

REC_SEQ_NBR BUS_KEY CURR_IND
  
1 A 0
2 B 1
3 C X
4 D 0
5 D 0
/*
6 E 1
7 E 1
*/
8 F X
9 F X
10 G 1
11 G X
12 H 0
13 H X
14 I 0
15 I 1
16 J 0
17 J 0
18 J 0
/*
19 K 1
20 K 1
21 K 1
*/
22 L X
23 L X
24 L X
/*
25 M 1
26 M 1
27 M X
*/
28 N 0
29 N 0
30 N X
31 O 0
32 O 0
33 O 1
34 P 0
35 P 1
36 P X
/*
37 Q X
38 Q 1
39 Q 1
*/
/*
40 R X
41 R 0
42 R 0
*/
/*
43 S 1
44 S 0
45 S 0
*/
/*
46 T 1
47 T X
48 T 1
*/
/*
49 U 0
50 U X
51 U 0
*/
/*
52 V 0
53 V 1
54 V 0
*/
55 W 1
56 W X
57 W X
58 X 0
59 X X
60 X X
/*
61 Y X
62 Y X
63 Y 1
*/
/*
64 Z X
65 Z X
66 Z 0
*/
Anyhow, if ignored bus_key = 'R', your requirements may be
For all groups of bus_key which include a row with curr_ind = 'X',
Set curr_ind of last row(based on sequence of REC_SEQ_NBR) in a bus_key group to '1'
and
set curr_ind of other rows to '0'
End
Is it right?Last edited by tonkuma; 121311 at 06:47. Reason: Comment out rows with bus_key = 'E'

121311, 07:00 #14Registered User
 Join Date
 Nov 2011
 Posts
 31
Help in optimizing long running query
My apologies.
Output for row 42 (rec_seq_nbr) after script executes should be 42 R 1. Output for rows 28,29 & 30 after execution is correct.
As for your assumption:
Anyhow, if ignored bus_key = 'R', your requirements may be For all groups of bus_key which include a row with curr_ind = 'X', Set curr_ind of last row(based on sequence of REC_SEQ_NBR) in a bus_key group to '1'
and
set curr_ind of other rows to '0'
End
Is it right?
***************

Yes, you have it.
thanks for your interest

121311, 11:37 #15Registered User
 Join Date
 Nov 2011
 Posts
 31
Help in Optimizing long running query
Logically speaking I want to do the following more efficiently:
1. First pass I want to select all rows where the CURR_IND = 'X'. I've already built an index on this column. This column is a good index candidate as less than 3% of the 2 billion rows have this value. So at the most, the result set should be around 6 million rows.
2. The second pass thru, using the business keys associated with the rows extracted in step 1 above join on the same table. The result set should still be relatively small (let's say 10 million tops). They're already is an index out there that will satisfy the business key.
3. Lastly using the result set from step 2, set CURR_IND to '1' on the max REC_SEQ_NBR of all related business keys and the remaining rows set to '0'.