If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > SubQuery Performance

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-25-10, 20:44
mail4mz mail4mz is offline
Registered User
 
Join Date: Aug 2010
Posts: 17
SubQuery Performance

Hi All,

I have a poor performance SQL as following. The table has about 200M records, each employee have average 100 records. The query lasts about 3 hours. All I want is to update the flag for highest version of each client's record. Any suggestion is welcome!

Thanks,

Mike


====SQL===========
update empTbl A
set flag=1
where
rec_ver =
( select max(rec_ver)
from empTbl
where empNo = A.empNo)



===Table empTbl=====

empTbl
{
int empNo;
int flag;
char[256] empDesc;
int rec_ver;
}
Reply With Quote
  #2 (permalink)  
Old 08-25-10, 21:09
Stealth_DBA Stealth_DBA is offline
Registered User
 
Join Date: May 2009
Posts: 472
mail4mz, try this (it works in DB2 V9.7)
Code:
UPDATE EMPTBL
SET FLAG = 1
WHERE (EMPNO, REC_VER) IN(SELECT EMPNO, MAX(REC_VER)
                          FROM EMPTBL
                          GROUP BY EMPNO
                         )
If you have an index on EMPNO, REC_VER, it would also help.
Reply With Quote
  #3 (permalink)  
Old 08-25-10, 21:18
mail4mz mail4mz is offline
Registered User
 
Join Date: Aug 2010
Posts: 17
Actually, The subquery "group by" itself takes hour to complete.

SELECT EMPNO, MAX(REC_VER)
FROM EMPTBL
GROUP BY EMPNO

Why do you think the "group by" subquery will be faster? Is it because my previous subquery is correlated query?


By the way, I simplized the structure of the table empTbl. There are many more columns in the table. Both EMPNO and REC_VER are part of the primary key and index.

Thanks!
Reply With Quote
  #4 (permalink)  
Old 08-25-10, 22:01
Stealth_DBA Stealth_DBA is offline
Registered User
 
Join Date: May 2009
Posts: 472
With the correlated subquery, the Max is processed for every row in the table.
i.e read row 1 and get the Max value for that row's EMPNO.
read row 2 and get Max value for that row's EMPNO
repeat until end of table.
NOTE: I believe DB2 can shortcut this a little if the EMPNO value doesn't change between rows and can use the already generated max value.

With the non-correlated subquery, the max is processed once.

Generally, I have found the correlated sub-query work better in most cases since it usually can avoid a sort. A non-correlated sub-query with an IN clause will do a sort to remove duplicates (if there isn't a unique index that can be used to avoid this). If one method doesn't work well, I try the other to see if it would work better.

PS Depending on the things you 'simplified' out, I am very surprised to hear it takes an hour to process a col, MAX() where both columns are the primary key. However, you do have approximately 1 million distinct EMPNO values (200 million rows divided by 100 rows per EMPNO).
Reply With Quote
  #5 (permalink)  
Old 08-25-10, 23:33
mandar_57 mandar_57 is offline
Registered User
 
Join Date: Aug 2010
Posts: 3
Yes. Stealth_DBA is correct.

Also as said earlier, if columns mentioned in where clause are part of index then it would help much. Make sure, if possible, to use full index instead of partial index.
Reply With Quote
  #6 (permalink)  
Old 08-26-10, 08:50
wilsonfv wilsonfv is offline
Registered User
 
Join Date: Apr 2009
Posts: 42
take a look at the following case:
if you update two columns in a single update and col1 sits far away from col2, meaning in the table creation script, between col1 and col2 there are many columns. in such case, the update will generate much more log than that if col1 sits just besides column 2.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On