I have a poor performance SQL as following. The table has about 200M records, each employee have average 100 records. The query lasts about 3 hours. All I want is to update the flag for highest version of each client's record. Any suggestion is welcome!
update empTbl A
( select max(rec_ver)
where empNo = A.empNo)
With the correlated subquery, the Max is processed for every row in the table.
i.e read row 1 and get the Max value for that row's EMPNO.
read row 2 and get Max value for that row's EMPNO
repeat until end of table.
NOTE: I believe DB2 can shortcut this a little if the EMPNO value doesn't change between rows and can use the already generated max value.
With the non-correlated subquery, the max is processed once.
Generally, I have found the correlated sub-query work better in most cases since it usually can avoid a sort. A non-correlated sub-query with an IN clause will do a sort to remove duplicates (if there isn't a unique index that can be used to avoid this). If one method doesn't work well, I try the other to see if it would work better.
PS Depending on the things you 'simplified' out, I am very surprised to hear it takes an hour to process a col, MAX() where both columns are the primary key. However, you do have approximately 1 million distinct EMPNO values (200 million rows divided by 100 rows per EMPNO).
take a look at the following case:
if you update two columns in a single update and col1 sits far away from col2, meaning in the table creation script, between col1 and col2 there are many columns. in such case, the update will generate much more log than that if col1 sits just besides column 2.