| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

12-09-08, 16:13
|
|
Registered User
|
|
Join Date: Dec 2008
Location: Toronto, Canada
Posts: 4
|
|
|
Advice for running huge updates
|
|
Hello DB2 gang,
I'm no DB2 expert, but have been tasked with figuring out a way to merge 20 massive tables into one. There is one parent table, and about 20 'child tables' joined with a common pk. So, I've created a copy of the initial table, along with some shiny new columns to hold the data from the other tables.
The sql I've written works nearly instantly on 10k records, but takes days on 10 million. Poor performance on a massive dataset is not unexpected, but I was wondering if any experts could take a look at my update statement and hopefully suggest another, more efficient way to update en-masse.
The initial insert into my new table (table1) isn't the issue, it's updating the extra columns after the fact. Here's a sample of an update that takes up to 10 hours.
Code:
UPDATE TABLE1 T1
SET (A, B, C, D, E)
=
(SELECT A,B,C,D,E
FROM TABLE2 T2
WHERE T1.PK = T2.PK)
WHERE EXISTS (SELECT 1 FROM TABLE2 A
WHERE T1.PK = A.PK);
I've tried breaking each update into batches, and had some limited success (at least I stopped running out of log file space!) - but not enough such that you could call the improvement 'dramatic' which is what I'm praying for.
Anyways, pardon my noobishness here, I'm happy to even take any pointers to 'TFM' if someone can tell me where to 'R' it  and thanks in advance for the help!
oh right - system information:
Code:
DB21085I Instance "DB2" uses "64" bits and DB2 code release "SQL08025" with
level identifier "03060106".
Informational tokens are "DB2 v8.1.12.99", "s060429", "WR21370", and FixPak
"12".
Product Identifier = "DB2WSUE"
Jeff
|
Last edited by fmagnet; 12-09-08 at 16:18.
|

12-09-08, 16:27
|
|
Registered User
|
|
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
|
|
Let's see if I've got this right. They want you to denormalize all these tables into one, so that they can be lazy with the SQL in the programs that read from these 20 or so tables? Then they want to update/insert to these child tables and have those changes reflected in this denormalized table? If that's the case, then I hope you have a big server with lots of CPU to burn. You might want to look at triggers on the child tables and make your updates to this denormalized table when they occur, that way you can update based on the key and save some time on the update process, but now you are adding time on to the transaction that initiates the change.
Dave
|
|

12-09-08, 16:39
|
|
Registered User
|
|
Join Date: Dec 2008
Location: Toronto, Canada
Posts: 4
|
|
|
|
More or less - The intent is to have a table for reporting that can be somewhat stale. Reporting requirements (as does tend to happen) have become so complex that it's no longer..um..sane to join across those 20 tables. So, since they want a report that basically IS all that data denormalized, that's what they're getting - haha!
Anyways, the belief is that once the decade's worth of data is populated (erm, denormalized), that doing an overnight synch on 'yesterday's data only' (or whatever the time window ends up being) won't be as time/resource intensive. The idea of keeping it in synch via triggers is also being considered.
It's the initial population of that table that's giving so much grief.
|
|

12-09-08, 19:34
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
The way you've stated the problem, a LOAD FROM ... CURSOR should do it quite nicely.
|
|

12-10-08, 06:03
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
Maybe that's a stupid question, but have you considered MQTs? They take care of doing the synchnorization/update stuff.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

12-10-08, 11:18
|
|
Registered User
|
|
Join Date: Dec 2008
Location: Toronto, Canada
Posts: 4
|
|
Nick - Thanks for the suggestion. LOAD FROM CURSOR seems to do inserts only, correct? By this I mean, I can't write an update that would work with this.
It's not a huge deal to rewrite the queries so that they're a series of inserts from a result set, it's just a different approach. I just thought that building the entire record would be more 'resource intensive' than running a series of small updates. Anyways, I'll give it a try and see how it goes now that I've got my head around the syntax.
I used the information at the link below as my resource. If you know of a better one, I'd love to see it.
http://www.informatik.uni-jena.de/db...ert7015-a4.pdf
Knut - Absolutely not a stupid question. I didn't tell everyone the secret ending to this story, since I thought it would direct people away from the issue at hand. Once the synchronization is in place, and some application code rewritten, the 20 source tables will be dropped and we will be using only the giant denormalized table. So since you can't (and obviously wouldn't want to) write to an MQT, I can't use it. Otherwise, an awesome suggestion and something very cool that I didn't know about. I apologize again for implying that it was 'only' a reporting table. Sometimes you don't know just how much information to give in a post.
I read up on MQTs here:
http://www.ibmdatabasemag.com/shared...cleID=12803220
Thank you both for your help.
|
|

12-10-08, 11:31
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
Quote:
|
Originally Posted by fmagnet
I just thought that building the entire record would be more 'resource intensive' than running a series of small updates.
|
I doubt that, but the proof will be in the pudding, as always.
|
|

12-10-08, 14:08
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
Normalization is usually a very good thing because you avoid quite a lot of problems (search for "insert/update/delete anomalies"). I would think that you really do not want to denormalize things. Instead, I would keep the tables normalized and then work with MQTs and the like to get the performance to the level that you need.
Or is your reporting table in a completely different system? If so, you have pretty much what is called a Data Warehouse. In that case, we may want to be talking about some details and potential offerings. ;-)
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

12-10-08, 15:00
|
|
Registered User
|
|
Join Date: Dec 2008
Posts: 76
|
|
For an initial load:
write a select query that exports the data from all tables in the structure you want it to a file, then use load to populate the table. It should be much more efficient than trying to consolidate it on line with updates.
|
|

12-11-08, 15:45
|
|
Registered User
|
|
Join Date: Dec 2008
Location: Toronto, Canada
Posts: 4
|
|
To finalize the part about my question(s) - Rewriting the queries to select the full row and then using load to insert it via file (or cursor as I ended up using) was orders of magnitude faster. Thanks to all for setting me straight!
Knut: I tried to sell the higher-ups on investing in a data warehouse before we went down this path. I know this solution is not ideal, but sometimes we don't get to make the choices.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|