  1. #1
    Join Date
    Dec 2008
    Location
    Toronto, Canada
    Posts
    4

    Unanswered: Advice for running huge updates

    Hello DB2 gang,

    I'm no DB2 expert, but have been tasked with figuring out a way to merge 20 massive tables into one. There is one parent table, and about 20 'child tables' joined with a common pk. So, I've created a copy of the initial table, along with some shiny new columns to hold the data from the other tables.

    The SQL I've written works nearly instantly on 10k records, but takes days on 10 million. Poor performance on a massive dataset is not unexpected, but I was wondering if any experts could take a look at my update statement and suggest another, more efficient way to update en masse.

    The initial insert into my new table (TABLE1) isn't the issue; it's updating the extra columns after the fact. Here's a sample of an update that takes up to 10 hours.
    Code:
    UPDATE TABLE1 T1
       SET (A, B, C, D, E) =
           (SELECT A, B, C, D, E
              FROM TABLE2 T2
             WHERE T1.PK = T2.PK)
     WHERE EXISTS (SELECT 1
                     FROM TABLE2 A
                    WHERE T1.PK = A.PK);
    I've tried breaking each update into batches, and had some limited success (at least I stopped running out of log file space!) - but not enough that you could call the improvement 'dramatic', which is what I'm praying for.
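
    For example, a single batch might look something like the following (the PK range bounds are purely hypothetical):
    Code:
    -- Hypothetical single batch: the same update restricted to a PK range, then a COMMIT
    UPDATE TABLE1 T1
       SET (A, B, C, D, E) =
           (SELECT A, B, C, D, E
              FROM TABLE2 T2
             WHERE T2.PK = T1.PK)
     WHERE T1.PK BETWEEN 1 AND 100000
       AND EXISTS (SELECT 1 FROM TABLE2 T2 WHERE T2.PK = T1.PK);
    COMMIT;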

    Anyways, pardon my noobishness here. I'm happy to take any pointers to 'TFM' if someone can tell me where to 'R' it, and thanks in advance for the help!

    oh right - system information:
    Code:
    DB21085I  Instance "DB2" uses "64" bits and DB2 code release "SQL08025" with
    level identifier "03060106".
    Informational tokens are "DB2 v8.1.12.99", "s060429", "WR21370", and FixPak
    "12".
    Product Identifier                      = "DB2WSUE"
    Jeff
    Last edited by fmagnet; 12-09-08 at 17:18.

  2. #2
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Let's see if I've got this right. They want you to denormalize all these tables into one, so that they can be lazy with the SQL in the programs that read from these 20 or so tables? Then they want to update/insert to these child tables and have those changes reflected in this denormalized table? If that's the case, then I hope you have a big server with lots of CPU to burn. You might want to look at triggers on the child tables and make your updates to this denormalized table when they occur; that way you can update based on the key and save some time on the update process, but now you are adding time to the transaction that initiates the change.
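
    As a rough sketch, a trigger on one of the child tables might look something like this (the trigger name and column list are placeholders taken from the update statement above; you'd need similar triggers for inserts and for each child table):
    Code:
    -- Hypothetical after-update trigger keeping the denormalized TABLE1 in step with TABLE2
    CREATE TRIGGER TABLE2_AU
    AFTER UPDATE ON TABLE2
    REFERENCING NEW AS N
    FOR EACH ROW MODE DB2SQL
      UPDATE TABLE1 T1
         SET A = N.A, B = N.B, C = N.C, D = N.D, E = N.E
       WHERE T1.PK = N.PK;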

    Dave

  3. #3
    Join Date
    Dec 2008
    Location
    Toronto, Canada
    Posts
    4
    More or less - the intent is to have a table for reporting that can be somewhat stale. Reporting requirements (as tends to happen) have become so complex that it's no longer... um... sane to join across those 20 tables. So, since they want a report that basically IS all that data denormalized, that's what they're getting - haha!

    Anyways, the belief is that once the decade's worth of data is populated (erm, denormalized), doing an overnight synch on 'yesterday's data only' (or whatever the time window ends up being) won't be as time/resource intensive. The idea of keeping it in synch via triggers is also being considered.
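
    As a sketch of what that nightly pass could look like, assuming the child tables carry some kind of last-modified timestamp (LAST_UPDATED is an assumed column, not something from the actual schema):
    Code:
    -- Hypothetical nightly sync: only touch rows whose child record changed in the last day
    UPDATE TABLE1 T1
       SET (A, B, C, D, E) =
           (SELECT A, B, C, D, E
              FROM TABLE2 T2
             WHERE T2.PK = T1.PK)
     WHERE EXISTS (SELECT 1
                     FROM TABLE2 T2
                    WHERE T2.PK = T1.PK
                      AND T2.LAST_UPDATED >= CURRENT TIMESTAMP - 1 DAY);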

    It's the initial population of that table that's giving so much grief.

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    The way you've stated the problem, a LOAD FROM ... CURSOR should do it quite nicely.
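
    A minimal sketch of that from the CLP, using PARENT as a made-up name for the original parent table and TABLE2 standing in for the child tables:
    Code:
    -- Hypothetical cursor-based load; PARENT and the column list are placeholders
    DECLARE C1 CURSOR FOR
      SELECT P.*, T2.A, T2.B, T2.C, T2.D, T2.E
        FROM PARENT P
        LEFT JOIN TABLE2 T2 ON T2.PK = P.PK;

    LOAD FROM C1 OF CURSOR INSERT INTO TABLE1;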
    ---
    "It does not work" is not a valid problem statement.

  5. #5
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Maybe that's a stupid question, but have you considered MQTs? They take care of the synchronization/update stuff for you.
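
    As an illustrative sketch only, a deferred-refresh MQT over the parent and one child table (PARENT is again a stand-in name; the real SELECT would join all 20-odd tables):
    Code:
    -- Hypothetical deferred-refresh MQT, populated on demand with REFRESH TABLE
    CREATE TABLE REPORT_MQT AS
      (SELECT P.PK, T2.A, T2.B, T2.C, T2.D, T2.E
         FROM PARENT P
         JOIN TABLE2 T2 ON T2.PK = P.PK)
      DATA INITIALLY DEFERRED REFRESH DEFERRED;

    REFRESH TABLE REPORT_MQT;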
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  6. #6
    Join Date
    Dec 2008
    Location
    Toronto, Canada
    Posts
    4
    Nick - Thanks for the suggestion. LOAD FROM CURSOR seems to do inserts only, correct? By this I mean, I can't write an update that would work with this.

    It's not a huge deal to rewrite the queries so that they're a series of inserts from a result set; it's just a different approach. I just thought that building the entire record would be more 'resource intensive' than running a series of small updates. Anyways, I'll give it a try and see how it goes now that I've got my head around the syntax.

    I used the information at the link below as my resource. If you know of a better one, I'd love to see it.

    http://www.informatik.uni-jena.de/db...ert7015-a4.pdf

    Knut - Absolutely not a stupid question. I didn't tell everyone the secret ending to this story, since I thought it would direct people away from the issue at hand. Once the synchronization is in place, and some application code rewritten, the 20 source tables will be dropped and we will be using only the giant denormalized table. So since you can't (and obviously wouldn't want to) write to an MQT, I can't use it. Otherwise, an awesome suggestion and something very cool that I didn't know about. I apologize again for implying that it was 'only' a reporting table. Sometimes you don't know just how much information to give in a post.

    I read up on MQTs here:

    http://www.ibmdatabasemag.com/shared...cleID=12803220

    Thank you both for your help.

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by fmagnet
    I just thought that building the entire record would be more 'resource intensive' than running a series of small updates.
    I doubt that, but the proof will be in the pudding, as always.
    ---
    "It does not work" is not a valid problem statement.

  8. #8
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Normalization is usually a very good thing because you avoid quite a lot of problems (search for "insert/update/delete anomalies"). I would think that you really do not want to denormalize things. Instead, I would keep the tables normalized and then work with MQTs and the like to get the performance to the level that you need.

    Or is your reporting table in a completely different system? If so, you have pretty much what is called a Data Warehouse. In that case, we may want to be talking about some details and potential offerings. ;-)
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  9. #9
    Join Date
    Dec 2008
    Posts
    76
    For an initial load:
    Write a SELECT query that exports the data from all the tables, in the structure you want, to a file, then use LOAD to populate the table. It should be much more efficient than trying to consolidate it online with updates.
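
    A rough sketch of that approach from the CLP (the file name and the PARENT table name are placeholders):
    Code:
    -- Hypothetical export-then-load; merged.del and PARENT are made-up names
    EXPORT TO merged.del OF DEL
      SELECT P.*, T2.A, T2.B, T2.C, T2.D, T2.E
        FROM PARENT P
        LEFT JOIN TABLE2 T2 ON T2.PK = P.PK;

    LOAD FROM merged.del OF DEL INSERT INTO TABLE1;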

  10. #10
    Join Date
    Dec 2008
    Location
    Toronto, Canada
    Posts
    4
    To finalize the part about my question(s) - rewriting the queries to select the full row and then using LOAD to insert it via file (or cursor, as I ended up using) was orders of magnitude faster. Thanks to all for setting me straight!

    Knut: I tried to sell the higher-ups on investing in a data warehouse before we went down this path. I know this solution is not ideal, but sometimes we don't get to make the choices.
