  1. #1
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37

    Unanswered: Sequential view of a large DB2 table?

    I need a way to get a sequential view of a large DB2 table. I have a batch job that runs forever because it uses a flat file to drive cursor selects against a large DB2 table. Each record in the flat file can select hundreds of rows from the DB2 table. There are thousands of records in the flat file and millions of records in the DB2 table, so the batch job can run longer than 24 hours... and it's a daily job.

    If I can get a sequential view of this huge DB2 table, I can apply simple sort type matching logic against the flat file and avoid the creation of thousands of cursors. I thought of creating a cursor that selects everything with an ORDER BY clause, but that didn't seem particularly economical.

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Create a table corresponding to the flat file.
    Load the flat file into the new table.
    Then, join the new table and the large DB2 table.
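
    As a sketch, the three steps above might look like this in DB2 SQL (table, column, and file names here are hypothetical placeholders for the real layout):
    Code:
    -- 1. Staging table matching the flat file layout
    CREATE TABLE flatfile_stage (
        key1 CHAR(10) NOT NULL,
        key2 INTEGER  NOT NULL
    );

    -- 2. Bulk-load the flat file, e.g. from the command line:
    --    db2 "LOAD FROM flatfile.del OF DEL INSERT INTO flatfile_stage"

    -- 3. One set-based join replaces thousands of per-record cursors
    SELECT big.*
    FROM   bigtable big
    JOIN   flatfile_stage s
      ON   big.key1 = s.key1
     AND   big.key2 = s.key2;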

  3. #3
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    That's an interesting idea. But I still need the flat file and a cursor to select rows from the big DB2 table for daily processing. Creating the join would take forever. I just found out the DB2 table contains 2 billion rows. That adds some unexpected wrinkles to how to handle it.

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Balr14 View Post
    I still need the flat file and a cursor to select rows from the big DB2 table for daily processing.
    No you don't.

    Quote Originally Posted by Balr14 View Post
    Creating the join would take forever. I just found out the DB2 table contains 2 billion rows. That adds some unexpected wrinkles to how to handle it.
    What makes you think that you can perform a join over 2 billion rows faster than the database engine?

  5. #5
    Join Date
    Oct 2010
    Posts
    6

    More information would help

    Balr14,

    Here's something worth looking into
    Does the table (the 2 billion row table) have indexes that match the cursor query? If not, a combination of db2expln and db2advis may help pinpoint the performance pain points and suggest an index that could help.
    The 24hr+ processing time seems a bit much, depending on what the process is doing.
    Is all the time spent inside DB2, or does the non-DB2 part of the process account for a significant portion of it?
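
    As a rough sketch, assuming a database named SAMPLE and the cursor's SQL saved to a file query.sql (both placeholders), the two tools are invoked like this:
    Code:
    # Show the access plan -- look for full table scans on the 2B-row table
    db2expln -d SAMPLE -f query.sql -z ";" -t

    # Ask the Design Advisor to recommend indexes for the same statement
    db2advis -d SAMPLE -i query.sql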

    Would appreciate more information on what exactly your process does
    Alex

  6. #6
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    Quote Originally Posted by n_i View Post
    No you don't.


    What makes you think that you can perform a join over 2 billion rows faster than the database engine?
    If I don't have the sequential input file how do I select rows for processing?

    I was merely pointing out that a join of this size is not trivial effort for the database engine.
    Last edited by Balr14; 11-01-10 at 17:49.

  7. #7
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    Quote Originally Posted by Alex JP View Post
    Balr14,

    Here's something worth looking into
    Does the table (the 2 billion row table) have indexes that match the cursor query? If not, a combination of db2expln and db2advis may help pinpoint the performance pain points and suggest an index that could help.
    The 24hr+ processing time seems a bit much, depending on what the process is doing.
    Is all the time spent inside DB2, or does the non-DB2 part of the process account for a significant portion of it?

    Would appreciate more information on what exactly your process does
    Alex

    Selection criteria for satisfying overnight information requests are added to an input flat file. That flat file is passed against a 2 billion row primary table, which may have 1 - 500 rows matching each set of criteria. Each record in the flat file is used to build a cursor, and the results of that cursor drive selects from other tables. The selection criteria are based on the contents of 6 columns. The stats show DB2 usage is very high, and the granularity of the selection criteria reduces the efficiency of the indexes.

  8. #8
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    What Tonkuma suggested is exactly why global temporary tables were created: to bring large flat files into DB2 so they can be joined to other DB2 tables, speeding up your processing.
    Dave Nance
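
    A declared global temporary table version of that idea might look like the following (table and column names are hypothetical; a user temporary tablespace must exist):
    Code:
    DECLARE GLOBAL TEMPORARY TABLE session.flatfile_keys (
        key1 CHAR(10) NOT NULL
    ) ON COMMIT PRESERVE ROWS NOT LOGGED;

    -- Populate it from the flat file; the mechanism depends on your
    -- DB2 release, but plain INSERT statements always work.

    -- Join once instead of opening a cursor per input record
    SELECT big.*
    FROM   bigtable big
    JOIN   session.flatfile_keys k
      ON   big.key1 = k.key1;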

  9. #9
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    Quote Originally Posted by dav1mo View Post
    As Tonkuma suggested is the reason that global temp tables were created. It was to bring in large flat files to join to other DB2 tables and speed up your processing.
    Dave Nance
    I must be missing something. How do I access this global temp table sequentially? The join process takes forever, so how can I speed up processing by doing this every day? I'm certainly willing to explore this route if I understand it better.
    Last edited by Balr14; 11-01-10 at 18:27.

  10. #10
    Join Date
    Oct 2010
    Posts
    6
    Assuming that the joins are currently using a TABLESCAN to generate the result sets:

    Quote Originally Posted by Balr14 View Post
    ... That flat file is passed against a 2 billion row primary table, that may have 1 - 500 rows that match the selection criteria. Each record in the flat file is used to build a cursor.
    An index on the columns that are part of the selection criteria on the 2 billion row table will definitely help, since each cursor is really looking for at most 500 rows out of 2 billion.

    Quote Originally Posted by Balr14 View Post
    ... The stats show DB2 usage is very high.
    Again this makes sense if DB2 is doing a table scan for every one of the cursors.
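
    Given the six selection columns described in post #7, the index suggested here would be along these lines (table, index, and column names are hypothetical):
    Code:
    -- Composite index covering the selection criteria,
    -- most selective column first
    CREATE INDEX ix_big_sel
        ON bigtable (col1, col2, col3, col4, col5, col6);

    -- Refresh statistics so the optimizer considers the new index
    RUNSTATS ON TABLE myschema.bigtable AND INDEXES ALL;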

    One more thing you could try after the indexes is to break your flat file into 2 or more parts and run multiple identical processes in parallel.
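
    The split-and-parallelize idea can be sketched with standard GNU tools; the driver file and job name below are placeholders:

    ```shell
    # Create a stand-in driver file of 100 records
    seq 1 100 > driver.txt

    # Split it into 4 line-based chunks: part_00 .. part_03 (GNU split)
    split -n l/4 -d driver.txt part_

    # Each chunk would then feed an identical copy of the batch job, e.g.:
    # for f in part_*; do run_batch_job "$f" & done; wait
    ls part_* | wc -l
    ```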

    And from reading your other posts, your company really ought to think about hiring a DBA

    just my 2c
    Alex

  11. #11
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    Quote Originally Posted by Alex JP View Post
    Assuming that the joins now are using TABLESCAN to generate the result sets



    An index on the columns that are part of the selection criteria on the 2 billion row table will definitely help. (since it is really looking for at most 500 rows out of 2 billion)



    Again this makes sense if DB2 is doing a table scan for every one of the cursors.

    One more thing you could try after the indexes is to break your flat file into 2 or more parts and run multiple identical processes in parallel.

    And from reading your other posts, your company really ought to think about hiring a DBA

    just my 2c
    Alex
    I'm working on getting permission to get a DBA to help out. The problem is this is a new direction for us and I don't know how long it will last. The current projects I have make extensive use of DB2. But, once these are done in another month or two, I might not see anything requiring this level of DB2 interaction for another year.

  12. #12
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Balr14 View Post
    How do I access this global temp table sequentially?
    You don't need to access the temporary table (or any other table, for that matter) sequentially. Database engines are optimized for set operations, such as
    Code:
    merge into
     targettbl tgt
    using
     temptable tmp
    on
     tgt.id = tmp.id
    when matched and tmp.othercriterium = 'YES' then
     update
      set tgt.something = tmp.somethingelse + 1
    as opposed to sequential (loop) processing.

  13. #13
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    Quote Originally Posted by n_i View Post
    You don't need to access the temporary table (or any other table, for that matter) sequentially. Database engines are optimized for set operations, such as
    Code:
    merge into
     targettbl tgt
    using
     temptable tmp
    on
     tgt.id = tmp.id
    when matched and tmp.othercriterium = 'YES' then
     update
      set tgt.something = tmp.somethingelse + 1
    as opposed to sequential (loop) processing.
    So I can perform my other processing on a matched condition? But, I don't really have anything to merge or update in the 2B row table. The flat file is used for selection of data from the 2B row table, to be used for processing other tables and report generation.

  14. #14
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Balr14 View Post
    The flat file is used for selection of data from the 2B row table, to be used for processing other tables
    Code:
    merge into
     othertable tgt
    using
     (select * from
      temptable tmp
      inner join
      twoblnrowstable twob on tmp.id = twob.id) t
    on
     tgt.id = t.anotherid
    when matched and t.othercriterium = 'YES' then
     update
      set tgt.something = t.somethingelse * 3.1415926
    Sorry, I cannot give you a more specific example as my telepathic gland does not function well in freezing temperatures.

  15. #15
    Join Date
    Sep 2002
    Location
    Wisconsin
    Posts
    37
    I'm not intentionally trying to be vague or misleading. What I am working on is a "proof of concept" demo for a client's database. I have to demonstrate that I can speed up their daily processing without touching their DB2 environment. They have been living with this problem for over a year, have tweaked it many times, and have tried many methods. It's also written entirely in Spanish, and there are few people in the client's IT area who speak English, so communications are difficult.

    I thank you all for the suggestions and I will keep them in mind for some other projects I have to deal with. You are certainly helping me see a side of DB2 I don't usually deal with. It's quite an education and a real change from the internals. I have a way to handle this problem that gives about a 30% improvement. I just wanted to make sure there wasn't a technique I had overlooked.
    Last edited by Balr14; 11-02-10 at 10:55.
