Results 1 to 6 of 6
  1. #1
    Join Date
    Oct 2003
    Posts
    4

    Unanswered: Increase Performance of data handling

    Hi,
we've got about 2GB of research data per simulation, whose structure is (long int vox, long int lor). My task is to program a utility that looks for identical pairs and counts them. In the end, we should have a sorted table with unique entries of (long int vox, long int lor, short int weight).

Because of the huge amount of data, I was thinking of using a database such as DB2. So I read the data in (with INSERT commands, because the original data is binary), perform the aggregation (create a view with GROUP BY), and finally read the result back out.
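In outline, what I have in mind is something like the following sketch (table and column names as above; the ORDER BY is what should give the sorted final table):

```sql
-- Raw pairs, one row per (vox, lor) from the binary file:
CREATE TABLE tab1 (vox BIGINT, lor BIGINT);

-- ... INSERT one row per pair ...

-- Unique pairs with their counts, sorted for the final table:
SELECT vox, lor, COUNT(*) AS weight
FROM tab1
GROUP BY vox, lor
ORDER BY vox, lor;
```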

But it still takes about six days to process one 2GB file.

Is there a better way to get the desired table? What could I optimize?

I'm quite new to this area, so I'd appreciate any help.

    cu, MelD

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
I am not exactly sure whether you are trying to create a separate table of unique pairs, or just to have a query that counts them. You might want to consider COUNT(DISTINCT ...) instead of GROUP BY, depending on what you are trying to do. I am not sure why you need the view.

    With processing like this, you need to make sure you tune the DB2 configuration parameters, particularly the bufferpools and the temporary tablespace. I would create a temporary tablespace that is at least as large as your largest table. You also need to make sure your sort heap size is sufficient. I would look at the Performance Wizard to help you with some of these parameters.
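For example (a sketch only; the database name, tablespace path, and sizes are placeholders you would adapt to your system):

```sql
-- Increase the sort heap (in 4K pages) for the database; value is illustrative.
UPDATE DB CFG FOR mydb USING SORTHEAP 8192;

-- A system temporary tablespace backed by its own, larger bufferpool;
-- path and sizes are placeholders.
CREATE BUFFERPOOL bigtmpbp SIZE 10000 PAGESIZE 4K;
CREATE SYSTEM TEMPORARY TABLESPACE tempbig
    PAGESIZE 4K
    MANAGED BY SYSTEM USING ('/db2/tempbig')
    BUFFERPOOL bigtmpbp;
```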

    It would help if you could get an experienced DBA to help you, because it is a bit difficult to provide every piece of information that you need for an efficient system via this forum if you are a novice with DB2.

  3. #3
    Join Date
    Oct 2003
    Posts
    4
    Hi,
the result should be a separate table. I'm using a view because it stays current after new simulation data is added to the underlying table. Its definition is:
"SELECT vox, lor, COUNT(*) FROM table1 GROUP BY vox, lor".

    cu, Melanie

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
Are you actually wanting to put the data in a separate physical table? For example:

CREATE TABLE new_table
(vox BIGINT,
lor BIGINT,
weight INTEGER);

INSERT INTO new_table SELECT vox, lor, COUNT(*) FROM table1 GROUP BY vox, lor

When you run a GROUP BY on such a large table, you will need a very large temporary tablespace and plenty of sort heap (SORTHEAP).

  5. #5
    Join Date
    Sep 2003
    Location
    canada
    Posts
    230
    Originally posted by Marcus_A
Are you actually wanting to put the data in a separate physical table? For example:

CREATE TABLE new_table
(vox BIGINT,
lor BIGINT,
weight INTEGER);

INSERT INTO new_table SELECT vox, lor, COUNT(*) FROM table1 GROUP BY vox, lor

When you run a GROUP BY on such a large table, you will need a very large temporary tablespace and plenty of sort heap (SORTHEAP).
    Hi Marcus,
If I use SMS for tablespaces, what do you mean by <<large temporary tablespaces>>? The size is not under my control. Do you mean we should assign a large bufferpool to the temporary tablespace?

    Thanks
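(For context: an SMS tablespace grows on demand, so its practical limit is the free space on the filesystem holding its containers. Assigning it a dedicated bufferpool, as asked above, could look like this sketch; the names are placeholders:)

```sql
-- Attach an existing (large) bufferpool to the temporary tablespace:
ALTER TABLESPACE tempspace1 BUFFERPOOL bigtmpbp;
```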

  6. #6
    Join Date
    Oct 2003
    Posts
    4
    Hi,

I'll read up on managing tablespaces and on how to set the sort heap size. Thanks for the tip.

Below are the SQL commands I use (through CLI). Everything else is, at the moment, the standard configuration.

In the beginning, I create a new table:
CREATE TABLE tab1 (vox BIGINT, lor BIGINT)

Then I start inserting the arrays of (vox, lor) pairs from my simulation. This part takes most (~70%) of the processing time.
INSERT INTO tab1 VALUES (?, ?)
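As an aside on the INSERT bottleneck: if the pairs can first be dumped to a delimited text file, DB2's LOAD utility is typically much faster than row-by-row INSERTs. A sketch from the command line (the file name is a placeholder):

```sql
-- pairs.del is a comma-delimited dump of the (vox, lor) pairs.
LOAD FROM pairs.del OF DEL INSERT INTO tab1;
```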

After all the data is inserted, I create the view. It is faster than creating a new physical table.
CREATE VIEW tab1_view AS SELECT vox, lor, COUNT(*) AS weight FROM tab1 GROUP BY vox, lor

Finally, I export the data to a new binary file.
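A sketch of that step with DB2's EXPORT utility into a delimited file (a binary output format would still need post-processing; the file name is a placeholder):

```sql
EXPORT TO result.del OF DEL
    SELECT vox, lor, COUNT(*) AS weight
    FROM tab1
    GROUP BY vox, lor
    ORDER BY vox, lor;
```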

DB2 is running on a RAID array with about 300GB available.
