| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

10-31-03, 07:46
|
|
Registered User
|
|
Join Date: Oct 2003
Posts: 4
|
|
|
Increase Performance of data handling
|
|
Hi,
we've got about 2GB research data per simulation, whose structure is (long int vox, long int lor). My duty is to program an utility, which looks for identical pairs and counts them. In the end, we should have a sorted table with unique entries of (long int vox, long int lor, short int weight).
Because of the huge data amount, i was thinking of using databases, such as db2. So i read the data in (with INSERT commands, because the original data is binary), perform data handling (create a view with GROUP BY) and read them finally out.
But still it takes about 6 days to process one 2GB file.
Is there a better way, to get the wanted table? What could i optimize?
I'm quite new to this area, so i'd appreciate any help.
cu, MelD
|
|

10-31-03, 08:09
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,198
|
|
I am not exactly sure if you are trying to create a separate table of unique pairs, or to just have a query that counts them. You might want to consider count distinct instead on group by, depending on what you are trying to do. Not sure why you need the view.
With processing like this, you need to make sure you tune the DB2 configuration parameters, particularly the bufferpools and the temporary tablespace. I would create a temporary tablespace that is at least as large as your largest table. You also need to make sure your sort heap size is sufficient. I would look at the Performance Wizard to help you with some of these parameters.
It would help if you could get an experienced DBA to help you, because it is a bit difficult to provide every piece of information that you need for an efficient system via this forum if you are a novice with DB2.
|
|

10-31-03, 08:58
|
|
Registered User
|
|
Join Date: Oct 2003
Posts: 4
|
|
|
|
Hi,
the result should be a separate table. I'm using a view, because it's still actual after adding new simulation data to the referring table. It's command is:
"SELECT vox, lor, count(*) from table1 GROUP BY vox, lor".
cu, Melanie
|
|

10-31-03, 09:17
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,198
|
|
Are you actually wanting to put the data in a seperate physical table? For example:
Create new_table
(vox bigint,
lor bigint,
count integer);
Insert into new_table SELECT vox, lor, count(*) from table1 GROUP BY vox, lor
When you run a group by on such a large table you will need to have a very large temporary tablespace and plenty of sortheap size.
|
|

10-31-03, 09:30
|
|
Registered User
|
|
Join Date: Sep 2003
Location: canada
Posts: 230
|
|
Quote:
Originally posted by Marcus_A
Are you actually wanting to put the data in a seperate physical table? For example:
Create new_table
(vox bigint,
lor bigint,
count integer);
Insert into new_table SELECT vox, lor, count(*) from table1 GROUP BY vox, lor
When you run a group by on such a large table you will need to have a very large temporary tablespace and plenty of sortheap size.
|
Hi Marcus,
If I use SMS for tablespaces what do you mean <<large temporary tablespaces>> the size is not under my control, you meant we asign large bufferpool to temporary tablespace?
Thanks
|
|

10-31-03, 09:47
|
|
Registered User
|
|
Join Date: Oct 2003
Posts: 4
|
|
Hi,
I'll read through managing tablespaces and how to set the sortheap size. Thanks for the tip.
Below are the sql commands i use (within CLI). Everything else is (at the moment) standard configuration.
in the beginning, i create a new table:
CREATE TABLE tab1 (INT vox, INT lor)
than i start inserting the arrays of (vox, lor) pairs from my simulation. This part takes most (~70%) of the processing time.
INSERT INTO tab1 VALUES (?, ?).
after all data is inserted, i create the view. It is faster, than creating a new physical table.
CREATE VIEW tab1_view FROM (SELECT vox, lor, count(*) from tab1 GROUP BY vox, lor)
finally i start exporting the data to a new binary file.
db2 is running on a raid, where about 300GB is available.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|