
06-24-11, 02:47
|
|
Registered User
|
|
Join Date: Dec 2010
Posts: 11
|
|
|
DB2 pureScale. Does it help performance?
|
|
It appears pureScale's claims to fame, according to IBM, are:
1. Unlimited Capacity
2. Application Transparency
3. Continuous Availability
There is no real mention of whether, or how, it boosts performance.
For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Does a 3-node setup accomplish that in 8 hours, and so on? Are there any official stats published anywhere on how performance increases as nodes are added to the cluster?
|
|

06-24-11, 03:52
|
|
Registered User
|
|
Join Date: Feb 2008
Location: Japan
Posts: 2,193
|
|
Quote:
|
..., if it takes me 24 hours to process a particular batch job with millions of records, ...
|
Leaving pureScale aside, "millions of records" is not a particularly large amount of data.
For example, it took about 9 seconds (46.142 - 36.989 = 9.153) to produce a summary report on a 1,000,000-row table on my laptop.
Code:
------------------------------ Commands Entered ------------------------------
VALUES current_timestamp;
SELECT COUNT(*) number_of_rows
, MIN(search) min_search
, MAX(search) max_search
, COUNT(DISTINCT search) cnt_search
, MIN( LENGTH(desc) ) min_len_desc
, MAX( LENGTH(desc) ) max_len_desc
FROM test_in_list
;
VALUES current_timestamp;
------------------------------------------------------------------------------
VALUES current_timestamp
1
--------------------------
2011-06-24-16.39.36.989000
1 record(s) selected.
SELECT COUNT(*) number_of_rows , MIN(search) min_search , MAX(search) max_search , COUNT(DISTINCT search) cnt_search , MIN( LENGTH(desc) ) min_len_desc , MAX( LENGTH(desc) ) max_len_desc FROM test_in_list
NUMBER_OF_ROWS MIN_SEARCH MAX_SEARCH CNT_SEARCH  MIN_LEN_DESC MAX_LEN_DESC
-------------- ---------- ---------- ----------- ------------ ------------
       1000000 0000000000 0000099996       32767            0           29
1 record(s) selected.
VALUES current_timestamp
1
--------------------------
2011-06-24-16.39.46.142000
1 record(s) selected.
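For anyone who wants to check the elapsed time rather than subtract by eye, here is a minimal Python sketch that parses the two CURRENT TIMESTAMP values printed above. The helper name `elapsed_seconds` is made up for this illustration; the format string assumes the `YYYY-MM-DD-HH.MM.SS.ffffff` layout shown in the output.

```python
from datetime import datetime

# Layout of the timestamps as printed in the output above
# (assumed: YYYY-MM-DD-HH.MM.SS.ffffff).
DB2_TS_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two timestamps in that layout."""
    t0 = datetime.strptime(start, DB2_TS_FORMAT)
    t1 = datetime.strptime(end, DB2_TS_FORMAT)
    return (t1 - t0).total_seconds()


# The two values returned by VALUES current_timestamp before and after the query.
elapsed = elapsed_seconds("2011-06-24-16.39.36.989000",
                          "2011-06-24-16.39.46.142000")
print(f"elapsed: {elapsed:.3f} s")
```

Note that this measures wall-clock time around the statement as seen from the client, so it includes whatever else the laptop was doing at the time.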
|
|

06-24-11, 06:51
|
|
Registered User
|
|
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
|
|
|
|
The thing is that the answer depends on a great many factors: the logical and physical database design (i.e. table structure, indexes, data placement, partitioning, ...), and also the actual workload.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
|
|

06-24-11, 07:22
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
Quote:
Originally Posted by jamster
For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Does a 3-node setup accomplish that in 8 hours, and so on?
|
pureScale benefits environments that are CPU- and memory-bound. It employs a shared-data architecture, which means the I/O subsystem does not scale up when you add members. In other words, it is a good fit for OLTP systems. 24-hour batch jobs are typical of data warehouse environments, where InfoSphere Warehouse (i.e. DB2 DPF) is a better option.
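To make the contrast concrete, here is a rough Python sketch of the shared-nothing idea behind a partitioned warehouse: rows are spread across partitions by hashing a distribution key, each partition summarizes only its own slice, and a coordinator combines the partial results. This is only an illustration of the general technique, not DB2's actual implementation; the function names, the `id` and `search` columns, and the partition count are all assumptions made for the example.

```python
from collections import defaultdict


def distribute(rows, partitions, key):
    """Assign each row to a partition by hashing its distribution key."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[hash(row[key]) % partitions].append(row)
    return buckets


def local_summary(rows, column):
    """Work done by one partition, looking only at its own rows."""
    values = [row[column] for row in rows]
    return {"count": len(values), "min": min(values), "max": max(values)}


def combine(partials):
    """Coordinator step: merge the per-partition summaries."""
    return {
        "count": sum(p["count"] for p in partials),
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
    }


# Hypothetical data: 10,000 rows with an integer id and a search value.
rows = [{"id": i, "search": i % 1000} for i in range(10_000)]

buckets = distribute(rows, partitions=4, key="id")
partials = [local_summary(bucket, "search") for bucket in buckets.values()]
total = combine(partials)
print(total)
```

In a real partitioned system the per-partition step runs at the same time on separate hardware with its own disks, which is why one long scan can get shorter as partitions are added. In this sketch the partitions are simply processed one after another.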
|
|

06-24-11, 08:28
|
|
Registered User
|
|
Join Date: May 2003
Location: USA
Posts: 5,198
|
|
Quote:
Originally Posted by jamster
There is no real mention of whether, or how, it boosts performance.
For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Does a 3-node setup accomplish that in 8 hours, and so on? Are there any official stats published anywhere on how performance increases as nodes are added to the cluster?
|
pureScale is not a parallel environment in the way DPF is, so a single batch job does not run any faster on a 2-node pureScale cluster than on a regular single-node database (aside from capacity issues caused by other tasks running at the same time). Each connection runs on a single node of the pureScale cluster, just as it would in a single-node environment. In fact, a single batch job may run slightly slower on pureScale, though not by much.
If you had a lot of other batch jobs (or any transactions) running at the same time from different connections, that might be a different story, since pureScale can increase your total workload capacity.
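A toy model of the point above, in Python. It assumes each job uses exactly one connection, each node works on one job at a time, and the cluster adds no overhead. None of that is measured, and `batch_window_hours` is a name invented for this sketch.

```python
import math


def batch_window_hours(job_hours: float, jobs: int, nodes: int) -> float:
    """Hours needed to finish `jobs` independent single-connection jobs.

    Toy assumptions: a job runs on exactly one node, a node runs one job
    at a time, and every job takes `job_hours` regardless of cluster size.
    """
    if jobs < 1 or nodes < 1:
        raise ValueError("jobs and nodes must both be at least 1")
    waves = math.ceil(jobs / nodes)  # rounds of jobs run back to back
    return waves * job_hours


for nodes in (1, 2, 3):
    one_job = batch_window_hours(24, jobs=1, nodes=nodes)
    six_jobs = batch_window_hours(24, jobs=6, nodes=nodes)
    print(f"{nodes} node(s): 1 job = {one_job} h, 6 jobs = {six_jobs} h")
```

Under these assumptions the single 24-hour job stays at 24 hours however many nodes there are, while six such jobs drop from 144 hours on one node to 72 on two and 48 on three. A real system will not scale that cleanly.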
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
|
Last edited by Marcus_A; 06-24-11 at 08:32.
|