  #1 (permalink)  
Old 06-24-11, 02:47
jamster
Registered User
 
Join Date: Dec 2010
Posts: 11
DB2 Purescale. Does it help performance?

It appears IBM's pureScale claims to fame are:

1. Unlimited Capacity
2. Application Transparency
3. Continuous Availability

There is no real mention of how it boosts performance.

For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Would a 3-node setup accomplish it in 8 hours, and so on? Are there any official stats published anywhere showing how performance increases as nodes are added to the cluster?
  #2 (permalink)  
Old 06-24-11, 03:52
tonkuma
Registered User
 
Join Date: Feb 2008
Location: Japan
Posts: 2,193
Quote:
..., if it takes me 24 hours to process a particular batch job with millions of records, ...
Setting pureScale aside, "millions of records" is not in itself a large amount of data.

For example, it took about 9 seconds (46.142000 - 36.989000 = 9.153) to produce a summary report on a 1,000,000-row table on my laptop.
Code:
------------------------------ Commands Entered ------------------------------
VALUES current_timestamp;

SELECT COUNT(*)               number_of_rows
     , MIN(search)            min_search
     , MAX(search)            max_search
     , COUNT(DISTINCT search) cnt_search
     , MIN( LENGTH(desc) )    min_len_desc
     , MAX( LENGTH(desc) )    max_len_desc
 FROM  test_in_list
;

VALUES current_timestamp;
------------------------------------------------------------------------------
VALUES current_timestamp

1                         
--------------------------
2011-06-24-16.39.36.989000

  1 record(s) selected.


SELECT COUNT(*)               number_of_rows , MIN(search)            min_search , MAX(search)            max_search , COUNT(DISTINCT search) cnt_search , MIN( LENGTH(desc) )    min_len_desc , MAX( LENGTH(desc) )    max_len_desc FROM  test_in_list

NUMBER_OF_ROWS MIN_SEARCH MAX_SEARCH CNT_SEARCH  MIN_LEN_DESC MAX_LEN_DESC
-------------- ---------- ---------- ----------- ------------ ------------
       1000000 0000000000 0000099996       32767            0           29

  1 record(s) selected.


VALUES current_timestamp

1                         
--------------------------
2011-06-24-16.39.46.142000

  1 record(s) selected.
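
The elapsed time can be checked directly from the two `current_timestamp` values in the output above. The following is a minimal sketch in Python, not part of the original post; the helper name `elapsed_seconds` and the constant `DB2_TS_FORMAT` are made up for illustration, and the format string assumes the `YYYY-MM-DD-HH.MM.SS.ffffff` layout shown in the output.

```python
from datetime import datetime

# Layout of the timestamps printed by the DB2 command line processor above:
# YYYY-MM-DD-HH.MM.SS.ffffff
DB2_TS_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two DB2-formatted timestamps."""
    t0 = datetime.strptime(start, DB2_TS_FORMAT)
    t1 = datetime.strptime(end, DB2_TS_FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps copied from the output above; the row count comes from NUMBER_OF_ROWS.
elapsed = elapsed_seconds("2011-06-24-16.39.36.989000",
                          "2011-06-24-16.39.46.142000")
rows_per_second = 1_000_000 / elapsed

print(f"elapsed: {elapsed:.3f} s")
print(f"throughput: {rows_per_second:,.0f} rows/s")
```

On these numbers the query took 9.153 seconds, or roughly 109,000 rows per second for this particular aggregate on this particular laptop. It says nothing about how a different query or batch job would behave.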
  #3 (permalink)  
Old 06-24-11, 06:51
stolze
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
The thing is that the answer depends on many factors: logical and physical database design (i.e. table structure, indexes, data placement, partitioning, ...). The actual workload matters as well.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
  #4 (permalink)  
Old 06-24-11, 07:22
n_i
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by jamster

For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Would a 3-node setup accomplish it in 8 hours, and so on?
pureScale benefits environments that are CPU- and memory-bound because it employs a shared-data architecture; that is, the I/O subsystem does not scale up when you add members. In other words, it's good for OLTP systems. 24-hour batch jobs are typical of data warehouse environments, where InfoSphere Warehouse (i.e. DB2 DPF) is a better option.
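
As a rough illustration of that point, here is a toy bottleneck model in Python. It is only a sketch under strong assumptions, not a description of how pureScale schedules work: the function name `elapsed_hours` and all the numbers are hypothetical, CPU work from many concurrent connections is assumed to divide evenly across members, CPU and I/O are assumed to overlap perfectly, and the shared storage is assumed to deliver the same I/O rate no matter how many members there are.

```python
def elapsed_hours(cpu_hours: float, io_hours: float, members: int) -> float:
    """Toy model: CPU work divides across members, shared-storage I/O does not.

    The slower of the two resources determines the elapsed time.
    """
    if members < 1:
        raise ValueError("members must be at least 1")
    return max(cpu_hours / members, io_hours)


# Hypothetical workloads: (CPU hours, I/O hours) when run on a single member.
cpu_bound = (20.0, 4.0)   # OLTP-like: mostly CPU, little I/O
io_bound = (4.0, 20.0)    # warehouse-like: mostly I/O

for members in (1, 2, 4, 8):
    print(members,
          elapsed_hours(*cpu_bound, members),
          elapsed_hours(*io_bound, members))
```

In this model the CPU-bound workload keeps getting faster until the shared I/O becomes the limit, while the I/O-bound workload takes 20 hours however many members are added.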
  #5 (permalink)  
Old 06-24-11, 08:28
Marcus_A
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
Quote:
Originally Posted by jamster
There is no real mention of how it boosts performance.

For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Would a 3-node setup accomplish it in 8 hours, and so on? Are there any official stats published anywhere showing how performance increases as nodes are added to the cluster?
pureScale is not a parallel environment (as DPF is), so a single batch job does not run any faster on a 2-node pureScale cluster than on a regular single-node database (aside from capacity issues caused by other tasks running at the same time). Each connection runs on exactly one of the nodes in a pureScale cluster, just as it would in a single-node environment. In fact, a single batch job may run a little slower on pureScale (but not much).

If you had a lot of other batch jobs (or any transactions) running at the same time from different connections, that might be a different story, since pureScale can increase your total workload capacity.
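
To make the difference between one job and many jobs concrete, here is a small Python sketch. It is a toy scheduling model rather than anything pureScale actually does: the name `makespan_hours` and the job durations are hypothetical, each job is assumed to run entirely on one member (as each connection does), and cluster overhead is ignored.

```python
import heapq


def makespan_hours(job_hours, members):
    """Elapsed time when every job runs entirely on a single member.

    Jobs are placed longest-first on whichever member is least busy,
    which is a common greedy heuristic, not an optimal schedule.
    """
    if members < 1:
        raise ValueError("members must be at least 1")
    loads = [0.0] * members  # min-heap of per-member busy time
    for hours in sorted(job_hours, reverse=True):
        least_busy = heapq.heappop(loads)
        heapq.heappush(loads, least_busy + hours)
    return max(loads)


single_job = [24.0]               # one 24-hour batch job on one connection
four_jobs = [6.0, 6.0, 6.0, 6.0]  # the same total work on four connections

for members in (1, 2, 3, 4):
    print(members,
          makespan_hours(single_job, members),
          makespan_hours(four_jobs, members))
```

The single 24-hour job takes 24 hours with any number of members, because it cannot be split. The four 6-hour jobs finish in 24, 12, 12, and 6 hours on 1, 2, 3, and 4 members respectively, so the gain depends on how the work divides among connections, not just on the node count.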
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390

Last edited by Marcus_A; 06-24-11 at 08:32.