  1. #1

    DB2 pureScale: does it help performance?

    IBM's claims to fame for pureScale appear to be:

    1. Unlimited Capacity
    2. Application Transparency
    3. Continuous Availability

    There is no real mention of how it boosts performance.

    For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Will a 3-node setup accomplish it in 8 hours, and so on? Are there any official statistics published anywhere on how performance increases as nodes are added to the cluster?

  2. #2
    ..., if it takes me 24 hours to process a particular batch job with millions of records, ...
    Setting pureScale aside, "millions of records" is not a large data volume in itself.

    For example, it took about 9 seconds (46.142 - 36.989 = 9.153) to produce a summary report on a 1,000,000-row table on my laptop.
    Code:
    ------------------------------ Commands Entered ------------------------------
    VALUES current_timestamp;
    
    SELECT COUNT(*)               number_of_rows
         , MIN(search)            min_search
         , MAX(search)            max_search
         , COUNT(DISTINCT search) cnt_search
         , MIN( LENGTH(desc) )    min_len_desc
         , MAX( LENGTH(desc) )    max_len_desc
     FROM  test_in_list
    ;
    
    VALUES current_timestamp;
    ------------------------------------------------------------------------------
    VALUES current_timestamp
    
    1                         
    --------------------------
    2011-06-24-16.39.36.989000
    
      1 record(s) selected.
    
    
    SELECT COUNT(*)               number_of_rows , MIN(search)            min_search , MAX(search)            max_search , COUNT(DISTINCT search) cnt_search , MIN( LENGTH(desc) )    min_len_desc , MAX( LENGTH(desc) )    max_len_desc FROM  test_in_list
    
    NUMBER_OF_ROWS MIN_SEARCH MAX_SEARCH CNT_SEARCH  MIN_LEN_DESC MAX_LEN_DESC
    -------------- ---------- ---------- ----------- ------------ ------------
           1000000 0000000000 0000099996       32767            0           29
    
      1 record(s) selected.
    
    
    VALUES current_timestamp
    
    1                         
    --------------------------
    2011-06-24-16.39.46.142000
    
      1 record(s) selected.
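
The elapsed time can be read off the two `VALUES current_timestamp` results. A minimal Python sketch of that subtraction, assuming the timestamp layout printed in the output above (`YYYY-MM-DD-HH.MM.SS.ffffff`); the helper name `elapsed_seconds` is made up for illustration and is not part of DB2:

```python
from datetime import datetime

# Layout of the timestamps printed in the CLP output above
DB2_TS_FORMAT = "%Y-%m-%d-%H.%M.%S.%f"

def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two DB2-style timestamp strings."""
    t0 = datetime.strptime(start, DB2_TS_FORMAT)
    t1 = datetime.strptime(end, DB2_TS_FORMAT)
    return (t1 - t0).total_seconds()

# The two timestamps from the run above
seconds = elapsed_seconds("2011-06-24-16.39.36.989000",
                          "2011-06-24-16.39.46.142000")
rows_per_second = 1_000_000 / seconds  # the table has 1,000,000 rows
print(f"{seconds:.3f} s, about {rows_per_second:,.0f} rows/s")
```

At that rate, a single pass over a few million rows is a matter of seconds on one machine, so a 24-hour run time probably comes from what the job does per row rather than from the row count alone.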

  3. #3
    The answer depends on many factors: logical and physical database design (table structure, indexes, data placement, partitioning, ...), and also the actual workload.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  4. #4
    Originally posted by jamster:

    For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Will a 3-node setup accomplish it in 8 hours, and so on?
    pureScale uses a shared-data architecture: adding members adds CPU and memory, but the I/O subsystem does not scale up with them. It therefore benefits environments that are CPU- and memory-bound. In other words, it is a good fit for OLTP systems. 24-hour batch jobs are typical of data warehouse environments, where InfoSphere Warehouse (i.e. DB2 DPF) is a better option.

  5. #5
    Originally posted by jamster:
    There is no real mention of how it boosts performance.

    For example, if it takes me 24 hours to process a particular batch job with millions of records, will a 2-node pureScale cluster accomplish that in 12 hours? Will a 3-node setup accomplish it in 8 hours, and so on? Are there any official statistics published anywhere on how performance increases as nodes are added to the cluster?
    pureScale is not a parallel environment (as DPF is), so a single batch job does not run any faster on a 2-node pureScale cluster than on a regular single-node database (aside from capacity issues caused by other tasks running at the same time). Each connection runs on a single node of the pureScale cluster, just as it would in a single-node environment. In fact, a single batch job may run a little slower on pureScale (but not much).

    If you had a lot of other batch jobs (or any transactions) running at the same time from different connections, that might be a different story, since pureScale can increase your total workload capacity.
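
To make the distinction concrete, here is a toy Python model of that reasoning. It is only a sketch under strong assumptions that are not from this thread: every job takes the same time, each member runs one job at a time, jobs are spread evenly across members, and cluster overhead is ignored. The function name `batch_window_hours` and the job counts are hypothetical.

```python
import math

def batch_window_hours(jobs: int, hours_per_job: float, members: int) -> float:
    """Wall-clock hours to finish `jobs` independent batch jobs.

    Assumptions (illustrative only): equal-length jobs, one job at a time
    per member, even distribution, no cluster overhead. A single job is
    never split across members, because each connection runs on one member.
    """
    if jobs < 1 or members < 1:
        raise ValueError("jobs and members must be at least 1")
    rounds = math.ceil(jobs / members)  # waves of jobs running concurrently
    return rounds * hours_per_job

# One 24-hour job: adding members does not shorten it.
single = [batch_window_hours(1, 24, m) for m in (1, 2, 3)]
# Six 4-hour jobs from separate connections: the total window shrinks.
many = [batch_window_hours(6, 4, m) for m in (1, 2, 3)]
print(single, many)
```

In this model the single job stays at 24 hours however many members are added, while the window for the six concurrent jobs shrinks as members are added. Real numbers would depend on the workload and on I/O, which this sketch does not model.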
    Last edited by Marcus_A; 06-24-11 at 09:32.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390
