If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > that's how they do it

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-16-11, 07:39
dr_te_z dr_te_z is offline
Registered User
 
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
that's how they do it

Always educational reading TPC-C - Result Details

See how the goeroes at IBM configure their database trying to win such a contest.

(always wanted to use this word in english, glad I do not have to pronounce it )

PECULIAR:
- IBM proudly introduced table-partitioning in V9. They still use the good-old union all view technique themselves, I wonder why
- They have a 1 to 5 ratio (1 bufferpool serves 5 tablespaces). I wonder why
- STMM.... no STMM. I wonder why, is Scott Hayes right afterall or is it the large number of bufferpools?

Are their more things that caught your attention?
Reply With Quote
  #2 (permalink)  
Old 08-16-11, 07:56
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by dr_te_z View Post
(always wanted to use this word in english, glad I do not have to pronounce it )
which word? goeroes?

it's pronounced exactly as spelled
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 08-16-11, 08:02
dr_te_z dr_te_z is offline
Registered User
 
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
Quote:
Originally Posted by r937 View Post
which word? goeroes?

it's pronounced exactly as spelled
Peculiar. That's the word.
Reply With Quote
  #4 (permalink)  
Old 08-16-11, 08:12
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by dr_te_z View Post
- STMM.... no STMM. I wonder why, is Scott Hayes right afterall or is it the large number of bufferpools?
If you know exactly the workload and have tuned your system to fit that workload, why would you want to use STMM? It adds overhead (even if very small), which you don't want to have in such a situation.

p.s: I don't know the rationale for the other decisions, so I can't comment.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #5 (permalink)  
Old 08-16-11, 13:15
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
Quote:
Originally Posted by dr_te_z View Post
Always educational reading TPC-C - Result Details

See how the goeroes at IBM configure their database trying to win such a contest.

(always wanted to use this word in english, glad I do not have to pronounce it )

PECULIAR:
- IBM proudly introduced table-partitioning in V9. They still use the good-old union all view technique themselves, I wonder why
- They have a 1 to 5 ratio (1 bufferpool serves 5 tablespaces). I wonder why
- STMM.... no STMM. I wonder why, is Scott Hayes right afterall or is it the large number of bufferpools?

Are their more things that caught your attention?
Most of the tables (8 of the 10 tables that make up the TPC-C schema) are horizontally spread accross 100 different tables by IBM based on the PK range, and organized by Key Sequence for maximum performance. This means there are 100 different Customer tables, etc (Cusotmer0, Customer1, Customer2, ... Customer99). Even though there is a UNION ALL view on them, I believe the application logic may have some intellegence built into to determine which table has the data based on the key (but not 100% sure about that). The design used by IBM in this TPC_C benchmark assumes that one knows in advance all the key values in advance (or almost all of them), and they are not signficantly increased over time since all the 100 customer tables (and the 7 others that split into 100 tables) are created and populated in advance with all the key values and ranges (a flaw in the TPC-C benchmark IMO). Using a design like this is probably not practical for most of our applications.

Table partitioning has adminstrative advantages such as being able to detach old partitions, but this data is partitioned by key range (not date) and there is no clean-up of old data required as part of TPC-C.

I can't believe that anyone thought that STMM is actually more efficient than manually tuning by someone who knows what they are doing. STMM is just marketing hype to deal with the criticism that DB2 requires significant DBA resources that are hard to find.

The other interesting thing about this benchmark compared to others previously done by IBM is use of massive number of SSD storage devices for data.

This particular benchmark does not set any total throughput records, but does set a record in cost/transcation when total tpmC is at least 2 million.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #6 (permalink)  
Old 08-17-11, 02:40
dr_te_z dr_te_z is offline
Registered User
 
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
Quote:
Originally Posted by Marcus_A View Post
I believe the application logic may have some intellegence built into to determine which table has the data based on the key (but not 100% sure about that).
Right, some kind of "do it yourself" partition elimination.
A normal application would never be coded like that so we (simple mortals) should go on using data-partitioning.

How about the huge number of buffer-pools? That does not strike anybody as odd? Or do you do that all the time: every tablespace it's own dedicated bufferpool?
Reply With Quote
  #7 (permalink)  
Old 08-17-11, 03:25
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
Quote:
Originally Posted by dr_te_z View Post
Right, some kind of "do it yourself" partition elimination.
A normal application would never be coded like that so we (simple mortals) should go on using data-partitioning.

How about the huge number of buffer-pools? That does not strike anybody as odd? Or do you do that all the time: every tablespace it's own dedicated bufferpool?
The bufferpools are related to the "partitioning" of the tables, so that there are 100 customer tables (each with a tablespace), and also 100 tables for WAREHOUSE, DISTRICT, STOCK, CUSTOMER, HISTORY, ORDERS, ORDERLINE, and NEWORDER which is 8 out the 10 entities in the data model).

The issue is not whether they manually partitioned the data (instead of using Table Partitioning), it is that they partitioned the data by PK key range (which does not include date). In a normal database, one does not know all the key ranges in advance to partition the data this way, but there is a flaw in the TPC-C benchmark that allows IBM (and also Oracle has done this) to partition the data manually by range in advance for 8 of the 10 tables in the TPC-C schema. If there were only 10 physical tables in the schema, instead of 802 tables the way IBM did it (8 x 100 + 2 not partitioned), there wouldn't be so many bufferpools.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #8 (permalink)  
Old 08-17-11, 04:57
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
For the buffer pools, you also want to avoid problems if 1 table/partition is more heavily hit. That could cause pages of other tables to be evicted while a separate buffer pool would keep those pages in memory. For things like that, you'd want to have many (smaller) buffer pools.

Basically, such a decision depends on the internal knowledge of the BP's eviction strategy.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #9 (permalink)  
Old 08-17-11, 04:59
dr_te_z dr_te_z is offline
Registered User
 
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
Quote:
Originally Posted by Marcus_A View Post
802 tables the way IBM did it (8 x 100 + 2 not partitioned), there wouldn't be so many bufferpools.
I understand.
But I wonder why they did not define 1 big bufferpool shared by all tablespaces because they all have the same characteristics. Now they define a new/fresh bufferpool for each 6 to 8 tablespaces. It seems to me that 10 bufferpools of 100 pages each is more efficient than 1 bufferpool with the size of 1000 pages. That is the impression I get, but I've never heard if that is the case or not.
Reply With Quote
  #10 (permalink)  
Old 08-17-11, 05:30
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
Quote:
Originally Posted by dr_te_z View Post
I understand.
But I wonder why they did not define 1 big bufferpool shared by all tablespaces because they all have the same characteristics. Now they define a new/fresh bufferpool for each 6 to 8 tablespaces. It seems to me that 10 bufferpools of 100 pages each is more efficient than 1 bufferpool with the size of 1000 pages. That is the impression I get, but I've never heard if that is the case or not.
10 bufferpools of 100 pages each is NOT more efficient than 1 bufferpool with the size of 1000 pages. However, in this particular benchmark, all the data is created in advance and access is randomly distributed across all the data at extremely high transaction rates, something that is not likely in a real application.

Whether there are performance issues with really large bufferpools, I have heard that is not a problem with DB2, although it is a problem with Oracle. But I can't say for sure whether multiple bufferpools might perform very slightly better when they exceed a really large size. However, keep in mind that even if that were the case, lots of smaller bufferpools tend to have poorer memory utilization than fewer large bufferpools, which would usually offset any issues with addressing large memory pools. However, as stated above, in the artificial TPC-C benchmark, memory utilization of smaller bufferpools is not an issue as it would be with a normal database/application.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #11 (permalink)  
Old 08-17-11, 08:56
dr_te_z dr_te_z is offline
Registered User
 
Join Date: Jan 2009
Location: Zoetermeer, Holland
Posts: 555
Thank you guys. Bottom line: those benchmark reports are interesting reads, but not make the mistake to treat them as your example on how-to configure your server. Too much ... how shall I call it ... non-agile stuff in there
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On