  1. #1
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746

    Unanswered: that's how they do it

    Always educational reading: TPC-C - Result Details

    See how the goeroes at IBM configure their database when trying to win such a contest.

    (always wanted to use this word in English, glad I do not have to pronounce it)

    PECULIAR:
    - IBM proudly introduced table partitioning in V9, yet they still use the good old union-all-view technique themselves (a sketch follows below). I wonder why.
    - They have a 1 to 5 ratio (1 bufferpool serves 5 tablespaces). I wonder why.
    - STMM... no STMM. I wonder why. Is Scott Hayes right after all, or is it the large number of bufferpools?
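
    For anyone who has not seen the technique: a minimal sketch of what such a union-all-view setup could look like (hypothetical column names and check-constraint ranges, only 2 of the 100 member tables shown, so not the actual benchmark DDL):
    Code:
    CREATE TABLE customer_00 (
        c_w_id    INTEGER       NOT NULL,
        c_d_id    SMALLINT      NOT NULL,
        c_id      INTEGER       NOT NULL,
        c_balance DECIMAL(12,2),
        CONSTRAINT cust_rng_00 CHECK (c_w_id BETWEEN   1 AND 200)
    );
    CREATE TABLE customer_01 (
        c_w_id    INTEGER       NOT NULL,
        c_d_id    SMALLINT      NOT NULL,
        c_id      INTEGER       NOT NULL,
        c_balance DECIMAL(12,2),
        CONSTRAINT cust_rng_01 CHECK (c_w_id BETWEEN 201 AND 400)
    );
    -- customer_02 through customer_99 would follow the same pattern

    CREATE VIEW customer AS
        SELECT * FROM customer_00
        UNION ALL
        SELECT * FROM customer_01;

    -- The check constraints let the optimizer prune the branches that
    -- cannot match a predicate such as WHERE c_w_id = 150.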

    Are there more things that caught your attention?

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dr_te_z View Post
    (always wanted to use this word in English, glad I do not have to pronounce it)
    which word? goeroes?

    it's pronounced exactly as spelled
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote Originally Posted by r937 View Post
    which word? goeroes?

    it's pronounced exactly as spelled
    Peculiar. That's the word.

  4. #4
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by dr_te_z View Post
    - STMM... no STMM. I wonder why. Is Scott Hayes right after all, or is it the large number of bufferpools?
    If you know the workload exactly and have tuned your system to fit that workload, why would you want to use STMM? It adds overhead (even if very small), which you don't want in such a situation.
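
    Roughly what that looks like in practice (a sketch only, CLP commands with hypothetical database/bufferpool names and sizes):
    Code:
    -- turn the self-tuning memory manager off for this database
    UPDATE DB CFG FOR tpcc USING SELF_TUNING_MEM OFF;

    -- pin the memory areas to fixed, hand-tuned values instead of AUTOMATIC
    UPDATE DB CFG FOR tpcc USING LOCKLIST 16384 SORTHEAP 4096;
    ALTER BUFFERPOOL bp_cust IMMEDIATE SIZE 400000;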

    p.s: I don't know the rationale for the other decisions, so I can't comment.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  5. #5
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by dr_te_z View Post
    Always educational reading: TPC-C - Result Details

    See how the goeroes at IBM configure their database when trying to win such a contest.

    (always wanted to use this word in English, glad I do not have to pronounce it)

    PECULIAR:
    - IBM proudly introduced table partitioning in V9, yet they still use the good old union-all-view technique themselves. I wonder why.
    - They have a 1 to 5 ratio (1 bufferpool serves 5 tablespaces). I wonder why.
    - STMM... no STMM. I wonder why. Is Scott Hayes right after all, or is it the large number of bufferpools?

    Are there more things that caught your attention?
    Most of the tables (8 of the 10 tables that make up the TPC-C schema) are horizontally spread across 100 different tables by IBM based on the PK range, and organized by key sequence for maximum performance. This means there are 100 different Customer tables, etc. (Customer0, Customer1, Customer2, ... Customer99). Even though there is a UNION ALL view on them, I believe the application logic may have some intelligence built in to determine which table has the data based on the key (but I am not 100% sure about that).

    The design used by IBM in this TPC-C benchmark assumes that one knows all the key values (or almost all of them) in advance, and that they do not increase significantly over time, since all 100 customer tables (and the 7 other tables that are split into 100 tables each) are created and populated in advance with all the key values and ranges (a flaw in the TPC-C benchmark IMO). Using a design like this is probably not practical for most of our applications.
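
    The "organized by key sequence" part refers to DB2's range-clustered tables. A minimal sketch (hypothetical names and ranges, not the benchmark DDL): each member table pre-allocates space for a fixed key range, which only works because all key ranges are known in advance:
    Code:
    CREATE TABLE customer_00 (
        c_id      INTEGER NOT NULL,
        c_balance DECIMAL(12,2)
    )
    ORGANIZE BY KEY SEQUENCE (c_id STARTING FROM 1 ENDING AT 30000)
    DISALLOW OVERFLOW;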

    Table partitioning has administrative advantages, such as being able to detach old partitions, but this data is partitioned by key range (not date) and there is no clean-up of old data required as part of TPC-C.
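
    For contrast, a rough sketch of the native table partitioning alternative (hypothetical names and ranges), where an old range can be rolled out with a single DETACH:
    Code:
    CREATE TABLE orders_hist (
        o_id      INTEGER NOT NULL,
        o_entry_d DATE    NOT NULL
    )
    PARTITION BY RANGE (o_entry_d) (
        PARTITION q1_2008 STARTING ('2008-01-01') ENDING ('2008-03-31'),
        PARTITION q2_2008 STARTING ('2008-04-01') ENDING ('2008-06-30')
    );

    -- detach the oldest quarter into its own table (for archiving or dropping)
    ALTER TABLE orders_hist DETACH PARTITION q1_2008 INTO orders_hist_2008q1;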

    I can't believe that anyone thought that STMM is actually more efficient than manually tuning by someone who knows what they are doing. STMM is just marketing hype to deal with the criticism that DB2 requires significant DBA resources that are hard to find.

    The other interesting thing about this benchmark, compared to others previously done by IBM, is the use of a massive number of SSD storage devices for the data.

    This particular benchmark does not set any total throughput records, but it does set a record in cost per transaction among results with a total tpmC of at least 2 million.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  6. #6
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote Originally Posted by Marcus_A View Post
    I believe the application logic may have some intelligence built in to determine which table has the data based on the key (but I am not 100% sure about that).
    Right, some kind of "do it yourself" partition elimination.
    A normal application would never be coded like that, so we (mere mortals) should go on using data partitioning.

    How about the huge number of bufferpools? Does that not strike anybody as odd? Or do you do that all the time: every tablespace its own dedicated bufferpool?

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by dr_te_z View Post
    Right, some kind of "do it yourself" partition elimination.
    A normal application would never be coded like that, so we (mere mortals) should go on using data partitioning.

    How about the huge number of bufferpools? Does that not strike anybody as odd? Or do you do that all the time: every tablespace its own dedicated bufferpool?
    The bufferpools are related to the "partitioning" of the tables: there are 100 tables (each in its own tablespace) for each of WAREHOUSE, DISTRICT, STOCK, CUSTOMER, HISTORY, ORDERS, ORDERLINE, and NEWORDER, which are 8 of the 10 entities in the data model.

    The issue is not whether they manually partitioned the data (instead of using Table Partitioning); it is that they partitioned the data by PK range (which does not include a date). In a normal database, one does not know all the key ranges in advance to partition the data this way, but there is a flaw in the TPC-C benchmark that allows IBM (and Oracle has done this as well) to partition the data manually by range in advance for 8 of the 10 tables in the TPC-C schema. If there were only 10 physical tables in the schema, instead of the 802 tables the way IBM did it (8 x 100 + 2 not partitioned), there wouldn't be so many bufferpools.
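
    In DDL terms the layout is roughly this kind of thing (a sketch with hypothetical names and sizes; one pool shared by a small group of the per-range tablespaces):
    Code:
    CREATE BUFFERPOOL bp_cust_00 IMMEDIATE SIZE 200000 PAGESIZE 4K;

    CREATE TABLESPACE ts_customer_00
        PAGESIZE 4K
        MANAGED BY AUTOMATIC STORAGE
        BUFFERPOOL bp_cust_00;

    CREATE TABLESPACE ts_customer_01
        PAGESIZE 4K
        MANAGED BY AUTOMATIC STORAGE
        BUFFERPOOL bp_cust_00;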
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  8. #8
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    For the buffer pools, you also want to avoid problems if 1 table/partition is hit more heavily than the others. That could cause pages of other tables to be evicted, while a separate buffer pool would keep those pages in memory. For things like that, you'd want to have many (smaller) buffer pools.

    Basically, such a decision depends on internal knowledge of the buffer pool's eviction strategy.
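
    If you want to see whether such a split actually pays off, one way is to compare per-pool data hit ratios, for example with something like this (a sketch, assuming DB2 9.7's MON_GET_BUFFERPOOL table function; older releases have equivalent snapshot views):
    Code:
    SELECT bp_name,
           pool_data_l_reads,
           pool_data_p_reads,
           CASE WHEN pool_data_l_reads > 0
                THEN DECIMAL(1.0 - DOUBLE(pool_data_p_reads)
                                   / pool_data_l_reads, 5, 4)
           END AS data_hit_ratio
    FROM TABLE(MON_GET_BUFFERPOOL('', -2)) AS t
    ORDER BY bp_name;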
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  9. #9
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote Originally Posted by Marcus_A View Post
    802 tables the way IBM did it (8 x 100 + 2 not partitioned), there wouldn't be so many bufferpools.
    I understand.
    But I wonder why they did not define 1 big bufferpool shared by all tablespaces, since they all have the same characteristics. Instead they define a fresh bufferpool for every 6 to 8 tablespaces. It looks as if 10 bufferpools of 100 pages each would be more efficient than 1 bufferpool of 1000 pages. That is the impression I get, but I have never heard whether that is actually the case.

  10. #10
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by dr_te_z View Post
    I understand.
    But I wonder why they did not define 1 big bufferpool shared by all tablespaces, since they all have the same characteristics. Instead they define a fresh bufferpool for every 6 to 8 tablespaces. It looks as if 10 bufferpools of 100 pages each would be more efficient than 1 bufferpool of 1000 pages. That is the impression I get, but I have never heard whether that is actually the case.
    10 bufferpools of 100 pages each is NOT more efficient than 1 bufferpool of 1000 pages. However, in this particular benchmark, all the data is created in advance and access is randomly distributed across all of it at extremely high transaction rates, something that is not likely in a real application.

    As to whether there are performance issues with really large bufferpools: I have heard that this is not a problem with DB2, although it is a problem with Oracle. But I can't say for sure whether multiple bufferpools might perform very slightly better once a single pool exceeds a really large size. However, keep in mind that even if that were the case, lots of smaller bufferpools tend to have poorer memory utilization than fewer large ones, which would usually offset any issues with addressing large memory pools. As stated above, though, in the artificial TPC-C benchmark the memory utilization of smaller bufferpools is not an issue the way it would be with a normal database/application.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  11. #11
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Thank you guys. Bottom line: those benchmark reports are interesting reads, but do not make the mistake of treating them as your example of how to configure your server. Too much ... how shall I call it ... non-agile stuff in there.
