Results 1 to 8 of 8
  1. #1
    Join Date
    Sep 2006
    Posts
    26

    Unanswered: Data Warehouse Scalability

    I'm doing a graduate school project regarding data warehouses and getting some initial data to compare scalability for Oracle, Teradata, and DB2. It's been fairly easy to get some idea of the data volume limits.

    Oracle (11g)- "hundreds of terabytes"
    Teradata (5650) -- 86 petabytes
    DB2 - "hundreds of terabytes"

    I realize there's more to scalability than how much data the warehouse will hold. It's been more difficult trying to get benchmark data for:

    Workload
    Oracle -- 198,907.5 QphH @ 3000 GB
    - can't find info for the others

    Query Complexity
    Availability
    Data Latency

    Can anyone point to a source of scalability benchmark data that would provide some more information?

    Thanks.

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    I am pretty sure that DB2 DPF (Data Partitioning Feature, but also known as DB2 InfoSphere Warehouse), can store at least as much data as the others. DB2 and Teradata are "share nothing" architectures and will scale in a linear fashion, unlike Oracle.

    You can get some information from the http://www.tpc.org/tpch/default.asp website, looking at the TPC-H benchmarks for data warehousing. Keep in mind that the benchmarks typically use different hardware, and are not always up to date (newer hardware is faster and cheaper), so don't draw any incorrect conclusions about which ones scale the best.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    You could have a look here: TPC-H - Top Ten Performance Results TPC is an organization dedicated to benchmarks, and this page shows you some results for data wareshouses up to 30TB.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by stolze View Post
    ...TPC is an organization dedicated to benchmarks...
    But keep in mind that each hardware vendor is responsible for conducting and publishing the benchmarks (which must be audited). I think that now that Teradata no longer has their own proprietary hardware requirement (as they did in the past when they were owned by NCR) it looks like there are no more Teradata benchmarks.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  5. #5
    Join Date
    Sep 2006
    Posts
    26
    So I see that Teradata and IBM are both share-nothing architectures, while Oracle is Shared-Disk. So, basically you would expect that IBM and Teradata would scale better than Oracle, however Oracle would be expected to provide faster peformance for queries that have to join across multiple partitions. What would the differences be between Teradata and IBM? It seems futile to depend on published benchmarks, so probably a more qualitative approach makes more sense.

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by ken2834 View Post
    So I see that Teradata and IBM are both share-nothing architectures, while Oracle is Shared-Disk. So, basically you would expect that IBM and Teradata would scale better than Oracle, however Oracle would be expected to provide faster peformance for queries that have to join across multiple partitions. What would the differences be between Teradata and IBM? It seems futile to depend on published benchmarks, so probably a more qualitative approach makes more sense.
    There are two ways to deal with cross partition joins in DB2 and Teradata:

    1) Reference tables (Dimensions, etc) and other small/medium size tables can be replicated on each partition using MQT's (Materialized Query Tables) with distribute option. The table is typically created on partition 0 (which does not contain any large partitioned tables) and then DB2 automatically keeps a copy of each table on each partition where the large tables are located (typically stored in bufferpool memory all of the time). The DB2 optimizer is smart enough to automatically look for the local copy of the MQT if it exists and use it to avoid the cross partition join.

    2). Large fact tables that might be joined together are typically designed so that they have one column in common that is part of the primary key, and that one column is made the partitioning key so that when a join takes place, the rows being joined have been hashed to the same partition. This may sound a little strange if you are not familiar with this design technique, but it is not that difficult to do in most circumstances.

    In addition, in cases where inter-partition communication needs to take place, a high speed private network (in addition to the public network) is set up between all the different physical servers to handle this traffic.

    Also, each physical server can typically handle 4-8 partitions quite easily, so not all inter-partition traffic needs to go through TCP/IP via the private network to another physical server. In fact, it is quite feasible in some cases to have all partitions on a single large machine with lots of CPU cores, and lots of memory so there is no issue whatsoever with cross partition joins.

    One of the main considerations in defining a share nothing architecture (aside from what is discussed above) is to have separate disk arrays for each partition, since they typically all operate in parallel with each are accessed together at the exact same time. This would be true regardless of how many different physical servers were configured.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  7. #7
    Join Date
    Sep 2006
    Posts
    26
    On the integration front, what would you say DB2 can do that Teradata and Oracle cannot do? It was somewhat of a struggle for me trying to define "integration" -- seems like it includes data access (basic ETL), data federation (bringing views of disparate systems together), and change data capture (updating data across the enterprise in real-time or near-real-time as changes happen).

    Seems to me, in general, you can get an ETL tool that will allow you to load from any source into any of these EDWs. They all seem to be working toward real time solutions. Is there any glaring difference that would make one stick out? Thanks.

  8. #8
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by ken2834 View Post
    On the integration front, what would you say DB2 can do that Teradata and Oracle cannot do? It was somewhat of a struggle for me trying to define "integration" -- seems like it includes data access (basic ETL), data federation (bringing views of disparate systems together), and change data capture (updating data across the enterprise in real-time or near-real-time as changes happen).

    Seems to me, in general, you can get an ETL tool that will allow you to load from any source into any of these EDWs. They all seem to be working toward real time solutions. Is there any glaring difference that would make one stick out? Thanks.
    Teradata and DB2 DPF can scale linearly to a large number of partitions when multiple servers are needed because they use a share nothing architecture. Oracle cannot.

    In most other respects, you can implement most data warehouse solutions with any of these products (including SQL Server). I haven't worked with Teradata recently, so not sure exactly how it compares with DB2 DPF on all features, but the architecture is similar.

    Equally important, are things such as the cost of the database products, quality of support provided by the vendor, whether you have existing skills in that database product already in house, etc.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •