Results 1 to 8 of 8
  1. #1
    Join Date
    Sep 2011
    Posts
    5

    Unanswered: Questions about DPF

    I have a few questions about the DPF feature. Is it correct to think that the DPF feature is responsible for partitioning the actual data in the database across nodes, rather than providing multiple instances on different servers to access the database the way that Oracle's RAC feature does?

    If that's the case, does it make any sense to "connect" to different partitions ? What exactly does the set client connect_dbpartitionnum 0 / 1 / 2 do exactly, if it is merely that the data is being partitioned across nodes?

    Finally, does one connect to partitions as you connect to Oracle RAC instances, or do you merely connect to the database and this partitioning stuff is handled completely internally ?

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by theowne View Post
    Is it correct to think that the DPF feature is responsible for partitioning the actual data in the database across nodes, rather than providing multiple instances on different servers to access the database the way that Oracle's RAC feature does?
    Yes.

    Quote Originally Posted by theowne View Post
    does it make any sense to "connect" to different partitions ? What exactly does the set client connect_dbpartitionnum 0 / 1 / 2 do exactly, if it is merely that the data is being partitioned across nodes?
    It might make sense from the load balancing point of view, although it is uncommon. Also, if you are only interested in the data you know resides on a given partition, connecting to that partition will save you some network traffic.


    Quote Originally Posted by theowne View Post
    Finally, does one connect to partitions as you connect to Oracle RAC instances, or do you merely connect to the database and this partitioning stuff is handled completely internally ?
    You connect to the database.

  3. #3
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by theowne View Post
    I have a few questions about the DPF feature. Is it correct to think that the DPF feature is responsible for partitioning the actual data in the database across nodes, rather than providing multiple instances on different servers to access the database the way that Oracle's RAC feature does?
    There is no Oracle equivalent for DPF, which physically partitions a single table across multiple nodes and (like Teradata) is a shared nothing environment (not shared disk like Oracle RAC). It is primarily designed for data warehouses and has been renamed to be called DB2 InfoSphere Warehouse (even though it is really just DB2 ESE with the DPF license added).

    If that's the case, does it make any sense to "connect" to different partitions ? What exactly does the set client connect_dbpartitionnum 0 / 1 / 2 do exactly, if it is merely that the data is being partitioned across nodes?
    You could connect to a particular node, but for a data warehouse usually one just connects to node 0, which is the catalog owning node. If you were using DPF for OLTP and were only getting a small number of rows returned that you knew where all on the same node, and knew which partition/node that would be on, you could connect directly to that node.

    Finally, does one connect to partitions as you connect to Oracle RAC instances, or do you merely connect to the database and this partitioning stuff is handled completely internally ?
    With DPF you just connect to the database, and DB2 initiates parallel processes to get the answer from each partition/node and consolidates the results into a single answer to be returned to the client, so it is completely transparent to the application. The whole idea of DPF is to parallelize the processing of a single SQL statement.

    DB2 PureScale is the equitant to Oracle RAC, with multiple nodes using a shared disk architecture.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  4. #4
    Join Date
    Sep 2011
    Posts
    5
    Thanks for your replies, you two. Very helpful. From what you two are saying, I understand that DPF is a way to physically partition table data across nodes. I take it that connecting to one partition does not restrict you to accessing data stored on this partition, only that it can benefit performance if most of the current data resides there.

    I am currently working with XA transactions to DB2 databases, and I am noticing that the DPF system has trouble handling tightly coupled transactions (i.e., multiple transaction branches share the same locks to prevent deadlocks within a transaction). The system seems to rollback its branch and generate an exception to the application. Do you have any ideas as to what sort of architectural quirks might cause a DPF database to have problems with sharing locks between connections, while a non-partitioned database has no issue?

  5. #5
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by theowne View Post
    Thanks for your replies, you two. Very helpful. From what you two are saying, I understand that DPF is a way to physically partition table data across nodes. I take it that connecting to one partition does not restrict you to accessing data stored on this partition, only that it can benefit performance if most of the current data resides there.

    I am currently working with XA transactions to DB2 databases, and I am noticing that the DPF system has trouble handling tightly coupled transactions (i.e., multiple transaction branches share the same locks to prevent deadlocks within a transaction). The system seems to rollback its branch and generate an exception to the application. Do you have any ideas as to what sort of architectural quirks might cause a DPF database to have problems with sharing locks between connections, while a non-partitioned database has no issue?
    Hard to say without more information, but DPF (now called InfoSphere Data Warehouse) is not designed for OLTP. It is for data warehouses were large table scans are common for read only queries, which is why individual queries are parallelized accross the nodes. I imagine that XA connections to DPF are a bit unusual for that reason.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  6. #6
    Join Date
    Sep 2011
    Posts
    5
    Thanks, Marcus. Is it possible to specify which partition to connect to through JDBC?

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by theowne View Post
    Thanks, Marcus. Is it possible to specify which partition to connect to through JDBC?
    On modern SMP systems with multiple CPU's per server and multiple cores per CPU, there are typically multiple DPF partitions per physical node, even if there are multiple physical nodes. For example, you might have a DPF system with 16 logical nodes (partitions), with 4 physical servers, so that would be 4 logical nodes (partitions) per server. This is done becasue normally one would have 1 or 2 CPU cores per partition.

    When you connect to a DPF server using the host name, the coordinator logical node (partition) for that physical node handles the connection for all the logical nodes on that server. It will be the first partition (lowest partition number) on that server.

    So if you have multiple physical servers, you can connect to the one you want by using the host name, but not connecting directly to the partition. Because communication between partitions on the same server is much faster than communication between different servers via a network, this is not a problem, especially in a data warehouse envionment.

    If you are trying to do OLTP on DPF there "may" be a way to specify the partition you want to connect to using an enhancement IBM made specifically for TPC-C benchmark test on TPC - Homepage. You can look at the full report for the latest clustered solution using IBM/DB2 and see how that works, although as I said before, DPF is really designed for data warehouses.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  8. #8
    Join Date
    Aug 2008
    Location
    Toronto, Canada
    Posts
    2,367
    Quote Originally Posted by theowne View Post
    Thanks, Marcus. Is it possible to specify which partition to connect to through JDBC?
    Not sure if this helps or not... setting some parameter in db2cli.ini for type-2 JDBC: IBM Specifying the database partition server in a partitioned database environment with multiple logical partitions on one physical machine - United States

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •