If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Questions about DPF

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-26-11, 16:57
theowne theowne is offline
Registered User
 
Join Date: Sep 2011
Posts: 5
Questions about DPF

I have a few questions about the DPF feature. Is it correct to think that the DPF feature is responsible for partitioning the actual data in the database across nodes, rather than providing multiple instances on different servers to access the database the way that Oracle's RAC feature does?

If that's the case, does it make any sense to "connect" to different partitions ? What exactly does the set client connect_dbpartitionnum 0 / 1 / 2 do exactly, if it is merely that the data is being partitioned across nodes?

Finally, does one connect to partitions as you connect to Oracle RAC instances, or do you merely connect to the database and this partitioning stuff is handled completely internally ?
Reply With Quote
  #2 (permalink)  
Old 10-26-11, 18:41
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by theowne View Post
Is it correct to think that the DPF feature is responsible for partitioning the actual data in the database across nodes, rather than providing multiple instances on different servers to access the database the way that Oracle's RAC feature does?
Yes.

Quote:
Originally Posted by theowne View Post
does it make any sense to "connect" to different partitions ? What exactly does the set client connect_dbpartitionnum 0 / 1 / 2 do exactly, if it is merely that the data is being partitioned across nodes?
It might make sense from the load balancing point of view, although it is uncommon. Also, if you are only interested in the data you know resides on a given partition, connecting to that partition will save you some network traffic.


Quote:
Originally Posted by theowne View Post
Finally, does one connect to partitions as you connect to Oracle RAC instances, or do you merely connect to the database and this partitioning stuff is handled completely internally ?
You connect to the database.
Reply With Quote
  #3 (permalink)  
Old 10-26-11, 18:44
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
Quote:
Originally Posted by theowne View Post
I have a few questions about the DPF feature. Is it correct to think that the DPF feature is responsible for partitioning the actual data in the database across nodes, rather than providing multiple instances on different servers to access the database the way that Oracle's RAC feature does?
There is no Oracle equivalent for DPF, which physically partitions a single table across multiple nodes and (like Teradata) is a shared nothing environment (not shared disk like Oracle RAC). It is primarily designed for data warehouses and has been renamed to be called DB2 InfoSphere Warehouse (even though it is really just DB2 ESE with the DPF license added).

Quote:
If that's the case, does it make any sense to "connect" to different partitions ? What exactly does the set client connect_dbpartitionnum 0 / 1 / 2 do exactly, if it is merely that the data is being partitioned across nodes?
You could connect to a particular node, but for a data warehouse usually one just connects to node 0, which is the catalog owning node. If you were using DPF for OLTP and were only getting a small number of rows returned that you knew where all on the same node, and knew which partition/node that would be on, you could connect directly to that node.

Quote:
Finally, does one connect to partitions as you connect to Oracle RAC instances, or do you merely connect to the database and this partitioning stuff is handled completely internally ?
With DPF you just connect to the database, and DB2 initiates parallel processes to get the answer from each partition/node and consolidates the results into a single answer to be returned to the client, so it is completely transparent to the application. The whole idea of DPF is to parallelize the processing of a single SQL statement.

DB2 PureScale is the equitant to Oracle RAC, with multiple nodes using a shared disk architecture.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #4 (permalink)  
Old 10-26-11, 19:27
theowne theowne is offline
Registered User
 
Join Date: Sep 2011
Posts: 5
Thanks for your replies, you two. Very helpful. From what you two are saying, I understand that DPF is a way to physically partition table data across nodes. I take it that connecting to one partition does not restrict you to accessing data stored on this partition, only that it can benefit performance if most of the current data resides there.

I am currently working with XA transactions to DB2 databases, and I am noticing that the DPF system has trouble handling tightly coupled transactions (i.e., multiple transaction branches share the same locks to prevent deadlocks within a transaction). The system seems to rollback its branch and generate an exception to the application. Do you have any ideas as to what sort of architectural quirks might cause a DPF database to have problems with sharing locks between connections, while a non-partitioned database has no issue?
Reply With Quote
  #5 (permalink)  
Old 10-26-11, 20:38
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
Quote:
Originally Posted by theowne View Post
Thanks for your replies, you two. Very helpful. From what you two are saying, I understand that DPF is a way to physically partition table data across nodes. I take it that connecting to one partition does not restrict you to accessing data stored on this partition, only that it can benefit performance if most of the current data resides there.

I am currently working with XA transactions to DB2 databases, and I am noticing that the DPF system has trouble handling tightly coupled transactions (i.e., multiple transaction branches share the same locks to prevent deadlocks within a transaction). The system seems to rollback its branch and generate an exception to the application. Do you have any ideas as to what sort of architectural quirks might cause a DPF database to have problems with sharing locks between connections, while a non-partitioned database has no issue?
Hard to say without more information, but DPF (now called InfoSphere Data Warehouse) is not designed for OLTP. It is for data warehouses were large table scans are common for read only queries, which is why individual queries are parallelized accross the nodes. I imagine that XA connections to DPF are a bit unusual for that reason.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #6 (permalink)  
Old 10-31-11, 11:11
theowne theowne is offline
Registered User
 
Join Date: Sep 2011
Posts: 5
Thanks, Marcus. Is it possible to specify which partition to connect to through JDBC?
Reply With Quote
  #7 (permalink)  
Old 10-31-11, 13:43
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
Quote:
Originally Posted by theowne View Post
Thanks, Marcus. Is it possible to specify which partition to connect to through JDBC?
On modern SMP systems with multiple CPU's per server and multiple cores per CPU, there are typically multiple DPF partitions per physical node, even if there are multiple physical nodes. For example, you might have a DPF system with 16 logical nodes (partitions), with 4 physical servers, so that would be 4 logical nodes (partitions) per server. This is done becasue normally one would have 1 or 2 CPU cores per partition.

When you connect to a DPF server using the host name, the coordinator logical node (partition) for that physical node handles the connection for all the logical nodes on that server. It will be the first partition (lowest partition number) on that server.

So if you have multiple physical servers, you can connect to the one you want by using the host name, but not connecting directly to the partition. Because communication between partitions on the same server is much faster than communication between different servers via a network, this is not a problem, especially in a data warehouse envionment.

If you are trying to do OLTP on DPF there "may" be a way to specify the partition you want to connect to using an enhancement IBM made specifically for TPC-C benchmark test on TPC - Homepage. You can look at the full report for the latest clustered solution using IBM/DB2 and see how that works, although as I said before, DPF is really designed for data warehouses.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #8 (permalink)  
Old 10-31-11, 16:34
db2girl db2girl is offline
∞∞∞∞∞∞
 
Join Date: Aug 2008
Location: Toronto, Canada
Posts: 1,816
Quote:
Originally Posted by theowne View Post
Thanks, Marcus. Is it possible to specify which partition to connect to through JDBC?
Not sure if this helps or not... setting some parameter in db2cli.ini for type-2 JDBC: IBM Specifying the database partition server in a partitioned database environment with multiple logical partitions on one physical machine - United States
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On