If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > DB2 UDB - 9.x Partitioning (AIX)

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-27-10, 21:12
getback0 getback0 is offline
Registered User
 
Join Date: Oct 2010
Posts: 73
DB2 UDB - 9.x Partitioning (AIX)

Hello,

I am looking for validation of DB2 9.x concepts related to table partitioning - I've not been able to get a clear description from the manuals - and a lot of people I seek advice from, seem to mix up partitioning with DPF (guess 9.x is still recent??). Here're the questions -
  1. When a user table is defined with its data on multiple table spaces - each pointing to a container defined to a distinct physical disk - do inserts to this table initiate parallel I/O to each of the table spaces/containers??
  2. Does a user table have to be defined to multiple table spaces to invoke parallel I/O? Is it possible to have multiple containers on distinct physical disks under the one table space support parallel I/O? The Admin manuals for 9.5 state that DB2 will write to one container through the end of the extent and then jump to another container - suggesting this would be a serial operation.

  3. Is the DDL clause "Partition by" in UDB 9.x inconsequential for tables defined to a single table space? even if the TS is defined to multiple containers/disks?

Your guidance is much appreciated!

- G
Reply With Quote
  #2 (permalink)  
Old 10-27-10, 21:52
Marcus_A Marcus_A is online now
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
There are at least three different concepts here, that are independent of eachother:
  • DPF - Database Partitioning Feature. You mention that, but I am not sure if you are considering implementing this (requires a extra cost DPF license for DB2 ESE 9.1 and 9.5, or equivilent InfoSphere Warehouse in 9.7).
  • Table Partitioning, also known as range partitioning. Typically tables are partitioned by a date or timestamp column with one month per partition, but you can partition on other columns.
  • Multiple Containers per tablespace. This can be used for DPF, Table Partitioning, or just regular non-partitioned tables. This is not exactly the same thing as parallelism.

To answer your questions:
  1. Inserts don't exactly occur in parallel since one row is inserted at a time, unless you have mutliple applications doing the inserts, and in that case it doesn't matter what kind of partitioning/containers are used. The exception would be in DPF, where data can be loaded in parallel.
  2. No, the issue of multiple containers versus parallelism are really different considerations, although having multiple containers can reduce physical disk contention in a similar way that disk stripping reduces disk spindle contention.
  3. No, it is perfectly reasonable to use the same tablespace for partitions in Table Partitioning. I would have a separate tablespace for the indexes (at a minimum) to allow you to place it in a different bufferpool than the table data.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390

Last edited by Marcus_A; 10-27-10 at 22:01.
Reply With Quote
  #3 (permalink)  
Old 10-28-10, 01:50
getback0 getback0 is offline
Registered User
 
Join Date: Oct 2010
Posts: 73
Marcus,

Thanks much for the detailed response! At the current DB2 shop, I can see we have both DPF and table partitioning options available. We are using DPF actively, and we also have the "partition by" clause coded in our table DDLs. However, since every table has data on no more than 1 tablespace (which does have more than 1 container), I keep thinking we are writing sequentially to each of these containers.

Here are the alternatives I am trying to compare (and you might have already answered this for me, so apologies for any redundancy) -

In the illustration below, every 'container' is a separate physical disk -

A. Insert 1 million rows into a table T1 with data in 1 tablespace with 2 containers

B. Insert 1 million rows into a table T2 (identical to T1 in structure) with data in 2 tablespaces - each with its own container

Which of the above will experience higher IO?

I am drawn to review this as I started seeing iowait% of 10 - 15% consistently during loads. Our buffer pools are still being utilized well - suggesting our IO was taking the hit. With my inexperience though, I COULD BE WAY OFF!!!!!

- G
Reply With Quote
  #4 (permalink)  
Old 10-28-10, 02:14
Marcus_A Marcus_A is online now
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
It often makes perfect sense (especially with DB2 LUW 9.7) to use Table Partitioning and DPF together on large tables.

The parallelism comes into play by using Database Partitioning (DPF), irrespective of the number of containers or tablespaces. By definition, even a single tablespace has multiple parts (one for each DPF partition). Remember that the inserts are writting data to bufferpools, and only later are the pages for a tablespace written to disk.

Most shops use SAN storage these days, so a single mount point is really a RAID array of multiple disks, so it not as important to have multiple containers as it would be with JBOD, especially if you only have one CPU core for each DPF partition. If you have 2 CPU cores per DPF partition, then 2 containers might be better (at least theoretically). But it also depends on how many host bus adapters (and its bandwidth) there are from your server to your storage sub-system.

However, if you want to experiment with different configurations, go ahead and run some tests. The results might depend somewhat on whether you trying to optimize inserts vs selects. Also, you will get an order of magnitude faster results with DB2 LOAD command compared to inserts. If you need to do inserts, it would likely help to start multiple threads to insert the data (which the LOAD command does for you in a DPF environment).

I don't think Table Partitioning comes into play for insert/load performance. It is mostly for administrative ease (remove old data with partition detach) and for speeding up queries that only need to look at a single Table Partition. You can get query parallelism (intra-partition parallelism) with some select queries using Table Partitioning if the query needs data from multiple Table Partitions and you have intra-partitioning query parallelism configuration parms enabled for the database (you can google or search the docs on how to do this).
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #5 (permalink)  
Old 10-28-10, 12:33
getback0 getback0 is offline
Registered User
 
Join Date: Oct 2010
Posts: 73
Got it! Thanks again!

One last thing - if the tablespace layout for partitioning doesn't do it, do you think I could try relocating our log files on to a separate disk from the data files? I am told (I am not allowed access to DB parms and system tables!!!) that the log files and the data files are on the same array - could this be behind the IO wait? or could relocating the files alleviate the IO situation?

Thanks for all the help,

- G
Reply With Quote
  #6 (permalink)  
Old 10-28-10, 13:46
getback0 getback0 is offline
Registered User
 
Join Date: Oct 2010
Posts: 73
I must also share this - seems very interesting to me.

Select dbpartitionnum(DPFkey), count(1) from table
group by dbpartitionnum(DPFkey) -- above query returns 48 tuples confirming the 48 database partitions in the partition group for the table



Select datapartitionnum(Partition key), count(1) from table
group by datapartitionnum(Partition key) -- above query returns 1 tuple - suggesting there is only 1 partition - this implies there is only container assigned to all database partitions, correct?

- G
Reply With Quote
  #7 (permalink)  
Old 10-28-10, 13:52
Marcus_A Marcus_A is online now
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,198
DB2 active recovery files should definitely be on different disk spindles than the table data for redundancy purposes, otherwise you would not be able to have crash and/or rollforward recovery in case a disk (or disk array) was corrupted.

Aside from the above reason, your concern is correct that the active logs should be on the fastest disk possible (usually high speed RAID 10), which is usually different than the type of RAID 5 arrays typically used for regular data. Whether this is causing your bottleneck, I don't know. There could be bottlenecks anywhere in the Storage infrastructure including the host bus adapters, or other systems/applications using the same disks.

When using parallelism (inter-partition parallelism with DPF, or intra-partition parallelism with Table Partitioning) you should strive for a shared-nothing architecture, especially with your disk subsystem. That means that for each partition that uses parallelism, each partition should have its own disk spindles, which means separate arrays for each partition in a SAN environment. As I mentioned previously, many use Table Partitioning for other reasons besides parallelism, and it will probably not help to parallelize Table Parturitions if you don't have more than once CPU core for each partition in a parallel query (same concept as DPF). Parallel inserts are a completely different matter (requires DPF and/or multiple threads doing inserting to get insert parallelism). But remember the discussion above about the difference between doing an insert that updates data in bufferpools, and the actual updating of pages on disk at a later time with asynchronous page cleaners.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #8 (permalink)  
Old 10-28-10, 14:06
getback0 getback0 is offline
Registered User
 
Join Date: Oct 2010
Posts: 73
Very helpful pointers! Thank you sir
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On