  1. #1
    Join Date
    Oct 2010
    Posts
    94

    Unanswered: DB2 UDB - 9.x Partitioning (AIX)

    Hello,

    I am looking for validation of DB2 9.x concepts related to table partitioning. I have not been able to get a clear description from the manuals, and many of the people I seek advice from seem to mix up table partitioning with DPF (I guess 9.x is still recent?). Here are the questions -

    1. When a user table is defined with its data on multiple table spaces - each pointing to a container on a distinct physical disk - do inserts to this table initiate parallel I/O to each of the table spaces/containers?
    2. Does a user table have to be defined across multiple table spaces to invoke parallel I/O? Or can multiple containers on distinct physical disks under a single table space support parallel I/O? The Admin manuals for 9.5 state that DB2 will write to one container through the end of the extent and then jump to another container - suggesting this would be a serial operation.

    3. Is the DDL clause "PARTITION BY" in UDB 9.x inconsequential for tables defined to a single table space, even if the table space is defined with multiple containers/disks?


    Your guidance is much appreciated!

    - G

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    There are at least three different concepts here, and they are independent of each other:

    • DPF - Database Partitioning Feature. You mention that, but I am not sure if you are considering implementing it (requires an extra-cost DPF license for DB2 ESE 9.1 and 9.5, or the equivalent InfoSphere Warehouse in 9.7).
    • Table Partitioning, also known as range partitioning. Typically tables are partitioned by a date or timestamp column with one month per partition, but you can partition on other columns.
    • Multiple Containers per tablespace. This can be used for DPF, Table Partitioning, or just regular non-partitioned tables. This is not exactly the same thing as parallelism.
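    For reference, all three concepts show up in the DDL. Here is a sketch with hypothetical table, column, and path names (DISTRIBUTE BY HASH assumes a DPF-enabled database; exact clause ordering per the CREATE TABLE syntax):

```sql
-- Multiple containers: two files on different physical disks in one tablespace
CREATE TABLESPACE ts_sales
  MANAGED BY DATABASE
  USING (FILE '/disk1/sales_c1' 10000,
         FILE '/disk2/sales_c2' 10000);

CREATE TABLE sales (
  sale_id   INTEGER NOT NULL,
  cust_id   INTEGER NOT NULL,
  sale_date DATE    NOT NULL
) IN ts_sales
  DISTRIBUTE BY HASH (cust_id)       -- DPF: rows hashed across database partitions
  PARTITION BY RANGE (sale_date)     -- Table Partitioning: one range per month
    (STARTING '2010-01-01' ENDING '2010-12-31' EVERY 1 MONTH);
```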


    To answer your questions:

    1. Inserts don't exactly occur in parallel, since one row is inserted at a time - unless you have multiple applications doing the inserts, and in that case it doesn't matter what kind of partitioning/containers are used. The exception would be DPF, where data can be loaded in parallel.
    2. No, multiple containers and parallelism are really different considerations, although having multiple containers can reduce physical disk contention in a similar way that disk striping reduces disk spindle contention.
    3. No, it is perfectly reasonable to use the same tablespace for all the partitions in Table Partitioning. I would have a separate tablespace for the indexes (at a minimum) to allow you to place them in a different bufferpool than the table data.
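    As a sketch of point 3 (hypothetical names): all the data partitions can live in one tablespace while the indexes go to their own tablespace, which can then be assigned a different bufferpool.

```sql
-- All ranges share ts_data; nonpartitioned indexes are placed in ts_index
CREATE TABLE orders (
  order_id   INTEGER NOT NULL,
  order_date DATE    NOT NULL
) IN ts_data INDEX IN ts_index
  PARTITION BY RANGE (order_date)
    (STARTING '2010-01-01' ENDING '2010-06-30' EVERY 1 MONTH);
```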
    Last edited by Marcus_A; 10-27-10 at 23:01.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Oct 2010
    Posts
    94
    Marcus,

    Thanks much for the detailed response! At the current DB2 shop, I can see we have both DPF and table partitioning options available. We are using DPF actively, and we also have the "partition by" clause coded in our table DDLs. However, since every table has data on no more than 1 tablespace (which does have more than 1 container), I keep thinking we are writing sequentially to each of these containers.

    Here are the alternatives I am trying to compare (and you might have already answered this for me, so apologies for any redundancy) -

    In the illustration below, every 'container' is a separate physical disk -

    A. Insert 1 million rows into a table T1 with data in 1 tablespace with 2 containers

    B. Insert 1 million rows into a table T2 (identical to T1 in structure) with data in 2 tablespaces - each with its own container

    Which of the above will experience higher IO?
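    In DDL, the two layouts would look roughly like this (a sketch with hypothetical paths, each assumed to be a distinct physical disk):

```sql
-- A: one tablespace, two containers; DB2 stripes extents across the containers
CREATE TABLESPACE ts_a
  MANAGED BY DATABASE
  USING (FILE '/disk1/a_c1' 10000,
         FILE '/disk2/a_c2' 10000);

-- B: two tablespaces, one container each; a table's data partitions
-- would have to be spread across them explicitly, e.g. via
-- PARTITION BY RANGE (... ) (PARTITION p1 ... IN ts_b1, PARTITION p2 ... IN ts_b2)
CREATE TABLESPACE ts_b1 MANAGED BY DATABASE USING (FILE '/disk1/b_c1' 10000);
CREATE TABLESPACE ts_b2 MANAGED BY DATABASE USING (FILE '/disk2/b_c2' 10000);
```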

    I am drawn to review this as I started seeing iowait% of 10 - 15% consistently during loads. Our buffer pools are still being utilized well - suggesting our IO was taking the hit. With my inexperience though, I COULD BE WAY OFF!!!!!

    - G

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    It often makes perfect sense (especially with DB2 LUW 9.7) to use Table Partitioning and DPF together on large tables.

    The parallelism comes into play by using Database Partitioning (DPF), irrespective of the number of containers or tablespaces. By definition, even a single tablespace has multiple parts (one for each DPF partition). Remember that the inserts are writing data to bufferpools, and only later are the pages for a tablespace written to disk.

    Most shops use SAN storage these days, so a single mount point is really a RAID array of multiple disks, so it is not as important to have multiple containers as it would be with JBOD, especially if you only have one CPU core for each DPF partition. If you have 2 CPU cores per DPF partition, then 2 containers might be better (at least theoretically). But it also depends on how many host bus adapters there are from your server to your storage sub-system, and their bandwidth.

    However, if you want to experiment with different configurations, go ahead and run some tests. The results might depend somewhat on whether you are trying to optimize inserts vs. selects. Also, you will get an order of magnitude faster results with the DB2 LOAD command compared to inserts. If you need to do inserts, it would likely help to start multiple threads to insert the data (which the LOAD command does for you in a DPF environment).
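    A minimal sketch of the LOAD approach (assuming a delimited input file /tmp/t1.del and an existing table T1 - both hypothetical names):

```shell
# LOAD writes formatted pages directly rather than processing row-by-row
# inserts; in a DPF environment it also splits and loads the database
# partitions in parallel.
db2 "LOAD FROM /tmp/t1.del OF DEL INSERT INTO T1"
```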

    I don't think Table Partitioning comes into play for insert/load performance. It is mostly for administrative ease (remove old data with partition detach) and for speeding up queries that only need to look at a single Table Partition. You can get query parallelism (intra-partition parallelism) with some select queries using Table Partitioning if the query needs data from multiple Table Partitions and you have intra-partitioning query parallelism configuration parms enabled for the database (you can google or search the docs on how to do this).
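    The administrative-ease point looks like this in practice (a sketch, assuming a partitioned table named sales with a named range part_2010_01): rolling out an old month becomes a quick detach instead of a mass DELETE.

```sql
-- Split the January 2010 range off into its own table,
-- which can then be archived or dropped
ALTER TABLE sales DETACH PARTITION part_2010_01 INTO TABLE sales_2010_01;
```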
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  5. #5
    Join Date
    Oct 2010
    Posts
    94
    Got it! Thanks again!

    One last thing - if the tablespace layout for partitioning doesn't do it, do you think I could try relocating our log files onto a separate disk from the data files? I am told (I am not allowed access to DB parms and system tables!!!) that the log files and the data files are on the same array - could this be behind the IO wait? Or could relocating the files alleviate the IO situation?

    Thanks for all the help,

    - G

  6. #6
    Join Date
    Oct 2010
    Posts
    94
    I must also share this - seems very interesting to me.

    Select dbpartitionnum(DPFkey), count(1)
    from table
    group by dbpartitionnum(DPFkey)
    -- above query returns 48 rows, confirming the 48 database partitions
    -- in the partition group for the table

    Select datapartitionnum(Partition key), count(1)
    from table
    group by datapartitionnum(Partition key)
    -- above query returns 1 row - suggesting there is only 1 data partition.
    -- Does this imply there is only one container assigned across all
    -- database partitions?
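    To double-check how many data partitions (ranges) the table actually has, one could also query the catalog directly (sketch, assuming hypothetical schema/table names):

```sql
-- One row per data partition of the table, with its range boundaries
SELECT DATAPARTITIONNAME, SEQNO, LOWVALUE, HIGHVALUE
FROM SYSCAT.DATAPARTITIONS
WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTABLE'
ORDER BY SEQNO;
```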

    - G

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    DB2 active recovery log files should definitely be on different disk spindles than the table data for redundancy purposes; otherwise you would not be able to do crash and/or rollforward recovery if a disk (or disk array) was corrupted.

    Aside from the above reason, your concern is correct that the active logs should be on the fastest disk possible (usually high-speed RAID 10), which is usually different from the RAID 5 arrays typically used for regular data. Whether this is causing your bottleneck, I don't know. There could be bottlenecks anywhere in the storage infrastructure, including the host bus adapters, or other systems/applications using the same disks.
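    For what it's worth, relocating the active logs is a single database configuration change once you (or your DBA) have access. A sketch, assuming a database named MYDB and a dedicated mount point /db2logs; the new path takes effect the next time the database is activated:

```shell
# Point the active log path at a dedicated disk/array
db2 "UPDATE DB CFG FOR MYDB USING NEWLOGPATH /db2logs"
```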

    When using parallelism (inter-partition parallelism with DPF, or intra-partition parallelism with Table Partitioning) you should strive for a shared-nothing architecture, especially with your disk subsystem. That means each partition that participates in parallelism should have its own disk spindles, which means separate arrays for each partition in a SAN environment.

    As I mentioned previously, many use Table Partitioning for other reasons besides parallelism, and it will probably not help to parallelize Table Partitions if you don't have more than one CPU core for each partition in a parallel query (same concept as DPF). Parallel inserts are a completely different matter (they require DPF and/or multiple threads doing inserts to get insert parallelism). But remember the discussion above about the difference between an insert updating pages in the bufferpools and the actual writing of those pages to disk later by the asynchronous page cleaners.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  8. #8
    Join Date
    Oct 2010
    Posts
    94
    Very helpful pointers! Thank you sir
