  1. #1
    Join Date
    Oct 2010
    Posts
    94

    Unanswered: Multi container table space read operations

    Hello,

    We have a data warehouse hosted on a DB2 database with 1 coordinator node and 6 data nodes (if that's what they are called!). The 6 nodes implement 8 partitions each, so this is a 48-partition database forming a BCU.

    Also, this is a hub-and-spoke warehouse, so all operational data is refreshed in the ODS daily. Then individual marts pull data from the hub at different intervals - some daily, others weekly, yet others monthly. All of them are on the same database.

    I just realized that all the table spaces and index spaces for the hub and the marts are on the same disks. Specifically, each of the 6 nodes has 32 LUNs. These 32 LUNs are made available as 32 different filesystems on each node (although the same LUN seems to be part of a couple of different volume groups) - 4 containers for each logical partition.

    Recently, due to performance concerns, we started taking a closer look at the nmon logs and found a significant percentage of CPU waits. The application server shows no signs of waits or peak-capacity operation, which eliminates it as a reason for the sizeable wait times on the DB CPUs. Neither nmon nor iostat shows 100% busy disks (except for some 30-second spikes), but their disk sections do show the same CPU waits that the CPU sections show.
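
    For reference, this is roughly how we have been sampling those numbers (the intervals and counts below are just examples):

        # 'wa' / wait column = CPUs idle while waiting on outstanding disk I/O
        vmstat -Iwt 2 30

        # per-disk %busy and throughput, 2-second samples
        iostat -d 2 30

        # capture a full nmon recording (30-second snapshots, 120 of them)
        nmon -f -s 30 -c 120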

    Also, the transaction logs for the individual nodes are on the same 32 LUNs that host the file systems for all the table spaces in the environment, and it has been established that only a very small fraction of the ETL application actually performs bulk loads - most of the traffic is logged inserts, updates and deletes.

    My question - would it not help if we segregated the hub from the marts by LUN? Sure, we'd reduce the number of containers available to each mart, but at least they wouldn't have to contend with 50 other sessions trying to get their data in and out. Thoughts?

    Sorry, I meant to include -

    DB2 version 9.5 FP 3
    Operating System AIX 5.3
    Last edited by getback0; 05-18-12 at 01:15. Reason: Sorry - ran out of power!!!

  2. #2
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    ------------
    Last edited by db2mor; 05-20-12 at 05:44. Reason: corrected

  3. #3
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by getback0 View Post
    Hello,

    We have a data warehouse hosted on a DB2 database with 1 coordinator node and 6 data nodes (if that's what they are called!). The 6 nodes implement 8 partitions each, so this is a 48-partition database forming a BCU.

    Also, this is a hub-and-spoke warehouse, so all operational data is refreshed in the ODS daily. Then individual marts pull data from the hub at different intervals - some daily, others weekly, yet others monthly. All of them are on the same database.

    I just realized that all the table spaces and index spaces for the hub and the marts are on the same disks. Specifically, each of the 6 nodes has 32 LUNs. These 32 LUNs are made available as 32 different filesystems on each node (although the same LUN seems to be part of a couple of different volume groups) - 4 containers for each logical partition.

    Recently, due to performance concerns, we started taking a closer look at the nmon logs and found a significant percentage of CPU waits. The application server shows no signs of waits or peak-capacity operation, which eliminates it as a reason for the sizeable wait times on the DB CPUs. Neither nmon nor iostat shows 100% busy disks (except for some 30-second spikes), but their disk sections do show the same CPU waits that the CPU sections show.

    Also, the transaction logs for the individual nodes are on the same 32 LUNs that host the file systems for all the table spaces in the environment, and it has been established that only a very small fraction of the ETL application actually performs bulk loads - most of the traffic is logged inserts, updates and deletes.

    My question - would it not help if we segregated the hub from the marts by LUN? Sure, we'd reduce the number of containers available to each mart, but at least they wouldn't have to contend with 50 other sessions trying to get their data in and out. Thoughts?

    Sorry, I meant to include -

    DB2 version 9.5 FP 3
    Operating System AIX 5.3
    If your partitions are sharing disk for tablespaces, you don't have a BCU (Balanced Configuration Unit).

    In a shared-nothing architecture like DPF, each partition should have its own disk - essentially its own disk array(s) dedicated to that partition. In addition, transaction logs should be on a different disk array than the data (but usually they can be shared among all the partitions on a single physical machine). Setting up disk properly for a DPF BCU is something that storage administrators in most companies don't understand, since they are used to sharing spindles across many different mount points on many different servers, which is fine for most OLTP systems, and even some data warehouse systems, but not for DB2 DPF.
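
    As a quick sanity check (just a sketch - BCUDB below is a placeholder database name and 3 is an example table space ID), you can confirm from DB2 itself where each partition's logs and containers actually live:

        # active log path as seen by every database partition
        db2_all "db2 get db cfg for BCUDB | grep -i 'Path to log files'"

        # container paths for one table space on every partition
        # ('db2 list tablespaces' shows the table space IDs)
        db2_all "db2 connect to BCUDB; db2 list tablespace containers for 3 show detail"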

    Likewise, you need to have sufficient disk controllers on each machine. If you have 8 partitions per machine, you would probably be fine with 4 disk controllers per machine just for the tablespace arrays, which would be 1 controller for every 2 DPF partitions (each with its own disk array(s)). Having 2 disk controllers per server might be OK, but could be a bottleneck. Each of the controllers should be assigned to a specific disk array (or group of disk arrays), which in turn is assigned to specific partitions. If you had 3 disk controllers assigned to the disk arrays for the partition tablespaces, that would be unbalanced by definition, since 8/3 is not a whole number; but you could have 3 if 2 were assigned to the partition arrays and one to the log arrays on each server. For a highly active system, 2 disk controllers for 8 partitions is a little light. So the disk I/O controller/adapter could be a bottleneck in your system, but obviously I cannot know that from my vantage point at this time. If you load data with inserts, rather than with the LOAD utility, you may need a different resource configuration to handle that.

    Also important is the number of CPU cores per physical server (which determines how many are available to each DPF partition). You need 1-2 CPU cores per partition, so that would be 8-16 CPU cores per physical machine.

    One thing that is necessary to understand when building BCUs is how much data is assigned to each partition, and how large the tables are. With 48 partitions, there can be a lot of overhead for each SQL statement, so it is possible that you have too many partitions for the size of tables you have, relative to the nature of the SQL that runs against them.

    DPF contention can also occur at the private network level to handle inter-server communication for DPF partitions needing data on other DPF partitions. The private network for this should be as fast as possible and not shared with the public network where applications interface to the data warehouse. Having 6 different physical servers requires a pretty hefty private network in most cases.

    Even if you have a good private network for inter-partition communications across servers, you need to minimize the inter-partition traffic needed for table joins, FK validation, etc. This is done by proper table design, which is usually specific to DPF systems (and counter-intuitive to most data modelers), and by using replicated MQTs to collocate smaller reference tables on all the partitions.
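
    As a rough illustration only (the schema, table and table space names are made up), a replicated MQT for a small reference table looks something like this:

        db2 "CREATE TABLE DW.DIM_REGION_R AS
               (SELECT REGION_ID, REGION_NAME FROM DW.DIM_REGION)
             DATA INITIALLY DEFERRED REFRESH DEFERRED
             IN TS_SMALL REPLICATED"

        # populate the replicated copy, then gather statistics so the optimizer will use it
        db2 "REFRESH TABLE DW.DIM_REGION_R"
        db2 "RUNSTATS ON TABLE DW.DIM_REGION_R"

    Joins to the reference table can then be resolved locally on each partition instead of shipping rows across the FCM network.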

    There are a lot of other things to look at, so I would recommend you hire an IBM or independent consultant for a few days or a week to make an assessment of your whole system configuration, database design, and workload.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  4. #4
    Join Date
    Oct 2010
    Posts
    94
    FIRST UP, thanks a lot Marcus - like always. I cannot agree more that we need an expert rather than speculation or a forum troubleshooting drill (like the one yours truly is running), but unfortunately, like I shared before, there seems to be some management stuff going on around this problem that is holding up progress on 'hiring the expert' for this issue.


    I would recommend you hire an IBM or independent consultant for a few days or a week to make an assessment of your whole system configuration, database design, and workload.
    In fact there was an IBM assessment done recently, but we got only the following back -
    I suspect the system might be experiencing: 'Unexpected paging space growth using DB2 version 9.1 or later on AIX platforms'
    The MEMUSE tab in nmon is showing %comp greater than %maxperm and %numperm below/near %minperm. This is an indicator of a memory issue. It is aggravated by high CPU pushing up the run queue, where all those processes hold onto memory.

    Approaches to addressing memory issues include:
    1. add memory
    2. reduce the database memory consumers (e.g. bufferpools)
    3. reduce the process memory consumers by controlling the number of queries allowed to run concurrently
    I hear from the admin staff that we made one change (approach #2) to reduce the 16K bufferpools from 80 GB to 70 GB, and the nmon logs are now being analyzed to assess the improvement, if any.
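
    If I understood the admins right, the change would have been of roughly this shape (BP16K is a placeholder bufferpool name and the 16 KB page arithmetic is mine) - and note the size applies on every partition the bufferpool is defined on:

        # 70 GB expressed in 16 KB pages: 70 * 1024 * 1024 / 16 = 4,587,520 pages
        db2 "ALTER BUFFERPOOL BP16K IMMEDIATE SIZE 4587520"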

    I want to share one more thing as to why I am still looking at the I/O even though IBM suggested a plausible memory issue (which does make sense - %comp memory is around 80% through the day even when the workload is easing off).

    When I run 'vmstat -Iwt 2', iostat, or nmon, they all report a sizeable percentage of CPUs waiting on disk I/O - in the specific column the command documents as disk I/O wait, as opposed to CPU idle. Hence we wanted to pursue this in-house too, since the IBM resource indicated the memory issue was a plausible root cause but not conclusive.

    So, getting back to your comments Marcus

    If your partitions are sharing disk for tablespaces, you don't have a BCU (Balanced Configuration Unit).
    I mis-stated that - I took a closer look at the mapping from database schema (i.e. hub or data mart tables) to table space to containers on all partitions, and cross-referenced that against the file system, logical volume, and physical volume/LUN information in storage, and I see there is no sharing (the commands I used are sketched after the mapping below). All LUNs are RAID 5.

    Physical Node 1 => Logical Partition 1 => Container 1 => File System 1 => Logical Volume 1 => Physical Volume 1 => LUN 1 = 400 GB
    Physical Node 1 => Logical Partition 1 => Container 2 => File System 2 => Logical Volume 2 => Physical Volume 2 => LUN 2 = 400 GB
    Physical Node 1 => Logical Partition 1 => Container 3 => File System 3 => Logical Volume 3 => Physical Volume 3 => LUN 3 = 400 GB
    Physical Node 1 => Logical Partition 1 => Container 4 => File System 4 => Logical Volume 4 => Physical Volume 4 => LUN 4 = 400 GB

    Physical Node 1 => Logical Partition 2 => Container 5 => File System 5 => Logical Volume 5 => Physical Volume 5 => LUN 5 = 400 GB
    ....
    so on for the logical partitions 2- 8 on physical node 1

    .......
    and on for the logical partitions 9-16 on physical node 2
    and on for the logical partitions 17-24 on physical node 3
    and on for the logical partitions 25-32 on physical node 4
    and on for the logical partitions 33-40 on physical node 5
    and on for the logical partitions 41-48 on physical node 6

    ......
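
    Roughly, this is how I stitched the mapping together on each node (the database, file system, LV and VG names below are placeholders):

        # table space IDs, then the container path behind a given table space
        db2 connect to BCUDB
        db2 list tablespaces
        db2 list tablespace containers for 3 show detail

        # file system -> logical volume -> volume group -> physical volume (hdisk)
        df -g /db2data/NODE0001/cont01
        lslv  lv_data01
        lsvg -l vg_data01
        lspv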

    Some interesting facts (to me anyways)

    1) We have about 13 TB on each of the data nodes (32 LUNs x 400 GB per physical node - 4 LUNs for each of the 8 logical partitions), or roughly 77 TB across the 6 data nodes. There is an additional 6.5 TB on node 0.
    2) All LUNs are about 87% full.
    3) Each node accesses its own 32 LUNs with only 2 disk controllers.
    To your point -
    Having 2 disk controllers per server might be OK, but could be a bottleneck. Each of the controllers should be assigned to a specific disk array (or group of disk arrays), which in turn is assigned to specific partitions.
    I can see we are alternating between the two disk controllers across the LUNs, but I am waiting for a final mapping from storage that will tie the 'hdisks' I have mappings for to the LUNs we know the controllers for - will update with the results.

    4)
    Also important is the number of CPU cores per physical server (which determines how many are available to each DPF partition). You need 1-2 CPU cores per partition, so that would be 8-16 CPU cores per physical machine.
    We have 4 dual-core CPUs on each of the 6 nodes - they are all IBM 9117-MMA POWER6 4.7 GHz with 31 GB physical RAM.


    One thing that is necessary to understand when building BCUs is how much data is assigned to each partition, and how large the tables are. With 48 partitions, there can be a lot of overhead for each SQL statement, so it is possible that you have too many partitions for the size of tables you have, relative to the nature of the SQL that runs against them.
    There are some ginormous tables in our DW, as you can tell from the disk utilization percentages - some in excess of 4 billion records. I think that points back to an archiving issue with our architecture - I don't know that our Business Intelligence team is doing a good job implementing a proactive retention policy, and I suspect some of these marts are storing far more data than is really being used, although every byte of storage out there adds to the workload.


    Even if you have a good private network for inter-partition communications across servers, you need to minimize the inter-partition traffic needed for table joins, FK validation, etc. This is done by proper table design, which is usually specific to DPF systems (and counter-intuitive to most data modelers), and by using replicated MQTs to collocate smaller reference tables on all the partitions.
    From first-hand experience I know these nodes are LPARs in the same rack, connected to storage (which is also local) by fibre optic cables.
    Agree with the inter-partition traffic item too - the DBAs mentioned this point as well. I have been advocating that our applications run ETL in a partition-aware mode, since our product does have APIs that allow dedicated connections from the ETL server straight to the data nodes - instead of hops to and from the coordinator node - at least when the joins are collocated. That measure is underway and we are seeing improvement as we keep cleaning up the code there.
    Last edited by getback0; 05-18-12 at 18:20.

  5. #5
    Join Date
    Oct 2010
    Posts
    94
    OK, the hdisk-to-LUN mappings are here - this is the updated mapping with the controller information:

    Physical Node 1 => Logical Partition 1 => Container 1 => File System 1 => Logical Volume 1 => Physical Volume 1 => Controller A => LUN 1 = 400 GB
    Physical Node 1 => Logical Partition 1 => Container 2 => File System 2 => Logical Volume 2 => Physical Volume 2 => Controller B => LUN 2 = 400 GB
    Physical Node 1 => Logical Partition 1 => Container 3 => File System 3 => Logical Volume 3 => Physical Volume 3 => Controller A => LUN 3 = 400 GB
    Physical Node 1 => Logical Partition 1 => Container 4 => File System 4 => Logical Volume 4 => Physical Volume 4 => Controller B => LUN 4 = 400 GB

    Physical Node 1 => Logical Partition 2 => Container 5 => File System 5 => Logical Volume 5 => Physical Volume 5 => LUN 5 = 400 GB
    ....
    so on for the logical partitions 2- 8 on physical node 1

    .......
    and on for the logical partitions 9-16 on physical node 2
    and on for the logical partitions 17-24 on physical node 3
    and on for the logical partitions 25-32 on physical node 4
    and on for the logical partitions 33-40 on physical node 5
    and on for the logical partitions 41-48 on physical node 6
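
    For reference, this is roughly how I lined the hdisks up against the controllers and the storage team's LUN list (hdisk and adapter names are just examples, and this assumes AIX MPIO):

        # fibre channel adapters (controllers) on the node
        lsdev -Cc adapter | grep fcs

        # which adapter/path each hdisk is reached through
        lspath -l hdisk4

        # LUN serial number / unique ID to match against the storage team's spreadsheet
        lscfg -vpl hdisk4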

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by getback0 View Post
    I hear from the admin staff that we made one change (approach #2) to reduce the 16K bufferpools from 80 GB to 70 GB
    Quote Originally Posted by getback0 View Post
    We have 4 dual-core CPUs on each of the 6 nodes - they are all IBM 9117-MMA POWER6 4.7 GHz with 31 GB physical RAM
    Bufferpools are usually defined per DB2 partition (member), so if my calculations are correct, you are allocating to the 16K bufferpool 70 GB * 8 partitions / 31 GB ~ 18 times more memory than is physically available?
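
    A quick way to check (BCUDB is a placeholder; NPAGES in SYSCAT.BUFFERPOOLS is the size used on each partition unless it is overridden in SYSCAT.BUFFERPOOLDBPARTITIONS, and -2 means AUTOMATIC):

        db2 connect to BCUDB
        db2 "SELECT BPNAME, PAGESIZE, NPAGES,
                    (BIGINT(NPAGES) * PAGESIZE) / 1073741824 AS GB_PER_PARTITION
             FROM   SYSCAT.BUFFERPOOLS"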

  7. #7
    Join Date
    Oct 2010
    Posts
    94
    So no one thinks the CPU waits for I/O (reported by all the utilities - vmstat, iostat, nmon) would have anything to do with the fact that data, indexes, and transaction logs all reside on the same LUNs for the data warehouse and 6 data marts? Even when practically all the ETL in the shop issues logged transactions (and uses LOADs for only an infinitesimal number of inserts)?

  8. #8
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    I suspect people are just trying to form a picture from the snippets of information in different posts.
    Folks naturally try to eliminate the obvious things first, but that is hard to do without all the facts.

    If I have understood you correctly, it does seem odd to me that an 8-core LPAR with only 31 GB of RAM can properly sustain 6 DB2 logical partitions.

    I would prefer that each DB2 logical partition had its own 32 GB of physical RAM available.

    As n_i mentioned, if you have 31 GB of RAM per LPAR, do you *really* have very large bufferpools?

    Can you give more facts on that?

    Back that up with figures about swap utilization at busy periods.

    Can you also specify the original version of DB2 on which your database got created? (e.g. v9.1, v9.5 , v9.7) as distinct from migrated.

    For the I/O waits, what is the physical storage? SAN? local? If SAN, are you multi-pathing on the HBAs? If SAN, what do the admins tell you about how their resources are performing? The devil is in the details, which we do not have but you might have. There's lots of info in this paper:
    IBM Techdocs Technote: AIX disk queue depth tuning for performance
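
    For example (the hdisk and adapter names below are placeholders), on AIX with MPIO you could check along these lines:

        # one line per path; each hdisk should normally have at least two, via different HBAs
        lspath

        # queue depth currently set on a LUN - the technote above discusses tuning this
        lsattr -El hdisk4 -a queue_depth

        # HBA attributes such as num_cmd_elems and max_xfer_size
        lsattr -El fcs0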

  9. #9
    Join Date
    Oct 2010
    Posts
    94
    Thanks for the link - it is very informative!

    As n_i mentioned, if you have 31 GB of RAM per LPAR, do you *really* have very large bufferpools? Can you give more facts on that? Back that up with figures about swap utilization at busy periods.
    The %comp memory stays up even when the workload has eased off (CPUs report ~0% busy) - it holds steady at around 90%. IBM is looking into potential memory-hogging issues.
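
    These are the raw figures I am pulling for that from one of the nodes (just a sketch):

        # overall real-memory usage (figures are in 4 KB frames)
        svmon -G

        # paging space size and percent used
        lsps -a

        # the numperm / maxperm / minperm percentages the IBM note refers to
        vmstat -v | grep -i perm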

    Can you also specify the original version of DB2 on which your database got created? (e.g. v9.1, v9.5 , v9.7) as distinct from migrated.
    V9.1

    For the I/O waits, what is the physical storage? SAN? local?
    SAN

    If SAN, are you multi-pathing on the HBAs?
    I will check and get back to you

    If SAN, what do the admins tell you about how their resources are performing?
    They say they are not seeing anything out of the ordinary ... but that's my contention - with everything on the same LUNs (data for the warehouse and all the data marts, plus the indexes, plus the transaction logs), I am thinking we won't necessarily see poor I/O service times, but rather really long wait queues. No?
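
    That is exactly the piece I want to quantify next - something along these lines (the extended iostat report on AIX 5.3; device names and intervals are examples):

        # -D adds per-disk service-time and queue statistics (avg serv, avgwqsz, sqfull);
        # long queue times with normal service times would support the contention theory
        iostat -D 5 12

        # sar's disk report shows average wait vs. average service time per device
        sar -d 5 12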

  10. #10
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    Please confirm if you have multiple db2-logical-partitions running in a physical-node (lpar) that has only 31GB ram total.

  11. #11
    Join Date
    Oct 2010
    Posts
    94
    Please confirm if you have multiple db2-logical-partitions running in a physical-node (lpar) that has only 31GB ram total.
    That is correct.

  12. #12
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    So if you have 6 logical partitions in 31 GB, that's around 5 GB each of real memory.
    That just seems insane to me, because you cannot have reasonably sized bufferpools.
    Once again, are your nodes paging like mad (if you have configured 70 GB bufferpools)?

    Separately, can you confirm that your main tablespaces are using 'NO FILE SYSTEM CACHING' (because this was not the default on v9.1 or v9.5 for JFS on AIX).
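
    One way to check (BCUDB is a placeholder database name; the table space snapshot reports the setting per table space):

        db2 connect to BCUDB
        db2 get snapshot for tablespaces on BCUDB | grep -iE "tablespace name|file system caching"

        # db2pd also shows an FSC (file system caching) column, if I recall correctly
        db2pd -d BCUDB -tablespaces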

  13. #13
    Join Date
    Oct 2010
    Posts
    94
    Separately, can you confirm that your main tablespaces are using 'NO FILE SYSTEM CACHING' (because this was not the default on v9.1 or v9.5 for JFS on AIX).
    Yes, the tablespaces are created with 'NO FILE SYSTEM CACHING'.


    So if you have 6 logical partitions in 31 GB, that's around 5 GB each of real memory. That just seems insane to me, because you cannot have reasonably sized bufferpools. Once again, are your nodes paging like mad (if you have configured 70 GB bufferpools)?
    Actually, like I said, we have 8 logical partitions per member node - I guess that makes it worse. I am going to consult the nmon captures again for the paging data.

  14. #14
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by getback0 View Post
    I am going to consult the nmon captures again
    Maybe, as a DBA, you should try consulting db2pd as well.
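
    For example (BCUDB is a placeholder; as far as I recall all of these options exist on 9.5), a few db2pd reports worth capturing during a slow window:

        db2pd -d BCUDB -bufferpools     # configured vs. allocated bufferpool pages
        db2pd -d BCUDB -tablespaces     # table space states and container paths
        db2pd -d BCUDB -logs            # log usage and the current log path
        db2pd -dbptnmem                 # per-partition database memory consumption
        db2pd -edus                     # engine threads and their CPU consumption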

  15. #15
    Join Date
    Oct 2010
    Posts
    94
    Maybe, as a DBA, you should try consulting db2pd as well.
    I am not the DBA ... I am jumping in to help because the DBA staff seem to have pretty much given up! I will check if they can give me the db2pd logs.
