Results 1 to 5 of 5
  1. #1
    Join Date
    Nov 2009
    Posts
    20

    Unanswered: Card Count on SYSCAT.DATAPARTITIONS not matching - select count(*)

    This is a balanced warehouse - db2 9.7 we have 4 servers with 24 nodes.
    on AIX.

    I have loaded a partitioned table. we have 36 partitions. Aprox 3 years of monthly data.

    Once the load completed we did the runstats.

    Hash Key and Partition key
    DISTRIBUTE BY HASH ("MERCHANDISING_ID")
    PARTITION BY RANGE(LOAD_DATE NULLS LAST)

    the card value on the SYSCAT.DATAPARTITIONS does not match the values generated by
    select count(*), year(load_date), month(load_date)
    from XX.SALES
    group by year(load_date), month(load_date)
    order by 2,3
    with ur
    1 2 3
    ---------- ---- --
    8372 2012 2
    52661029 2012 3
    96387827 2012 4
    22410865 2012 5
    170530205 2012 6
    88096197 2012 7
    104658432 2012 8
    89644914 2012 9
    6165477 2012 10
    240835 2012 11
    1514474226 2012 12
    132758 2013 1
    2970639 2013 2
    232069016 2013 3
    92649289 2013 4
    454566025 2013 5
    92288592 2013 6
    100557164 2013 7
    112227889 2013 8
    32185285 2013 9

    SYSCAT.DATAPARTITIONS

    DATA_PRT_NME LOW_VAL HIGH_VAL ROWS_IN_TAB
    -------------------- --------------- --------------- -----------
    PART_106 '2010-10-01' '2010-11-01' 0
    PART_107 '2010-11-01' '2010-12-01' 0
    PART_108 '2010-12-01' '2011-01-01' 0
    PART_109 '2011-01-01' '2011-02-01' 0
    PART_110 '2011-02-01' '2011-03-01' 0
    PART_19 '2011-03-01' '2011-04-01' 0
    PART_20 '2011-04-01' '2011-05-01' 0
    PART_21 '2011-05-01' '2011-06-01' 0
    PART_22 '2011-06-01' '2011-07-01' 0
    PART_23 '2011-07-01' '2011-08-01' 0
    PART_24 '2011-08-01' '2011-09-01' 0
    PART_25 '2011-09-01' '2011-10-01' 0
    PART_26 '2011-10-01' '2011-11-01' 0
    PART_0 '2011-11-01' '2011-12-01' 0
    PART_1 '2011-12-01' '2012-01-01' 0
    PART_2 '2012-01-01' '2012-02-01' 0
    PART_3 '2012-02-01' '2012-03-01' 281
    PART_4 '2012-03-01' '2012-04-01' 1877008
    PART_5 '2012-04-01' '2012-05-01' 3641275
    PART_6 '2012-05-01' '2012-06-01' 829923
    PART_7 '2012-06-01' '2012-07-01' 6261730
    PART_8 '2012-07-01' '2012-08-01' 3313054
    PART_9 '2012-08-01' '2012-09-01' 3874667
    PART_11 '2012-09-01' '2012-10-01' 3262642
    PART_12 '2012-10-01' '2012-11-01' 241670
    PART_13 '2012-11-01' '2012-12-01' 9663
    PART_14 '2012-12-01' '2013-01-01' 55513070
    PART_15 '2013-01-01' '2013-02-01' 5460
    PART_16 '2013-02-01' '2013-03-01' 113944
    PART_17 '2013-03-01' '2013-04-01' 8733541
    PART_18 '2013-04-01' '2013-05-01' 3604030
    PART_100 '2013-05-01' '2013-06-01' 17274474
    PART_101 '2013-06-01' '2013-07-01' 3504793
    PART_102 '2013-07-01' '2013-08-01' 3875800
    PART_103 '2013-08-01' '2013-09-01' 4181872
    PART_104 '2013-09-01' '2013-10-01' 1178669
    PART_105 '2013-10-01' '2013-11-01' 0

    Why does the Card count not have all the values the select count returns?

  2. #2
    Join Date
    Apr 2012
    Posts
    1,034
    Provided Answers: 18
    what runstats command was used? any sampling?

  3. #3
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    In a DPF environment statistics are estimated based on a single partition -- the one where the RUNSTATS command is run or, if the table does not exist in that partition, the first partition of that table. As a result the total statistics will not exactly match row counts, especially if the distribution is skewed.

    RUNSTATS command
    ---
    "It does not work" is not a valid problem statement.

  4. #4
    Join Date
    Nov 2009
    Posts
    20
    Thanks - I will look at the runstats command.
    we are going to run db2_all "db2 connect to edwp; db2 RUNSTATS ON TABLE XXX ON ALL COLUMNS WITH DISTRIBUTION"
    and see what this returns

  5. #5
    Join Date
    Aug 2008
    Location
    Toronto, Canada
    Posts
    2,367
    Quote Originally Posted by Howardw View Post
    we are going to run db2_all "db2 connect to edwp; db2 RUNSTATS ON TABLE XXX ON ALL COLUMNS WITH DISTRIBUTION"
    You don't run it on all partitions. Check the following technote for information about where it's best to run runstats:
    IBM Table has cardinality of 0 after RUNSTATS is performed - United States


    "... RUNSTATS (by design) would calculate the value for the total number of rows in the table by simply extrapolating the count from that one partition and then multiplying it by the total number of partitions on which the tablespace is found.

    Therefore, to ensure that a non-zero and accurate cardinality is returned:

    1. Run the following query to determine how table data is distributed,

    select dbpartitionnum (<column>), count (*) from <schema>.<table> group by dbpartitionnum (<column>)

    then export DB2NODE to a partition that contains table data, then run RUNSTATS from that partition. "

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •