11-10-08, 10:28
linakichi
Registered User
 
Join Date: Aug 2008
Posts: 45
RuNsTaTs Mysteries

Hi folks, I'm currently studying RUNSTATS on DB2 v8.2 on Red Hat Linux.

I'm wondering about the WITH DISTRIBUTION option of RUNSTATS, which is supposed to show the number of occurrences of a value (for the F part). But when I check a value's number of occurrences with:
SELECT column,COUNT(column) FROM table GROUP BY column

It gives a totally different value from the one in SYSCAT.COLDIST, for example:

COLVALUE : 66
VALCOUNT : 827

When I check using the above-mentioned SQL statement, the value 66 has only 437 duplicates.

I wonder whether "number of occurrences" and "duplicates" have different meanings?

Also, about LIKE STATISTICS: according to the manual, it is supposed to update the SUB_COUNT and SUB_DELIM_LENGTH columns in the SYSIBM.SYSCOLUMNS table, right? I ran RUNSTATS ... LIKE STATISTICS against a string-typed column, but the values of those columns didn't change at all. Does anyone have a clue?
11-11-08, 08:25
Peter.Vanroose
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Quote:
Originally Posted by linakichi
I kinda wonder about the WITH DISTRIBUTION option that RUNSTATS had
Not sure whether that's what DB2 8.2 for LUW provides, but the term "distribution statistics" normally refers to ranges, or bins, of values rather than individual value counts.

For columns with just a limited number of discrete values, the value count statistics ("basic statistics") are just fine. But for numeric and essentially continuous datatypes (like REAL, FLOAT, DECFLOAT, DECIMAL, or even INT), columns will tend to have lots of different values with almost none of them being equal. In those cases, knowing how many values fall between (say) 0 and 1, 1 and 2, 2 and 3, ... is more informative to the optimizer than knowing that (say) 1.5437 and 2.7345 are the only two values occurring more than once.
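To make the contrast concrete, here is a minimal sketch in Python. It is not how DB2 collects statistics; the column data is synthetic (an assumption for illustration), and the ranges 0 to 1, 1 to 2, 2 to 3 simply mirror the example above.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical column: 1000 near-continuous values in [0, 3], 4 decimals.
values = [round(random.uniform(0, 3), 4) for _ in range(1000)]

# Per-value counts: how often each exact value occurs.
value_counts = Counter(values)
repeated = {v: n for v, n in value_counts.items() if n > 1}

# Per-range counts for the ranges 0-1, 1-2 and 2-3.
# min(...) keeps a value of exactly 3.0 in the last range.
range_counts = Counter(min(int(v), 2) for v in values)

print("distinct values:", len(value_counts))
print("values seen more than once:", len(repeated))
for low in sorted(range_counts):
    print(f"{low} to {low + 1}: {range_counts[low]} rows")
```

Almost every value is distinct, so the per-value counts say little, while the three range counts summarize the whole column.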

The boundary values (in my example: 0, 1, 2, 3) set up so-called "bins", or quantiles, for which histogram (distribution) statistics can be computed from the full (or sampled) data.
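As a rough illustration of quantile boundaries, the sketch below picks boundaries so that each bin holds about the same number of rows, and records for each boundary how many rows have a value at or below it. The function name `quantile_boundaries` and the cumulative-count layout are assumptions made for this example; whether DB2 8.2 stores its quantile rows this way should be checked against the catalog documentation.

```python
import bisect

def quantile_boundaries(values, num_bins):
    """Return (boundary, cumulative_count) pairs for roughly equal-depth bins.

    cumulative_count is the number of rows whose value is <= boundary.
    """
    ordered = sorted(values)
    n = len(ordered)
    if num_bins < 1 or n < num_bins:
        raise ValueError("need at least one row per bin")
    result = []
    for k in range(1, num_bins + 1):
        # Last row that belongs to the k-th bin.
        boundary = ordered[(k * n) // num_bins - 1]
        # Rows at or below the boundary, including ties beyond that row.
        cumulative = bisect.bisect_right(ordered, boundary)
        result.append((boundary, cumulative))
    return result

sample = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
for boundary, cumulative in quantile_boundaries(sample, 2):
    print(boundary, cumulative)
```

Note that a cumulative count for a boundary value is generally larger than the number of rows equal to that value, so the two numbers should not be compared directly.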
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/