Quote:
|
Originally Posted by linakichi
I kinda wonder about the WITH DISTRIBUTION option that RUNSTATS had
|
Not sure whether that's what DB2 8.2 for LUW provides, but the term "distribution statistics" normally refers to ranges, or bins, of values rather than individual value counts.
For columns with just a limited number of discrete values, the value count statistics ("basic statistics") are just fine. But for numeric and essentially continuous datatypes (like REAL, FLOAT, DECFLOAT, DECIMAL, or even INT), columns will tend to have lots of different values with almost none of them being equal. In those cases, knowing how many values fall between (say) 0 and 1, 1 and 2, 2 and 3, ... is more informative to the optimizer than knowing that (say) 1.5437 and 2.7345 are the only two values occurring more than once.
The boundary values (in my example: 0, 1, 2, 3) set up so called "bins", or quantiles, for which histogram (distribution) statistics can be computed given the full (or sampled) data.