Unanswered: Problems with Large Data Distribution Statistics
Hi all... It's been quite a while since my last post. Happy New Year !!
I'm using DB2 version 8.2 on Red Hat Linux.
On this occasion, I have a question about distribution statistics in DB2.
About frequency statistics
According to the definition, the frequency of each value should be
computable as select count(*) from table where col=value;
The numbers were right when I tried this on a small data set (fewer than 100 rows).
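To make the frequency check concrete, here is a minimal sketch of what I mean. It uses Python's sqlite3 module as a stand-in for DB2, so it only illustrates the exact count I compare against, not what DB2 itself stores. The table name t, the column name col, the sample values, and the helper exact_frequency are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")

# Hypothetical sample data: 1 appears twice, 2 once, 3 three times.
conn.executemany("INSERT INTO t (col) VALUES (?)",
                 [(1,), (1,), (2,), (3,), (3,), (3,)])


def exact_frequency(conn, value):
    """Return the exact number of rows whose col equals value."""
    cur = conn.execute("SELECT COUNT(*) FROM t WHERE col = ?", (value,))
    return cur.fetchone()[0]


freq_of_3 = exact_frequency(conn, 3)
print(freq_of_3)
```

This is the number I expected the frequency statistic for a given value to match.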
About quantile statistics
We should be able to compute each quantile count as select count(*) from table where col<=value;
The numbers were also correct when I tried this on a small data set.
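For the quantile check, here is a similar minimal sketch. It assumes that the quantile count for a value means the number of rows whose value is less than or equal to it, which is my reading of the definition. Again, sqlite3 is only a stand-in for DB2, and the table t, the column col, the data, and the helper exact_quantile_count are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")

# Hypothetical sample data: the integers 1 through 10, one row each.
conn.executemany("INSERT INTO t (col) VALUES (?)",
                 [(n,) for n in range(1, 11)])


def exact_quantile_count(conn, value):
    """Return the exact number of rows whose col is <= value."""
    cur = conn.execute("SELECT COUNT(*) FROM t WHERE col <= ?", (value,))
    return cur.fetchone()[0]


count_up_to_4 = exact_quantile_count(conn, 4)
print(count_up_to_4)
```

This is the number I expected the quantile statistic for a given value to match.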
The problem is that when I collected distribution statistics on 10000 rows of data (a single column of numbers), the values differed from what I expected: each one was either lower or higher than the count produced by the corresponding query.
I also noticed that with unique data, the distribution statistics are not collected when the number of rows is small, but they are collected when the number of rows is large. I wonder where those numbers come from...
Sorry for the long explanation, and if anyone has any references on this matter, please...