View Single Post
  #1 (permalink)  
Old 10-31-06, 19:50
jezemine jezemine is offline
another indirection layer
 
Join Date: May 2004
Location: Seattle
Posts: 1,311
beware of avg() for large datasets

Not a question, just a post about something I ran into today that surprised me a little.

I have a huge dataset - about 700m rows, one column of which is a tinyint. I needed to take an average of that column, so I did this:

Code:
select avg(mytinyint) from mytable
which returned:

Code:
Msg 8115, Level 16, State 2, Line 6
Arithmetic overflow error converting expression to data type int.
It turns out, if the sum() over the column is bigger than MAX_INT = 2147483647, then avg() will throw this error. This surprised me a little - I guess I was assuming sql server would use bigint internally when computing integer avg, but it uses int.

so you have to do this:

Code:
select avg(cast(mytinyint as bigint)) from mytable
what's annoying to me is, it's impossible that an average of type T could be larger or smaller than the max/min allowed value of a type T. This means sql server devs could have internally always used bigint for sum and count, and then returned cast(sum/count as T) from avg() without possibility of overflow (unless sum or count was bigger than the max bigint! you are on your own if that happens i guess!).

Because of this, you have to cast your column to a bigint to make it work if your dataset is large enough, even if the column is only 1 measly byte! kind of a waste of space if you ask me.

here's a little example of what I'm talking about, that doesn't require you to import a 700m row dataset

Code:
declare @t table (id int)
insert into @t select 1 union all select 2147483647
select avg(cast(id as bigint)) from @t  -- works
select avg(id) from @t -- fails
__________________
elsasoft.org
Reply With Quote