Results 1 to 3 of 3
  1. #1
    Join Date
    May 2004

    Unanswered: beware of avg() for large datasets

    Not a question, just a post about something I ran into today that surprised me a little.

    I have a huge dataset - about 700m rows, one column of which is a tinyint. I needed to take an average of that column, so I did this:

    select avg(mytinyint) from mytable
    which returned:

    Msg 8115, Level 16, State 2, Line 6
    Arithmetic overflow error converting expression to data type int.
    It turns out, if the sum() over the column is bigger than MAX_INT = 2147483647, then avg() will throw this error. This surprised me a little - I guess I was assuming sql server would use bigint internally when computing integer avg, but it uses int.

    so you have to do this:

    select avg(cast(mytinyint as bigint)) from mytable
    what's annoying to me is, it's impossible that an average of type T could be larger or smaller than the max/min allowed value of a type T. This means sql server devs could have internally always used bigint for sum and count, and then returned cast(sum/count as T) from avg() without possibility of overflow (unless sum or count was bigger than the max bigint! you are on your own if that happens i guess!).

    Because of this, you have to cast your column to a bigint to make it work if your dataset is large enough, even if the column is only 1 measly byte! kind of a waste of space if you ask me.

    here's a little example of what I'm talking about, that doesn't require you to import a 700m row dataset

    declare @t table (id int)
    insert into @t select 1 union all select 2147483647
    select avg(cast(id as bigint)) from @t  -- works
    select avg(id) from @t -- fails

  2. #2
    Join Date
    Nov 2005
    San Francisco, CA
    Success is the ability to go from one failure to another with no loss of enthusiasm.
    - Sir Winston Churchill

  3. #3
    Join Date
    Jun 2003
    Provided Answers: 1
    Very interesting. I will check it out tomorrow. Thanks for posting it.
    If it's not practically useful, then it's practically useless.

    blindman "sqlblindman"

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts