Page 1 of 2 12 LastLast
Results 1 to 15 of 16
  1. #1
    Join Date
    Mar 2007
    Posts
    212

    Unanswered: Strange results when working out average of a count

    Hi

    I have the following mySQL table:

    Code:
    CREATE TABLE `enquiries` (
      `enquiry_id` int(10) unsigned NOT NULL auto_increment,
      `enquiry_date` datetime NOT NULL,
      `enquiry_type` varchar(50) default NULL,
      PRIMARY KEY  (`enquiry_id`)
    )
    I am trying to work out the average enquiries by day of week. I use the following sql. This produces correct results. for example if i have a total of 10 enquires over 2 Mondays the average enquiries for a Monday will be 5. You will note from the following SQL, within the inline statement I have grouped the qury using the enquiry_date.

    Code:
    select day_of_week, 
           day_name,
           AVG(total_enquiries) as avg_enquiries
    from (select  dayofweek(enquiry_date) as day_of_week,
                  dayname(enquiry_date) as day_name,
                  count(*) as total_enquiries
          from enquiries
          group by 
          enquiry_date) il
    group by
      day_of_week, 
      day_name
    Now heres the question if I use the following SQL instead of an average I end up with a total i.e 10 enquires for a Monday. You will note in the following SQL, within the inline statment I have done the grouping this time using the aliases of day_of_week and day_name.

    Code:
    select day_of_week, 
           day_name,
           AVG(total_enquiries) as avg_enquiries
    from (select  dayofweek(enquiry_date) as day_of_week,
                  dayname(enquiry_date) as day_name,
                  count(*) as total_enquiries
          from enquiries
          group by 
          day_of_week,
          day_name) il
    group by
      day_of_week, 
      day_name
    Can any one explain to me why you get this phenomena. Why cant I get an average using either of the SQL statements.

    Any help would be appreciated. I have attached a dump to replicate.
    Attached Files Attached Files

  2. #2
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Why do you post this in the ANSI SQL forum, if it's clearly a MySQL question?

  3. #3
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Your query is rather strange. This is the order in which a subselect is to be evaluated:

    1. FROM clause
    2. WHERE clause
    3. GROUP BY clause
    4. HAVING clause
    5. SELECT clause
    6. ORDER BY clause
    7. FETCH FIRST clause

    So trying to reference in the GROUP BY clause something in the SELECT list can't work because the SELECT list has not yet been evaluated. I am surprised that your database system actually accepts this syntax. I'd count it as a bug.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  4. #4
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Quote Originally Posted by stolze View Post
    So trying to reference in the GROUP BY clause something in the SELECT list can't work because the SELECT list has not yet been evaluated.
    Hmm, you always reference something from the SELECT list in the GROUP BY.
    Code:
    SELECT col1, col2, count(*)
    FROM the_table
    GROUP BY col1, col2
    Some DBMS do allow to reference an alias from the SELECT list in the group by (in this case MySQL, but PostgreSQL allows it as well)

    The real difference between the two inner selects is that the first one does a wrong group by as it groups by an expression (inquiry_date) that is not part of the select list.

    Allowing unrelated (and thus wrong) GROUP BY expressions is a shortcoming (bug?) of MySQL.

    The second inner select is actually correct because it groups by both (non-aggregate) expressions in the select list (it's using the column alias for the group by but conceptionally it is correct)

  5. #5
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by shammat View Post
    Hmm, you always reference something from the SELECT list in the GROUP BY.
    Code:
    SELECT col1, col2, count(*)
    FROM the_table
    GROUP BY col1, col2
    No, you reference a column from the table "the_table" here. In fact, you can group on a column that is not even in the SELECT list.

    Some DBMS do allow to reference an alias from the SELECT list in the group by (in this case MySQL, but PostgreSQL allows it as well)
    I admit, I didn't search very long, but I couldn't find it in the ISO/ANSI standard. So you are probably talking about product-specific extensions.

    The real difference between the two inner selects is that the first one does a wrong group by as it groups by an expression (inquiry_date) that is not part of the select list.

    Allowing unrelated (and thus wrong) GROUP BY expressions is a shortcoming (bug?) of MySQL.
    No, that's perfectly valid. Grouping by an expression in the SELECT list only and using the alias for that expression in the GROUP BY clause is not standard SQL.

    The second inner select is actually correct because it groups by both (non-aggregate) expressions in the select list (it's using the column alias for the group by but conceptionally it is correct)
    You can have n columns/expressions in the SELECT list (aggegations or not) and you can have m completely different columns/expressions in the GROUP BY clause. Both lists are not tied to each other. In most cases, it doesn't make much sense to group on something not in the SELECT list or to leave out some non-aggregation column from the grouping sets, but semantically that is permissible. For example, if you have a functional dependency between columns C1 and C2, where C2 is fully dependent on C1, grouping by (C1, C2) is identical to grouping by C1 only because the combination cannot produce more distinct rows than C1 alone.

    If there is something to complain about, it is probably the order of the different clauses. If SELECT would come at the very end, it is (hopefully) easier to understand that it is evaluated after everything coming before it.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by stolze View Post
    Grouping by an expression in the SELECT list only and using the alias for that expression in the GROUP BY clause is not standard SQL.
    well i do not have a copy of the standard, so i always run my query through the mimer validator

    this query --
    Code:
    select a, b, c*1.1+d AS e
    from t
    group by e
    apparently conforms to SQL-2003 and even SQL-99

    now, maybe the mimer validator isn't accurate, and maybe something will work perfectly okay in some database systems but not others...

    ... but i wouldn't personally get too concerned about the standards
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Dec 2008
    Location
    At work...
    Posts
    92
    Quote Originally Posted by shammat View Post
    Some DBMS do allow to reference an alias from the SELECT list in the group by (in this case MySQL, but PostgreSQL allows it as well)
    But ANSI SQL doesn't. Column aliases are typically used to rename the result columns, and in the ORDER BY clause.

    As in Stolze's evaluation order:
    5. SELECT clause
    6. ORDER BY clause

  8. #8
    Join Date
    Dec 2008
    Location
    At work...
    Posts
    92
    Quote Originally Posted by r937 View Post
    well i do not have a copy of the standard, so i always run my query through the mimer validator

    this query --
    Code:
    select a, b, c*1.1+d AS e
    from t
    group by e
    apparently conforms to SQL-2003 and even SQL-99
    The Mimer SQL Validator verifies syntax, it does NOT verify column names. How is it supposed to know if your table t has a column named e or not?

  9. #9
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Quote Originally Posted by stolze View Post
    I admit, I didn't search very long, but I couldn't find it in the ISO/ANSI standard. So you are probably talking about product-specific extensions.
    That's what I said

    You can have n columns/expressions in the SELECT list (aggegations or not) and you can have m completely different columns/expressions in the GROUP BY clause. Both lists are not tied to each other.
    Well all the database that I use see this differently. Oracle, PostgreSQL, SQL Server and DB2 throw an error if you have a column in the GROUP BY part that is not part of the SELECT list.

    In most cases, it doesn't make much sense to group on something not in the SELECT list or to leave out some non-aggregation column from the grouping sets
    I'd say there it never makes sense.

  10. #10
    Join Date
    Dec 2008
    Location
    At work...
    Posts
    92
    Quote Originally Posted by shammat View Post
    Well all the database that I use see this differently. Oracle, PostgreSQL, SQL Server and DB2 throw an error if you have a column in the GROUP BY part that is not part of the SELECT list.

    I'd say there it never makes sense.
    Odd, I just checked MS SQL Server, DB2 and MySQL, and none of them complained.

    http :// sqlzoo.net/

    Query:
    SELECT sum(population)
    FROM bbc
    GROUP BY region;



    The grouping columns are typically wanted in the select list, but once in a while I write queries where I see no use of them. (It has happened, but I can't remember exactly what I did...)

  11. #11
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by shammat View Post
    That's what I said
    Right. But since this is a group about the ISO/ANSI standard, those product-specific things are irrelevant for answers with respect to what is or is not covered in ISO/ANSI SQL.

    I'd say there it never makes sense.
    That's a strong statement. I don't know which requirements a certain application may have to need such a feature. So I would never be in a position to claim that it "never makes sense". But that's beside the point. The point is that grouping the result rows is an orthogonal concept to the SELECT list. The only dependency that exists is that aggregate functions are applied to a group defined by a GROUP BY clause. For everything else, there are no strings attached. That's what the standard requires and what products implement.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  12. #12
    Join Date
    Mar 2007
    Posts
    212
    Ok, a number of you have made suggestions buts what the correct SQL to obtain the correct results?

    Basically I want the average number of enquiries for each DAY OF THE WEEK i.e Mon-Sun.

    I've run both SQL queries in both MySQL and SQL Server and they both produce the same results.

    As far as I gather the second SQL query doesn't produce the desired results because the sub-query is grouping on the day of the week and since there are only ever 7 days in a week it will group the 2 Mondays together with 10 as the total. In other word it only produces 7 rows of data for each day of the week. The main select statement then does an avg on this sub query and since the sub-query now only has 1 row to represent Monday then the avg will always equal the total in the subquery for each day of the week becuase you are always dividing by 1.

    Now using the first SQL query, by default the AVG function seems to produce the avg as a whole number - this is in SQL Server. I've tried the following to display the average with 2 decimal places but it doesnt work. Is there any way to achieve this?

    Code:
    ROUND(AVG(total_enquiries),2) as avg_enquiries

  13. #13
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Quote Originally Posted by JarlH View Post
    Odd, I just checked MS SQL Server, DB2 and MySQL, and none of them complained.
    Extremely odd...
    Postgres:
    Code:
    c:\>psql
    Aktive Codepage: 1252.
    psql (8.4.1)
    Type "help" for help.
    
    postgres=> create table group_test (id integer primary key, some_column varchar(20));
    NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "group_test_pkey" for table "group_test"
    CREATE TABLE
    postgres=> select id, count(*) from group_test group by some_column;
    ERROR:  column "group_test.id" must appear in the GROUP BY clause or be used in an aggregate function
    LINE 1: select id, count(*) from group_test group by some_column;
                   ^
    postgres=>
    Oracle:
    Code:
    C:\temp>sqlplus scott/tiger@oradb
    
    SQL*Plus: Release 10.2.0.1.0 - Production on Thu Dec 10 22:39:52 2009
    
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    
    
    Connected to:
    Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
    With the OLAP and Data Mining options
    
    SQL> create table group_test (id integer primary key, some_column varchar(20));
    
    Table created.
    
    SQL> select id, count(*) from group_test group by some_column;
    select id, count(*) from group_test group by some_column
           *
    ERROR at line 1:
    ORA-00979: not a GROUP BY expression
    
    
    SQL>
    DB2 (9.7)
    Code:
    c:\temp>db2 connect to sample
    
       Database Connection Information
    
     Database server        = DB2/NT 9.7.0
     SQL authorization ID   = DB2ADMIN
     Local database alias   = SAMPLE
    
    c:\temp>db2 create table group_test (id integer not null primary key, some_column varchar(20))
    DB20000I  The SQL command completed successfully.
    
    c:\temp>db2 select id, count(*) from group_test group by some_column
    SQL0119N  An expression starting with "ID" specified in a SELECT clause,
    HAVING clause, or ORDER BY clause is not specified in the GROUP BY clause or
    it is in a SELECT clause, HAVING clause, or ORDER BY clause with a column
    function and no GROUP BY clause is specified.  SQLSTATE=42803
    
    c:\temp>
    SQL Server 2005
    Code:
    c:\temp>osql -S WALLACE\SQLEXPRESS -U myuser
    Password:
    1> create table group_test (id integer primary key, some_column varchar(20))
    2> go
    1> select id, count(*) from group_test group by some_column
    2> go
    Msg 8120, Level 16, State 1, Server WALLACE\SQLEXPRESS, Line 1
    Column 'group_test.id' is invalid in the select list because it is not contained in either an aggregate function or the
    GROUP BY clause.
    1>

  14. #14
    Join Date
    Dec 2008
    Location
    At work...
    Posts
    92
    Quote Originally Posted by shammat View Post
    Extremely odd...
    [...]
    1> select id, count(*) from group_test group by some_column
    2> go
    Msg 8120, Level 16, State 1, Server WALLACE\SQLEXPRESS, Line 1
    Column 'group_test.id' is invalid in the select list because it is not contained in either an aggregate function or the
    GROUP BY clause.
    1>[/code]
    You originally claimed
    Well all the database that I use see this differently. Oracle, PostgreSQL, SQL Server and DB2 throw an error if you have a column in the GROUP BY part that is not part of the SELECT list.
    I said that you can have a column in the GROUP BY that is not in the SELECT list.

    Here you are trying to do the opposite, a column id in the SELECT list which "is not contained in either an aggregate function or the GROUP BY clause".

  15. #15
    Join Date
    Nov 2003
    Posts
    2,934
    Provided Answers: 12
    Quote Originally Posted by JarlH View Post
    You originally claimed


    I said that you can have a column in the GROUP BY that is not in the SELECT list.

    Here you are trying to do the opposite, a column id in the SELECT list which "is not contained in either an aggregate function or the GROUP BY clause".
    Sorry for that, that was indeed incorrectly put on my side.

    My example statement is valid for MySQL and that's what I wanted to point out (the shortcomings of grouping in MySQL)

    Re-reading my posts from a "distance" I do understand the confusion.
    Sorry again, next time I'll try to be more precise...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •