If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > Strange results when working out average of a count

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 12-09-09, 16:19
ozzii ozzii is offline
Registered User
 
Join Date: Mar 2007
Posts: 194
Strange results when working out average of a count

Hi

I have the following mySQL table:

Code:
CREATE TABLE `enquiries` (
  `enquiry_id` int(10) unsigned NOT NULL auto_increment,
  `enquiry_date` datetime NOT NULL,
  `enquiry_type` varchar(50) default NULL,
  PRIMARY KEY  (`enquiry_id`)
)
I am trying to work out the average enquiries by day of week. I use the following sql. This produces correct results. for example if i have a total of 10 enquires over 2 Mondays the average enquiries for a Monday will be 5. You will note from the following SQL, within the inline statement I have grouped the qury using the enquiry_date.

Code:
select day_of_week, 
       day_name,
       AVG(total_enquiries) as avg_enquiries
from (select  dayofweek(enquiry_date) as day_of_week,
              dayname(enquiry_date) as day_name,
              count(*) as total_enquiries
      from enquiries
      group by 
      enquiry_date) il
group by
  day_of_week, 
  day_name
Now heres the question if I use the following SQL instead of an average I end up with a total i.e 10 enquires for a Monday. You will note in the following SQL, within the inline statment I have done the grouping this time using the aliases of day_of_week and day_name.

Code:
select day_of_week, 
       day_name,
       AVG(total_enquiries) as avg_enquiries
from (select  dayofweek(enquiry_date) as day_of_week,
              dayname(enquiry_date) as day_name,
              count(*) as total_enquiries
      from enquiries
      group by 
      day_of_week,
      day_name) il
group by
  day_of_week, 
  day_name
Can any one explain to me why you get this phenomena. Why cant I get an average using either of the SQL statements.

Any help would be appreciated. I have attached a dump to replicate.
Attached Files
File Type: txt test_db.txt (1.9 KB, 86 views)
Reply With Quote
  #2 (permalink)  
Old 12-09-09, 17:09
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,298
Why do you post this in the ANSI SQL forum, if it's clearly a MySQL question?
Reply With Quote
  #3 (permalink)  
Old 12-09-09, 17:19
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,578
Your query is rather strange. This is the order in which a subselect is to be evaluated:

1. FROM clause
2. WHERE clause
3. GROUP BY clause
4. HAVING clause
5. SELECT clause
6. ORDER BY clause
7. FETCH FIRST clause

So trying to reference in the GROUP BY clause something in the SELECT list can't work because the SELECT list has not yet been evaluated. I am surprised that your database system actually accepts this syntax. I'd count it as a bug.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #4 (permalink)  
Old 12-09-09, 18:54
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,298
Quote:
Originally Posted by stolze View Post
So trying to reference in the GROUP BY clause something in the SELECT list can't work because the SELECT list has not yet been evaluated.
Hmm, you always reference something from the SELECT list in the GROUP BY.
Code:
SELECT col1, col2, count(*)
FROM the_table
GROUP BY col1, col2
Some DBMS do allow to reference an alias from the SELECT list in the group by (in this case MySQL, but PostgreSQL allows it as well)

The real difference between the two inner selects is that the first one does a wrong group by as it groups by an expression (inquiry_date) that is not part of the select list.

Allowing unrelated (and thus wrong) GROUP BY expressions is a shortcoming (bug?) of MySQL.

The second inner select is actually correct because it groups by both (non-aggregate) expressions in the select list (it's using the column alias for the group by but conceptionally it is correct)
Reply With Quote
  #5 (permalink)  
Old 12-09-09, 19:28
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,578
Quote:
Originally Posted by shammat View Post
Hmm, you always reference something from the SELECT list in the GROUP BY.
Code:
SELECT col1, col2, count(*)
FROM the_table
GROUP BY col1, col2
No, you reference a column from the table "the_table" here. In fact, you can group on a column that is not even in the SELECT list.

Quote:
Some DBMS do allow to reference an alias from the SELECT list in the group by (in this case MySQL, but PostgreSQL allows it as well)
I admit, I didn't search very long, but I couldn't find it in the ISO/ANSI standard. So you are probably talking about product-specific extensions.

Quote:
The real difference between the two inner selects is that the first one does a wrong group by as it groups by an expression (inquiry_date) that is not part of the select list.

Allowing unrelated (and thus wrong) GROUP BY expressions is a shortcoming (bug?) of MySQL.
No, that's perfectly valid. Grouping by an expression in the SELECT list only and using the alias for that expression in the GROUP BY clause is not standard SQL.

Quote:
The second inner select is actually correct because it groups by both (non-aggregate) expressions in the select list (it's using the column alias for the group by but conceptionally it is correct)
You can have n columns/expressions in the SELECT list (aggegations or not) and you can have m completely different columns/expressions in the GROUP BY clause. Both lists are not tied to each other. In most cases, it doesn't make much sense to group on something not in the SELECT list or to leave out some non-aggregation column from the grouping sets, but semantically that is permissible. For example, if you have a functional dependency between columns C1 and C2, where C2 is fully dependent on C1, grouping by (C1, C2) is identical to grouping by C1 only because the combination cannot produce more distinct rows than C1 alone.

If there is something to complain about, it is probably the order of the different clauses. If SELECT would come at the very end, it is (hopefully) easier to understand that it is evaluated after everything coming before it.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #6 (permalink)  
Old 12-09-09, 23:29
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,085
Quote:
Originally Posted by stolze View Post
Grouping by an expression in the SELECT list only and using the alias for that expression in the GROUP BY clause is not standard SQL.
well i do not have a copy of the standard, so i always run my query through the mimer validator

this query --
Code:
select a, b, c*1.1+d AS e
from t
group by e
apparently conforms to SQL-2003 and even SQL-99

now, maybe the mimer validator isn't accurate, and maybe something will work perfectly okay in some database systems but not others...

... but i wouldn't personally get too concerned about the standards
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 12-10-09, 03:22
JarlH JarlH is offline
Registered User
 
Join Date: Dec 2008
Location: At work...
Posts: 58
Quote:
Originally Posted by shammat View Post
Some DBMS do allow to reference an alias from the SELECT list in the group by (in this case MySQL, but PostgreSQL allows it as well)
But ANSI SQL doesn't. Column aliases are typically used to rename the result columns, and in the ORDER BY clause.

As in Stolze's evaluation order:
5. SELECT clause
6. ORDER BY clause
Reply With Quote
  #8 (permalink)  
Old 12-10-09, 03:26
JarlH JarlH is offline
Registered User
 
Join Date: Dec 2008
Location: At work...
Posts: 58
Quote:
Originally Posted by r937 View Post
well i do not have a copy of the standard, so i always run my query through the mimer validator

this query --
Code:
select a, b, c*1.1+d AS e
from t
group by e
apparently conforms to SQL-2003 and even SQL-99
The Mimer SQL Validator verifies syntax, it does NOT verify column names. How is it supposed to know if your table t has a column named e or not?
Reply With Quote
  #9 (permalink)  
Old 12-10-09, 05:05
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,298
Quote:
Originally Posted by stolze View Post
I admit, I didn't search very long, but I couldn't find it in the ISO/ANSI standard. So you are probably talking about product-specific extensions.
That's what I said

Quote:
You can have n columns/expressions in the SELECT list (aggegations or not) and you can have m completely different columns/expressions in the GROUP BY clause. Both lists are not tied to each other.
Well all the database that I use see this differently. Oracle, PostgreSQL, SQL Server and DB2 throw an error if you have a column in the GROUP BY part that is not part of the SELECT list.

Quote:
In most cases, it doesn't make much sense to group on something not in the SELECT list or to leave out some non-aggregation column from the grouping sets
I'd say there it never makes sense.
Reply With Quote
  #10 (permalink)  
Old 12-10-09, 05:36
JarlH JarlH is offline
Registered User
 
Join Date: Dec 2008
Location: At work...
Posts: 58
Quote:
Originally Posted by shammat View Post
Well all the database that I use see this differently. Oracle, PostgreSQL, SQL Server and DB2 throw an error if you have a column in the GROUP BY part that is not part of the SELECT list.

I'd say there it never makes sense.
Odd, I just checked MS SQL Server, DB2 and MySQL, and none of them complained.

http :// sqlzoo.net/

Query:
SELECT sum(population)
FROM bbc
GROUP BY region;



The grouping columns are typically wanted in the select list, but once in a while I write queries where I see no use of them. (It has happened, but I can't remember exactly what I did...)
Reply With Quote
  #11 (permalink)  
Old 12-10-09, 11:23
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,578
Quote:
Originally Posted by shammat View Post
That's what I said
Right. But since this is a group about the ISO/ANSI standard, those product-specific things are irrelevant for answers with respect to what is or is not covered in ISO/ANSI SQL.

Quote:
I'd say there it never makes sense.
That's a strong statement. I don't know which requirements a certain application may have to need such a feature. So I would never be in a position to claim that it "never makes sense". But that's beside the point. The point is that grouping the result rows is an orthogonal concept to the SELECT list. The only dependency that exists is that aggregate functions are applied to a group defined by a GROUP BY clause. For everything else, there are no strings attached. That's what the standard requires and what products implement.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #12 (permalink)  
Old 12-10-09, 14:03
ozzii ozzii is offline
Registered User
 
Join Date: Mar 2007
Posts: 194
Ok, a number of you have made suggestions buts what the correct SQL to obtain the correct results?

Basically I want the average number of enquiries for each DAY OF THE WEEK i.e Mon-Sun.

I've run both SQL queries in both MySQL and SQL Server and they both produce the same results.

As far as I gather the second SQL query doesn't produce the desired results because the sub-query is grouping on the day of the week and since there are only ever 7 days in a week it will group the 2 Mondays together with 10 as the total. In other word it only produces 7 rows of data for each day of the week. The main select statement then does an avg on this sub query and since the sub-query now only has 1 row to represent Monday then the avg will always equal the total in the subquery for each day of the week becuase you are always dividing by 1.

Now using the first SQL query, by default the AVG function seems to produce the avg as a whole number - this is in SQL Server. I've tried the following to display the average with 2 decimal places but it doesnt work. Is there any way to achieve this?

Code:
ROUND(AVG(total_enquiries),2) as avg_enquiries
Reply With Quote
  #13 (permalink)  
Old 12-10-09, 17:49
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,298
Quote:
Originally Posted by JarlH View Post
Odd, I just checked MS SQL Server, DB2 and MySQL, and none of them complained.
Extremely odd...
Postgres:
Code:
c:\>psql
Aktive Codepage: 1252.
psql (8.4.1)
Type "help" for help.

postgres=> create table group_test (id integer primary key, some_column varchar(20));
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "group_test_pkey" for table "group_test"
CREATE TABLE
postgres=> select id, count(*) from group_test group by some_column;
ERROR:  column "group_test.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: select id, count(*) from group_test group by some_column;
               ^
postgres=>
Oracle:
Code:
C:\temp>sqlplus scott/tiger@oradb

SQL*Plus: Release 10.2.0.1.0 - Production on Thu Dec 10 22:39:52 2009

Copyright (c) 1982, 2005, Oracle.  All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the OLAP and Data Mining options

SQL> create table group_test (id integer primary key, some_column varchar(20));

Table created.

SQL> select id, count(*) from group_test group by some_column;
select id, count(*) from group_test group by some_column
       *
ERROR at line 1:
ORA-00979: not a GROUP BY expression


SQL>
DB2 (9.7)
Code:
c:\temp>db2 connect to sample

   Database Connection Information

 Database server        = DB2/NT 9.7.0
 SQL authorization ID   = DB2ADMIN
 Local database alias   = SAMPLE

c:\temp>db2 create table group_test (id integer not null primary key, some_column varchar(20))
DB20000I  The SQL command completed successfully.

c:\temp>db2 select id, count(*) from group_test group by some_column
SQL0119N  An expression starting with "ID" specified in a SELECT clause,
HAVING clause, or ORDER BY clause is not specified in the GROUP BY clause or
it is in a SELECT clause, HAVING clause, or ORDER BY clause with a column
function and no GROUP BY clause is specified.  SQLSTATE=42803

c:\temp>
SQL Server 2005
Code:
c:\temp>osql -S WALLACE\SQLEXPRESS -U myuser
Password:
1> create table group_test (id integer primary key, some_column varchar(20))
2> go
1> select id, count(*) from group_test group by some_column
2> go
Msg 8120, Level 16, State 1, Server WALLACE\SQLEXPRESS, Line 1
Column 'group_test.id' is invalid in the select list because it is not contained in either an aggregate function or the
GROUP BY clause.
1>
Reply With Quote
  #14 (permalink)  
Old 12-11-09, 03:28
JarlH JarlH is offline
Registered User
 
Join Date: Dec 2008
Location: At work...
Posts: 58
Quote:
Originally Posted by shammat View Post
Extremely odd...
[...]
1> select id, count(*) from group_test group by some_column
2> go
Msg 8120, Level 16, State 1, Server WALLACE\SQLEXPRESS, Line 1
Column 'group_test.id' is invalid in the select list because it is not contained in either an aggregate function or the
GROUP BY clause.
1>[/code]
You originally claimed
Quote:
Well all the database that I use see this differently. Oracle, PostgreSQL, SQL Server and DB2 throw an error if you have a column in the GROUP BY part that is not part of the SELECT list.
I said that you can have a column in the GROUP BY that is not in the SELECT list.

Here you are trying to do the opposite, a column id in the SELECT list which "is not contained in either an aggregate function or the GROUP BY clause".
Reply With Quote
  #15 (permalink)  
Old 12-11-09, 04:34
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,298
Quote:
Originally Posted by JarlH View Post
You originally claimed


I said that you can have a column in the GROUP BY that is not in the SELECT list.

Here you are trying to do the opposite, a column id in the SELECT list which "is not contained in either an aggregate function or the GROUP BY clause".
Sorry for that, that was indeed incorrectly put on my side.

My example statement is valid for MySQL and that's what I wanted to point out (the shortcomings of grouping in MySQL)

Re-reading my posts from a "distance" I do understand the confusion.
Sorry again, next time I'll try to be more precise...
Reply With Quote
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On