If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > problem with using aggregate functions

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-23-10, 15:33
marcin_koss marcin_koss is offline
Registered User
 
Join Date: Nov 2009
Posts: 9
problem with using aggregate functions

Let's consider following statement to retrieve data from well known "world" database:

SELECT continent, name, MAX(population) from country GROUP BY continent;

After issuing this statement, mysql returns following result set:

continent, name, max(population)
Asia, Afghanistan, 1277558000
Europe, Netherlands, 146934000
North America, Netherlands Antilles, 278357000
Africa, Algeria, 111506000
Oceania, American Samoa, 18886000
Antarctica, Antarctica, 0
South America, Argentina, 170115000

What we have here is correct population number but the names of the countries are wrong.

Does anybody know what is missing in this statement?
Reply With Quote
  #2 (permalink)  
Old 01-23-10, 16:29
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by marcin_koss View Post
What we have here is correct population number but the names of the countries are wrong.
You are missing the name in the GROUP BY clause.
Code:
SELECT continent, 
       name, 
       MAX(population) 
FROM country 
GROUP BY continent, name;
I'm surprised it didn't throw an error
Reply With Quote
  #3 (permalink)  
Old 01-23-10, 16:45
marcin_koss marcin_koss is offline
Registered User
 
Join Date: Nov 2009
Posts: 9
This is not what I'm looking to get. By adding 'name' to GROUP BY clause I get all the countries listed that are in the database. What I need is only one country per one continent with highest population count.
Reply With Quote
  #4 (permalink)  
Old 01-23-10, 17:59
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by marcin_koss View Post
This is not what I'm looking to get. By adding 'name' to GROUP BY clause I get all the countries listed that are in the database. What I need is only one country per one continent with highest population count.
I'm not sure I understand your table structure, but it sounds like you want something like this:
Code:
SELECT c1.continent,
       c1.name,
       c1.population
FROM country c1
WHERE (c1.name,c1.population) 
    IN (SELECT c2.name,
               c2.population
        FROM country c2
        WHERE c2.continent = c1.continent
        AND   c2.population = (SELECT MAX(c3.population)
                               FROM country c3
                               WHERE c3.continent = c2.continent));
Note that if two countries have the same highest population (in one contintent) they'll show up both.

Btw: what's this "well known" world database? If the name of the continent is repeated in the country table, this does not sound like a good (normalized) design
Reply With Quote
  #5 (permalink)  
Old 01-24-10, 00:27
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
Quote:
Originally Posted by shammat View Post
I'm surprised it didn't throw an error
you must be unfamiliar with mysql's extension to standard sql in this area

it is a very long read, but worth the effort: Debunking GROUP BY myths

note that in the example in this thread, the "name" that is printed for each continent is ~not~ necessarily the one with the max population -- a classic example of the extension being misused

sadly, i have seen it misused far more often than used correctly


Quote:
Originally Posted by shammat View Post
Btw: what's this "well known" world database?
http://dev.mysql.com/doc/world-setup...rld-setup.html
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #6 (permalink)  
Old 01-25-10, 02:28
marcin_koss marcin_koss is offline
Registered User
 
Join Date: Nov 2009
Posts: 9
Shammat, I'm not quite following your example.
Here's the "country" table on which I'm trying to perform that query.

CREATE TABLE `country` (
`Code` char(3) NOT NULL default '',
`Name` char(52) NOT NULL default '',
`Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL default 'Asia',
`Population` int(11) NOT NULL default '0',
PRIMARY KEY (`Code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
Reply With Quote
  #7 (permalink)  
Old 01-25-10, 06:21
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
~jk groupwise max

several people who have benchmarked various solutions to this problem have found that joining to the uncorrelated subquery is the most efficient
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 01-25-10, 07:20
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by marcin_koss View Post
Shammat, I'm not quite following your example.
Did you run it?
Reply With Quote
  #9 (permalink)  
Old 01-25-10, 07:59
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
Quote:
Originally Posted by shammat View Post
Did you run it?
i did

yours runs in 11.03 seconds

this query --
Code:
SELECT country.Continent
     , country.Name
     , country.Population
  FROM ( SELECT Continent
              , MAX(Population) AS max_pop
           FROM country
         GROUP
             BY Continent ) AS m
INNER
  JOIN country
    ON country.Continent = m.Continent
   AND country.Population = m.max_pop
produces exactly the same result, but runs in 0.000 sec, although it obviously cannot be exactly zero
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #10 (permalink)  
Old 01-25-10, 09:21
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by r937 View Post
yours runs in 11.03 seconds
Yes, I was aware that it wasn't very efficient, but I couldn't think of a better way just then (at least not with MySQL)
Reply With Quote
  #11 (permalink)  
Old 01-25-10, 18:30
marcin_koss marcin_koss is offline
Registered User
 
Join Date: Nov 2009
Posts: 9
Yes, both statements work great, but Shammat,why your approach is so slow? I'm not sure what c1 and c2 stands for...

Thank you guys for your help!
Reply With Quote
  #12 (permalink)  
Old 01-26-10, 14:52
shammat shammat is offline
Registered User
 
Join Date: Nov 2003
Posts: 2,407
Quote:
Originally Posted by marcin_koss View Post
Shammat,why your approach is so slow?
Because MySQL does not optimize that very well, it needs to scan the table three times in order to return the result. Whereas the statement from rudy only needs two passes

c1 is a so called "alias" for the table country which basically "renames" the table for the statement.
Reply With Quote
  #13 (permalink)  
Old 01-26-10, 16:43
marcin_koss marcin_koss is offline
Registered User
 
Join Date: Nov 2009
Posts: 9
Got it. Thanks Shammat.
Reply With Quote
Reply

Tags
aggregate, group by, mysql

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On