| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

01-23-10, 15:33
|
|
Registered User
|
|
Join Date: Nov 2009
Posts: 9
|
|
|
problem with using aggregate functions
|
|
Let's consider following statement to retrieve data from well known "world" database:
SELECT continent, name, MAX(population) from country GROUP BY continent;
After issuing this statement, mysql returns following result set:
continent, name, max(population)
Asia, Afghanistan, 1277558000
Europe, Netherlands, 146934000
North America, Netherlands Antilles, 278357000
Africa, Algeria, 111506000
Oceania, American Samoa, 18886000
Antarctica, Antarctica, 0
South America, Argentina, 170115000
What we have here is correct population number but the names of the countries are wrong.
Does anybody know what is missing in this statement?
|
|

01-23-10, 16:29
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by marcin_koss
What we have here is correct population number but the names of the countries are wrong.
|
You are missing the name in the GROUP BY clause.
Code:
SELECT continent,
name,
MAX(population)
FROM country
GROUP BY continent, name;
I'm surprised it didn't throw an error
|
|

01-23-10, 16:45
|
|
Registered User
|
|
Join Date: Nov 2009
Posts: 9
|
|
|
|
This is not what I'm looking to get. By adding 'name' to GROUP BY clause I get all the countries listed that are in the database. What I need is only one country per one continent with highest population count.
|
|

01-23-10, 17:59
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by marcin_koss
This is not what I'm looking to get. By adding 'name' to GROUP BY clause I get all the countries listed that are in the database. What I need is only one country per one continent with highest population count.
|
I'm not sure I understand your table structure, but it sounds like you want something like this:
Code:
SELECT c1.continent,
c1.name,
c1.population
FROM country c1
WHERE (c1.name,c1.population)
IN (SELECT c2.name,
c2.population
FROM country c2
WHERE c2.continent = c1.continent
AND c2.population = (SELECT MAX(c3.population)
FROM country c3
WHERE c3.continent = c2.continent));
Note that if two countries have the same highest population (in one contintent) they'll show up both.
Btw: what's this "well known" world database? If the name of the continent is repeated in the country table, this does not sound like a good (normalized) design 
|
|

01-24-10, 00:27
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
Quote:
Originally Posted by shammat
I'm surprised it didn't throw an error
|
you must be unfamiliar with mysql's extension to standard sql in this area
it is a very long read, but worth the effort: Debunking GROUP BY myths
note that in the example in this thread, the "name" that is printed for each continent is ~not~ necessarily the one with the max population -- a classic example of the extension being misused
sadly, i have seen it misused far more often than used correctly
Quote:
Originally Posted by shammat
Btw: what's this "well known" world database?
|
http://dev.mysql.com/doc/world-setup...rld-setup.html
|
|

01-25-10, 02:28
|
|
Registered User
|
|
Join Date: Nov 2009
Posts: 9
|
|
Shammat, I'm not quite following your example.
Here's the "country" table on which I'm trying to perform that query.
CREATE TABLE `country` (
`Code` char(3) NOT NULL default '',
`Name` char(52) NOT NULL default '',
`Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL default 'Asia',
`Population` int(11) NOT NULL default '0',
PRIMARY KEY (`Code`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
|
|

01-25-10, 06:21
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
~jk groupwise max
several people who have benchmarked various solutions to this problem have found that joining to the uncorrelated subquery is the most efficient
|
|

01-25-10, 07:20
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by marcin_koss
Shammat, I'm not quite following your example.
|
Did you run it?
|
|

01-25-10, 07:59
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,525
|
|
Quote:
Originally Posted by shammat
Did you run it?
|
i did
yours runs in 11.03 seconds
this query --
Code:
SELECT country.Continent
, country.Name
, country.Population
FROM ( SELECT Continent
, MAX(Population) AS max_pop
FROM country
GROUP
BY Continent ) AS m
INNER
JOIN country
ON country.Continent = m.Continent
AND country.Population = m.max_pop
produces exactly the same result, but runs in 0.000 sec, although it obviously cannot be exactly zero
|
|

01-25-10, 09:21
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by r937
yours runs in 11.03 seconds
|
Yes, I was aware that it wasn't very efficient, but I couldn't think of a better way just then (at least not with MySQL)
|
|

01-25-10, 18:30
|
|
Registered User
|
|
Join Date: Nov 2009
Posts: 9
|
|
Yes, both statements work great, but Shammat,why your approach is so slow? I'm not sure what c1 and c2 stands for...
Thank you guys for your help!
|
|

01-26-10, 14:52
|
|
Registered User
|
|
Join Date: Nov 2003
Posts: 2,407
|
|
Quote:
Originally Posted by marcin_koss
Shammat,why your approach is so slow?
|
Because MySQL does not optimize that very well, it needs to scan the table three times in order to return the result. Whereas the statement from rudy only needs two passes
c1 is a so called "alias" for the table country which basically "renames" the table for the statement.
|
|

01-26-10, 16:43
|
|
Registered User
|
|
Join Date: Nov 2009
Posts: 9
|
|
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|