Results 1 to 13 of 13
  1. #1
    Join Date
    Nov 2009
    Posts
    9

    Unanswered: problem with using aggregate functions

    Let's consider following statement to retrieve data from well known "world" database:

    SELECT continent, name, MAX(population) from country GROUP BY continent;

    After issuing this statement, mysql returns following result set:

    continent, name, max(population)
    Asia, Afghanistan, 1277558000
    Europe, Netherlands, 146934000
    North America, Netherlands Antilles, 278357000
    Africa, Algeria, 111506000
    Oceania, American Samoa, 18886000
    Antarctica, Antarctica, 0
    South America, Argentina, 170115000

    What we have here is correct population number but the names of the countries are wrong.

    Does anybody know what is missing in this statement?

  2. #2
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by marcin_koss View Post
    What we have here is correct population number but the names of the countries are wrong.
    You are missing the name in the GROUP BY clause.
    Code:
    SELECT continent, 
           name, 
           MAX(population) 
    FROM country 
    GROUP BY continent, name;
    I'm surprised it didn't throw an error

  3. #3
    Join Date
    Nov 2009
    Posts
    9
    This is not what I'm looking to get. By adding 'name' to GROUP BY clause I get all the countries listed that are in the database. What I need is only one country per one continent with highest population count.

  4. #4
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by marcin_koss View Post
    This is not what I'm looking to get. By adding 'name' to GROUP BY clause I get all the countries listed that are in the database. What I need is only one country per one continent with highest population count.
    I'm not sure I understand your table structure, but it sounds like you want something like this:
    Code:
    SELECT c1.continent,
           c1.name,
           c1.population
    FROM country c1
    WHERE (c1.name,c1.population) 
        IN (SELECT c2.name,
                   c2.population
            FROM country c2
            WHERE c2.continent = c1.continent
            AND   c2.population = (SELECT MAX(c3.population)
                                   FROM country c3
                                   WHERE c3.continent = c2.continent));
    Note that if two countries have the same highest population (in one contintent) they'll show up both.

    Btw: what's this "well known" world database? If the name of the continent is repeated in the country table, this does not sound like a good (normalized) design

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by shammat View Post
    I'm surprised it didn't throw an error
    you must be unfamiliar with mysql's extension to standard sql in this area

    it is a very long read, but worth the effort: Debunking GROUP BY myths

    note that in the example in this thread, the "name" that is printed for each continent is ~not~ necessarily the one with the max population -- a classic example of the extension being misused

    sadly, i have seen it misused far more often than used correctly


    Quote Originally Posted by shammat View Post
    Btw: what's this "well known" world database?
    http://dev.mysql.com/doc/world-setup...rld-setup.html
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Nov 2009
    Posts
    9
    Shammat, I'm not quite following your example.
    Here's the "country" table on which I'm trying to perform that query.

    CREATE TABLE `country` (
    `Code` char(3) NOT NULL default '',
    `Name` char(52) NOT NULL default '',
    `Continent` enum('Asia','Europe','North America','Africa','Oceania','Antarctica','South America') NOT NULL default 'Asia',
    `Population` int(11) NOT NULL default '0',
    PRIMARY KEY (`Code`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    ~jk groupwise max

    several people who have benchmarked various solutions to this problem have found that joining to the uncorrelated subquery is the most efficient
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by marcin_koss View Post
    Shammat, I'm not quite following your example.
    Did you run it?

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by shammat View Post
    Did you run it?
    i did

    yours runs in 11.03 seconds

    this query --
    Code:
    SELECT country.Continent
         , country.Name
         , country.Population
      FROM ( SELECT Continent
                  , MAX(Population) AS max_pop
               FROM country
             GROUP
                 BY Continent ) AS m
    INNER
      JOIN country
        ON country.Continent = m.Continent
       AND country.Population = m.max_pop
    produces exactly the same result, but runs in 0.000 sec, although it obviously cannot be exactly zero
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by r937 View Post
    yours runs in 11.03 seconds
    Yes, I was aware that it wasn't very efficient, but I couldn't think of a better way just then (at least not with MySQL)

  11. #11
    Join Date
    Nov 2009
    Posts
    9
    Yes, both statements work great, but Shammat,why your approach is so slow? I'm not sure what c1 and c2 stands for...

    Thank you guys for your help!

  12. #12
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by marcin_koss View Post
    Shammat,why your approach is so slow?
    Because MySQL does not optimize that very well, it needs to scan the table three times in order to return the result. Whereas the statement from rudy only needs two passes

    c1 is a so called "alias" for the table country which basically "renames" the table for the statement.

  13. #13
    Join Date
    Nov 2009
    Posts
    9
    Got it. Thanks Shammat.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •