Results 1 to 8 of 8
  1. #1
    Join Date
    Oct 2011
    Posts
    14

    Unanswered: Get more frequent value GROUPing BY another row

    The table schema:
    Code:
    CREATE TABLE `comentarios` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `nombre` varchar(32) COLLATE utf8_bin NOT NULL,
      `email` varchar(64) COLLATE utf8_bin NOT NULL,
      `comentario` text COLLATE utf8_bin NOT NULL,
      `timestamp` int(11) NOT NULL,
      `spam` tinyint(4) NOT NULL,
      PRIMARY KEY (`id`),
      KEY `email` (`email`)
    ) ENGINE=MyISAM;
    Now, #email# is like the user's ID, but you can input anything in #nombre#. So, I want to list all the "users" GROUPing by #email#, and get the most frequent #nombre# to display it next to #email#

    I search how to to this, but I found a MS SQL solution using "ROW_NUMBER() OVER (PARTITION BY _ROW_ ORDER BY COUNT(*) DESC)". Later, I search ways of transforming this to MySQL but I couldn't do it.

    Could anyone explain me how to do this query ?
    Thank you very much

  2. #2
    Join Date
    Sep 2009
    Location
    San Sebastian, Spain
    Posts
    880
    In this case you will need to perform a join against another query which returns the max(timestamp) for the email address:

    SELECT c.email, c.nombre
    FROM comentarios c
    JOIN (SELECT email, MAX(timestamp) timestamp FROM comentarios GROUP BY email) t
    ON (c.email = t.email AND c.timestamp = t.timestamp);
    Ronan Cashell
    Certified Oracle DBA/Certified MySQL Expert (DBA & Cluster DBA)
    http://www.it-iss.com
    Follow me on Twitter

  3. #3
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    ronan, he wanted MAX(COUNT(*)) instead of MAX(timestamp)

    yes, i know MAX(COUNT(*)) isn't valid

    "the most frequent #nombre#" is the challenge
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  4. #4
    Join Date
    Oct 2011
    Posts
    14
    Yes, MAX(COUNT(*)) would be great
    It's a challenge, but a doable one or it's a "NP" problem ? :P

    Quote Originally Posted by it-iss.com View Post
    SELECT c.email, c.nombre
    FROM comentarios c
    JOIN (SELECT email, MAX(timestamp) timestamp FROM comentarios GROUP BY email) t
    ON (c.email = t.email AND c.timestamp = t.timestamp);
    This couldn't be done with having ? Just because of intrigue xD
    Code:
    SELECT c.email, c.nombre
    FROM comentarios c
    GROUP BY email
    HAVING MAX(timestamp)

  5. #5
    Join Date
    Oct 2011
    Posts
    14
    Other question, but about the email key:
    The cardinality of the primary key is about 40k, and because there're about 40k comments
    And the email key, i expected to have less cardinality, because lots of emails are repeated in the table, but has the same as the primary key.

    Why is that ? Anything to do that it's varchar utf8_bin ? How can i fix it ?
    Thank you

  6. #6
    Join Date
    Sep 2009
    Location
    San Sebastian, Spain
    Posts
    880
    The primary key is unique so cardinality will always be the number of entries in the table. As for the email, what query are you using to make sure that it it not something in there? Also when the data was inserted are you sure that there was not whitespacing added to either the front or end of the email address which would cause this result? The best way is to find 2 records with the same email address and limit based on that email address only.
    Ronan Cashell
    Certified Oracle DBA/Certified MySQL Expert (DBA & Cluster DBA)
    http://www.it-iss.com
    Follow me on Twitter

  7. #7
    Join Date
    Oct 2011
    Posts
    14
    I attached an image showing the query grouping by email and the cardinality of the email index, and that are not equal.

    They don't should to be equal ?
    There's something wrong of the way of creating the index ?
    Code:
    ADD INDEX `email` ( `email` )
    Y also try with but with the same result.
    Code:
    ADD INDEX `email` ( `email` ( 64 ) )
    Attached Thumbnails Attached Thumbnails cardinality.jpg  

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by nachitox View Post
    So, I want to list all the "users" GROUPing by #email#, and get the most frequent #nombre# to display it next to #email#
    the query you showed in your attachment is exactly what you wanted, except you should add the count to the SELECT clause
    Code:
    SELECT email
         , COUNT(*) AS emails_count
      FROM comentarios
     WHERE COALESCE(email,'') <> ''
    GROUP
        BY email
    ORDER
        BY emails_count DESC LIMIT 1 -- for MAX(COUNT())
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •