Results 1 to 14 of 14
  1. #1
    Join Date
    Nov 2010
    Posts
    25

    Unanswered: IN clause and subqueries

    Hello!

    Faced with the following issue.
    Let's say we got a query:
    Code:
    SELECT * FROM main_table WHERE string_column IN ('a','b','c');
    It works instantaneous.

    Let's say we got a table sub_table with a string row containing the same 'a','b','c' values.
    The following query takes forever to finish:
    Code:
    SELECT * FROM main_table WHERE string_column IN (SELECT string_column FROM sub_table)
    Table main_table contains about 3m rows.

    Any suggestions why the latter is so slow?

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by tedd View Post
    Any suggestions why the latter is so slow?
    because it uses a subquery and not a join
    Code:
    SELECT something
         , anything
         , just_not_the_dreaded_evil_select_star 
      FROM main_table 
    INNER
      JOIN sub_table
        ON subtable.string_column = main_table.string_column
    you'd better have an index on both string_columns

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Nov 2010
    Posts
    25
    I thought that the latter query should work like that:
    1. Execute the subquery, get all values from the sub_table and feed them to the main query
    2. After p1, the query would be effectively like the first query with plain IN.

    I'm curious what does MySQL actually do inside
    Last edited by tedd; 12-24-10 at 11:19.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by tedd View Post
    I'm curious what does MySQL actually do inside
    that's easy to find out, just do an EXPLAIN

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Nov 2010
    Posts
    25
    EXPLAIN on the latter:
    Code:
    +----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
    | id | select_type        | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
    +----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
    |  1 | PRIMARY            |  main   | ALL  | NULL          | NULL | NULL    | NULL | 2333072 | Using where |
    |  2 | DEPENDENT SUBQUERY |  sub    | ALL  | NULL          | NULL | NULL    | NULL |       3 | Using where |
    +----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
    I guess it takes ALL available rows in main and compares each row with a subquery`s results. Am I right? But that seems not very clever for me

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    now add some relevant indexes, and try the EXPLAIN again

    then try the EXPLAIN on the join query
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Nov 2010
    Posts
    25
    Actually all needed indexes are created already. Its strange why MySQL didn't use it.

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by tedd View Post
    Actually all needed indexes are created already. Its strange why MySQL didn't use it.
    could you do a SHOW CREATE TABLE for both tables, please
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Nov 2010
    Posts
    25
    The "main_table":
    Code:
    CREATE TABLE `sb_urls` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `url` varchar(32) DEFAULT NULL,
      `date` int(11) NOT NULL DEFAULT '0',
      `tid` int(16) NOT NULL DEFAULT '0',
      `ok` int(1) NOT NULL DEFAULT '1',
      PRIMARY KEY (`id`),
      UNIQUE KEY `url_tid` (`url`,`tid`),
      KEY `url` (`url`)
    ) ENGINE=InnoDB AUTO_INCREMENT=6383522 DEFAULT CHARSET=utf8
    The "sub_table":
    Code:
    CREATE TABLE `malware` (
      `malware_hash` varchar(32) NOT NULL,
      `status` int(1) DEFAULT '0',
      PRIMARY KEY (`malware_hash`),
      KEY `malware_index` (`malware_hash`,`status`),
      KEY `status` (`status`)
    ) ENGINE=MyISAM DEFAULT CHARSET=latin1
    And the actual query:
    Code:
    select * from sb_urls where url IN (select malware_hash from malware);

    Also, I found an interesting thing - the EXPLAIN output depends on the columns specified in the SELECT clause.
    Code:
    EXPLAIN select id,url,date from sb_urls where url IN (select malware_hash from malware);
    +----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
    | id | select_type        | table   | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
    +----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
    |  1 | PRIMARY            | sb_urls | ALL   | NULL          | NULL    | NULL    | NULL | 2334144 | Using where              |
    |  2 | DEPENDENT SUBQUERY | malware | index | NULL          | PRIMARY | 34      | NULL |  472975 | Using where; Using index |
    +----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
    and
    Code:
    explain select id from sb_urls where url IN (select malware_hash from malware);
    +----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
    | id | select_type        | table   | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
    +----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
    |  1 | PRIMARY            | sb_urls | index | NULL          | url     | 99      | NULL | 2334144 | Using where; Using index |
    |  2 | DEPENDENT SUBQUERY | malware | index | NULL          | PRIMARY | 34      | NULL |  472975 | Using where; Using index |
    +----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
    As you can see, in the latter query the 'url' index is used, but in the former is not.

  10. #10
    Join Date
    Nov 2010
    Posts
    25
    Also, it's strange for me that the 'key_len' column is equal to 99, instead of 32.

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by tedd View Post
    Also, it's strange for me that the 'key_len' column is equal to 99, instead of 32.
    yes, weird

    could you show the join EXPLAIN please?

    by the way, i think the extra index here is redundant --
    Code:
    UNIQUE KEY `url_tid` (`url`,`tid`),
      KEY `url` (`url`)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Nov 2010
    Posts
    25
    Code:
    explain SELECT * FROM sb_urls INNER JOIN malware ON sb_urls.url = malware.malware_hash;
    +----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
    | id | select_type | table   | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
    +----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
    |  1 | SIMPLE      | malware | index | NULL          | malware_index | 39      | NULL | 473116 | Using index |
    |  1 | SIMPLE      | sb_urls | ref   | url_tid,url   | url_tid       | 99      | func |      1 | Using where |
    +----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
    Some additional info. I've talked to the author of this code, he said that initially the 'sb_urls.url' column was VARCHAR(700) DEFAULT NULL, but later he decided to ALTER TABLE sb_urls CHANGE url url VARCHAR(32) DEFAULT NULL.

    Maybe this is the reason why the key_len of the 'url' index is 99?

  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i don't know, but i think so

    say, have you run these different queries and checked the cpu/elapsed times?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  14. #14
    Join Date
    Nov 2010
    Posts
    25
    Looks like the whole DB is acting strange. Today I saw some simple queries executing really slow. The problem is deeper

    As always, many thanks for help, Rudy.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •