If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > IN clause and subqueries

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 12-24-10, 08:30
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
IN clause and subqueries

Hello!

Faced with the following issue.
Let's say we got a query:
Code:
SELECT * FROM main_table WHERE string_column IN ('a','b','c');
It works instantaneous.

Let's say we got a table sub_table with a string row containing the same 'a','b','c' values.
The following query takes forever to finish:
Code:
SELECT * FROM main_table WHERE string_column IN (SELECT string_column FROM sub_table)
Table main_table contains about 3m rows.

Any suggestions why the latter is so slow?
Reply With Quote
  #2 (permalink)  
Old 12-24-10, 08:49
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by tedd View Post
Any suggestions why the latter is so slow?
because it uses a subquery and not a join
Code:
SELECT something
     , anything
     , just_not_the_dreaded_evil_select_star 
  FROM main_table 
INNER
  JOIN sub_table
    ON subtable.string_column = main_table.string_column
you'd better have an index on both string_columns

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 12-24-10, 10:12
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
I thought that the latter query should work like that:
1. Execute the subquery, get all values from the sub_table and feed them to the main query
2. After p1, the query would be effectively like the first query with plain IN.

I'm curious what does MySQL actually do inside

Last edited by tedd; 12-24-10 at 10:19.
Reply With Quote
  #4 (permalink)  
Old 12-24-10, 10:20
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by tedd View Post
I'm curious what does MySQL actually do inside
that's easy to find out, just do an EXPLAIN

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 12-24-10, 10:32
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
EXPLAIN on the latter:
Code:
+----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
| id | select_type        | table   | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
|  1 | PRIMARY            |  main   | ALL  | NULL          | NULL | NULL    | NULL | 2333072 | Using where |
|  2 | DEPENDENT SUBQUERY |  sub    | ALL  | NULL          | NULL | NULL    | NULL |       3 | Using where |
+----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
I guess it takes ALL available rows in main and compares each row with a subquery`s results. Am I right? But that seems not very clever for me
Reply With Quote
  #6 (permalink)  
Old 12-24-10, 10:35
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
now add some relevant indexes, and try the EXPLAIN again

then try the EXPLAIN on the join query
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 12-24-10, 10:56
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
Actually all needed indexes are created already. Its strange why MySQL didn't use it.
Reply With Quote
  #8 (permalink)  
Old 12-24-10, 11:22
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by tedd View Post
Actually all needed indexes are created already. Its strange why MySQL didn't use it.
could you do a SHOW CREATE TABLE for both tables, please
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #9 (permalink)  
Old 12-24-10, 11:47
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
The "main_table":
Code:
CREATE TABLE `sb_urls` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `url` varchar(32) DEFAULT NULL,
  `date` int(11) NOT NULL DEFAULT '0',
  `tid` int(16) NOT NULL DEFAULT '0',
  `ok` int(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`),
  UNIQUE KEY `url_tid` (`url`,`tid`),
  KEY `url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=6383522 DEFAULT CHARSET=utf8
The "sub_table":
Code:
CREATE TABLE `malware` (
  `malware_hash` varchar(32) NOT NULL,
  `status` int(1) DEFAULT '0',
  PRIMARY KEY (`malware_hash`),
  KEY `malware_index` (`malware_hash`,`status`),
  KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
And the actual query:
Code:
select * from sb_urls where url IN (select malware_hash from malware);

Also, I found an interesting thing - the EXPLAIN output depends on the columns specified in the SELECT clause.
Code:
EXPLAIN select id,url,date from sb_urls where url IN (select malware_hash from malware);
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type        | table   | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
|  1 | PRIMARY            | sb_urls | ALL   | NULL          | NULL    | NULL    | NULL | 2334144 | Using where              |
|  2 | DEPENDENT SUBQUERY | malware | index | NULL          | PRIMARY | 34      | NULL |  472975 | Using where; Using index |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
and
Code:
explain select id from sb_urls where url IN (select malware_hash from malware);
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type        | table   | type  | possible_keys | key     | key_len | ref  | rows    | Extra                    |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
|  1 | PRIMARY            | sb_urls | index | NULL          | url     | 99      | NULL | 2334144 | Using where; Using index |
|  2 | DEPENDENT SUBQUERY | malware | index | NULL          | PRIMARY | 34      | NULL |  472975 | Using where; Using index |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
As you can see, in the latter query the 'url' index is used, but in the former is not.
Reply With Quote
  #10 (permalink)  
Old 12-24-10, 11:52
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
Also, it's strange for me that the 'key_len' column is equal to 99, instead of 32.
Reply With Quote
  #11 (permalink)  
Old 12-24-10, 12:00
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by tedd View Post
Also, it's strange for me that the 'key_len' column is equal to 99, instead of 32.
yes, weird

could you show the join EXPLAIN please?

by the way, i think the extra index here is redundant --
Code:
UNIQUE KEY `url_tid` (`url`,`tid`),
  KEY `url` (`url`)
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #12 (permalink)  
Old 12-24-10, 13:47
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
Code:
explain SELECT * FROM sb_urls INNER JOIN malware ON sb_urls.url = malware.malware_hash;
+----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
| id | select_type | table   | type  | possible_keys | key           | key_len | ref  | rows   | Extra       |
+----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
|  1 | SIMPLE      | malware | index | NULL          | malware_index | 39      | NULL | 473116 | Using index |
|  1 | SIMPLE      | sb_urls | ref   | url_tid,url   | url_tid       | 99      | func |      1 | Using where |
+----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
Some additional info. I've talked to the author of this code, he said that initially the 'sb_urls.url' column was VARCHAR(700) DEFAULT NULL, but later he decided to ALTER TABLE sb_urls CHANGE url url VARCHAR(32) DEFAULT NULL.

Maybe this is the reason why the key_len of the 'url' index is 99?
Reply With Quote
  #13 (permalink)  
Old 12-24-10, 14:01
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
i don't know, but i think so

say, have you run these different queries and checked the cpu/elapsed times?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #14 (permalink)  
Old 12-27-10, 09:05
tedd tedd is offline
Registered User
 
Join Date: Nov 2010
Posts: 24
Looks like the whole DB is acting strange. Today I saw some simple queries executing really slow. The problem is deeper

As always, many thanks for help, Rudy.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On