| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

12-24-10, 08:30
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
|
IN clause and subqueries
|
|
Hello!
Faced with the following issue.
Let's say we got a query:
Code:
SELECT * FROM main_table WHERE string_column IN ('a','b','c');
It works instantaneous.
Let's say we got a table sub_table with a string row containing the same 'a','b','c' values.
The following query takes forever to finish:
Code:
SELECT * FROM main_table WHERE string_column IN (SELECT string_column FROM sub_table)
Table main_table contains about 3m rows.
Any suggestions why the latter is so slow?
|
|

12-24-10, 08:49
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Quote:
Originally Posted by tedd
Any suggestions why the latter is so slow?
|
because it uses a subquery and not a join
Code:
SELECT something
, anything
, just_not_the_dreaded_evil_select_star
FROM main_table
INNER
JOIN sub_table
ON subtable.string_column = main_table.string_column
you'd better have an index on both string_columns

|
|

12-24-10, 10:12
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
|
|
I thought that the latter query should work like that:
1. Execute the subquery, get all values from the sub_table and feed them to the main query
2. After p1, the query would be effectively like the first query with plain IN.
I'm curious what does MySQL actually do inside 
|
Last edited by tedd; 12-24-10 at 10:19.
|

12-24-10, 10:20
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Quote:
Originally Posted by tedd
I'm curious what does MySQL actually do inside 
|
that's easy to find out, just do an EXPLAIN

|
|

12-24-10, 10:32
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
EXPLAIN on the latter:
Code:
+----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
| 1 | PRIMARY | main | ALL | NULL | NULL | NULL | NULL | 2333072 | Using where |
| 2 | DEPENDENT SUBQUERY | sub | ALL | NULL | NULL | NULL | NULL | 3 | Using where |
+----+--------------------+---------+------+---------------+------+---------+------+---------+-------------+
I guess it takes ALL available rows in main and compares each row with a subquery`s results. Am I right? But that seems not very clever for me 
|
|

12-24-10, 10:35
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
now add some relevant indexes, and try the EXPLAIN again
then try the EXPLAIN on the join query
|
|

12-24-10, 10:56
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
Actually all needed indexes are created already. Its strange why MySQL didn't use it.
|
|

12-24-10, 11:22
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Quote:
Originally Posted by tedd
Actually all needed indexes are created already. Its strange why MySQL didn't use it.
|
could you do a SHOW CREATE TABLE for both tables, please
|
|

12-24-10, 11:47
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
The "main_table":
Code:
CREATE TABLE `sb_urls` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`url` varchar(32) DEFAULT NULL,
`date` int(11) NOT NULL DEFAULT '0',
`tid` int(16) NOT NULL DEFAULT '0',
`ok` int(1) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `url_tid` (`url`,`tid`),
KEY `url` (`url`)
) ENGINE=InnoDB AUTO_INCREMENT=6383522 DEFAULT CHARSET=utf8
The "sub_table":
Code:
CREATE TABLE `malware` (
`malware_hash` varchar(32) NOT NULL,
`status` int(1) DEFAULT '0',
PRIMARY KEY (`malware_hash`),
KEY `malware_index` (`malware_hash`,`status`),
KEY `status` (`status`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1
And the actual query:
Code:
select * from sb_urls where url IN (select malware_hash from malware);
Also, I found an interesting thing - the EXPLAIN output depends on the columns specified in the SELECT clause.
Code:
EXPLAIN select id,url,date from sb_urls where url IN (select malware_hash from malware);
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | sb_urls | ALL | NULL | NULL | NULL | NULL | 2334144 | Using where |
| 2 | DEPENDENT SUBQUERY | malware | index | NULL | PRIMARY | 34 | NULL | 472975 | Using where; Using index |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
and
Code:
explain select id from sb_urls where url IN (select malware_hash from malware);
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
| 1 | PRIMARY | sb_urls | index | NULL | url | 99 | NULL | 2334144 | Using where; Using index |
| 2 | DEPENDENT SUBQUERY | malware | index | NULL | PRIMARY | 34 | NULL | 472975 | Using where; Using index |
+----+--------------------+---------+-------+---------------+---------+---------+------+---------+--------------------------+
As you can see, in the latter query the 'url' index is used, but in the former is not.
|
|

12-24-10, 11:52
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
Also, it's strange for me that the 'key_len' column is equal to 99, instead of 32.
|
|

12-24-10, 12:00
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Quote:
Originally Posted by tedd
Also, it's strange for me that the 'key_len' column is equal to 99, instead of 32.
|
yes, weird
could you show the join EXPLAIN please?
by the way, i think the extra index here is redundant --
Code:
UNIQUE KEY `url_tid` (`url`,`tid`),
KEY `url` (`url`)
|
|

12-24-10, 13:47
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
Code:
explain SELECT * FROM sb_urls INNER JOIN malware ON sb_urls.url = malware.malware_hash;
+----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
| 1 | SIMPLE | malware | index | NULL | malware_index | 39 | NULL | 473116 | Using index |
| 1 | SIMPLE | sb_urls | ref | url_tid,url | url_tid | 99 | func | 1 | Using where |
+----+-------------+---------+-------+---------------+---------------+---------+------+--------+-------------+
Some additional info. I've talked to the author of this code, he said that initially the 'sb_urls.url' column was VARCHAR(700) DEFAULT NULL, but later he decided to ALTER TABLE sb_urls CHANGE url url VARCHAR(32) DEFAULT NULL.
Maybe this is the reason why the key_len of the 'url' index is 99?
|
|

12-24-10, 14:01
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
i don't know, but i think so
say, have you run these different queries and checked the cpu/elapsed times?
|
|

12-27-10, 09:05
|
|
Registered User
|
|
Join Date: Nov 2010
Posts: 24
|
|
Looks like the whole DB is acting strange. Today I saw some simple queries executing really slow. The problem is deeper
As always, many thanks for help, Rudy.
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|