Welcome to the dBforums forums.

You are currently viewing our boards as a guest which gives you limited access to view most discussions, articles and access our other FREE features. By joining our free community you will have access to post topics, communicate privately with other members (PM), respond to polls, upload your own photos and access many other special features. Registration is fast, simple and absolutely free so please, join our community today!

If you have any problems with the registration process or your account login, please contact contact support.

If you prefer not to see double-underlined words and corresponding ads, place your cursor
here for ContentLink opt out.

Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > "NOT IN" vs "MINUS"

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-30-03, 14:26
MattyB MattyB is offline
Registered User
 
Join Date: Mar 2003
Posts: 2
"NOT IN" vs "MINUS"

Can anyone offer any help ?

I've got two queries...


1. SELECT mod_code
FROM SRS.Table1
WHERE mod_code NOT IN
(SELECT mod_code
FROM SRS.Table2);

2. SELECT mod_code
FROM SRS.Table1
MINUS
SELECT mod_code
FROM SRS.Table2;

And upon timing them (two tables are quite large) Query 1 takes ALOT longer than Query 2.

My problem is that I don't understand how "NOT IN" works - MINUS works by comparing the two tables and removing duplicates across the mod_code.SRS.Table1 and mod_code.SRS.Table2

But how does NOT IN work, and why is it alot slower?

thanks for your help!!
Reply With Quote
  #2 (permalink)  
Old 03-31-03, 06:28
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 4,874
Re: "NOT IN" vs "MINUS"

Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.

With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.

With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:

TABLE1: 20,000 rows in 1000 blocks
TABLE2: 10,000 rows in 500 blocks

Reads required for minus:
Full scan of TABLE1 = 1000 blocks
+
Full scan of Table2 = 500 blocks
= 1500 reads

Reads required for NOT IN:
Full scan of TABLE1 = 1000 blocks
20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
= 21,000 at least

i.e. a lot more work is done by the NOT IN query.

Here is my test example:

Code:
SQL> create table t1 as select object_id from all_objects; Table created. SQL> select count(*) from t1; COUNT(*) ---------- 42169 SQL> create table t2 as select object_id from all_objects where rownum < 42000; Table created. SQL> alter table t1 add primary key(object_id); Table altered. SQL> alter table t2 add primary key(object_id); Table altered. SQL> analyze table t1 compute statistics; Table analyzed. SQL> analyze table t2 compute statistics; Table analyzed. SQL> set timing on SQL> select count(*) from 2 ( select object_id from t1 3 minus 4 select object_id from t2 5 ) 6 / COUNT(*) ---------- 171 real: 1072 SQL> select count(*) from 2 ( select object_id from t1 3 where object_id not in 4 ( select object_id from t2 5 ) 6 ) 7 / COUNT(*) ---------- 171 real: 2143 SQL> set timing off SQL> set autotrace on SQL> select count(*) from 2 ( select object_id from t1 3 minus 4 select object_id from t2 5 ) 6 / COUNT(*) ---------- 171 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1) 1 0 SORT (AGGREGATE) 2 1 VIEW (Cost=280 Card=84168) 3 2 MINUS 4 3 SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676) 5 4 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By tes=168676) 6 3 SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996) 7 6 TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By tes=167996) Statistics ---------------------------------------------------------- 0 recursive calls 24 db block gets 136 consistent gets 64 physical reads 0 redo size 380 bytes sent via SQL*Net to client 518 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 3 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select count(*) from 2 ( select object_id from t1 3 where object_id not in 4 ( select object_id from t2 5 ) 6 ) 7 / COUNT(*) ---------- 171 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4) 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8 436) 4 2 INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost= 1 Card=1 Bytes=4) Statistics ---------------------------------------------------------- 0 recursive calls 12 db block gets 84406 consistent gets 0 physical reads 0 redo size 405 bytes sent via SQL*Net to client 541 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1 rows processed
__________________
Tony Andrews
http://tonyandrews.blogspot.com

Last edited by andrewst : 03-31-03 at 06:30.
Reply With Quote
  #3 (permalink)  
Old 03-31-03, 08:59
MattyB MattyB is offline
Registered User
 
Join Date: Mar 2003
Posts: 2
Andrew,

That was just the answer I was looking for!

Many thanks for your help!

Matt
Reply With Quote
  #4 (permalink)  
Old 04-09-03, 09:28
acg_ray acg_ray is offline
Registered User
 
Join Date: Jan 2003
Location: Pittsburgh, PA
Posts: 86
Question Re: "NOT IN" vs "MINUS"

Is there a similar command to "minus" in SQL Server? I use "NOT IN" but would prefer a faster command.





Quote:
Originally posted by andrewst
Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.

With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.

With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:

TABLE1: 20,000 rows in 1000 blocks
TABLE2: 10,000 rows in 500 blocks

Reads required for minus:
Full scan of TABLE1 = 1000 blocks
+
Full scan of Table2 = 500 blocks
= 1500 reads

Reads required for NOT IN:
Full scan of TABLE1 = 1000 blocks
20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
= 21,000 at least

i.e. a lot more work is done by the NOT IN query.

Here is my test example:

Code:
SQL> create table t1 as select object_id from all_objects; Table created. SQL> select count(*) from t1; COUNT(*) ---------- 42169 SQL> create table t2 as select object_id from all_objects where rownum < 42000; Table created. SQL> alter table t1 add primary key(object_id); Table altered. SQL> alter table t2 add primary key(object_id); Table altered. SQL> analyze table t1 compute statistics; Table analyzed. SQL> analyze table t2 compute statistics; Table analyzed. SQL> set timing on SQL> select count(*) from 2 ( select object_id from t1 3 minus 4 select object_id from t2 5 ) 6 / COUNT(*) ---------- 171 real: 1072 SQL> select count(*) from 2 ( select object_id from t1 3 where object_id not in 4 ( select object_id from t2 5 ) 6 ) 7 / COUNT(*) ---------- 171 real: 2143 SQL> set timing off SQL> set autotrace on SQL> select count(*) from 2 ( select object_id from t1 3 minus 4 select object_id from t2 5 ) 6 / COUNT(*) ---------- 171 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1) 1 0 SORT (AGGREGATE) 2 1 VIEW (Cost=280 Card=84168) 3 2 MINUS 4 3 SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676) 5 4 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By tes=168676) 6 3 SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996) 7 6 TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By tes=167996) Statistics ---------------------------------------------------------- 0 recursive calls 24 db block gets 136 consistent gets 64 physical reads 0 redo size 380 bytes sent via SQL*Net to client 518 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 3 sorts (memory) 0 sorts (disk) 1 rows processed SQL> select count(*) from 2 ( select object_id from t1 3 where object_id not in 4 ( select object_id from t2 5 ) 6 ) 7 / COUNT(*) ---------- 171 Execution Plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4) 1 0 SORT (AGGREGATE) 2 1 FILTER 3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8 436) 4 2 INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost= 1 Card=1 Bytes=4) Statistics ---------------------------------------------------------- 0 recursive calls 12 db block gets 84406 consistent gets 0 physical reads 0 redo size 405 bytes sent via SQL*Net to client 541 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 1 rows processed
Reply With Quote
  #5 (permalink)  
Old 04-09-03, 10:01
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 4,874
Re: "NOT IN" vs "MINUS"

Quote:
Originally posted by acg_ray
Is there a similar command to "minus" in SQL Server? I use "NOT IN" but would prefer a faster command.

Amazingly (to me) it appears that SQL Server does not support MINUS, nor INTERSECT - according to the on-line manual here:

http://msdn.microsoft.com/library/de...a-ses_9sfo.asp

The ANSI name for MINUS is EXCEPT, but SQL Server doesn't seem to have that either. I find that strange, because relational databases are all about set processing, and UNION, MINUS/EXCEPT and INTERSECT are operators that work on sets (remember those Venn diagrams at school?)
__________________
Tony Andrews
http://tonyandrews.blogspot.com
Reply With Quote
  #6 (permalink)  
Old 04-09-03, 10:23
acg_ray acg_ray is offline
Registered User
 
Join Date: Jan 2003
Location: Pittsburgh, PA
Posts: 86
Re: "NOT IN" vs "MINUS"

Thanks for confirming (unfortuanately) what I already expected. I learned Oracle in school, but have always used SQL Server professionally, and I was hoping I was missing something from SQL Server... but apparently not!


Quote:
Originally posted by andrewst
Amazingly (to me) it appears that SQL Server does not support MINUS, nor INTERSECT - according to the on-line manual here:

http://msdn.microsoft.com/library/de...a-ses_9sfo.asp

The ANSI name for MINUS is EXCEPT, but SQL Server doesn't seem to have that either. I find that strange, because relational databases are all about set processing, and UNION, MINUS/EXCEPT and INTERSECT are operators that work on sets (remember those Venn diagrams at school?)
Reply With Quote
Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On