Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.
With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.
With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:
TABLE1: 20,000 rows in 1000 blocks
TABLE2: 10,000 rows in 500 blocks
Reads required for minus:
Full scan of TABLE1 = 1000 blocks
+
Full scan of Table2 = 500 blocks
= 1500 reads
Reads required for NOT IN:
Full scan of TABLE1 = 1000 blocks
20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
= 21,000 at least
i.e. a lot more work is done by the NOT IN query.
Here is my test example:
Code:
SQL> create table t1 as select object_id from all_objects;
Table created.
SQL> select count(*) from t1;
COUNT(*)
----------
42169
SQL> create table t2 as select object_id from all_objects where rownum < 42000;
Table created.
SQL> alter table t1 add primary key(object_id);
Table altered.
SQL> alter table t2 add primary key(object_id);
Table altered.
SQL> analyze table t1 compute statistics;
Table analyzed.
SQL> analyze table t2 compute statistics;
Table analyzed.
SQL> set timing on
SQL> select count(*) from
2 ( select object_id from t1
3 minus
4 select object_id from t2
5 )
6 /
COUNT(*)
----------
171
real: 1072
SQL> select count(*) from
2 ( select object_id from t1
3 where object_id not in
4 ( select object_id from t2
5 )
6 )
7 /
COUNT(*)
----------
171
real: 2143
SQL> set timing off
SQL> set autotrace on
SQL> select count(*) from
2 ( select object_id from t1
3 minus
4 select object_id from t2
5 )
6 /
COUNT(*)
----------
171
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1)
1 0 SORT (AGGREGATE)
2 1 VIEW (Cost=280 Card=84168)
3 2 MINUS
4 3 SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676)
5 4 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By
tes=168676)
6 3 SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996)
7 6 TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By
tes=167996)
Statistics
----------------------------------------------------------
0 recursive calls
24 db block gets
136 consistent gets
64 physical reads
0 redo size
380 bytes sent via SQL*Net to client
518 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
3 sorts (memory)
0 sorts (disk)
1 rows processed
SQL> select count(*) from
2 ( select object_id from t1
3 where object_id not in
4 ( select object_id from t2
5 )
6 )
7 /
COUNT(*)
----------
171
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4)
1 0 SORT (AGGREGATE)
2 1 FILTER
3 2 TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8
436)
4 2 INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost=
1 Card=1 Bytes=4)
Statistics
----------------------------------------------------------
0 recursive calls
12 db block gets
84406 consistent gets
0 physical reads
0 redo size
405 bytes sent via SQL*Net to client
541 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed