Results 1 to 6 of 6
  1. #1
    Join Date
    Mar 2003
    Posts
    2

    Unanswered: "NOT IN" vs "MINUS"

    Can anyone offer any help ?

    I've got two queries...


    1. SELECT mod_code
    FROM SRS.Table1
    WHERE mod_code NOT IN
    (SELECT mod_code
    FROM SRS.Table2);

    2. SELECT mod_code
    FROM SRS.Table1
    MINUS
    SELECT mod_code
    FROM SRS.Table2;

    And upon timing them (two tables are quite large) Query 1 takes ALOT longer than Query 2.

    My problem is that I don't understand how "NOT IN" works - MINUS works by comparing the two tables and removing duplicates across the mod_code.SRS.Table1 and mod_code.SRS.Table2

    But how does NOT IN work, and why is it alot slower?

    thanks for your help!!

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1

    Re: "NOT IN" vs "MINUS"

    Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.

    With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.

    With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
    The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:

    TABLE1: 20,000 rows in 1000 blocks
    TABLE2: 10,000 rows in 500 blocks

    Reads required for minus:
    Full scan of TABLE1 = 1000 blocks
    +
    Full scan of Table2 = 500 blocks
    = 1500 reads

    Reads required for NOT IN:
    Full scan of TABLE1 = 1000 blocks
    20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
    = 21,000 at least

    i.e. a lot more work is done by the NOT IN query.

    Here is my test example:

    Code:
    SQL> create table t1  as select object_id from all_objects;
    
    Table created.
    
    SQL> select count(*) from t1;
    
      COUNT(*)
    ----------
         42169
    
    SQL> create table t2 as select object_id from all_objects where rownum < 42000;
    
    Table created.
    
    SQL> alter table t1 add primary key(object_id);
    
    Table altered.
    
    SQL> alter table t2 add primary key(object_id);
    
    Table altered.
    
    SQL> analyze table t1 compute statistics;
    
    Table analyzed.
    
    SQL> analyze table t2 compute statistics;
    
    Table analyzed.
    
    SQL> set timing on
    SQL> select count(*) from
      2  ( select object_id from t1
      3    minus
      4    select object_id from t2
      5  )
      6  /
    
      COUNT(*)
    ----------
           171
    
     real: 1072
    
    SQL> select count(*) from
      2  ( select object_id from t1
      3    where object_id not in
      4    ( select object_id from t2
      5    )
      6  )
      7  /
    
      COUNT(*)
    ----------
           171
    
     real: 2143
    
    SQL> set timing off 
    SQL> set autotrace on
    
    SQL> select count(*) from
      2  ( select object_id from t1
      3    minus
      4    select object_id from t2
      5  )
      6  /
    
      COUNT(*)
    ----------
           171
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1)
       1    0   SORT (AGGREGATE)
       2    1     VIEW (Cost=280 Card=84168)
       3    2       MINUS
       4    3         SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676)
       5    4           TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By
              tes=168676)
    
       6    3         SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996)
       7    6           TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By
              tes=167996)
    
    
    
    
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
             24  db block gets
            136  consistent gets
             64  physical reads
              0  redo size
            380  bytes sent via SQL*Net to client
            518  bytes received via SQL*Net from client
              4  SQL*Net roundtrips to/from client
              3  sorts (memory)
              0  sorts (disk)
              1  rows processed
    
    SQL> select count(*) from
      2  ( select object_id from t1
      3    where object_id not in
      4    ( select object_id from t2
      5    )
      6  )
      7  /
    
      COUNT(*)
    ----------
           171
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4)
       1    0   SORT (AGGREGATE)
       2    1     FILTER
       3    2       TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8
              436)
    
       4    2       INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost=
              1 Card=1 Bytes=4)
    
    
    
    
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
             12  db block gets
          84406  consistent gets
              0  physical reads
              0  redo size
            405  bytes sent via SQL*Net to client
            541  bytes received via SQL*Net from client
              4  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
              1  rows processed
    Last edited by andrewst; 03-31-03 at 06:30.

  3. #3
    Join Date
    Mar 2003
    Posts
    2
    Andrew,

    That was just the answer I was looking for!

    Many thanks for your help!

    Matt

  4. #4
    Join Date
    Jan 2003
    Location
    Pittsburgh, PA
    Posts
    86

    Question Re: "NOT IN" vs "MINUS"

    Is there a similar command to "minus" in SQL Server? I use "NOT IN" but would prefer a faster command.





    Originally posted by andrewst
    Here is my answer based on the way Oracle works, other DBMSs may do things differently. I am also assuming there is a unique index on the table2 columns being compared.

    With MINUS, a full scan is done on both tables and the results for table2 are removed from the results for table1.

    With NOT IN, a full table scan is done on table1. For each table1 row, a lookup is then done in table2. If no row is found in table2, the table1 row is returned - at least, that is what I have found.
    The reason the NOT IN is slower concerns the number of reads required to perform the query. Let's suppose the tables have the following characteristics:

    TABLE1: 20,000 rows in 1000 blocks
    TABLE2: 10,000 rows in 500 blocks

    Reads required for minus:
    Full scan of TABLE1 = 1000 blocks
    +
    Full scan of Table2 = 500 blocks
    = 1500 reads

    Reads required for NOT IN:
    Full scan of TABLE1 = 1000 blocks
    20,000 lookups in TABLE2 = 20,000 x (depth of index on TABLE2)
    = 21,000 at least

    i.e. a lot more work is done by the NOT IN query.

    Here is my test example:

    Code:
    SQL> create table t1  as select object_id from all_objects;
    
    Table created.
    
    SQL> select count(*) from t1;
    
      COUNT(*)
    ----------
         42169
    
    SQL> create table t2 as select object_id from all_objects where rownum < 42000;
    
    Table created.
    
    SQL> alter table t1 add primary key(object_id);
    
    Table altered.
    
    SQL> alter table t2 add primary key(object_id);
    
    Table altered.
    
    SQL> analyze table t1 compute statistics;
    
    Table analyzed.
    
    SQL> analyze table t2 compute statistics;
    
    Table analyzed.
    
    SQL> set timing on
    SQL> select count(*) from
      2  ( select object_id from t1
      3    minus
      4    select object_id from t2
      5  )
      6  /
    
      COUNT(*)
    ----------
           171
    
     real: 1072
    
    SQL> select count(*) from
      2  ( select object_id from t1
      3    where object_id not in
      4    ( select object_id from t2
      5    )
      6  )
      7  /
    
      COUNT(*)
    ----------
           171
    
     real: 2143
    
    SQL> set timing off 
    SQL> set autotrace on
    
    SQL> select count(*) from
      2  ( select object_id from t1
      3    minus
      4    select object_id from t2
      5  )
      6  /
    
      COUNT(*)
    ----------
           171
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=280 Card=1)
       1    0   SORT (AGGREGATE)
       2    1     VIEW (Cost=280 Card=84168)
       3    2       MINUS
       4    3         SORT (UNIQUE) (Cost=140 Card=42169 Bytes=168676)
       5    4           TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=42169 By
              tes=168676)
    
       6    3         SORT (UNIQUE) (Cost=140 Card=41999 Bytes=167996)
       7    6           TABLE ACCESS (FULL) OF 'T2' (Cost=10 Card=41999 By
              tes=167996)
    
    
    
    
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
             24  db block gets
            136  consistent gets
             64  physical reads
              0  redo size
            380  bytes sent via SQL*Net to client
            518  bytes received via SQL*Net from client
              4  SQL*Net roundtrips to/from client
              3  sorts (memory)
              0  sorts (disk)
              1  rows processed
    
    SQL> select count(*) from
      2  ( select object_id from t1
      3    where object_id not in
      4    ( select object_id from t2
      5    )
      6  )
      7  /
    
      COUNT(*)
    ----------
           171
    
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=10 Card=1 Bytes=4)
       1    0   SORT (AGGREGATE)
       2    1     FILTER
       3    2       TABLE ACCESS (FULL) OF 'T1' (Cost=10 Card=2109 Bytes=8
              436)
    
       4    2       INDEX (UNIQUE SCAN) OF 'SYS_C00128497' (UNIQUE) (Cost=
              1 Card=1 Bytes=4)
    
    
    
    
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
             12  db block gets
          84406  consistent gets
              0  physical reads
              0  redo size
            405  bytes sent via SQL*Net to client
            541  bytes received via SQL*Net from client
              4  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
              1  rows processed

  5. #5
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1

    Re: "NOT IN" vs "MINUS"

    Originally posted by acg_ray
    Is there a similar command to "minus" in SQL Server? I use "NOT IN" but would prefer a faster command.
    Amazingly (to me) it appears that SQL Server does not support MINUS, nor INTERSECT - according to the on-line manual here:

    http://msdn.microsoft.com/library/de...a-ses_9sfo.asp

    The ANSI name for MINUS is EXCEPT, but SQL Server doesn't seem to have that either. I find that strange, because relational databases are all about set processing, and UNION, MINUS/EXCEPT and INTERSECT are operators that work on sets (remember those Venn diagrams at school?)

  6. #6
    Join Date
    Jan 2003
    Location
    Pittsburgh, PA
    Posts
    86

    Re: "NOT IN" vs "MINUS"

    Thanks for confirming (unfortuanately) what I already expected. I learned Oracle in school, but have always used SQL Server professionally, and I was hoping I was missing something from SQL Server... but apparently not!


    Originally posted by andrewst
    Amazingly (to me) it appears that SQL Server does not support MINUS, nor INTERSECT - according to the on-line manual here:

    http://msdn.microsoft.com/library/de...a-ses_9sfo.asp

    The ANSI name for MINUS is EXCEPT, but SQL Server doesn't seem to have that either. I find that strange, because relational databases are all about set processing, and UNION, MINUS/EXCEPT and INTERSECT are operators that work on sets (remember those Venn diagrams at school?)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •