Results 1 to 7 of 7
  1. #1
    Join Date
    Jan 2004
    Posts
    22

    Unanswered: NOT EXISTS versus NOT IN

    I have two tables :

    tableA
    a_id (int)
    value (varchar 255)

    tableB
    id (int)
    a_id (int)
    b_id (int)
    c_id (int)
    d_id (int)

    Both these tables contain considerable amounts of rows, but over time tableA will end up containing orphaned values (i.e. the a_id is not used in tableB) and this problem cannot be rectified by setting, for example, cascade deletes.

    To fix this problem I decided to write a simple stored procedure to purge all values in tableA where its a_id is not used in tableB :

    DELETE FROM tableA WHERE a_id NOT IN (SELECT a_id FROM tableB)

    Now although the following document relates to postgres :

    http://archives.postgresql.org/pgsql...2/msg00174.php

    I was interested to find out if I should be wary of using NOT IN in my query.

  2. #2
    Join Date
    Dec 2003
    Posts
    454
    The results should be the same.

  3. #3
    Join Date
    Jan 2004
    Posts
    22
    When I asked the orginal question, I dumbed down the example a little bit, in that there will be multiple "tableB"

    So my query will actually be something like :

    DELETE FROM tableA WHERE a_id NOT IN (SELECT a_id FROM tableB) AND a_id NOT IN (SELECT a_id FROM tableC) AND a_id NOT IN (SELECT a_id FROM tableE) AND a_id NOT IN (SELECT a_id FROM tableE)

    Now from what I understand the process SQL will follow is :

    1) Get the first ID from tableA perform all the selects and check the values against the WHERE clause

    2) Get the next ID from tablaA reperform all the selects and check the values against the WHERE clause

    3) Repeat until all ids in tableA have been processed

    If that is correct, thats a hell of a lot of selects it will be doing

  4. #4
    Join Date
    Sep 2003
    Posts
    522
    DELETE a from tableA a left outer join tableB b on a.a_id = a_id left out join tableC c on a.a_id = c.a_id left outer join tableE e on a.a_id = e.a_id where b.a_id is null and c.a_id is null and e.a_id is null

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Under most circumstances the SQL Server optimizer will take a NOT EXISTS clause and convert it to a standard JOIN syntax as illustrated by ms_sql_dba, so there would be no effective difference. I don't know if it will do the same for NOT IN, but on complex statements the optimizer may not make this conversion and so it is good practice to use ms_sql_dba's JOIN syntax instead of NOT EXISTS or NOT IN.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Sep 2003
    Posts
    522
    it's nice when someone can translate it into an academic english.
    while IN predicate may be optimized into a join, NOT IN almost always yields a table or an index scan.

  7. #7
    Join Date
    Jan 2004
    Posts
    31

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •