  1. #1
    Join Date
    Apr 2009
    Posts
    6

    Is DISTINCT necessary when using NOT IN Subselect?

    In the following statement, is there any point in including the DISTINCT keyword, or will all query optimisers realise that only distinct rows need to be selected in the subselect?

    Code:
    SELECT
        ...
    FROM
        tableA
    WHERE
        col1 NOT IN (SELECT DISTINCT
                         col2
                     FROM
                         tableB);
    Thanks in advance

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    it's superfluous, if you are speaking of the intended meaning (semantics of the statement)

    however, if you are referring to performance, it may or may not speed up the query, depending on which database system you try this in
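
    To illustrate the semantic point, here is a minimal sketch with made-up sample data (the rows are hypothetical; only the table and column names come from the question). It assumes a DBMS that accepts multi-row VALUES lists and a scratch database where these tables do not already exist. Both queries should return the same rows, because NOT IN only asks whether a value is present in the subquery result, not how many times.

    ```sql
    -- Hypothetical sample data; names follow the original question.
    CREATE TABLE tableA (col1 INTEGER);
    CREATE TABLE tableB (col2 INTEGER);

    INSERT INTO tableA (col1) VALUES (1), (2), (3);
    INSERT INTO tableB (col2) VALUES (2), (2), (2);  -- duplicates on purpose

    -- Without DISTINCT: the subquery yields 2, 2, 2
    SELECT col1
    FROM tableA
    WHERE col1 NOT IN (SELECT col2 FROM tableB);

    -- With DISTINCT: the subquery yields a single 2
    SELECT col1
    FROM tableA
    WHERE col1 NOT IN (SELECT DISTINCT col2 FROM tableB);
    ```

    Comparing the execution plans of the two queries in your own DBMS is the only reliable way to see whether DISTINCT changes the performance there.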

  3. #3
    Join Date
    Apr 2009
    Posts
    6
    Thanks for the reply.

    So, depending on the DBMS, this will either improve query performance or have no effect on it at all (but it will never decrease performance). IMHO it would therefore be sensible to always include the DISTINCT keyword in these subselects. Do you agree or disagree?

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    No, it can definitely decrease performance because some database engines will wait for the entire result set to materialize if DISTINCT is added.

    I think that the correct way to handle this type of code is to change the NOT IN clause to a NOT EXISTS clause and remove the DISTINCT operator. This reduces the SQL statement to primitives, which any database engine should be able to optimize for best performance.
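
    A minimal sketch of that rewrite, using the table and column names from the original question. The select list there was elided, so `a.col1` and the aliases `a` and `b` are placeholders of my own. Because the original predicate is NOT IN, the correlated form is NOT EXISTS.

    ```sql
    -- Sketch only: a.col1 stands in for the elided select list.
    SELECT a.col1
    FROM tableA a
    WHERE NOT EXISTS (SELECT 1          -- the selected value is irrelevant to EXISTS
                      FROM tableB b
                      WHERE b.col2 = a.col1);
    ```

    Be aware that this is not a drop-in replacement if NULLs are involved: a NULL in tableB.col2 never satisfies the equality, so NOT EXISTS still returns rows in that case, whereas NOT IN returns none.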

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    You can also use an outer join. This query isn't exactly equivalent to the NOT IN version because it ignores nulls in col2, if any. It probably gives the result you wanted however:

    SELECT
    ...
    FROM tableA
    LEFT JOIN tableB
    ON tableA.col1 = tableB.col2
    WHERE tableB.col2 IS NULL;
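
    The difference in NULL handling can be seen with a small sketch. The sample rows are hypothetical, and it assumes a fresh scratch database and a DBMS that accepts multi-row VALUES lists. With a NULL in col2, the NOT IN predicate can never evaluate to true, while an outer-join anti-join (keeping the tableA rows that found no match) still returns the unmatched row.

    ```sql
    -- Hypothetical sample data; names follow the original question.
    CREATE TABLE tableA (col1 INTEGER);
    CREATE TABLE tableB (col2 INTEGER);

    INSERT INTO tableA (col1) VALUES (1), (2);
    INSERT INTO tableB (col2) VALUES (2), (NULL);

    -- NOT IN: "1 NOT IN (2, NULL)" evaluates to UNKNOWN and
    -- "2 NOT IN (2, NULL)" to FALSE, so no rows qualify.
    SELECT col1
    FROM tableA
    WHERE col1 NOT IN (SELECT col2 FROM tableB);

    -- Outer-join anti-join: keeps tableA rows with no match in tableB.
    -- Only the row with col1 = 1 is unmatched.
    SELECT tableA.col1
    FROM tableA
    LEFT JOIN tableB
        ON tableA.col1 = tableB.col2
    WHERE tableB.col2 IS NULL;
    ```

    If col2 is declared NOT NULL, the two forms return the same rows for non-NULL values of col1.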
