  #1 (permalink)  
Old 04-25-09, 11:51
zeolite zeolite is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
Is DISTINCT necessary when using NOT IN Subselect?

In the following statement, is there any point in including the DISTINCT keyword, or will all query optimisers realise that only distinct rows need to be selected in the subselect?

Code:
SELECT
    ...
FROM
    tableA
WHERE
    col1 NOT IN (SELECT DISTINCT
                     col2
                 FROM
                     tableB);
Thanks in advance
  #2 (permalink)  
Old 04-25-09, 12:33
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
it's superfluous, if you are speaking of the intended meaning (semantics of the statement)

however, if you are referring to performance, it may or may not speed up the query, depending on which database system you try this in
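The semantic half of that answer is easy to check. Below is a minimal sketch, assuming SQLite driven through Python's sqlite3 module; the sample rows are made up for illustration, and it says nothing about performance on any particular DBMS.

```python
import sqlite3

# In-memory database with hypothetical sample data; only the table and
# column names (tableA.col1, tableB.col2) come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (col1 INTEGER);
    CREATE TABLE tableB (col2 INTEGER);
    INSERT INTO tableA VALUES (1), (2), (3), (4);
    -- col2 deliberately contains duplicates
    INSERT INTO tableB VALUES (2), (2), (3), (3), (3);
""")

plain = conn.execute(
    "SELECT col1 FROM tableA "
    "WHERE col1 NOT IN (SELECT col2 FROM tableB) ORDER BY col1"
).fetchall()

with_distinct = conn.execute(
    "SELECT col1 FROM tableA "
    "WHERE col1 NOT IN (SELECT DISTINCT col2 FROM tableB) ORDER BY col1"
).fetchall()

print(plain)          # [(1,), (4,)]
print(with_distinct)  # [(1,), (4,)]
```

Duplicates in col2 cannot change the answer, because NOT IN only asks whether a value appears in the subselect at all.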
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
  #3 (permalink)  
Old 04-25-09, 12:53
zeolite zeolite is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
Thanks for the reply.

So depending on the DBMS this will either improve query performance or have no effect on performance at all (but it will never decrease performance). IMHO it would therefore be sensible to always include the DISTINCT keyword in these subselects. Do you agree/disagree?
  #4 (permalink)  
Old 04-25-09, 13:06
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
No, it can definitely decrease performance because some database engines will wait for the entire result set to materialize if DISTINCT is added.

I think that the correct way to handle this type of code is to change the NOT IN clause to a NOT EXISTS clause and remove the DISTINCT operator. This reduces the SQL statement to primitives, which any database engine should be able to optimize for best performance.
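A minimal sketch of that rewrite, again assuming SQLite through Python's sqlite3 module and made-up rows. The sample data is deliberately free of NULLs in col2, since that is the case where the two forms are interchangeable.

```python
import sqlite3

# Hypothetical sample data; col2 has duplicates but no NULLs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (col1 INTEGER);
    CREATE TABLE tableB (col2 INTEGER);
    INSERT INTO tableA VALUES (1), (2), (3), (4);
    INSERT INTO tableB VALUES (2), (2), (3);
""")

NOT_IN = """
    SELECT col1 FROM tableA
    WHERE col1 NOT IN (SELECT col2 FROM tableB)
    ORDER BY col1
"""

# Correlated subquery: no DISTINCT needed, because EXISTS only asks
# whether at least one matching row is present.
NOT_EXISTS = """
    SELECT col1 FROM tableA
    WHERE NOT EXISTS (SELECT 1 FROM tableB
                      WHERE tableB.col2 = tableA.col1)
    ORDER BY col1
"""

not_in_rows = conn.execute(NOT_IN).fetchall()
not_exists_rows = conn.execute(NOT_EXISTS).fetchall()

print(not_in_rows)      # [(1,), (4,)]
print(not_exists_rows)  # [(1,), (4,)]
```

Whether this form actually runs faster depends on the engine; the sketch only shows that the results match on this data.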

-PatP
__________________
In theory, theory and practice are identical. In practice, theory and practice are unrelated.
  #5 (permalink)  
Old 04-25-09, 15:57
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
You can also use an outer join, keeping only the rows of tableA that found no match in tableB. This query isn't exactly equivalent to the NOT IN version because it ignores nulls in col2, if any. It probably gives the result you wanted, however:

Code:
SELECT
    ...
FROM
    tableA
LEFT JOIN
    tableB
    ON tableA.col1 = tableB.col2
WHERE
    tableB.col2 IS NULL;
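To see the null caveat concretely, here is a small sketch, assuming SQLite through Python's sqlite3 module and invented sample rows that include a NULL in col2. The anti-join keeps unmatched rows by filtering on tableB.col2 IS NULL; filtering on IS NOT NULL would return the matched rows instead.

```python
import sqlite3

# Hypothetical sample data; note the NULL and the duplicate in col2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (col1 INTEGER);
    CREATE TABLE tableB (col2 INTEGER);
    INSERT INTO tableA VALUES (1), (2), (3), (4);
    INSERT INTO tableB VALUES (2), (2), (3), (NULL);
""")

# Anti-join: rows of tableA with no partner in tableB.
anti_join = conn.execute("""
    SELECT tableA.col1
    FROM tableA
    LEFT JOIN tableB ON tableA.col1 = tableB.col2
    WHERE tableB.col2 IS NULL
    ORDER BY tableA.col1
""").fetchall()

# Opposite filter: rows that DID match, repeated once per match.
matched = conn.execute("""
    SELECT tableA.col1
    FROM tableA
    LEFT JOIN tableB ON tableA.col1 = tableB.col2
    WHERE tableB.col2 IS NOT NULL
    ORDER BY tableA.col1
""").fetchall()

# NOT IN against a subselect containing NULL: every comparison that is
# not a definite match evaluates to unknown, so no row qualifies.
not_in = conn.execute("""
    SELECT col1 FROM tableA
    WHERE col1 NOT IN (SELECT col2 FROM tableB)
    ORDER BY col1
""").fetchall()

print(anti_join)  # [(1,), (4,)]
print(matched)    # [(2,), (2,), (3,)]
print(not_in)     # []
```

If col2 is declared NOT NULL, the anti-join and the NOT IN version return the same rows.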