Results 1 to 15 of 15
  1. #1
    Join Date
    Jun 2010
    Posts
    7

    Unanswered: How to do this query?

    Hello,

    I want to do a query like this:
    select * from customer where customer.id in
    ( select customerid from bids union all select sellerId from products union all select buyerId from products)

    it seems unnecessary to perform two selects on the products table. I tried doing a WITH (select sellerId, buyerId from products) and then using that table in the IN clause. I also tried doing like this:

    select * from
    customers,
    (select sellerId, buyerId from products) p,
    (select customerid from bids) b
    where customers.id in b.customerid
    or customers.id in p.buyerid
    or customers.id or p.sellerid

    but nothing has worked. Any tips?
    Last edited by Guitarkalle; 06-15-10 at 15:28.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Assuming that you have usable indicies, the two SELECT operations are actually much more efficient for many reasons. I'd do this as:
    Code:
    SELECT *
       FROM table1
       WHERE EXISTS (SELECT 1
          FROM table3 AS z1
          WHERE  z1.id = table1.id)
          OR EXISTS (SELECT 1
             FROM table3 AS z2
             WHERE  z2.id2 = table1.id)
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Jun 2010
    Posts
    7
    Hello, thanks for the reply!

    I have indexes on the columns! Why would it be more efficient? Because it can just use the index and doesn't have to touch the data in the row?

    Will Exists use the index and IN will not?

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    see, this is why i hate questions that are posed in generic "table1" terms

    what happened to table2, guys?

    there it is right there in the original question , but by the time the original question was over, and pat answered, it got losted

    this type of muddle and confusion seldom happens if actual table names are used from the outset
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Jun 2010
    Posts
    7
    Hello!

    I edited my original question, hopefully it makes some sense now!

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Guitarkalle View Post
    hopefully it makes some sense now!
    it certainly does!! thank you very much

    may i ask a question?

    if you took all your customers, and subtracted out those that are involved in a bid, and also subtracted out those that are either a seller or a buyer of a product, would there be any left?

    if the answer is yes, then your original query is perfect (except i would use UNION insead of UNION ALL) --

    select * from customer where customer.id in
    ( select customerid from bids union select sellerId from products union select buyerId from products)

    if the answer is no, then your original query can be simplified to --

    select * from customer

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    The problem with using the IN clause is not what it does but how it accomplishes the task... The IN clause will construct the result set as a heap, then sift through that heap one item at a time. The EXISTS clause is free to use an index scan and it can quit as soon as it can prove either the existance or non-existance of a row from the subquery. Two index seeks will be faster than a heap query of almost any size but certainly for any result set smaller than four database pages.

    -PatP
    Last edited by Pat Phelan; 06-15-10 at 16:04. Reason: replaced "sort" with less-ambiguous "sift"
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  8. #8
    Join Date
    Jun 2010
    Posts
    7
    Quote Originally Posted by r937 View Post
    it certainly does!! thank you very much

    may i ask a question?

    if you took all your customers, and subtracted out those that are involved in a bid, and also subtracted out those that are either a seller or a buyer of a product, would there be any left?

    if the answer is yes, then your original query is perfect (except i would use UNION insead of UNION ALL) --

    select * from customer where customer.id in
    ( select customerid from bids union select sellerId from products union select buyerId from products)

    if the answer is no, then your original query can be simplified to --

    select * from customer

    Yes, you are right, it is not the whole query. There are some criterons on the products and bids tables which generates a subset of the customers. The idea of the query is to select only the necessary customers to the client and not all of them (which would not scale very well in the long run I would guess).

    regarding union vs union all. The difference as far as I know is that union does a select distinct on the set. Would that not just degrade performance in my case?

  9. #9
    Join Date
    Jun 2010
    Posts
    7
    Quote Originally Posted by Pat Phelan View Post
    The problem with using the IN clause is not what it does but how it accomplishes the task... The IN clause will construct the result set as a heap, then sift through that heap one item at a time. The EXISTS clause is free to use an index scan and it can quit as soon as it can prove either the existance or non-existance of a row from the subquery. Two index seeks will be faster than a heap query of almost any size but certainly for any result set smaller than four database pages.

    -PatP
    Thanks for the insight, love this information!

    So the IN clause will create a temporary heap table which it just iterates through looking for matches?

    select * from customer where exists (select sellerid from products where customerid = sellerid)

    Is this equivalent to a join basically?

    Could I not write the original query like this?
    Select * from customer c
    where exists (select * from products p where c.customerid = p.sellerid OR c.customerid = p.buyerid)
    or exists (select * from bids where bids.id = c.customerid)

    ?

  10. #10
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Could I not write the original query like this?
    Select * from customer c
    where exists (select * from products p where c.customerid = p.sellerid OR c.customerid = p.buyerid)
    or exists (select * from bids where bids.id = c.customerid)
    I would suggest against that as it could lead you to using some type of a multi-index access which would invariably slow things down. In MOST cases you would be better off with something like:
    Code:
    Select * from customer c 
    where exists (select 1 from products p where c.customerid = p.sellerid)
     OR exists (select 1  from products p where c.customerid = p.buyerid) 
    or exists (select 1 from bids where bids.id = c.customerid)
    Also, notice that I changed your select * to select 1. Reasoning is it is much cheaper to have a single 1 in memory, than all columns that may exists in a table.
    Dave

  11. #11
    Join Date
    Jun 2010
    Posts
    7
    OK, thanks for your input!

    I think I will go with that approach.

    Also, according to
    Ask Tom "IN & EXISTS"

    IN is generally good when the inner query is small and the outer query is big.
    EXISTS is generally good when the inner query is big and the outer query is small.

    But that discussion was regarding Oracle and also posted 2001 so...

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Tom's comments from the page you cited were certainly true for Oracle in 2001.

    While I haven't tested every database server, I'd expect that every current generation database server would implement the EXISTS clause more efficiently than the IN clause. There might be an exception to this rule, but I'm not aware of one.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  13. #13
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by dav1mo View Post
    Also, notice that I changed your select * to select 1. Reasoning is it is much cheaper to have a single 1 in memory, than all columns that may exists in a table.
    Minor, but this doesn't happen. You can find something from (I think) Conor Cunningham on this if you Google around. In short the engine does not retrieve all the columns of data if you use *.

  14. #14
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by pootle flump View Post
    Minor, but this doesn't happen. You can find something from (I think) Conor Cunningham on this if you Google around. In short the engine does not retrieve all the columns of data if you use *.
    Is that dependant on what version of SQL Server you are using? I believe that 2000 expanded a * into the column list and as such was a teeny-tiny performance hit. I believe this was "fixed" in 2005.

    although I could be well wrong ;)
    George
    Home | Blog

  15. #15
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I think Conor described that performance hit as a "couple of cycles".

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •