Page 1 of 2 12 LastLast
Results 1 to 15 of 18
  1. #1
    Join Date
    Apr 2007
    Posts
    130

    Unanswered: How to avoid using inefficient IN/NOT IN in a query SQL statement

    Hello ALL,


    The inefficiency of the IN clause in SQL statement is well known. Could anyone suggest some workarounds for the following case.

    I have three tables tb1,tb2 and tb3. I would like to sum up field values from two tables,and show a list of person name whose name appear in tb3. How to do this using a SQL statement.

    Code:
    tb1
    Name Salary
    Bob 1000
    Tom 2000
    John 3000
    Winson 4000
    Code:
    tb2
    Name Bonus
    Bob 100
    Tom 200
    Code:
    tb3
    Name
    Bob
    Tom
    I would like to get the following result

    Code:
    Name Total
    Bob 1100
    Tom 2200

    i would like to use the following statement

    Code:
    SELECT
         tb1.name, (tb1.salary+tb2.bonus) as remuneration
    FROM
         tb1
    LEFT JOIN
         tb2 on tb2.Name=tb1.Name
    WHERE tb1.name IN tb3
    However,when tb1, tb2 and tb3 are very large, the above statement is inefficient. Can anyone suggest more efficient alternative statements.

    Thanks

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Have you considered joining to tbl3 based on name instead of using IN?
    George
    Home | Blog

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    George - why help them do their homework, they never seem to put any effort into doing it themselves.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    mike -- what makes you think this is homework
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by r937
    mike -- what makes you think this is homework
    • This type of coursework is quite useful as the open ended nature of the question allows the student to show his understanding of the query in general, he can show the efficiency of various examples of SQL and, if he fully understands the question, he can get a bonus mark by showing why the question as posed doesn't produce the correct answer. It's the type of question that might stretch a keen student or, of course, just send a lazy one to a forum.
    • How large would these tables become in real life? The t1 table (employees) is likely to have anywhere between 10 to 1k records, the t2 table (bonus) is likely to be empty at the moment and the t3 table is always going to be smaller than t1 (and most likely much smaller) so any real inefficiency is most likely to go unnoticed.
    • If there was a real company with enough employees to cause a performance issue with the query - would you have someone with this level of expertise playing with the HR data?
    • The table names, example data and, most importantly, the table structure come out of class 101.
    • I loosely remember getting a similar question when I was at college.
    • I could probably go on but I'm not going to.

    Rudy -- what makes you feel it's a real world problem?


    PS I'd still be happy to help him if he puts in a bit of effort rather than just posting the question and expecting us to do it for him.

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by mike_bike_kite
    Rudy -- what makes you feel it's a real world problem?
    because i've read all his previous threads, which you obviously haven't

    there is no "name, salary, and bonus" involved here

    he's trying to simplify his real-world situation so as to direct our attention to the SQL and not the complexity of his database

    now, having said that, i don't particularly like it when posters "dumb down" their questions with over-simplified examples

    what i usually do is answer the over-simplified example, and if the poster cannot translate that back into his real-world scenario, that's his problem


    still think it's homework, mike?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by r937
    because i've read all his previous threads, which you obviously haven't

    there is no "name, salary, and bonus" involved here
    I simply answer the questions as they're posted - if someone wants to paint their horse in black and white stripes then they shouldn't be surprised if people see a zebra.

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yup, as i said, i always try to answer the question as asked

    but i'm very careful to avoid accusing someone of foisting homework questions on us unless i have pretty clear evidence

    and in this case there's plenty of evidence to the contrary

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Apr 2007
    Posts
    130
    as r937 stated, this question posted here is a simplified real world problem. I an working on a Text retrieval system. I am new to MySQL. Whenevern I figure out a statement with great effort, I always suspect whether it is the best one --consuming the least time to execute.

    Even some cases, I failed to figure out a statement to implement queries. Honestly, without the suggestions and solutions posted by you on dbforums.com, i would have not made progress on my project. Thanks r937 and all friends who ever helped me on this forum.

    Ok I try to solve this problem myself using following statement. My friends, please see if it is the best one.
    Code:
    SELECT
         tb1.name, (tb1.salary+tb2.bonus) as remuneration
    FROM
         tb1
    LEFT JOIN
         tb2, tb3 on tb2.Name=tb1.Name AND tb1.Name=tb3.Name
    Last edited by cy163; 11-04-08 at 01:38.

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    the very first thing you should do when writing an SQL statement is not to ask us if it is the best one, but to test it

    what happened when you tested that query?

    i have a feeling that it won't even run, because of incorrect syntax
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Rudy is quite correct and the syntax is wrong. If you try running it through MySQL you get the error:
    Code:
    ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ' tb3 on tb2.Name=tb1.Name AND tb1.Name=tb3.Name' at line
    You should always go for readability in your code before efficiency. Though getting the syntax correct is very important too. If your query is too slow then start to look at tuning the query. How long does it take to run at the moment with your data? It ran instantly for me as I just got the above error.

    Do you really want to use tb3 to store the keys of the items you're interested in? This means you have to insert them into the table, run your query and then delete them afterwards. What would happen if 2 people tried to run this query at the same time? You'll either want to store another user name in t3 or perhaps just build the query on the fly.

    I'd seriously consider buying a good book on SQL - you'll learn so much faster. Once you've mastered SQL you can move onto getting a book or two on database design and then perhaps performance tuning.

    Give us real examples with real data - it stops any confusion.

  12. #12
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by mike_bike_kite
    Once you've mastered SQL
    How long does that take? I've been at it for ages now and I'm still nowhere near expert status!
    George
    Home | Blog

  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you are too!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  14. #14
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by r937
    you are too!!
    Sweet of you to say, but I'm still at apprentice status
    George
    Home | Blog

  15. #15
    Join Date
    Apr 2007
    Posts
    130
    Quote Originally Posted by r937
    the very first thing you should do when writing an SQL statement is not to ask us if it is the best one, but to test it

    what happened when you tested that query?

    i have a feeling that it won't even run, because of incorrect syntax

    sorry i type the statement from memory. The actual statement is as follows
    Code:
    SELECT
         tb1.name, (tb1.salary+tb2.bonus) as remuneration
    FROM
         tb1
    LEFT JOIN
         (tb2, tb3) on (tb2.Name=tb1.Name AND tb1.Name=tb3.Name)
    It took 23 seconds on a PowerPC P4 server.
    there are 3 million records in tb1, 40 records in tb2, and 100,000 records in tb3. I have created index on name field for all the 3 tables.

    I would like to know if there is more efficient statement than the above.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •