If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > How to avoid using inefficient IN/NOT IN in a query SQL statement

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-03-08, 08:25
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
How to avoid using inefficient IN/NOT IN in a query SQL statement

Hello ALL,


The inefficiency of the IN clause in SQL statement is well known. Could anyone suggest some workarounds for the following case.

I have three tables tb1,tb2 and tb3. I would like to sum up field values from two tables,and show a list of person name whose name appear in tb3. How to do this using a SQL statement.

Code:
tb1
Name Salary
Bob 1000
Tom 2000
John 3000
Winson 4000
Code:
tb2
Name Bonus
Bob 100
Tom 200
Code:
tb3
Name
Bob
Tom
I would like to get the following result

Code:
Name Total
Bob 1100
Tom 2200

i would like to use the following statement

Code:
SELECT
     tb1.name, (tb1.salary+tb2.bonus) as remuneration
FROM
     tb1
LEFT JOIN
     tb2 on tb2.Name=tb1.Name
WHERE tb1.name IN tb3
However,when tb1, tb2 and tb3 are very large, the above statement is inefficient. Can anyone suggest more efficient alternative statements.

Thanks
Reply With Quote
  #2 (permalink)  
Old 11-03-08, 08:56
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
Have you considered joining to tbl3 based on name instead of using IN?
__________________
George
Twitter | Blog
Reply With Quote
  #3 (permalink)  
Old 11-03-08, 09:30
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
George - why help them do their homework, they never seem to put any effort into doing it themselves.
Reply With Quote
  #4 (permalink)  
Old 11-03-08, 15:02
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
mike -- what makes you think this is homework
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 11-03-08, 19:11
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by r937
mike -- what makes you think this is homework
  • This type of coursework is quite useful as the open ended nature of the question allows the student to show his understanding of the query in general, he can show the efficiency of various examples of SQL and, if he fully understands the question, he can get a bonus mark by showing why the question as posed doesn't produce the correct answer. It's the type of question that might stretch a keen student or, of course, just send a lazy one to a forum.
  • How large would these tables become in real life? The t1 table (employees) is likely to have anywhere between 10 to 1k records, the t2 table (bonus) is likely to be empty at the moment and the t3 table is always going to be smaller than t1 (and most likely much smaller) so any real inefficiency is most likely to go unnoticed.
  • If there was a real company with enough employees to cause a performance issue with the query - would you have someone with this level of expertise playing with the HR data?
  • The table names, example data and, most importantly, the table structure come out of class 101.
  • I loosely remember getting a similar question when I was at college.
  • I could probably go on but I'm not going to.
Rudy -- what makes you feel it's a real world problem?


PS I'd still be happy to help him if he puts in a bit of effort rather than just posting the question and expecting us to do it for him.
Reply With Quote
  #6 (permalink)  
Old 11-03-08, 19:27
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
Quote:
Originally Posted by mike_bike_kite
Rudy -- what makes you feel it's a real world problem?
because i've read all his previous threads, which you obviously haven't

there is no "name, salary, and bonus" involved here

he's trying to simplify his real-world situation so as to direct our attention to the SQL and not the complexity of his database

now, having said that, i don't particularly like it when posters "dumb down" their questions with over-simplified examples

what i usually do is answer the over-simplified example, and if the poster cannot translate that back into his real-world scenario, that's his problem


still think it's homework, mike?

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 11-03-08, 19:51
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by r937
because i've read all his previous threads, which you obviously haven't

there is no "name, salary, and bonus" involved here
I simply answer the questions as they're posted - if someone wants to paint their horse in black and white stripes then they shouldn't be surprised if people see a zebra.
Reply With Quote
  #8 (permalink)  
Old 11-03-08, 20:10
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
yup, as i said, i always try to answer the question as asked

but i'm very careful to avoid accusing someone of foisting homework questions on us unless i have pretty clear evidence

and in this case there's plenty of evidence to the contrary

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #9 (permalink)  
Old 11-03-08, 23:47
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
as r937 stated, this question posted here is a simplified real world problem. I an working on a Text retrieval system. I am new to MySQL. Whenevern I figure out a statement with great effort, I always suspect whether it is the best one --consuming the least time to execute.

Even some cases, I failed to figure out a statement to implement queries. Honestly, without the suggestions and solutions posted by you on dbforums.com, i would have not made progress on my project. Thanks r937 and all friends who ever helped me on this forum.

Ok I try to solve this problem myself using following statement. My friends, please see if it is the best one.
Code:
SELECT
     tb1.name, (tb1.salary+tb2.bonus) as remuneration
FROM
     tb1
LEFT JOIN
     tb2, tb3 on tb2.Name=tb1.Name AND tb1.Name=tb3.Name

Last edited by cy163; 11-04-08 at 00:38.
Reply With Quote
  #10 (permalink)  
Old 11-04-08, 03:10
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
the very first thing you should do when writing an SQL statement is not to ask us if it is the best one, but to test it

what happened when you tested that query?

i have a feeling that it won't even run, because of incorrect syntax
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #11 (permalink)  
Old 11-04-08, 04:24
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Rudy is quite correct and the syntax is wrong. If you try running it through MySQL you get the error:
Code:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ' tb3 on tb2.Name=tb1.Name AND tb1.Name=tb3.Name' at line
You should always go for readability in your code before efficiency. Though getting the syntax correct is very important too. If your query is too slow then start to look at tuning the query. How long does it take to run at the moment with your data? It ran instantly for me as I just got the above error.

Do you really want to use tb3 to store the keys of the items you're interested in? This means you have to insert them into the table, run your query and then delete them afterwards. What would happen if 2 people tried to run this query at the same time? You'll either want to store another user name in t3 or perhaps just build the query on the fly.

I'd seriously consider buying a good book on SQL - you'll learn so much faster. Once you've mastered SQL you can move onto getting a book or two on database design and then perhaps performance tuning.

Give us real examples with real data - it stops any confusion.
Reply With Quote
  #12 (permalink)  
Old 11-04-08, 05:05
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
Quote:
Originally Posted by mike_bike_kite
Once you've mastered SQL
How long does that take? I've been at it for ages now and I'm still nowhere near expert status!
__________________
George
Twitter | Blog
Reply With Quote
  #13 (permalink)  
Old 11-04-08, 07:15
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
you are too!!
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #14 (permalink)  
Old 11-04-08, 07:29
gvee gvee is offline
www.gvee.co.uk
 
Join Date: Jan 2007
Location: UK
Posts: 10,156
Quote:
Originally Posted by r937
you are too!!
Sweet of you to say, but I'm still at apprentice status
__________________
George
Twitter | Blog
Reply With Quote
  #15 (permalink)  
Old 11-04-08, 09:42
cy163 cy163 is offline
Registered User
 
Join Date: Apr 2007
Posts: 127
Quote:
Originally Posted by r937
the very first thing you should do when writing an SQL statement is not to ask us if it is the best one, but to test it

what happened when you tested that query?

i have a feeling that it won't even run, because of incorrect syntax

sorry i type the statement from memory. The actual statement is as follows
Code:
SELECT
     tb1.name, (tb1.salary+tb2.bonus) as remuneration
FROM
     tb1
LEFT JOIN
     (tb2, tb3) on (tb2.Name=tb1.Name AND tb1.Name=tb3.Name)
It took 23 seconds on a PowerPC P4 server.
there are 3 million records in tb1, 40 records in tb2, and 100,000 records in tb3. I have created index on name field for all the 3 tables.

I would like to know if there is more efficient statement than the above.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On