If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > performance of IN clause

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-26-10, 08:05
uhm uhm is offline
Registered User
 
Join Date: Mar 2010
Posts: 8
performance of IN clause

I need to update the status of some products and I'm doing it using the IN clause

Here's what I'm doing:

Code:
UPDATE products SET status = 0 WHERE (feedid = 2 AND sku NOT IN ('C37CDCV6PM037','56031003','K301799','6924'...many more skus));
in the test I'm doing I have a list of ~5000 SKU codes, and it takes more than 4 minutes (~300.000 records in the table, but it's destined to grow).

There's an index on the SKU column, any ideas how I could tweak this?
Reply With Quote
  #2 (permalink)  
Old 03-26-10, 09:37
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by uhm View Post
any ideas how I could tweak this?
You haven't provide the indexes on the table nor the table definition so it's difficult to provide a solution. The index on the sku column probably won't help as you're using NOT IN (it's easy to find page 5 in a book but imagine trying to find all the pages that are not page 5). A few questions:
  • How many rows are in the products table?
  • Is there an index on feedid?
  • How many different values would feedid hold in the products table?
  • Why do you have a a long list of sku codes as a hard coded list - shouldn't they be in a table somewhere?
  • How many rows are being updated here and does the update happen often?
__________________
Mike
Reply With Quote
  #3 (permalink)  
Old 03-26-10, 15:17
uhm uhm is offline
Registered User
 
Join Date: Mar 2010
Posts: 8
Quote:
Originally Posted by mike_bike_kite View Post
You haven't provide the indexes on the table nor the table definition so it's difficult to provide a solution. The index on the sku column probably won't help as you're using NOT IN (it's easy to find page 5 in a book but imagine trying to find all the pages that are not page 5). A few questions:
  • How many rows are in the products table?
  • Is there an index on feedid?
  • How many different values would feedid hold in the products table?
  • Why do you have a a long list of sku codes as a hard coded list - shouldn't they be in a table somewhere?
  • How many rows are being updated here and does the update happen often?
- the products table has about 300.000 rows now, but it's just a test, this value should grow to several millions.
- yes, there's an index on feedid
- feedid currently has only 2 different numeric values, it will hold max few hundreds
- I hardcoded the skus there because it was easier to read, but they are stored in a text field in another table ( a single text field containing all the skus)
- the update should happen once everyday, I don't have a realistic figure of how many skus will need to be updated every day for each feed, I suppose
most times the number will be rather low, few tens maybe.
The problem is reversed, I only have a list of those that don't need to be updated, hence the NOT IN clause.
Reply With Quote
  #4 (permalink)  
Old 03-26-10, 15:51
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by uhm View Post
- yes, there's an index on feedid
- feedid currently has only 2 different numeric values, it will hold max few hundreds
The index will never be called if it only holds two different values - the optimiser will rightly figure that it's better to table scan. If it's a test system then I'd add around 20 different feedids, redo the indexes and see how it works then.
Quote:
Originally Posted by uhm View Post
- I hardcoded the skus there because it was easier to read, but they are stored in a text field in another table ( a single text field containing all the skus)
I assume the concept of normalisation is a bit alien on your planet? I'd read up on first normal form and repeating groups.
Quote:
Originally Posted by uhm View Post
- the update should happen once everyday, I don't have a realistic figure of how many skus will need to be updated every day for each feed, I suppose most times the number will be rather low, few tens maybe.
Why not check that the status is not already 0 in your where clause. Then to make things faster still just add the status to your index with the feedid. Now the update will only try to update the records it needs to. New index would be (feedid, status).
Quote:
Originally Posted by uhm View Post
but it's destined to grow
If the new query runs ok why not fill your database with a few million rows and see how the new update behaves then. It's just a test system so it shouldn't be hard to do.
__________________
Mike
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On