Results 1 to 4 of 4
  1. #1
    Join Date
    Mar 2010
    Posts
    8

    Unanswered: performance of IN clause

    I need to update the status of some products and I'm doing it using the IN clause

    Here's what I'm doing:

    Code:
    UPDATE products SET status = 0 WHERE (feedid = 2 AND sku NOT IN ('C37CDCV6PM037','56031003','K301799','6924'...many more skus));
    in the test I'm doing I have a list of ~5000 SKU codes, and it takes more than 4 minutes (~300.000 records in the table, but it's destined to grow).

    There's an index on the SKU column, any ideas how I could tweak this?

  2. #2
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by uhm View Post
    any ideas how I could tweak this?
    You haven't provide the indexes on the table nor the table definition so it's difficult to provide a solution. The index on the sku column probably won't help as you're using NOT IN (it's easy to find page 5 in a book but imagine trying to find all the pages that are not page 5). A few questions:
    • How many rows are in the products table?
    • Is there an index on feedid?
    • How many different values would feedid hold in the products table?
    • Why do you have a a long list of sku codes as a hard coded list - shouldn't they be in a table somewhere?
    • How many rows are being updated here and does the update happen often?

  3. #3
    Join Date
    Mar 2010
    Posts
    8
    Quote Originally Posted by mike_bike_kite View Post
    You haven't provide the indexes on the table nor the table definition so it's difficult to provide a solution. The index on the sku column probably won't help as you're using NOT IN (it's easy to find page 5 in a book but imagine trying to find all the pages that are not page 5). A few questions:
    • How many rows are in the products table?
    • Is there an index on feedid?
    • How many different values would feedid hold in the products table?
    • Why do you have a a long list of sku codes as a hard coded list - shouldn't they be in a table somewhere?
    • How many rows are being updated here and does the update happen often?
    - the products table has about 300.000 rows now, but it's just a test, this value should grow to several millions.
    - yes, there's an index on feedid
    - feedid currently has only 2 different numeric values, it will hold max few hundreds
    - I hardcoded the skus there because it was easier to read, but they are stored in a text field in another table ( a single text field containing all the skus)
    - the update should happen once everyday, I don't have a realistic figure of how many skus will need to be updated every day for each feed, I suppose
    most times the number will be rather low, few tens maybe.
    The problem is reversed, I only have a list of those that don't need to be updated, hence the NOT IN clause.

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by uhm View Post
    - yes, there's an index on feedid
    - feedid currently has only 2 different numeric values, it will hold max few hundreds
    The index will never be called if it only holds two different values - the optimiser will rightly figure that it's better to table scan. If it's a test system then I'd add around 20 different feedids, redo the indexes and see how it works then.

    Quote Originally Posted by uhm View Post
    - I hardcoded the skus there because it was easier to read, but they are stored in a text field in another table ( a single text field containing all the skus)
    I assume the concept of normalisation is a bit alien on your planet? I'd read up on first normal form and repeating groups.

    Quote Originally Posted by uhm View Post
    - the update should happen once everyday, I don't have a realistic figure of how many skus will need to be updated every day for each feed, I suppose most times the number will be rather low, few tens maybe.
    Why not check that the status is not already 0 in your where clause. Then to make things faster still just add the status to your index with the feedid. Now the update will only try to update the records it needs to. New index would be (feedid, status).

    Quote Originally Posted by uhm View Post
    but it's destined to grow
    If the new query runs ok why not fill your database with a few million rows and see how the new update behaves then. It's just a test system so it shouldn't be hard to do.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •