Results 1 to 15 of 15
  1. #1
    Join Date
    Apr 2009
    Posts
    9

    Unanswered: Can Anybody speed this query up?

    Hi

    I have the following table structure:

    table: Customer
    column: customer_id (pk)
    column: regTime (int field. time user registered)

    table: Order_sum
    column: customer_id (fk)
    column: status

    A customer can have multiple orders all with different statuses. 1 to 10.

    I need to return a list of customer_ids where the customer has registered over 30 days ago AND who hasn't got any orders of status 2, 3 or 10. So if they have an order of status 1 we want to include them. If the customer has two orders with statuses 1 and 2, we don't it in the results.

    The SQL i have that works is:

    Code:
    SELECT DISTINCT customer.customer_id
    FROM customer
    WHERE customer .regTime < ( UNIX_TIMESTAMP( ) - ( 60 *60 *24 *30 ) )
    AND NOT
    EXISTS (
       SELECT 1
       FROM order_sum
       WHERE order_sum.customer_id = customer.customer_id
       AND order_sum.status IN ( 2, 3, 10 )
    )
    but it takes a long time to run. about 30 seconds.

    My customer table has 1717 rows. The order_sum table has 1952 rows.

    Can anybody speed my query up please ?

    Thanks

    Darren

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    SELECT customer.customer_id
      FROM customer
    LEFT OUTER
      JOIN order_sum
        ON order_sum.customer_id = customer.customer_id
       AND order_sum.status IN ( 2, 3, 10 )
     WHERE customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
       AND order_sum.customer_id IS NULL
    make sure you have an index declared on customer.regTime and another on order_sum.customer_id

    it might be even faster if the index were a compound index declared on (customer_id,status)

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Nov 2008
    Posts
    7
    Try this:
    Code:
    SELECT 
    	DISTINCT cr.customer_id
    FROM 
    	customer cr
    	JOIN
    	order_sum osm
    	ON(osm.customer_id = cr.customer_id)
    WHERE 
    	datediff(now(), cr.regTime) > 30
    	AND osm.status NOT IN (2,3,10);
    and

    Code:
    SELECT 
    	DISTINCT cr.customer_id
    FROM 
    	customer cr
    	JOIN
    	order_sum osm
    	ON(osm.customer_id = cr.customer_id AND datediff(now(), cr.regTime) > 30)
    WHERE 
    	osm.status NOT IN (2,3,10);
    I'm not sure whether it really is faster, so you'll have to try that, but simple joins are fairly fast in MySQL and this way of writing them comes more natural to me.
    Make sure to check on you indexes as r937 suggested. You should also have an index on customer.customer_id, particularly if that's the unique row identifier.
    Last edited by slapo; 04-11-09 at 14:58.

  4. #4
    Join Date
    Apr 2009
    Posts
    9
    Thanks for your help guys.

    I havent got round to testing whose is the fastest query just yet as when i put an index on order_sum.customer_id my query worked within 2 seconds!!

    Thanks

    Darren

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    regarding performance...

    in general, when you place a function on a table column, the optimizer cannot utilize an index on that column

    compare my condition --
    Code:
    customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
    with this --
    Code:
    datediff(now(), cr.regTime) > 30
    this second condition will fail to work efficiently

    in fact, it will actually fail to work correctly, as well, because it does not take into consideration that regTime is actually an integer, and not a date which can be used in the DATEDIFF function

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Apr 2009
    Posts
    9
    Quote Originally Posted by r937
    Code:
    SELECT customer.customer_id
      FROM customer
    LEFT OUTER
      JOIN order_sum
        ON order_sum.customer_id = customer.customer_id
       AND order_sum.status IN ( 2, 3, 10 )
     WHERE customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
       AND order_sum.customer_id IS NULL
    make sure you have an index declared on customer.regTime and another on order_sum.customer_id

    it might be even faster if the index were a compound index declared on (customer_id,status)

    Hi

    How does the above take into account we DONT want to return records if they have a status 2,3,10? we only want to return the customer id if it doesnt have an order of 2,3,10. so orders with status 1,2 would not be return the customer id.

    Thanks

  7. #7
    Join Date
    Apr 2009
    Posts
    9
    Ah got it.

    It uses a left outer join so may or may not include any orders that have status 2,3,10.

    then we just say we want the ones that have no ordersum.customerid.

    Nice work

  8. #8
    Join Date
    Apr 2009
    Posts
    9
    Hi

    Thanks for the help so far r937, it works a treat.

    How would i change the sql to say:

    Return those customers who have made ONE order of status (2,3,10) and that order was over 30 days ago?

    So they might have 3 orders, two are status 1, one is status 2. This needs to be included.
    If they have 3 orders, one of status 1 and two of status 2. This is NOT included.
    This is on order time now not customer reg time.
    AND if they have made an order (2,3,10) within 30 days the customer is not added.

    Im finding it complicated to explain it, no wonder I cant write the SQL !!

    Thanks for any help

  9. #9
    Join Date
    Nov 2008
    Posts
    7
    Quote Originally Posted by r937
    regarding performance...

    in general, when you place a function on a table column, the optimizer cannot utilize an index on that column

    compare my condition --
    Code:
    customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
    with this --
    Code:
    datediff(now(), cr.regTime) > 30
    this second condition will fail to work efficiently

    in fact, it will actually fail to work correctly, as well, because it does not take into consideration that regTime is actually an integer, and not a date which can be used in the DATEDIFF function

    That's right, I failed to notice the datatype of regTime and its possible values :P
    The point about the performance should be correct as well.

    Sorry DazlerD and thanks for the correction, r937

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by DazlerD
    Return those customers who have made ONE order of status (2,3,10) and that order was over 30 days ago?
    Code:
    SELECT customer.customer_id
      FROM customer
    INNER
      JOIN order_sum
        ON order_sum.customer_id = customer.customer_id
       AND order_sum.status IN ( 2, 3, 10 )
       AND order_sum.time < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
    GROUP
        BY customer.customer_id
    HAVING COUNT(*) = 1
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Apr 2009
    Posts
    9
    Brilliant r937, thanks a lot.

    Ive been looking at this for the last hour, starting from the beginning then adding bits and doing some testing all the way along.

    I did work out the inner join so happy with that !!

    Thanks again to both of you for your advice.

    Have a good weekend.

    D

  12. #12
    Join Date
    Apr 2009
    Posts
    9
    Hi

    I copied the results into a table and the wrote some sql to check the results:

    SELECT sum.cart_order_id, sum.customer_id, sum.status, from_unixtime(sum.time)
    FROM order_sum sum, PUR1_01 pur
    WHERE sum.customer_id = pur.customer_id
    and sum.status in (2,3,10)

    return 889 results

    SELECT distinct(sum.customer_id)
    FROM order_sum sum, PUR1_01 pur
    WHERE sum.customer_id = pur.customer_id
    and sum.status in (2,3,10)

    returned 885 records.

    The sql is returning where the customer has ONE order with a status 2,3,10 more than 30 days ago. BUT the ones i found have another order within 30 days.

    I only want those customers who have ONE order of 2,3,10 and that order is > 30 days ago.

    Then i need to do another query where the cust has made TWO orders of 2,3,10 and BOTH those orders are > 90 days.

    See I told you i found it hard to explain !!!

    Thanks

    D

  13. #13
    Join Date
    Apr 2009
    Posts
    9
    My results return

    cust_id order date
    628 2008-12-06 14:53:55
    628 2009-07-03 17:09:44

    How can this query

    Code:
    SELECT customer.customer_id
      FROM customer
    INNER
      JOIN order_sum
        ON order_sum.customer_id = customer.customer_id
       AND order_sum.status IN ( 2, 3, 10 )
       AND order_sum.time < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
    GROUP
        BY customer.customer_id
    HAVING COUNT(*) = 1
    be changed to not return any results that have an order in the last 30 days?

    Thanks

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Code:
    SELECT customer.customer_id
      FROM customer
    INNER
      JOIN order_sum
        ON order_sum.customer_id = customer.customer_id
    GROUP
        BY customer.customer_id
    HAVING 1 =
           COUNT(CASE WHEN order_sum.status IN ( 2, 3, 10 )
                       AND order_sum.time < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
                      THEN 'older'
                      ELSE NULL END )
       AND 0 =
           COUNT(CASE WHEN order_sum.status IN ( 2, 3, 10 )
                       AND order_sum.time >= UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
                      THEN 'newer'
                      ELSE NULL END )
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Apr 2009
    Posts
    9
    Hi r937

    I've implemented and tested your sql statement.

    Sweet man, it works a treat. Ive just got to understand it now !!!

    Thanks ever so much. Much appreciated.

    D

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •