| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

04-09-09, 08:33
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
|
Can Anybody speed this query up?
|
|
Hi
I have the following table structure:
table: Customer
column: customer_id (pk)
column: regTime (int field. time user registered)
table: Order_sum
column: customer_id (fk)
column: status
A customer can have multiple orders all with different statuses. 1 to 10.
I need to return a list of customer_ids where the customer has registered over 30 days ago AND who hasn't got any orders of status 2, 3 or 10. So if they have an order of status 1 we want to include them. If the customer has two orders with statuses 1 and 2, we don't it in the results.
The SQL i have that works is:
Code:
SELECT DISTINCT customer.customer_id
FROM customer
WHERE customer .regTime < ( UNIX_TIMESTAMP( ) - ( 60 *60 *24 *30 ) )
AND NOT
EXISTS (
SELECT 1
FROM order_sum
WHERE order_sum.customer_id = customer.customer_id
AND order_sum.status IN ( 2, 3, 10 )
)
but it takes a long time to run. about 30 seconds.
My customer table has 1717 rows. The order_sum table has 1952 rows.
Can anybody speed my query up please ?
Thanks
Darren
|
|

04-09-09, 09:17
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Code:
SELECT customer.customer_id
FROM customer
LEFT OUTER
JOIN order_sum
ON order_sum.customer_id = customer.customer_id
AND order_sum.status IN ( 2, 3, 10 )
WHERE customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
AND order_sum.customer_id IS NULL
make sure you have an index declared on customer.regTime and another on order_sum.customer_id
it might be even faster if the index were a compound index declared on (customer_id,status)

|
|

04-11-09, 13:54
|
|
Registered User
|
|
Join Date: Nov 2008
Posts: 7
|
|
|
|
Try this:
Code:
SELECT
DISTINCT cr.customer_id
FROM
customer cr
JOIN
order_sum osm
ON(osm.customer_id = cr.customer_id)
WHERE
datediff(now(), cr.regTime) > 30
AND osm.status NOT IN (2,3,10);
and
Code:
SELECT
DISTINCT cr.customer_id
FROM
customer cr
JOIN
order_sum osm
ON(osm.customer_id = cr.customer_id AND datediff(now(), cr.regTime) > 30)
WHERE
osm.status NOT IN (2,3,10);
I'm not sure whether it really is faster, so you'll have to try that, but simple joins are fairly fast in MySQL and this way of writing them comes more natural to me.
Make sure to check on you indexes as r937 suggested. You should also have an index on customer.customer_id, particularly if that's the unique row identifier.
|
Last edited by slapo; 04-11-09 at 13:58.
|

04-16-09, 05:03
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Thanks for your help guys.
I havent got round to testing whose is the fastest query just yet as when i put an index on order_sum.customer_id my query worked within 2 seconds!!
Thanks
Darren
|
|

04-16-09, 08:12
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
regarding performance...
in general, when you place a function on a table column, the optimizer cannot utilize an index on that column
compare my condition --
Code:
customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
with this --
Code:
datediff(now(), cr.regTime) > 30
this second condition will fail to work efficiently
in fact, it will actually fail to work correctly, as well, because it does not take into consideration that regTime is actually an integer, and not a date which can be used in the DATEDIFF function

|
|

04-16-09, 12:52
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Quote:
|
Originally Posted by r937
Code:
SELECT customer.customer_id
FROM customer
LEFT OUTER
JOIN order_sum
ON order_sum.customer_id = customer.customer_id
AND order_sum.status IN ( 2, 3, 10 )
WHERE customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
AND order_sum.customer_id IS NULL
make sure you have an index declared on customer.regTime and another on order_sum.customer_id
it might be even faster if the index were a compound index declared on (customer_id,status)

|
Hi
How does the above take into account we DONT want to return records if they have a status 2,3,10? we only want to return the customer id if it doesnt have an order of 2,3,10. so orders with status 1,2 would not be return the customer id.
Thanks
|
|

04-16-09, 12:55
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Ah got it.
It uses a left outer join so may or may not include any orders that have status 2,3,10.
then we just say we want the ones that have no ordersum.customerid.
Nice work 
|
|

04-17-09, 07:14
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Hi
Thanks for the help so far r937, it works a treat.
How would i change the sql to say:
Return those customers who have made ONE order of status (2,3,10) and that order was over 30 days ago?
So they might have 3 orders, two are status 1, one is status 2. This needs to be included.
If they have 3 orders, one of status 1 and two of status 2. This is NOT included.
This is on order time now not customer reg time.
AND if they have made an order (2,3,10) within 30 days the customer is not added.
Im finding it complicated to explain it, no wonder I cant write the SQL !!
Thanks for any help
|
|

04-17-09, 08:41
|
|
Registered User
|
|
Join Date: Nov 2008
Posts: 7
|
|
Quote:
|
Originally Posted by r937
regarding performance...
in general, when you place a function on a table column, the optimizer cannot utilize an index on that column
compare my condition --
Code:
customer.regTime < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
with this --
Code:
datediff(now(), cr.regTime) > 30
this second condition will fail to work efficiently
in fact, it will actually fail to work correctly, as well, because it does not take into consideration that regTime is actually an integer, and not a date which can be used in the DATEDIFF function

|
That's right, I failed to notice the datatype of regTime and its possible values :P
The point about the performance should be correct as well.
Sorry DazlerD and thanks for the correction, r937 
|
|

04-17-09, 08:48
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Quote:
|
Originally Posted by DazlerD
Return those customers who have made ONE order of status (2,3,10) and that order was over 30 days ago?
|
Code:
SELECT customer.customer_id
FROM customer
INNER
JOIN order_sum
ON order_sum.customer_id = customer.customer_id
AND order_sum.status IN ( 2, 3, 10 )
AND order_sum.time < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
GROUP
BY customer.customer_id
HAVING COUNT(*) = 1

|
|

04-17-09, 12:05
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Brilliant r937, thanks a lot.
Ive been looking at this for the last hour, starting from the beginning then adding bits and doing some testing all the way along.
I did work out the inner join so happy with that !!
Thanks again to both of you for your advice.
Have a good weekend.
D
|
|

04-17-09, 12:33
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Hi
I copied the results into a table and the wrote some sql to check the results:
SELECT sum.cart_order_id, sum.customer_id, sum.status, from_unixtime(sum.time)
FROM order_sum sum, PUR1_01 pur
WHERE sum.customer_id = pur.customer_id
and sum.status in (2,3,10)
return 889 results
SELECT distinct(sum.customer_id)
FROM order_sum sum, PUR1_01 pur
WHERE sum.customer_id = pur.customer_id
and sum.status in (2,3,10)
returned 885 records.
The sql is returning where the customer has ONE order with a status 2,3,10 more than 30 days ago. BUT the ones i found have another order within 30 days.
I only want those customers who have ONE order of 2,3,10 and that order is > 30 days ago.
Then i need to do another query where the cust has made TWO orders of 2,3,10 and BOTH those orders are > 90 days.
See I told you i found it hard to explain !!!
Thanks
D
|
|

07-13-09, 07:40
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
My results return
cust_id order date
628 2008-12-06 14:53:55
628 2009-07-03 17:09:44
How can this query
Code:
SELECT customer.customer_id
FROM customer
INNER
JOIN order_sum
ON order_sum.customer_id = customer.customer_id
AND order_sum.status IN ( 2, 3, 10 )
AND order_sum.time < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
GROUP
BY customer.customer_id
HAVING COUNT(*) = 1
be changed to not return any results that have an order in the last 30 days?
Thanks
|
|

07-13-09, 07:57
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,534
|
|
Code:
SELECT customer.customer_id
FROM customer
INNER
JOIN order_sum
ON order_sum.customer_id = customer.customer_id
GROUP
BY customer.customer_id
HAVING 1 =
COUNT(CASE WHEN order_sum.status IN ( 2, 3, 10 )
AND order_sum.time < UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
THEN 'older'
ELSE NULL END )
AND 0 =
COUNT(CASE WHEN order_sum.status IN ( 2, 3, 10 )
AND order_sum.time >= UNIX_TIMESTAMP(CURRENT_DATE - INTERVAL 30 DAY)
THEN 'newer'
ELSE NULL END )

|
|

07-15-09, 06:44
|
|
Registered User
|
|
Join Date: Apr 2009
Posts: 9
|
|
Hi r937
I've implemented and tested your sql statement.
Sweet man, it works a treat. Ive just got to understand it now !!!
Thanks ever so much. Much appreciated.
D
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|