Results 1 to 12 of 12
  1. #1
    Join Date
    Jun 2012
    Location
    Greyton, South Africa
    Posts
    19

    Unanswered: 8000 records in over 12 minutes

    Hi All

    I have the following SQL select query:
    Code:
    SELECT 
    	inbound_emails.inbound_email_id, 
    	inbound_emails.campaign_id, 
    	campaign_header.campaign_name, 
    	customers.customer_name, 
    	inbound_emails.date_received, 
    	inbound_emails.email_from, 
    	inbound_emails.email_to, 
    	inbound_emails.email_cc, 
    	inbound_emails.email_subject, 
    	inbound_emails.email_source, 
    	inbound_emails.response_category, 
    	inbound_emails.email_body
    FROM 
    	((inbound_emails 
    		LEFT JOIN campaign_header 
    			ON inbound_emails.campaign_id = campaign_header.campaign_id) 
    			LEFT JOIN contracts 
    				ON campaign_header.contract_id = contracts.contract_id) 
    				LEFT JOIN customers 
    					ON contracts.customer_id = customers.customer_id
    ORDER BY inbound_emails.date_received DESC;
    Without any indexes (other than PK's) this query would take 1min 21 secs to return all 8500 records.

    I wasn't satisfied with that so I added indexes to the tables.

    Indexes are;

    inbound_emails:
    PK: inbound_email_id
    1. campaign_id (non-unique, non-clustered)
    2. email_from (non-unique, non-clustered)
    3. email_to (non-unique, non-clustered)

    campaign_header:
    PK: campaign_id
    1. campaign_name (unique, non-clustered)

    customers:
    PK: customer_id
    1. customer_name (unique, non-clustered)

    contracts:
    PK: contract_id
    1. contract_name (unique, non-clustered)
    2. customer_id (non-unique, non-clustered)

    Have I made a hash of the indexes? Or is the query statement badly written? What is a reasonable amount of time to expect to retrieve 8500 records?

    I think email_from and email_to on inbound_emails is the main culprit, but I don't understand why?

    Many thanks
    Jon
    Last edited by Jon von der Heyden; 05-28-14 at 07:27.

  2. #2
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    Have you looked at the execution plan?

    Generally lots of single column non-clustered indices are not good for performance What you want are indices that "cover" your query but do not cause excessive I/O during insert/update and delete operations. It is a balancing act.

    Also if you have a front end that can handle the sorting, dump the ORDER BY. It is an extra sort operation. If those left joined tables are not truly "optional" relationships, convert them to inner joins.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I've reformatted and cleaned up your query just a bit. You will probably find that this generates a slightly cleaner execution plan.
    Code:
    SELECT 
       inbound_emails.inbound_email_id,  inbound_emails.campaign_id
    ,  campaign_header.campaign_name,    customers.customer_name
    ,  inbound_emails.date_received,     inbound_emails.email_from
    ,  inbound_emails.email_to,          inbound_emails.email_cc
    ,  inbound_emails.email_subject,     inbound_emails.email_source
    ,  inbound_emails.response_category, inbound_emails.email_body
       FROM  inbound_emails 
       LEFT JOIN campaign_header 
          ON (inbound_emails.campaign_id = campaign_header.campaign_id) 
       LEFT JOIN contracts 
          ON (campaign_header.contract_id = contracts.contract_id) 
       LEFT JOIN customers 
          ON (contracts.customer_id = customers.customer_id
       ORDER BY inbound_emails.date_received DESC;
    To optimize this query, the indexes that I'd suggest would be:
    Code:
    Duplicates Table           Columns
    Allowed    inbound_emails  date_received
    Allowed    campaign_header campaign_id
    Unique     contracts       contract_id
    Unique     clients         client_id
    As Thrasymachus suggested in his post, you need to look at the grand scheme of things to optimize indices for your entire database. Arranging covering indexes will help all of your queries, and extra indexes cost INSERT/UPDATE/DELETE performance without adding any SELECT benefit.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #4
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    If considering only this query,
    the following compound indexes might be useful,
    because index only access except inbound_emails table might be possible.

    campaign_header:
    (campaign_id , campaign_name , contract_id)

    contracts:
    (contract_id , customer_id)

    customers:
    (customer_id , customer_name)


    But, as PatP described,
    ... extra indexes cost INSERT/UPDATE/DELETE performance without adding any SELECT benefit.

  5. #5
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Quote Originally Posted by Thrasymachus View Post
    ...
    ...
    ... If those left joined tables are not truly "optional" relationships, convert them to inner joins.
    No column of contracts table was not in SELECT list.
    So, this might be possible.
    (Though, I don't know about performance aspects.)
    Code:
     FROM
           inbound_emails
     LEFT  JOIN
           campaign_header
      ON   campaign_header.campaign_id = inbound_emails.campaign_id
     LEFT  JOIN
           contracts
     INNER JOIN
           customers
      ON   customers.customer_id = contracts.customer_id
      ON   contracts.contract_id = campaign_header.contract_id

  6. #6
    Join Date
    Jun 2012
    Location
    Greyton, South Africa
    Posts
    19
    Hi All

    Thanks for all the replies

    I'm still trying to get to grips with all of this and today has been my 1st intro to execution plans (thanks Thrasymachus).

    I note 12% cost allocated to the desc sort, so good call on that although I do quite need it ordered.

    I see 85% cost allocated to retrieval of records from inbound_emails, so it doesn't seem much cost at all joining to other tables to recall campaign name and customer name.

    That said I also had a crack at rewriting the query hoping for improvement:
    Code:
    select 
    	inbound_emails.inbound_email_id,ca.campaign_id,ca.campaign_name,ca.customer_name,inbound_emails.date_received,inbound_emails.email_from,inbound_emails.email_to,inbound_emails.email_cc,inbound_emails.email_subject,inbound_emails.email_source,inbound_emails.response_category,inbound_emails.email_body
    from
    	inbound_emails left join
    	(
    		select 
    			cu.customer_name,campaign_header.campaign_name, campaign_header.campaign_id
    		from
    			(
    				select 
    					customers.customer_name, contracts.contract_id
    				from
    					contracts 
    					inner join customers
    						on
    						contracts.customer_id=customers.customer_id
    			) as cu
    			inner join campaign_header
    				on 
    				cu.contract_id=campaign_header.contract_id
    	) as ca
    	on
    		inbound_emails.campaign_id=ca.campaign_id
    order by inbound_emails.date_received desc;
    I have yet to validate that this will give me exactly as I had before. It's execution time is 1m18s.

    @PatP - your code also executes in 1m18s.

    I ran my old code again. At it's worst today it took 17m. It currently executes in 1m26s.

    Must we assume that the performance differences are related to network performance?

    I have identified the main culprit as email_body - which is nvarchar(max) because it holds the email message body for all 8.5k emails. If I leave this field off execution is 3-6 seconds (which I think is far more respectable).

    How does one usually deal with fields like this?

    P.S: Apologies for messy code. I'm still finding my feet with regards to sql coding conventions...

  7. #7
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    NP. Every gig I end up at, I end up spending most of my dev time performance tuning.

    Network performance is just one of many possible variables. DB server load, blocking, tempDB contention, hanging transactions, orphan spids, disk hot spots, bad code, bad indices, network cards falling asleep etc.... every day is a new adventure.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  8. #8
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    Regarding varchar(max) I only use it judiciously and I always question it when I see it. I would ask you if there is any limit on the size of the input being fed into this field by the application that is populating it. Can you live with only 8000K characters? Do you need to support unicode?
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  9. #9
    Join Date
    Jun 2012
    Location
    Greyton, South Africa
    Posts
    19
    Good points. I need to research MS Outlook as that's where it's being sucked in from. Thanks.

  10. #10
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    outlook certainly allows for more than 8K characters. I was thinking of more of a web front end.

    if you can not get around it at the source, go after the destination. if whatever is consuming this query does not care about anything beyond the first 8K characters, trying casting the field in the SELECT clause. I am not sure this will work. If this a web page or a report, you can always just return the first 8K characters and provide them with a "more..." link or subreport that takes them to the full contents of the email.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  11. #11
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    As an aside, I would stay away from the nesting of the joins as you showed in the one update. In most cases what Pat gave you above is the preferred method as it makes it easy to read and maintain. Also, as was previously mentioned, remove Contracts table from your query as you are not selecting anything from it and you are not limiting the results, due to left joining on the key. Thrasy's last point is a great one, since you already posted that without that column we are talking seconds. I would even go with less than 8k, try giving the first 100 bytes, with option of being able to select the entire mail. Lastly, what about some type of boundary on emails to pull back? Right now with no where clause you are getting every email that ever existed. If you could put a low end boundary on say the "inbound_emails.date_received", then an index on this column defined as DESC would not only help pick out which emails from this table, but could allow you to skip the sort as well.
    Dave

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I think that the contracts table is needed to JOIN the customers table to retrieve the customer name. I included the inbound_emails.date_received in my suggested indexes, and I agree that using that column to filter the rows returned is probably a good idea.

    Depending on the format of the inbound_emails.email_body, cutting it down may be challenging. HTML can get crabby if you truncate it in the wrong place! You may need to get creative in how you handle this, and a lot depends on how smart/adaptive your client application can be at dealing with the truncation.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •