Results 1 to 15 of 15
  1. #1
    Join Date
    Sep 2009
    Posts
    21

    Unanswered: Help on inefficent SQL query

    I am totally aware that the SQL I have written is rubbish but I am still quite new to this (writing SQL not being my main job) so haven't learnt how to do things properly!

    So I have a number of tables:

    Invoices:
    invoice_id, category_id, team_id, status, manager_id, ....

    Users:
    user_id, name, band, ....

    User_Category_Team:
    user_id, category_id, team_id

    Each user can be assigned to a number of different categories and teams and there may be more than 1 manager assigned to the same categorys and teams.

    When invoices are imported into the system I have to work out which is the most appropriate manager of the invoice - ie mapped to the same team and category as the invoice and having the lowest band.

    At the moment in VB I go through the table 1 invoice at a time working this out (seems hugely inefficient)


    for each invoice_id in Invoices

    sql = "UPDATE Invoices SET manager_id = (SELECT TOP(1) Users.user_id FROM Invoices LEFT JOIN User_Category_Team ON User_Category_Team.category_id = Invoices.category_id AND User_Category_Team.team_id = Invoices.team_id
    INNER JOIN Users ON Users.user_id = User_Category_Team.user_id
    WHERE User_Category_Team.team_id = Invoices.team_id AND User_Category_Team.category_id = Invoices.category_id AND invoice_id = '" & invoice_id & "' ORDER BY Users.band ASC) WHERE invoice_id = '" & invoice_id & "'"
    executesql(sql)


    Because the select can return quite a few managers and I only want the first, I didn't know how I could combine this into one large do-all-at-once type query entirely in SQL and not using VB.

    Any ideas?

  2. #2
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Because the select can return quite a few managers and I only want the first, I didn't know how I could combine this into one large do-all-at-once type query entirely in SQL and not using VB.
    What exactly is the problem? Your query looks good to me, except for switching to INNER JOIN after a LEFT OUTER JOIN. Try this one:
    Code:
    UPDATE Invoices 
    SET manager_id = (SELECT TOP 1 Users.user_id 
    		FROM Invoices 
    			LEFT OUTER JOIN User_Category_Team ON 
    				User_Category_Team.category_id = Invoices.category_id 
    				AND User_Category_Team.team_id = Invoices.team_id
    			LEFT OUTER JOIN Users ON 
    				Users.user_id = User_Category_Team.user_id
    		WHERE User_Category_Team.team_id = Invoices.team_id 
    			AND User_Category_Team.category_id = Invoices.category_id 
    			AND invoice_id = @invoice_id 
    		ORDER BY Users.band ASC) 
    WHERE invoice_id = @invoice_id
    Can you place your code between [CODE] tags to increase readability?

    Before you start embedding SQL code in VB-scripts. Test them as pure SQL-statements, like the one above. Only when that one works, start embedding it in VB.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  3. #3
    Join Date
    Sep 2009
    Posts
    21
    Hi, thanks for your reply. My problem is timing. The code I pasted is in use currently but the application user is complaining that it can take up to an hour to import 2000 invoices! With some timing tests I narrowed it down to this loop taking the majority of the time.

    I tried your suggestion of using LEFT OUTER instead of INNER but no noticable speed increase. I think the main problem is that it is switching between VB and SQL, so I'd hoped there would be one big SQL statement that would update all invoice records at once - that should massively reduce the overheads. But I don't know SQL well enough to be able to do that.

    I have also been playing with just the SELECT part of the statement directly on the server, just for 1 invoice and doing some timing.

    Code:
    SELECT TOP 1 Users.user_id FROM Invoices 
    	LEFT OUTER JOIN User_Category_Team ON 
    		User_Category_Team.category_id = Invoices.category_id 
    		AND User_Category_Team.team_id = Invoices.team_id
    	LEFT OUTER JOIN Users ON 
    		Users.user_id = User_Category_Team.user_id
    WHERE User_Category_Team.team_id = Invoices.team_id 
    	AND User_Category_Team.category_id = Invoices.category_id 
    	AND invoice_id = @invoice_id 
    ORDER BY Users.band ASC
    and if I remove TOP(1) the speed up is massive. For example with the top, the actual SQL I run (many more fields and joins than the example - I simplified it whilst still showing same behaviour) can take 12 seconds, but without the TOP, I get the entire list in 1 second. I don't understand that at all.

  4. #4
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    The TOP or not TOP question probably boils down to parallelism. TOP is a serial operation, so all the parallel streams have to stop what they are doing, get together, and figure out which row is the top row.

    As for the overall problem, I am thinking there is a problem of indexes. Any way you can get the query plan for this thing? Run this:
    Code:
    set showplan_xml on
    go
    SELECT TOP 1 Users.user_id FROM Invoices 
    	LEFT OUTER JOIN User_Category_Team ON 
    		User_Category_Team.category_id = Invoices.category_id 
    		AND User_Category_Team.team_id = Invoices.team_id
    	LEFT OUTER JOIN Users ON 
    		Users.user_id = User_Category_Team.user_id
    WHERE User_Category_Team.team_id = Invoices.team_id 
    	AND User_Category_Team.category_id = Invoices.category_id 
    	AND invoice_id = @invoice_id 
    ORDER BY Users.band ASC
    Naturally, you will need to declare and set @invoice_id.

  5. #5
    Join Date
    Sep 2009
    Posts
    21
    Thanks for the comments re TOP or not.

    I ran your showplan_xml - what on earth is all that! Unfortunately I cannot share my actual document with you are it contains customer sensitve information - I'm guessing this is somehow telling me what indexes I should have? What should I be looking at / focusing on?

    Is there definately no way that I can combine 2000 SQL statements executed from VB into one SQL statement?

  6. #6
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    That is the query plan in XML format. Unfortunately, it is the only way to share the plan to other people. The alternatives require a bit more work (and a test system). You could add indexes in a trial and error method, but that would take a great deal of time. Record this query running with Profiler. Feed that trace into Database Tuning Advisor. This tool may show a few indexes that will help the query out a great deal.

  7. #7
    Join Date
    Sep 2009
    Posts
    21
    Found a definate improvement ....

    Code:
    declare @invoice_id int;
    declare @max_id int;
    declare @num_modd int;
    set @invoice_id = (select min(invoice_id) from Invoices );
    set @max_id = (select max(invoice_id) from Invoices );
    set @num_modd = 0;
    while (@invoice_id <= @max_id)
    begin
    UPDATE Invoices 
    SET manager_id = (SELECT TOP 1 Users.user_id 
    		FROM Invoices 
    			LEFT OUTER JOIN User_Category_Team ON 
    				User_Category_Team.category_id = Invoices.category_id 
    				AND User_Category_Team.team_id = Invoices.team_id
    			LEFT OUTER JOIN Users ON 
    				Users.user_id = User_Category_Team.user_id
    		WHERE User_Category_Team.team_id = Invoices.team_id 
    			AND User_Category_Team.category_id = Invoices.category_id 
    			AND invoice_id = @invoice_id 
    		ORDER BY Users.band ASC) 
    WHERE invoice_id = @invoice_id
    set @invoice_id = @invoice_id + 1;
    set @num_modd = @num_modd + 1;
    end;
    select @num_modd;
    which doesn't need vb now - 5 mins down to 16 seconds on my simple example. My more complicated example is still quite long, so I think I do need to look into indexes and understand them. Also someone mentioned else preparing an sql statement, so I'll try to get my head round that also.

  8. #8
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    I had responded on this the other day, but I think this is when my router died and it never got saved.
    One thing I noticed that is probably causing some performance issues is that you are turning a left outer join back into an inner join.
    (SELECT TOP 1 Users.user_id
    FROM Invoices
    LEFT OUTER JOIN User_Category_Team ON
    User_Category_Team.category_id = Invoices.category_id
    AND User_Category_Team.team_id = Invoices.team_id
    LEFT OUTER JOIN Users ON
    Users.user_id = User_Category_Team.user_id
    WHERE User_Category_Team.team_id = Invoices.team_id
    AND User_Category_Team.category_id = Invoices.category_id
    AND invoice_id = @invoice_id
    ORDER BY Users.band ASC)
    Try:
    Code:
     
    (SELECT TOP 1 Users.user_id 
         FROM Invoices 
    LEFT OUTER JOIN User_Category_Team
        ON User_Category_Team.category_id = Invoices.category_id 
       AND User_Category_Team.team_id = Invoices.team_id
    LEFT OUTER JOIN Users
        ON Users.user_id = User_Category_Team.user_id
    WHERE invoice_id = @invoice_id 
    ORDER BY Users.band ASC)
    Dave Nance

  9. #9
    Join Date
    Sep 2009
    Posts
    21
    Thanks Dave - good spot!

    I made another dramatic speed increase by looking at the execution plan with and without top(1) - and they were very different. I then made a view which merged some of the tables and then used the view in the main query, and then the difference between with and without TOP(1) was not there, and everything worked a huge amount faster - in my real life situation, down from > 11hours to 15 mins!

    I'm sure indexes also would have helped but having done a brief amount of reading, I understand a single index but lose it when it comes to making indexes of multiple columns.

    Anyway all good now, thanks to all for their help.

  10. #10
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    indexes on multiple columns are not so different from single column indexes. Think of a phone book (if you are old enough to remember them). it is effectively an index on last and first name.

  11. #11
    Join Date
    Sep 2009
    Posts
    21
    It sounds so simple when you say it like that :-)

  12. #12
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Quote Originally Posted by Rachel_B View Post
    It sounds so simple when you say it like that :-)
    Stop flirting on this forum!

    And it's not because no one else has ever flirted with me on this forum that I'm saying this. Honestly, I'm not like that. Why has no one ever wanted to flirt with me here!? What does MCrowley have that I don't?
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  13. #13
    Join Date
    Sep 2009
    Posts
    21
    It wasn't intended as a flirt! It was just a comment! The smiley face was only because I felt a bit stupid.

  14. #14
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Aww, It'll be OK, Wim. Do you want me to flirt with you? ;-)

  15. #15
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Quote Originally Posted by MCrowley View Post
    Aww, It'll be OK, Wim. Do you want me to flirt with you? ;-)
    Anything is better than being ignored, so yes!

    Though, I already feel a lot better since I read Rachel's response: after her flirt with you, she felt a bit stupid. And it wasn't even a real flirt, she was commenting on you. My wife "comments" on me A LOT, so the more I read on this thread, the faster my jealousy fades.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •