  1. #1
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912

    Unanswered: Execution Plans <> Proportionate Execution Times

    Hi

I am slowly getting to grips with SQL Server. As part of this, I have been working on producing more efficient queries. This post is about what appears to be a discrepancy between the SQL Server execution plan and the actual time taken by a query to run.

My brief is to produce an attendance system for an education establishment (I presume you can tell I'm not an A-Level student completing a project). Circa 1.5m rows per annum; testing with ~3m rows currently. College_Year could strictly be inferred from the AttDateTime, however it is included as a field because it is part of just about every PK this table is ever likely to be linked to. Indexes are not fully optimised yet.

    Table:
    Code:
CREATE TABLE [dbo].[AttendanceDets] (
    [College_Year] [smallint] NOT NULL ,
    [Group_Code] [char] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [Student_ID] [char] (8) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    [Session_Date] [datetime] NOT NULL ,
    [Start_Time] [datetime] NOT NULL ,
    [Att_Code] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
) ON [PRIMARY]
GO

-- Clustered "alternate PK" index
CREATE CLUSTERED INDEX [IX_AltPK_Clust_AttendanceDets] ON [dbo].[AttendanceDets]([College_Year], [Group_Code], [Student_ID], [Session_Date], [Att_Code]) ON [PRIMARY]
GO

-- Covering index over every column in the table
CREATE INDEX [All] ON [dbo].[AttendanceDets]([College_Year], [Group_Code], [Student_ID], [Session_Date], [Start_Time], [Att_Code]) ON [PRIMARY]
GO

-- Narrow index on the attendance code alone
CREATE INDEX [IX_AttendanceDets] ON [dbo].[AttendanceDets]([Att_Code]) ON [PRIMARY]
GO
ALL inserts are via an overnight sproc - data comes from a third-party system. Group_Code is always exactly 12 chars, Student_ID always exactly 8 chars.

    I have created a simple sproc. I am using this as a benchmark against which I am testing my options. I appreciate that this sproc is an inefficient jack of all trades - it has been designed as such so I can compare its performance to more specific sprocs and possibly some dynamic SQL.

    Sproc:
    Code:
CREATE PROCEDURE [dbo].[CAMsp_Att]

-- Wildcard/sentinel defaults make every parameter optional except @College_Year
@College_Year AS SmallInt,
@Student_ID AS VarChar(8) = '________',
@Group_Code AS VarChar(12) = '____________',
@Start_Date AS DateTime = '1950/01/01',
@End_Date AS DateTime = '2020/01/01',
@Att_Code AS VarChar(1) = '_'

AS

-- If no dates are supplied, default to the academic year implied by @College_Year
IF @Start_Date = '1950/01/01'
    SET @Start_Date = CAST(CAST(@College_Year AS Char(4)) + '/08/31' AS DateTime)

IF @End_Date = '2020/01/01'
    SET @End_Date = CAST(CAST(@College_Year + 1 AS Char(4)) + '/07/31' AS DateTime)

SELECT College_Year, Group_Code, Student_ID, Session_Date, Start_Time, Att_Code
FROM dbo.AttendanceDets
WHERE College_Year = @College_Year
    AND Group_Code LIKE @Group_Code
    AND Student_ID LIKE @Student_ID
    AND Session_Date <= @End_Date
    AND Session_Date >= @Start_Date
    AND Att_Code LIKE @Att_Code
GO
    My confusion lies with running the below script with Show Execution Plan:
    Code:
    --SET SHOWPLAN_TEXT ON
    --Go
     
    DECLARE @Time as DateTime
     
    Set @Time = GetDate()
     
    select College_Year, group_code, Student_ID, Session_Date, Start_Time, Att_Code 
    from attendanceDets
    where College_Year = 2005 AND group_code LIKE '____________' AND Student_ID LIKE '________'
    AND Session_Date <= '2005-11-16' AND Session_Date >= '2005-11-16' AND Att_Code LIKE '_'
     
    Print 'First query took: ' + CAST(DATEDIFF(ms, @Time, GETDATE()) AS VarCHar(5)) + ' milli-Seconds'
     
    Set @Time = GetDate()
     
    EXEC CAMsp_Att @College_Year = 2005, @Start_Date = '2005-11-16', 
    @End_Date = '2005-11-16'
     
    Print 'Second query took: ' + CAST(DATEDIFF(ms, @Time, GETDATE()) AS VarCHar(5)) + ' milli-Seconds'
    GO
     
    --SET SHOWPLAN_TEXT OFF
    --GO
The execution plan for the first query appears miles more costly than the sproc, yet it is effectively the same query with no parameters - and my understanding is that the cached plan substitutes literals for parameters anyway. In any case, the first query's cost is listed as 99.52% of the batch and the sproc's as 0.48% (comparing the IO, CPU costs etc. supports this). BUT the text output is:
    Code:
    (10639 row(s) affected)
     
    First query took: 596 milli-Seconds
     
    (10639 row(s) affected)
     
    Second query took: 2856 milli-Seconds
I appreciate that logical and physical performance are not one and the same, but why is there such a huge discrepancy between the two? They are tested on a dedicated test server, and repeated running and switching the order of the queries elicits the same results.
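For reference, one minimal way to compare the actual work done by each, rather than the plan's estimated percentages, is a sketch along these lines (SET STATISTICS IO/TIME report per-statement reads and CPU in the messages output):
Code:
-- Sketch: measure actual logical reads and CPU time per statement
SET STATISTICS IO ON
SET STATISTICS TIME ON

select College_Year, group_code, Student_ID, Session_Date, Start_Time, Att_Code
from dbo.attendanceDets
where College_Year = 2005 AND group_code LIKE '____________' AND Student_ID LIKE '________'
AND Session_Date <= '2005-11-16' AND Session_Date >= '2005-11-16' AND Att_Code LIKE '_'

EXEC dbo.CAMsp_Att @College_Year = 2005, @Start_Date = '2005-11-16', @End_Date = '2005-11-16'

SET STATISTICS IO OFF
SET STATISTICS TIME OFF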

    Sample data can be provided if requested but I assumed it would not shed much light.

    BTW - I know that additional indexes can bring the plans and execution time closer together - my question is more about the concept.

    If you've made it this far - many thanks.
    If you can enlighten me - infinite thanks.

  2. #2
    Join Date
    Jan 2004
    Location
    In a large office with bad lighting
    Posts
    1,040
    I notice that your stored proc specifically names the owner and table (dbo.AttendanceDets) but the in-line query only references the table name (AttendanceDets).

When the table owner (or schema owner in 2005) is missing, the query engine first looks for the table owned by the user executing the query, and only then for the table owned by dbo. Given that, does changing the in-line query to name both the owner and the table give different execution timing?
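i.e. something like this (a sketch - it is your first query with the owner added):
Code:
-- Sketch: owner-qualify the table so name resolution (and plan reuse) is unambiguous
select College_Year, group_code, Student_ID, Session_Date, Start_Time, Att_Code
from dbo.attendanceDets
where College_Year = 2005 AND group_code LIKE '____________' AND Student_ID LIKE '________'
AND Session_Date <= '2005-11-16' AND Session_Date >= '2005-11-16' AND Att_Code LIKE '_'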


  3. #3
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
The all-underscore Group_Code and Student_ID patterns are equivalent to just '%' here. Underscore is the single-character wildcard in SQL Server. This is going to play hell with your query plans, which get carried over to the next run of the procedure. Try either adding "WITH RECOMPILE" to the stored procedure definition, or clearing the procedure cache with DBCC FREEPROCCACHE before each test. My guess is that you are getting stuck using the index to read the entire table, which will increase reads by a good amount.

Just remember, if you are reading the entire table anyway, you don't want to bother with the index.
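For example (a sketch - both forms are standard T-SQL):
Code:
-- Sketch: avoid a stale cached plan skewing the comparison
DBCC FREEPROCCACHE   -- clears every cached plan (fine on a dedicated test server)
GO
EXEC dbo.CAMsp_Att @College_Year = 2005 WITH RECOMPILE   -- compiles a fresh plan for this call only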

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
Quote Originally Posted by MCrowley
The all-underscore Group_Code and Student_ID patterns are equivalent to just '%' here. Underscore is the single-character wildcard in SQL Server. This is going to play hell with your query plans, which get carried over to the next run of the procedure. Try either adding "WITH RECOMPILE" to the stored procedure definition, or clearing the procedure cache with DBCC FREEPROCCACHE before each test. My guess is that you are getting stuck using the index to read the entire table, which will increase reads by a good amount.

Just remember, if you are reading the entire table anyway, you don't want to bother with the index.
Cheers MCrowley. I will check out your and tom's suggestions. I did mention (and then erased - decided I was verbose enough) that I cleared the cache a few times to avoid bad plans.
I used the '_________' as a bit of a test. Normally I would use '%', but I wanted to see if there was a difference (admittedly I should have bottomed this out before posting). It is based, very simply, on (as I understand it - again untested)
    Code:
    Foo LIKE '%a%'
    being unsargable whereas
    Code:
    Bar LIKE '_a%'
is sargable. I was testing the idea that maybe SQL Server would treat a fixed-length wildcard more favourably than '%'. I should investigate one thing at a time really.
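One way to settle it either way would be a sketch like this (SHOWPLAN_TEXT prints the plan instead of running the query):
Code:
-- Sketch: compare the plans the two wildcard styles actually get
SET SHOWPLAN_TEXT ON
GO
SELECT COUNT(*) FROM dbo.AttendanceDets WHERE Student_ID LIKE '________'  -- fixed-length wildcard
GO
SELECT COUNT(*) FROM dbo.AttendanceDets WHERE Student_ID LIKE '%'         -- open wildcard
GO
SET SHOWPLAN_TEXT OFF
GO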

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
Ok - I'm a bit embarrassed that I posted the SQL equivalent of "I wonder what happens if I...?" with the '_______' bit. I've changed it to '%' in the sproc (according to the execution plans this is better), and changed the script to:

    Code:
--SET SHOWPLAN_TEXT ON
    --Go
     
    DBCC FREEPROCCACHE
    DECLARE @Time as DateTime
     
    Set @Time = GetDate()
     
    select College_Year, group_code, Student_ID, Session_Date, Start_Time, Att_Code 
    from dbo.attendanceDets
    where College_Year = 2005 AND group_code LIKE '%' AND Student_ID LIKE '%'
    AND Session_Date <= '2005-11-16' AND Session_Date >= '2005-11-16' AND Att_Code LIKE '_'
     
    Print 'First query took: ' + CAST(DATEDIFF(ms, @Time, GETDATE()) AS VarCHar(5)) + ' milli-Seconds'
     
    Set @Time = GetDate()
     
    EXEC CAMsp_Att @College_Year = 2005, @Start_Date = '2005-11-16', 
    @End_Date = '2005-11-16' with recompile
     
    Print 'Second query took: ' + CAST(DATEDIFF(ms, @Time, GETDATE()) AS VarCHar(5)) + ' milli-Seconds'
    GO
     
    --SET SHOWPLAN_TEXT OFF
    --GO
    Belt & braces with clearing the cache.
I have dropped and rebuilt indexes a few times during other tests and the order in which the indexes are built seems to make a difference (I would expect this if I built the clustered index anything other than first, but I didn't - ah well). The plan indicates 70.97% and 29.03% respectively, and again the IO and CPU costs seem to correlate with this. BTW - both use index seeks (although the first query has little yellow circles with arrows pointing left superimposed on top of the icons - I can't find mention of these in BoL - what do they mean?).
    Result:
    Code:
     DBCC execution completed. If DBCC printed error messages, contact your system administrator.
    (10639 row(s) affected)
    First query took: 593 milli-Seconds
    (10639 row(s) affected)
    Second query took: 1640 milli-Seconds
    So the physical still doesn't marry with the logical.

    Quote Originally Posted by MCrowley
    Just remember, if you are reading the entire table anyway, you don't want to bother with the index.
    Nothing to remember here - means nothing to me - why don't I want to bother with the index?

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Lol - tks BTW (how ungrateful of me to forget...apols)

  7. #7
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
The little yellow circles indicate parallelism in your query. The work is getting divided among the available processors. It usually means you are processing a ton of data.

With the table scan, suppose you have a table that spans 800 pages (for round numbers). Suppose also you have an index that takes up 200 pages. If you do a table scan, you end up reading 800 pages. If you use an index seek in a place where you are processing the entire table, you are looking at reading 1000 pages - 25% more than you needed to. It can slow some queries down.
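You can see the page counts for yourself with STATISTICS IO - a rough sketch (the table hints force each access path; INDEX(0) forces a scan of the base table):
Code:
-- Sketch: compare logical reads when the whole table comes back either way
SET STATISTICS IO ON

SELECT * FROM dbo.AttendanceDets WITH (INDEX(0))                  -- plain scan
SELECT * FROM dbo.AttendanceDets WITH (INDEX(IX_AttendanceDets))  -- forced through the narrow index, plus lookups

SET STATISTICS IO OFF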

    I will have to look at the rest of what you have posted in a bit. These people where I work are always demanding such silly stuff. Do you have a minute? Why is my production system down? Do you have a backup? Geez. Never ends, I tell ya.

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
Tks again. I removed the indexes and ran the queries. The plans and times converged a lot - the sproc became quite a bit faster and the query slowed a little. And yes - they both used table scans.
Your explanation makes total sense. I am trying to decide what it tells me:
1) Dump the whole idea of a generic sproc
2) Dump the table-wide index
3) Find some way to force the sproc to perform a table scan (see the sketch below)
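If (3) appeals, a table hint would do it - a sketch (INDEX(0) tells SQL Server to scan the base table rather than use a nonclustered index):
Code:
-- Sketch: force a scan inside the generic sproc, regardless of the cached plan
SELECT College_Year, Group_Code, Student_ID, Session_Date, Start_Time, Att_Code
FROM dbo.AttendanceDets WITH (INDEX(0))
WHERE College_Year = @College_Year
AND Group_Code LIKE @Group_Code
AND Student_ID LIKE @Student_ID
AND Session_Date <= @End_Date
AND Session_Date >= @Start_Date
AND Att_Code LIKE @Att_Code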

    Quote Originally Posted by MCrowley
    I will have to look at the rest of what you have posted in a bit. These people where I work are always demanding such silly stuff. Do you have a minute? Why is my production system down? Do you have a backup? Geez. Never ends, I tell ya.
    Yeah - my employers are funny like that too. I turn up, I keep my pants on even when it's hot (uk pants = trousers BTW) and I barely sleep at all 9-5. What more do they want?

  9. #9
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
I would actually try an index on just Session_Date. If that gets used, you can also add back the primary key index, as it would not likely be used anyway - unless the start and end dates are wildly far apart.

    One other thing, you can get a text representation of a showplan. I know it isn't as sexy and all, but it posts pretty well. Just use "set showplan_text on".
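Something along these lines (a sketch - the index name here is made up):
Code:
-- Sketch: narrow index on the date alone (hypothetical name)
CREATE INDEX [IX_AttendanceDets_SessionDate] ON [dbo].[AttendanceDets]([Session_Date]) ON [PRIMARY]
GO

-- Text showplan: prints the plan instead of executing the batch
SET SHOWPLAN_TEXT ON
GO
EXEC dbo.CAMsp_Att @College_Year = 2005
GO
SET SHOWPLAN_TEXT OFF
GO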

  10. #10
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
Quote Originally Posted by MCrowley
I would actually try an index on just Session_Date. If that gets used, you can also add back the primary key index, as it would not likely be used anyway - unless the start and end dates are wildly far apart.

One other thing, you can get a text representation of a showplan. I know it isn't as sexy and all, but it posts pretty well. Just use "set showplan_text on".
    Cheers
I looked up a few bits on t'interweb and I was thinking of making the date more central (i.e. using it as the first field in the index). I will check out your suggestion. Start and end dates would be unlikely to ever be more than a year apart; however, they would be that far apart for a lot of the reporting. I can see how that would be a consideration.
I did use showplan BTW (see it commented out in the posts?) - I found it handy for getting a good sight of the detail of the plan.

Although we have got more into getting the initial index design right, it still surprises me that you cannot rely on the plans to at least infer comparative performance (e.g. that query x will return more quickly than query y). I have observed the same thing since I started the thread, with another, older view I was looking to optimise: the changes I made were expected to improve the plan. They didn't - they worsened it - but the query executed much more quickly.

  11. #11
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
It's the difference between the estimated and the actual cost. If you have ever taken a car to a less-than-reputable mechanic, the difference should be familiar to you ;-). Admittedly, even the best of mechanics can get caught flat-footed, though.

    I see you have dropped and re-created the indexes a few times. Have you tried

    Code:
    UPDATE STATISTICS attendanceDets WITH FULLSCAN
    Normally, SQL Server will estimate statistics with a 30% sample, so doing the full 100% scan may knock the index statistics into proper scope.
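And to eyeball what the optimiser currently believes about the data (a sketch against your clustered index):
Code:
-- Sketch: inspect the statistics behind the clustered index
DBCC SHOW_STATISTICS ('dbo.AttendanceDets', IX_AltPK_Clust_AttendanceDets)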
