Results 1 to 14 of 14
  1. #1
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10

    Unanswered: Statistics Maintenance

    I had a weird one the other day where we were having performance issues with a certain query.
    Everything seemed up to date, our index and statistics maintenance jobs were all happy.

    When I looked at the execution plan I saw that the estimated vs actual rows were stupidly different (1:10M), which was screwing things up.

    To fix I rebuilt the statistics in question with the FULLSCAN option.

    This has been weighing on my mind as of late and has caused me to revisit out statistic maintenance scripts.
    The jist of the script is to update statistics on objects if they haven't been updated in N days (checked using Stats_Date()). It then updates using default sampling.

    I'm worried that the Stats_Date() is not the right thing to use, due to auto-updating of statistics (or similar)?

    I was tempted to brute-force FULLSCAN everything during a quiet period, once a week, but I'd like to do something more elegant if possible.

    What do you reckon?
    George
    Home | Blog

  2. #2
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    The sampling can get a bad sample once in a while. I don't know what the default sampling rate for a generic UPDATE STATISTICS command is, and googling for it, has lead me down an interesting rat-hole for now.

  3. #3
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    This should tell you what the actual sample rates are (for SQL 2008 R2 and later)
    Code:
    select object_Name(i.object_id), i.index_id, i.name, sp.*
    from sys.indexes i join
    	sys.stats s on i.object_id = s.object_id and i.name = s.name cross apply
    	sys.dm_db_stats_properties (i.object_id, s.stats_id) sp
    where i.object_id > 1000 -- filter out (most) system objects
    This is one of the odder joins I have seen in the system catalog. I would have expected index_id to be in the sys.stats table. Maybe that will change in a later version.

  4. #4
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Well this particular instance must have had bad sampling for a good few months!
    The table it was referencing had a big initial data load and very little movement since.


    Yeah.. the help manual isn't that useful on what the default is: [i]"When none of the sample options (SAMPLE, FULLSCAN, RESAMPLE) are specified, the query optimizer samples the data and computes the sample size by default."[/url]

    Who samples the sampler?


    I'm leaning towards a brute force FULLSCAN at the moment, but that's mostly because of time constraints and knowing it will do the job.
    "When all you have is a hammer..."
    George
    Home | Blog

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Bump for Monday... any more advice before I get going?
    George
    Home | Blog

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'll find the article, but that will probably take me a couple of weeks due to my present schedule.

    The problem is caused by the huge load and few if any changes to the table afterwards. This usage pattern will always wonk your statistics and it really demands doing a FULLSCAN as soon as you can after the load. Now isn't ideal, but it is the best option available to you so get going!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Thanks Pat, I think that affirms things well enough for me.

    I will perform a one-off fullscan rebuild of indexes (in batches, over the coming weeks) to cover things, and then keep my existing routine(s) for ongoing maintenance.
    I will identify the frequent moving candidates and implement a separate step for each of those as required as exceptions, rather than the rule.
    George
    Home | Blog

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    The problem that I referred to is caused by doing a relatively large (95% or more) load, followed by few if any changes. This is a common problem in data warehouse tables, but relatively rare elsewhere.

    The problem in this scenario is that the stats never get set initially, so the only way to ever properly build them is to force a statistics update with a FULLSCAN. Until you do that, you have the default statistics (one row) and those statistics lead to "interesting" optimizer performance.

    I haven't tested it, but allegedly this problem is fixed by the new CE (Cardinality Estimator) in SQL 2014.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    I think this particular issue was a remnant from the initial data migration. Took a while to rear its ugly head though.
    There's no doubt another one just waiting to happen so I'll do a quick sweep.

    I'll build from MCrowley's code to be a bit more particular about which stats need a fullscan.

    There are a few objects in one of our systems that often sees bulk loads of data. Not quite to a 95% scale now, maybe 70-80 Big enough that I will give them special treatment. They only see action once a week or so - should be easy to stay on top of it.
    George
    Home | Blog

  10. #10
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Curiously, I have come across a problem with some similarities, but a crucial difference. There was an ad-hoc query being run, with a where clause of the form:
    Code:
    where datecolumn <= convert(date, DATEADD(MM,DATEDIFF(MM,0,GETDATE()),0))
    This query was adamant that this should only bring back 1 row, when in fact it brought back around 400k rows leaving behind about 10 or 20 k rows filtered out. I actually had to take advantage of parameter sniffing (or maybe a close cousin), and pushed the value of the convert to a date variable. The result was pretty satisfying. 400K index seeks was looking to take more than 3 hours. The single table scan returned in under 5 seconds.

    I am still a tad confused why the optimizer should have chosen to do an index seek, but it may have been because the optimizer is not able to go looking too deep into the stack of functions to see what it returns, and was assuming that the query wanted records less than date = 0, which really would have called for an index seek. That said, does anyone have any favorite functions for returning the first day of the month? I am settling on
    Code:
    dateadd(dd, -1 * day (getdate()) + 1, convert(date, getdate()))
    for the moment.

  11. #11
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'm 98% sure that the problem is induced by the Convert() call. I'm pretty sure that you'll get better performance and the correct result by using:
    Code:
    SELECT Cast(DateAdd(m, DateDiff(m, 0, GetDate()), 0) AS DATE)
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  12. #12
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    An interesting try, but no soap. The difference between CONVERT and CAST is one of the things I have to admit I know nothing of. I think CAST came about in SQL 2000, or so, but I never went around looking for how they are different.

  13. #13
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Why is the cast/convert even needed?
    The time portion is truncated as part of the maths...
    George
    Home | Blog

  14. #14
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    An excellent point, G. After spending a couple hours staring at the rest of the query, that point did not leap out at me.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •