Unanswered: Indexed View and Union... How to get around this?
SQL Server 2005
I'm supporting my sales team at work now and just stumbled onto the fact that some of their processes take multiple days to run--in part because they are unioning across tables with hundreds of millions of rows using a view. Data is kept in tables SalesYYYY by year and lately there have been two tables per year so you have SalesYYYY_A and Sales YYYY_B for a single year.
The business need is to be able to query a rolling year, rolling two years and last 12 weeks. Currently all processes that don't use the views are hard-coded with table names and cause issues on a fairly regular basis when all the table names need to be updated or for a short segment two tables should be queried instead of one.
My knee-jerk reaction was, "I'll build you an indexed view", having not had a chance to play with these yet. D'Oh! the need to UNION across tables makes that not work at all if I want to have an index. (Of course setting this data up as a partitioned table would be a logical plan but I don't have the clout to make that happen.)
So, I'm now trying to find a way to keep names static in coded processes but allow for bridging across tables AND having indexes to support queries. The best I've come up with so far is creating indexed views with 'CurrentYear', 'PriorYear', etc. so that when dates/years roll all that needs to be changed is the view. Not sure how efficient this will be.
Anyone invented a good wheel for this already and care to share?
My boss is the one that wants it kept in separate tables and I'm not getting any traction on convincing him we should change that. He's also not very keen on improving this process. However I just put some code into production that is going to bonk every 6 months unless I get something sorted out for this so that code can refer to static names--and performance needs are such that those objects should be indexed.
If you have a covering or clustered index on the date as MCrowley suggested, then the total number of rows in the table becomes immaterial for any query of the nature "Show me all the transactions between X date and Y date". A query would then use an index seek and a range scan. The total number of pages read then is:
(depth in pages of the index tree for the X value) + (number of pages at the leaf level from X date to Y date).
So to retrieve a full year's data from one single, large table, the only extra overhead over your bosses preference is traversing the B-Tree (which is negligible if these reports read a lot of rows).
Can you not throw together a POC on a dev box to demonstrate the benefits to your boss? Present him\ her with the various options available, with pros\ cons, costs, benefits plus development & administration overheads.
Yeah, but the boss is likely the one who put it all in wrong to begin with. It is going to be a hard sell. Best thing is probably to do the midnight slaughter approach. Go in at night, remove everyone's access to the database, except for some application ID that has no VIEW DEFINITION rights. Arrange the data as the three of us agree is correct, and let them access their views via the new application ID. Ideally, they will get so absorbed with their reports, they won't ask any silly questions about where the data came from. And once they all become obedient little sheep, then she can strike!.....oh, er...wait. that was my plan. Sorry.
OK, you guys are too funny. However I am but one of many developers on this project and making any change to the base tables would cause ripples across the other 2 servers that get copies of this data, the reports based off of them and heaven only knows how much code.
I am in cowboy coding land here but I'd be strung up in the town square if I attempted a midnight coup of data sanity as McCrowley suggests. There's just about zero chance something wouldn't be horked up the next day.
It's looking like Sales will still be waiting 30 hrs for their queries to run.
These queries shouldn't really run as slowly as you say, irrespective of the poor design.
These are archive tables right? So you can index them to high heaven and the only cost is disk, right? Why are they taking up so long to run? There are lots of problems with this design but it shouldn't cause massive querying problems.
Also, in terms of having fixed names in your code that refer to different objects over time, you should have a look at SYNONYMS in BoL.
Thanks Pootle Flump for the suggestion about synonyms. That will work for my code along with some meta data that will allow me to figure out what table to use as the actual under the synonym.
Regarding performance, I haven't seen the code that is rumored to take 30 hours. Your inferred point that the issue may not be the views is well taken. I've requested to at least get a look at what they are using in the where clause for filtering so I can see if the existing indexes support what they are doing.
In theory, if the proper indexes are on the tables the view that unions the tables together should use those indexes for queries over the view, right?