09-10-10, 04:52 #1Registered User
- Join Date
- Jul 2010
Unanswered: Summed Daily totals giving different amounts from Monthly totals
We have a cube that was displaying revenue at the monthly level for our shops.
The aggregration was done in SQL by before being read into the fact table that the cube was based on.
I was tasked to take the aggregration down to daily level.
What has happened is that my cubed monthly totals do not work out to the 'old' monthly sales.
One store is 0.4 % different on one measure and a total figure is about £300 out on sales of £48K
I am thinking a rounding error.
I appreciate the details are important but I do not want to come on here and expect someone to look through my code. :-)
I am more after pointers on what to look for and if anyone has seen this sort of thing before.
Any input appreciated.
09-10-10, 12:03 #2Resident Curmudgeon
Provided Answers: 54
- Join Date
- Feb 2004
- In front of the computer
Yepper, that's a common BI problem.
Gather the data from all of your sources into a single place (database). Convert everything into whatever you decide are standard units of measure (examples woudl be metric versus imperial, Euros versus dollars versus yen. standard time blocks like hours, etc). This unform data becomes your DW (Data Warehouse), often called "The single source of truth" for analysis.
As a test, reproduce existing reports from this data and note any differences. Even if you have done your job perfectly, there may still be differences between DW generated reports and legacy reports because the legacy reports might be incorrect or they might be making implicit adjustments to the data that no one rembers. Discuss any differences with management, they may opt to live with the differences if you can prove that the DW is "correct".
Build up the DW into a DM (Data Mart) using your current tool of choice. Compare the results from the DM with the legacy report(s). Odds are good that there will be differences at this point because you are now doing things the "newfangled" way. Look into the differences to understand them, then discuss these differences with managemetn too. Management may or may not approve of these differences, but more depends on why there are differences than how big those differences might be.
You are correct that rounding is a frequent source of problems in a DM, but it should not be a problem in a DW. A much larger reason for errors is often that the DM looks at data slightly differently than the legacy report did, so the DM might think of weeks as calendar elements while the old application understood weeks slightly differently (as noted in your previous thread with non-orthagonal measures).
-PatPIn theory, theory and practice are identical. In practice, theory and practice are unrelated.
09-13-10, 06:48 #3Registered User
- Join Date
- Jul 2010
Thanks Pat, helpful as always.
As a matter of fact the problem was my fault and not a rounding error. I'll run through it here
in case any other newbs run into a similar thing.
Data from the ERP system was stuck into the FACT table via 3 temp tables and a Staging FACT table.
As mentioned this was all aggregated in the intial load to the temp tables at monthly level.
The Select that took the data from the staging FACT table was 200 line script that used multiple UNIONS.
In my niavety I broke these down into different 'INSERT INTO's.
And there my error was made.
The first 'INTO' put some nulls in certain columns. These columns were then 'unnulled' by data further downline.
Im sure you expeirenced hands are all going 'Numpty'
The first INTO created the table and assigned the type of 'INT' to the columns with NULL.
My next load of INSERT INTO's tried to put decimal data into a INT type and rounded the values.
So I went back to basics. I now explicity create the table with the correct data types and just truncate it before
a fresh data load.
INSERT INTO's are lethal if you do not fully understand what is going on.
To use an old support tactic 'Is it plugged in'. By this I mean I could have saved myself a world of
hurt had a just looked for the most obvious thing first, namely the data types.
Always have an explanation as to why figures do not match.