2. Transaction Date
Time dimension based on the date the transaction occurred
3. Processing Date
Time dimension based on the date the transaction was processed by us
4. Country (involved in transaction)
eg CountryCode=US, CountryName=UNITED STATES OF AMERICA, CurrencyCode=USD, CurrencyName=UNITED STATES DOLLAR
The fact table is the transaction itself. Apart from FKs to the above, there are a number of codes that are hanging around:
ReportType: eg, Cash, International, Gaming, etc
TransactionType: eg, Cash Deposit, Cash Withdrawal, Cheque Deposit, Wire, etc
ReportMethod: eg, Via File, Via Paper, etc
Direction: In, Out
etc etc. About 10 of these codes, each with very few distinct values. The fact table contains code versions of the above "text value".
From what I understand, each of these should be a dimension too. They are all unrelated, and no hierarchy exists for any of them. If they existed as dimensions, those dimensions would simply contain "code" and "text value"
My colleague is against splitting these out as dimensions, saying the users would not like to have to hunt through a lot of dimensions to find the appropriate columns to choose (we will be using Oracle Discoverer). He would prefer to incorporate the code's "text value" into the fact table, eg:
Is this a legitimate approach for simple codes? I can see the attraction of doing it, to reduce dimensions and reduce joins, but obviously at the expense of storage. We have a few naysayers indicating that their past experience with Discoverer was one of poor performance, so we're touchy about ensuring the performance of our DW is acceptable - this is causing me to waver.
I intend to try to use materialized views and query rewrite to aid summarisation, so we're aware that lots of storage will be required anyway, and that many of the MVs will be joins as well.
To me it seems quite arbitrary and inconsistent to have some dimensions treated "properly", and some denormalized into the fact table. I've googled for quite a while trying to find examples of this; the closest I can find is degenerate dimensions, but these seem to be for single-attribute dimensions only.
Although this link is specific to db2, it is generic enough to be useful for most database implementations.
Remember, you want to speed up retrievals in a DW system at the expense of updates. Removing joins, (albeit at the expense of increased data storage requirements) generally has this effect.
Under normal operation a data warehouse is read-only. So, you add lots of indexes (to speed up select queries) and remove most, if not all RI functionality (as relational integrity is handled by the SOURCE system(s).) You have to adopt a different mindset for data warehousing systems than you do for OLTP systems.
Last edited by loquin; 08-31-06 at 18:26.
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert