First time here, and first time trying to learn Data Warehouse principles while "getting something out quickly for the users to see" as my manager puts it :-(
At the moment, I'm trying to model financial transactions captured by our agency. The dimensions I have so far are:
1. Cash dealer branch
eg DealerNo=101, DealerName=Xenia Bank, DealerType=BA, DealerTypeText=Bank, DealerStatus=A, DealerStatusText=Active, BranchKey=12345, BranchId=BOX-SMTHVL, Zipcode=56789, Suburb=SMITHVILLE, State=CA, StateText=California
2. Transaction Date
Time dimension based on the date the transaction occurred
3. Processing Date
Time dimension based on the date the transaction was processed by us
4. Country (involved in transaction)
eg CountryCode=US, CountryName=UNITED STATES OF AMERICA, CurrencyCode=USD, CurrencyName=UNITED STATES DOLLAR
The fact table is the transaction itself. Apart from FKs to the above, there are a number of codes that are hanging around:
ReportType: eg, Cash, International, Gaming, etc
TransactionType: eg, Cash Deposit, Cash Withdrawal, Cheque Deposit, Wire, etc
ReportMethod: eg, Via File, Via Paper, etc
Direction: In, Out
etc etc. About 10 of these codes, each with very few distinct values. The fact table contains code versions of the above "text value".
From what I understand, each of these should be a dimension too. They are all unrelated, and no hierarchy exists for any of them. If they existed as dimensions, those dimensions would simply contain "code" and "text value"
My colleague is against splitting these out as dimensions, saying the users would not like to have to hunt through a lot of dimensions to find the appropriate columns to choose (we will be using Oracle Discoverer). He would prefer to incorporate the code's "text value" into the fact table, eg:
TransactionType=AD, TransactionTypeText=Account Deposit
to make the Discoverer business area simpler.
Is this a legitimate approach for simple codes? I can see the attraction of doing it, to reduce dimensions and reduce joins, but obviously at the expense of storage. We have a few naysayers indicating that their past experience with Discoverer was one of poor performance, so we're touchy about ensuring the performance of our DW is acceptable - this is causing me to waver.
I intend to try to use materialized views and query rewrite to aid summarisation, so we're aware that lots of storage will be required anyway, and that many of the MVs will be joins as well.
To me it seems quite arbitrary and inconsistent to have some dimensions treated "properly", and some denormalized into the fact table. I've googled for quite a while trying to find examples of this; the closest I can find is degenerate dimensions, but these seem to be for single-attribute dimensions only.
Thanks for any advice.