Results 1 to 3 of 3
  1. #1
    Join Date
    Aug 2006

    Fact table has many "code" columns - what to do?

    First time here, and first time trying to learn Data Warehouse principles while "getting something out quickly for the users to see" as my manager puts it :-(

    At the moment, I'm trying to model financial transactions captured by our agency. The dimensions I have so far are:

    1. Cash dealer branch
    eg DealerNo=101, DealerName=Xenia Bank, DealerType=BA, DealerTypeText=Bank, DealerStatus=A, DealerStatusText=Active, BranchKey=12345, BranchId=BOX-SMTHVL, Zipcode=56789, Suburb=SMITHVILLE, State=CA, StateText=California

    2. Transaction Date
    Time dimension based on the date the transaction occurred

    3. Processing Date
    Time dimension based on the date the transaction was processed by us

    4. Country (involved in transaction)
    eg CountryCode=US, CountryName=UNITED STATES OF AMERICA, CurrencyCode=USD, CurrencyName=UNITED STATES DOLLAR

    The fact table is the transaction itself. Apart from FKs to the above, there are a number of codes that are hanging around:

    ReportType: eg, Cash, International, Gaming, etc
    TransactionType: eg, Cash Deposit, Cash Withdrawal, Cheque Deposit, Wire, etc
    ReportMethod: eg, Via File, Via Paper, etc
    Direction: In, Out

    etc etc. About 10 of these codes, each with very few distinct values. The fact table contains code versions of the above "text value".

    From what I understand, each of these should be a dimension too. They are all unrelated, and no hierarchy exists for any of them. If they existed as dimensions, those dimensions would simply contain "code" and "text value"

    My colleague is against splitting these out as dimensions, saying the users would not like to have to hunt through a lot of dimensions to find the appropriate columns to choose (we will be using Oracle Discoverer). He would prefer to incorporate the code's "text value" into the fact table, eg:

    TransactionType=AD, TransactionTypeText=Account Deposit

    to make the Discoverer business area simpler.

    Is this a legitimate approach for simple codes? I can see the attraction of doing it, to reduce dimensions and reduce joins, but obviously at the expense of storage. We have a few naysayers indicating that their past experience with Discoverer was one of poor performance, so we're touchy about ensuring the performance of our DW is acceptable - this is causing me to waver.

    I intend to try to use materialized views and query rewrite to aid summarisation, so we're aware that lots of storage will be required anyway, and that many of the MVs will be joins as well.

    To me it seems quite arbitrary and inconsistent to have some dimensions treated "properly", and some denormalized into the fact table. I've googled for quite a while trying to find examples of this; the closest I can find is degenerate dimensions, but these seem to be for single-attribute dimensions only.

    Thanks for any advice.

  2. #2
    Join Date
    Dec 2003
    Have you read Kimball's Data Warehouse Toolkit

  3. #3
    Join Date
    Jun 2004
    Arizona, USA
    Although this link is specific to db2, it is generic enough to be useful for most database implementations.

    Remember, you want to speed up retrievals in a DW system at the expense of updates. Removing joins, (albeit at the expense of increased data storage requirements) generally has this effect.

    Under normal operation a data warehouse is read-only. So, you add lots of indexes (to speed up select queries) and remove most, if not all RI functionality (as relational integrity is handled by the SOURCE system(s).) You have to adopt a different mindset for data warehousing systems than you do for OLTP systems.
    Last edited by loquin; 08-31-06 at 19:26.
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts