If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Fact table has many "code" columns - what to do?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-29-06, 10:15
Alabaster Alabaster is offline
Registered User
 
Join Date: Aug 2006
Posts: 1
Fact table has many "code" columns - what to do?

First time here, and first time trying to learn Data Warehouse principles while "getting something out quickly for the users to see" as my manager puts it :-(

At the moment, I'm trying to model financial transactions captured by our agency. The dimensions I have so far are:

1. Cash dealer branch
eg DealerNo=101, DealerName=Xenia Bank, DealerType=BA, DealerTypeText=Bank, DealerStatus=A, DealerStatusText=Active, BranchKey=12345, BranchId=BOX-SMTHVL, Zipcode=56789, Suburb=SMITHVILLE, State=CA, StateText=California

2. Transaction Date
Time dimension based on the date the transaction occurred

3. Processing Date
Time dimension based on the date the transaction was processed by us

4. Country (involved in transaction)
eg CountryCode=US, CountryName=UNITED STATES OF AMERICA, CurrencyCode=USD, CurrencyName=UNITED STATES DOLLAR

The fact table is the transaction itself. Apart from FKs to the above, there are a number of codes that are hanging around:

ReportType: eg, Cash, International, Gaming, etc
TransactionType: eg, Cash Deposit, Cash Withdrawal, Cheque Deposit, Wire, etc
ReportMethod: eg, Via File, Via Paper, etc
Direction: In, Out

etc etc. About 10 of these codes, each with very few distinct values. The fact table contains code versions of the above "text value".

From what I understand, each of these should be a dimension too. They are all unrelated, and no hierarchy exists for any of them. If they existed as dimensions, those dimensions would simply contain "code" and "text value"

My colleague is against splitting these out as dimensions, saying the users would not like to have to hunt through a lot of dimensions to find the appropriate columns to choose (we will be using Oracle Discoverer). He would prefer to incorporate the code's "text value" into the fact table, eg:

TransactionType=AD, TransactionTypeText=Account Deposit

to make the Discoverer business area simpler.

Is this a legitimate approach for simple codes? I can see the attraction of doing it, to reduce dimensions and reduce joins, but obviously at the expense of storage. We have a few naysayers indicating that their past experience with Discoverer was one of poor performance, so we're touchy about ensuring the performance of our DW is acceptable - this is causing me to waver.

I intend to try to use materialized views and query rewrite to aid summarisation, so we're aware that lots of storage will be required anyway, and that many of the MVs will be joins as well.

To me it seems quite arbitrary and inconsistent to have some dimensions treated "properly", and some denormalized into the fact table. I've googled for quite a while trying to find examples of this; the closest I can find is degenerate dimensions, but these seem to be for single-attribute dimensions only.

Thanks for any advice.
Reply With Quote
  #2 (permalink)  
Old 08-31-06, 17:02
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Have you read Kimball's Data Warehouse Toolkit
__________________
visit: relationary
Reply With Quote
  #3 (permalink)  
Old 08-31-06, 18:17
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
Although this link is specific to db2, it is generic enough to be useful for most database implementations.

Remember, you want to speed up retrievals in a DW system at the expense of updates. Removing joins, (albeit at the expense of increased data storage requirements) generally has this effect.

Under normal operation a data warehouse is read-only. So, you add lots of indexes (to speed up select queries) and remove most, if not all RI functionality (as relational integrity is handled by the SOURCE system(s).) You have to adopt a different mindset for data warehousing systems than you do for OLTP systems.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert


Last edited by loquin; 08-31-06 at 18:26.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On