I am a new member of this forum, I have a question about star schema and I hope those nice guy can help me.
I am now working with a assignment of Data warehouse, and now I am designing the star schema but get problem.
The first one I want to ask is did Star Schema only can 1 FACT table for storing calculable information?
The assignment background is integrate 3 independent DB, 2 is very similar (DB1 & DB2) while 1 is different (DB3). DB1 and DB2 have "project" and "activity" table but DB3 only contains "project" table. 1 Project can contain many activities.
And now Data warehouse want to store time spent in Project and Activity.
What can I do about this problem?
I have thought that using 2 FACT table for storing project time spent and activity time spent, but it seems star schema only can use 1 table. So I want for help.........
Thank you for answering my question first, you don't understand my question my due to my english problem, sorry about that.
The attachment is about sample data of 3 DBs.
DB1 and DB2 is very similar, while inconsistent with DB3.
In DB1 and DB2, 1 project can contain more than 1 activities, but DB3 do not have activity table.
Moreover, 1 project contains more than 1 employee and 1 activity also.
Actually, I don't know what is your meaning of Star schema approach, did Kimball approach is the simplest and common one?
In this stage, I need to create Star schema from the user requirement, which is used for analyzing time spent of project and activities. After that, I need to do the ETL process from those DBs.
The reason of why I ask did Star Schema can only contain 1 FACT table is becuase I want to know whether my model is correct or not. The last attachment is my model. I don't know how to deal with between consultant, project and activity, so I haven't add into star schema.
Your English wouldn't be taken for a native speaker's, but it is significantly better than my Chinese!
The problem I have answering your question is that there are many definitions for Star Schema, and different definitions can either permit one or many fact tables. The vast majority of the definitions allow more than one fact table.
While I don't know what your instructor teaches or thinks, I see a star schema as a presentation mechanism, something that is used to deliver a data mart. In my mind, a star schema is literally the output of a data warehouse, not the warehouse itself.
Without clearly understanding your assignment, I can only offer advice as an observer. View this with considerable skepticism. I would only use one fact table for what you've described so far. I would roll up the staging data by employee, project, activity, and even though you haven't specified date/time I'd include them in the fact table too.
NULL values in a fact table are a bad thing, so you need to avoid that. I'd probably create an arbitrary "empty" activity key (simply spaces, or any arbitrary value that isn't likely to ever be a valid activity id) and a matching row in the activity dimension table with a description like "blank, for the case of DB3). This allows you to roll all three of your databases into a single fact table, and avoids the need for a lot of special handling by both your data warehouse administrator and much more importantly by your data mart users.
There are some data warehouse methodologies that don't like introducing an arbitrary "artifact" like this. I've been doing this kind of thing long enough that I've found this to be the optimum solution for the vast majority of cases.
One of the primary purposes of a data warehouse (I'd argue the only real purpose) is to allow functional power users to find nuggets of information that are hidden in mountains of data. This kind of solution makes perfect sense to those users, it is easy to explain, easy to understand, and easy to manipulate. It makes their existing queries easy to write, and their yet-unimagined queries a lot more likely to be correct.