I'm a newbie to the forum, and faced with a bit of a design dilemma.
Background: We provide information/analysis tools to large utilities. Database is Oracle, and we use php coded webservices to interface with the data.
The problem: We need to source and store data from 3 distinct sources. Sourcing is not the problem. The storing is.
The data from the 3 sources are unrelated, except for the following:
All data is periodic (hourly)
The data sources provide us with the following data:
1. Measured customer usage data (the fields of interest being customer ID, date_hour,field1A, field1B, field1C, field1D)
2. Measured production data (unit_id, date_hour, field2A,field2B,field2C)
3. Measured weather data (station_id, date_hour, temperature, windspeed, precipitation, cloud cover)
Instead of querying these sources directly, we want to import all the data into local storage. This storage will then be used to populate our olap databases.
The question now is:
What is the best way to do it?
I see 2 possibilities: 1. Create seperate tables for all 3 of the different sources. One table for weather, one for customers usage, one for production data. Keep the column definitions as is (field1A, temperature, etc etc).Weather data is afterall not the same as production data.
2. Create a single table containing data from all three sources. All the data is afterall timeseries/periodic data. The table will basically have the following columns: entity_id, entity_type,entity_element, hour, value.
entity_id = consumer/producer/weather_station id
entity_type says weather it is consumer/producer/weather_station data
entity_element says what data element it is, temperature, field1A, field2B etc.
hour is the hour of measurement
value is obvious
With the associated anciliary tables.
I think 2) is known as data normalisation.
System accessing the db will probably only ever be interested in data emanating from a single source, and in all the fields for that source. All data access through a webservice, that will use pl/sql stored procs to access data.
The pro's and cons as I see it:
Single "value" column Pros
1. Single webservice for all data.
2. No need to update webservice when new type of data gets added.
3. Easy for one person to take full ownership
4. Very tight control
5. Nice for OLAP
3. Query complexity
4. Data doesn't make "sense" at first glance.
5. Mixing of different customers' data in one table.
Different tables Pros
3. Simple queries
4. More closely resembles source data.
5. OLAP not required in Storage db.
1. Difficult for one person to take and keep ownership of all tables
2. Multiple webservices. OK, you can probably get away with one, but it will have to be smart, and you'll need a global data dictionary.
3. If source data changes (new column, etc), Storage table must change as well
4. Not so tight control - more webservices, more users
5. Possibility of chaos if lots of new datasources get added in future.
Any comments or recommendations will be appreciated.
Any website I can check out that addresses problems similar to this?
and have it auto create the table whenever a record is added to dataSources. you would also have all the collaborating tables with descriptive info such as source name, description etc etc
this would allow you to allocate each table to different servers depending on the load requirements, and also allow you to create an application that could easily scale for as many sources as you want.
I personally prefer the multi table approach, using a single table with dynamic use does not "feel" right in this situation, although it would work, and while this sounds rather unscientific, alot of schema design decisions are just as well made by gut feeling as through application of normalization theory