If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Periodic Data,more than one source

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 09-03-03, 05:00
BruceWarne BruceWarne is offline
Registered User
 
Join Date: Sep 2003
Posts: 1
Question Periodic Data,more than one source

Hi

I'm a newbie to the forum, and faced with a bit of a design dilemma.

Background: We provide information/analysis tools to large utilities. Database is Oracle, and we use php coded webservices to interface with the data.

The problem: We need to source and store data from 3 distinct sources. Sourcing is not the problem. The storing is.

The data from the 3 sources are unrelated, except for the following:

All data is periodic (hourly)

The data sources provide us with the following data:
1. Measured customer usage data (the fields of interest being customer ID, date_hour,field1A, field1B, field1C, field1D)
2. Measured production data (unit_id, date_hour, field2A,field2B,field2C)
3. Measured weather data (station_id, date_hour, temperature, windspeed, precipitation, cloud cover)

Instead of querying these sources directly, we want to import all the data into local storage. This storage will then be used to populate our olap databases.

The question now is:
What is the best way to do it?

I see 2 possibilities:
1. Create seperate tables for all 3 of the different sources. One table for weather, one for customers usage, one for production data. Keep the column definitions as is (field1A, temperature, etc etc).Weather data is afterall not the same as production data.

2. Create a single table containing data from all three sources. All the data is afterall timeseries/periodic data. The table will basically have the following columns: entity_id, entity_type,entity_element, hour, value.
Where
entity_id = consumer/producer/weather_station id
entity_type says weather it is consumer/producer/weather_station data
entity_element says what data element it is, temperature, field1A, field2B etc.
hour is the hour of measurement
value is obvious

With the associated anciliary tables.

I think 2) is known as data normalisation.

System accessing the db will probably only ever be interested in data emanating from a single source, and in all the fields for that source. All data access through a webservice, that will use pl/sql stored procs to access data.

The pro's and cons as I see it:

Single "value" column
Pros
1. Single webservice for all data.
2. No need to update webservice when new type of data gets added.
3. Easy for one person to take full ownership
4. Very tight control
5. Nice for OLAP

Cons
1. Size
2. Speed
3. Query complexity
4. Data doesn't make "sense" at first glance.
5. Mixing of different customers' data in one table.

Different tables
Pros
1. Speed
2. Size
3. Simple queries
4. More closely resembles source data.
5. OLAP not required in Storage db.

Cons
1. Difficult for one person to take and keep ownership of all tables
2. Multiple webservices. OK, you can probably get away with one, but it will have to be smart, and you'll need a global data dictionary.
3. If source data changes (new column, etc), Storage table must change as well
4. Not so tight control - more webservices, more users
5. Possibility of chaos if lots of new datasources get added in future.

Any comments or recommendations will be appreciated.
Any website I can check out that addresses problems similar to this?

Thanks!

Last edited by BruceWarne; 09-03-03 at 07:16.
Reply With Quote
  #2 (permalink)  
Old 09-05-03, 01:24
naz naz is offline
Registered User
 
Join Date: Jul 2003
Posts: 18
If you are going to have multiple data sources and want to use multiple tables then you could possible go with a dynamic schema by using:

- dataSources
sourceName
fields

- sourceName (This table name matches dataSources.sourceName)
field1
field2
...
fieldN

and have it auto create the table whenever a record is added to dataSources. you would also have all the collaborating tables with descriptive info such as source name, description etc etc

this would allow you to allocate each table to different servers depending on the load requirements, and also allow you to create an application that could easily scale for as many sources as you want.

I personally prefer the multi table approach, using a single table with dynamic use does not "feel" right in this situation, although it would work, and while this sounds rather unscientific, alot of schema design decisions are just as well made by gut feeling as through application of normalization theory
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On