DB design handling time-sensitive data (somewhat long post)
Sorry to post yet another DB that needs an expert review
I've tried to do as much research as I can before posting.
Anyway, I have the following database so far (see image). Notes:
-Input.id_sensor is dependent on sensor.io = "input" and Output.id_sensor is dependent on sensor.io = "output", if that makes sense.
-the type column on the input and output tables determine the id_sensor column for the input_x and output_x tables.
- I think all the relationships are correct - I can have many buildings; buildings can have more than one controller; controllers have many sensors; sensors are input or output; input is a current, flow, or temperature; output is a valve or pump; an output or input device has many data entries based on the date.
This is my first 'pro-level' work attempt at a database, so I want to ask a few questions before I put things in motion:
-Typical databases are taken to what nth-level normalization?
-I hate to have to ask this, but Is this a good setup? I tried to logically split the data and think about what queries might look like, but I'm not sure if this is too little or too much - should I leave everything as a sensor and keep one big "data" table? I doubt this...
-I might as well blanket everything and ask "Anything I may have missed?"
i don't think i'm grasping the physical system you're trying to model. what exactly is a sensor? can a sensor ever change from input to output, or between in/out types? how can a single sensor be assigned to multiple controllers?
i've been out of database work for too long, and have lost most knowledge of terminology. all i can go with is my gut. and what it's telling me is that i don't like the sensor.io field at all (redundant data), or the input and output tables, or the line of tables below it. why not have a sensor.type field which is a foreign key of a sensor_types table (this table would enumerate all possible input and output sensor types) and skip straight to data being in the data tables you outlined? i would replace everything under controller with:
// each sensor type has a name associated
- type_id (primary key)
// a sensor belongs to one controller and has an enumerated type
- sensor_id (primary key)
- controller_id (references controllers)
- type_id (references sensor_types)
// stores current for a given sensor at a given time
- sensor_id (references sensors)
- timestamp (unique)
// stores temperature for a given sensor at a given time
- sensor_id (references sensors)
- timestamp (unique)
now we have a given controller having one or more sensors, a sensor having one type, and multiple data points for a given sensor. simplified, but i don't know how accurately it reflects your reality. sensors may need status fields, maintenance dates, upgrade information/dates, etc.
look for more feedback and clarify the functionality you need.
Sorry about that, lets see if I can make this clearer:
I am modeling this off of a piece of hardware (the controller) that will be installed in a house. I only have one controller and one house at the moment, but I wanted the ability for future expansion should my customer need it.
I guess the term 'sensor' is misleading - I was basing it off of some legacy documentation. The controller I have in my possession has about 30 input and 20 output devices that are unique to the controller, but may have the same name as another controller's i/o device. A snapshot of the state or value of the input/output devices is taken every 5 minutes, which needs to be recorded in the database for statistical graphing/plotting/whatever later (hence the whole reason for the database).
-Input devices are current sensors, temperature meters, and water flow meters.
-Output devices are pumps and valves the controller uses.
-Input can have the values of "75 degrees celsius", "3 gal/min", "5 amperes", etc.
-Valves can be in the state of "open" or "closed"
-Pumps can be in the state of "on" or "off"
After looking at things, I see where I was getting things messed up relationship-wise, and a lot doesn't necessarily make sense. I was looking at 2nd Normal at datamodel.org and getting things wrong. Let's see if I can fix it real quick:
-there are several buildings
-each building can have several controllers
-a controller has several inputs
-a controller has several outputs
-an input device is either a currentmeter, flowmeter, or temperaturemeter
-an output device is either a pump or a valve
-an input device has several datetime-specific data entries based off the input's recorded value
-an output device has several datetime-specific data entries based off the output's recorded state
looks better, but i'm not so sure you need the separate input and output tables. sure it's good to know that every sensor in the input table is for input, but you still have to do a query to find out exactly what kind of sensor it is. that's why i merged them into a single 'sensor' entity that had a type, where type is input_temp, input_current, output_valve, etc. unfortunately i can't justify this in terms of technical reasoning
the only other thing i noticed is that you don't really need an id for the data tables to act as a primary key - the primary key would be a composite between the sensor id and the timestamp - there should never be a piece of data with the same sensor and time as another piece of data - that would mean the sensor would be producing multiple values at a single point in time. so the primary key is the combination between sensor and time, as that combination is unique. this is a natural vs. surrogate primary key situation - the table's natural primary key would be sensor_id + timestamp, whereas its surrogate or artificial primary key would be a unique id like data_id that doesn't really contain any new information about the piece of data. here's what seems to be a good comparison between the two types of keys.
My background is with C/C++/C# programming, so I try attacking things in terms of classes and inheritance, so maybe it's not the most efficient way to go about things in a DB environment
By putting all the sensors together and then using a singular data table, it introduces NULL cells in the data.units column - that's something to stay away from, right (can't remember)?
As to my choice in keys, that's just how I learned it back when I took DB classes. I lean a bit towards it also because I prefer the uniformity and knowledge that I don't have to worry about things when teaming this with different ODBC-enabled (or whatever) environments. It's the one time that I can step in as the designer and demand to have my way
oh i didn't mean unifying the data tables, i meant combining the two tables titled "input" and "output" into the "sensor" table. sensor would contain:
sensor_id - unique id per sensor
name - (optional?) name for the sensor
type - foreign key to a "sensor_types" table, such that a sensor can be of type "input_current", "output_valve", "input_videostream", etc.
to figure out which of the five data tables you query, you use the value stored in the sensor's type field. i'm a c programmer too, here's what i'm thinking:
fill_sensor_values(sensor, queryobj, "select * from sensors where sensor_id = requested_sensor_id");
fill_current_data(cdata, queryobj, "select * from input_current where sensor_id = sensor->sensor_id;");
fill_valve_data(vdata, queryobj, "select * from output_valve where sensor_id = sensor->sensor_id;");
case etc. etc.
error_log("unknown type %d", sensor->type);
of course the abstraction there is horrible, but that's my idea - once you know a sensor_id you're interested in, you query its type, then use that type to determine which of the data tables you query next.
BTW, do you know of a better modeling program than visio? I find the way they do crow's feet sometimes misleading. For instance, my drawing below should show that we have 1 controller that has many sensors, and in my mind the crows feet should point towards the sensor table.
Turning off the crow's feet and and looking at the relational markers is ridiculous.
I might create another thread on the matter in another thread.
er wait, your last diagram has confused foreign keys.
it shows a building only having one controller, but controllers having many buildings. also, controllers only having one sensor, but a sensor having many controllers.
buildings just needs (pk:building_id, building_name), controllers just needs (pk:controller_id, fk:building_id, controller_name), sensors just needs (pk:sensor_id, fk:controller_id, fk:sensor_type, sensor_name).. and that's it.
Ok, forgot about that - I was thinking that buildings were aware of their controllers instead of controllers being aware of their buildings - if I could remember better (peh), the way to say things is "a controller is assigned to a building". No wonder I was looking at the crow's feet and complaining! :P Shows you how much I retain stuff over the years - just enough to know something's amiss! Thanks for everything, I think this is finalized
looks happy enough.. though i don't know what you're going to store in the sensortype.iotype field - the sensortype table just needs a unique id for each type of sensor possible, and a name that you associate with each type. e.g.
select * from sensortypes;
also, you should finalize what you want to have in the data tables themselves - right now you have the sensor_id as the sole primary key for each data entry which won't work - it won't be unique for every data point. either each of the data tables has a combination of (sensor_id + timestamp) as the primary key, or a new "currentdata_id". example:
sole primary key for each data entry which won't work
Oops, the PK I had got eliminated in one of the revisions. I'll add it back.
i don't know what you're going to store in the sensortype.iotype field
I was figuring sensortypes.type = "current" and sensortypes.iotype = "input". That way I could get a query on only output or only input. I envision this way would cut down on looping that my client program would have to do. Otherwise I have to do something of the nature:
//Form query here - I'm too lazy to write it out right now
iotype = MyOdbcDataReader.GetString();
Instead, I can directly ask for all output sensors by using (WHERE sensortypes.iotype = "output").