I'm having more trouble with the design of the table layout for my database. The problem is that the database needs to store floating-point data values for our fuel cells, but the number of those data points may change. The changes are frequent during the first couple of months of commissioning, then drop to almost nil once the cell runs at steady state. That makes the table layout challenging, and so far we see two options. As a disclaimer: I'm just a C programmer with very little DB experience.
| Time  | Data1 | Data2 | Data3 | ...   (grows horizontally)
| Feb 1 | 13.14 | 21.14 | 42.19 |
| Time  | Data  | Value |
| Feb 1 | Data1 | 12.12 |
| Feb 1 | Data2 | 23.41 |
We typically log 100 to 150 data values per minute per fuel cell. The problem with the first solution is that whenever the number of data values changes, you have to alter the table, and you end up with holes in the data. On the other hand, that design works much better once the number of data values stops changing, which is what happens after commissioning.
The second solution handles both cases, but you're inserting 100 to 150 records per minute per fuel cell, and there is repetition in the data, i.e. the timestamp will be the same for all 100 to 150 values. Also, do we need to be concerned about the overhead of 1 INSERT statement versus 100 to 150 per minute per fuel cell?
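On the insert-overhead question, most databases let you batch many rows into a single statement or a single transaction, so the per-minute cost is closer to one commit than to 150. Here's a minimal sketch using SQLite from Python (the table and column names are placeholders I made up, not anything from the actual system):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE readings (
        time  TEXT NOT NULL,
        data  TEXT NOT NULL,   -- name of the data point, e.g. 'Data1'
        value REAL NOT NULL,
        PRIMARY KEY (time, data)
    )
""")

# One minute's worth of readings for a single fuel cell: 150 rows
# sharing the same timestamp, one row per data point.
rows = [("2024-02-01 13:00", f"Data{i}", float(i)) for i in range(1, 151)]

# Batching all 150 rows into one transaction means one commit per
# minute instead of 150, which is usually where the overhead is.
with conn:
    conn.executemany(
        "INSERT INTO readings (time, data, value) VALUES (?, ?, ?)", rows
    )

print(conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0])  # 150
```

The exact mechanism varies by DBMS (multi-row `INSERT ... VALUES`, prepared statements, bulk-load utilities), but the principle of grouping rows per commit carries over.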
Either solution would only need to handle data coming from roughly 1 to 20 fuel cells. We would also likely have one table per fuel cell, though that would only be strictly necessary for the first solution.
The other consideration is that for 90% of the time the number of data values won't change (i.e., after commissioning), so it seems to make more sense to use the first solution, which works well most of the time.
Is altering a table at run time a bad idea? It seems to go against the grain of database design. Do you think we are trying to solve two problems with one solution?
Any thoughts on this would be greatly appreciated.
1) Performance: is inserting 100-150 records per minute for 20 fuel cells going to perform OK? That is a maximum of 150 * 20 = 3000 records per minute, or 50 per second. Obviously performance depends on the particular DBMS, hardware, etc., but 50 records per second doesn't sound excessive.
2) Uniqueness: with the second design, the unique key to a record will be (time, data) so uniqueness is preserved.
What you don't appear to be addressing is the usability of this data. I don't know how you intend to use the data, but in general it is much easier to work with the second table design than the first with queries like:
- What was the average data value for fuel cell X between 10:00 and 12:00 today? "SUM(value)" is easier than "SUM(value1 + value2 + ... + value150)".
- How often did the value exceed 42? "value > 42" is easier than "value1 > 42 OR ... OR value150 > 42".
Only if you never have to query "across" those 150 columns is the first table design perhaps OK.
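To make those two queries concrete, here is a sketch against the narrow (second) layout, using SQLite from Python with a handful of invented readings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (time TEXT, data TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("10:30", "Data1", 12.12), ("10:30", "Data2", 23.41),
     ("11:15", "Data1", 44.00), ("11:15", "Data2", 41.50)],
)

# Average across all data points between 10:00 and 12:00 -- one AVG,
# no need to spell out value1 + value2 + ... + value150.
avg = conn.execute(
    "SELECT AVG(value) FROM readings WHERE time BETWEEN '10:00' AND '12:00'"
).fetchone()[0]

# How often did any value exceed 42? One comparison, not 150 ORed together.
n = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE value > 42"
).fetchone()[0]  # 1 (only the 44.00 reading)

print(avg, n)
```

With the wide (first) layout, both queries would have to enumerate every value column by name, and would need rewriting each time a column is added.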
Also I would not like to be considering doing ALTER TABLE at run time!
In fact, I would probably want to go for a single table to capture the data for ALL fuel cells:
| Cell  | Time  | Data  | Value |
| Cell1 | Feb 1 | Data1 | 12.12 |
| Cell1 | Feb 1 | Data2 | 23.41 |
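With the cell column included, the per-cell queries just gain one more WHERE condition. A small sketch (again SQLite, with made-up values):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (cell TEXT, time TEXT, data TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?)",
    [("Cell1", "10:30", "Data1", 12.12), ("Cell1", "10:30", "Data2", 23.41),
     ("Cell2", "10:30", "Data1", 99.00)],
)

# Average for one fuel cell between 10:00 and 12:00 -- no per-cell tables
# needed, just a filter on the cell column.
avg = conn.execute(
    "SELECT AVG(value) FROM readings "
    "WHERE cell = 'Cell1' AND time BETWEEN '10:00' AND '12:00'"
).fetchone()[0]
print(avg)
```

This also makes fleet-wide questions ("which cell had the highest reading today?") a single query instead of a query per table.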
Thanks for the input. There are a couple of things I don't understand yet, namely your example queries. They are perfect examples of how we would like to look at the data, but I don't understand how the second design makes them easier.
In the SUM(value) example, is "value" referring to the whole column? If so, then with the second layout we would be averaging all of the values, which wouldn't work. You would only want to average the values of the same data point, i.e. SUM(value) WHERE Data = 'Data1'.
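That reading is right: a WHERE clause restricts the aggregate to one data point, and GROUP BY computes it for every data point at once. A sketch with invented values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (time TEXT, data TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("Feb 1", "Data1", 12.12), ("Feb 1", "Data2", 23.41),
     ("Feb 2", "Data1", 14.88), ("Feb 2", "Data2", 20.59)],
)

# Average a single data point, exactly as suggested above:
d1 = conn.execute(
    "SELECT AVG(value) FROM readings WHERE data = 'Data1'"
).fetchone()[0]

# Or average every data point in one query, one result row per data point:
per_point = conn.execute(
    "SELECT data, AVG(value) FROM readings GROUP BY data ORDER BY data"
).fetchall()

print(d1, per_point)
```

So the narrow layout doesn't mix data points together; the filtering or grouping just moves into the query instead of into the column names.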