If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Interesting Problem

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 02-12-04, 11:11
King_Aero King_Aero is offline
Registered User
 
Join Date: Feb 2004
Posts: 6
Interesting Problem

Hello DB Guru's,

I'm having more troubles with the design of the table layout for my database. The problem I have is that my database needs to store floating point data values for our fuel cells but the number of those data points may change. However, the frequency of those changes is more frequent for the first couple of months during commissioning and then its frequency is almost nill when it runs steady state. This makes it more challenging for table layouts and so far we figure we have two options. And just as a disclaimer, I'm just a C programmer with very little db experience.


+-------+-------+--------+--------+-------------
| Time | Data1 | Data2 | Data3 | Grows Horz
+-------+-------+--------+--------+-------------
Feb 1 13.14 21.14 42.19

OR

+-------+-------+--------+
| Time | Data | Value |
+-------+-------+--------+
Feb 1 Data1 12.12
Feb 1 Data2 23.41

We typically log 100 to 150 data values per minute per fuel cell. The problem with the first solution is that when the number of data values are changing then you are altering the table and having holes in the data. However, that design is much better for when the number of data values doesn't change which is what happens after commissioning.

The second solution handles both cases but your inserting 100 to 150 records per minute per fuel cell. And there seems to be overlap of data. Ie. The timestamp will be the same for 100 to 150 values. And do we need to be concerned with overhead between 1 INSERT statement versus 100 to 150 per minute per fuel cell?

Either solution would only need to handle data coming from 1 - 20 fuel cells (approx). We would also likely have 1 table per fuel cell, but it would be necessary for the first solution.

The other problem is that for 90% of the time the number of data values won't change like after the commissioning so it seems to make more sense use the first solution which will work well for most of the time.

Is altering a table during run-time a bad idea? It seems to go against the grain of database design. Do you think we are trying to solve two problems with one solution?

Any thoughts on this would be greatly appreciated.
Reply With Quote
  #2 (permalink)  
Old 02-12-04, 12:05
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Re: Interesting Problem

You are addressing a couple of issues here:

1) Performance: is inserting 100-150 records per minute for 20 fuel cells going to perform OK? That is, a maximum of 150*20 = 3000 records per minute, or 50 per second. Well, obviously performance depends on the particular DBMS, hardware etc. But 50 records per second doesn't sound excessive.

2) Uniqueness: with the second design, the unique key to a record will be (time, data) so uniqueness is preserved.

What you don't appear to be addressing is the usability of this data. I don't know how you intend to use the data, but in general it is much easier to work with the second table design than the first with queries like:
  • what was the average data value for fuel cell X between 10:00 and 12:00 today: "SUM(value)" is easier than "SUM(value1+value2+...+value150)"
  • how often did the value exceed 42: "value > 42" is easier than "value1 > 42 or ... or value150 > 42".
Only if you never have to query "across" those 150 columns is the first table design perhaps OK.

Also I would not like to be considering doing ALTER TABLE at run time!

In fact, I would probably want to go for a single table to capture the data for ALL fuel cells:

+------+-------+-------+--------+
+ Cell | Time | Data | Value |
+------+-------+-------+--------+
Cell1 Feb 1 Data1 12.12
Cell1 Feb 1 Data2 23.41
...
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #3 (permalink)  
Old 02-12-04, 13:21
King_Aero King_Aero is offline
Registered User
 
Join Date: Feb 2004
Posts: 6
Thanks for the input. There's a couple things I don't understand yet, which is your examples of queries. Those a perfect examples of how we would like to look at the data but I don't understand how it makes it easier.

In SUM(Value) example, is "value" referring to a whole field? If so, then we would be averaging all of the values in the second db layout which wouldn't work. You would only want to average the values of the same data point. So SUM(value) WHERE Data = Data1.
Reply With Quote
  #4 (permalink)  
Old 02-13-04, 04:49
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Right: one of the things I'm not clear about is what these 100-150 data points are. So maybe adding them all up is a bad example!

But consider from a SQL programming perspective. If you want to produce a summary for each data point for the day (assuming that is meaningful!), then which is easier:

select data, sum(value)
from t
where time = 'Feb1'
group by data;

or

select sum(value1), sum(value2), sum(value3), ... sum(value150)
from t
where time = 'Feb1';

Probably you should think of the sort of reports/queries that are realistic for your system, and how you would write the SQL in each case.
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On