Hi everyone. I just found this Data Modeling BB, so I’m having high hopes that I can get some good feedback and provide some myself.
I was reviewing a model from an application team recently and their model made sense when the team described it at the time. But in looking at it later, the relationships in one part of it looks like multiple inheritance. I attached a distilled version of the model, containing four entities. Here are descriptions of the entities:
Device – A manufacturing machine that can log data of what it is producing.
DeviceTimeseries – Defines a time series of data from a device. Each time series denotes the starting time of each day.
DeviceAttribute – The attributes of a device, such as product quality, machine configuration, etc.
DeviceAttrTimeseries – A value associated with a device’s attribute.
I have a problem with the entity at the bottom of the diagram inheriting its identity from two other entities, especially ones that have the same unique identifiers (DeviceName). ERwin allows this and even generates the correct DDL, meaning the DeviceAttrTimeseries table has two FK’s each pointing to a parent. This means the child’s PK values must be in both parent tables. So mechanically it works.
I suggested dropping the DeviceTimeseries entity, but they said they MUST have the ObservationMode attribute at that level.
I may be missing something here, so let me know if I don't make any sense at all or have something wrong
Just to confirm, this is to be a relational database? If not, my comments won't apply.
From what I can see the design is fine. FKs don't denote parent/child relationships or inheritance chains, they're just links to other rows in other tables.
I'm not 100% on what kind of data they're trying to express here. I'm going to guess that DeviceTimeseries refers to an individual log or job of a device. If that's true, you'd have data something like:
Device - list of devices
DeviceAttribute - list of attributes per device, so different devices can have different attribute lists
DeviceTimeseries - list of jobs that have been run, each device has its own list
DeviceAttrTimeseries - this is where the values of each attribute, for each timeseries, for each device are stored.
A AConfiguration 01/01/10 "standard A config"
A AConfiguration 01/01/10 "high quality"
A AConfiguration 01/02/10 "new testing A config"
A AConfiguration 01/02/10 "medium quality"
B BConfiguration 01/03/10 "standard B config"
So no data seems to be repeated anywhere. You have an efficient, concise list of devices you can get by querying Device. The same is true of the other tables:
To get a list of attributes that apply to device A you'd go:
select AttributeName from DeviceAttribute where DeviceName = 'A';
To get all of the timeseries' run by device A you'd go:
select Timestamp from DeviceTimeseries where DeviceName = 'A';
If you wanted to see the results of the 01/01/10 run of device A, you'd go
select AttributeName, Value from DeviceAttrTimeseries where DeviceName = 'A' and Timestamp = '01/01/10';
So unless I'm missing something, this seems to be a solid model. The only thing I'm confused about is the lack of ID columns. Normally of course you'd do your FK->PK linking with an ID column separate from its name. So device A might have an ID of "1" - and if its name ever had to change, you could just update the one record in the Device table and you'd be done. Plus numeric IDs are faster.