If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Design for documenting data contained within the DB

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 09-03-10, 14:37
carlosn carlosn is offline
Registered User
 
Join Date: Aug 2010
Posts: 48
Design for documenting data contained within the DB

Thank you once again, to those of you who have helped me with some of my prior questions. This aerospace engineer has learned a lot about databases the last month.

One of the key requirements in the database I'm designing is to be able to capture the source of a given piece of data. For each data source, I want to capture the document name, author, date, etc. Clearly this is a simple table to implement.

The question is how to link a data source to the data it has provided - which is after all the main content of the database. My requirements are as follows:
  • I know multiple data items might come from a single data source.
  • The data items will be spread across multiple tables.
  • The data in one row of any given table could potentially have multiple sources
  • Sometimes I might have multiple data sources for a single data item (though redundancy is good, I suppose if push came to shove this requirement could be eliminated)
  • There is no requirement to "easily" see all items that a given data source documents. The requirement is only to see what sources document a given piece of data.

I've toyed around with some ideas, but would love some third-party insights. Some of the concepts I've toyed around with:
  • Keep all data sources in one single table. I really don't see any other useful concept for storing the data source information.
  1. Each table that has data whose source needs to be documented has a corresponding junction table that joins the data table with the source table in a many-to-may relationship. This has two disadvantages.
    1. By definition, I'll have as many junction tables as real data tables.
    2. All of the contents of a given row will point to the same set of data sources. It is up to the reader to determine that A came from Doc1 and B from Doc2. However, since presumably the columns of each table are closely related, this is probably not awful.
  2. Another alternative is to have a single junction table. As before the junction table would have a FK to the data source table. But instead of having just a FK to the data table, I would have to have some sort of column that serves as a data table identifier. This is in order to retain uniqueness in all of the rows of the junction table. This has the advantage of reducing the number of tables, but it does make the design somewhat more obscure
  3. A third alternative is to basically have a FK column for each and every row in any table that requires documenting the source of that column. This would allow me to document the source on a per column basis,not just a per row basis. On the other hand, I would loose the ability to have multiple documentation sources for a single piece of data. This is probably OK, if it significantly simplifies the design. Having a data source is a requirement, redundant data sources are nice-to-have.
    • If I went this route, I could see each table having almost twice the number of columns (i.e. one for data, one for its source), or having a "mirror" table in a 1-1 relationship to the data table, and whose contents are just the FKs to the data sources.

At this point I'm leaning toward Options 1 or 3. From a content perspective Option 1 has the advantage of multiple sources, at the expense of all the data in a row having the same set of sources. Option 2 functionally is the same, but seems "ugly". Option 3 has the advantage of sourcing every column, at the expense of having a single data source for each data bit. I'm willing to trade some of these options against ease of front-end implementation and/or performance.

Comments, concerns, or suggestions are appreciated!
Carlos
Reply With Quote
  #2 (permalink)  
Old 09-03-10, 15:38
Teddy Teddy is offline
Purveyor of Discontent
 
Join Date: Mar 2003
Location: The Bottom of The Barrel
Posts: 6,075
What kind of "data" are we talking about here? Is this all data that can be represented by a single table? What kind of real world entity is this data supposed to represent? Also, how would it come to pass that you have multiple sources for a single data item?

It's easy enough to spool up a table of sources with a valid source id. Deciding where, when and how to use that source id is entirely dependent on the business rules/workflow involved with your data. "Piece of data" can mean ANYTHING the way you've described it so far. It will be very difficult to lend any guidance on possible approaches without understanding the core of your model (the data item itself). On the plus side, tracking source of origin is a pretty common requirement in all kinds of random business processes so there's a pretty good chance of having some existing models to work from once you've better defined what it is you're tracking and why...
__________________
oh yeah... documentation... I have heard of that.

*** What Do You Want In The MS Access Forum? ***
Reply With Quote
  #3 (permalink)  
Old 09-03-10, 15:56
carlosn carlosn is offline
Registered User
 
Join Date: Aug 2010
Posts: 48
Quote:
Originally Posted by Teddy View Post
What kind of "data" are we talking about here? Is this all data that can be represented by a single table? What kind of real world entity is this data supposed to represent?
The database is a consolidate point to store information regarding our products. We make one-of-a-kind products, and we want to capture a broad range of information. We are capturing technical parameters and performance (i.e. mass, power), contractual information (program cost, customer, etc), historical schedule information (i.e. how milestones shifted during the development), events related to each product (start of contract, technical reviews, product delivery, decommissioning), etc. The data is most definitely not in one table. For instance, although each product is unique they might share common attributes: customer, power supply design, delivery information (if they were part of a single acquisition).

Quote:
Also, how would it come to pass that you have multiple sources for a single data item?
Some of this data is historical, and we have contradicting pieces of information. Other times, the data captured in a "single" data point is derived from different sources. This is largely a result that various programs historically might have reported different flavors of the same data, and for capturing it in the database we want to convert all the information to the same reference frame. A (trivial) example I recently ran into, for instance date for the start of operations. For a number of the products, I noticed that there was a difference of one day between two data sources I had. Turns out one source had UTC, the other local time.

The contradicting information and the different ways of looking at the data is actually one of the primary reasons for creating this database.

Quote:
It's easy enough to spool up a table of sources with a valid source id. Deciding where, when and how to use that source id is entirely dependent on the business rules/workflow involved with your data. "Piece of data" can mean ANYTHING the way you've described it so far.
As, I mentioned, "piece of data" does mean a number of different things. It is all historical, however. The goal is that when someone pulls up a page of information, they can press a button and be told where all that information came from.

Does this help clarify things?

Carlos
Reply With Quote
  #4 (permalink)  
Old 09-13-10, 10:57
carlosn carlosn is offline
Registered User
 
Join Date: Aug 2010
Posts: 48
Bump. Any further thoughts on this question? I'd love to get some additional insights before I go deeper into my design.
Reply With Quote
  #5 (permalink)  
Old 09-13-10, 11:05
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
I think a fully normalised, lossless data model (option 4) would be horrific to implement and would contain thousands of tables.
Option 2 is funny - it is like a One True Association Table (if you have come across OTLTs before then you'll understand what I mean).

I reckon option 3 is your best bet. You could make the source columns bitmasks (if you have few enough sources) and can stand to lose referential integrity. This would allow you to store the multiple sources.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #6 (permalink)  
Old 09-14-10, 12:19
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
I would create a source table to describe the data sources (June 2010 stockholder report).

I would create a fact table to describe the data itself (Sales Manager is a moron).

I would create a many to many table to tie the data sources to the fact table (so multiple sources can say the sales manager is a moron).

Then I would include the key of the fact table in every table that references the facts (needs to note that the sales manager is a moron).
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On