If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > a beginners design headache

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-11-04, 17:19
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
a beginners design headache

hello

im rather new to database design and im building a sequence and lab management web database for a conservation genetics company.

the main design headache im encountering is modelling the data recorded for the automatic dna sequencer and pcr machine, these have a large number of lanes or positions, usually about 96, that each can store an individual sample that has a number of different fields associated with it e.g. tissue type, elution volume etc etc.

i need to record the data foreach particular run of these machines; so i have a table containing the id number of the run with other general information not specific to a lane/position, and then have alot of further one to one related tables that hold one piece of data foreach of the lanes/positions. consequently my table count is growing huge (to my eyes anyway).

im considering combining data into single fields but from what i understand of db design this is a big no no - but might be the only solution. if anyone has the inclination to discuss this with me, offer comments or point me to some info sources on designing dbs that arent the usual business models or ecom sites, id be hugely appreciative.

cheers.

j
Reply With Quote
  #2 (permalink)  
Old 01-11-04, 21:11
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
I would be interested in discussing it.

From what you have written I would say you have to start a couple of hierarchy levels higher. You create a table for the entire sample, then a table for each of the positions, then a table for each of the tests for each of the positions and then a table for the results for each of the tests. It may be more complicated at the lower end but it seems close.

Machine(MachineID, MachineDescription)

Lane(LaneID, MachineID, LaneDescription)

Position(PositionID, LaneID, PositionDescription)

That's your equipment. Now you put samples in it.

Run(RunID, RunDescription)

Sample(SampleID, RunID, SampleDescription)

Property(PropertyID, PropertyDescription)

SampleProperty(SampleID, PropertyID, SamplePropertyDescription)

Result(ResultID, PositionID, SampleID, ResultValue)

For more I need to know more.
__________________
visit: relationary

Last edited by certus; 01-12-04 at 00:15.
Reply With Quote
  #3 (permalink)  
Old 01-12-04, 01:10
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
Thanks for the response and the aid certus....

I think im basically following you, however i don't fully understand how you're envisaging the Lane and Position tables: currently, i have a table for each of the properties, at each position or lane, in a run that need to be recorded, e.g.

SequenceRunTissue(SequenceRunTissueID, SequenceRunID, Lane1, Lane2..... Lane96)

But are you suggesting a seperate table for the lanes where each lane is recorded as a row? is that preferable to having a maximum of 96 collums? also i think i misled you, lanes and positions are the same thing - it just depends on whether its the Sequencer or PCR machine.

Again, cheers for running through this stuff with me.

j
Reply With Quote
  #4 (permalink)  
Old 01-12-04, 08:22
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Yes, each lane should be recorded as a row.
__________________
visit: relationary
Reply With Quote
  #5 (permalink)  
Old 01-12-04, 10:34
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Treating Lanes as rows would enforce First Normal Form. "Eliminate repeating groups."
__________________
visit: relationary
Reply With Quote
  #6 (permalink)  
Old 01-12-04, 20:19
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
Thanks certus

Please excuse my ignorance, but in terms of implementing such a table a set of lanes (rows) would be identified by their RunID? Would there still need to be a seperate LaneID for each row and would that still be the primary key? i would also then simply have all the properties of each lane as collums right? i know this must seem bleedin obvious, but i just need to make sure.
cheers
j
Reply With Quote
  #7 (permalink)  
Old 01-12-04, 20:44
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
one more thing....

the lane table would begin to grow to a quite large size i would guess, particularly if we use alot of 96 lane runs, at what stage does size become an issue? i read in a previous thread 100,000,000 is big - so i suppose its going to be a while before they reach 1 000 000 sequencing runs, would that be correct (considering each row would have about 5 - 10 attributes of small value)?
Reply With Quote
  #8 (permalink)  
Old 01-12-04, 22:20
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
First we have to get some of the concepts straight so I will share with you my understanding to save time.

A machine has 96 lanes. period.

Each processing sequence is called a run.

For a run each lane contains a sample with a set of properties.

What I am saying is that you create a lane table containing 96 rows that never gets additional rows. The lanes never change physically neither should the table. It's only purpose is to tell you which lane a sample is going into through a join between lane and sample.

Maybe this would make more sense.
Lane(LaneID, LaneName) -- 96 rows
Run(RunID, RunFacts...)
Sample(SampleID, SampleFacts...)

Test(RunID, LaneID, SampleID, TestResult)
__________________
visit: relationary
Reply With Quote
  #9 (permalink)  
Old 01-12-04, 22:28
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
You have to create an entity relationship diagram. Or barring that you have to write out the rules of you model as I attempted in brief. Tell me the story of a run. What are the properties of a lane. What are the properties of a sample. What are the properties of a run. What else is involved.

Note that it is clear to me that a run, a lane and a sample are separate entities don't mix them up.
__________________
visit: relationary
Reply With Quote
  #10 (permalink)  
Old 01-13-04, 01:32
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
ok, here goes

So the simplest machine process to describe is the DNA Sequencing run of the Automated Sequencer(AS).

Within the AS is a tray that has 96 lanes.

Each lane of the AS can be loaded with finished PCR product (simply a piece of DNA that has been repeatedly copied to give a stronger signal).

PCRs are produced on a special PCR machine in large runs aswell.

A PCR machine has a tray also with 96 positions. Each position can produce PCR product from one Sample. By Sample I mean the DNA of an individual animal.

For the Sequencing Run we also need to know, for each lane, the PCRRunID, the position of this particular product on the PCR tray and the ID of the Sample at that position aswell.

Additionaly, for the Sequencing Run as a whole, we need a RunID, the Date, StaffID of who carried out the process, ID of which member of the management staff approved it, the location of the AS files (on a serperate computer) and 10 values detailing the various volumes of solutions added to each lane e.g. how much PCR product, buffer, dye and primers.

So thats the information im trying to model for a straight forward sequencing run. Other types of runs are more complicated, for instance, performing Genotyping analysis. This is essentially the same as the sequencing run except that instead of a single pcr (with its associated data) up to 30 can be included (with similar data aswell). Thats my biggest headache.

Each Sample has the following: an ID, an Original code (of those who sent it to us), the ProjectID thats its associated with, Species name, Location, GPS, SampleDate, a description of the sample (eg roadkill), sex of the animal, contact (ie who sent it), its position number in the freezer, DNAExtract storage number and notes about the data inputer and approval.

I have got an ERD but it changes frequently as i've been struggling with how to represent the data. Hence my call for aid.

A further question; im not fully understanding the concept of the Lanes table. If you use it as a join for the sample table where is the data for each sequencing run actually stored?

I cant thankyou enough for your willingness to help - if you ever need a place to stay in Malaysia (where alot of our work is based) then give me a yell.
Reply With Quote
  #11 (permalink)  
Old 01-13-04, 11:11
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Malaysia. How about that. My wife is from Singapore.

Alright, based on your description

Sample(PCRSampleID, PCRSampleAttributes) -- your animal

PCRRun(PCRRunID, PCRAttributes)

PCRPosition(PCRPositionID, PCRRunID, PCRPositionNumber, PCRSampleID, PCRPositionAttributes)

The above gives you a database of all your samples, PCR runs and positions. Every position ID is a unique value for the entire database. This should give you what you need for your genotype work.

ASRun(ASRunID, ASRunAttributes)

ASVolumes(ASRunID, ASRunVolumeAttributes) a 1:1 table with ASRun

ASLane(ASLaneID, ASRunID, ASLaneNumber, PCRPositionID)

Each ASLaneID is unique and correlates each AS run lane with one of the unique PCR run postions.
__________________
visit: relationary
Reply With Quote
  #12 (permalink)  
Old 01-13-04, 20:29
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
if i can just see if ive got this right....

every pcr position recorded has its own row with the variety of data required.

every AS lane recorded has its own row that, among other things, contains the positionID of the pcr in there.

with our genotyping then, are we not limited to one pcr in each lane? there can be up to 30 per lane.
Reply With Quote
  #13 (permalink)  
Old 01-13-04, 23:23
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
so i guess id have the following then....

GenotypingRun(GenotypingRunID, GenotypingAttributes....)

GenotypingLane(GenotypingLaneID, GenotypingLaneNumber, GenotypingRunID)

GenotypingLanePCR(GenotypingLanePCRID, GenotypingLaneID, GenotypingRunID, PCRPositionID, PCRRunID)

?
Reply With Quote
  #14 (permalink)  
Old 01-14-04, 13:07
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
You could set up PCR like this

PCRRun(PCRRunID, PCRRunAttributes...)

PCRPostion(PCRPositionID, PCRPositionNumber)

PCRRunPosition(PCRRunPositionID, PCRRunID, PCRPositionID, Attributes...)

The nomenclature makes more sense this way.

I don't understand the up to 30 pcr to one lane. Do you mean you mix 30 pcr position contents into one lane? Or do you mean that you are taking pcr positions from 30 runs and producing a tray?

If it is the latter then you have

GenotypeRun(GenotypeRunID, Attributes...)

GenotypeLane(GenotypeLaneID, GenotypeNumber)

GenotypeRunLane(GenotypeRunLaneID, GenotypeRunID, GenotypeLaneID, PCRRunPositionID, Attributes...)
__________________
visit: relationary

Last edited by certus; 01-14-04 at 13:21.
Reply With Quote
  #15 (permalink)  
Old 01-14-04, 20:02
jst jst is offline
Registered User
 
Join Date: Jan 2004
Posts: 18
It's the former, you mix 30 different pcr products (positions) into one lane for DNA fingerprinting, each pcr represents a different section of the genome from one animal. You then place different animals in the other lanes and you can quickly infer relatedness (just incase you where interested).

Above you have the PCRPosition and GenotypeLane tables, are they still static, as in not acquiring anymore rows, like you were suggesting earlier? Couldnt you just have a PCRPositionNumber field in the PCRRunPosition table instead?

thanks again certus....
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On