Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Jan 2004
    Posts
    18

    a beginners design headache

    hello

    im rather new to database design and im building a sequence and lab management web database for a conservation genetics company.

    the main design headache im encountering is modelling the data recorded for the automatic dna sequencer and pcr machine, these have a large number of lanes or positions, usually about 96, that each can store an individual sample that has a number of different fields associated with it e.g. tissue type, elution volume etc etc.

    i need to record the data foreach particular run of these machines; so i have a table containing the id number of the run with other general information not specific to a lane/position, and then have alot of further one to one related tables that hold one piece of data foreach of the lanes/positions. consequently my table count is growing huge (to my eyes anyway).

    im considering combining data into single fields but from what i understand of db design this is a big no no - but might be the only solution. if anyone has the inclination to discuss this with me, offer comments or point me to some info sources on designing dbs that arent the usual business models or ecom sites, id be hugely appreciative.

    cheers.

    j

  2. #2
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    I would be interested in discussing it.

    From what you have written I would say you have to start a couple of hierarchy levels higher. You create a table for the entire sample, then a table for each of the positions, then a table for each of the tests for each of the positions and then a table for the results for each of the tests. It may be more complicated at the lower end but it seems close.

    Machine(MachineID, MachineDescription)

    Lane(LaneID, MachineID, LaneDescription)

    Position(PositionID, LaneID, PositionDescription)

    That's your equipment. Now you put samples in it.

    Run(RunID, RunDescription)

    Sample(SampleID, RunID, SampleDescription)

    Property(PropertyID, PropertyDescription)

    SampleProperty(SampleID, PropertyID, SamplePropertyDescription)

    Result(ResultID, PositionID, SampleID, ResultValue)

    For more I need to know more.
    Last edited by certus; 01-12-04 at 01:15.

  3. #3
    Join Date
    Jan 2004
    Posts
    18
    Thanks for the response and the aid certus....

    I think im basically following you, however i don't fully understand how you're envisaging the Lane and Position tables: currently, i have a table for each of the properties, at each position or lane, in a run that need to be recorded, e.g.

    SequenceRunTissue(SequenceRunTissueID, SequenceRunID, Lane1, Lane2..... Lane96)

    But are you suggesting a seperate table for the lanes where each lane is recorded as a row? is that preferable to having a maximum of 96 collums? also i think i misled you, lanes and positions are the same thing - it just depends on whether its the Sequencer or PCR machine.

    Again, cheers for running through this stuff with me.

    j

  4. #4
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Yes, each lane should be recorded as a row.

  5. #5
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Treating Lanes as rows would enforce First Normal Form. "Eliminate repeating groups."

  6. #6
    Join Date
    Jan 2004
    Posts
    18
    Thanks certus

    Please excuse my ignorance, but in terms of implementing such a table a set of lanes (rows) would be identified by their RunID? Would there still need to be a seperate LaneID for each row and would that still be the primary key? i would also then simply have all the properties of each lane as collums right? i know this must seem bleedin obvious, but i just need to make sure.
    cheers
    j

  7. #7
    Join Date
    Jan 2004
    Posts
    18
    one more thing....

    the lane table would begin to grow to a quite large size i would guess, particularly if we use alot of 96 lane runs, at what stage does size become an issue? i read in a previous thread 100,000,000 is big - so i suppose its going to be a while before they reach 1 000 000 sequencing runs, would that be correct (considering each row would have about 5 - 10 attributes of small value)?

  8. #8
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    First we have to get some of the concepts straight so I will share with you my understanding to save time.

    A machine has 96 lanes. period.

    Each processing sequence is called a run.

    For a run each lane contains a sample with a set of properties.

    What I am saying is that you create a lane table containing 96 rows that never gets additional rows. The lanes never change physically neither should the table. It's only purpose is to tell you which lane a sample is going into through a join between lane and sample.

    Maybe this would make more sense.
    Lane(LaneID, LaneName) -- 96 rows
    Run(RunID, RunFacts...)
    Sample(SampleID, SampleFacts...)

    Test(RunID, LaneID, SampleID, TestResult)

  9. #9
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    You have to create an entity relationship diagram. Or barring that you have to write out the rules of you model as I attempted in brief. Tell me the story of a run. What are the properties of a lane. What are the properties of a sample. What are the properties of a run. What else is involved.

    Note that it is clear to me that a run, a lane and a sample are separate entities don't mix them up.

  10. #10
    Join Date
    Jan 2004
    Posts
    18
    ok, here goes

    So the simplest machine process to describe is the DNA Sequencing run of the Automated Sequencer(AS).

    Within the AS is a tray that has 96 lanes.

    Each lane of the AS can be loaded with finished PCR product (simply a piece of DNA that has been repeatedly copied to give a stronger signal).

    PCRs are produced on a special PCR machine in large runs aswell.

    A PCR machine has a tray also with 96 positions. Each position can produce PCR product from one Sample. By Sample I mean the DNA of an individual animal.

    For the Sequencing Run we also need to know, for each lane, the PCRRunID, the position of this particular product on the PCR tray and the ID of the Sample at that position aswell.

    Additionaly, for the Sequencing Run as a whole, we need a RunID, the Date, StaffID of who carried out the process, ID of which member of the management staff approved it, the location of the AS files (on a serperate computer) and 10 values detailing the various volumes of solutions added to each lane e.g. how much PCR product, buffer, dye and primers.

    So thats the information im trying to model for a straight forward sequencing run. Other types of runs are more complicated, for instance, performing Genotyping analysis. This is essentially the same as the sequencing run except that instead of a single pcr (with its associated data) up to 30 can be included (with similar data aswell). Thats my biggest headache.

    Each Sample has the following: an ID, an Original code (of those who sent it to us), the ProjectID thats its associated with, Species name, Location, GPS, SampleDate, a description of the sample (eg roadkill), sex of the animal, contact (ie who sent it), its position number in the freezer, DNAExtract storage number and notes about the data inputer and approval.

    I have got an ERD but it changes frequently as i've been struggling with how to represent the data. Hence my call for aid.

    A further question; im not fully understanding the concept of the Lanes table. If you use it as a join for the sample table where is the data for each sequencing run actually stored?

    I cant thankyou enough for your willingness to help - if you ever need a place to stay in Malaysia (where alot of our work is based) then give me a yell.

  11. #11
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Malaysia. How about that. My wife is from Singapore.

    Alright, based on your description

    Sample(PCRSampleID, PCRSampleAttributes) -- your animal

    PCRRun(PCRRunID, PCRAttributes)

    PCRPosition(PCRPositionID, PCRRunID, PCRPositionNumber, PCRSampleID, PCRPositionAttributes)

    The above gives you a database of all your samples, PCR runs and positions. Every position ID is a unique value for the entire database. This should give you what you need for your genotype work.

    ASRun(ASRunID, ASRunAttributes)

    ASVolumes(ASRunID, ASRunVolumeAttributes) a 1:1 table with ASRun

    ASLane(ASLaneID, ASRunID, ASLaneNumber, PCRPositionID)

    Each ASLaneID is unique and correlates each AS run lane with one of the unique PCR run postions.

  12. #12
    Join Date
    Jan 2004
    Posts
    18
    if i can just see if ive got this right....

    every pcr position recorded has its own row with the variety of data required.

    every AS lane recorded has its own row that, among other things, contains the positionID of the pcr in there.

    with our genotyping then, are we not limited to one pcr in each lane? there can be up to 30 per lane.

  13. #13
    Join Date
    Jan 2004
    Posts
    18
    so i guess id have the following then....

    GenotypingRun(GenotypingRunID, GenotypingAttributes....)

    GenotypingLane(GenotypingLaneID, GenotypingLaneNumber, GenotypingRunID)

    GenotypingLanePCR(GenotypingLanePCRID, GenotypingLaneID, GenotypingRunID, PCRPositionID, PCRRunID)

    ?

  14. #14
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    You could set up PCR like this

    PCRRun(PCRRunID, PCRRunAttributes...)

    PCRPostion(PCRPositionID, PCRPositionNumber)

    PCRRunPosition(PCRRunPositionID, PCRRunID, PCRPositionID, Attributes...)

    The nomenclature makes more sense this way.

    I don't understand the up to 30 pcr to one lane. Do you mean you mix 30 pcr position contents into one lane? Or do you mean that you are taking pcr positions from 30 runs and producing a tray?

    If it is the latter then you have

    GenotypeRun(GenotypeRunID, Attributes...)

    GenotypeLane(GenotypeLaneID, GenotypeNumber)

    GenotypeRunLane(GenotypeRunLaneID, GenotypeRunID, GenotypeLaneID, PCRRunPositionID, Attributes...)
    Last edited by certus; 01-14-04 at 14:21.

  15. #15
    Join Date
    Jan 2004
    Posts
    18
    It's the former, you mix 30 different pcr products (positions) into one lane for DNA fingerprinting, each pcr represents a different section of the genome from one animal. You then place different animals in the other lanes and you can quickly infer relatedness (just incase you where interested).

    Above you have the PCRPosition and GenotypeLane tables, are they still static, as in not acquiring anymore rows, like you were suggesting earlier? Couldnt you just have a PCRPositionNumber field in the PCRRunPosition table instead?

    thanks again certus....

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •