Page 1 of 2 12 LastLast
Results 1 to 15 of 20
  1. #1
    Join Date
    Jan 2005
    Location
    uk
    Posts
    4

    relationships between individuals in a family tree (was "simple design question")

    Hi, I am working on designing a website/database for (my families) family tree project. I will probably be using mySQL and PHP, I have some limited experience of setting up a membership and content management type database but I'm no expert.

    I haven't actually got all the data yet (the research is an ongoing labour of love by my uncle) but it's a fairly complex web. Of course the actual size is pretty small overall in relation to most databases, just a few hundred records, but its large enough that there is sense in making the site data-driven.

    I've been having some problems with what I think should be very simple! That is namely how best to store the relationships between individuals.

    This is what I've got so far, parent-child and sibling-sibling realtionships can be deduced by reference to just the People table.
    (and what are the parentone_ID/parenttwo_ID keys called? are they foreign keys or maybe 'recursive' keys??)

    Code:
    People
    ---------------------
    person_ID      INT  (PK)
    parentone_ID   INT   ('FK' - personID)
    parenttwo_ID   INT   ('FK' - personID)
    first_name     VARCHAR
    middle_names   VARCHAR
    last_name      VARCHAR
    maiden_name    VARCHAR
    nick_names     VARCHAR
    date_birth     VARCHAR
    place_birth    VARCHAR
    date_death     VARCHAR
    place_death    VARCHAR
    mediapath      VARCHAR  *relative url to folder with various media/docs.
    The Marriages table seperately records marriages.

    Code:
    Marriages
    ----------------
    marriage_ID      INT   (PK)
    spouseone_ID     INT   (FK - personID)
    spousetwo_ID     INT   (FK - personID)
    place_marriage   VARCHAR
    date_marriage    VARCHAR
    date_divorce     VARCHAR
    mediapath        VARCHAR

    Is this a robust design?
    I'm not sure if it's 'correct' to have so many variable length fields, my instinct is that if you have one you may as well have many (holds true in file I/O anyway) - but maybe I should fix the field lengths. What's the accepted wisdom in storage vs speed? I really have no idea.

    I was also considering scrapping the parentone_ID and parenttwo_ID fields in the People table and having a 'Relationships' table to record all parent-child and sibling-sibling realtionships. I guess it would look up quicker but feels a lot less elegant.


    Any comments are welcome

  2. #2
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Some suggestions.

    Remove the parentone_Id and parenttwo_id from the person table. Create a table called parents and have columns called father_id and mother_id along with person_id in there. The PK (primary key) will consist of the three columns.

    In the person table, the PK will simply be the person_id.

    I'd recommend not having the surrogate key marriage_id in the table marriages. Instead let the two columns, husband_id and wife_id, be the PK (assuming there are no same-sex marriages).

    Since divorces are less common than marriages, I'd suggest creating a separate table for divorces.

    Hope that was useful.

    Ravi

  3. #3
    Join Date
    Aug 2004
    Posts
    330
    Quote Originally Posted by rajiravi
    I'd recommend not having the surrogate key marriage_id in the table marriages. Instead let the two columns, husband_id and wife_id, be the PK (assuming there are no same-sex marriages).
    Ravi
    Having the husband_id, wife_id as the PK would not allow for re-marriages.

  4. #4
    Join Date
    Jan 2005
    Location
    uk
    Posts
    4
    Thanks for the replies. I like the idea of a Parents table replacing the two fields in Persons. When you say to use the 3 columns in Parents as the PK how does that work in actual useage?
    Does it just mean if I wanted to update a record in Parents I would use syntax like;
    UPDATE Parents SET field=value WHERE father_id='prev_fid' AND mother_id='prev_mid' AND person_id='prev_pid' ?

    I'm think I'm going to leave the marriageID PK in Marriages just in case there is a re-marriage somewhere... (there would appear to be some bigamy already so I wouldn't rule anything out.) By making the columns be spouseone_id and spousetwo_id I think I can sidestep any issues with same-sex marriages.

    Code:
    People
    ---------------------
    person_ID      INT  (PK)
    first_name     VARCHAR
    middle_names   VARCHAR
    last_name      VARCHAR
    maiden_name    VARCHAR
    nick_names     VARCHAR
    date_birth     VARCHAR
    place_birth    VARCHAR
    date_death     VARCHAR
    place_death    VARCHAR
    mediapath      VARCHAR
    
    
    Parents
    --------------
    parentone_ID   INT   (FK - personID)
    parenttwo_ID   INT   (FK - personID)
    child_ID       INT   (FK - personID)
    
    
    Marriages
    ----------------
    marriage_ID      INT   (PK)
    spouseone_ID     INT   (FK - personID)
    spousetwo_ID     INT   (FK - personID)
    place_marriage   VARCHAR
    date_marriage    VARCHAR 
    mediapath        VARCHAR
    
    Divorces
    ----------------
    marriage_ID      INT   (PK) 
    date_divorce     VARCHAR
    Can I get away with just the two columns in Divorces? It somehow feels wrong to use a tables PK as the PK in a second table, but it seems to work....

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by mifune
    Can I get away with just the two columns in Divorces? It somehow feels wrong to use a tables PK as the PK in a second table, but it seems to work....
    yes, you can get away with it, and no, it's not wrong, in fact, there are some modellers who will insist that the divorce date must be in a separate table

    the reasoning they give is that with a separate table, any marriage which has no divorce will have no row in the divorce table, thereby avoiding the thorny (to them, anyway) problem of having the divorce date in the marriage table have to be null for those marriages which have no divorce

    "the failure of my marriage was divorce experience of my life" (badump-bump tshhh)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Jan 2005
    Location
    uk
    Posts
    4
    hmm, I certainly see what your saying. How far should you go though? If I applied that logic rigorously to the People table then couldn't I seperate most columns out into their own tables, as nearly all of them will have null entries in some records.

    Wouldn't that make the query syntax rather cumbersome though?

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you have grasped the essential problem

    followed to its logical extreme, you'd have each attribute in its own table, all so that you wouldn't have to store a null anywhere (this is known as sixth normal form)

    and yes, you'd have to re-assemble the data with humungous left outer joins, which would (surprise!) require nulls to represent the missing data

    some data modellers (and i hasten to say i'm not one of them) don't care, though, their focus is on the abhorrence of three-valued logic, and they relegate unimportant matters (such as efficiency) to the database engine, and if the database engine isn't up to the task, then obviously it's a deficient engine

    for a most interesting document about this, see How to Handle Missing Information without Using Nulls (PDF)

    did i mention i like nulls?

    heck, i even go so far as to say that all columns in a table except the primary key could/should be nullable

    this has certain advantages, for example, if you need to insert a child record and you don't have a parent row for it, just do an insert into the parent table with the primary key value (everything else null), and voila, relational integrity is preserved

    but this is, admittedly, a bit controversial among modellers
    Last edited by r937; 02-02-05 at 00:08.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by r937
    you have grasped the essential problem

    followed to its logical extreme, you'd have each attribute in its own table, all so that you wouldn't have to store a null anywhere (this is known as sixth normal form)

    and yes, you'd have to re-assemble the data with humungous left outer joins, which would (surprise!) require nulls to represent the missing data
    Well, a simple solution (note: simple is not always the same as feasible with current software) relies on the notion that base tables and derived tables (views) are conceptually identical.

    Example:

    Code:
    Table DinnerTrays
      SerialNumber INT PK,
      BreadType BREAD NULLABLE AS NOBREAD('Doesn't like bread'),
      SoupType SOUP NULLABLE AS NOSOUP('No Soup For You!'),
    
    Data:
    
    1   Wheat               No Soup For You!
    2   Doesn't like bread  Vegetable
    3   White               Chicken
    That description contains all the information a DBMS need to create the 6NF tables automatically and then join them back together to get the DinnerTrays table the user requested. The DBMS would generate:

    Code:
    Table DinnerTrays_SerialNumbers
      SerialNumber INT PK
    
    Table DinnerTrays_BreadTypes_BREADS
      SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
      BreadType BREAD
    
    Table DinnerTrays_BreadTypes_NULLS
      SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
      BreadType NOBREAD
    
    Table DinnerTrays_SoupTypes_SOUPS
      SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
      SoupType SOUP
    
    Table DinnerTrays_SoupTypes_NULLS
      SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
      SoupType NOSOUP
    
    Constraint ISEMPTY(
        DinnerTrays_BreadTypes_BREADS PROJECT SerialNumber
            INTERSECT
        DinnerTrays_BreadTypes_NULLS PROJECT SerialNumber
    )
    
    Constraint ISEMPTY(
        DinnerTrays_SoupTypes_SOUPS PROJECT SerialNumber
            INTERSECT
        DinnerTrays_SoupTypes_NULLS PROJECT SerialNumber
    )
    
    View DinnerTrays = 
        DinnerTrays_SerialNumbers
            JOIN
        DinnerTrays_BreadTypes_BREADS UNION DinnerTrays_BreadTypes_NULLS
            JOIN
        DinnerTrays_SoupTypes_SOUPS UNION DinnerTrays_SoupTypes_NULLS
    Yes, I use the term "null" here... it gets tedious to write out missing data 50 times. You ought to be able to define any sorts of missing information, but you're actually explaining *how* it's missing, e.g. the person doesn't like bread or was rejected soup or even "I don't know!"

    No left outer joins here. Those are natural inner joins, which assume that you join on identically named attributes.

    Here are some of the messy details of handling types correctly. There is a BREAD type (an enumerated type with values like wheat, white, rye, pumpernickle, etc) and a NOBREAD type (with just "doesn't like bread") and a BREADORNOBREAD type that is the "union" of both types as the Real domain is the union of the domains of Rational and Irrational numbers.

    But whereas the domain of Real numbers is still a numeric domain, no operations are defined for BREADORNOBREAD!

    That means that to work with this type, you need to either check its value manually or downcast it to either BREAD or NOBREAD (which is pointless since it only has one value) manually. This would be like casting a signed integer type to an unsigned integer type: if it contained a negative value, the DBMS would have to throw an error.

    So if you cast BREADORNOBREAD to BREAD and it has a NOBREAD value, an error gets thrown. No nullogical mistakes are going to slip through. There is no three-value logic.

    Also, this is true for number types: if you have a nullable Integer column and do arithmetic you must check for null values and handle them correctly because there's no rule that 1 + NULL = NULL.

    some data modellers (and i hasten to say i'm not one of them) don't care, though, their focus is on the abhorrence of three-valued logic, and they relegate unimportant matters (such as efficiency) to the database engine, and if the database engine isn't up to the task, then obviously it's a deficient engine
    To my thinking, the efficiency issue is the easiest one yet.

    You see, all that code that should be generated is the logical schema.

    The physical schema would look like:

    Code:
    Table DinnerTrays
      SerialNumber INT PK,
      BreadType BREAD NULLABLE AS NOBREAD('Doesn't like bread'),
      SoupType SOUP NULLABLE AS NOSOUP('No Soup For You!'),
    
    
    View DinnerTrays_SerialNumbers = DinnerTrays PROJECT SerialNumber
    
    View DinnerTrays_BreadTypes_BREADS = DinnerTrays PROJECT SerialNumber, Breadtype WHERE BreadType ISA BREAD
    
    View DinnerTrays_BreadTypes_NULLS = DinnerTrays PROJECT SerialNumber, Breadtype WHERE BreadType ISA NOBREAD
    
    View DinnerTrays_SoupTypes_SOUPS = DinnerTrays PROJECT SerialNumber, Souptype WHERE Souptype ISA SOUP
    
    View DinnerTrays_SoupTypes_NULLS = DinnerTrays PROJECT SerialNumber, Souptype WHERE Souptype ISA NOSOUP
    All that's been added is a bunch of views representing the 6NF tables, and the actual data is stored in the most natural, efficient form. The query optimizer has hints so that it is guaranteed that the 6NF derived tables behave exactly as base tables.

  9. #9
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710

    Genealogy Data Model

    Here's an alternate model

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    sco08y, thanks very much for the detailed example

    where did you get that?

    you didn't make it up on the spot, you've done this type of 6nf design before, right?

    again, thanks for taking the time
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Jan 2005
    Location
    uk
    Posts
    4
    wow, I nearly followed all that but my heads hurting! I'm going to have to redigest the theory a few times

    certus - that alternate model is a bit more similar to the orginal concept I had but I don't know if it is any better. It has (a lot) more flexibility in terms of defining a role in a relationship but will need many more entries in the relationships table to describe just a simple family structure. 4 kids + 2 parents is 4 entries in Parents and 1 in Marriages, but is 5+4+3+2 = 14 in Relationships. I can't think of family relationships that really demand the extra utility offered by the second model.

  12. #12
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    what about the case where mother died in childbirth father remarried had more children with new mother, in some cases the new mother may have been a relative of the old mother (eg a sister). Perhaps unlikely (except in the boondocks) in a modern environment but certainly happend in earlier generations. It can also add complexity if the parents are cousins.

    It was not that long ago (1990's) that in the UK a couple married, had children, then split up, the father went to live with his ex wifes mother, married her and had children. So the children of the first marriage had step brothers & sisters who were also their uncles & aunts. Way to go Chav's

  13. #13
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    IN case you are tired of thinking about the design....

    http://users.cis.net/sammy/grandpa.htm

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    so is MrCrowley your cousin?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    The reason I had suggested that there be a separate table for divorces is the same as that for suggesting a parents table: each seems to be a distinct concept separate from the entity it was included in.

    The fact that this results in a design with fewer nulls is just incidental, and not the main objective of the suggestion.
    While I do like to design my tables with as few nullable columns as possible, I am not paranoid about it.

    If I were to design a model without any null columns, then our users, especially the "superior" object-oriented crowd, would find it impossible to understand it.

    Ravi

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •