| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |
|

02-01-05, 13:24
|
|
Registered User
|
|
Join Date: Jan 2005
Location: uk
Posts: 4
|
|
|
relationships between individuals in a family tree (was "simple design question")
|
|
Hi, I am working on designing a website/database for (my families) family tree project. I will probably be using mySQL and PHP, I have some limited experience of setting up a membership and content management type database but I'm no expert.
I haven't actually got all the data yet (the research is an ongoing labour of love by my uncle) but it's a fairly complex web. Of course the actual size is pretty small overall in relation to most databases, just a few hundred records, but its large enough that there is sense in making the site data-driven.
I've been having some problems with what I think should be very simple! That is namely how best to store the relationships between individuals.
This is what I've got so far, parent-child and sibling-sibling realtionships can be deduced by reference to just the People table.
(and what are the parentone_ID/parenttwo_ID keys called? are they foreign keys or maybe 'recursive' keys??)
Code:
People
---------------------
person_ID INT (PK)
parentone_ID INT ('FK' - personID)
parenttwo_ID INT ('FK' - personID)
first_name VARCHAR
middle_names VARCHAR
last_name VARCHAR
maiden_name VARCHAR
nick_names VARCHAR
date_birth VARCHAR
place_birth VARCHAR
date_death VARCHAR
place_death VARCHAR
mediapath VARCHAR *relative url to folder with various media/docs.
The Marriages table seperately records marriages.
Code:
Marriages
----------------
marriage_ID INT (PK)
spouseone_ID INT (FK - personID)
spousetwo_ID INT (FK - personID)
place_marriage VARCHAR
date_marriage VARCHAR
date_divorce VARCHAR
mediapath VARCHAR
Is this a robust design?
I'm not sure if it's 'correct' to have so many variable length fields, my instinct is that if you have one you may as well have many (holds true in file I/O anyway) - but maybe I should fix the field lengths. What's the accepted wisdom in storage vs speed? I really have no idea.
I was also considering scrapping the parentone_ID and parenttwo_ID fields in the People table and having a 'Relationships' table to record all parent-child and sibling-sibling realtionships. I guess it would look up quicker but feels a lot less elegant.
Any comments are welcome 
|
|

02-01-05, 16:02
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Toronto, Canada
Posts: 249
|
|
Some suggestions.
Remove the parentone_Id and parenttwo_id from the person table. Create a table called parents and have columns called father_id and mother_id along with person_id in there. The PK (primary key) will consist of the three columns.
In the person table, the PK will simply be the person_id.
I'd recommend not having the surrogate key marriage_id in the table marriages. Instead let the two columns, husband_id and wife_id, be the PK (assuming there are no same-sex marriages).
Since divorces are less common than marriages, I'd suggest creating a separate table for divorces.
Hope that was useful.
Ravi
|
|

02-01-05, 16:11
|
|
Registered User
|
|
Join Date: Aug 2004
Posts: 330
|
|
|
|
Quote:
|
Originally Posted by rajiravi
I'd recommend not having the surrogate key marriage_id in the table marriages. Instead let the two columns, husband_id and wife_id, be the PK (assuming there are no same-sex marriages).
Ravi
|
Having the husband_id, wife_id as the PK would not allow for re-marriages.
|
|

02-01-05, 20:42
|
|
Registered User
|
|
Join Date: Jan 2005
Location: uk
Posts: 4
|
|
Thanks for the replies. I like the idea of a Parents table replacing the two fields in Persons. When you say to use the 3 columns in Parents as the PK how does that work in actual useage?
Does it just mean if I wanted to update a record in Parents I would use syntax like;
UPDATE Parents SET field=value WHERE father_id='prev_fid' AND mother_id='prev_mid' AND person_id='prev_pid' ?
I'm think I'm going to leave the marriageID PK in Marriages just in case there is a re-marriage somewhere... (there would appear to be some bigamy already so I wouldn't rule anything out.) By making the columns be spouseone_id and spousetwo_id I think I can sidestep any issues with same-sex marriages.
Code:
People
---------------------
person_ID INT (PK)
first_name VARCHAR
middle_names VARCHAR
last_name VARCHAR
maiden_name VARCHAR
nick_names VARCHAR
date_birth VARCHAR
place_birth VARCHAR
date_death VARCHAR
place_death VARCHAR
mediapath VARCHAR
Parents
--------------
parentone_ID INT (FK - personID)
parenttwo_ID INT (FK - personID)
child_ID INT (FK - personID)
Marriages
----------------
marriage_ID INT (PK)
spouseone_ID INT (FK - personID)
spousetwo_ID INT (FK - personID)
place_marriage VARCHAR
date_marriage VARCHAR
mediapath VARCHAR
Divorces
----------------
marriage_ID INT (PK)
date_divorce VARCHAR
Can I get away with just the two columns in Divorces? It somehow feels wrong to use a tables PK as the PK in a second table, but it seems to work....
|
|

02-01-05, 21:08
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
Quote:
|
Originally Posted by mifune
Can I get away with just the two columns in Divorces? It somehow feels wrong to use a tables PK as the PK in a second table, but it seems to work....
|
yes, you can get away with it, and no, it's not wrong, in fact, there are some modellers who will insist that the divorce date must be in a separate table
the reasoning they give is that with a separate table, any marriage which has no divorce will have no row in the divorce table, thereby avoiding the thorny (to them, anyway) problem of having the divorce date in the marriage table have to be null for those marriages which have no divorce
"the failure of my marriage was divorce experience of my life" (badump-bump tshhh)
|
|

02-01-05, 21:33
|
|
Registered User
|
|
Join Date: Jan 2005
Location: uk
Posts: 4
|
|
hmm, I certainly see what your saying. How far should you go though? If I applied that logic rigorously to the People table then couldn't I seperate most columns out into their own tables, as nearly all of them will have null entries in some records.
Wouldn't that make the query syntax rather cumbersome though?
|
|

02-01-05, 23:02
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
you have grasped the essential problem
followed to its logical extreme, you'd have each attribute in its own table, all so that you wouldn't have to store a null anywhere (this is known as sixth normal form)
and yes, you'd have to re-assemble the data with humungous left outer joins, which would (surprise!) require nulls to represent the missing data
some data modellers (and i hasten to say i'm not one of them) don't care, though, their focus is on the abhorrence of three-valued logic, and they relegate unimportant matters (such as efficiency) to the database engine, and if the database engine isn't up to the task, then obviously it's a deficient engine
for a most interesting document about this, see How to Handle Missing Information without Using Nulls (PDF)
did i mention i like nulls?
heck, i even go so far as to say that all columns in a table except the primary key could/should be nullable
this has certain advantages, for example, if you need to insert a child record and you don't have a parent row for it, just do an insert into the parent table with the primary key value (everything else null), and voila, relational integrity is preserved
but this is, admittedly, a bit controversial among modellers
|
Last edited by r937; 02-01-05 at 23:08.
|

02-02-05, 01:10
|
|
Registered User
|
|
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
|
|
Quote:
|
Originally Posted by r937
you have grasped the essential problem
followed to its logical extreme, you'd have each attribute in its own table, all so that you wouldn't have to store a null anywhere (this is known as sixth normal form)
and yes, you'd have to re-assemble the data with humungous left outer joins, which would (surprise!) require nulls to represent the missing data
|
Well, a simple solution (note: simple is not always the same as feasible with current software) relies on the notion that base tables and derived tables (views) are conceptually identical.
Example:
Code:
Table DinnerTrays
SerialNumber INT PK,
BreadType BREAD NULLABLE AS NOBREAD('Doesn't like bread'),
SoupType SOUP NULLABLE AS NOSOUP('No Soup For You!'),
Data:
1 Wheat No Soup For You!
2 Doesn't like bread Vegetable
3 White Chicken
That description contains all the information a DBMS need to create the 6NF tables automatically and then join them back together to get the DinnerTrays table the user requested. The DBMS would generate:
Code:
Table DinnerTrays_SerialNumbers
SerialNumber INT PK
Table DinnerTrays_BreadTypes_BREADS
SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
BreadType BREAD
Table DinnerTrays_BreadTypes_NULLS
SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
BreadType NOBREAD
Table DinnerTrays_SoupTypes_SOUPS
SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
SoupType SOUP
Table DinnerTrays_SoupTypes_NULLS
SerialNumber INT PK FK(DinnerTrays_SerialNumbers),
SoupType NOSOUP
Constraint ISEMPTY(
DinnerTrays_BreadTypes_BREADS PROJECT SerialNumber
INTERSECT
DinnerTrays_BreadTypes_NULLS PROJECT SerialNumber
)
Constraint ISEMPTY(
DinnerTrays_SoupTypes_SOUPS PROJECT SerialNumber
INTERSECT
DinnerTrays_SoupTypes_NULLS PROJECT SerialNumber
)
View DinnerTrays =
DinnerTrays_SerialNumbers
JOIN
DinnerTrays_BreadTypes_BREADS UNION DinnerTrays_BreadTypes_NULLS
JOIN
DinnerTrays_SoupTypes_SOUPS UNION DinnerTrays_SoupTypes_NULLS
Yes, I use the term "null" here... it gets tedious to write out missing data 50 times. You ought to be able to define any sorts of missing information, but you're actually explaining *how* it's missing, e.g. the person doesn't like bread or was rejected soup or even "I don't know!"
No left outer joins here. Those are natural inner joins, which assume that you join on identically named attributes.
Here are some of the messy details of handling types correctly. There is a BREAD type (an enumerated type with values like wheat, white, rye, pumpernickle, etc) and a NOBREAD type (with just "doesn't like bread") and a BREADORNOBREAD type that is the "union" of both types as the Real domain is the union of the domains of Rational and Irrational numbers.
But whereas the domain of Real numbers is still a numeric domain, no operations are defined for BREADORNOBREAD!
That means that to work with this type, you need to either check its value manually or downcast it to either BREAD or NOBREAD (which is pointless since it only has one value) manually. This would be like casting a signed integer type to an unsigned integer type: if it contained a negative value, the DBMS would have to throw an error.
So if you cast BREADORNOBREAD to BREAD and it has a NOBREAD value, an error gets thrown. No nullogical mistakes are going to slip through. There is no three-value logic.
Also, this is true for number types: if you have a nullable Integer column and do arithmetic you must check for null values and handle them correctly because there's no rule that 1 + NULL = NULL.
Quote:
|
some data modellers (and i hasten to say i'm not one of them) don't care, though, their focus is on the abhorrence of three-valued logic, and they relegate unimportant matters (such as efficiency) to the database engine, and if the database engine isn't up to the task, then obviously it's a deficient engine
|
To my thinking, the efficiency issue is the easiest one yet.
You see, all that code that should be generated is the logical schema.
The physical schema would look like:
Code:
Table DinnerTrays
SerialNumber INT PK,
BreadType BREAD NULLABLE AS NOBREAD('Doesn't like bread'),
SoupType SOUP NULLABLE AS NOSOUP('No Soup For You!'),
View DinnerTrays_SerialNumbers = DinnerTrays PROJECT SerialNumber
View DinnerTrays_BreadTypes_BREADS = DinnerTrays PROJECT SerialNumber, Breadtype WHERE BreadType ISA BREAD
View DinnerTrays_BreadTypes_NULLS = DinnerTrays PROJECT SerialNumber, Breadtype WHERE BreadType ISA NOBREAD
View DinnerTrays_SoupTypes_SOUPS = DinnerTrays PROJECT SerialNumber, Souptype WHERE Souptype ISA SOUP
View DinnerTrays_SoupTypes_NULLS = DinnerTrays PROJECT SerialNumber, Souptype WHERE Souptype ISA NOSOUP
All that's been added is a bunch of views representing the 6NF tables, and the actual data is stored in the most natural, efficient form. The query optimizer has hints so that it is guaranteed that the 6NF derived tables behave exactly as base tables.
|
|

02-02-05, 04:34
|
|
Registered User
|
|
Join Date: Dec 2003
Location: Canada
Posts: 710
|
|
|
Genealogy Data Model
|

02-02-05, 05:33
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
sco08y, thanks very much for the detailed example
where did you get that?
you didn't make it up on the spot, you've done this type of 6nf design before, right?
again, thanks for taking the time
|
|

02-02-05, 07:10
|
|
Registered User
|
|
Join Date: Jan 2005
Location: uk
Posts: 4
|
|
wow, I nearly followed all that but my heads hurting! I'm going to have to redigest the theory a few times
certus - that alternate model is a bit more similar to the orginal concept I had but I don't know if it is any better. It has (a lot) more flexibility in terms of defining a role in a relationship but will need many more entries in the relationships table to describe just a simple family structure. 4 kids + 2 parents is 4 entries in Parents and 1 in Marriages, but is 5+4+3+2 = 14 in Relationships. I can't think of family relationships that really demand the extra utility offered by the second model.
|
|

02-02-05, 09:20
|
|
Jaded Developer
|
|
Join Date: Nov 2004
Location: out on a limb
Posts: 9,246
|
|
what about the case where mother died in childbirth father remarried had more children with new mother, in some cases the new mother may have been a relative of the old mother (eg a sister). Perhaps unlikely (except in the boondocks) in a modern environment but certainly happend in earlier generations. It can also add complexity if the parents are cousins.
It was not that long ago (1990's) that in the UK a couple married, had children, then split up, the father went to live with his ex wifes mother, married her and had children. So the children of the first marriage had step brothers & sisters who were also their uncles & aunts. Way to go Chav's
|
|

02-02-05, 11:56
|
|
Wage drone 24601
|
|
Join Date: Jan 2003
Location: Massachusetts
Posts: 4,899
|
|
|
|

02-02-05, 13:37
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
|
|

02-02-05, 15:22
|
|
Registered User
|
|
Join Date: Apr 2004
Location: Toronto, Canada
Posts: 249
|
|
The reason I had suggested that there be a separate table for divorces is the same as that for suggesting a parents table: each seems to be a distinct concept separate from the entity it was included in.
The fact that this results in a design with fewer nulls is just incidental, and not the main objective of the suggestion.
While I do like to design my tables with as few nullable columns as possible, I am not paranoid about it.
If I were to design a model without any null columns, then our users, especially the "superior" object-oriented crowd, would find it impossible to understand it.
Ravi
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|