Results 1 to 5 of 5
  1. #1
    Join Date
    Jan 2011
    Posts
    4

    Unanswered: Multiple foreign keys in parent-child relationships

    Hello,

    I've been kicking around multiple ideas for a while, but wanted to see if there was some consensus from others. I think this design will work, but I want to avoid bloat in the data as well as circular referencing. Here's my scenario:

    We're keeping track of multiple projects which have within each project multiple samples. Eash sample has a number of other data elements (reads) and each of those reads has data describing it (classifications). The idea is to do 1-m from projects -> samples -> reads -> classifications. The problem is that I have many millions of rows at the reads and classifications level and frequently want to know how many classifications are there for projects. Normally, that is a multi-table join (depending on the query), and sometimes can be slow.

    To speed up queries at the read and classification level, I thought about having foreign keys to sample_id, project_id, and read_id and index those in the classifications and/or reads table. This would be to answer questions like "how many classifications of class X and type Y in project Z"

    Question: is this overly redundant/bad practice to refer to multiple parents in foreign keys?

    Interested in your thoughts.

    Thanks,
    Chris

  2. #2
    Join Date
    May 2008
    Posts
    277
    Quote Originally Posted by fried1 View Post
    Question: is this overly redundant/bad practice to refer to multiple parents in foreign keys?
    No, but the correct answer depends on your design requirements.

    If these relationships are not transferrable -- ie, a sample cannot move from one project to another -- then the foreign key of the parent table should be incorporated into the primary key of the child table, giving you a nice hierarchical structure:

    project -> *project_id, ...
    sample -> *project_id, *sample_id, ...
    read -> *project_id, *sample_id, *read_id, ...

    However, if the relationship IS transferrable, then unfortunately you're stuck with your current design, unless you decide to denormalize your tables.

  3. #3
    Join Date
    Jan 2011
    Posts
    4
    Thanks futurity! I was maintaining columns with the FK to parent, grandparent, etc, but not actually making them PKs. Great suggestion.

    The only big other change is that I'm using the serial type to generate id's for projects, samples, reads, etc for PK and then putting unique constraints on the names, but if I'm not needing to join all over the place to get reads/classifications from projects, reads from samples, etc then that is no longer necessary, I guess. I think this makes things simpler and will work much better.

    Chris

  4. #4
    Join Date
    May 2008
    Posts
    277
    Quote Originally Posted by fried1 View Post
    The only big other change is that I'm using the serial type to generate id's for projects, samples, reads, etc for PK and then putting unique constraints on the names, but if I'm not needing to join all over the place to get reads/classifications from projects, reads from samples, etc then that is no longer necessary, I guess.
    Whether or not you should use surrogate keys is a debate I won't get into, but regardless, you must always identify and enforce the natural key to prevent duplicate rows from getting into your database. As a result, if you use surrogate keys, many of your tables will look something like this:

    Code:
    create table sample (
        project_id int references project,
        sample_id serial,
        primary key (project_id, sample_id),
    
        sample_name text not null,
        unique (project_id, sample_name), -- natural key
        ...
    )
    Actually, you probably want to create the index on the lower (or upper) case of the name, otherwise 'Foo' and 'FOo' will be seen as unique values:

    Code:
    create unique index sample_nk on sample (project_id, lower(sample_name))

  5. #5
    Join Date
    Jan 2011
    Posts
    4
    Good idea on the index case; I hadn't thought of that. As for the unique constraint on the natural key containing both the parent_id and the name, I get your point, but don't really need that in my case since I do not allow duplicate names across samples, reads, etc. It's a good check for me, too, because they are identifiers in other systems (that I don't manage) that are independent of the "project" that I'm creating.

    Chris

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •