Results 1 to 13 of 13

Thread: Keys!!??

  1. #1
    Join Date
    Dec 2003
    Posts
    3

    Keys!!??

    Hi.

    I am fairly new to relational databases. I am having a bit of difficulty fully understanding the proper use of keys (primary, foreign) in relational databases. I've read several articles on DB design, but I'm having some doubts about my understanding. What I do understand is that the primary key makes records unique and foreign keys are used to relate to primary keys in other tables.

    My question is, can one or more foreign key(s) in a table be considered the primary key (without a separate id for the table)?

    Let me illustrate...

    I have a table called PERSON with PID, First Name, Last Name, Country (PK = PID).
    I have a table called SYMPTOM with SID, Name (PK = SID).

    I have a table called HISTORY with PID, SID, Date which represents the date each person developed a symptom. To make the records for each record unique, the Primary Key (PK) should be PID,SID (assumption is that each symptom can only develop once, thus one date, and each person can have more than one symptom - more than one person can have the same symptom). PID and SID are both foreign keys to different tables - can they be the primary keys or must I create another key for this table? i.e., HID, PID, SID, Date (PK = HID), where HID is a unique number for this table and it would then be the primary key and PID and SID would be the foreign keys?

    Can either way be used? If I can use PID/SID as the PK and avoid creating a unique key HID, my table would be smaller. Is this a matter of pros and cons of using one over the other? If so, what are they?

    Thanks very much!

  2. #2
    Join Date
    Sep 2003
    Location
    Virginia, USA
    Posts
    246

    Re: Keys!!??

    Yes, that can happen. However, many projects will implement dummy keys in all tables and would create a surrogate key using the two foreign keys you mentioned.

    Some RDBMS vendors, like Oracle, teach that every table should have a primary key that has 5 characteristics:
    1. unique
    2. nerver null
    3. short
    4. numeric
    5. meaningless

    The first two are obvious. The next two are for performance and memory tuning issues. The fifth characteristic confuses a lot of people, but I have found in my 10 years as a designer/dba that it's very important.

    an example of a really bad primary key is the American Social Security Number.
    1. it's not unique - due to the original way numbers were issued, there were up to 5 people with the same number. Even today, the numbers of deceased people get reissued.
    2. there are 10's of millions of people in the USA without one.
    3. it is short.
    4. it is numeric.
    5. it is not meanlingless. when I got married and my wife took my last name and she got a new SSN. her employer's database used SSN as a primary key, and they could not change the data (easily), so her taxes were going to be paid incorrectly. They finally spent a few hours manually updating the database.
    MarkRem
    Author, Oracle Database 10g: From Nuts to Soup
    http://www.remidata.com/book_nuts2soup.htm

  3. #3
    Join Date
    Dec 2003
    Posts
    3

    Re: Keys!!??

    Thanks for you reply, Mark.

    Hmmm, I think just using the foreign key combinationm PID, SID as my Primary key in my History table satisfies all the criteria you mentioned.

    They make the table records unique, they're non-NULL, they're numeric and are meaningless (just sequential numbers) - they're not as short as one numeric ID, but are only two numbers - is this considered short?

    This also leads to my next question...

    My example was simple. If I have something more complex and end up with a relation table containing 4 foreign keys which together make the table unique, each is a non-null number and meaningless, does it make sense to use the 4 as the Primary Key? At what point does efficiency, space and query complexity begin to degrade when combining multiple keys to form a Primary Key?

    Thanks!

  4. #4
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Composite Keys composed of foreign surrogate keys are acceptable, however dealing with large composite keys could call for a new surrogate key just for coding manageability. Composite keys in my experience do not cause performance hits but they can begin to take up quite a few columns in their children.

    Personally, I tend to use surrogate keys for all my tables and do not include foreign keys as part of the primary keys. There are a few design cases where composite keys are a must, but they are few.

  5. #5
    Join Date
    Sep 2003
    Location
    Virginia, USA
    Posts
    246

    Re: Keys!!??

    Personally, I would create a surrogate (dummy) primary key, and I would also create a unique index on the combined foreign keys (i.e., unique on fk1 + fk2 + ....) I would not find fault in anyone who did not create the surrogate in a simple join table.

    I remember looking at a Financials application about 6 or 7 years ago. There were many tables where the primary key was composite of 8 columns, and this composite was the foreign key in many other tables. It made coding queries a chore, because in every WHERE clause you had to join tables using all 8 columns.
    MarkRem
    Author, Oracle Database 10g: From Nuts to Soup
    http://www.remidata.com/book_nuts2soup.htm

  6. #6
    Join Date
    Oct 2003
    Posts
    706

    Re: Keys!!??

    Originally posted by amigo
    ... Hmmm, I think just using the foreign key combinationm PID, SID as my Primary key in my History table satisfies all the criteria you mentioned. ...
    No, there is one important difference: PID and SID are numbers known to humans. For this reason they should not be primary-keys in the tables.

    It may well be that the combination {PID,SID} should be unique, in which case you can define a rule or an index to enforce that restriction. That's fine... but that does not make them good candidates for primary-keys.

    The PID and SID values "mean something to someone besides the computer." Therefore, they are subject to change without notice. A salesman somewhere could decide that to win a $10 zillion account the PID should be changed to the customer's favorite color... and zingo! it becomes an application requirement and can't you have it ready by tomorrow?

    (Show of hands? Has this happened to you? Thank you. ... Amigo, I rest my case...)

    Primary-keys should mean absolutely nothing. Their only purpose should be: to uniquely identify the row in the table in the database of the computer.
    ChimneySweep(R): fast, automatic
    table repair at a click of the
    mouse! http://www.sundialservices.com

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Sundial, such blanket statements are irresponsible.

    The idea that primary keys should be meaningfull only to the database is just one philosphy of database design. It is one that I normally agree with, but for many small applications it is not necessarily required, and therefore not worth the coding overhead.

    So what if the PID needs to be changed to the client's favorite color? In a large database this might cause problems for this to occur frequently, but in a small-midsize database it would be inconsequential.

    The advantage of using PID as a primary key propogated through sub-tables is that you don't need to join to the parent table to look up the PID if it is already the foreign key in the subtable.

    The disadvantage is that the use of "Natural" keys such as Employee IDs leads to composite keys that can muck up relational integrity and require confusing join clauses.

    There are, of course, other benefits and costs to both strategies, but the point is that the developer needs to weigh both options and then decide on the best implementation.


    blindman

  8. #8
    Join Date
    Oct 2003
    Posts
    706
    Originally posted by blindman
    Sundial, such blanket statements are irresponsible.

    The idea that primary keys should be meaningfull only to the database is just one philosphy of database design. It is one that I normally agree with, but for many small applications it is not necessarily required, and therefore not worth the coding overhead. [...]

    blindman
    You stand by your opinion, I'll stand by mine. It's good to hear differing views and I don't call that "irresponsible" at all. That's what a forum's for, eh?

    My views here are primarily shaped by having spent a year and a half profitably (to me, anyway) re-writing a major app for an insurance provider who had linked everything in the world using a "Provider ID" that was not only full of embedded information but incompatible with the format used by a company they acquired. Suddenly those IDs were ambiguous (causing dupes) and inconsistent (causing data to drop out of sight). That was a nice, $150,000 problem. Nice for me because I got the grands. Not nice for the client because with a better-designed database the money would not have needed to have been spent.

    They acquired two more companies after that, and then were acquired, and never had to re-redesign the database. Thanks to me. But the anonymous person who made the original decision probably saw it as "a small database, an okay way to do this that works now, nobody outside of this department's gonna ever see this anyway ..." It was a sensible, defensible decision at the time, but it proved to be a short-sighted one.

    The gist of my redesign was to introduce a truly-private primary key and to spin-off the provider-ID resolution to a separate table. And yes, it took eighteen months to do it.
    ChimneySweep(R): fast, automatic
    table repair at a click of the
    mouse! http://www.sundialservices.com

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    natural versus surrogate... the debate will never end

    all blanket statements are bad, including this one

    blindman, i mainly agree with your opinions, in this thread as well as many others, but one thing you said is really weird:
    the use of "Natural" keys such as Employee IDs leads to composite keys that can muck up relational integrity and require confusing join clauses.
    either you didn't explain this point very well, or else i'm gonna have to disagree strongly

    natural keys cannot possibly muck up relational integrity, and as far as confusing joins are concerned, i'm sorry, i just don't understand how you could think this

    let me also make a comment or two on some other points in this thread

    beware of just declaring a surrogate out of habit

    it's a bad habit to get into, especially if you think that composite keys are something to be avoided

    there is an advantage to using composite foreign keys, and that is that you can do simple joins with lookup tables, without having to go "up the chain" joining to parent tables, grandparent tables, etc., just to figure out which child you actually are

    e.g.

    division -> department -> section -> employee

    count the number of employees in the XYZ division

    if the department has its own surrogate key, and carries the division foreign key as an attribute (cf. ERwin makes the distinction between identifying and non-identifying relationships), and if section has its own surrogate key, and carries the department foreign key -- which is a surrogate, recall -- as an attribute, and if employee carries the section foreign key -- which is a also surrogate, recall -- as an attribute, then you must join employee to section to department to division in order to determine which employees belong to XYZ division

    on the other hand, if division foreign key is propagated down into employee, those joins are not necessary, and even if you wanted to find the name of the XYZ division, you could join employee to division to get it, and not go through section or department

    sundial, your comment that primary keys should mean nothing, and that their only purpose is to uniquely identify a row, is a Post Hoc Ergo Propter Hoc fallacy

    yes, the primary key should uniquely identify a row

    no, this does not imply that it may not have meaning

    of course it may have meaning

    yes, i know, i know, you guys are gonna jump all over me and say that if the primary key has meaning, then it can change, and if it can change, then your app will get into trouble

    well, duh

    that's what ON UPDATE CASCADE is for
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Heh. I liked that close, rudy.

  11. #11
    Join Date
    Dec 2003
    Posts
    3

    Re: Keys!!??

    Originally posted by sundialsvcs
    No, there is one important difference: PID and SID are numbers known to humans. For this reason they should not be primary-keys in the tables.

    It may well be that the combination {PID,SID} should be unique, in which case you can define a rule or an index to enforce that restriction. That's fine... but that does not make them good candidates for primary-keys.

    The PID and SID values "mean something to someone besides the computer." Therefore, they are subject to change without notice.
    Well, either I didn't explain fully or clearly, or I don't quite understand you...

    The PID and SID are meant to be auto-increment numbers (as far as I understand it, they therefore don't mean anything and are not known to humans) - they are not anything like a social security number, for example. Assuming this, was my assumption correct that they would be good candidates for the primary key in the History table? In this scenario, are they considered surrogate keys? I haven't read this term being used in any of the articles I read.

    Just remember ... I'm a bit new at this. Thanks.

  12. #12
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    r937, here's what I meant by natural keys sometimes mucking up relation integrity:

    In a complex database with multiple relational layers and multiple relationships between tables, each natural key is included in the natural key of it's child tables. Before I knew better I once tried to use natural keys in a schema that had six distinct relational layers. The lowest layer ended up with six columns as a composite natural key. That made the joins hairy. Then I realized that data in the upper table might have more than one relational path to data in the lower tables. That caused cascading updates to fail, because it was trying to update records in the lower tables twice. From then on, I've tended toward surrogate keys except in simple schemas.

    blindman

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    amigo:

    Yes, your SIDs and PIDs are surrogate keys.

    Yes, they are probably good candidates for a composite key in the History table. If there are child table build off of the History table (unlikely) then you should consider creating another surrogate key in the history table so that the lower tables don't end up using the SIDs, PIDs, KIDs, NIDs, VIDs, ZIDs, and whatever else as mongo composite keys.

    That's just my rule of thumb, of course. This is the sort of judgement that can only be made with experience, so create your schema and get some!

    blindman

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •