Results 1 to 7 of 7
  1. #1
    Join Date
    Nov 2011
    Posts
    2

    Normalization Question

    Hello all, this is entirely a theory question (relating to my database course), and not a practical application question.

    I'm having a little confusion about 3rd Normal Form, and have been reading over these forums trying to find an answer.

    I have a database table called Department that has fields:
    Department Code (PK)
    Dept Name
    Room #
    Phone #
    College

    The textbook i am using says that this table is in third normal form, as every key is functionally dependent on the PK. The thing that is nagging me, however, is that Colleges can have multiple departments, so there can be redundant information in the database. If for example we had:

    M&IS, 302, 555-1212, College of Business
    MQABL, 302, 555-1213, College of Business

    And the name of the college of business changed, we'd have to change it on multiple rows.

    However I can't decide if creating a table that is just "PK, College Name", and then changing department to be Dept Name, Phone, Room, College PK is 'better' or not.

    Is this not something we worry about in the normalization process, since it is a single column in the table? Is this something that is handled by 4th or 5th normal form? If a student asks me "Why dont we break out College into its own table as part of normalization", what do i tell them?

    Thanks for any guidance you can give me.....

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by scanfield View Post
    However I can't decide if creating a table that is just "PK, College Name", and then changing department to be Dept Name, Phone, Room, College PK is 'better' or not.
    this is a very astute question and really shows that you are thinking and absorbing the concepts

    there is absolutely nothing in normalization theory that says you should replace the college name with a surrogate key

    if you do, then college_fk (not college_pk) would be the column you use in the departments table

    but....

    if you don't replace the college name with a surrogate key, then the college name would be the foreign key, and yes, you should probably have a colleges table, where the primary key is the college name

    relationships do not require surrogate keys, natural keys work just fine

    and since you brought it up, this business about "redundant" information is a total red herring

    sure, as it stands now, the college name is "redundant" i.e. it repeats multiple times in the departments table, simply due to the nature of the one-to-many relationship

    but guess what, if you replace it with a surrogate key, then the surrogate key will repeat exactly the same number of times!!! no redundancy is eliminated!!!

    this last point is inexplicably difficult for some people to grasp, so i hope i have explained it well enough...
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    May 2008
    Posts
    277
    While r937's explanation is correct, this is also true:

    Quote Originally Posted by scanfield View Post
    And the name of the college of business changed, we'd have to change it on multiple rows.
    Now most (all?) databases out there have mechanisms in place to handle this update automatically, so this isn't necessarily a problem, but the fact remains that an update IS necessary. From a purely theoretical point of view, this update is unavoidable. But this is why many people end up using surrogate keys, to prevent cascading updates if the college name changes. Note that nothing prevents you from changing the surrogate key either, which would similarly cause a cascading update. However, the point of the surrogate key is to create a meaningless identifier that will always remain constant.

    Now, it's a very pedantic, philosophical, contentious debate about whether one way is or more or less "correct" than the other. Practical considerations will generally lead to using both. But, if you are teaching this, please, please, please, PLEASE drive home the fact that:
    1. The definition of "surrogate" is "a substitute for the real thing." Consequently:
    2. A surrogate key is a substitute, NOT a replacement, for the natural key. Consequently:
    3. Even if you use a surrogate key, you must still, and always, always, ALWAYS enforce the natural key.

    So the college table will require a unique (and not null) constraint on the college name, regardless of whether or not a surrogate key is used.

    It might also help to think of it this way:
    - A natural key establishes an entity's existence and its identity.
    - A surrogate key only establishes existence.
    - In some cases, an entity's identity could change during its existence.
    - If so, the natural key will change, the surrogate key won't (presumably).
    - Although the surrogate key might add stability, it is nonetheless completely redundant.
    Last edited by futurity; 11-14-11 at 15:31.

  4. #4
    Join Date
    Nov 2011
    Posts
    2
    Thanks for the information guys, it is very helpful. This is an example in the chapter about normalization, so i wanted to be prepared to understand the topic a little more fully.

    I had made the statement (in class) previously that "3rd Normal Form" solves "most" update anomalies. But then in this example i spotted that anomaly and it got me wondering.

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yeah, but this update is not an update "anomaly" -- it's just an update
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    May 2008
    Posts
    277
    Another point to emphasize, and as r937 has already mentioned, surrogate keys really have nothing to do with normalization, or even relational theory or data modeling in general. They are a solution to a set of specific problems that occur during the physical design (i.e., implementation) phase. Ideally, in an introductory theory class, the notion of surrogate and natural keys wouldn't even be mentioned. However, given how ubiquitous yet misunderstood they are, it seems that most people just start off defining a bunch of "id" columns without even attempting to identify the natural key. Consequently, it may be worth taking some time to explain exactly what they are and how to properly use them.

    That's my 2 cents, anyway.

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    excellent synopsis, futurity, of an all too common situation

    i've been on database forums for many years, and have seen thousands and thousands of threads, and right up there in the top ten of most frequently asked questions is the one that goes something like this...

    Halp!!! How do I remove all but one of the duplicates in my table?

    based on our discussion, scanfield, i'm sure you can see now how and why this happens...

    i know there are people out there advocating that you should put an autonumber id on every table, and so help me, if i ever meet one of these guys in real life, i'm gonna punch him right in the nose
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •