| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

11-14-11, 10:51
|
|
Registered User
|
|
Join Date: Nov 2011
Posts: 2
|
|
|
Normalization Question
|
|
Hello all, this is entirely a theory question (relating to my database course), and not a practical application question.
I'm having a little confusion about 3rd Normal Form, and have been reading over these forums trying to find an answer.
I have a database table called Department that has fields:
Department Code (PK)
Dept Name
Room #
Phone #
College
The textbook i am using says that this table is in third normal form, as every key is functionally dependent on the PK. The thing that is nagging me, however, is that Colleges can have multiple departments, so there can be redundant information in the database. If for example we had:
M&IS, 302, 555-1212, College of Business
MQABL, 302, 555-1213, College of Business
And the name of the college of business changed, we'd have to change it on multiple rows.
However I can't decide if creating a table that is just "PK, College Name", and then changing department to be Dept Name, Phone, Room, College PK is 'better' or not.
Is this not something we worry about in the normalization process, since it is a single column in the table? Is this something that is handled by 4th or 5th normal form? If a student asks me "Why dont we break out College into its own table as part of normalization", what do i tell them?
Thanks for any guidance you can give me.....
|
|

11-14-11, 13:41
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
Quote:
Originally Posted by scanfield
However I can't decide if creating a table that is just "PK, College Name", and then changing department to be Dept Name, Phone, Room, College PK is 'better' or not.
|
this is a very astute question and really shows that you are thinking and absorbing the concepts
there is absolutely nothing in normalization theory that says you should replace the college name with a surrogate key
if you do, then college_fk (not college_pk) would be the column you use in the departments table
but....
if you don't replace the college name with a surrogate key, then the college name would be the foreign key, and yes, you should probably have a colleges table, where the primary key is the college name
relationships do not require surrogate keys, natural keys work just fine
and since you brought it up, this business about "redundant" information is a total red herring
sure, as it stands now, the college name is "redundant" i.e. it repeats multiple times in the departments table, simply due to the nature of the one-to-many relationship
but guess what, if you replace it with a surrogate key, then the surrogate key will repeat exactly the same number of times!!! no redundancy is eliminated!!!
this last point is inexplicably difficult for some people to grasp, so i hope i have explained it well enough...
|
|

11-14-11, 14:19
|
|
Registered User
|
|
Join Date: May 2008
Posts: 270
|
|
|
|
While r937's explanation is correct, this is also true:
Quote:
Originally Posted by scanfield
And the name of the college of business changed, we'd have to change it on multiple rows.
|
Now most (all?) databases out there have mechanisms in place to handle this update automatically, so this isn't necessarily a problem, but the fact remains that an update IS necessary. From a purely theoretical point of view, this update is unavoidable. But this is why many people end up using surrogate keys, to prevent cascading updates if the college name changes. Note that nothing prevents you from changing the surrogate key either, which would similarly cause a cascading update. However, the point of the surrogate key is to create a meaningless identifier that will always remain constant.
Now, it's a very pedantic, philosophical, contentious debate about whether one way is or more or less "correct" than the other. Practical considerations will generally lead to using both. But, if you are teaching this, please, please, please, PLEASE drive home the fact that:
1. The definition of "surrogate" is "a substitute for the real thing." Consequently:
2. A surrogate key is a substitute, NOT a replacement, for the natural key. Consequently:
3. Even if you use a surrogate key, you must still, and always, always, ALWAYS enforce the natural key.
So the college table will require a unique (and not null) constraint on the college name, regardless of whether or not a surrogate key is used.
It might also help to think of it this way:
- A natural key establishes an entity's existence and its identity.
- A surrogate key only establishes existence.
- In some cases, an entity's identity could change during its existence.
- If so, the natural key will change, the surrogate key won't (presumably).
- Although the surrogate key might add stability, it is nonetheless completely redundant.
|
Last edited by futurity; 11-14-11 at 14:31.
|

11-14-11, 14:54
|
|
Registered User
|
|
Join Date: Nov 2011
Posts: 2
|
|
Thanks for the information guys, it is very helpful. This is an example in the chapter about normalization, so i wanted to be prepared to understand the topic a little more fully.
I had made the statement (in class) previously that "3rd Normal Form" solves "most" update anomalies. But then in this example i spotted that anomaly and it got me wondering.
|
|

11-14-11, 15:49
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
yeah, but this update is not an update "anomaly" -- it's just an update
|
|

11-14-11, 20:42
|
|
Registered User
|
|
Join Date: May 2008
Posts: 270
|
|
Another point to emphasize, and as r937 has already mentioned, surrogate keys really have nothing to do with normalization, or even relational theory or data modeling in general. They are a solution to a set of specific problems that occur during the physical design (i.e., implementation) phase. Ideally, in an introductory theory class, the notion of surrogate and natural keys wouldn't even be mentioned. However, given how ubiquitous yet misunderstood they are, it seems that most people just start off defining a bunch of "id" columns without even attempting to identify the natural key. Consequently, it may be worth taking some time to explain exactly what they are and how to properly use them.
That's my 2 cents, anyway. 
|
|

11-14-11, 21:00
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
excellent synopsis, futurity, of an all too common situation
i've been on database forums for many years, and have seen thousands and thousands of threads, and right up there in the top ten of most frequently asked questions is the one that goes something like this...
Halp!!! How do I remove all but one of the duplicates in my table?
based on our discussion, scanfield, i'm sure you can see now how and why this happens...
i know there are people out there advocating that you should put an autonumber id on every table, and so help me, if i ever meet one of these guys in real life, i'm gonna punch him right in the nose
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|