Thread: Surrogate Key Mania
12-05-03, 18:21 #1Registered User
- Join Date
- Dec 2003
- Essex, UK
Surrogate Key Mania
I am interested in peoples opinions on surrogate keys.
In the past I didn't really see the point of using them since I could nearly always identify natural keys for my tables. Sometimes the natural keys were composites of 2 or 3 fields.
For example, to pinpoint a location on earth you need 3 things.
3. Height above sea level.
These three fields form a natural key to identify a location.
So we can have a table called "Location" with those 3 fields as a composite primary key. Non-key fields may include, Location name, description etc.
Now imagine we have a temperature table which stores the avergage temperature for each day for a location
The primary key of the Temperature table includes the composite key of the Location table and also adds the date.
This is the sort of thing I've done in applications in the past.
Now people are telling me that this sort of thing is not normalized and I should be using a LocationID surrogate on the location table etc.
Well i've studied the Normal forms and I don't see that surrogate keys have anything to do with normalization at all. Just because you have a composite key showing up in a child table doesnt mean things are not normalized. (Am I wrong?)
However I can certainly see the benefits of a surrogate key. If we have a location ID as the primary key in the Location table we can also use it in the Temperature table and it simplifies the relationship (and probably makes indexes smaller and faster). It also isolates child tables from changes in the location. For example, imagine that the longitude had to be corrected for a location. Without the surrogate key we would have to update the longitude on all the child tables. Business data is often subject to change so composite keys are not usually as immutable as you'd like.
One disturbing thing I have noticed at work though is that when people use a surrogate key they seem to think that is all they need to guarantee uniqueness.
"I've added an Autonumber as the key, so now every record is unique!".
Well, yes every Location ID is unique, but what about the real data!
If we make LocationID the surrogate key on the location table, we still have to ensure the uniqueness of the records. Even though Longitude,Latitude and Height are no longer the primary key we must ensure that we cannot have multiple records with the same Longitude
Latitude and Height. i.e. These three fields together need a unique index.
Time and time again people seem to think they no longer have to worry about uniqueness once they have created a surrogate key.
Has anyone else noticed this Surrogate key confusion?
(Or have I just got everything totally wrong here?)
12-08-03, 06:54 #2Moderator.
- Join Date
- Sep 2002
Re: Surrogate Key Mania
No, you haven't got it wrong. Surrogate keys are a convenience for avoiding various issues, some of which you have mentioned:
1) Natural keys sometimes need to be updated (corrected) and that change has to propagate to dependent tables if the natural key is used as the primary key.
2) Natural keys may be cumbersome to use in foreign keys, e.g. many columns, long text columns etc.
3) Some data doesn't have a natural key that is 100% unique - people being the obvious example.
Anyone who says natural keys are not normalised is wrong, though they may be right in suggesting that a surrogate key is a better choice in a given situation. However, as you have said, wherever a surrogate is used then unique constraints must also be used to enforce any natural keys that exist, and this tends to get overlooked by those who fit a surrogate to everything without thinking.Tony Andrews
12-08-03, 12:02 #3Registered User
- Join Date
- Sep 2003
- The extremely Royal borough of Kensington, London
"1) Natural keys sometimes need to be updated (corrected) and that change has to propagate to dependent tables if the natural key is used as the primary key."
Just wondering in what way this could be considered an issue? Unless the table was exposed to high volume updates.Bessie Braddock: Winston, you are drunk!
Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.
12-08-03, 12:45 #4Moderator.
- Join Date
- Sep 2002
It is a technical issue if your DBMS does not support "update cascade", which Oracle for example doesn't, or if your FK constraints are not deferred until commit (which Oracle does allow in later versions).
You need to update the parent PK and all the child FKs together because:
a) you can't update the parent first, otherwise the children's FKs will be violated temporarily
b) you can't update the children first, ditto.
Oracle guru Tom Kyte has published a fairly complex workaround for this involving triggers here:
One can also imagine other potential issues, such as:
- PK is referenced by millions of child records in various tables, and performing this update will be time/resource-intensive and undesirable
- historic dependent data has been archived and if restored would not correspond to any parent recordTony Andrews
12-08-03, 13:03 #5SQL Consultant
- Join Date
- Apr 2002
- Toronto, Canada
mrsteve, one advantage of using the natural keys as in your example of temperatures is that you can run queries against that table to filter by values on any of the parts of the key, e.g. average temps in the tropics (filter by latitude), without having to do a join to the location table, which the use of a surrogate key would force upon you
further, you can often make a decision about whether the natural key realistically has a chance of ever changing -- for example, lat/long likely won't
finally, i'd like to second the remark that "Anyone who says natural keys are not normalised is wrong"
as well, the remark "wherever a surrogate is used then unique constraints must also be used to enforce any natural keys that exist, and this tends to get overlooked by those who fit a surrogate to everything without thinking" bears repeating
i've seen far too many needless surrogate keys
for example, in a many-to-many relationship table:
foo (id PK, fooname)
bar (id PK, barname)
foobar (id PK, fooid FK, barid FK)
the composite (fooid,barid) needs a unique constraint, which is conveniently provided if the composite were made the PK instead of the useless surrogate