If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Lookup tables (aka domain/reference/code tables) surrogate vs natural keys?

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-05-08, 19:51
tmpuzer tmpuzer is offline
Registered User
 
Join Date: Mar 2008
Posts: 52
Thumbs up Lookup tables (aka domain/reference/code tables) surrogate vs natural keys?

I know the surrogate vs. natural key debate has gone on for a long time, but I was wondering about it specifically in the context of lookup tables. I have often created lookup tables like the following:

Code:
status_id     status
---           ---
 1            pending
 2            cancelled
 3            completed
And in most queries involving a table that has a foreign key to the lookup table then you must do a join to get text value in the query result.

But now I'm starting to wonder what the point of using a surrogate key here is. So much so that I've stopped using them. So the lookup table becomes simply:

Code:
status
---
pending
cancelled
completed
So now the purpose of the lookup table primarily becomes one of ensuring domain integrity. And any changes to the records in the lookup table can be propagated via cascading updates.

Thoughts? What is the purpose of using a surrogate key for lookup tables anyway?
Reply With Quote
  #2 (permalink)  
Old 03-05-08, 22:20
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
this is a brilliant realization on your part

i agree (and have done for a long time), the surrogate key in this situation seems silly

back in the day, when disk space was very scarce, it might have made sense, but nowadays you can get 200gigabytes for the price of a samwidge

(speaking of samwidges, i is hungry)

do not forget to declare the FK (a very important part of using a lookup table, whether with a natural key or a surrogate)
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 03-06-08, 01:41
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
There is no such thing as a lookup table.
If the data has its own table, then it is a data element, not a data attribute.
To take your own example, you may start out with a simple "lookup table":

create table Status(Status varchar(50))

All well and good and simple. Then you populate some data:
pending
cancelled
completed

Great! Then, oops...the client wants more status settings. So you add:
deleted
overridden
on hold
needs certification

No problem. But which ones represent states that are "in progress"? Which ones represent states that are no longer valid? Are you going to hard-code the text values in your queries? Better to add some flags, in case the text changes:

create table Status(Status varchar(50), InProgress bit, Invalid bit)

Is it still just a "lookup table"? I don't think so. And if your standard tables use surrogate keys, then for consistency this one should too.

There is no such thing as a lookup table. Treat them the same as your other tables.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #4 (permalink)  
Old 03-06-08, 03:26
tmpuzer tmpuzer is offline
Registered User
 
Join Date: Mar 2008
Posts: 52
Quote:
Originally Posted by r937
this is a brilliant realization on your part
Thanks (I'll take it where I can get it)

Quote:
Originally Posted by blindman
There is no such thing as a lookup table.
I've heard this sentiment before, and from a purity standpoint, I agree. And I am open to be convinced of another way of looking at things. But you can't deny that we have the terminology "lookup table" (in fact several variations thereof, as in the title of this post) which suggests that there is a conceptual entity backing the use of the term. There are articles all over the web that discuss them. And, to me, that suggests that, somehow, lookup tables are different from other tables, even if there is no fundamental difference and the only difference is how we look at them.

I think there is a common use-case where we need an attribute in a table (or set of tables) that is restricted to some set of values. And I would argue that, frequently, the lookup table's schema will remain stable over time.

Quote:
Originally Posted by blindman
And if your standard tables use surrogate keys, then for consistency this one should too.
Should the surrogate key/natural key decision be made at the database level or at the table level? I think I'm comfortable using surrogate keys for some tables and natural keys for others. Indeed, since I've started using natural keys for lookup tables that's what I've been doing, although admittedly I'm not sure what the ramifications of this new style will be.

In the case of the requirements changing, since the schema is going to be changing anyway, is there a conflict if what we once considered a lookup table is now no longer a lookup table because the requirements now deem that that table has some expanded function? I mean, maybe the client's requirements change a little more than in your example and you need a different table all together or two tables?
Reply With Quote
  #5 (permalink)  
Old 03-06-08, 04:08
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
I prefer natural keys simply because it makes the data easier to understand and the resultant SQL easier to read & write. It's a pain having to join to umpteen tables just to read what should be simple values from the database. Also having more complicated SQL means it's going to take longer to write the code and more likely to contain errors. Just my 2c.
Reply With Quote
  #6 (permalink)  
Old 03-06-08, 09:05
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by tmpuzer
But you can't deny that we have the terminology "lookup table" (in fact several variations thereof, as in the title of this post) which suggests that there is a conceptual entity backing the use of the term.
We also have the term "Compassionate Conservative", but that doesn't mean it actually exists either.

Quote:
Originally Posted by tmpuzer
Should the surrogate key/natural key decision be made at the database level or at the table level? I think I'm comfortable using surrogate keys for some tables and natural keys for others.
Consistency has value.

Quote:
Originally Posted by mike_bike_kite
I prefer natural keys simply because it makes the data easier to understand and the resultant SQL easier to read & write. It's a pain having to join to umpteen tables just to read what should be simple values from the database. Also having more complicated SQL means it's going to take longer to write the code and more likely to contain errors. Just my 2c.
Huh? Surrogate keys lead to simpler SQL in any moderately complex database.
You want to limit yourself to a Star Schema? Go ahead and use natural keys.
Are you going to have three, four, five or more data levels? Avoid the composite keys and use surrogates.

I used to use natural keys and surrogate keys interchangeably. But eventually I came to the realization that though I had many times needed to convert a table from a natural key to a surrogate key, I HAVE NEVER said to myself "Gee, I wish I had not used a surrogate key for that table."

My 3c.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #7 (permalink)  
Old 03-06-08, 09:12
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by blindman
Consistency has value.
but "Foolish consistency [my emphasis] is the hobgoblin of small minds." (Ralph Waldo Emerson)
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #8 (permalink)  
Old 03-06-08, 09:18
pootle flump pootle flump is offline
King of Understatement
 
Join Date: Feb 2004
Location: One Flump in One Place
Posts: 14,905
I like to go au natural

.... apart from when I shouldn't - then I don't.

And naturals do make the SQL easier\ shorter. I don't know how they would make it more complex other than you are thinking joins on composite columns.
__________________
Testimonial:
Quote:
pootle flump
ur codings are working excelent.
Reply With Quote
  #9 (permalink)  
Old 03-06-08, 10:29
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally posted by blindman
Surrogate keys lead to simpler SQL in any moderately complex database.
I just don't see that - perhaps all the databases I've worked with have just been too simple. Using the example above we get the following options for some simple SQL
Code:
-- confusing and prone to errors
select t.id
from   MyTab t
where t.status_id in ( 2,3 )

-- long winded
select t.id
from   MyTab t,
         Status s
where s.status_id = t.status_id 
         and s.status in ( 'cancelled', 'completed' )

-- natural and simple
select t.id
from   MyTab t
where t.status in ( 'cancelled', 'completed' )
The last example is the natural key one - I know what I prefer. Could you explain the argument against the natural key in this case.
Reply With Quote
  #10 (permalink)  
Old 03-06-08, 12:26
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
"from MyTab t, Status s"
Do you pet your puppy with the same hands you use to write this code? Joins, please.

"where t.status in ( 'cancelled', 'completed' )"
You are hard-coding values, which is sloppy design.

And yes, your "Natural and simple" falls apart when it comes to composite keys. Though if your example is as complex as your designs ever get, then I guess natural keys will suffice.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #11 (permalink)  
Old 03-06-08, 12:37
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by blindman
"where t.status in ( 'cancelled', 'completed' )"
You are hard-coding values, which is sloppy design.
horsesh1t

would you kindly explain how to pull out rows corresponding to two (out of several) statuses without coding into the query which two
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #12 (permalink)  
Old 03-06-08, 12:46
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
The bank's back office database (not one of mine) that I have in front of me uses surrogate keys for everything - I could imagine the blindman being very pleased with this one - but then again it also stores much of it's data as parameters (read EAV) so perhaps not. Sadly even the parameter names are surrogate keys so I have to look up a parameter's id, before looking up the current value of that parameter (another id), then finally look up the value of this new id to get the true value of the parameter! All these stages have a complete history in the database so each stage is just that little bit more complex.

It's one of those infinitely extendable databases that is so complex that no-one dares alter it in the slightest. Sadly there aren't functions to read / translate these values so even the simplest code quickly gets frightening. Give me simplicity any day.
Reply With Quote
  #13 (permalink)  
Old 03-06-08, 13:01
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by r937
horsesh1t

would you kindly explain how to pull out rows corresponding to two (out of several) statuses without coding into the query which two
Yes! The fisticuffs begin! Finally a worthy opponent...

Rudy, I would use bit values, as in my example above.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #14 (permalink)  
Old 03-06-08, 13:08
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
which bit values, pls?

and how do you specify them without coding them into the query?
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #15 (permalink)  
Old 03-06-08, 13:26
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
Why, you insert the bit values into a #temp table, of course!

-PatP
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On