Page 1 of 5 123 ... LastLast
Results 1 to 15 of 62
  1. #1
    Join Date
    Mar 2008
    Posts
    52

    Thumbs up Lookup tables (aka domain/reference/code tables) surrogate vs natural keys?

    I know the surrogate vs. natural key debate has gone on for a long time, but I was wondering about it specifically in the context of lookup tables. I have often created lookup tables like the following:

    Code:
    status_id     status
    ---           ---
     1            pending
     2            cancelled
     3            completed
    And in most queries involving a table that has a foreign key to the lookup table then you must do a join to get text value in the query result.

    But now I'm starting to wonder what the point of using a surrogate key here is. So much so that I've stopped using them. So the lookup table becomes simply:

    Code:
    status
    ---
    pending
    cancelled
    completed
    So now the purpose of the lookup table primarily becomes one of ensuring domain integrity. And any changes to the records in the lookup table can be propagated via cascading updates.

    Thoughts? What is the purpose of using a surrogate key for lookup tables anyway?

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    this is a brilliant realization on your part

    i agree (and have done for a long time), the surrogate key in this situation seems silly

    back in the day, when disk space was very scarce, it might have made sense, but nowadays you can get 200gigabytes for the price of a samwidge

    (speaking of samwidges, i is hungry)

    do not forget to declare the FK (a very important part of using a lookup table, whether with a natural key or a surrogate)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    There is no such thing as a lookup table.
    If the data has its own table, then it is a data element, not a data attribute.
    To take your own example, you may start out with a simple "lookup table":

    create table Status(Status varchar(50))

    All well and good and simple. Then you populate some data:
    pending
    cancelled
    completed

    Great! Then, oops...the client wants more status settings. So you add:
    deleted
    overridden
    on hold
    needs certification

    No problem. But which ones represent states that are "in progress"? Which ones represent states that are no longer valid? Are you going to hard-code the text values in your queries? Better to add some flags, in case the text changes:

    create table Status(Status varchar(50), InProgress bit, Invalid bit)

    Is it still just a "lookup table"? I don't think so. And if your standard tables use surrogate keys, then for consistency this one should too.

    There is no such thing as a lookup table. Treat them the same as your other tables.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Mar 2008
    Posts
    52
    Quote Originally Posted by r937
    this is a brilliant realization on your part
    Thanks (I'll take it where I can get it)

    Quote Originally Posted by blindman
    There is no such thing as a lookup table.
    I've heard this sentiment before, and from a purity standpoint, I agree. And I am open to be convinced of another way of looking at things. But you can't deny that we have the terminology "lookup table" (in fact several variations thereof, as in the title of this post) which suggests that there is a conceptual entity backing the use of the term. There are articles all over the web that discuss them. And, to me, that suggests that, somehow, lookup tables are different from other tables, even if there is no fundamental difference and the only difference is how we look at them.

    I think there is a common use-case where we need an attribute in a table (or set of tables) that is restricted to some set of values. And I would argue that, frequently, the lookup table's schema will remain stable over time.

    Quote Originally Posted by blindman
    And if your standard tables use surrogate keys, then for consistency this one should too.
    Should the surrogate key/natural key decision be made at the database level or at the table level? I think I'm comfortable using surrogate keys for some tables and natural keys for others. Indeed, since I've started using natural keys for lookup tables that's what I've been doing, although admittedly I'm not sure what the ramifications of this new style will be.

    In the case of the requirements changing, since the schema is going to be changing anyway, is there a conflict if what we once considered a lookup table is now no longer a lookup table because the requirements now deem that that table has some expanded function? I mean, maybe the client's requirements change a little more than in your example and you need a different table all together or two tables?

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I prefer natural keys simply because it makes the data easier to understand and the resultant SQL easier to read & write. It's a pain having to join to umpteen tables just to read what should be simple values from the database. Also having more complicated SQL means it's going to take longer to write the code and more likely to contain errors. Just my 2c.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by tmpuzer
    But you can't deny that we have the terminology "lookup table" (in fact several variations thereof, as in the title of this post) which suggests that there is a conceptual entity backing the use of the term.
    We also have the term "Compassionate Conservative", but that doesn't mean it actually exists either.

    Quote Originally Posted by tmpuzer
    Should the surrogate key/natural key decision be made at the database level or at the table level? I think I'm comfortable using surrogate keys for some tables and natural keys for others.
    Consistency has value.

    Quote Originally Posted by mike_bike_kite
    I prefer natural keys simply because it makes the data easier to understand and the resultant SQL easier to read & write. It's a pain having to join to umpteen tables just to read what should be simple values from the database. Also having more complicated SQL means it's going to take longer to write the code and more likely to contain errors. Just my 2c.
    Huh? Surrogate keys lead to simpler SQL in any moderately complex database.
    You want to limit yourself to a Star Schema? Go ahead and use natural keys.
    Are you going to have three, four, five or more data levels? Avoid the composite keys and use surrogates.

    I used to use natural keys and surrogate keys interchangeably. But eventually I came to the realization that though I had many times needed to convert a table from a natural key to a surrogate key, I HAVE NEVER said to myself "Gee, I wish I had not used a surrogate key for that table."

    My 3c.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by blindman
    Consistency has value.
    but "Foolish consistency [my emphasis] is the hobgoblin of small minds." (Ralph Waldo Emerson)
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    I like to go au natural

    .... apart from when I shouldn't - then I don't.

    And naturals do make the SQL easier\ shorter. I don't know how they would make it more complex other than you are thinking joins on composite columns.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Originally posted by blindman
    Surrogate keys lead to simpler SQL in any moderately complex database.
    I just don't see that - perhaps all the databases I've worked with have just been too simple. Using the example above we get the following options for some simple SQL
    Code:
    -- confusing and prone to errors
    select t.id
    from   MyTab t
    where t.status_id in ( 2,3 )
    
    -- long winded
    select t.id
    from   MyTab t,
             Status s
    where s.status_id = t.status_id 
             and s.status in ( 'cancelled', 'completed' )
    
    -- natural and simple
    select t.id
    from   MyTab t
    where t.status in ( 'cancelled', 'completed' )
    The last example is the natural key one - I know what I prefer. Could you explain the argument against the natural key in this case.

  10. #10
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    "from MyTab t, Status s"
    Do you pet your puppy with the same hands you use to write this code? Joins, please.

    "where t.status in ( 'cancelled', 'completed' )"
    You are hard-coding values, which is sloppy design.

    And yes, your "Natural and simple" falls apart when it comes to composite keys. Though if your example is as complex as your designs ever get, then I guess natural keys will suffice.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by blindman
    "where t.status in ( 'cancelled', 'completed' )"
    You are hard-coding values, which is sloppy design.
    horsesh1t

    would you kindly explain how to pull out rows corresponding to two (out of several) statuses without coding into the query which two
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    The bank's back office database (not one of mine) that I have in front of me uses surrogate keys for everything - I could imagine the blindman being very pleased with this one - but then again it also stores much of it's data as parameters (read EAV) so perhaps not. Sadly even the parameter names are surrogate keys so I have to look up a parameter's id, before looking up the current value of that parameter (another id), then finally look up the value of this new id to get the true value of the parameter! All these stages have a complete history in the database so each stage is just that little bit more complex.

    It's one of those infinitely extendable databases that is so complex that no-one dares alter it in the slightest. Sadly there aren't functions to read / translate these values so even the simplest code quickly gets frightening. Give me simplicity any day.

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by r937
    horsesh1t

    would you kindly explain how to pull out rows corresponding to two (out of several) statuses without coding into the query which two
    Yes! The fisticuffs begin! Finally a worthy opponent...

    Rudy, I would use bit values, as in my example above.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    which bit values, pls?

    and how do you specify them without coding them into the query?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Why, you insert the bit values into a #temp table, of course!

    -PatP

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •