Page 1 of 2 12 LastLast
Results 1 to 15 of 29
  1. #1
    Join Date
    Sep 2007
    Posts
    9

    Smile Design considerations about primary and foreign keys

    Hi everyone, here´s my question:

    Consider that you want a model a problem in which there are N buildings, numbered sequentially, each building has N floors numbered sequentially and each floor has N rooms numbered sequentially too. So there are several ways to model this:

    A)

    Buildings

    building_num INTEGER (PRIMARY KEY)

    Floors

    floor_num INTEGER (PRIMARY KEY)
    building_num INTEGER (PRIMARY KEY) (FOREIGN KEY to Buildings)

    Rooms

    room_num INTEGER (PRIMARY KEY)
    building_num INTEGER (PRIMARY KEY) (FOREIGN KEY to Buildings)
    floor_num INTEGER (PRIMARY KEY) (FOREIGN KEY to Floors)

    As you can see there´s a lot of redundancy in this tables, but if I conserve the model this way it will support easily changes in requirements like storing other data about floors without a lot of work. Now consider the following version:

    B)

    Buildings

    id_building INTEGER (PRIMARY KEY)
    num_building INTEGER

    Floors

    id_floor INTEGER (PRIMARY KEY)
    id_building INTEGER (FOREIGN KEY to Buildings)
    num_floor INTEGER

    Rooms

    id_room INTEGER (PRIMARY KEY)
    id_floor (FOREIGN KEY to Floors)
    num_floor

    With this approach we could reference from another table a single row of Rooms table with only the id_room value which is an autonumeric value but I would have to examine each table primary, foreign keys and other attributes in order to identify the desired room what could cause query complexity and performance hit in the future if those tables become very large. Also for this design I would need 2 indexes per table (the primary key and a unique constraint for num_x).

    Another way would be having a table called Rooms with all the data it requires and omitting tables Buildings and Floors like this way:

    C)

    Rooms

    room_num INTEGER (PRIMARY KEY)
    floor_num INTEGER (PRIMARY KEY)
    building_num INTEGER (PRIMARY KEY)

    But the real problem with this is that I if I want to reference a single row of this table I would have to conserve a composite key of 3 columns in the referencing tables which duplicates a lot of data.

    And maybe another way could be the following:

    D)

    Rooms

    id_room INTEGER (PRIMARY KEY)
    room_num INTEGER (PRIMARY KEY)
    floor_num INTEGER (PRIMARY KEY)
    building_num INTEGER (PRIMARY KEY)

    But this will need again two indexes (primary key and unique constraint) instead of only 1 (primary key).

    Every table in above possible solutions is normalized to the third normal form but as you can see each approach has its advantages and disadvantages but I´m not sure what to do? which one do you consider the best? is there another way to model this situation?, is there something I haven´t consider? in simple words... WHAT DO YOU THINK??????
    Last edited by rdx23; 09-28-07 at 23:07.

  2. #2
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    A is wrong. Should say:

    room_num INTEGER (PRIMARY KEY)
    building_num INTEGER (PRIMARY KEY)
    floor_num INTEGER (PRIMARY KEY)
    FOREIGN KEY building_num, floor_num ON Floors

    Why do you think there's "a lot of redundancy" in A? If you implement these foreign key constraints, you can tell the DBMS to cascade updates and deletes, so there's no additional work to manage it on your part. Using surrogates as you do in other examples requires extra code to create the surrogates.

    Personally, I've never been a big fan of surrogate keys. But, the fact is that building numbers and room numbers are *already* surrogate keys, so your designs B and D are using surrogates to surrogates. I don't think you can really simplify your SQL much by using an extra layer of surrogates, and the usual performance claims don't apply when everything is a bunch of ints.

    I think that if you went with C you'd wind up doing A anyway, simply because you'll want to store information about buildings and floors at some point. For instance, if you want to store street addresses, you'd want a buildings table to put that in. And if you need to record floor layouts, it only makes sense to stash that in a floors table. So my thinking is design A, with the correction I made above, of course.

  3. #3
    Join Date
    Sep 2007
    Posts
    9
    Hi!, you´re totally right, correct foreign key for table Rooms in design A is like you said:

    room_num INTEGER (PRIMARY KEY)
    building_num INTEGER (PRIMARY KEY)
    floor_num INTEGER (PRIMARY KEY)
    FOREIGN KEY building_num, floor_num ON Floors

    I´m sorry about the mistake.

    About redundancy in design A I meen that you carry each table´s information through all the others, what doesn´t happen with C, furthermore if you have Buildings and Floors tables you need extra space to store the same information contained in Rooms table and also you would need space for indexes too, but as you said, maybe sometime we could need to store information about buildings but my biggest concern is about other tables that could reference Rooms table if it doesn´t have a surrogate key because you would have to propagate 3 columns everywhere you need to reference it.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by rdx23
    ... because you would have to propagate 3 columns everywhere you need to reference it.
    and the problem with this is ... ?

    actually, i think it's a benefit

    if there actually is something that needs to refer to the room (e.g. a desk), it automatically refers to the floor and building as well, and you do not have to join to the floor table and/or the building table in order to find out which ones they are

    last time i checked, disk space was very cheap and excess processing was to be studiously avoided to make the app faster...

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Sep 2007
    Posts
    9
    ok, it´s true.. it´s better to propagate the 3 columns to other tables in terms of performance but will you choose design A or C if you know that you won´t need to store information about Buildings and Floors in the short term? I think that if C is choosen and in the future this need appears it should be easy to modify the database to add it the Buildings and Floors tables.. am I right? and let me ask you something else.. what do you think about alphanumeric primary keys? is there something wrong with them? should they always be numbers?

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by rdx23
    what do you think about alphanumeric primary keys? is there something wrong with them? should they always be numbers?
    i love them, no there isn't, and no they shouldn't always
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Sep 2007
    Posts
    9
    So it seems everybody hates surrogate keys, this is a little bit difficult because it says something different everywhere I look for information about this issue, maybe the whole problem can be summarized in: should you use composite primary keys for entitys?, and I meen keys composed of 4, 5 or even 6 columns instead of surrogate keys? there are two big considerations and I think both are completely valid: Surrogate keys are independent of bussines changes and natural keys (Frequently composite keys) improves performance minimizing the number of tables where you need to look for when you make a query.

  8. #8
    Join Date
    Dec 2004
    Location
    Madison, WI
    Posts
    3,926
    I use a combination of both methods.
    Expert Database Programming
    MSAccess since 1.0, SQL Server since 6.5, Visual Basic (5.0, 6.0)

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Gotta disagree with Rudy here.

    Go with design B. Composite cascading key annoy the hell out of me and become particularly problematic when your schema has four, five, or more levels, or when you have multiple paths such as:
    Code:
       |-->B--|
    A--|      |-->D
       |-->C--|
    You can already derive building from floors, so it is redundant to store building ID in the Rooms table. Unless you are dealing with a terrabyte of data, you'll never notice a difference in processing and it will make administration of the data much easier.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    "derive" (i.e. with an extra join) being the operative word
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Quote Originally Posted by r937
    i love them, no there isn't, and no they shouldn't always
    is it not marginally faster to join on numbers than on characters. numbers being the natural language of computers. do not get me wrong, I dig natural keys (unique constraints, indexing etc..), but on large tables I tend to take a surrogate.
    Last edited by Thrasymachus; 09-30-07 at 14:02.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    yeah, but when you get right down to it, characters are stored as numbers -- 0s and 1s

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    So are blobs. Should we use those as primary keys?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    the issue was whether comparing character strings is inately faster than comparing integers

    i wonder how comparing two blobs of length 4 stacks up to comparing two integers
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by blindman
    You can already derive building from floors, so it is redundant to store building ID in the Rooms table.
    Based on the way we do it where I work, I assumed that the first floor was 1 no matter what the building and that you couldn't derive the building from the floor. I still don't see how surrogate keys make administration easier than ON UPDATE CASCADE, though I admit that I always find myself surprised when it actually works. And I think I've just gotten used to writing SQL for composite keys.

    Quote Originally Posted by thrasymachus
    is it not marginally faster to join on numbers than on characters.
    Most database ops are disk bound, not CPU bound. And most string libraries try to work with an entire machine word at once, anyway.

    I'm surprised no one has thought of the very worst key: floating point numbers. For example, +0. != -0., and even if the column isn't NULLABLE, it can still be NaN.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •