Page 1 of 4 123 ... LastLast
Results 1 to 15 of 52
  1. #1
    Join Date
    Feb 2009
    Posts
    5

    "id" attribute in db tables...

    Hi everyone,

    I have a few tables in my database. For example users, packages, etc. And each table has a “name” attribute, so the “users” table has “user_name”, “packages” table has “package_name”, etc. is it a good idea to also have an “id” field for each table, to avoid data duplication when associating different tables?

    Thanks,

    Tom

  2. #2
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    It is a good thing for every table to have at least one Candidate Key. The relational model requires candidate keys whereas in SQL they are optional (but still extremely important).

    The term "ID field" is somewhat overloaded so it's hard to answer your question directly. I expect you are talking about surrogate keys. From a conceptual modelling perspective it's most important that every table has at least one natural key. If it doesn't then you should rethink your model. Whether you also add a surrogate key to your table is of secondary importance. There may be good reasons for doing so but unless you have identified such a reason then in my opinion it's better to leave it out. Reasonable people differ on that point however.

    If I haven't answered your question then please clarify what you mean by "Id" in this instance.

  3. #3
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas
    Reasonable people differ on that point however.
    in this case i agree with you
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  4. #4
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    I've even heard of unreasonable people differing! Perish the thought.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by qmqmqm
    is it a good idea to also have an “id” field for each table, to avoid data duplication when associating different tables?
    To paraphrase from a book I recently started reading, "adding an id field does not remove data duplication". You just "duplicate" numbers instead of text.

    The "id" column is known as a surrogate key. It is a surrogate (i.e. used in place of) for the natural key (the names). You could try googling surrogate and natural keys if you want more info.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    you started at the back?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by pootle flump
    To paraphrase from a book I recently started reading, "adding an id field does not remove data duplication". You just "duplicate" numbers instead of text.
    How does adding an ID field duplicate numbers?
    Just goes to show you can't believe everything you read, I guess.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by blindman
    How does adding an ID field duplicate numbers?
    okay, here's an example

    suppose you have a table of content entries for a content management system, where each content entry has its own PK, and let's assume this is an integer

    id title
    423 What If I Get Sick and Die?
    524 Uncle Karl and the Gasoline
    537 Be Nice to Everybody
    573 Hello Statue
    598 The Size of our Galaxy
    605 Windows Media Center Rocks

    and now suppose each content entry could have multiple keywords

    there would be a separate table, entrykeywords, that would look something like this --

    entry_id keyword
    524 family
    524 reckless
    537 family
    537 my three rules
    598 astronomy
    605 television
    605 windows

    many developers notice that the keywords are being repeated here -- obviously, it's a many-to-many relationship, as the same keyword can be assigned to multiple entries, while the same entry can have multiple keywords

    so they decide that they will "eliminate" the "redundancy" by storing the keywords in their own table, and using a numeric id to relate the entries to the keywords

    so then the keywords table would look like this --

    id keyword
    6 windows
    7 television
    9 family
    11 reckless
    13 astronomy
    16 my three rules

    and then the entrykeywords table would look like this --

    entry_id keyword_id
    524 9
    524 11
    537 9
    537 16
    598 13
    605 7
    605 6

    and and now we've reached the point where that statement applies -- many database developers think that they are "eliminating" the "redundancy" of having the actual keywords repeated in the entrykeywords table, whereas in fact the same amount of redundancy is still there, except now they are integer values that are being repeated

    further, note that with the surrogate key, we are forced to join to the keywords table to determine the keyword, an unnecessary extra step as compared with using the natural key, the keyword itself, in the entrykeywords table
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by blindman
    How does adding an ID field duplicate numbers?
    It "duplicates" numbers in exactly the same way that without using IDs "duplicates" names. Note the quotes around the word "duplicates". I was using the OP's definition of duplication, and making it clear with my quote marks that I don't respect that definition.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  10. #10
    Join Date
    Feb 2009
    Posts
    5
    Quote Originally Posted by pootle flump
    It "duplicates" numbers in exactly the same way that without using IDs "duplicates" names. Note the quotes around the word "duplicates". I was using the OP's definition of duplication, and making it clear with my quote marks that I don't respect that definition.
    Thank you very much guys. I read and studied all your replies. Yes indeed the id field is the surrogate key. For the "users" table that I talked about, "user name" field can serve as the primary key. However, in the "packages" table, the package name and a comment field are needed to uniquely identify a record. So in this table I guess it would be more efficient to introduce an "id" surrogate key?

    Thanks,

    Tom

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    more efficient? no, not really

    the deciding factor might be based on two things: how many tables are related to the users table via a foreign key? do you allow users to change their usernames?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by r937
    and and now we've reached the point where that statement applies -- many database developers think that they are "eliminating" the "redundancy" of having the actual keywords repeated in the entrykeywords table, whereas in fact the same amount of redundancy is still there, except now they are integer values that are being repeated
    True. However, N keywords in the intersection table will take up lots more space than N IDs. Especially when you include the index
    Quote Originally Posted by r937
    further, note that with the surrogate key, we are forced to join to the keywords table to determine the keyword, an unnecessary extra step as compared with using the natural key, the keyword itself, in the entrykeywords table
    It's a trade-off. Space versus time.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    hey, these are not your dad's $200,000 98K disk platters -- today's storage is, what, a couple pennies per gigabyte?

    space should be the ~last~ consideration in database design

    simplicity ftw, eh
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  14. #14
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    And yet, depending upon the situation, space can be a constraint. And, while sata drive prices had plummeted, by the time you add raid controllers and redundant hardware, extra storage is still not free.

    Likewise, joins can fairly efficient, especially with integer type indexes... These aren't your dad's 10 mHz XT processors, after all...

    I will grant you - storage bang for the buck has increased faster than processing bang for the buck recently.
    Last edited by loquin; 03-02-09 at 17:10.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  15. #15
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by loquin
    These aren't your dad's 10 mHz XT processors, after all...
    .
    in other words, they're faster... which means that "premature optimizations" like using integers instead of natural keys is less important today

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •