Results 1 to 15 of 15
  1. #1
    Join Date
    Nov 2007
    Posts
    6

    Red face Noob question - I keep getting conflicting answers

    Hi, i am pretty new to databases, and have been getting conflicting information about this question.

    if i have 2 tables, people, and hobbies, where a person can have multiple hobbies. The person enters their name, and can then enter in any number of hobbies. So, Person 2 enters their name, and hobby, which is "tennis". Should i just add Tennis to the hobbies table, or should i check to see if its there already, and if it is, reference that.

    Does the answer depend on whether i need to find people base on a hobby?

    I was lead to believe that duplicate data was bad, but surly its going to be a lot quicker if i just add the hobby to the database, and dont have to check if its already there. And it also becomes more complicated should i need to delete the person, and their hobbies.

    Thankyou

  2. #2
    Join Date
    May 2005
    Location
    San Antonio, Texas
    Posts
    134
    ask yourself this... Is the following meaningful:

    amthomas tennis
    amthomas tennis
    amthomas tennis
    amthomas tennis
    amthomas tennis

    you should do meaningful things. everything else is just going to be clutter that gets in the way at some point.
    Vi veri veniversum vivus vici
    By the power of truth, I, a living man, have conquered the universe

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by amthomas
    ask yourself this... Is the following meaningful:

    amthomas tennis
    amthomas tennis
    amthomas tennis
    amthomas tennis
    amthomas tennis
    ...depends on what you've been smoking, and whether there is a lava lamp in the room.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    May 2005
    Location
    San Antonio, Texas
    Posts
    134
    :P whoa... like...whoa man.
    Vi veri veniversum vivus vici
    By the power of truth, I, a living man, have conquered the universe

  5. #5
    Join Date
    Nov 2007
    Posts
    6

    thanks for replies, but....

    Thanks for you replies

    but what about:

    oleat tennis
    amthomas tennis
    blindman tennis

    so your saying avoid duplication, even if it potentially makes things quicker. e.g deleting a person, without having to check if their hobbies are shared with anyone else.

  6. #6
    Join Date
    May 2005
    Location
    San Antonio, Texas
    Posts
    134
    these all look like reasonable entries to me:
    oleat tennis
    amthomas tennis
    blindman tennis

    as far as duplication, it only seems to make things quicker on the programming side. it really isn't beneficial.

    although it is hard to see in the simple case like this since and update would make changes to all instances, you are breaking data integrity.

    amthomas tennis
    amthomas tennis
    amthomas tennis

    could be changed to

    amthomas tennis
    amthomas tennis
    amthomas soccer

    although you meant to get rid of tennis and put in soccer instead.

    now your db is lying to you and basically useless.

    I am not sure what you mean by 'deleting a person....'

    You could delete a person without checking hobbies if you wanted to at any time. If you had foreign keys that cascade then you could have it remove all associated entries.
    Vi veri veniversum vivus vici
    By the power of truth, I, a living man, have conquered the universe

  7. #7
    Join Date
    Nov 2007
    Posts
    6
    Thanks amThomas very much, i dont think i have made my problem very clear - my fault.

    Here is an image of the 2 options as i see it:
    http://upload.imgspot.com/u/07/331/13/thing.jpg
    (The data is entered on a form by a user, and then uploaded.)

    Version 1 has duplicate text data, but its easy and quick to add new hobbies for people. It is also easy and quick to delete a person, and all their hobbies.

    Version 2 has no repeating data, but every time someone uploads a new hobby, i have to check to see if its been added previously, if so, i can reference it. Also if i were to try and delete a person, i would have to check all their hobbies to see if they were referenced by any other person, and only then could i safely delete it.

    Thats how it appears to me, but i think i may be getting confused. I dont know which is the best solution.

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    in my opinion, this is all you need --
    Code:
    create table people
    ( id   integer     not null primary key
    , name varchar(99) not null
    );
    
    create table personhobbies
    ( person_id integer     not null
    , hobby     varchar(99) not null
    , primary key ( person_id, hobby )
    , index (hobby, person_id)
    );
    notice the personhobbies primary key, which is the mechanism which automatically prevents this --

    amthomas tennis
    amthomas tennis
    amthomas tennis
    amthomas tennis

    the personhobbies table above is actually a relationship table, but the hobbies table doesn't actually exist

    the only time you'd go to the 3-table scheme with an actual hobbies table is if you were interested in controlling which hobbies they were allowed to enter

    if you want them to enter whatever they want, then you don't need the hobbies table

    as for deletions, you would never need to check whether other people had the hobbies of the person you're deleting, in neither the 2-table or 3-table scheme

    so, did this answer conflict with anything you've been told previoulsy?

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Nov 2007
    Posts
    6
    Thanks for the concise answer r937, it makes sense to me. It does conflict with other recommendations i have had, i was told that all duplication should be removed, hence the 3 table structure, but am beginning to think that is overkill.

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    "all duplication should be removed" is faulty advice anyway

    if you replace "tennis" in the relationship table with an integer foreign key to the hobbies table, the integer values will be duplicated exactly as often as the keyword was

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    May 2005
    Location
    San Antonio, Texas
    Posts
    134
    hehe, I guess the word duplication can be vague

    sorry.. I didn't realize you were worried about foreign key duplication instead of record duplication.

    I had the table given above in mind, but when you stated that it would be a lot quicker I though you were trying to avoid checking code or exception catching code. The table will give you an error if you try to input duplicates that violate the unique constraint. which is good!

    Now, I am just stating this as a preference... I usually would go ahead and make the join table since I usually don't just have a single attribute like tennis. And later if you wanted to add more attributes the structure already exists and you just add to the hobbies table. For example: you want to add a description to each hobby. These things do not need to be done for simple projects, and, as r937 pointed out, since the attribute is the only primary key/data that makes up the foreign key then you can use it just fine
    Vi veri veniversum vivus vici
    By the power of truth, I, a living man, have conquered the universe

  12. #12
    Join Date
    Nov 2007
    Posts
    6
    Thanks gain for your replies.
    r937 - i had never thought of it like that before.
    amthomas - You say the three table method allows you to add more attributes, but surly the 2 table design allows for that too?

    I think i can finally articulate my confusion in this statement:

    What rules govern whether you should check a table for an existing, matching record before insertion and use that, or add a new record each time.

  13. #13
    Join Date
    May 2005
    Location
    San Antonio, Texas
    Posts
    134
    the 'two table design' is not normalized if you add more hobby information. I will give an example:

    you might want to have: username (FK), hobby, hobby_description

    amthomas tennis "yellow ball racket whacking"
    oleat tennis "yellow ball racket whacking"
    r937 tennis "yellow ball racket whacking"

    Now you have repeating dependencies. A Hobby description is not directly dependent on the primary key, which is (username, hobby).

    There is also a side issue of misspelling What if tenNis were entered accidentally once or some other variation?

    I only mention these since I believe you may be curious. I do not know your circumstances, but they are probably issues you do not need to worry about.

    when you ask:
    What rules govern whether you should check a table for an existing, matching record before insertion and use that, or add a new record each time.
    then I assume you mean adding something like:
    athomas tennis
    r937 tennis
    oleat tennis
    and you know that you want (athomas, tennis) to be in that table.

    If you don't already know it is in there then you just don't know. You can check but even if it isn't there then it is possible that it gets added between the time you look and try to insert. Your best choice is just to insert and if a unique constraint violation is thrown then you know that it already exists.

    The only way to be sure without an exception (that I can think of right now.. others may have more info) would be to lock the whole table (usually you don't want to do this) then check for the entry and if it does not exist insert it. This would all have to be in one transaction.

    Please let me know if that isn't what you are asking here.. I just got out of a meeting and I am a little fuzzy heh.

    [EDIT] oops.. had two table designs in my head at once :P marked hobby as a FK by accident.
    Vi veri veniversum vivus vici
    By the power of truth, I, a living man, have conquered the universe

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by amthomas
    marked hobby as a FK by accident.
    well, it is, sort of -- a virtual FK to a PK that doen't exist because in the 2-table model the hobbies table isn't there

    nice post, amthomas -- nice explanations
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Nov 2007
    Posts
    6
    Thanks very much guy's, i think ive got it, thanks for being so patient with me, i know it must be frustrating for you. This is the line that sorted it for me, i think:

    If you don't already know it is in there then you just don't know. You can check but even if it isn't there then it is possible that it gets added between the time you look and try to insert. Your best choice is just to insert and if a unique constraint violation is thrown then you know that it already exists.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •