If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Noob question - I keep getting conflicting answers

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-28-07, 15:01
oleat oleat is offline
Registered User
 
Join Date: Nov 2007
Posts: 6
Red face Noob question - I keep getting conflicting answers

Hi, i am pretty new to databases, and have been getting conflicting information about this question.

if i have 2 tables, people, and hobbies, where a person can have multiple hobbies. The person enters their name, and can then enter in any number of hobbies. So, Person 2 enters their name, and hobby, which is "tennis". Should i just add Tennis to the hobbies table, or should i check to see if its there already, and if it is, reference that.

Does the answer depend on whether i need to find people base on a hobby?

I was lead to believe that duplicate data was bad, but surly its going to be a lot quicker if i just add the hobby to the database, and dont have to check if its already there. And it also becomes more complicated should i need to delete the person, and their hobbies.

Thankyou
Reply With Quote
  #2 (permalink)  
Old 11-28-07, 15:10
amthomas amthomas is offline
Registered User
 
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
ask yourself this... Is the following meaningful:

amthomas tennis
amthomas tennis
amthomas tennis
amthomas tennis
amthomas tennis

you should do meaningful things. everything else is just going to be clutter that gets in the way at some point.
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe
Reply With Quote
  #3 (permalink)  
Old 11-28-07, 15:19
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by amthomas
ask yourself this... Is the following meaningful:

amthomas tennis
amthomas tennis
amthomas tennis
amthomas tennis
amthomas tennis
...depends on what you've been smoking, and whether there is a lava lamp in the room.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #4 (permalink)  
Old 11-28-07, 15:27
amthomas amthomas is offline
Registered User
 
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
:P whoa... like...whoa man.
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe
Reply With Quote
  #5 (permalink)  
Old 11-28-07, 16:08
oleat oleat is offline
Registered User
 
Join Date: Nov 2007
Posts: 6
thanks for replies, but....

Thanks for you replies

but what about:

oleat tennis
amthomas tennis
blindman tennis

so your saying avoid duplication, even if it potentially makes things quicker. e.g deleting a person, without having to check if their hobbies are shared with anyone else.
Reply With Quote
  #6 (permalink)  
Old 11-28-07, 16:24
amthomas amthomas is offline
Registered User
 
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
these all look like reasonable entries to me:
oleat tennis
amthomas tennis
blindman tennis

as far as duplication, it only seems to make things quicker on the programming side. it really isn't beneficial.

although it is hard to see in the simple case like this since and update would make changes to all instances, you are breaking data integrity.

amthomas tennis
amthomas tennis
amthomas tennis

could be changed to

amthomas tennis
amthomas tennis
amthomas soccer

although you meant to get rid of tennis and put in soccer instead.

now your db is lying to you and basically useless.

I am not sure what you mean by 'deleting a person....'

You could delete a person without checking hobbies if you wanted to at any time. If you had foreign keys that cascade then you could have it remove all associated entries.
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe
Reply With Quote
  #7 (permalink)  
Old 11-28-07, 17:16
oleat oleat is offline
Registered User
 
Join Date: Nov 2007
Posts: 6
Thanks amThomas very much, i dont think i have made my problem very clear - my fault.

Here is an image of the 2 options as i see it:
http://upload.imgspot.com/u/07/331/13/thing.jpg
(The data is entered on a form by a user, and then uploaded.)

Version 1 has duplicate text data, but its easy and quick to add new hobbies for people. It is also easy and quick to delete a person, and all their hobbies.

Version 2 has no repeating data, but every time someone uploads a new hobby, i have to check to see if its been added previously, if so, i can reference it. Also if i were to try and delete a person, i would have to check all their hobbies to see if they were referenced by any other person, and only then could i safely delete it.

Thats how it appears to me, but i think i may be getting confused. I dont know which is the best solution.
Reply With Quote
  #8 (permalink)  
Old 11-28-07, 22:46
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
in my opinion, this is all you need --
Code:
create table people
( id   integer     not null primary key
, name varchar(99) not null
);

create table personhobbies
( person_id integer     not null
, hobby     varchar(99) not null
, primary key ( person_id, hobby )
, index (hobby, person_id)
);
notice the personhobbies primary key, which is the mechanism which automatically prevents this --

amthomas tennis
amthomas tennis
amthomas tennis
amthomas tennis

the personhobbies table above is actually a relationship table, but the hobbies table doesn't actually exist

the only time you'd go to the 3-table scheme with an actual hobbies table is if you were interested in controlling which hobbies they were allowed to enter

if you want them to enter whatever they want, then you don't need the hobbies table

as for deletions, you would never need to check whether other people had the hobbies of the person you're deleting, in neither the 2-table or 3-table scheme

so, did this answer conflict with anything you've been told previoulsy?

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #9 (permalink)  
Old 11-29-07, 02:50
oleat oleat is offline
Registered User
 
Join Date: Nov 2007
Posts: 6
Thanks for the concise answer r937, it makes sense to me. It does conflict with other recommendations i have had, i was told that all duplication should be removed, hence the 3 table structure, but am beginning to think that is overkill.
Reply With Quote
  #10 (permalink)  
Old 11-29-07, 06:55
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
"all duplication should be removed" is faulty advice anyway

if you replace "tennis" in the relationship table with an integer foreign key to the hobbies table, the integer values will be duplicated exactly as often as the keyword was

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #11 (permalink)  
Old 11-29-07, 11:33
amthomas amthomas is offline
Registered User
 
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
hehe, I guess the word duplication can be vague

sorry.. I didn't realize you were worried about foreign key duplication instead of record duplication.

I had the table given above in mind, but when you stated that it would be a lot quicker I though you were trying to avoid checking code or exception catching code. The table will give you an error if you try to input duplicates that violate the unique constraint. which is good!

Now, I am just stating this as a preference... I usually would go ahead and make the join table since I usually don't just have a single attribute like tennis. And later if you wanted to add more attributes the structure already exists and you just add to the hobbies table. For example: you want to add a description to each hobby. These things do not need to be done for simple projects, and, as r937 pointed out, since the attribute is the only primary key/data that makes up the foreign key then you can use it just fine
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe
Reply With Quote
  #12 (permalink)  
Old 11-29-07, 17:06
oleat oleat is offline
Registered User
 
Join Date: Nov 2007
Posts: 6
Thanks gain for your replies.
r937 - i had never thought of it like that before.
amthomas - You say the three table method allows you to add more attributes, but surly the 2 table design allows for that too?

I think i can finally articulate my confusion in this statement:

What rules govern whether you should check a table for an existing, matching record before insertion and use that, or add a new record each time.
Reply With Quote
  #13 (permalink)  
Old 11-29-07, 18:44
amthomas amthomas is offline
Registered User
 
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
the 'two table design' is not normalized if you add more hobby information. I will give an example:

you might want to have: username (FK), hobby, hobby_description

amthomas tennis "yellow ball racket whacking"
oleat tennis "yellow ball racket whacking"
r937 tennis "yellow ball racket whacking"

Now you have repeating dependencies. A Hobby description is not directly dependent on the primary key, which is (username, hobby).

There is also a side issue of misspelling What if tenNis were entered accidentally once or some other variation?

I only mention these since I believe you may be curious. I do not know your circumstances, but they are probably issues you do not need to worry about.

when you ask:
Quote:
What rules govern whether you should check a table for an existing, matching record before insertion and use that, or add a new record each time.
then I assume you mean adding something like:
athomas tennis
r937 tennis
oleat tennis
and you know that you want (athomas, tennis) to be in that table.

If you don't already know it is in there then you just don't know. You can check but even if it isn't there then it is possible that it gets added between the time you look and try to insert. Your best choice is just to insert and if a unique constraint violation is thrown then you know that it already exists.

The only way to be sure without an exception (that I can think of right now.. others may have more info) would be to lock the whole table (usually you don't want to do this) then check for the entry and if it does not exist insert it. This would all have to be in one transaction.

Please let me know if that isn't what you are asking here.. I just got out of a meeting and I am a little fuzzy heh.

[EDIT] oops.. had two table designs in my head at once :P marked hobby as a FK by accident.
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe
Reply With Quote
  #14 (permalink)  
Old 11-29-07, 18:58
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by amthomas
marked hobby as a FK by accident.
well, it is, sort of -- a virtual FK to a PK that doen't exist because in the 2-table model the hobbies table isn't there

nice post, amthomas -- nice explanations
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #15 (permalink)  
Old 11-30-07, 03:15
oleat oleat is offline
Registered User
 
Join Date: Nov 2007
Posts: 6
Thanks very much guy's, i think ive got it, thanks for being so patient with me, i know it must be frustrating for you. This is the line that sorted it for me, i think:

If you don't already know it is in there then you just don't know. You can check but even if it isn't there then it is possible that it gets added between the time you look and try to insert. Your best choice is just to insert and if a unique constraint violation is thrown then you know that it already exists.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On