If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Seeking advice on normalisation

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-12-09, 15:05
CALGACUS CALGACUS is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Question Seeking advice on normalisation

Hi All,

This is my first post here, and I'd be grateful for any advice you can give me.

Basically, I have a university assignment that requires me to produce an ER diagram to model the informational requirements of some data. I've had no problem doing so until the end where I find I have two seperate entities with fields that hold similar (in some cases identical) information.

e.g.

Footballer| FootballerId, FootballerName, FootballerHometown

Agent| AgentId, AgentName, AgentHometown

My own inclination is to leave it well enough alone, but a friend insists that I should create a new entity (e.g. Hometown) and make it's primary key a foreign key of the above two entities:

Hometown| HometownId, Hometown

Can anyone tell me which is correct (or more correct)? Is there any justification for creating another entity (or for not creating it)?

Sorry that this is such an elementary enquiry - but any advice will be gratefully received

CAL.

Last edited by CALGACUS; 11-12-09 at 15:08. Reason: Correcting spelling - bah!
Reply With Quote
  #2 (permalink)  
Old 11-12-09, 15:39
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
If...
...you need to store any other information related to HomeTowns
...or you want to restrict the values that can be entered into the HomeTown column
...or you want to provide a list of towns from which your user can select
...or you will be joining the two tables on the HomeTown columns
then you should consider a separate table of towns.
Otherwise I would consider it like an address field and not split it off to another table.
Your friend is not necessarily wrong with his advice, but deciding how far to normalize is a mixture of art and science, and in this case I think his approach is overkill. It really depends on how much your application will rely upon the consistency of the values in the HomeTown columns.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #3 (permalink)  
Old 11-12-09, 15:44
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
Creating a Hometown table is more correct. For justification you will have to read the normalization rules.

Given that it is at least possible (but unlikely) that an agent could also be a footballer I personally would create this as:

Hometown: HometownID, Name
Person: PersonID, Name, HometownID
Footballer: PersonID
Agent: PersonID

Note: This isn't the first time I have disagreed with blindman. There are many ways to do things.

Last edited by MarkATrombley; 11-12-09 at 15:45. Reason: Added note.
Reply With Quote
  #4 (permalink)  
Old 11-12-09, 15:51
CALGACUS CALGACUS is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Thumbs up

Thanks blindman, that was pretty much my own thinking.

- the only information held about HomeTown is its name
- HomeTown could be anywhere (although British ones will predominate)
- there won't be a list to chose from (as might be logical if the attribute was, for example, left or right handed)
- the entities won't be joined on the HomeTown column

I concede that my mate is right too (or at least not wrong) , but our lecturers have stressed that the ER diagrams that we all come up with will depend a lot upon assumptions that we make and how we justify them.

I was worried that I'd completely lost the plot and was missing something.

Thanks for the reassurance - and the blazingly fast reply,

Cheers

CAL.
Reply With Quote
  #5 (permalink)  
Old 11-12-09, 16:05
CALGACUS CALGACUS is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Thanks MarkATrombley,

I can understand your reservations, presumably regarding having possible anomalies crop-up should an agent also be a footballer.

In the relations that you posted - can I ask why you would have a Hometown entity, since the HometownName attribute is fully dependent upon the PersonId, couldn't you just include it in the Person entity?

Thanks again.
Reply With Quote
  #6 (permalink)  
Old 11-12-09, 16:35
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
You could put the Hometown name directly into the Person table. But then you will end up with 27 different spellings of the town Bingham. That might not matter in a classroom example, but in the real world sooner or later you will be asked to produce a report of all the footballers who live in Bingham and if it is spelled 27 different ways you won't be able to do it.

Many here say you shouldn't worry about requirements that you haven't received yet, but I disagree. Experience has taught me that if you don't plan for the future requirements you will cause yourself a lot of future work.

Another reason I would do it is to reduce the duplication of data. Seeing the town name repeated over and over makes me think of all the space that is being wasted.
Reply With Quote
  #7 (permalink)  
Old 11-12-09, 16:54
CALGACUS CALGACUS is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Thumbs up

I understand now Mark, thanks for explaining that to me.
Reply With Quote
  #8 (permalink)  
Old 11-12-09, 17:51
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
The problem with Trombley's thinking is illustrated reductio-ad-absurdem.
If you decide to start storing Streets and house numbers, would you then create StreetName and HouseNumber tables as well?
Perhaps yes if your application was primarily oriented around mailings and routes, but I'm guess from your descriptions that it is not.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #9 (permalink)  
Old 11-12-09, 18:11
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
You've never been asked to figure out how to do a direct mailing to your customers that live in the more expensive neighborhoods? I have, and I was stuck doing it with a customer table that had the address embedded in three string columns (with city, state, and zip stuck into one column!).

I have designed databases with Street and Block tables, and sure enough those tables came in real handy when some odd requests started coming down from management.

You and I will never agree. Fortunately we don't have to. The OP can take or leave what advice he likes.
Reply With Quote
  #10 (permalink)  
Old 11-12-09, 18:29
CALGACUS CALGACUS is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
The OP is grateful to you both for taking the time to answer

Actually, both answers were thoroughly useful. The assignment initially concerns modelling historical data that isn't going to be subject to any change - no chance of anomalies occuring.

It then goes on to ask how the model could be ammended to provide better integrity, flexibility, etc. should a requirement arise that further data be added.

Thanks to you both I am far more confident that I'm on the right track.

Thanks again
Reply With Quote
  #11 (permalink)  
Old 11-12-09, 19:51
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by MarkATrombley View Post
I have designed databases with Street and Block tables, and sure enough those tables came in real handy when some odd requests started coming down from management.
Just curious - does that mean these tables would be pre-populated so people could only enter valid data? is such data available and what happens when new roads get built? or would the user still enter such data as strings via a form and you'd store what was entered but in a more normalised format? in which case how would this improve a simple string based design.
Reply With Quote
  #12 (permalink)  
Old 11-12-09, 20:30
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
In the last system I designed that cared about streets and blocks (it has been a while) the city, state, and zip code information was purchased. I am pretty sure we bought it from the U.S. post office but it may have come from the census people. Street names were initially pre-populated by gathering information from existing records, but once the system was up and running the users were conditionally allowed to add new street names. These were flagged as needing verification and there was a person who verified the spelling and any North/South/etc. information. Block information was only kept for special cases and was only entered by a special admin. It was primarily used to tell if a range of addresses for a given street were primarily for business, residential, rural, etc.

I haven't done any retail or financial systems in quite a while so I haven't worked with street information lately. I am sure you can buy a lot more of this information now than you could back when I did it. And of course now we have GPS locations and GIS databases. Still, I am sure some would be pre-populated and other information would be conditionally added by the users. Except now verification would by done by using Mapquest or GoogleEarth A lot better than what they had to go through in the old days.
Reply With Quote
  #13 (permalink)  
Old 11-12-09, 21:47
Brett Kaiser Brett Kaiser is offline
Window Washer
 
Join Date: Nov 2002
Location: Jersey
Posts: 10,303
what are the odds that many people share the same personal address?

You could have a separate address table with an id that is reference in the person_address table...but that's ludicrous

Quote:
Hometown: HometownID, Name
Person: PersonID, Name, HometownID
Footballer: PersonID
Agent: PersonID
We're not in Kansas anymore...we are in the land of surrogates
__________________
Brett
8-)

It's a Great Day for America everybody!

dbforums Yak CorralRadio 'Rita
dbForums Member List
I'm Good Once as I ever was

The physical order of data in a database has no meaning.
Reply With Quote
  #14 (permalink)  
Old 11-12-09, 22:44
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
I happen to like surrogate keys.

In the case of the original question it was Hometown and not address, and the odds of many people living in the same town is pretty good But there are *lots* of good business reasons to keep address separate from person. A retailer sending catalogs to its customers would save money if they only sent one to each house instead of one to everybody in the house. A parts supplier would certainly only want to store the address for Ford's headquarters once instead of keeping it stored with every contact they had within Ford. And if you keep address separate from person you have the future option of keeping track of a persons summer home, winter home, work address, and the P.O. box they want their paystub mailed to.

Again, all of this is how I would do things. I base my designs on my experiences, others have had different experiences.
Reply With Quote
  #15 (permalink)  
Old 11-12-09, 23:01
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
I suppose it depends on whether he is designing a football database that also stores addresses, or an address database that also stores footballer data.
Analysis-paralysis is a slippery slope.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On