Results 1 to 15 of 15
  1. #1
    Join Date
    Nov 2009
    Posts
    5

    Question Seeking advice on normalisation

    Hi All,

    This is my first post here, and I'd be grateful for any advice you can give me.

    Basically, I have a university assignment that requires me to produce an ER diagram to model the informational requirements of some data. I've had no problem doing so until the end where I find I have two seperate entities with fields that hold similar (in some cases identical) information.

    e.g.

    Footballer| FootballerId, FootballerName, FootballerHometown

    Agent| AgentId, AgentName, AgentHometown

    My own inclination is to leave it well enough alone, but a friend insists that I should create a new entity (e.g. Hometown) and make it's primary key a foreign key of the above two entities:

    Hometown| HometownId, Hometown

    Can anyone tell me which is correct (or more correct)? Is there any justification for creating another entity (or for not creating it)?

    Sorry that this is such an elementary enquiry - but any advice will be gratefully received

    CAL.
    Last edited by CALGACUS; 11-12-09 at 16:08. Reason: Correcting spelling - bah!

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    If...
    ...you need to store any other information related to HomeTowns
    ...or you want to restrict the values that can be entered into the HomeTown column
    ...or you want to provide a list of towns from which your user can select
    ...or you will be joining the two tables on the HomeTown columns
    then you should consider a separate table of towns.
    Otherwise I would consider it like an address field and not split it off to another table.
    Your friend is not necessarily wrong with his advice, but deciding how far to normalize is a mixture of art and science, and in this case I think his approach is overkill. It really depends on how much your application will rely upon the consistency of the values in the HomeTown columns.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    Creating a Hometown table is more correct. For justification you will have to read the normalization rules.

    Given that it is at least possible (but unlikely) that an agent could also be a footballer I personally would create this as:

    Hometown: HometownID, Name
    Person: PersonID, Name, HometownID
    Footballer: PersonID
    Agent: PersonID

    Note: This isn't the first time I have disagreed with blindman. There are many ways to do things.
    Last edited by MarkATrombley; 11-12-09 at 16:45. Reason: Added note.

  4. #4
    Join Date
    Nov 2009
    Posts
    5

    Thumbs up

    Thanks blindman, that was pretty much my own thinking.

    - the only information held about HomeTown is its name
    - HomeTown could be anywhere (although British ones will predominate)
    - there won't be a list to chose from (as might be logical if the attribute was, for example, left or right handed)
    - the entities won't be joined on the HomeTown column

    I concede that my mate is right too (or at least not wrong) , but our lecturers have stressed that the ER diagrams that we all come up with will depend a lot upon assumptions that we make and how we justify them.

    I was worried that I'd completely lost the plot and was missing something.

    Thanks for the reassurance - and the blazingly fast reply,

    Cheers

    CAL.

  5. #5
    Join Date
    Nov 2009
    Posts
    5
    Thanks MarkATrombley,

    I can understand your reservations, presumably regarding having possible anomalies crop-up should an agent also be a footballer.

    In the relations that you posted - can I ask why you would have a Hometown entity, since the HometownName attribute is fully dependent upon the PersonId, couldn't you just include it in the Person entity?

    Thanks again.

  6. #6
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    You could put the Hometown name directly into the Person table. But then you will end up with 27 different spellings of the town Bingham. That might not matter in a classroom example, but in the real world sooner or later you will be asked to produce a report of all the footballers who live in Bingham and if it is spelled 27 different ways you won't be able to do it.

    Many here say you shouldn't worry about requirements that you haven't received yet, but I disagree. Experience has taught me that if you don't plan for the future requirements you will cause yourself a lot of future work.

    Another reason I would do it is to reduce the duplication of data. Seeing the town name repeated over and over makes me think of all the space that is being wasted.

  7. #7
    Join Date
    Nov 2009
    Posts
    5

    Thumbs up

    I understand now Mark, thanks for explaining that to me.

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    The problem with Trombley's thinking is illustrated reductio-ad-absurdem.
    If you decide to start storing Streets and house numbers, would you then create StreetName and HouseNumber tables as well?
    Perhaps yes if your application was primarily oriented around mailings and routes, but I'm guess from your descriptions that it is not.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    You've never been asked to figure out how to do a direct mailing to your customers that live in the more expensive neighborhoods? I have, and I was stuck doing it with a customer table that had the address embedded in three string columns (with city, state, and zip stuck into one column!).

    I have designed databases with Street and Block tables, and sure enough those tables came in real handy when some odd requests started coming down from management.

    You and I will never agree. Fortunately we don't have to. The OP can take or leave what advice he likes.

  10. #10
    Join Date
    Nov 2009
    Posts
    5
    The OP is grateful to you both for taking the time to answer

    Actually, both answers were thoroughly useful. The assignment initially concerns modelling historical data that isn't going to be subject to any change - no chance of anomalies occuring.

    It then goes on to ask how the model could be ammended to provide better integrity, flexibility, etc. should a requirement arise that further data be added.

    Thanks to you both I am far more confident that I'm on the right track.

    Thanks again

  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by MarkATrombley View Post
    I have designed databases with Street and Block tables, and sure enough those tables came in real handy when some odd requests started coming down from management.
    Just curious - does that mean these tables would be pre-populated so people could only enter valid data? is such data available and what happens when new roads get built? or would the user still enter such data as strings via a form and you'd store what was entered but in a more normalised format? in which case how would this improve a simple string based design.

  12. #12
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    In the last system I designed that cared about streets and blocks (it has been a while) the city, state, and zip code information was purchased. I am pretty sure we bought it from the U.S. post office but it may have come from the census people. Street names were initially pre-populated by gathering information from existing records, but once the system was up and running the users were conditionally allowed to add new street names. These were flagged as needing verification and there was a person who verified the spelling and any North/South/etc. information. Block information was only kept for special cases and was only entered by a special admin. It was primarily used to tell if a range of addresses for a given street were primarily for business, residential, rural, etc.

    I haven't done any retail or financial systems in quite a while so I haven't worked with street information lately. I am sure you can buy a lot more of this information now than you could back when I did it. And of course now we have GPS locations and GIS databases. Still, I am sure some would be pre-populated and other information would be conditionally added by the users. Except now verification would by done by using Mapquest or GoogleEarth A lot better than what they had to go through in the old days.

  13. #13
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    what are the odds that many people share the same personal address?

    You could have a separate address table with an id that is reference in the person_address table...but that's ludicrous

    Hometown: HometownID, Name
    Person: PersonID, Name, HometownID
    Footballer: PersonID
    Agent: PersonID
    We're not in Kansas anymore...we are in the land of surrogates
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  14. #14
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    I happen to like surrogate keys.

    In the case of the original question it was Hometown and not address, and the odds of many people living in the same town is pretty good But there are *lots* of good business reasons to keep address separate from person. A retailer sending catalogs to its customers would save money if they only sent one to each house instead of one to everybody in the house. A parts supplier would certainly only want to store the address for Ford's headquarters once instead of keeping it stored with every contact they had within Ford. And if you keep address separate from person you have the future option of keeping track of a persons summer home, winter home, work address, and the P.O. box they want their paystub mailed to.

    Again, all of this is how I would do things. I base my designs on my experiences, others have had different experiences.

  15. #15
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    I suppose it depends on whether he is designing a football database that also stores addresses, or an address database that also stores footballer data.
    Analysis-paralysis is a slippery slope.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •