Results 1 to 11 of 11
  1. #1
    Join Date
    Jun 2009
    Posts
    6

    Index ID Attribute vs. Primary Keys

    Hi,

    I am very new to database and data modeling so bare with me. The realm of my question is about database design and ERD for the sake of the lingo. I have a set of identical attributes persisting in multiple entities/objects. For instance,

    University (country, state, city, name, etc.)
    Company (country, state, city, name, description, etc.)
    Institution (country, state, city, name, field, etc.)


    As it appears, (country, state, city) are repeated in each of these entities (i.e. University, Company, etc.), therefore, for some, there would be a duplicate in the tables (i.e. multiple companies, universities, etc. may be located in the same location). So I created a new entity, Location, which encapsulates these attributes (country, state, city).

    Location, for obvious reason, has (country, city) as the primary keys -- I could also augment state to the PK list but it would be irrelevant to my question. Now, in order to use each of Location instances in other child tables (i.e. University, Company, etc.), either I have to reference country-city-(and/or state), which defeats the whole purpose of avoiding spurious identical set of data in multiple tables/entities (as a matter of fact, in this case, I just managed to create an extra table in addition to the duplicates that had gone to any other table requiring to have a (country, state, city) attributes), or I could create an "id" for each of the Location's to uniquely identify the set -- we call it "locationid".

    Obviously locationid can uniquely specifies each instance but we also have to have (country, city) as primary keys because otherwise, we can insert same set of (country, city) but with different locationid number. For example, if (country, city) are not PK, we can have:

    locationid=0 , country=US , city=New York
    locationid=1 , country=FR , city=Paris
    locationid=2 , country=RU , city=Moscow
    locationid=3 , country=FR , city=Paris


    Since (country, city) are not PK but locationid is PK, then one can insert the same set of data but have it identified with different locationid's. But if (country, city) are to be PK, therefore unique which prevents duplication of country-city value in Location, then for any reference to the Location instance in University or Company table, we must include (country, city) along with locationid (or must I?) -- that, once again, causes for basically all the attributes in the Location object to be duplicated in other referencing entities (i.e. University, Institution, etc.).


    My question is, can I have the locationid uniquely identifies each row in Location so I can only use that attribute as foreign key in other child tables (i.e. University, Company, etc.) but preserve the uniqueness of (country, city) attributes at the same time to avoid duplications in the Location table? Or if I need to maintain a uniqueness of (country, city), I have no other choice to mark them as PK and use them in referencing tables along with locationid, in which case, I guess I won't be needing locationid?

    Sincerely,

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Inlish
    I am very new to database and data modeling so bare with me.
    bear with me = have patience

    bare with me = let's get undressed together



    if you were modelling people instead of universities, companies, and institutions, and you happened to notice that several people shared the same first name, e.g. john, mary, etc., would you set up a separate first names table?

    no, eh

    locations should have their own separate table only if you were interested in the locations in and of themselves, as locations, regardless of whether there were any universities, companies, or institutions located there

    remove the locations entity, and go back and start the modelling over again

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jun 2009
    Posts
    6
    Thank you for your reply and sorry about the typo. It must have somehow found its way through my fingers when my brain misfired -- it happens.

    Quote Originally Posted by r937
    if you were modelling people instead of universities, companies, and institutions, and you happened to notice that several people shared the same first name, e.g. john, mary, etc., would you set up a separate first names table?
    Well, in case of the firstname, we are simply talking about one attribute which by itself certainly does not represent the PK for the object. However, in case of the Location, several attributes are perhaps shared among multiple entities. For instance, there could be thousands of businesses, companies, universities, or institutions located in the same city. If each was to use the same set of attributes (country, city, state, etc.), wouldn't this call for a separate object to avoid duplication? I apology if my concern stems from naivety but I presume, joining multiple tables for such scenario is deemed more costly than having these locations as part of entities, even though there could be multiple objects that share the same set of attributes (i.e. University, Institution, Company, Business, etc.).

    Quote Originally Posted by r937
    locations should have their own separate table only if you were interested in the locations in and of themselves, as locations, regardless of whether there were any universities, companies, or institutions located there
    So, as far as proper data modeling, I should never spawn a new entity for the sake of eliminating spurious "set" of duplications in multiple tables? And just to play a devil's advocate, assuming there was a situation where Location entity had locationid, country, city, as primary keys, would I have to use all of them as FK in child tables or would merely locationid be suffice? Or all the PK must be referenced in the child table? Also, would it be sane to rather create a relation table between say, Location and each of the other objects having locations:

    University (name, description, etc.)
    University_Location (locationid, name)
    Location (locationid, country, city, state, etc.)

    as opposed to the initial scenario, which you also reprimanded:

    University (name, locationid, description, etc.)
    Location (locationid, country, city, state, etc.)

    Or the former ER is bound to be redundant in its nature? I suppose, at this point, I am more interested to know how the multiple PK's are referenced in different tables. If there is a unique index for each instance as a part of the PK, would that very PK attribute be suffice to be referenced else where?




    Sincerely,

  4. #4
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    In your example, what would be the key(s) of the University, Company and Institution tables? It matters because it may be that you didn't need three tables for this information in the first place. (As an aside, it never ceases to amaze me that people post data modelling questions without identifying the keys in their relational models. But don't feel bad - you aren't the first and won't be the last )

    There are other possible answers, depending on what methodologies and design principles you use. "E-R modeling" alone is little more than a diagramming convention and doesn't really give you the tools to deal with such issues.

    My advice to you is that you study and apply some of the well proven data modeling techniques. I don't think it will help you to discuss one isolated example in an online forum.

    One very good book is Terry Halpin's "Information Modeling and Relational Databases", which takes a fact-based approach to data modelling. Another principle directly relevant to your question is Orthogonal Design (see the book "Date on Database" by Chris Date).

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by r937
    bear with me = have patience
    bear with me! = help! I fell over the railing at the zoo!
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Inlish
    Well, in case of the firstname, we are simply talking about one attribute which by itself certainly does not represent the PK for the object.
    this is totally not true

    a firstnames table would certainly have to have the firstname as one of the columns, agreed?

    therefore, firstname would make the most appropriate PK for a firstnames table

    oh, yeah, you could assign an auto_increment/identity/serial number as the PK, but it would be a surrogate key, and an appropriate natural key ~always~ trumps a surrogate key except in some rare circumstances

    my point about not abstracting the locations out of your tables still stands -- if you are not interested in the locations in and of themselves regardless of whether a univerity, company, or whatever is located there, then don't split out locations as a separate entity

    you don't ~have~ to set up entities for every possible attribute, just the ones that you need in order to capture the essential logical relationships in the data
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Quote Originally Posted by r937
    therefore, firstname would make the most appropriate PK for a firstnames table

    oh, yeah, you could assign an auto_increment/identity/serial number as the PK, but it would be a surrogate key, and an appropriate natural key ~always~ trumps a surrogate key except in some rare circumstances
    As you know, I and others always disagree with you on this, Rudy except in some rare circumstances
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Jun 2009
    Posts
    6
    Quote Originally Posted by r937
    this is totally not true

    a firstnames table would certainly have to have the firstname as one of the columns, agreed?

    therefore, firstname would make the most appropriate PK for a firstnames table

    Well, that assuming we would be needing such entity in the first place and given a cogent justification for its creation which you reasoned for identical names (first in this case), we could devise such object. Base on that assumption, you are right. I too was sophomorically trying to justify the existence of such object which you explained why I should avoid.


    Quote Originally Posted by r937
    my point about not abstracting the locations out of your tables still stands -- if you are not interested in the locations in and of themselves regardless of whether a univerity, company, or whatever is located there, then don't split out locations as a separate entity

    you don't ~have~ to set up entities for every possible attribute, just the ones that you need in order to capture the essential logical relationships in the data
    Now I understand why I probably won't be needing such entity in the first place as some of its attributes should essentially play a role of PK in other entities. But I also take your advice for such redundant convolution where "factoring" would be a silly task to undertake.

    You have to excuse me as I have been studying database for only a week; I need to realign my thought processing from the usual programming paradigms to set and declarative data modeling. What I was trying to accomplish basically to abstract a common set of attributes into one object to -- in my mind -- reduce duplications. It might be a meritorious goal but conceptually erroneous. Silly me.

  9. #9
    Join Date
    Jun 2009
    Posts
    6
    Quote Originally Posted by dportas
    In your example, what would be the key(s) of the University, Company and Institution tables? It matters because it may be that you didn't need three tables for this information in the first place. (As an aside, it never ceases to amaze me that people post data modelling questions without identifying the keys in their relational models. But don't feel bad - you aren't the first and won't be the last
    You are absolutely right. Right after I posted my second response, I realized that I do need certain attributes in my quasi-entity, Location, in order to properly identify unique PK in those child objects (i.e. University, Company, etc.). As I mentioned before, I am barely starting to learn database so I am not entirely familiar with the proper method of presenting my questions. I'll try to be more comprehensive as I progress.


    Quote Originally Posted by dportas
    My advice to you is that you study and apply some of the well proven data modeling techniques. I don't think it will help you to discuss one isolated example in an online forum.

    One very good book is Terry Halpin's "Information Modeling and Relational Databases", which takes a fact-based approach to data modelling. Another principle directly relevant to your question is Orthogonal Design (see the book "Date on Database" by Chris Date).
    Appreciate your suggestions. Halpin's book seems very interesting and I just managed to browse a few pages out of Data on Database excerpt. I hadn't realized that other than ER, there are other convention such ORM. Something to look in to.


    P.S. Not trying to be a spelling Nazi, as I am more guilty of than I'd like to admit, but why is it everybody is misspelling "modeling" (modelling -- notice the double "l")?

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by blindman
    As you know, I and others always disagree with you on this, Rudy except in some rare circumstances
    that's okay

    my post wasn't intended for you

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Inlish
    ...why is it everybody is misspelling "modeling" (modelling -- notice the double "l")?
    i never!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •