If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Index ID Attribute vs. Primary Keys

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-07-09, 16:23
Inlish Inlish is offline
Registered User
 
Join Date: Jun 2009
Posts: 6
Index ID Attribute vs. Primary Keys

Hi,

I am very new to database and data modeling so bare with me. The realm of my question is about database design and ERD for the sake of the lingo. I have a set of identical attributes persisting in multiple entities/objects. For instance,

University (country, state, city, name, etc.)
Company (country, state, city, name, description, etc.)
Institution (country, state, city, name, field, etc.)


As it appears, (country, state, city) are repeated in each of these entities (i.e. University, Company, etc.), therefore, for some, there would be a duplicate in the tables (i.e. multiple companies, universities, etc. may be located in the same location). So I created a new entity, Location, which encapsulates these attributes (country, state, city).

Location, for obvious reason, has (country, city) as the primary keys -- I could also augment state to the PK list but it would be irrelevant to my question. Now, in order to use each of Location instances in other child tables (i.e. University, Company, etc.), either I have to reference country-city-(and/or state), which defeats the whole purpose of avoiding spurious identical set of data in multiple tables/entities (as a matter of fact, in this case, I just managed to create an extra table in addition to the duplicates that had gone to any other table requiring to have a (country, state, city) attributes), or I could create an "id" for each of the Location's to uniquely identify the set -- we call it "locationid".

Obviously locationid can uniquely specifies each instance but we also have to have (country, city) as primary keys because otherwise, we can insert same set of (country, city) but with different locationid number. For example, if (country, city) are not PK, we can have:

locationid=0 , country=US , city=New York
locationid=1 , country=FR , city=Paris
locationid=2 , country=RU , city=Moscow
locationid=3 , country=FR , city=Paris


Since (country, city) are not PK but locationid is PK, then one can insert the same set of data but have it identified with different locationid's. But if (country, city) are to be PK, therefore unique which prevents duplication of country-city value in Location, then for any reference to the Location instance in University or Company table, we must include (country, city) along with locationid (or must I?) -- that, once again, causes for basically all the attributes in the Location object to be duplicated in other referencing entities (i.e. University, Institution, etc.).


My question is, can I have the locationid uniquely identifies each row in Location so I can only use that attribute as foreign key in other child tables (i.e. University, Company, etc.) but preserve the uniqueness of (country, city) attributes at the same time to avoid duplications in the Location table? Or if I need to maintain a uniqueness of (country, city), I have no other choice to mark them as PK and use them in referencing tables along with locationid, in which case, I guess I won't be needing locationid?

Sincerely,
Reply With Quote
  #2 (permalink)  
Old 06-07-09, 19:12
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by Inlish
I am very new to database and data modeling so bare with me.
bear with me = have patience

bare with me = let's get undressed together



if you were modelling people instead of universities, companies, and institutions, and you happened to notice that several people shared the same first name, e.g. john, mary, etc., would you set up a separate first names table?

no, eh

locations should have their own separate table only if you were interested in the locations in and of themselves, as locations, regardless of whether there were any universities, companies, or institutions located there

remove the locations entity, and go back and start the modelling over again

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 06-08-09, 00:11
Inlish Inlish is offline
Registered User
 
Join Date: Jun 2009
Posts: 6
Thank you for your reply and sorry about the typo. It must have somehow found its way through my fingers when my brain misfired -- it happens.

Quote:
Originally Posted by r937
if you were modelling people instead of universities, companies, and institutions, and you happened to notice that several people shared the same first name, e.g. john, mary, etc., would you set up a separate first names table?
Well, in case of the firstname, we are simply talking about one attribute which by itself certainly does not represent the PK for the object. However, in case of the Location, several attributes are perhaps shared among multiple entities. For instance, there could be thousands of businesses, companies, universities, or institutions located in the same city. If each was to use the same set of attributes (country, city, state, etc.), wouldn't this call for a separate object to avoid duplication? I apology if my concern stems from naivety but I presume, joining multiple tables for such scenario is deemed more costly than having these locations as part of entities, even though there could be multiple objects that share the same set of attributes (i.e. University, Institution, Company, Business, etc.).

Quote:
Originally Posted by r937
locations should have their own separate table only if you were interested in the locations in and of themselves, as locations, regardless of whether there were any universities, companies, or institutions located there
So, as far as proper data modeling, I should never spawn a new entity for the sake of eliminating spurious "set" of duplications in multiple tables? And just to play a devil's advocate, assuming there was a situation where Location entity had locationid, country, city, as primary keys, would I have to use all of them as FK in child tables or would merely locationid be suffice? Or all the PK must be referenced in the child table? Also, would it be sane to rather create a relation table between say, Location and each of the other objects having locations:

University (name, description, etc.)
University_Location (locationid, name)
Location (locationid, country, city, state, etc.)

as opposed to the initial scenario, which you also reprimanded:

University (name, locationid, description, etc.)
Location (locationid, country, city, state, etc.)

Or the former ER is bound to be redundant in its nature? I suppose, at this point, I am more interested to know how the multiple PK's are referenced in different tables. If there is a unique index for each instance as a part of the PK, would that very PK attribute be suffice to be referenced else where?




Sincerely,
Reply With Quote
  #4 (permalink)  
Old 06-08-09, 02:16
dportas dportas is offline
Registered User
 
Join Date: Dec 2007
Location: London, UK
Posts: 732
In your example, what would be the key(s) of the University, Company and Institution tables? It matters because it may be that you didn't need three tables for this information in the first place. (As an aside, it never ceases to amaze me that people post data modelling questions without identifying the keys in their relational models. But don't feel bad - you aren't the first and won't be the last )

There are other possible answers, depending on what methodologies and design principles you use. "E-R modeling" alone is little more than a diagramming convention and doesn't really give you the tools to deal with such issues.

My advice to you is that you study and apply some of the well proven data modeling techniques. I don't think it will help you to discuss one isolated example in an online forum.

One very good book is Terry Halpin's "Information Modeling and Relational Databases", which takes a fact-based approach to data modelling. Another principle directly relevant to your question is Orthogonal Design (see the book "Date on Database" by Chris Date).
Reply With Quote
  #5 (permalink)  
Old 06-08-09, 14:41
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by r937
bear with me = have patience
bear with me! = help! I fell over the railing at the zoo!
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #6 (permalink)  
Old 06-08-09, 14:50
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by Inlish
Well, in case of the firstname, we are simply talking about one attribute which by itself certainly does not represent the PK for the object.
this is totally not true

a firstnames table would certainly have to have the firstname as one of the columns, agreed?

therefore, firstname would make the most appropriate PK for a firstnames table

oh, yeah, you could assign an auto_increment/identity/serial number as the PK, but it would be a surrogate key, and an appropriate natural key ~always~ trumps a surrogate key except in some rare circumstances

my point about not abstracting the locations out of your tables still stands -- if you are not interested in the locations in and of themselves regardless of whether a univerity, company, or whatever is located there, then don't split out locations as a separate entity

you don't ~have~ to set up entities for every possible attribute, just the ones that you need in order to capture the essential logical relationships in the data
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #7 (permalink)  
Old 06-08-09, 15:44
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Quote:
Originally Posted by r937
therefore, firstname would make the most appropriate PK for a firstnames table

oh, yeah, you could assign an auto_increment/identity/serial number as the PK, but it would be a surrogate key, and an appropriate natural key ~always~ trumps a surrogate key except in some rare circumstances
As you know, I and others always disagree with you on this, Rudy except in some rare circumstances
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #8 (permalink)  
Old 06-08-09, 17:34
Inlish Inlish is offline
Registered User
 
Join Date: Jun 2009
Posts: 6
Quote:
Originally Posted by r937
this is totally not true

a firstnames table would certainly have to have the firstname as one of the columns, agreed?

therefore, firstname would make the most appropriate PK for a firstnames table

Well, that assuming we would be needing such entity in the first place and given a cogent justification for its creation which you reasoned for identical names (first in this case), we could devise such object. Base on that assumption, you are right. I too was sophomorically trying to justify the existence of such object which you explained why I should avoid.


Quote:
Originally Posted by r937
my point about not abstracting the locations out of your tables still stands -- if you are not interested in the locations in and of themselves regardless of whether a univerity, company, or whatever is located there, then don't split out locations as a separate entity

you don't ~have~ to set up entities for every possible attribute, just the ones that you need in order to capture the essential logical relationships in the data
Now I understand why I probably won't be needing such entity in the first place as some of its attributes should essentially play a role of PK in other entities. But I also take your advice for such redundant convolution where "factoring" would be a silly task to undertake.

You have to excuse me as I have been studying database for only a week; I need to realign my thought processing from the usual programming paradigms to set and declarative data modeling. What I was trying to accomplish basically to abstract a common set of attributes into one object to -- in my mind -- reduce duplications. It might be a meritorious goal but conceptually erroneous. Silly me.
Reply With Quote
  #9 (permalink)  
Old 06-08-09, 17:45
Inlish Inlish is offline
Registered User
 
Join Date: Jun 2009
Posts: 6
Quote:
Originally Posted by dportas
In your example, what would be the key(s) of the University, Company and Institution tables? It matters because it may be that you didn't need three tables for this information in the first place. (As an aside, it never ceases to amaze me that people post data modelling questions without identifying the keys in their relational models. But don't feel bad - you aren't the first and won't be the last
You are absolutely right. Right after I posted my second response, I realized that I do need certain attributes in my quasi-entity, Location, in order to properly identify unique PK in those child objects (i.e. University, Company, etc.). As I mentioned before, I am barely starting to learn database so I am not entirely familiar with the proper method of presenting my questions. I'll try to be more comprehensive as I progress.


Quote:
Originally Posted by dportas
My advice to you is that you study and apply some of the well proven data modeling techniques. I don't think it will help you to discuss one isolated example in an online forum.

One very good book is Terry Halpin's "Information Modeling and Relational Databases", which takes a fact-based approach to data modelling. Another principle directly relevant to your question is Orthogonal Design (see the book "Date on Database" by Chris Date).
Appreciate your suggestions. Halpin's book seems very interesting and I just managed to browse a few pages out of Data on Database excerpt. I hadn't realized that other than ER, there are other convention such ORM. Something to look in to.


P.S. Not trying to be a spelling Nazi, as I am more guilty of than I'd like to admit, but why is it everybody is misspelling "modeling" (modelling -- notice the double "l")?
Reply With Quote
  #10 (permalink)  
Old 06-08-09, 19:43
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by blindman
As you know, I and others always disagree with you on this, Rudy except in some rare circumstances
that's okay

my post wasn't intended for you

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #11 (permalink)  
Old 06-08-09, 19:49
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by Inlish
...why is it everybody is misspelling "modeling" (modelling -- notice the double "l")?
i never!!
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On