Results 1 to 6 of 6
  1. #1
    Join Date
    Apr 2015
    Posts
    3

    Is it a good idea to use "real world values" as primary key?

    Currently I am learning database design all by myself, no school/university/course whatsoever. While I really enjoy the freedom this gives me, sometimes it leaves me with questions that may be a bit crude, so please bare with me.

    One of the resources I found uses a university database as an example. This database contains a table with information for all the university's students, things like name, date of birth, address and telephone number and student ID (generated from the university). As primary key, an auto incremented column was used.

    Some people commented on how the usage of primary keys was all wrong here they suggested using the telephone number or student ID as primary key.

    While this could work for some time, I personally think it's a bad idea. For example, what if two or more people sharing a flat also share one telephone number and enter this number? Now, the same primary key would exist more that one time in the students table, which would probably end in a disaster.The same goes for the student ID as primary key: What if, for some reason, the software generating these IDs has a bug and one ID appears twice? Or the university decides that it's okay to issue the same student ID every 10 years or so?

    From my current perspective, it could be a good idea to not rely on information from the "outside world" for primary key, but always let the database create an auto incremented column and use these as primary key. Is this correct or am I overthinking the whole design from the "business side"? Or is this where compound keys come into play?

    Also, does every needs a primary key to access its information or are there cases where it's not required?

    Best regards

  2. #2
    Join Date
    Apr 2015
    Posts
    3

    Is it a good idea to use "real world values" as primary key?

    Currently I am learning database design all by myself, no school/university/course whatsoever. While I really enjoy the freedom this gives me, sometimes it leaves me with questions that may be a bit crude, so please bare with me.

    One of the resources I found uses a university database as an example. This database contains a table with information for all the university's students, things like name, date of birth, address and telephone number and student ID (generated from the university). As primary key, an auto incremented column was used.

    Some people commented on how the usage of primary keys was all wrong here they suggested using the telephone number or student ID as primary key.

    While this could work for some time, I personally think it's a bad idea. For example, what if two or more people sharing a flat also share one telephone number and enter this number? Now, the same primary key would exist more that one time in the students table, which would probably end in a disaster.The same goes for the student ID as primary key: What if, for some reason, the software generating these IDs has a bug and one ID appears twice? Or the university decides that it's okay to issue the same student ID every 10 years or so?

    From my current perspective, it could be a good idea to not rely on information from the "outside world" for primary key, but always let the database create an auto incremented column and use these as primary key. Is this correct or am I overthinking the whole design from the "business side"? Or is this where compound keys come into play?

    Also, does every needs a primary key to access its information or are there cases where it's not required?

    Best regards

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    You've probably covered the concepts of Natural Keys, Surrogate Keys, Candidate Keys, and Primary Keys. If you haven't, please Google them before you read on.

    I have a relatively radical opinion about primary keys. If the user can "see" any part of a Candidate Key, then that key should be excluded from the list of Candidate Keys. If the user can see a value, they will be tempted to change that value. Even if the user won't change the value, a regulatory agency still might change it.

    Natural Keys tend to be large, often "stringy things" with lots of bytes and formatting fluff. These characteristics make them grossly inefficient as PKs (Primary Keys). Natural Keys are good things because they are part of the real world problem definition, and they ought to be formally expressed and enforced by the database engine. Unfortunately the end user can "see" the Natural Key, so to me it is a poor primary key.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #4
    Join Date
    Apr 2015
    Posts
    3
    Thanks, that already helped me big time!

    After reading a bit more on the keys you mentioned, I am able to rephrase my initial post more clearly: Because natural keys or candidate keys often include data that can be changed (at least theoretically), I think they make poor primary keys. To be really sure that a key is unique, it looks much safer to me to use a surrogate key.

    From ready your answer, you also tend to think like this, correct?

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    That's a pretty good start.

    I define a NK (Natural Key), and an SK (Surrogate Key) for nearly every table that I create. I define the NK as an AK (Alternate Key) with a corresponding Unique Constraint in the table definition. I define the SK as the PK (Primary Key), and I absolutely NEVER show that PK to the end user... Developers know about the SK, DBAs know about it too, but it absolutely NEVER appears on any screen, report, or anything else that the user might view.

    I think of the NK and the SK as synonyms. The NK is for the user, the SK is for the system (database, programs, etc.) and never the twain shall meet.

    This means that if the user decides to edit the NK, then they can do so. If there is a key conflict, the user has to decide how to cope with it because it is a user problem, not a system problem. Even if regulatory agencies or heaven help us legislators decide to tamper with the NK (which happens from time to time), then we can redefine the NK and modify the constraint that supports it without causing havok to the remainder of the system.

    A great example of this is the IDC-10 codes that are the basis of the healthcare diagnostic system. Many North American companies (hospitals, insurance companies, the Federal Government, etc.) used the old IDC-9-CM codes and stored those codes as integers... Unfortunately, the IDC-10 is alphanumeric. For companies that depend on the diagnosis codes being integers, good luck with the conversion. For companies with the SK and PK support that I have described, they need to modify and test one field change in the database and the places where that data is used on screens and reports. The conservative estimate is 175 times more work for the companies that depended on the diagnostic code being an integer compared to the companies that used the SK/PK scheme that I described.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by usr01 View Post
    Because natural keys or candidate keys often include data that can be changed (at least theoretically), I think they make poor primary keys. To be really sure that a key is unique, it looks much safer to me to use a surrogate key.
    No, that does not follow at all. When we make a key in a database we are enforcing a rule (a constraint) that says the data is required to be unique. That means duplicate data cannot be entered into the key attribute(s). Enforcing uniqueness constraints is an essential feature for data integrity and also for usability: users need to be able to identify individual items of information in the database accurately, therefore they need keys that they can see, that serve the intended real world purpose of identification and that are guaranteed to be unique within the data. Note that surrogate keys do not serve the same purpose because an unseen, purely "technical" key cannot be used for identifying information outside the database. A surrogate does not by itself contribute to data integrity because no meaningful uniqueness constraint is enforced - every piece of data is arbitrarily assigned a new key whether it duplicates existing data or not.

    You mentioned a good example. Students are generally assigned a unique student identifier / roll number. Clearly this ought to be unique and a uniqueness constraint in the database would catch the kind of program bug you mentioned if a student was wrongly assigned a duplicate number. If you allowed duplicate student identifiers to enter the database without any alternative means of identification it might be hard or impossible to identify students accurately. You are right that telephone numbers would most likely not be a sensible way to identify students.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •