Results 1 to 14 of 14
  1. #1
    Join Date
    Jan 2005
    Posts
    19

    Primary Key values

    Are there any recommendations on the types of values that should be used for primary keys?

    Are integers recommended or wil text (char/varchar) do too?
    Perhaps its more dependent on the actual situation?

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Any column or group of columns that supports enough unique values for your intended use can be a candidate key. The datatype(s) don't really matter as long as they can meet that criteria.

    I'm a strong proponent of surrogate keys. GUIDs are my first preference, but integers are also a good substitute.

    There are a number of folks that prefer natural keys, based on one or more columns that make sense to the user. This makes the key easier to put in a "real world" context, at the expense of making the key vulnerable to user's wanting to change the key value.

    -PatP

  3. #3
    Join Date
    Jan 2005
    Posts
    19
    What are recommendations when the primary key is used in subsequent tables, probably ending in in the largest table (most rows). I believe I should not be taking the entire primary key (gettings 3/4 columns) and add some more columns to it?

  4. #4
    Join Date
    Jan 2005
    Posts
    19
    Its more about optimization. Maybe that a reduced size will increase performance, though it will require an additional join in some (probably remote) cases?

  5. #5
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Perfomance should be the last consideration while doing design.

    First, make sure your design meets the requirements of the business. When appropriate, carry over composite primary keys to other tables as foreign keys. Even if this composite foreign key becomes a part of the primary key of the child table, do not worry about performance.

    At work here, we join tables with 4 to 5 million records to other large tables, all tables having composite keys. Performance has never been an issue.

    Only if your database performs sluggishly should you start looking at other options. Try indexing first, then maybe partitioning. Change the logical design only if absolutely necessary.

    Hope that helps.

    Ravi

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by Pat Phelan
    There are a number of folks that prefer natural keys, based on one or more columns that make sense to the user. This makes the key easier to put in a "real world" context, at the expense of making the key vulnerable to user's wanting to change the key value.
    i'm going to take issue with the above

    first of all, natural keys are not keys "that make sense to the user"

    what a horrible concept, pat, and shame on you for muddying the water that way

    whether or not a database table uses a surrogate key has nothing to do with users wanting to change a key value

    if the table has a surrogate key, then it also has at least one other candidate key (what i like to call the "real" primary key)

    if users want to change a value of this candidate key, they will want to do so whether or not the database was implemented with a surrogate key

    if it was implemented with a surrogate key, then the change is straightforward

    if it was implemented without a surrogate key, then the change is also straightforward, but requires ON UPDATE CASCADE to have been declared for related tables, assuming there are related tables

    there may be a performance issue with ON UPDATE CASCADE, but this is a separate discussion, and should involve not only the technical consideration of relational integrity, but also a statistical analysis of how often a key value changes

    for example, suppose you register cars in your database by vehicle number -- how many cars are you going to change the VIN for and still say they are the same car? this is not just some willy-nilly attribute we're thinking of changing, it is a key value which confers entity identity

    the fact that you implement a table using a natural key has absolutely no bearing on whether users want to change it or not

    nor should you simply declare a surrogate key because once every million cars you might want to correct a typo on the VIN

    i agree with ravi and will go even a bit further -- when you are in the logical design stage, still trying to make sense of which entities you have, and how they are related, you should never think of surrogate keys, because if you start assigning surrogate keys in logical design, you will cloud the issue with regard to what the real keys are, and then your related intersection/junction entities could end up with superfluous, unnecessary, or incorrect keys

    during logical design, surrogate keys are the spawn of the devil!!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Ok, I'm game for debate on this topic.

    So, Rudy claims that a natural key won't make sense to the user. If that is the case, his definition of a natural key doesn't meet my definition of a natural key.

    Every entity (table) has candidate keys, including a surrogate key (one derived by the database, solely for the purposes of relational management, wholly unrelated to the entity in any other way). One or more of those candidate keys is declared by the data modeler to be the primary key. The primary key is, by definition, the final arbitrator for determining a unique value to identify a given row.

    A foreign key is a copy of an entities primary key that is stored in a different entity, in order to allow them to show the relationship of one row to another. As an example, when tracking details about a vehicle you might include the primary key of the vehicle's manufacturer.

    Most entities have a natural key, such as Rudy's example of a VIN for vehicles. A natural key (like all candidate keys) must uniquely identify a vehicle. A natural key is also (at least by my definition) a value that means something to the end user.

    Because a natural key has meaning to a user, there is some reason to believe that they might want to change that value or part of it at some time. As Rudy points out, poorly designed database models can accomodate these changes using cascades.

    A cascade (whether update or delete) is a feature of some database engines that allows a change to a primary key to be propogated to any other entity that references it. Using my previous example, if you changed the primary id of the manufacturer, that change would cascade into every vehicle made by that manufacturer. At least in my opinion, entities ought to be atomic, so that no change to a value in one object forces changes to values other objects, meaning that I'm not a big fan of cascades!

    Since a candidate key for EVERY table is a surrogate key, I start there. This simplifies design by orders of magnitude. There are cases where a natural key makes sense. There are cases where no natural key is available. While I always try to keep an open mind, I prefer to start with something I know will work, then make changes as they are needed.

    -PatP

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    please, do not put words in my mouth

    i did not say that a natural key "won't make sense to the user"

    many of them do!

    and some natural keys do not make sense to the user

    i said that making sense to the user is not what defines a natural key as a natural key

    it is the difference between "some A are B", "all A are B", and "A implies B"

    that's Logic 101, pat, and you would do well to review the differences

    some surrogates make sense to the user too



    your comment that starting logical design with surrogate keys "simplifies design by orders of magnitude" is preposterous

    i know that using words like "preposterous" is not considered to be rational debating strategy, but in this case the adjective is fully warranted

    if i told you once, i told you a million times, don't exaggerate

    "orders of magnitude"

    what a load of bollocks!!
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Quote Originally Posted by r937
    please, do not put words in my mouth

    i did not say that a natural key "won't make sense to the user"
    In your first response, didn't you say:
    Quote Originally Posted by r937
    first of all, natural keys are not keys "that make sense to the user"
    Maybe I "missed a meeting" on that one somehow.
    Quote Originally Posted by r937
    your comment that starting logical design with surrogate keys "simplifies design by orders of magnitude" is preposterous

    i know that using words like "preposterous" is not considered to be rational debating strategy, but in this case the adjective is fully warranted

    if i told you once, i told you a million times, don't exaggerate

    "orders of magnitude"

    what a load of bollocks!!
    You milage may vary, but for me starting with a basis that is by definition workable, then branching out where appropriate works better. I can use a surrogate and know that it will work, since relational algebra uses surrogates for every relationship (per Corcoran) and allows the use of other attributes. I know that a design using a surrogate key will work. Since the database creates the key values according to its needs, those keys don't interfere with anything that the user needs or wants to do with their (application based) data.

    I have no problem with maintaining alternate keys (the candidates that are not chosen as the primary key). Those can and should be maintained by the database as well as the primary key. My problem is designing a data model where data in one entity is affected by data in another entity. This is acceptable in rollup tables (such as in OLAP), but I don't think it ought to be part of the OLTP design.

    -PatP

  10. #10
    Join Date
    Jan 2005
    Posts
    19
    Pat, Ravi and Rudy(?),
    Thanks for your input on the topic so far, it has been quite interesting.

    (Please don't get too exited please. Though such discussions might have some interesing things, I've seen enough going this way and they usually don't end somewhere useful.)

    I will take the concept to what I'm designing to give a potential better understanding of what the situation will be in my case.

    Ravi, I partially agree with you: performance is not the top priority, functionality and integrity are my top priorities, but I think it might be good to consider this when designing a database. My experience is that changes are hard and sometimes require a lot of work to be done.

    In this case I have a database consisting of a
    table customer; primary key customerid int, no other good candidates)
    table category; primary key category varchar(10))
    table subcatogory; primary key category varchar(10), subcategory(20); foreign key category.category)
    table customercourse; candidate keys: customerid, category, subcategory, startdate; foreign key customer.customerid; foreign key subcategory.(category, subcategory)
    table lesson; primary key: <customercourse primary key>, lessondatetime.
    table exam; primary key: category, subcategory, examname; foreign keys: subcategory.(category, subcategory)
    table takenexam; primary key <customercourse primary key>+<exam primary key>+examdatetime

    I expect the following relative ratio's:
    customer : 1
    customercourse : 3
    lesson : ~90 (mostly 40-120)
    exams : ~4 (mostly 2 - 6)

    I will probably have about 100 customers working with and a couple thousend 'archived' over some period of time.

    Category, subcategory and exam are unlikely to change, unless the official authority makes some changes to them (slight changes every couple years).
    Customers and customercourse will probably get added all the time.
    Lesson and takenexam will be added in an even high rate.

    Changes are unlikely, except some minor changes to a persons details (initials, firstname, birthdate and such stuff).
    Most accesses are done through the customer table going towards lesson or exam. Also checking the schedules for lessons and exams.

    The issue is the customercouse (and indirectly the lesson and takenexam) table.
    'Natural key':
    + Consistency of the takenexam table
    - Increased database size
    'Surogate key':
    + lesson table will be much smaller, as this is where most data will be.
    - Consistency of the takenexam table because much harder

    Looks like the conclusion can be made from my point of view. Should have considered this before ...

    Besides this, I've also one designed a database that only had surogate keys (I didn't know the use of combined keys yet at that time). Though the database is solid and reliable I think that using combined keys (at some points) could have increased performance (no need for exessive joins) while making it a bit easier to keep consistent too.
    That said you don't want to know the piece of shit (the one who made it probably didn't worked with databases before and it was dbase (again)) was it predecessor, almost being inconsistent by design.
    Though its a good way IHMO to learn lessons on how to make decent databases.

    Hopefully someone can give me some input or new insights (also hoping the above was understandable enough).

    - Joris

  11. #11
    Join Date
    Mar 2001
    Location
    Lexington, KY
    Posts
    606
    Surrogate keys vs. ‘natural’ keys are a logical design decision based upon your particular business rules. In some cases, the combination of first and last name may be sufficient to uniquely identify a user. In other cases it is not unique enough: you may then decide to have your DBMS create its own key via some sort of artificial means (e.g. the 'surrogate' key).

    Your DBMS implementation may impose certain performance restrictions based upon your decision; provided you are aware of these limitations you can then decide to alter your model to fit your DBMS.

    Unfortunately this shows the weaknesses in most SQL DBMS implementations: there is no real reason that, internally, foreign keys need to be actual copies of the ASCII data in the base table. Internally it could be a pointer to the primary key attribute of the row in the base table; this could also make certain indexes that we normally create redundant.

    In any case, there is plenty of room for optimization in SQL DBMS products. I only wish DBMS vendors would make them.
    Thanks,

    Matt

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    matt, great post

    there are actually more categories than just surrogate and natural

    i think celko has 4 or 5 of them, and they might include names like technical and synthetic

    and regarding your comments about "weaknesses in most SQL DBMS implementations" -- whew!! do you realize how close you almost came to breaking that law which shall remain nameless?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Mar 2001
    Location
    Lexington, KY
    Posts
    606
    Thanks Rudy.

    do you realize how close you almost came to breaking that law which shall remain nameless?
    I like to live dangerously.
    Thanks,

    Matt

  14. #14
    Join Date
    Jan 2005
    Posts
    19
    Quote Originally Posted by MattR
    Your DBMS implementation may impose certain performance restrictions based upon your decision; provided you are aware of these limitations you can then decide to alter your model to fit your DBMS.
    Currently I'm living with Access (database only, frontend will be C# or VB6). So the database won't be too big. Currently I need to convert a DBase *^% thingie which is 10 MB (Straight to access 11 MB). It has a working set of 50 customers and history of 450. Since that things uses aggregates stored all over, when I'm done it will probably be much larger.
    Maybe I'm scaling up, later, to MSSQL or maybe PostgreSQL (wan't too look at that some day, I don't like MySQL very much, pre-InnoDB it lacks decent integrity).
    I didn't consider using a OODBMS. They are not that common currently. They especially work well when navigating and always using a single path to get to the data.

    Maybe some hints for those Access users?


    I'm planning to layer the software. With Access I must build a layer that checks the data properly, beyond the extend Access is capable off and were you normally can use triggers I now need code. So probably I can hide here some implementation details later, when I decide some changes actually make it faster.

    Currently I'm not using surrogate keys, because . If they later prove to be faster I can always make a new version and that as being superior, can't I?

    All you guys thank you for your input...

    - Joris

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •