Results 1 to 8 of 8
  1. #1
    AnObserver Guest

    Use of numeric pseudo-key vs. text key

    Can anyone point me to some guidelines as to when you should use a pseudo-key like an autoincrementing number as the primary key of a table instead of a longer text value (which will also be unique).

    My instinct is that use of the pseudo-key makes the database smaller (and perhaps faster) since less space is required for foreign key storage and less data must be compared for joins. It also allows changes to the associated text value without concerns about relationships, so I've gotten into a habit of doing this almost universally.


  2. #2
    Tim Fitzpatrick Guest
    It's normal to have a pseudo or artificial key for tables that only contain long natural unique identifiers - it does save space.

    However, I don't create such keys for every table (as most people seem to) because it can slow performance down.

    The benefit of cascading natural short natural keys is that the information is readily available and tables can be jumped in the joins.

    E.g. suppose there are three tables Table_1, Table_2 and Table_3 with their own ids and the following structure:

    Table_1 Table_2 Table_3
    T1_id (PK) T2_id (PK) T3_id (PK)
    T1_text T2_text T3_text
    T1_id (foreign key) T2_id (foreign key)

    If you start at Table_3 (e.g. if it's where the volatile data is) and need to discover T1_text you must make two joins (i.e. via Table_2).

    If Table_2 included the foreign key column T1_id as part of it's primary key, T3 would also contain a T1_id and could navigate directly to Table_1's data in one join.

    I hope this makes some sense!


  3. #3
    Srinivas Venigalla Guest
    There is a plenty of discussion on this topic. you may want to search using a good search engine like google. But here is my two cents worth:

    A primary key must be part of the data the table is representing. Good examples of a Primary key are: Emp_ID, Product_ID, SS_NO, tel_no, etc. They are naturally unique keys within the enterprise or the domain.

    An auto-incrementing ID, though it satisfies the requirement of a primary key, is a wrongway to use it. Those keys do not have any meaning of their own. Worse yet, the value of such PK varies depending on the order of creation! When you export a table and import, you may not be able to use that key at all (it happens on Microsoft products!). Every table in your design must have a natural PK, which is part of the data itself. Period. No need for external intervention.

    If you must have a auto-identity-key, it must be a surrogate key, with no bearing on the referential integrity.

    Coming to the type of the key. It is tempting to think an INT/LONG is more efficient than a CHAR. Almost all commercial databases have mastered the art of storing/searching string-type index files. So performance should not be a concern. No real world data are Integers!! SS_Nos, Tel_nos, Zip codes, Emp_IDs, Product_Ids,.. none of them are pure numeric. Show me one real world key that is pure numeric. Probably dates, using the julian convention, can be expressed as longs. But you need to use a function to convert them to be of any use in the real world. So, in my opininon, you can never have numeric keys all the time.

  4. #4
    Carlos Bromanski Guest
    You make some valid points, but I think David Cressey's March 25 post here titled "Surrogate keys" describes the real world a little better. There will be some cases where you cannot have a naturally unique key, sometimes caused by the errors of human behavior and sometimes unavoidably by the nature of the data. So how about a hybrid primary key, where you combine a mostly unique natural key with a meaning-free surrogate value.
    - cb

  5. #5
    Heinz Huber Guest
    Beware of the pitfalls of those "naturally unique" keys. Some of them tend to be not unique! E.g. the SS_NO you mentioned: The authorities succeed in assigning the SAME SS_NO to different individuals every now and then. And then you're really screwed if you've designed your database with the SS_NO as PK.


  6. #6
    Bernard Peek Guest
    Also note that all of the examples quoted are surrogate keys. Emp_ID is used as a key because there's no natural key that uniquely identifies an individual person. Even phone numbers are just an arbitrarily assigned code.

    They can be treated as natural keys because we trust the system that generates them to enforce their uniqueness. In the case of employee or product codes that's because we control the system. If you don't trust the system then you have three alternatives:

    a) Use surrogate keys

    b) Assume uniqueness but add some error-handling code

    c) Assume uniqueness and hope you get another job before anyone finds
    the problem.

    Bernard Peek

  7. #7
    Charles McDonald c Guest
    It's all very interesting but what happens when the primary keys are composite keys? This can get out of hand very quickely...

  8. #8
    Join Date
    Feb 2003
    Frederick, MD



    I believe that the discussion is ignoring the situation. After all, everything is relative :-)
    The whole point of this discussion is what is fast enough and what is fastest. For instance, if you wanted to generate a char(32) key for every row in a table, as opposed to a numeric( 32 ) obvious differences occur when the table exceeds certain row limits, (i.e. index depth becomes greater than 3). If you wanted to use a table with 300 rows and decided that you really didn't want to do an additional surrogate key, this might make a lot of sense. But if you are talking about a million rows, or worse, hundreds of millions, or even worse, a higly scalable design with the capability of partitioning the data amongst multiple databases supporting hundreds of billions of rows, then (as most women will agree) SIZE DOES MATTER!
    So when you are facing the impossible (something I specialize in every day) then you have to examine every facet of choosing a key. Ultimately, the volumetrics of the data are a driving factor in a decision about surrogate key usage. My advice is when you have the worst case, use an exact numeric datatype (they will compare faster than a comparably sized character datatype), they are typically 1/2 the size of a equally sized character datatype (numeric(38,0) occupies 17 bytes of storage while char(38) always uses 38 bytes.

    Your Friendly Neighborhood DBA,


Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts