Results 1 to 8 of 8
  1. #1
    Join Date
    Jan 2010
    Posts
    19

    Using surrogate keys because of key length?

    I know that the subject of surrogate vs natural keys envokes strong feelings here, but assuming that you prefer natural keys, have you sometimes had to use a surrogate anyway because of key length?

    I'm looking at a design right now, where the natural key would be a descriptive name of up to 160 chars. I suppose I'm not the first one to come across such a situation.

    I know that some systems force the user to make up a short arbitrary code in circumstances like this, but that's just shifting the responsibility for generating the surrogate key from the computer to the user.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I'm a strong believer in using both a surrogate Primary Key, and a Unique Constraint for a Natural Key. They two are synonyms and need to be treated as such, but the surrogate key is often thousands of times easier for the code to use for any number of reasons.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by Anna-J View Post
    I know that the subject of surrogate vs natural keys envokes strong feelings here, but assuming that you prefer natural keys, have you sometimes had to use a surrogate anyway because of key length?

    I'm looking at a design right now, where the natural key would be a descriptive name of up to 160 chars. I suppose I'm not the first one to come across such a situation.

    I know that some systems force the user to make up a short arbitrary code in circumstances like this, but that's just shifting the responsibility for generating the surrogate key from the computer to the user.
    In a case where you have an unusually long string as the only potential identifier you should be asking some questions of the business users. How do they expect to identify the information represented by this table? Can they rely on differentiating two 160 character strings that differ only by one character? You may determine that it's more convenient and reliable to use or create a shorter identifier for them. That would not be a surrogate key. It would be a meaningful business key rather than a surrogate because it's used by the consumers of the data rather than being purely internal to the database or an application.

    If you do want to make the 160 character column unique then a surrogate is probably a very good idea. Obviously the surrogate doesn't make the column in question unique, so then you would need two uniqueness constraints instead of one.

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    In a lot of the applications that I build, the Natural Key is comprised of three or more columns, and can often have a maximum length much larger than 160 characters.

    The following discussion is specifically targeted toward code generators, but it applies almost as well to applications using any standard development platform such as Windows Forms, MFC, etc. When using code generator techniques, a single column key is far easier to use than a multiple column key. It is also far simpler to use code generation tools if the Primary Key for all of the tables is the same data type.

    As long as the Data Archictect and the developers clearly understand that the Surrogate Key and the Natural Key are synonyms, I've never had a problem with this technique.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by Pat Phelan View Post
    As long as the Data Archictect and the developers clearly understand that the Surrogate Key and the Natural Key are synonyms, I've never had a problem with this technique.
    By "synonym" in this case, do you mean that the surrogate and the natural key values always identify the same thing? That's not always true though. Either could change independently of each other. It's often a working assumption that surrogate key values will not be allowed to change at runtime but it could be a requirement to allow natural key values to be changed. Changing any key values amounts to the same thing: it changes the identification of the things being represented in the table.

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I mean synonym in the sense that the surrogate key and the natural key identify the same row. For the purpose of identifying the underlying "thing" that the row represents the natural key and the surrogate key are equivalent. The natural key might change value, but the surrogate key and the new value of the natural key would still identify the same row so they are still synonomous even though the value of the natural key is different.

    The surrogate key exists in my mind purely to make it easier to construct and verify that solutions (applications) are correct, without requiring any change that is apparent to the SME (Subject Matter Expert). At least in my applications, adding a surrogate key is unnoticed by the user. I go to considerable lengths to keep the user from being aware that a surrogate key exists because I see the surrogate key as part of the solution instead of part of the problem to be solved.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  7. #7
    Join Date
    Jan 2010
    Posts
    19
    Quote Originally Posted by Pat Phelan View Post
    When using code generator techniques, a single column key is far easier to use than a multiple column key. It is also far simpler to use code generation tools if the Primary Key for all of the tables is the same data type.
    Could you give an example of this? I'd be grateful for one, since I have a few tables with 3 column PKs.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Quote Originally Posted by Anna-J View Post
    Could you give an example of this?
    Ummm... How?

    Spin up a demonstration table with a surrogate key and three other columns. Mix and match the data types of the columns if you like real fun.

    Declare the surrogate key as the Primary Key, then take your code generator and spin up a demo app to allow you to enter rows, edit rows, and delete rows using the surrogate key. Test to be sure that it works correctly. Store the generated code for a few minutes.

    Remove the PK declaration, then declare the other columns as the Primary Key. Again, verify that the app works correctly. Compare the generated code with the previous version, specifically looking at the differences.

    The difference should be small compared to the total application, but those differences should cause you to strongly prefer using the surrogate key especially if the columns are complex.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •