Results 1 to 7 of 7
  1. #1
    Join Date
    May 2016
    Posts
    3

    NVARCHAR as Primary Key or INT Primary Key and NVARCHAR UNIQUE column

    I have alphanumeric data with a max length of 20 characters. I'm going to store this data in a column of type NVARCHAR(20).

    These data are CODES and must be unique, so I decided to make that column primary key.

    But, asking another question, someone has "suggested" me to use an INT column as primary key.

    What do you think? An INT primary key and a NVARCHAR column with an UNIQUE constraint or my current design?

    I think I'm adding a new column that I'm not going to use. I will use it only for join. I'm not sure if this is the best design because I will use a lot the NVARCHAR(20). And this column will need also an index, so I don't see what I'm going to achieve creating another column as primary key.

    In other words, in the 99% of my queries I will use that column on my where clause and I don't know what improvement I will get adding an INT primary key.

    I'm going to use SQL Server and now I'm using NVARCHAR column as primary. This PK column is also a FK in other two tables (which is also PK on each table).

    I'm sure that an INT PK is better than a NVARCHAR, but here the real KEY for the table is the NVARCHAR column. I will need another index for the NVARCHAR column, so I don't know if there is a lot of advantage to add an INT column as PK.
    Last edited by VansFannel; 05-17-16 at 04:30.

  2. #2
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Check the db you are usi g, some do not permit a varchar column as a pk. But there is no reason why you cannot have a character column of 20 characters.
    There are good reasons to use an integer column as a pk, especially if there is no obvious other pk and you needed to use an auto generated number, but I cant see that in what you have said.
    Such reasons are for performance,usually because the candidate key is overly long or a composite of several columns or may change.

    Bear in mind if you need to use unicode then pick an appropriate datatype
    I'd rather be riding on the Tiger 800 or the Norton

  3. #3
    Join Date
    May 2016
    Posts
    3
    Quote Originally Posted by healdem View Post
    Check the db you are usi g, some do not permit a varchar column as a pk. But there is no reason why you cannot have a character column of 20 characters.
    There are good reasons to use an integer column as a pk, especially if there is no obvious other pk and you needed to use an auto generated number, but I cant see that in what you have said.
    Such reasons are for performance,usually because the candidate key is overly long or a composite of several columns or may change.

    Bear in mind if you need to use unicode then pick an appropriate datatype
    I'm going to use SQL Server and now I'm using NVARCHAR column as primary. This PK column is also a FK in other two tables (which is also PK on each table).

    I'm sure that an INT PK is better than a NVARCHAR, but here the real KEY for the table is the NVARCHAR column. I will need another index for the NVARCHAR column, so I don't know if there is a lot of advantage to add an INT column as PK.

  4. #4
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    You are sure that an INT PK is 'better', based on what?
    There may be some minor performance benefit in using an integer as a PK, but that will be, I suspect, more than offset by having to create and maintain a unique index on the code column on top if yhe integer column.
    You may have extea steps in the application software.

    Icannot for tbe life if me see why anyone would advise you to use an integer coulmn inaddition to this code both of which must be unique.

    However if you think it may be a problem then get some metrucs that demonstrate there is a problem before consuming valuable development time
    I'd rather be riding on the Tiger 800 or the Norton

  5. #5
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    The difference between a PK (Primary Key) based on an INT and one based on an NVARCHAR(20) depends a lot on scale.

    The primary efficiency difference will be based on the size of the key, which will impact how many rows fit in an index. Assuming that you build the Clustered Index on the NVARCHAR(20) column, then there is zero size/performance difference for that index and your schema will also reap the benefits of having the rows in that table sorted by the key.

    The INT column requires four bytes of storage while the NVARCHAR(20) requires 42 bytes. This makes the INT more than ten times as efficient for both a foreign key and for NCI (Non-Clustered Index) lookups. At ten or even a thousand rows, a ten times reduction isn't too exciting... At larger scales a ten-fold increase in performance and reduction in storage and backup becomes much more important!

    When I do Data Modeling presentations, I always cover the difference between a PK and an NK (Natural Key), and advocate using both of them. The NVARCHAR(20) NK is what I recommend that you show to the user. The INT PK/SK (Surrogate Key) is what you use within the schema and the application. This way when the user or a legislator changes the definition or data type of the NK (see the chaos caused by the ICD-9 to ICD-10 conversion) your schema and app are relatively unaffected..

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  6. #6
    Join Date
    May 2016
    Posts
    3
    Quote Originally Posted by Pat Phelan View Post
    The difference between a PK (Primary Key) based on an INT and one based on an NVARCHAR(20) depends a lot on scale.

    The primary efficiency difference will be based on the size of the key, which will impact how many rows fit in an index. Assuming that you build the Clustered Index on the NVARCHAR(20) column, then there is zero size/performance difference for that index and your schema will also reap the benefits of having the rows in that table sorted by the key.

    The INT column requires four bytes of storage while the NVARCHAR(20) requires 42 bytes. This makes the INT more than ten times as efficient for both a foreign key and for NCI (Non-Clustered Index) lookups. At ten or even a thousand rows, a ten times reduction isn't too exciting... At larger scales a ten-fold increase in performance and reduction in storage and backup becomes much more important!

    When I do Data Modeling presentations, I always cover the difference between a PK and an NK (Natural Key), and advocate using both of them. The NVARCHAR(20) NK is what I recommend that you show to the user. The INT PK/SK (Surrogate Key) is what you use within the schema and the application. This way when the user or a legislator changes the definition or data type of the NK (see the chaos caused by the ICD-9 to ICD-10 conversion) your schema and app are relatively unaffected..

    -PatP
    Thanks for your explanation. I have understood everything. If I use an INT PK I will gain better performance in FK and JOINs. But I will also use a NK. Will be this design faster than mine (NVARCHAR(20) as PK)?

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Quote Originally Posted by VansFannel View Post
    Will be this design faster than mine (NVARCHAR(20) as PK)?
    The answer to this question depends on the scale... With only one row in your tables, the difference will be negligible, you won't be able to measure a difference that humans will care about. At a thousand rows, there will be a difference, but I doubt that you or your users will care. At a million rows, the difference will be important.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •