Results 1 to 15 of 15
  1. #1
    Join Date
    Sep 2003
    Posts
    218

    Unanswered: PK: varchar or Int

    Hi,

    I am using db2 ese v8.1.4a on win2k platform.

    I have a generic question: Between varchar or Int data type, which is more suited for a primary key keeping performance impact in mind... I know it is purely dependent on business need and type of data to be stored..but, just in case if one has to make a choice - which one is performance friendly?

    Thanks in avance.

  2. #2
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    I cannot give you a definitive answer which one "Performs" better, but I will almost always use an Integer over Varchar and this is my reasoning.

    1) int only uses 4 bytes, an equivalent varchar will need 10 bytes. This means that not only is the amount of data for the PK column less than half, but the same is true for all FKs that point to this PK. And in my opinion, smaller is faster.

    2) all PK are indexes. Indexes take up space. Searching smaller columns sizes in an index has to be faster.

    3) sort order. You know that using int that 1 < 2, but in varchar, it depends on the formatting.

    HTH

    Andy

  3. #3
    Join Date
    Sep 2003
    Posts
    218
    Thanks for the response.

    But, I remember having read somewhere that integer are also internally stored as "packed strings" and therefore there is no major difference in performance between the two.

    In any case, I will go ahead and make it integer.

    Many thanks.

  4. #4
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    An INTEGER in DB2 UDB for LUW takes up 4 bytes and has values that range from -2,147,483,648 to 2,147,483,647 (from the manual). This has to be stored as binary to accomplish this.

    Andy

  5. #5
    Join Date
    Oct 2003
    Posts
    706
    Just how much "difference in performance" do you actually expect to receive? Not much in either case, I assure you. What matters is what's most appropriate for your application.
    ChimneySweep(R): fast, automatic
    table repair at a click of the
    mouse! http://www.sundialservices.com

  6. #6
    Join Date
    Sep 2003
    Posts
    218
    I agree sundialsvcs, I am not expecting major boost in performance.

    However, I need to make sure column definition matches business requirement and also bring in flexibility for eg: the column in question is a primary key therefore choice is between int or varchar. if varchar is chosen, then i think it is lot more flexible to generate unique ids (GUID 128 bit) easily but flip side is performance....and hence the question.

    Thank you for your responses. I really appreciate it.

  7. #7
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    Looks like you are considering fixed width column, have you considered CHAR ??

    Also, make sure you select a proper clustering index ... If you do not call any index 'CLUSTERED', the first index you create(generally happens to be the primary index) will have the highest cluster ratio, which may or may not help in queries

    HTH

    Sathyaram
    Visit the new-look IDUG Website , register to gain access to the excellent content.

  8. #8
    Join Date
    Jan 2004
    Location
    Tallahassee, FL, USA
    Posts
    96
    char or int is the best for PK , but not varchar , this will cost you more.

    never use varchar column as PK , that need to do 2 things for every row you insert.


    hope it may help you


    Thank You
    Lekharaju Ennam
    Certified Oracle8i & DB UDB DBA
    Florida A&M University

  9. #9
    Join Date
    Sep 2003
    Posts
    218
    Actually, defining PK as char would make our application less flexible. So, choice is between int (which is existing) or varchar. I am not fully convinced that varchar is harmful... I see major applications have gone varchar way.

    I liked the idea of Clustering... just wondering how best i can fit this to my scenario.

    Once again thank you for your reply.

  10. #10
    Join Date
    Jun 2004
    Posts
    9

    Exclamation Primary keys should be business independent

    Hi,

    In the original post, I read the following:

    " I know it is purely dependent on business need and type of data to be stored.."

    In my opinion, primary keys should be independent of business requirements and type of data. Primary keys are not ment to store business data and should never be updated. The only purpose of primary keys is to constrain data to distinct values and to relate data found in other tables through parent-child relationships. This said, I always opt for integers since they are much easier to manipulate.

    Hope this helps!

    Bruno

  11. #11
    Join Date
    Sep 2003
    Posts
    237
    In general, more compact the data it's better; if I/O is involved so that you get more records with one I/O(index scan). Without knowing the PLAN that SQL uses, I don't think there is a right answer. When you consider that you cannot have more than 256 records per page, you are wasting a lot of space if you use 8k,16k,or 32k pages with a small index
    mota

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by brunod
    In my opinion, primary keys should be independent of business requirements and type of data. Primary keys are not ment to store business data and should never be updated.
    i'm glad you said it was just your opinion

    primary keys arise in logical modelling when it is vitally important that you identify the business entities

    and, of course, you cannot **identify** anything without the ability to say "this one" or "that one" or "no, the other one" -- in other words, to identify it uniquely

    thus the primary key always starts out as a logical concept, which may carry over into physical design

    now, if, during physical design, someone decides that it is wise to use a surrogate key as the dbms PRIMARY KEY, that's fine

    isn't it amazing how often you see people who use a surrogate primary key forget to declare a UNIQUE constraint on the alternate key (what i like to call the "real" primary key) and then come running to database forums with questions like "how do i remove duplicate rows"

    it is perfectly valid to have a primary key in a table that is a natural key, i.e. business data

    and yes, primary keys do change

    that is why ON UPDATE CASCADE was invented

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by dr_suresh20
    I am not expecting major boost in performance.
    If you consider that integer comparison is a single processor cycle operation, while comparing two strings will require calling strcmp(), you may change your expectations :-)

    If the table will be subject to index range scans by multiple applications simultaneously I think there will be some noticeable difference in performance between INT and VARCHAR(128) keys.

  14. #14
    Join Date
    Sep 2003
    Posts
    218
    Let me conduct some more tests both on db2 and oracle before I conclude this topic...

    Once again, thank you all for your valuable contribution. I really appreciate it.

  15. #15
    Join Date
    Sep 2003
    Posts
    218
    sorry to bring this up again..

    I did a test on db2 database and here are the results.

    Case 1: PK integer - 65536 records - Total cost - 75.0297 - Execution Time - 0.003072 ms

    Case 2: PK varchar - 65536 records - Total cost - 75.0304 - Execution Time - 0.037570 ms

    Index on PK used.

    Difference of 0.0007 in cost and .034498 ms in time.

    Please note that I had set current mode = explain only...

    Cost wise, I don't see much change..but, execution is 10 times the original value!! and this is a lot of difference if you considering huge chunks of data..

    Any thoughts?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •