If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > PK: varchar or Int

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 08-24-04, 07:34
dr_suresh20 dr_suresh20 is offline
Registered User
 
Join Date: Sep 2003
Posts: 218
PK: varchar or Int

Hi,

I am using db2 ese v8.1.4a on win2k platform.

I have a generic question: Between varchar or Int data type, which is more suited for a primary key keeping performance impact in mind... I know it is purely dependent on business need and type of data to be stored..but, just in case if one has to make a choice - which one is performance friendly?

Thanks in avance.
Reply With Quote
  #2 (permalink)  
Old 08-24-04, 08:43
ARWinner ARWinner is offline
Registered User
 
Join Date: Jan 2003
Posts: 3,575
I cannot give you a definitive answer which one "Performs" better, but I will almost always use an Integer over Varchar and this is my reasoning.

1) int only uses 4 bytes, an equivalent varchar will need 10 bytes. This means that not only is the amount of data for the PK column less than half, but the same is true for all FKs that point to this PK. And in my opinion, smaller is faster.

2) all PK are indexes. Indexes take up space. Searching smaller columns sizes in an index has to be faster.

3) sort order. You know that using int that 1 < 2, but in varchar, it depends on the formatting.

HTH

Andy
Reply With Quote
  #3 (permalink)  
Old 08-25-04, 08:46
dr_suresh20 dr_suresh20 is offline
Registered User
 
Join Date: Sep 2003
Posts: 218
Thanks for the response.

But, I remember having read somewhere that integer are also internally stored as "packed strings" and therefore there is no major difference in performance between the two.

In any case, I will go ahead and make it integer.

Many thanks.
Reply With Quote
  #4 (permalink)  
Old 08-25-04, 08:55
ARWinner ARWinner is offline
Registered User
 
Join Date: Jan 2003
Posts: 3,575
An INTEGER in DB2 UDB for LUW takes up 4 bytes and has values that range from -2,147,483,648 to 2,147,483,647 (from the manual). This has to be stored as binary to accomplish this.

Andy
Reply With Quote
  #5 (permalink)  
Old 08-25-04, 09:00
sundialsvcs sundialsvcs is offline
Registered User
 
Join Date: Oct 2003
Posts: 706
Just how much "difference in performance" do you actually expect to receive? Not much in either case, I assure you. What matters is what's most appropriate for your application.
__________________
ChimneySweep(R): fast, automatic
table repair at a click of the
mouse! http://www.sundialservices.com
Reply With Quote
  #6 (permalink)  
Old 08-26-04, 03:49
dr_suresh20 dr_suresh20 is offline
Registered User
 
Join Date: Sep 2003
Posts: 218
I agree sundialsvcs, I am not expecting major boost in performance.

However, I need to make sure column definition matches business requirement and also bring in flexibility for eg: the column in question is a primary key therefore choice is between int or varchar. if varchar is chosen, then i think it is lot more flexible to generate unique ids (GUID 128 bit) easily but flip side is performance....and hence the question.

Thank you for your responses. I really appreciate it.
Reply With Quote
  #7 (permalink)  
Old 08-26-04, 04:07
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
Looks like you are considering fixed width column, have you considered CHAR ??

Also, make sure you select a proper clustering index ... If you do not call any index 'CLUSTERED', the first index you create(generally happens to be the primary index) will have the highest cluster ratio, which may or may not help in queries

HTH

Sathyaram
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #8 (permalink)  
Old 08-26-04, 10:01
famudba famudba is offline
Registered User
 
Join Date: Jan 2004
Location: Tallahassee, FL, USA
Posts: 96
char or int is the best for PK , but not varchar , this will cost you more.

never use varchar column as PK , that need to do 2 things for every row you insert.


hope it may help you


Thank You
__________________
Lekharaju Ennam
Certified Oracle8i & DB UDB DBA
Florida A&M University
Reply With Quote
  #9 (permalink)  
Old 08-30-04, 05:58
dr_suresh20 dr_suresh20 is offline
Registered User
 
Join Date: Sep 2003
Posts: 218
Actually, defining PK as char would make our application less flexible. So, choice is between int (which is existing) or varchar. I am not fully convinced that varchar is harmful... I see major applications have gone varchar way.

I liked the idea of Clustering... just wondering how best i can fit this to my scenario.

Once again thank you for your reply.
Reply With Quote
  #10 (permalink)  
Old 08-30-04, 13:47
brunod brunod is offline
Registered User
 
Join Date: Jun 2004
Posts: 9
Exclamation Primary keys should be business independent

Hi,

In the original post, I read the following:

" I know it is purely dependent on business need and type of data to be stored.."

In my opinion, primary keys should be independent of business requirements and type of data. Primary keys are not ment to store business data and should never be updated. The only purpose of primary keys is to constrain data to distinct values and to relate data found in other tables through parent-child relationships. This said, I always opt for integers since they are much easier to manipulate.

Hope this helps!

Bruno
Reply With Quote
  #11 (permalink)  
Old 08-30-04, 15:28
dbamota dbamota is offline
Registered User
 
Join Date: Sep 2003
Posts: 237
In general, more compact the data it's better; if I/O is involved so that you get more records with one I/O(index scan). Without knowing the PLAN that SQL uses, I don't think there is a right answer. When you consider that you cannot have more than 256 records per page, you are wasting a lot of space if you use 8k,16k,or 32k pages with a small index
__________________
mota
Reply With Quote
  #12 (permalink)  
Old 08-30-04, 17:31
r937 r937 is online now
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
Quote:
Originally Posted by brunod
In my opinion, primary keys should be independent of business requirements and type of data. Primary keys are not ment to store business data and should never be updated.
i'm glad you said it was just your opinion

primary keys arise in logical modelling when it is vitally important that you identify the business entities

and, of course, you cannot **identify** anything without the ability to say "this one" or "that one" or "no, the other one" -- in other words, to identify it uniquely

thus the primary key always starts out as a logical concept, which may carry over into physical design

now, if, during physical design, someone decides that it is wise to use a surrogate key as the dbms PRIMARY KEY, that's fine

isn't it amazing how often you see people who use a surrogate primary key forget to declare a UNIQUE constraint on the alternate key (what i like to call the "real" primary key) and then come running to database forums with questions like "how do i remove duplicate rows"

it is perfectly valid to have a primary key in a table that is a natural key, i.e. business data

and yes, primary keys do change

that is why ON UPDATE CASCADE was invented

__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #13 (permalink)  
Old 08-31-04, 13:08
n_i n_i is online now
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by dr_suresh20
I am not expecting major boost in performance.
If you consider that integer comparison is a single processor cycle operation, while comparing two strings will require calling strcmp(), you may change your expectations :-)

If the table will be subject to index range scans by multiple applications simultaneously I think there will be some noticeable difference in performance between INT and VARCHAR(128) keys.
Reply With Quote
  #14 (permalink)  
Old 09-01-04, 09:36
dr_suresh20 dr_suresh20 is offline
Registered User
 
Join Date: Sep 2003
Posts: 218
Let me conduct some more tests both on db2 and oracle before I conclude this topic...

Once again, thank you all for your valuable contribution. I really appreciate it.
Reply With Quote
  #15 (permalink)  
Old 09-07-04, 01:30
dr_suresh20 dr_suresh20 is offline
Registered User
 
Join Date: Sep 2003
Posts: 218
sorry to bring this up again..

I did a test on db2 database and here are the results.

Case 1: PK integer - 65536 records - Total cost - 75.0297 - Execution Time - 0.003072 ms

Case 2: PK varchar - 65536 records - Total cost - 75.0304 - Execution Time - 0.037570 ms

Index on PK used.

Difference of 0.0007 in cost and .034498 ms in time.

Please note that I had set current mode = explain only...

Cost wise, I don't see much change..but, execution is 10 times the original value!! and this is a lot of difference if you considering huge chunks of data..

Any thoughts?
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On