| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

08-24-04, 07:34
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 218
|
|
|
PK: varchar or Int
|
|
Hi,
I am using db2 ese v8.1.4a on win2k platform.
I have a generic question: Between varchar or Int data type, which is more suited for a primary key keeping performance impact in mind... I know it is purely dependent on business need and type of data to be stored..but, just in case if one has to make a choice - which one is performance friendly?
Thanks in avance.
|
|

08-24-04, 08:43
|
|
Registered User
|
|
Join Date: Jan 2003
Posts: 3,575
|
|
I cannot give you a definitive answer which one "Performs" better, but I will almost always use an Integer over Varchar and this is my reasoning.
1) int only uses 4 bytes, an equivalent varchar will need 10 bytes. This means that not only is the amount of data for the PK column less than half, but the same is true for all FKs that point to this PK. And in my opinion, smaller is faster.
2) all PK are indexes. Indexes take up space. Searching smaller columns sizes in an index has to be faster.
3) sort order. You know that using int that 1 < 2, but in varchar, it depends on the formatting.
HTH
Andy
|
|

08-25-04, 08:46
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 218
|
|
|
|
Thanks for the response.
But, I remember having read somewhere that integer are also internally stored as "packed strings" and therefore there is no major difference in performance between the two.
In any case, I will go ahead and make it integer.
Many thanks.
|
|

08-25-04, 08:55
|
|
Registered User
|
|
Join Date: Jan 2003
Posts: 3,575
|
|
An INTEGER in DB2 UDB for LUW takes up 4 bytes and has values that range from -2,147,483,648 to 2,147,483,647 (from the manual). This has to be stored as binary to accomplish this.
Andy
|
|

08-25-04, 09:00
|
|
Registered User
|
|
Join Date: Oct 2003
Posts: 706
|
|
Just how much "difference in performance" do you actually expect to receive? Not much in either case, I assure you. What matters is what's most appropriate for your application.
|
|

08-26-04, 03:49
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 218
|
|
I agree sundialsvcs, I am not expecting major boost in performance.
However, I need to make sure column definition matches business requirement and also bring in flexibility for eg: the column in question is a primary key therefore choice is between int or varchar. if varchar is chosen, then i think it is lot more flexible to generate unique ids (GUID 128 bit) easily but flip side is performance....and hence the question.
Thank you for your responses. I really appreciate it.
|
|

08-26-04, 04:07
|
|
Super Moderator
|
|
Join Date: Aug 2001
Location: UK
Posts: 4,534
|
|
Looks like you are considering fixed width column, have you considered CHAR ??
Also, make sure you select a proper clustering index ... If you do not call any index 'CLUSTERED', the first index you create(generally happens to be the primary index) will have the highest cluster ratio, which may or may not help in queries
HTH
Sathyaram
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
|
|

08-26-04, 10:01
|
|
Registered User
|
|
Join Date: Jan 2004
Location: Tallahassee, FL, USA
Posts: 96
|
|
char or int is the best for PK , but not varchar , this will cost you more.
never use varchar column as PK , that need to do 2 things for every row you insert.
hope it may help you
Thank You
__________________
Lekharaju Ennam
Certified Oracle8i & DB UDB DBA
Florida A&M University
|
|

08-30-04, 05:58
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 218
|
|
Actually, defining PK as char would make our application less flexible. So, choice is between int (which is existing) or varchar. I am not fully convinced that varchar is harmful... I see major applications have gone varchar way.
I liked the idea of Clustering... just wondering how best i can fit this to my scenario.
Once again thank you for your reply.
|
|

08-30-04, 13:47
|
|
Registered User
|
|
Join Date: Jun 2004
Posts: 9
|
|
Primary keys should be business independent
Hi,
In the original post, I read the following:
" I know it is purely dependent on business need and type of data to be stored.."
In my opinion, primary keys should be independent of business requirements and type of data. Primary keys are not ment to store business data and should never be updated. The only purpose of primary keys is to constrain data to distinct values and to relate data found in other tables through parent-child relationships. This said, I always opt for integers since they are much easier to manipulate.
Hope this helps!
Bruno
|
|

08-30-04, 15:28
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 237
|
|
In general, more compact the data it's better; if I/O is involved so that you get more records with one I/O(index scan). Without knowing the PLAN that SQL uses, I don't think there is a right answer. When you consider that you cannot have more than 256 records per page, you are wasting a lot of space if you use 8k,16k,or 32k pages with a small index
__________________
mota
|
|

08-30-04, 17:31
|
|
SQL Consultant
|
|
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
|
|
Quote:
|
Originally Posted by brunod
In my opinion, primary keys should be independent of business requirements and type of data. Primary keys are not ment to store business data and should never be updated.
|
i'm glad you said it was just your opinion
primary keys arise in logical modelling when it is vitally important that you identify the business entities
and, of course, you cannot **identify** anything without the ability to say "this one" or "that one" or "no, the other one" -- in other words, to identify it uniquely
thus the primary key always starts out as a logical concept, which may carry over into physical design
now, if, during physical design, someone decides that it is wise to use a surrogate key as the dbms PRIMARY KEY, that's fine
isn't it amazing how often you see people who use a surrogate primary key forget to declare a UNIQUE constraint on the alternate key (what i like to call the "real" primary key) and then come running to database forums with questions like "how do i remove duplicate rows"
it is perfectly valid to have a primary key in a table that is a natural key, i.e. business data
and yes, primary keys do change
that is why ON UPDATE CASCADE was invented

|
|

08-31-04, 13:08
|
|
:-)
|
|
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
|
|
Quote:
|
Originally Posted by dr_suresh20
I am not expecting major boost in performance.
|
If you consider that integer comparison is a single processor cycle operation, while comparing two strings will require calling strcmp(), you may change your expectations :-)
If the table will be subject to index range scans by multiple applications simultaneously I think there will be some noticeable difference in performance between INT and VARCHAR(128) keys.
|
|

09-01-04, 09:36
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 218
|
|
Let me conduct some more tests both on db2 and oracle before I conclude this topic...
Once again, thank you all for your valuable contribution. I really appreciate it.
|
|

09-07-04, 01:30
|
|
Registered User
|
|
Join Date: Sep 2003
Posts: 218
|
|
sorry to bring this up again..
I did a test on db2 database and here are the results.
Case 1: PK integer - 65536 records - Total cost - 75.0297 - Execution Time - 0.003072 ms
Case 2: PK varchar - 65536 records - Total cost - 75.0304 - Execution Time - 0.037570 ms
Index on PK used.
Difference of 0.0007 in cost and .034498 ms in time.
Please note that I had set current mode = explain only...
Cost wise, I don't see much change..but, execution is 10 times the original value!! and this is a lot of difference if you considering huge chunks of data..
Any thoughts?
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|