If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > CHAR vs VARCHAR storage efficiency

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-15-07, 23:32
jeremym jeremym is offline
Registered User
 
Join Date: Oct 2007
Posts: 7
CHAR vs VARCHAR storage efficiency

Hi all,

I was pointed at this DB2 v8.1 developer works document http://www.ibm.com/developerworks/db...hur/index.html, specifically this bit under "CREATE TABLE options":-
Avoid using the VARCHAR data type for columns of 30 bytes or less as it typically wastes space; instead use CHAR. Wasting space can even affect query times if the volume is significant.
This is rather intriguing, and does raise some questions:
* Why 30 bytes?
* Does this still hold with 9.1.2 on linux given that the article was written for 8.1?
* Are there any other articles that people know of that describe this in more detail?
* Is the overhead for a varchar field that much? I thought it was two bytes for the length and one more byte if the field is nullable - a maximum overhead for a varchar field of three extra bytes.

I would like to understand how this works. At this early stage in a project this would be good to take on board if I can get an idea of how much space can be saved.

any pointers would be appreciated
thanks.
Reply With Quote
  #2 (permalink)  
Old 10-15-07, 23:41
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
The VARCHAR overhead is as you stated (2 extra bytes to store the length). The null byte applies to all columns.

In addition, DB2 requries some extra CPU time to process VARCHARs compared to CHAR. But VARCHAR can potentially use less memory if the row is smaller, which could improve the bufferpool hit ratio (more rows can fit in a page in memory).

Do the math yourself, and consider the CPU load, disk space, memory, etc of your server. The amount of space saved depends on your data. Don't get worked up about it, and there are no hard and fast rules.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #3 (permalink)  
Old 10-16-07, 05:45
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
This is micro-tuning only. If you really have performance problems, you may consider switching from VARCHAR to CHAR for short strings. But that is very, very rarely an issue. You can usually gain more performance improvements with other things like proper buffer pool tuning, good indexes, denormalization of tables, .... Unless all that is covered, a proper data model is much more important than getting an additional 0.001% (just an arbitrary number!) query speed-up.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #4 (permalink)  
Old 10-16-07, 21:32
jeremym jeremym is offline
Registered User
 
Join Date: Oct 2007
Posts: 7
Thanks Marcus and Knut for your replies.


A quick investigation shows that the space comments don't stack up. Comparing inserts and selects for char vs. varchar showed in a very artificial test that varchar inserts were marginally slower and selects were marginally faster. I expect that like a lot of db tuning, the answer is - that it all depends.

I suppose really that I was trying to understand the justification '30 bytes or less as it typically wastes space' statement.

cheers,
Reply With Quote
  #5 (permalink)  
Old 10-17-07, 04:41
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
I guess that the 30 bytes are just an arbitrarily chosen number.

It really depends on the average length on the strings in the column. If the average length + 2 byte for the length is shorter than the maximum length (30 in this case), then VARCHAR consumes less space than CHAR. I believe that the assumption of the author was that VARCHAR(30)/CHAR(30) and smaller are usually used more or less to their limits, so that the length information introduced by VARCHAR adds overhead. Whether this assumption is truly applicable in your environment, is something you have to decide for yourself.

I'm convinced that performance and space arguments for such a comparison are more or less useless and not applicable. No one cares about 2 additional bytes per row (or 10 bytes per row with multiple VARCHARs) these days. Even if you have millions or billions of rows - you would have lots of disc space anyway and adding a disk isn't such a big deal. And already busted the performance claim. So do a good data model design and stick with that.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #6 (permalink)  
Old 10-17-07, 06:06
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
The important thing about row length (which is determined by the sum of the column lengths) is memory (specifically how much of the data you can keop in the DB2 bufferpools). DB2 performance is all about memory and avoiding synchronous physical I/O.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #7 (permalink)  
Old 10-17-07, 17:18
jeremym jeremym is offline
Registered User
 
Join Date: Oct 2007
Posts: 7
Thanks Marcus and Knut, I think I understand it now...

If the varchar column length is less than 30 and the field is usually filled to less than three bytes (the overhead of a nullable varchar) of the maximum field size, then consider saving the cost of the overhead by using a char data type instead.

This saving could then have a positive impact on rowsize and what will fit in buffer pools etc.
Reply With Quote
  #8 (permalink)  
Old 10-18-07, 13:52
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Only 2 bytes.

The null indicator is applicable to both, CHAR and VARCHAR and, therefore, must not be taken into consideration here.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #9 (permalink)  
Old 10-19-07, 02:30
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Is compressed data really stored in compressed form in the bufferpool?
Is this true for both LUW and z/OS ?
Is decompression / compression then part of the access path "work" by the data manager?
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On