Results 1 to 9 of 9
  1. #1
    Join Date
    Oct 2007
    Posts
    7

    Unanswered: CHAR vs VARCHAR storage efficiency

    Hi all,

    I was pointed at this DB2 v8.1 developer works document http://www.ibm.com/developerworks/db...hur/index.html, specifically this bit under "CREATE TABLE options":-

    Avoid using the VARCHAR data type for columns of 30 bytes or less as it typically wastes space; instead use CHAR. Wasting space can even affect query times if the volume is significant.

    This is rather intriguing, and does raise some questions:
    * Why 30 bytes?
    * Does this still hold with 9.1.2 on linux given that the article was written for 8.1?
    * Are there any other articles that people know of that describe this in more detail?
    * Is the overhead for a varchar field that much? I thought it was two bytes for the length and one more byte if the field is nullable - a maximum overhead for a varchar field of three extra bytes.

    I would like to understand how this works. At this early stage in a project this would be good to take on board if I can get an idea of how much space can be saved.

    any pointers would be appreciated
    thanks.

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    The VARCHAR overhead is as you stated (2 extra bytes to store the length). The null byte applies to all columns.

    In addition, DB2 requries some extra CPU time to process VARCHARs compared to CHAR. But VARCHAR can potentially use less memory if the row is smaller, which could improve the bufferpool hit ratio (more rows can fit in a page in memory).

    Do the math yourself, and consider the CPU load, disk space, memory, etc of your server. The amount of space saved depends on your data. Don't get worked up about it, and there are no hard and fast rules.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    This is micro-tuning only. If you really have performance problems, you may consider switching from VARCHAR to CHAR for short strings. But that is very, very rarely an issue. You can usually gain more performance improvements with other things like proper buffer pool tuning, good indexes, denormalization of tables, .... Unless all that is covered, a proper data model is much more important than getting an additional 0.001% (just an arbitrary number!) query speed-up.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  4. #4
    Join Date
    Oct 2007
    Posts
    7
    Thanks Marcus and Knut for your replies.


    A quick investigation shows that the space comments don't stack up. Comparing inserts and selects for char vs. varchar showed in a very artificial test that varchar inserts were marginally slower and selects were marginally faster. I expect that like a lot of db tuning, the answer is - that it all depends.

    I suppose really that I was trying to understand the justification '30 bytes or less as it typically wastes space' statement.

    cheers,

  5. #5
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I guess that the 30 bytes are just an arbitrarily chosen number.

    It really depends on the average length on the strings in the column. If the average length + 2 byte for the length is shorter than the maximum length (30 in this case), then VARCHAR consumes less space than CHAR. I believe that the assumption of the author was that VARCHAR(30)/CHAR(30) and smaller are usually used more or less to their limits, so that the length information introduced by VARCHAR adds overhead. Whether this assumption is truly applicable in your environment, is something you have to decide for yourself.

    I'm convinced that performance and space arguments for such a comparison are more or less useless and not applicable. No one cares about 2 additional bytes per row (or 10 bytes per row with multiple VARCHARs) these days. Even if you have millions or billions of rows - you would have lots of disc space anyway and adding a disk isn't such a big deal. And already busted the performance claim. So do a good data model design and stick with that.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    The important thing about row length (which is determined by the sum of the column lengths) is memory (specifically how much of the data you can keop in the DB2 bufferpools). DB2 performance is all about memory and avoiding synchronous physical I/O.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  7. #7
    Join Date
    Oct 2007
    Posts
    7
    Thanks Marcus and Knut, I think I understand it now...

    If the varchar column length is less than 30 and the field is usually filled to less than three bytes (the overhead of a nullable varchar) of the maximum field size, then consider saving the cost of the overhead by using a char data type instead.

    This saving could then have a positive impact on rowsize and what will fit in buffer pools etc.

  8. #8
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Only 2 bytes.

    The null indicator is applicable to both, CHAR and VARCHAR and, therefore, must not be taken into consideration here.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  9. #9
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Is compressed data really stored in compressed form in the bufferpool?
    Is this true for both LUW and z/OS ?
    Is decompression / compression then part of the access path "work" by the data manager?
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •