Results 1 to 4 of 4
  1. #1
    Join Date
    Sep 2003
    Posts
    1

    Unanswered: Varchar Usage - optimization

    Greetings

    My name is Tom Bigbee – I am the data architect for Unisys on a project in Louisiana to rewrite the statewide OMV system.

    Currently the user has an IBM mainframe running DB2 v7.0. We will be upgrading the user’s box to run v8.1. The application developers are using Websphere and would very much like to utilize variable length character strings (varchars) on the DB instance. The client DBA is against using varchars and lists the reasons below.

    When a single varchar field that is not physically located at the very end of a file, is updated within a table, the entire row is copied into the log file. If a single char field physically located anywhere in the file, is updated, then only that char field gets copied to the log file.

    Varchars should not be used within an index.

    Varchars automatically take up two extra bytes; therefore they waste space

    Chars take up less CPU cycles because the database does not have to figure out how much room the string is taking.

    When using varchars, the administrative processes used to optimize the database, have to be run much more frequently.


    The concerns that I put forth were…

    For application development when you have to trim a string you actually create another sting object – this is wasteful and takes up memory and CPU cycles

    If a char is not null and a single char is entered, then all spaces within that character sting are used as well

    Thru put Performance can suffer if a large number of fixed character strings containing quite a few empty spaces are streamed over the wire as opposed to the more compressed stream if varchars were being utilized.

    There has already been some code laid down (EJB) on an existing schema containing varchars.


    My questions are there
    1) Is there anything to the points put forth by the DBA?
    2) Should varchars be avoided?
    3) Could you point me to any white papers concerning any of the above?

    Thank you, Tom Bigbee

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    I am not sure which platform you are referring to. Normally mainframe means OS/390 OS, but I don't think version 8.1 is available yet on that platform (Maybe I am wrong?). But here are answers to the best of my knowledge. Others may have more detailed insights in these issues.

    When a single varchar field that is not physically located at the very end of a file, is updated within a table, the entire row is copied into the log file. If a single char field physically located anywhere in the file, is updated, then only that char field gets copied to the log file.

    When the length of varchar data is changed (the number of bytes that are used is changed) DB2 has to log from that point to the end of row.

    Varchars should not be used within an index.

    Varchars can be used, but it is not as efficient because of the extra 2 bytes, and extra CPU processing. Since indexes are smaller than tables, the extra 2 bytes is larger percent of the index compared to the table (assuming that not all columns are varchar). Keeping the indexes smaller also increases the chances of having them stay in the buffer pool.

    Varchars automatically take up two extra bytes; therefore they waste space.

    Yes, 2 bytes (half-word binary) to indicate the length of used space in the column.

    Chars take up less CPU cycles because the database does not have to figure out how much room the string is taking.

    Yes.

    When using varchars, the administrative processes used to optimize the database, have to be run much more frequently.

    Not sure exactly what they are referring to, but one example is that when the data in a varchar is increased in size, the row many no longer fit in its old space, so it has to be moved to new page. This requires more frequent reorgs to be run.

    My questions are there
    1) Is there anything to the points put forth by the DBA?


    Yes.

    2) Should varchars be avoided?

    Should be avoided unless you are saving enough disk space to justify its use. Varchars should typically be used on large char fields that are often not fully filled with data and are not frequently updated.

    3) Could you point me to any white papers concerning any of the above?

    You can search the Master Index of the DB2 manuals, or possibly look for some IBM Redbooks that might discuss this. I don't have any specific reference.

  3. #3
    Join Date
    Jun 2003
    Location
    Canada
    Posts
    37
    Some programming languages (such as Java) do not pad, so trailing blanks are considered significant characters.

    If you use CHAR, and you have defined a string of CHAR(4), 'ABC' is not equal to 'ABC '. You have to use a trim() or equivalent method to eliminate the trailing blanks. This is a burden on the application programmer and is an obvious source of application coding errors. In addition, this operation increases the application CPU cost (typically, by more than the cost of using VARCHAR in DB2). However, this cost is only incurred by non-host applications (no impact on accessing the same data from CICS, QMF, etc.)

    If you use VARCHAR, the in-DB2 cost will be somewhat higher, as Marcus pointed out. This cost will be paid by all applications that access the data (Java, CICS, COBOL, QMF/TSO, etc.). The advantage of this approach is that it makes life easier for the (Java, etc.) application programmer.

    The decision depends on project priorities.

    Cheers,

    Julius

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    If you are talking about using varchar for everything, I think you have to calculate the amount of extra space needed by varchar in DB2 (2 bytes per column). This decreases the number of rows per page, which decreases the amount of data that will fit in DB2 bufferpools of both the data and indexes. Bufferpool utilization is the key to good performance in DB2. So it may not just be a matter of extra CPU time, but also a lot of extra disk I/O to account for less data that will fit in the bufferpools.

    As previously discussed, if you update a varchar column in DB2 that is larger than the previous value, the row will probably not fit back in the same location. If this happens a lot, you are talking about serious performance hits. Logging will also be slowed down as DB2 is required to write more data to the log buffer and the active logs on disk.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •