Page 1 of 3 123 LastLast
Results 1 to 15 of 32

Thread: Char & Varchar2

  1. #1
    Join Date
    Jun 2008
    Location
    India
    Posts
    96

    Unanswered: Char & Varchar2

    Hi All,

    I want to know the memory (no of bytes to store the data) difference between CHAR & VARCHAR2 datatype.

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    A CHAR(n) always stores the full n characters, with blanks to pad out the end if required. A VARCHAR2(n) holds 0 to n characters - the V standard for "variable length". The DUMP function shows this:
    Code:
    SQL> create table t1 (c char(100), v varchar2(100));
    
    Table created.
    
    SQL> insert into t1 values ('a', 'b');
    
    1 row created.
    
    SQL> select dump(c), dump(v) from t1;
    
    DUMP(C)
    ----------------------------------------------------------------------------------------------------
    DUMP(V)
    ----------------------------------------------------------------------------------------------------
    Typ=96 Len=100: 97,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,
    ,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,
    ,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32
    Typ=1 Len=1: 98
    As a rule you should always use VARCHAR2, there is no benefit in using CHAR (but plenty of drawbacks).

  3. #3
    Join Date
    Jun 2008
    Location
    India
    Posts
    96
    Suppose if i want to store flag values either 'Y' or 'N'. Here always the length be 1 only. in this case, do i use CHAR() or VARCHAR2() ?

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    VARCHAR2. As I said, always use VARCHAR2.

    True, in this case the 2 will be identical - but that means there is no advantage in using the non-standard CHAR for such columns.

  5. #5
    Join Date
    Jun 2008
    Location
    India
    Posts
    96
    #1)

    create table flag_val
    (
    flag char(1)
    )

    insert into t1 values ('Y')
    insert into t1 values ('N')


    #2)

    create table flag_val
    (
    flag varchar2(1)
    )

    insert into t1 values ('Y')
    insert into t1 values ('N')


    which one would you prefer? why?

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    I would prefer VARCHAR2, because there's nothing special about a 1 character field, so why use a different datatype for them? But as I said, the two will be identical in reality, so feel free to use CHAR(1) if it makes you happier to do so.

  7. #7
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    (I was referred to this thread by a Sybase reader.)

    I realise this is the Oracle, not the Sybase, forum.

    Storage & Processing

    In Sybase ASE, each var-length column takes 4 additional bytes (to store column position plus length); each var-length row requires a Column Offset table at the end of each row (an additional 4 bytes per row on top of the 4 bytes per column). It is not physically possible to manipulate these structures without additional processing overhead (over fixed-length, which does not require these structures, as the row offset in the page and the column offset in each row, is fixed, and known).

    There are further processing overheads such as (just one example) resulting from row-size changes due to updates, which requires other rows to be shifted on the page upon one row being updated. Plus different handling methods for the different lock schemes. Plus Direct vs Deferred Writes should also be considered. All this is documented in the P&T Basics manual, Ch 8 Data Storage.

    Therefore, for short columns, var-length instead of char takes more storage and performs much slower than fixed-length.

    For wide columns, no doubt there is a savings in overall storage for the table (actual vs average vs max) obtained from var-length over fixed-length, but that must be evaluated against the negative performance impact of var-length. If performance is a criterion, I would think that disk space is cheap these days.

    Index

    If the var-length column is indexed, there is an additional order of magnitude of overhead: every index entry in the B-Tree has to be (a) stored with the 4 additional bytes and (b) unpacked on every access (think about selecting say 2% of the rows via the index, which requires all 100% of the index entries to be unpacked and evaluated). Therefore indices on var-length columns should be avoided at all costs.

    Sybase and DB2 are the same in this regard. While not being an Oracle expert, I imagine that Oracle requires similar additional structures for var-length columns and rows; I cannot imagine that it can manipulate var-length columns and rows the same as, or faster, than fixed-length columns.

  8. #8
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Derek,

    Thanks for the info about how Sybase works. However, nothing you said is applicable to Oracle. In Oracle:
    • The storage requirements to hold an n character string in CHAR(n) and VARCHAR2(n) are identical for all n (including n=1). Yes, Oracle stores the length in a few (1 to 3) bytes - for both VARCHAR2 and CHAR columns. For strings of length m < n, VARCHAR2 will be smaller (1 to 3 length bytes + m characters rather than 1 to 3 bytes + n characters).
    • The performance will be the same or better with VARCHAR2 rather than CHAR, since CHAR by definition wastes space if the stored strings are smaller than the maximum size of the column, leading to larger table and index sizes.

    Nobody should use CHAR in Oracle based on inapplicable information regarding Sybase!

  9. #9
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    I will not dispute your Oracle knowledge.

    But one thing sticks in my throat. While I accept that "CHAR wastes space if the stored strings are smaller than the maximum size of the column", I fail to understand how that gets extrapolated into "The performance will be the same or better with VARCHAR2 rather than CHAR". With var-anything, there is substantial additional processing; even if we accept that Oracle adds this unnecessary overhead for fixed-length columns as thus the processing is the "same" ... how does more or less space on disk (calling it "waste" is misleading) make it perform "same or better".

    Might be good if you have an example of a large tables with the actual sizes for index trees, clustered index, data (heaps). A few indices with and without varchar strings.
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  10. #10
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Quote Originally Posted by Derek Asirvadem
    how does more or less space on disk (calling it "waste" is misleading) make it perform "same or better".
    Say we have (very extreme case) a table T1 (COL CHAR(4000)) and a table T2 (COL VARCHAR2(4000)), and we insert 1 million rows into each with COL='X'. T1 will occupy (very roughly) 4 GB (around 500,000 blocks), T2 will occupy 2MB (around 250 blocks). Clearly a full table scan on T2 will be a lot faster than a full table scan on T1. Any indexes will be similarly different in size, leading to more b-tree levels in T1's indexes - leading to a (perhaps very small) performance advantage for T2.

    As I admitted already, that is an extreme case. But the best case for T1 is when we insert 4000-char values into both tables: at this point the table and index sizes, and performance, will be identical. So VARCHAR2 performs at worst the same, and in other cases better than CHAR. In Oracle.

    And if we are storing blank spaces that we don't really want, I think it's fair to call it wasted space.
    Might be good if you have an example of a large tables with the actual sizes for index trees, clustered index, data (heaps). A few indices with and without varchar strings.
    I agree, and I'll try to do that soon when time permits and post back here. There's no such thing as a "clustered index" in Oracle by the way. I could show the sizes (in 8K blocks) of the 2 tables and their indexes, and the depths of the indexes.

  11. #11
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Right, I said I'd report back with an example - here goes.

    To generate some test data I have used the Oracle ALL_OBJECTS view which has a row for every table, index, stored procedure etc. in the database. In my DB this currently has 85,409 rows. The column OBJECT_NAME contains object names (no surprise there), which are a maximum of 30 characters long. I found that the average name length is 19.26. Now I use this to populate two new tables:

    Code:
    SQL> create table t1 (id integer, name char(30));
    
    Table created.
    
    SQL> create table t2 (id integer, name varchar2(30));
    
    Table created.
    
    SQL> insert into t1 select object_id, object_name from all_objects;
    
    85409 rows created.
    
    SQL> insert into t2 select object_id, object_name from all_objects;
    
    85409 rows created.
    I add an index on name to each table:
    Code:
    SQL> create index t1_idx on t1(name);
    
    Index created.
    
    SQL> create index t2_idx on t2(name);
    
    Index created.
    Now I analyze the tables to generate statistics about them:
    Code:
    SQL> analyze table t1 compute statistics;
    
    Table analyzed.
    
    SQL> analyze table t2 compute statistics;
    
    Table analyzed.
    Now let's view the relevant table stats:
    Code:
    SQL> select table_name, num_rows, blocks, avg_row_len
      2  from user_tables
      3  where table_name in ('T1','T2');
    
    TABLE_NAME                       NUM_ROWS     BLOCKS AVG_ROW_LEN
    ------------------------------ ---------- ---------- -----------
    T1                                  85409        483          39
    T2                                  85409        360          28
    
    SQL> select index_name, blevel, leaf_blocks
      2  from user_indexes
      3  where table_name in ('T1','T2');
    
    INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ---------- -----------
    T1_IDX                                  2         503
    T2_IDX                                  2         373
    As we can see, the table and index with the CHAR(30) column are about 34% larger (number of blocks) than the VARCHAR2(30) ones. The number of levels in the indexes is the same, 2.

    I would expect a full table scan of T1 to take approximately 34% longer than a full scan of T2 (in fact I just tried and it took 25% longer: 3m15s v. 2m36s). Indexed lookup access should be the same in either case, but an index range scan on T1 may take longer than on T2.

    Does that answer your queries, Derek? I'm happy to provide more info if you want.

  12. #12
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    Thanks, Tony. It is always good to cross over the floor and learn a little about how Oracle does things.

    BTW, I do not wish to hijack the thread, but my Oracle colleagues tell me there is most certainly a Clustered Index in Oracle. Although the lower level implemenation is no doubt different, the purpose is the same, and therefore the identical name conveys the correct meaning. They had a bit of an argument re which vendor provided it first. Google provides many his.
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  13. #13
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Derek Asirvadem

    If the var-length column is indexed, there is an additional order of magnitude of overhead: every index entry in the B-Tree has to be (a) stored with the 4 additional bytes and (b) unpacked on every access (think about selecting say 2% of the rows via the index, which requires all 100% of the index entries to be unpacked and evaluated). Therefore indices on var-length columns should be avoided at all costs.

    Sybase and DB2 are the same in this regard.
    Derek, you are wrong saying that DB2 behaves in the same manner.

    The string length does not require 4 bytes, only two, an overhead of 1 byte over CHAR. There is no "unpacking" on every access. Selecting 2% of the rows via index does not require 100% of the index entries to be evaluated. Etc. In other words, in DB2 the argument for VARCHAR vs. CHAR would be exactly the same as in Oracle.
    ---
    "It does not work" is not a valid problem statement.

  14. #14
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Quote Originally Posted by Derek Asirvadem
    my Oracle colleagues tell me there is most certainly a Clustered Index in Oracle
    Perhaps they mean Index-Organized Tables (IOTs)? Or do they call an index "clustered" if it the base table happens to have its rows physically stored in the order of the keys of the index? There certainly isn't a type of index called a "clustered index" as such in Oracle. See this AskTom thread for a comparison of Oracle IOTs with Sybase clustered indexes.

  15. #15
    Join Date
    Nov 2003
    Posts
    2,932
    Provided Answers: 12
    Quote Originally Posted by andrewst
    Or do they call an index "clustered" if it the base table happens to have its rows physically stored in the order of the keys of the index?
    There is a thing called the "clustering factor" which indicates how closely the physical ordering of the rows related to the order in an index.
    But you are right in that there is no such thing as a clustered index in Oracle.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •