Results 1 to 15 of 15
  1. #1
    Join Date
    Jan 2003
    Posts
    1,605

    Unanswered: DB2 v8.1 fp8 and UTF8 code page - value 'x' is too long problem

    Hi,

    My system: db2 v8.1 fp8 on Windows 2000. I created database with code page UTF8

    CREATE DATABASE UTF4 ON 'C:' USING CODESET UTF-8 TERRITORY SI COLLATE USING UCA400_NO;

    I created table with 1 character and if special character is inserted into database error appears.

    Code:
    db2 create table db2x.test (a char(1))
    DB20000I  The SQL command completed successfully.
    
    db2 insert into db2x.test values ('a')
    DB20000I  The SQL command completed successfully.
    
    db2 insert into db2x.test values ('č')
    DB21034E  The command was processed as an SQL statement because it was not a
    valid Command Line Processor command.  During SQL processing it returned:
    SQL0433N  Value "č" is too long.  SQLSTATE=22001
    When I created table with 2 character and only 1 special character is inserted into database, then everything is ok.

    Code:
    db2 create table db2y.test (a char(2))
    DB20000I  The SQL command completed successfully.
    
    db2 insert into db2y.test values ('a')
    DB20000I  The SQL command completed successfully.
    
    db2 insert into db2y.test values ('č')
    DB20000I  The SQL command completed successfully.

    Why char(2) has to be defined if only 1 character is inserted? Is this somekind of bug or I missed something?

    Thanks,
    Grofaty

  2. #2
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513
    What tool are you using to execute that SQL? It's interesting, I tried it from the db2 clp (DOS) and it wouldn't take the č character, converted it to a regular "c". When I tried from the Control Center's Command Editor it bombed and exitted!

    So it might not be the database engine that's having the issue, but rather the interface to it...
    --
    Jonathan Petruk
    DB2 Database Consultant

  3. #3
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by grofaty

    Why char(2) has to be defined if only 1 character is inserted? Is this somekind of bug or I missed something?
    Looks like the latter :-)

    Length of character columns is defined in bytes, not characters; to make sure that any UTF-8 character fits in the column it should be defined as CHAR(4).

    When working with character string data in UTF-8, one should not assume that each character is one byte. In multibyte UTF-8 encoding, each ASCII character is one byte, but non-ASCII characters take two to four bytes each. This should be taken into account when defining CHAR fields. Depending on the ratio of ASCII to non-ASCII characters, a CHAR field of size n bytes can contain anywhere from n/4 to n characters.
    The rest is here:
    http://publib.boulder.ibm.com/infoce...n/c0004846.htm

  4. #4
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513
    Interesting... intuitively that makes sense since DB2 has to know how much space on a page something like CHAR(4) will take.
    Last edited by J Petruk; 03-02-05 at 11:29.
    --
    Jonathan Petruk
    DB2 Database Consultant

  5. #5
    Join Date
    Mar 2004
    Location
    Toronto, ON, Canada
    Posts
    513
    Poses some interesting problems as I think a bit more about it...

    For instance if you have a field that should never be more than 4 characters, you have to define it as 16, which could then allow an application to stick 16 single byte characters into it.

    In other words, the business rule "field A should be 4 characters or less" can't be implemented through the data typing in UTF-8... you'd have to jump through other hoops like a trigger or something to enforce it.
    --
    Jonathan Petruk
    DB2 Database Consultant

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    I think that a simple check constraint should work: as far as I understand LENGTH() returns the number of characters in a string, not the number of bytes.

  7. #7
    Join Date
    Jan 2005
    Posts
    191
    Actually one should use varchar - to prevent DB2 padding the column with spaces.

    But yes, non intuitive.

    James Campbell

  8. #8
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    n_i: that is bad news!!! Lets look at the whole problem: I have database with code page 1250 and thousand of tables in this database. But I must migrate to UTF8! So what should I do? Should I change definitions (from 1 char to 2 char tables) for thousand tables? Isn't that little stupid?

    J Petruk: I am using DB2 Command Window. In dos window you have to execute command chcp 1250 to get correct code page to insert such a character.

    Thanks,
    Grofaty

  9. #9
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    grofaty ... You may find the following page 'Unicode handling of data types' useful
    http://publib.boulder.ibm.com/infoce...n/c0004846.htm
    Visit the new-look IDUG Website , register to gain access to the excellent content.

  10. #10
    Join Date
    Jan 2003
    Posts
    1,605
    sathyaram_s, I think you posted the same link as n_i, did you?

  11. #11
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    I think I did ....

    But, have a read through the pages ... It talks about the functions which you'll have to be careful about ... For eg. substr works on byte basis etc

    Cheers
    Sathyaram
    Visit the new-look IDUG Website , register to gain access to the excellent content.

  12. #12
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by grofaty
    But I must migrate to UTF8! So what should I do? Should I change definitions (from 1 char to 2 char tables) for thousand tables? Isn't that little stupid?
    Well, that's the fun of migration for you :-)

    What if you migrate from Slovenia to Tahiti, for example? Should you learn thousands of new words and grammar rules? Of course you should. Is that stupid? I don't think so.

    :-)

  13. #13
    Join Date
    Jan 2003
    Posts
    1,605
    n_i: I thought with stupid: I will need to migrate and have a lot of work for little benefit. If I would learn Tahiti, I would problably have good reason, e.g. some beautiful lady.

  14. #14
    Join Date
    Oct 2005
    Posts
    1

    Converting my DB2 database to UTF8 !

    Hi grofaty,
    I would really like to get in touch with you to know how you managed the conversion since we are going the same operation here and it is a major problem that DB2 uses byte-oriented representation instead of caracter-oriented for most of the caracter handling.

    Thanks in advance,
    Jean-Michel

  15. #15
    Join Date
    Mar 2003
    Posts
    280
    Quote Originally Posted by albattros View Post
    Hi grofaty,
    I would really like to get in touch with you to know how you managed the conversion since we are going the same operation here and it is a major problem that DB2 uses byte-oriented representation instead of caracter-oriented for most of the caracter handling.

    Thanks in advance,
    Jean-Michel
    You might want to concider using graphic / vargraphic. It uses ucs-2 (predesessor of utf-16). In my experience most columns can be left as is because they will only contain 1-byte chars.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •