If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > DB2 v8.1 fp8 and UTF8 code page - value 'x' is too long problem

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-02-05, 08:34
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
DB2 v8.1 fp8 and UTF8 code page - value 'x' is too long problem

Hi,

My system: db2 v8.1 fp8 on Windows 2000. I created database with code page UTF8

CREATE DATABASE UTF4 ON 'C:' USING CODESET UTF-8 TERRITORY SI COLLATE USING UCA400_NO;

I created table with 1 character and if special character is inserted into database error appears.

Code:
db2 create table db2x.test (a char(1))
DB20000I  The SQL command completed successfully.

db2 insert into db2x.test values ('a')
DB20000I  The SQL command completed successfully.

db2 insert into db2x.test values ('č')
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0433N  Value "č" is too long.  SQLSTATE=22001
When I created table with 2 character and only 1 special character is inserted into database, then everything is ok.

Code:
db2 create table db2y.test (a char(2))
DB20000I  The SQL command completed successfully.

db2 insert into db2y.test values ('a')
DB20000I  The SQL command completed successfully.

db2 insert into db2y.test values ('č')
DB20000I  The SQL command completed successfully.

Why char(2) has to be defined if only 1 character is inserted? Is this somekind of bug or I missed something?

Thanks,
Grofaty
Reply With Quote
  #2 (permalink)  
Old 03-02-05, 09:32
J Petruk J Petruk is offline
Registered User
 
Join Date: Mar 2004
Location: Toronto, ON, Canada
Posts: 513
What tool are you using to execute that SQL? It's interesting, I tried it from the db2 clp (DOS) and it wouldn't take the č character, converted it to a regular "c". When I tried from the Control Center's Command Editor it bombed and exitted!

So it might not be the database engine that's having the issue, but rather the interface to it...
__________________
--
Jonathan Petruk
DB2 Database Consultant
Reply With Quote
  #3 (permalink)  
Old 03-02-05, 10:23
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by grofaty

Why char(2) has to be defined if only 1 character is inserted? Is this somekind of bug or I missed something?
Looks like the latter :-)

Length of character columns is defined in bytes, not characters; to make sure that any UTF-8 character fits in the column it should be defined as CHAR(4).

Quote:
When working with character string data in UTF-8, one should not assume that each character is one byte. In multibyte UTF-8 encoding, each ASCII character is one byte, but non-ASCII characters take two to four bytes each. This should be taken into account when defining CHAR fields. Depending on the ratio of ASCII to non-ASCII characters, a CHAR field of size n bytes can contain anywhere from n/4 to n characters.
The rest is here:
http://publib.boulder.ibm.com/infoce...n/c0004846.htm
Reply With Quote
  #4 (permalink)  
Old 03-02-05, 10:26
J Petruk J Petruk is offline
Registered User
 
Join Date: Mar 2004
Location: Toronto, ON, Canada
Posts: 513
Interesting... intuitively that makes sense since DB2 has to know how much space on a page something like CHAR(4) will take.
__________________
--
Jonathan Petruk
DB2 Database Consultant

Last edited by J Petruk; 03-02-05 at 10:29.
Reply With Quote
  #5 (permalink)  
Old 03-02-05, 10:29
J Petruk J Petruk is offline
Registered User
 
Join Date: Mar 2004
Location: Toronto, ON, Canada
Posts: 513
Poses some interesting problems as I think a bit more about it...

For instance if you have a field that should never be more than 4 characters, you have to define it as 16, which could then allow an application to stick 16 single byte characters into it.

In other words, the business rule "field A should be 4 characters or less" can't be implemented through the data typing in UTF-8... you'd have to jump through other hoops like a trigger or something to enforce it.
__________________
--
Jonathan Petruk
DB2 Database Consultant
Reply With Quote
  #6 (permalink)  
Old 03-02-05, 14:59
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
I think that a simple check constraint should work: as far as I understand LENGTH() returns the number of characters in a string, not the number of bytes.
Reply With Quote
  #7 (permalink)  
Old 03-02-05, 22:58
jacampbell jacampbell is offline
Registered User
 
Join Date: Jan 2005
Posts: 191
Actually one should use varchar - to prevent DB2 padding the column with spaces.

But yes, non intuitive.

James Campbell
Reply With Quote
  #8 (permalink)  
Old 03-03-05, 01:13
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
Hi,

n_i: that is bad news!!! Lets look at the whole problem: I have database with code page 1250 and thousand of tables in this database. But I must migrate to UTF8! So what should I do? Should I change definitions (from 1 char to 2 char tables) for thousand tables? Isn't that little stupid?

J Petruk: I am using DB2 Command Window. In dos window you have to execute command chcp 1250 to get correct code page to insert such a character.

Thanks,
Grofaty
Reply With Quote
  #9 (permalink)  
Old 03-03-05, 02:07
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
grofaty ... You may find the following page 'Unicode handling of data types' useful
http://publib.boulder.ibm.com/infoce...n/c0004846.htm
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #10 (permalink)  
Old 03-03-05, 08:34
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
sathyaram_s, I think you posted the same link as n_i, did you?
Reply With Quote
  #11 (permalink)  
Old 03-03-05, 08:53
sathyaram_s sathyaram_s is offline
Super Moderator
 
Join Date: Aug 2001
Location: UK
Posts: 4,534
I think I did ....

But, have a read through the pages ... It talks about the functions which you'll have to be careful about ... For eg. substr works on byte basis etc

Cheers
Sathyaram
__________________
Visit the new-look IDUG Website , register to gain access to the excellent content.
Reply With Quote
  #12 (permalink)  
Old 03-03-05, 10:12
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by grofaty
But I must migrate to UTF8! So what should I do? Should I change definitions (from 1 char to 2 char tables) for thousand tables? Isn't that little stupid?
Well, that's the fun of migration for you :-)

What if you migrate from Slovenia to Tahiti, for example? Should you learn thousands of new words and grammar rules? Of course you should. Is that stupid? I don't think so.

:-)
Reply With Quote
  #13 (permalink)  
Old 03-04-05, 01:07
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
n_i: I thought with stupid: I will need to migrate and have a lot of work for little benefit. If I would learn Tahiti, I would problably have good reason, e.g. some beautiful lady.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On