If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Insertion of Spanish Text from Control Center, Data Studio, SSIS fails

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 12-28-11, 06:19
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Question Insertion of Spanish Text from Control Center, Data Studio, SSIS fails

Hello

We are getting the below error while inserting Spanish data through any application such as SSIS, Control Center, Data Studio. The column length is VARCHAR(40), the code page is 1208 and database code set is utf-8.

ERROR:
------
Value "23 bis chemin de la Humère,Z.I. Saint E" is too long.. SQLCODE=-433, SQLSTATE=22001

(The data in the above error has truncated the last character 't'. The complete data is: '23 bis chemin de la Humère,Z.I. Saint Et')

The same above value is inserted successfully using putty. When the value is inserted using putty and then executing SELECT statement on Data Studio is giving NULL Values.

The results of INSERT & SELECT queries are in the attachment attached with this post.

The same value inserted using Putty:

--------------------------------------------------------------------------
db2 => insert into EMP_SPAIN_MASTER values(18663, '23 bis chemin de la Humère,Z.I. Saint Et')!
DB20000I The SQL command completed successfully.
db2 =>
--------------------------------------------------------------------------

Although not sure but, what I feel is the applications inserting multilingual data in DB2 takes extra byte whereas inserting in DB2 using putty does not do the same.

Could anybody suggest solution to this?

Thanks
Satyajit
Attached Files
File Type: doc INSERT-SELECT Query Result.doc (28.5 KB, 2 views)

Last edited by satyajit; 12-28-11 at 07:58.
Reply With Quote
  #2 (permalink)  
Old 12-28-11, 08:49
nvk@vhv nvk@vhv is offline
Registered User
 
Join Date: Jan 2010
Posts: 294
Hi,

a character in UTF-8 can be up to 4 Bytes long.
However the length of Varchar-Field is specified in bytes. Not in characters. So in worst case you could only store 10 Characters in the Varchar(40).

Hth
Reply With Quote
  #3 (permalink)  
Old 12-28-11, 11:16
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Hello

If this is the case then how come I'm able to store the exact line (23 bis chemin de la Humère,Z.I. Saint Et) directly on the server using Putty but, cannot insert using Clients like IBM Data Studio, DB2 Control Center or MS SSIS?

It should be uniform throughout...the same data should also be inserted using the DB2 clients.

Is there any issue with the UTF-8 conversion while data is being moved from DB2 Clients on Windows to Unix server or any known issues while inserting data using these clients?

Thanks
Satyajit

Last edited by satyajit; 12-28-11 at 11:17. Reason: Spelling mistake
Reply With Quote
  #4 (permalink)  
Old 12-28-11, 11:28
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
putty may be translating UTF-8 into the extended ASCII codes, which include accented characters. IBM tools are Java-based, meaning they use UTF-8 internally.
Reply With Quote
  #5 (permalink)  
Old 12-28-11, 11:45
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Hello

Even if Putty converts DB2 will anyways save it s UTF-8 format not Extended ASCII.

And if this is the case then this would be an issue of incompatibility between DB2 Clients and DB2 Server which is not how the interfaces should work. Just to insert data using DB2 client we cannot or rather no company will change their database architecture. There must be some way out by which the UTF-8 characters can be inserted using the clients.

Any more suggestions or solution to this issue or any setting to be done at the DB2 client side?

Regards
Satyajit
Reply With Quote
  #6 (permalink)  
Old 12-28-11, 13:16
nvk@vhv nvk@vhv is offline
Registered User
 
Join Date: Jan 2010
Posts: 294
Ok,

did some testing on a Linux-Box. Don't have a Solaris or AIX Box for 1208.
>create database OOPSDB ... USING CODESET UTF-8 TERRITORY ES PAGESIZE 4 k
>create table test.oopstab ( id int, Value varchar(40) )

Afterwards i inserted the Row via Putty and Data Studio. Then i exported the data a checked the output in a HEX-Editor for the è.

Data Studio inserted "C3 A8". And from the Command line it was "E8", which is the Unicode code point for è (see Unicode/UTF-8-character table). From my point of view Data Studio is working correct.

With the double-Byte representation for the Character you hit the Limit of the Varchar-Field.
Reply With Quote
  #7 (permalink)  
Old 12-28-11, 14:33
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by satyajit View Post

Even if Putty converts DB2 will anyways save it s UTF-8 format not Extended ASCII.
I think you misunderstand what happens. If putty is not set to handle UTF-8 characters, it's not UTF-8 what is being inserted, but something else what putty now has instead of UTF-8 characters. DB2 cannot possibly guess what it was before putty translated it.

Consider this:
Code:
## putty translation set to ISO-8859-1

~> db2 "insert into test values (1, '23 bis chemin de la Humère,Z.I. Saint Et')"
insert into test values (1, '23 bis chemin de la Humère,Z.I. Saint Et')
DB20000I  The SQL command completed successfully.


:~> echo " insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')" | od -ah
0000000   i   n   s   e   r   t  sp   i   n   t   o  sp   t   e   s   t
        6e69 6573 7472 6920 746e 206f 6574 7473
0000020  sp   v   a   l   u   e   s  sp   (   1   ,  sp   '   2   3  sp
        7620 6c61 6575 2073 3128 202c 3227 2033
0000040   b   i   s  sp   c   h   e   m   i   n  sp   d   e  sp   l   a
        6962 2073 6863 6d65 6e69 6420 2065 616c
0000060  sp   H   u   m   h   r   e   ,   Z   .   I   .  sp   S   a   i
        4820 6d75 72e8 2c65 2e5a 2e49 5320 6961
0000100   n   t  sp   E   t   '   )  nl
        746e 4520 2774 0a29
0000110


## putty translation set to UTF-8

~> db2 " insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')"
insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0433N  Value "23 bis chemin de la Humère,Z.I. Saint Et" is too long.
SQLSTATE=22001

~> echo " insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')" | od -ah
0000000  sp   i   n   s   e   r   t  sp   i   n   t   o  sp   t   e   s
        6920 736e 7265 2074 6e69 6f74 7420 7365
0000020   t  sp   v   a   l   u   e   s  sp   (   2   ,  sp   '   2   3
        2074 6176 756c 7365 2820 2c32 2720 3332
0000040  sp   b   i   s  sp   c   h   e   m   i   n  sp   d   e  sp   l
        6220 7369 6320 6568 696d 206e 6564 6c20
0000060   a  sp   H   u   m   C   (   r   e   ,   Z   .   I   .  sp   S
        2061 7548 c36d 72a8 2c65 2e5a 2e49 5320
0000100   a   i   n   t  sp   E   t   '   )  nl
        6961 746e 4520 2774 0a29
0000112
Reply With Quote
  #8 (permalink)  
Old 02-23-12, 02:28
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Hello

Thanks for your response. Was away for a while so could not reply back. Can you suggest something that will enable to accept the U+00E8 instead of 'c3 a8' which is where it exploits the length of column. Will changing the char set to ISO-8859-1 have any effect on rest of the languages?

Should we change the char set on the systems that use DB2 clients to load data in the DB2 server(In this case Windows server that contains SSIS installed on it and our local machine that has clients such as Data Studio, Control Center, Toad).

Your advice and suggestion is highly appreciated.

Thanks & Regards
Satyajit
Reply With Quote
  #9 (permalink)  
Old 02-24-12, 00:54
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Hello

One more thing that I would like to share is that the DB2 database that we have is created using UTF-8. While going through the Unicode/UTF-8-character table I found that the UTF-8 Hex code for è is 'c3 a8'. If I'm not wrong it will take 4 bytes. Currently we came to a conclusion that this database has to be created with UTF-16 but, lot of space will be wasted.

IBM recommended to use VARGRAPHIC instead of VARCHAR.

We are still investigating of possible ways to accommodate the latin characters having letter with grave, ring, acute.

Thanks & Regards
Satyajit
Reply With Quote
  #10 (permalink)  
Old 02-24-12, 07:24
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Insertion of Latin Characters issue resolved

Hello

I just figured out that we need to change the code page of the Windows OS. It was using 437 (IBM437 OEM United States) which was not UTF-8. Hence it was sending the Hex characters when Data Studio was being used to insert data. This in turn would result in 'value too long error message'.

I've changed the OS code page to 65001 (Unicode UTF-8) used by Microsoft for UTF-8. The Data studio did work fine in inserting data. Now we are working on the SSIS server to change the same and check if we can insert Latin Data.

Steps (Type the below commands on Windows Command Prompt):
--------------------------------------------------------------
Check Existing Code Page:
mode con cp

Change Windows Code Page:
mode con cp select=65001

Code Page Reference:
Code Page Identifiers

Will reply again once SSIS works fine with the above settings.

Thanks for your help!


Thanks & Regards
Satyajit
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On