Results 1 to 10 of 10
  1. #1
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20

    Question Unanswered: Insertion of Spanish Text from Control Center, Data Studio, SSIS fails

    Hello

    We are getting the below error while inserting Spanish data through any application such as SSIS, Control Center, Data Studio. The column length is VARCHAR(40), the code page is 1208 and database code set is utf-8.

    ERROR:
    ------
    Value "23 bis chemin de la Humère,Z.I. Saint E" is too long.. SQLCODE=-433, SQLSTATE=22001

    (The data in the above error has truncated the last character 't'. The complete data is: '23 bis chemin de la Humère,Z.I. Saint Et')

    The same above value is inserted successfully using putty. When the value is inserted using putty and then executing SELECT statement on Data Studio is giving NULL Values.

    The results of INSERT & SELECT queries are in the attachment attached with this post.

    The same value inserted using Putty:

    --------------------------------------------------------------------------
    db2 => insert into EMP_SPAIN_MASTER values(18663, '23 bis chemin de la Humère,Z.I. Saint Et')!
    DB20000I The SQL command completed successfully.
    db2 =>
    --------------------------------------------------------------------------

    Although not sure but, what I feel is the applications inserting multilingual data in DB2 takes extra byte whereas inserting in DB2 using putty does not do the same.

    Could anybody suggest solution to this?

    Thanks
    Satyajit
    Attached Files Attached Files
    Last edited by satyajit; 12-28-11 at 08:58.

  2. #2
    Join Date
    Jan 2010
    Posts
    335
    Hi,

    a character in UTF-8 can be up to 4 Bytes long.
    However the length of Varchar-Field is specified in bytes. Not in characters. So in worst case you could only store 10 Characters in the Varchar(40).

    Hth

  3. #3
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20
    Hello

    If this is the case then how come I'm able to store the exact line (23 bis chemin de la Humère,Z.I. Saint Et) directly on the server using Putty but, cannot insert using Clients like IBM Data Studio, DB2 Control Center or MS SSIS?

    It should be uniform throughout...the same data should also be inserted using the DB2 clients.

    Is there any issue with the UTF-8 conversion while data is being moved from DB2 Clients on Windows to Unix server or any known issues while inserting data using these clients?

    Thanks
    Satyajit
    Last edited by satyajit; 12-28-11 at 12:17. Reason: Spelling mistake

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    putty may be translating UTF-8 into the extended ASCII codes, which include accented characters. IBM tools are Java-based, meaning they use UTF-8 internally.

  5. #5
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20
    Hello

    Even if Putty converts DB2 will anyways save it s UTF-8 format not Extended ASCII.

    And if this is the case then this would be an issue of incompatibility between DB2 Clients and DB2 Server which is not how the interfaces should work. Just to insert data using DB2 client we cannot or rather no company will change their database architecture. There must be some way out by which the UTF-8 characters can be inserted using the clients.

    Any more suggestions or solution to this issue or any setting to be done at the DB2 client side?

    Regards
    Satyajit

  6. #6
    Join Date
    Jan 2010
    Posts
    335
    Ok,

    did some testing on a Linux-Box. Don't have a Solaris or AIX Box for 1208.
    >create database OOPSDB ... USING CODESET UTF-8 TERRITORY ES PAGESIZE 4 k
    >create table test.oopstab ( id int, Value varchar(40) )

    Afterwards i inserted the Row via Putty and Data Studio. Then i exported the data a checked the output in a HEX-Editor for the è.

    Data Studio inserted "C3 A8". And from the Command line it was "E8", which is the Unicode code point for è (see Unicode/UTF-8-character table). From my point of view Data Studio is working correct.

    With the double-Byte representation for the Character you hit the Limit of the Varchar-Field.

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by satyajit View Post

    Even if Putty converts DB2 will anyways save it s UTF-8 format not Extended ASCII.
    I think you misunderstand what happens. If putty is not set to handle UTF-8 characters, it's not UTF-8 what is being inserted, but something else what putty now has instead of UTF-8 characters. DB2 cannot possibly guess what it was before putty translated it.

    Consider this:
    Code:
    ## putty translation set to ISO-8859-1
    
    ~> db2 "insert into test values (1, '23 bis chemin de la Humère,Z.I. Saint Et')"
    insert into test values (1, '23 bis chemin de la Humère,Z.I. Saint Et')
    DB20000I  The SQL command completed successfully.
    
    
    :~> echo " insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')" | od -ah
    0000000   i   n   s   e   r   t  sp   i   n   t   o  sp   t   e   s   t
            6e69 6573 7472 6920 746e 206f 6574 7473
    0000020  sp   v   a   l   u   e   s  sp   (   1   ,  sp   '   2   3  sp
            7620 6c61 6575 2073 3128 202c 3227 2033
    0000040   b   i   s  sp   c   h   e   m   i   n  sp   d   e  sp   l   a
            6962 2073 6863 6d65 6e69 6420 2065 616c
    0000060  sp   H   u   m   h   r   e   ,   Z   .   I   .  sp   S   a   i
            4820 6d75 72e8 2c65 2e5a 2e49 5320 6961
    0000100   n   t  sp   E   t   '   )  nl
            746e 4520 2774 0a29
    0000110
    
    
    ## putty translation set to UTF-8
    
    ~> db2 " insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')"
    insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')
    DB21034E  The command was processed as an SQL statement because it was not a
    valid Command Line Processor command.  During SQL processing it returned:
    SQL0433N  Value "23 bis chemin de la Humère,Z.I. Saint Et" is too long.
    SQLSTATE=22001
    
    ~> echo " insert into test values (2, '23 bis chemin de la Humère,Z.I. Saint Et')" | od -ah
    0000000  sp   i   n   s   e   r   t  sp   i   n   t   o  sp   t   e   s
            6920 736e 7265 2074 6e69 6f74 7420 7365
    0000020   t  sp   v   a   l   u   e   s  sp   (   2   ,  sp   '   2   3
            2074 6176 756c 7365 2820 2c32 2720 3332
    0000040  sp   b   i   s  sp   c   h   e   m   i   n  sp   d   e  sp   l
            6220 7369 6320 6568 696d 206e 6564 6c20
    0000060   a  sp   H   u   m   C   (   r   e   ,   Z   .   I   .  sp   S
            2061 7548 c36d 72a8 2c65 2e5a 2e49 5320
    0000100   a   i   n   t  sp   E   t   '   )  nl
            6961 746e 4520 2774 0a29
    0000112

  8. #8
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20
    Hello

    Thanks for your response. Was away for a while so could not reply back. Can you suggest something that will enable to accept the U+00E8 instead of 'c3 a8' which is where it exploits the length of column. Will changing the char set to ISO-8859-1 have any effect on rest of the languages?

    Should we change the char set on the systems that use DB2 clients to load data in the DB2 server(In this case Windows server that contains SSIS installed on it and our local machine that has clients such as Data Studio, Control Center, Toad).

    Your advice and suggestion is highly appreciated.

    Thanks & Regards
    Satyajit

  9. #9
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20
    Hello

    One more thing that I would like to share is that the DB2 database that we have is created using UTF-8. While going through the Unicode/UTF-8-character table I found that the UTF-8 Hex code for è is 'c3 a8'. If I'm not wrong it will take 4 bytes. Currently we came to a conclusion that this database has to be created with UTF-16 but, lot of space will be wasted.

    IBM recommended to use VARGRAPHIC instead of VARCHAR.

    We are still investigating of possible ways to accommodate the latin characters having letter with grave, ring, acute.

    Thanks & Regards
    Satyajit

  10. #10
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20

    Insertion of Latin Characters issue resolved

    Hello

    I just figured out that we need to change the code page of the Windows OS. It was using 437 (IBM437 OEM United States) which was not UTF-8. Hence it was sending the Hex characters when Data Studio was being used to insert data. This in turn would result in 'value too long error message'.

    I've changed the OS code page to 65001 (Unicode UTF-8) used by Microsoft for UTF-8. The Data studio did work fine in inserting data. Now we are working on the SSIS server to change the same and check if we can insert Latin Data.

    Steps (Type the below commands on Windows Command Prompt):
    --------------------------------------------------------------
    Check Existing Code Page:
    mode con cp

    Change Windows Code Page:
    mode con cp select=65001

    Code Page Reference:
    Code Page Identifiers

    Will reply again once SSIS works fine with the above settings.

    Thanks for your help!


    Thanks & Regards
    Satyajit

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •