Results 1 to 6 of 6
  1. #1
    Join Date
    May 2007
    Location
    somewhere in dbforums
    Posts
    221

    Unanswered: Varchar / Vargraphic problem

    I am currently migrating from SQL server to DB2. Quite a few of my fields in my database contain japanese characters. so i have selected the appropriate codeset in DB2.

    Now what is happening is in some rows what japanese text used to get stored in VARCHAR(64) in my SQL server table cannot be imported into DB2....where i have the data type as VARCHAR(64)....The problem it says is the string is too long.

    On converting the DB2 field to VARGRAPHIC(64) or VARCHAR(128) everything works fine....

    but i tried it out SQL server too stores the japanese text as double byte so why is it that it could be stored as VARCHAR(64) in SQL server and not in DB2......why is that i require a larger space to store in DB2

    on running the length operation both gave me length as 25.... so fundamentally japanese text should fit into 50 bytes ( taking into consideration the double byte ) but it is not fitting into even varchar(64)

  2. #2
    Join Date
    Jun 2006
    Posts
    471
    some characters use 2-3 bytes to be stored so 20 char can not be stored in char(20)
    using graphic or vargraphic it will always store 20 characters in graphic(20)
    but it will use more bytes physically
    also the functions substr - length operate correctly on graphic/vargraphic
    substr in char filed could return halve a character - with graphic it will take the correct character completely
    Best Regards, Guy Przytula
    DB2 UDB LUW certified V6/7/8

  3. #3
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20
    Hello

    We are facing the same issue with Spanish characters. Latin languages and some other languages like Japanese or Chinese use more than 2 Byte to store data. UTF-8 uses 2 Byte and UTF-16 uses 4 bytes. IBM has advised us to use Vargraphic instead of Varchar.

    Thanks & Regards
    Satyajit

  4. #4
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    UTF-8 uses different number of bytes for characters (depending on the character). This can be just 1 byte (e.g. for ASCII) and can be as long as 4 bytes. With UTF-16, you have either 2 bytes or 4 bytes per character.

    So whether 25 Japanese characters fit into 50 bytes or not really depends on the UTF-8 encoding of those characters. I don't know the Japanese character set well enough to comment further. As for the length, how did you determine that? It should be something like this, to know how many bytes were needed:
    Code:
    VALUES LENGTH(VARCHAR(..., 500), OCTETS)
    Finally, if you say VARCHAR(64), it means you can store strings of a length up to 64 bytes. The number of characters in that string will be 64 or less (depending on the characters). Likewise, if you use VARGRAPHIC(64), you can store strings up to 64 double-bytes. Again, the number of characters in such strings will be restricted to 64 - or less (in case you have characters that require 2 double-bytes for their representation).
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  5. #5
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20

    Latin Characters Solution Found!!!

    Hello

    I've figured out the issue at my end.

    I just figured out that we need to change the code page of the Windows OS. It was using 437 (IBM437 OEM United States) which was not UTF-8. Hence it was sending the Hex characters when Data Studio was being used to insert data. This in turn would result in 'value too long error message'.

    I've changed the OS code page to 65001 (Unicode UTF-8) used by Microsoft for UTF-8. The Data studio did work fine in inserting data. Now we are working on the SSIS server to change the same and check if we can insert Latin Data.

    Steps (Type the below commands on Windows Command Prompt):
    --------------------------------------------------------------
    Check Existing Code Page:
    mode con cp

    Change Windows Code Page:
    mode con cp select=65001

    Code Page Reference:
    Code Page Identifiers

    Trying the same on SSIS server. I hope it works on it as well


    Thanks & Regards
    Satyajit

  6. #6
    Join Date
    Dec 2011
    Location
    Centurion, South Africa
    Posts
    20

    Code Page Reference Link

    Hello

    Code Page Ref. Links:

    Desktop Code Page Link:
    Code Page Identifiers

    Win 2003 Code Page Link:
    Code Page Identifiers

    Thanks & Regards
    Satyajit

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •