If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Varchar / Vargraphic problem

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 06-28-07, 02:39
nick.ncs nick.ncs is offline
Registered User
 
Join Date: May 2007
Location: somewhere in dbforums
Posts: 221
Varchar / Vargraphic problem

I am currently migrating from SQL server to DB2. Quite a few of my fields in my database contain japanese characters. so i have selected the appropriate codeset in DB2.

Now what is happening is in some rows what japanese text used to get stored in VARCHAR(64) in my SQL server table cannot be imported into DB2....where i have the data type as VARCHAR(64)....The problem it says is the string is too long.

On converting the DB2 field to VARGRAPHIC(64) or VARCHAR(128) everything works fine....

but i tried it out SQL server too stores the japanese text as double byte so why is it that it could be stored as VARCHAR(64) in SQL server and not in DB2......why is that i require a larger space to store in DB2

on running the length operation both gave me length as 25.... so fundamentally japanese text should fit into 50 bytes ( taking into consideration the double byte ) but it is not fitting into even varchar(64)
Reply With Quote
  #2 (permalink)  
Old 06-28-07, 10:26
guyprzytula guyprzytula is offline
Registered User
 
Join Date: Jun 2006
Posts: 471
some characters use 2-3 bytes to be stored so 20 char can not be stored in char(20)
using graphic or vargraphic it will always store 20 characters in graphic(20)
but it will use more bytes physically
also the functions substr - length operate correctly on graphic/vargraphic
substr in char filed could return halve a character - with graphic it will take the correct character completely
__________________
Best Regards, Guy Przytula
DB2 UDB LUW certified V6/7/8
Reply With Quote
  #3 (permalink)  
Old 02-24-12, 03:00
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Hello

We are facing the same issue with Spanish characters. Latin languages and some other languages like Japanese or Chinese use more than 2 Byte to store data. UTF-8 uses 2 Byte and UTF-16 uses 4 bytes. IBM has advised us to use Vargraphic instead of Varchar.

Thanks & Regards
Satyajit
Reply With Quote
  #4 (permalink)  
Old 02-24-12, 05:03
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
UTF-8 uses different number of bytes for characters (depending on the character). This can be just 1 byte (e.g. for ASCII) and can be as long as 4 bytes. With UTF-16, you have either 2 bytes or 4 bytes per character.

So whether 25 Japanese characters fit into 50 bytes or not really depends on the UTF-8 encoding of those characters. I don't know the Japanese character set well enough to comment further. As for the length, how did you determine that? It should be something like this, to know how many bytes were needed:
Code:
VALUES LENGTH(VARCHAR(..., 500), OCTETS)
Finally, if you say VARCHAR(64), it means you can store strings of a length up to 64 bytes. The number of characters in that string will be 64 or less (depending on the characters). Likewise, if you use VARGRAPHIC(64), you can store strings up to 64 double-bytes. Again, the number of characters in such strings will be restricted to 64 - or less (in case you have characters that require 2 double-bytes for their representation).
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #5 (permalink)  
Old 02-24-12, 07:29
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Latin Characters Solution Found!!!

Hello

I've figured out the issue at my end.

I just figured out that we need to change the code page of the Windows OS. It was using 437 (IBM437 OEM United States) which was not UTF-8. Hence it was sending the Hex characters when Data Studio was being used to insert data. This in turn would result in 'value too long error message'.

I've changed the OS code page to 65001 (Unicode UTF-8) used by Microsoft for UTF-8. The Data studio did work fine in inserting data. Now we are working on the SSIS server to change the same and check if we can insert Latin Data.

Steps (Type the below commands on Windows Command Prompt):
--------------------------------------------------------------
Check Existing Code Page:
mode con cp

Change Windows Code Page:
mode con cp select=65001

Code Page Reference:
Code Page Identifiers

Trying the same on SSIS server. I hope it works on it as well


Thanks & Regards
Satyajit
Reply With Quote
  #6 (permalink)  
Old 02-27-12, 01:29
satyajit satyajit is offline
Registered User
 
Join Date: Dec 2011
Posts: 10
Code Page Reference Link

Hello

Code Page Ref. Links:

Desktop Code Page Link:
Code Page Identifiers

Win 2003 Code Page Link:
Code Page Identifiers

Thanks & Regards
Satyajit
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On