If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > scalar functions like SUBSTR for multi-byte character sets

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-02-06, 23:27
dahalkar_p dahalkar_p is offline
Registered User
 
Join Date: Oct 2001
Location: Bangalore
Posts: 186
scalar functions like SUBSTR for multi-byte character sets

Hi,

Please refer to the below problem:

Problem

We need to deal with multi-byte character sets like Japanese, Chinese etc. Our application uses built-in string scalar functions like SUBSTR , LENGTH etc. However these functions are dependent on bytes rather than characters.

For example SELECT SUBSTR (‘你好’, 1, 3) FROM SYSIBM.SYSDUMMY1 returns 你 (only one character since each Chinese character in this implementation is being represented by three bytes) while SELECT SUBSTR(‘Hello’, 1, 3) FROM SYSIBM.SYSDUMMY1 returns ‘Hel’ (each English character being represented by a single byte.).

Oracle 9i has introduced some Unicode compliant functions like INSTRC, INSTR2, INSTR4, SUBSTRC, SUBSTR2, and SUBSTR4 which are capable of understanding the fact that a particular sequence of bytes actually represents a single character in reality. For example, the standard LENGTH function will interpret code point 0x0061 followed by code point 0x0303 as two separate characters. The LENGTHC function, because it is designed for use with Unicode, will recognize that in Unicode, 0x0061 followed by 0x0303 really represents just one character.

Do we have similar functions in DB2 UDB for LUW? I have searched a lot on the net but could not find such functions in DB2 UDB. Just wanted to ask you whether such functions exist in DB2 UDB. Else how do we deal with such a scenario in DB2 UDB.
__________________
Prashant
Reply With Quote
  #2 (permalink)  
Old 01-03-06, 02:36
przytula przytula is offline
Registered User
 
Join Date: Nov 2004
Posts: 374
unicode

if charset is UTF-8 and columns are defined graphic/vargraphic..
the functions like substr work correctly on character level and not byte level.
Otherwise you will get byte substr and is only part of the character..
__________________
Best Regards, Guy Przytula
DB2/ORA/SQL Services
DB2 DBA & Advanced DBA Certified
DB2 Dprop Certified
http://users.skynet.be/przytula/dbss.html
Reply With Quote
  #3 (permalink)  
Old 01-03-06, 04:18
juliane26 juliane26 is offline
Registered User
 
Join Date: Oct 2005
Posts: 109
Hi Prashant,

I would apreciate you to update the post in case you find out anything else on the topic.

We are struggling with moving to unicode a lot. Former DDL definitions (like varchar(10) ) don't work anymore when moving from a german or whatever codepage to unicode when having language characters with more than one byte. So previous used DDL and code don't work anymore when moving to unicode - same like you observed. This is rather annoying but unfortunately SQL Standard conform.

As a result we rather keep language codepages if possible and wait and hope for new development.

Changing everything to graphic would be a huge development and migration effort and also space requirements would change a lot (double for all chars if I remember correctly) - til today that has been avoided.
__________________
Juliane
Reply With Quote
  #4 (permalink)  
Old 01-03-06, 23:26
dahalkar_p dahalkar_p is offline
Registered User
 
Join Date: Oct 2001
Location: Bangalore
Posts: 186
Hi Juliane,

I agree changing everything would be a huge effort + lot of disk space usage, but as of now IBM does not provide any alternative functions which Oracle 9i does.

The other thing which concerns me is, will sorting and indexing take more time with Vargraphs and result in a degradation of performance.

I will try to get the answers on new development from IBM guys and then update the forum.
__________________
Prashant
Reply With Quote
  #5 (permalink)  
Old 01-04-06, 00:07
dahalkar_p dahalkar_p is offline
Registered User
 
Join Date: Oct 2001
Location: Bangalore
Posts: 186
Hi,

Just found out something on indexes:

Prior to version 8, indexes created on such columns did not allow INDEX ONLY access as the column data was padded in the index value and the table was scanned but now in version 8 we have a new keyword to be specified in index definition. The word is PADDED or NOT PADDED which takes care of padding the column data in the index data or not padding the same. NOT PADDING makes index only scan work for these columns like any other column and also saves lot of space.
__________________
Prashant
Reply With Quote
  #6 (permalink)  
Old 01-04-06, 05:45
jongdel jongdel is offline
Registered User
 
Join Date: Jan 2006
Location: Zeist - the Netherlands
Posts: 3
Prashant,

I guess you are talking DB2 on mainframe?
In that case i hope you are at Db2 version 8, because in this version a number of string functions have a extra operand, which i think should solve this problem.
For example: SUBSTRING(C1,1,21,CODEUNITS16)

Hope this helps.

leo
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On