Results 1 to 6 of 6
  1. #1
    Join Date
    Oct 2001
    Location
    Bangalore
    Posts
    186

    Unanswered: scalar functions like SUBSTR for multi-byte character sets

    Hi,

    Please refer to the below problem:

    Problem

    We need to deal with multi-byte character sets like Japanese, Chinese etc. Our application uses built-in string scalar functions like SUBSTR , LENGTH etc. However these functions are dependent on bytes rather than characters.

    For example SELECT SUBSTR (‘你好’, 1, 3) FROM SYSIBM.SYSDUMMY1 returns 你 (only one character since each Chinese character in this implementation is being represented by three bytes) while SELECT SUBSTR(‘Hello’, 1, 3) FROM SYSIBM.SYSDUMMY1 returns ‘Hel’ (each English character being represented by a single byte.).

    Oracle 9i has introduced some Unicode compliant functions like INSTRC, INSTR2, INSTR4, SUBSTRC, SUBSTR2, and SUBSTR4 which are capable of understanding the fact that a particular sequence of bytes actually represents a single character in reality. For example, the standard LENGTH function will interpret code point 0x0061 followed by code point 0x0303 as two separate characters. The LENGTHC function, because it is designed for use with Unicode, will recognize that in Unicode, 0x0061 followed by 0x0303 really represents just one character.

    Do we have similar functions in DB2 UDB for LUW? I have searched a lot on the net but could not find such functions in DB2 UDB. Just wanted to ask you whether such functions exist in DB2 UDB. Else how do we deal with such a scenario in DB2 UDB.
    Prashant

  2. #2
    Join Date
    Nov 2004
    Posts
    374

    unicode

    if charset is UTF-8 and columns are defined graphic/vargraphic..
    the functions like substr work correctly on character level and not byte level.
    Otherwise you will get byte substr and is only part of the character..
    Best Regards, Guy Przytula
    DB2/ORA/SQL Services
    DB2 DBA & Advanced DBA Certified
    DB2 Dprop Certified
    http://users.skynet.be/przytula/dbss.html

  3. #3
    Join Date
    Oct 2005
    Posts
    109
    Hi Prashant,

    I would apreciate you to update the post in case you find out anything else on the topic.

    We are struggling with moving to unicode a lot. Former DDL definitions (like varchar(10) ) don't work anymore when moving from a german or whatever codepage to unicode when having language characters with more than one byte. So previous used DDL and code don't work anymore when moving to unicode - same like you observed. This is rather annoying but unfortunately SQL Standard conform.

    As a result we rather keep language codepages if possible and wait and hope for new development.

    Changing everything to graphic would be a huge development and migration effort and also space requirements would change a lot (double for all chars if I remember correctly) - til today that has been avoided.
    Juliane

  4. #4
    Join Date
    Oct 2001
    Location
    Bangalore
    Posts
    186
    Hi Juliane,

    I agree changing everything would be a huge effort + lot of disk space usage, but as of now IBM does not provide any alternative functions which Oracle 9i does.

    The other thing which concerns me is, will sorting and indexing take more time with Vargraphs and result in a degradation of performance.

    I will try to get the answers on new development from IBM guys and then update the forum.
    Prashant

  5. #5
    Join Date
    Oct 2001
    Location
    Bangalore
    Posts
    186
    Hi,

    Just found out something on indexes:

    Prior to version 8, indexes created on such columns did not allow INDEX ONLY access as the column data was padded in the index value and the table was scanned but now in version 8 we have a new keyword to be specified in index definition. The word is PADDED or NOT PADDED which takes care of padding the column data in the index data or not padding the same. NOT PADDING makes index only scan work for these columns like any other column and also saves lot of space.
    Prashant

  6. #6
    Join Date
    Jan 2006
    Location
    Zeist - the Netherlands
    Posts
    3
    Prashant,

    I guess you are talking DB2 on mainframe?
    In that case i hope you are at Db2 version 8, because in this version a number of string functions have a extra operand, which i think should solve this problem.
    For example: SUBSTRING(C1,1,21,CODEUNITS16)

    Hope this helps.

    leo

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •