Results 1 to 6 of 6
  1. #1
    Join Date
    Jan 2003
    Posts
    1,636
    Provided Answers: 1

    Unanswered: How to create UTF-8 database with proper sorrting of non-English characters?

    Hi,
    I have DB2 Advanced Workgroup Server Edition 11.1 fixpack 2 on Linux/Intel. I am interested in using BLU technology. I need to create a database to support Slovenian characters which are beside ordinary English letters also letters: č, š and ž.

    I usually create non-Unicode database:
    Code:
    db2 CREATE DATABASE data1 USING CODESET 1250 TERRITORY SI
    Then create tables and insert one of the Slovenian character and check if table sorting works fine. Letter č should be sorted between letters c and d. So order is: c, č, d.

    Table sort sample:
    Code:
    db2 connect to <database>
    db2 "create table admin.tab (col1 varchar(20 codeunits32))"
    db2 "insert into admin.tab values ('c'), ('č'), ('d')"
    db2 "select * from admin.tab order by col1"
    Characters are correctly sorted: c, č, d.

    But like I see BLU technology is only supported using UTF-8 encoding, so I tried:
    Code:
    db2 CREATE DATABASE data2 USING CODESET UTF-8 TERRITORY SI COLLATE USING IDENTITY
    Then repeated the "table sort sample" above and Slovenian characters are incorrectly sorted at the bottom like: c, d, č.
    This are incorrectly sorted characters for Slovenian alphabet.

    Now I tried to do the Slovenian sorting in UTF-8 database:
    Code:
    db2 CREATE DATABASE data3 USING CODESET UTF-8 TERRITORY SI COLLATE USING SYSTEM_1250_SI
    And repeated the "table sort sample" above and Slovenian characters are now sorted correctly: c, č, d.
    But... Like I see BLU technology is only supported using UTF-8 encoding when COLLATE USING IDENTITY is used. So it looks like if I want to use BLU, I am force to use UTF-8 with COLLATE USING IDENTITY.

    To see if db2 client has any influence on sorting I have set:
    db2set DB2TERRITORY=SI
    on db2 client (Windows) machine and establish connection and select with order by, and I see no influence of this setting on table sort.

    Questions:
    1. Is there a way I can use proper sorting of Slovenian characters when UTF-8 with COLLATE USING IDENTITY database is used? Like setting sorting on db2 client or similar?
    2. What is the purpose of TERRITORY SI setting if COLLATE USING IDENTITY is used? It looks having no influence.
    Regards

  2. #2
    Join Date
    Jul 2016
    Location
    Moscow
    Posts
    294
    Provided Answers: 45
    Hi,

    Try this:
    Code:
    select * from admin.tab order by COLLATION_KEY_BIT(col1, 'SYSTEM_1250_SI')
    Regards,
    Mark.

  3. #3
    Join Date
    Jan 2003
    Posts
    1,636
    Provided Answers: 1
    @mark.bb, this works great. Thanks a lot.

    Above solution is super, but just wondering is there maybe some setting at db2 client or similar that this setting becomes default and I don't need to change every existing select statement?
    Last edited by grofaty; 10-19-17 at 02:50.

  4. #4
    Join Date
    Jul 2016
    Location
    Moscow
    Posts
    294
    Provided Answers: 45
    There is no such a client setting making the driver sort the result set itself.

  5. #5
    Join Date
    Jan 2003
    Posts
    1,636
    Provided Answers: 1
    @mark.bb, OK fine. Thank you.

    Just one more think. About my second question from my first post.

    For example having two databases:
    Code:
    db2 CREATE DATABASE data2 USING CODESET UTF-8 TERRITORY SI COLLATE USING IDENTITY
    vs.
    Code:
    db2 CREATE DATABASE data2 USING CODESET UTF-8 TERRITORY US COLLATE USING IDENTITY
    What is the purpose of TERRITORY SI setting if COLLATE USING IDENTITY is used? It looks having no influence, or is there something?
    Last edited by grofaty; 10-19-17 at 05:30.

  6. #6
    Join Date
    Jul 2016
    Location
    Moscow
    Posts
    294
    Provided Answers: 45
    Code:
    db2 CREATE DATABASE data2 USING CODESET UTF-8 TERRITORY XX COLLATE USING IDENTITY
    XX influences date and time formats, and not the sort order.
    Date and time formats by territory code
    Regards,
    Mark.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •