Results 1 to 6 of 6
  1. #1
    Join Date
    Oct 2002
    Location
    Norway
    Posts
    13

    Question Unanswered: Sorting strings using collation problem ... help

    Hi.

    I want to sort records that contain Norwegian letters.

    Database is in UTF-8

    If I do this:
    select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LNB')

    (where LNB is the collation)

    then the records are ordered like this:
    A
    A
    A

    If I do this:
    select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LNB_AS')

    then the records are ordered like this:
    A
    A
    A


    The records should have been in this order:
    A
    A
    A


    I have tried changing the attributes for the collation but to no help.
    E.g when I use LNB_AS the sorting gets different than using only LNB.
    Have tried lots of other attributes as well without getting the sort order correct.

    Is there a way I can apply some rules that the ICU uses when sorting?

    How can I make this work?

    Anybody?
    Last edited by ToreBK; 11-30-06 at 12:31.

  2. #2
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,
    can you please provide your operating system name and version, db2 type and version and UTF-8 collating sequence.

    How to get collating sequence info?
    Code:
    db2 get db cfg for database_name
    Without above info I can just guess. You are probably having problem not specifying the correct collating sequence.

    Using UTF-8 there are 5 types of collating sequence:
    - SYSTEM
    - IDENTITY
    - IDENTITY_16BIT
    - UCA400_NO
    - UCA400_LTH

    You will need to test this out for Norwegian characters, I use uca400_no. Collating sequence is defined by create database command.

    sample for 'system':
    Code:
    db2 CREATE DATABASE test ON 'C:' USING CODESET UTF-8
    TERRITORY NO COLLATE USING SYSTEM

  3. #3
    Join Date
    Oct 2002
    Location
    Norway
    Posts
    13

    Yes, here it is:

    OS: Windows XP PRO, SP2.

    DB2 Express-C

    UTF-8 COLLATING sequence: IDENTITY

    **** UPDATE ****
    I deleted the database and created a new one like this:

    CREATE DATABASE parents ON 'C:' USING CODESET UTF-8
    TERRITORY NO COLLATE USING UCA400_NO

    then I did this:
    db2 get db cfg for database_name

    and see that new COLLATING sequence are now: UCA400_NO

    I ran my queries again.... ( I inserted the records from scratch )
    select PAG_NAME from pager where PAG_ID>=3000 order by ICU.SORTKEY(PAG_NAME,'LNB')

    then the records are still ordered like this:
    Aø
    Aæ
    Aå

    The correct order should be:
    Aæ
    Aø
    Aå

    Urk....Same result as with database COLLATING sequence set to IDENTITY.

    I did some more testing.....
    and from this link: http://www-128.ibm.com/developerwork...ole/index.html

    If you scroll down to first "order by" example then I inserted the records .. Alice, Celine, Cindy etc and tested if I
    got the same result as in this page:
    Using this for English:
    select PAG_NAME from pager where PAG_ID>=7000 order by ICU.SORTKEY(PAG_NAME,'LEN')
    I get same sort order as in the link above.

    Using this for Slovak
    select PAG_NAME from pager where PAG_ID>=7000 order by ICU.SORTKEY(PAG_NAME,'LSK')
    Then I get the same result as in the link above.


    But using LNB for Norwegian Bokmål, places æøå in wrong order ..... strange.

    Do I have to specify charset to use against DB2 in PHP?
    Can that have anything to do with it.
    I belive this does not have anything with sorting of the data.
    When I insert records containing æøå letters today, they are displayed correctly when I select them laiter though.

    When I work with UTF-8 data against MySQL I use this after connecting in PHP
    @mysql_query("SET NAMES 'utf8'");

    For ORACLE I do this after connecting:
    putenv("NLS_LANG=.AL32UTF8");
    putenv("NLS_COMP=LINGUISTIC");
    putenv("NLS_SORT=BINARY_CI");

    For IBASE
    if ( ! $this->db = @ibase_pconnect($dbpath.$dbname, $dbuser, $dbpass, 'UTF8')){ return false; }



    **** UPDATE END ***

    I then thought I could store let say both french, german, norwegian, thai etc...

    And let say a Norwegian client, which stores all his records in Norwegian want to sort his data in norwegian sort order, then I thought it was possible for him to specify the collation for his language (well, actually I do it for him on the fly when ordering the data (at least I hoped I could)....

    I have installed the ICU windows binaries (DLL's) in the FUNCTION directory and created a UDF in order to be able to use a collation from the ICU in my queries.

    So for a Norwegian client I would sort like this:
    select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LNB')

    For a german I would just change the collation from LNB to LDE like this:
    select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LDE')

    I have gotten the collation names from this page:
    http://www-128.ibm.com/developerwork...sidefile1.html

    The strange thing is this: When I use the default collation for "Norwegian Bokmål" which is LNB the characters æøå is not sorted correctly.

    I also know that there is some attributes one can use with the LNB .. like
    LNB_AS, LNB_AS_CX .... there is a lot more.
    Have also tried full collation (for norwegian) with all attributes like this:
    LNB_AN_CX_EX_FX_HX_NX_S3

    But when I go to the IBM online page where I can test this specific collation I see that all the attributes are the default settings.
    If I enter my test words with æøå in this page and hit Sort button then the words are sorted correctly.

    Is it not possible to store data from several languages in a UTF-8 database, and then laiter on using queries that uses a specific collation for a specific language (one language at a time though)?

    Could it be that there is something wrong with the ruleset for Norwegian Bokmål collation (LNB) in the ICU dll's ?

    Anyway... thanks for taking the time to look at this.
    Last edited by ToreBK; 12-01-06 at 07:25.
    Tore

  4. #4
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,
    have you tried to create database with UCA400_NO collating sequence?
    Grofaty

  5. #5
    Join Date
    Oct 2002
    Location
    Norway
    Posts
    13

    Yes

    Yes, thats what I said in my previous post. Please read through it for more details.

    Regards
    Tore

  6. #6
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,
    you are using db2-express-C. What is the version you are using? Is it v8 or v9?
    check out with command: db2level

    As I know there are no fixpack support for db2-express-C, so migration to newer version can help.

    Did you test all UTF-8 collating sequences?
    Hope this helps,
    Grofaty

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •