If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Sorting Norwegian characters æøå problem

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-30-06, 11:22
ToreBK ToreBK is offline
Registered User
 
Join Date: Oct 2002
Location: Norway
Posts: 13
Question Sorting strings using collation problem ... help

Hi.

I want to sort records that contain Norwegian letters.

Database is in UTF-8

If I do this:
select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LNB')

(where LNB is the collation)

then the records are ordered like this:




If I do this:
select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LNB_AS')

then the records are ordered like this:





The records should have been in this order:





I have tried changing the attributes for the collation but to no help.
E.g when I use LNB_AS the sorting gets different than using only LNB.
Have tried lots of other attributes as well without getting the sort order correct.

Is there a way I can apply some rules that the ICU uses when sorting?

How can I make this work?

Anybody?

Last edited by ToreBK; 11-30-06 at 11:31.
Reply With Quote
  #2 (permalink)  
Old 12-01-06, 02:01
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
Hi,
can you please provide your operating system name and version, db2 type and version and UTF-8 collating sequence.

How to get collating sequence info?
Code:
db2 get db cfg for database_name
Without above info I can just guess. You are probably having problem not specifying the correct collating sequence.

Using UTF-8 there are 5 types of collating sequence:
- SYSTEM
- IDENTITY
- IDENTITY_16BIT
- UCA400_NO
- UCA400_LTH

You will need to test this out for Norwegian characters, I use uca400_no. Collating sequence is defined by create database command.

sample for 'system':
Code:
db2 CREATE DATABASE test ON 'C:' USING CODESET UTF-8
TERRITORY NO COLLATE USING SYSTEM
Reply With Quote
  #3 (permalink)  
Old 12-01-06, 04:46
ToreBK ToreBK is offline
Registered User
 
Join Date: Oct 2002
Location: Norway
Posts: 13
Yes, here it is:

OS: Windows XP PRO, SP2.

DB2 Express-C

UTF-8 COLLATING sequence: IDENTITY

**** UPDATE ****
I deleted the database and created a new one like this:

CREATE DATABASE parents ON 'C:' USING CODESET UTF-8
TERRITORY NO COLLATE USING UCA400_NO

then I did this:
db2 get db cfg for database_name

and see that new COLLATING sequence are now: UCA400_NO

I ran my queries again.... ( I inserted the records from scratch )
select PAG_NAME from pager where PAG_ID>=3000 order by ICU.SORTKEY(PAG_NAME,'LNB')

then the records are still ordered like this:
Aø
Aæ
Aå

The correct order should be:
Aæ
Aø
Aå

Urk....Same result as with database COLLATING sequence set to IDENTITY.

I did some more testing.....
and from this link: http://www-128.ibm.com/developerwork...ole/index.html

If you scroll down to first "order by" example then I inserted the records .. Alice, Celine, Cindy etc and tested if I
got the same result as in this page:
Using this for English:
select PAG_NAME from pager where PAG_ID>=7000 order by ICU.SORTKEY(PAG_NAME,'LEN')
I get same sort order as in the link above.

Using this for Slovak
select PAG_NAME from pager where PAG_ID>=7000 order by ICU.SORTKEY(PAG_NAME,'LSK')
Then I get the same result as in the link above.


But using LNB for Norwegian Bokmål, places æøå in wrong order ..... strange.

Do I have to specify charset to use against DB2 in PHP?
Can that have anything to do with it.
I belive this does not have anything with sorting of the data.
When I insert records containing æøå letters today, they are displayed correctly when I select them laiter though.

When I work with UTF-8 data against MySQL I use this after connecting in PHP
@mysql_query("SET NAMES 'utf8'");

For ORACLE I do this after connecting:
putenv("NLS_LANG=.AL32UTF8");
putenv("NLS_COMP=LINGUISTIC");
putenv("NLS_SORT=BINARY_CI");

For IBASE
if ( ! $this->db = @ibase_pconnect($dbpath.$dbname, $dbuser, $dbpass, 'UTF8')){ return false; }



**** UPDATE END ***

I then thought I could store let say both french, german, norwegian, thai etc...

And let say a Norwegian client, which stores all his records in Norwegian want to sort his data in norwegian sort order, then I thought it was possible for him to specify the collation for his language (well, actually I do it for him on the fly when ordering the data (at least I hoped I could)....

I have installed the ICU windows binaries (DLL's) in the FUNCTION directory and created a UDF in order to be able to use a collation from the ICU in my queries.

So for a Norwegian client I would sort like this:
select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LNB')

For a german I would just change the collation from LNB to LDE like this:
select CUS_NAME from customer where lower(CUS_NAME) like 'a%' order by ICU.SORTKEY(CUS_NAME, 'LDE')

I have gotten the collation names from this page:
http://www-128.ibm.com/developerwork...sidefile1.html

The strange thing is this: When I use the default collation for "Norwegian Bokmål" which is LNB the characters æøå is not sorted correctly.

I also know that there is some attributes one can use with the LNB .. like
LNB_AS, LNB_AS_CX .... there is a lot more.
Have also tried full collation (for norwegian) with all attributes like this:
LNB_AN_CX_EX_FX_HX_NX_S3

But when I go to the IBM online page where I can test this specific collation I see that all the attributes are the default settings.
If I enter my test words with æøå in this page and hit Sort button then the words are sorted correctly.

Is it not possible to store data from several languages in a UTF-8 database, and then laiter on using queries that uses a specific collation for a specific language (one language at a time though)?

Could it be that there is something wrong with the ruleset for Norwegian Bokmål collation (LNB) in the ICU dll's ?

Anyway... thanks for taking the time to look at this.
__________________
Tore

Last edited by ToreBK; 12-01-06 at 06:25.
Reply With Quote
  #4 (permalink)  
Old 12-01-06, 06:25
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
Hi,
have you tried to create database with UCA400_NO collating sequence?
Grofaty
Reply With Quote
  #5 (permalink)  
Old 12-01-06, 07:18
ToreBK ToreBK is offline
Registered User
 
Join Date: Oct 2002
Location: Norway
Posts: 13
Yes

Yes, thats what I said in my previous post. Please read through it for more details.

Regards
__________________
Tore
Reply With Quote
  #6 (permalink)  
Old 12-04-06, 01:44
grofaty grofaty is offline
Registered User
 
Join Date: Jan 2003
Posts: 1,570
Hi,
you are using db2-express-C. What is the version you are using? Is it v8 or v9?
check out with command: db2level

As I know there are no fixpack support for db2-express-C, so migration to newer version can help.

Did you test all UTF-8 collating sequences?
Hope this helps,
Grofaty
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On