If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > sql wildcards on unicode databases

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 04-01-09, 07:30
bdr bdr is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
sql wildcards on unicode databases

hello,

I have a strange behavior with unicode databases:
Database territory = PT
Database code page = 1208
Database code set = UTF-8
Database country/region code = 351
Database collating sequence = IDENTITY
Alternate collating sequence (ALT_COLLATE) =
Number compatibility = OFF
Varchar2 compatibility = OFF
Database page size = 4096

Now the odd part:

# db2 "CREATE TABLE teste ( nome varchar(10) )"
DB20000I The SQL command completed successfully.

# db2 "INSERT INTO teste VALUES('çedilha')"
DB20000I The SQL command completed successfully.

# db2 "SELECT * FROM teste"
NOME
----------
çedilhado

1 record(s) selected.

# db2 "SELECT * FROM teste WHERE nome LIKE '_edilhado'"
NOME
----------

0 record(s) selected.

# db2 "SELECT * FROM teste WHERE nome LIKE '__edilhado'"
NOME
----------
çedilhado

1 record(s) selected.



Is this supposed to happen? I know that non-ascii characters take 2 bytes. But, the LIKE command allows "Wild cards". A % may be used to match and string, _ will match any single character (or is it byte????)

Thnx.
Reply With Quote
  #2 (permalink)  
Old 04-01-09, 10:00
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
What's your db2 version/platform?
Reply With Quote
  #3 (permalink)  
Old 04-01-09, 10:31
bdr bdr is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
Thank you for your help.

db2 express 9.5.0 on ubuntu 8.04
Reply With Quote
  #4 (permalink)  
Old 04-01-09, 12:06
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
It might be the way you insert the data; try "select hex(nome) from teste" to see if they look anything like valid UTF-8 characters.
Reply With Quote
  #5 (permalink)  
Old 04-01-09, 12:22
bdr bdr is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
db2 "select hex(nome), nome from teste"

1 NOME
-------------------- ----------
C3A76564696C6861646F çedilhado

1 record(s) selected.
Reply With Quote
  #6 (permalink)  
Old 04-02-09, 13:04
bdr bdr is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
Anyone? Does anybody have utf-8 databases with non-ascii characters?
Reply With Quote
  #7 (permalink)  
Old 04-02-09, 15:22
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Well, C3A7 does look like the right code for "c cedille". May be you should try talking to IBM support, because theoretically ç should match the underscore wildcard.
Reply With Quote
  #8 (permalink)  
Old 04-02-09, 16:58
nidm nidm is offline
Registered User
 
Join Date: May 2003
Posts: 113
one '_' underscore should stand for one character. for this case, the particular character is double-byte character. DB2 only match the first-byte of this character.

I don't know whether it is by design or a code bug. Only folks from DB2 can tell.

My gut feeling tell me that this case is by design
Reply With Quote
  #9 (permalink)  
Old 04-02-09, 19:10
n_i n_i is offline
:-)
 
Join Date: Jun 2003
Location: Toronto, Canada
Posts: 4,449
Quote:
Originally Posted by nidm
DB2 only match the first-byte of this character.
Not true. Quoting the manual:

Quote:
In a Unicode graphic column, every non-supplementary character, including the halfwidth underscore character (U&'\005F') and the halfwidth percent sign character (U&'\0025'), is two bytes in width. In a Unicode database, special characters in a pattern are interpreted as follows:

* For character strings, a halfwidth underscore character (X'5F') or a fullwidth underscore character (X'EFBCBF') refers to one Unicode character, and a halfwidth percent sign character (X'25') or a fullwidth percent sign character (X'EFBC85') refers to zero or more Unicode characters.
That's why I said that in theory this should work.
Reply With Quote
  #10 (permalink)  
Old 04-02-09, 19:50
bdr bdr is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
Let me first thank you for your help!!!

Do you have one db2 database (utf-8) available to make the same test? Maybe i missed something on the configuration... It seems to me very unlikely that IBM made a bug like this, but... The use of utf-8 is quite normal these days i guess...
Reply With Quote
  #11 (permalink)  
Old 04-13-09, 11:10
bdr bdr is offline
Registered User
 
Join Date: Apr 2009
Posts: 6
Hello,

Just for the record. I tried the same tests on a db2 v9.5.2 and the results are just as expected in theory. So, it is really a (major!!!!) bug from IBM db2 v9.5.0!!!

Thanx for all :-)
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On