Results 1 to 11 of 11
  1. #1
    Join Date
    Apr 2009
    Posts
    6

    Unanswered: sql wildcards on unicode databases

    hello,

    I have a strange behavior with unicode databases:
    Database territory = PT
    Database code page = 1208
    Database code set = UTF-8
    Database country/region code = 351
    Database collating sequence = IDENTITY
    Alternate collating sequence (ALT_COLLATE) =
    Number compatibility = OFF
    Varchar2 compatibility = OFF
    Database page size = 4096

    Now the odd part:

    # db2 "CREATE TABLE teste ( nome varchar(10) )"
    DB20000I The SQL command completed successfully.

    # db2 "INSERT INTO teste VALUES('çedilha')"
    DB20000I The SQL command completed successfully.

    # db2 "SELECT * FROM teste"
    NOME
    ----------
    çedilhado

    1 record(s) selected.

    # db2 "SELECT * FROM teste WHERE nome LIKE '_edilhado'"
    NOME
    ----------

    0 record(s) selected.

    # db2 "SELECT * FROM teste WHERE nome LIKE '__edilhado'"
    NOME
    ----------
    çedilhado

    1 record(s) selected.



    Is this supposed to happen? I know that non-ascii characters take 2 bytes. But, the LIKE command allows "Wild cards". A % may be used to match and string, _ will match any single character (or is it byte????)

    Thnx.

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    What's your db2 version/platform?
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Apr 2009
    Posts
    6
    Thank you for your help.

    db2 express 9.5.0 on ubuntu 8.04

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    It might be the way you insert the data; try "select hex(nome) from teste" to see if they look anything like valid UTF-8 characters.
    ---
    "It does not work" is not a valid problem statement.

  5. #5
    Join Date
    Apr 2009
    Posts
    6
    db2 "select hex(nome), nome from teste"

    1 NOME
    -------------------- ----------
    C3A76564696C6861646F çedilhado

    1 record(s) selected.

  6. #6
    Join Date
    Apr 2009
    Posts
    6
    Anyone? Does anybody have utf-8 databases with non-ascii characters?

  7. #7
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Well, C3A7 does look like the right code for "c cedille". May be you should try talking to IBM support, because theoretically ç should match the underscore wildcard.
    ---
    "It does not work" is not a valid problem statement.

  8. #8
    Join Date
    May 2003
    Posts
    113
    one '_' underscore should stand for one character. for this case, the particular character is double-byte character. DB2 only match the first-byte of this character.

    I don't know whether it is by design or a code bug. Only folks from DB2 can tell.

    My gut feeling tell me that this case is by design

  9. #9
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by nidm
    DB2 only match the first-byte of this character.
    Not true. Quoting the manual:

    In a Unicode graphic column, every non-supplementary character, including the halfwidth underscore character (U&'\005F') and the halfwidth percent sign character (U&'\0025'), is two bytes in width. In a Unicode database, special characters in a pattern are interpreted as follows:

    * For character strings, a halfwidth underscore character (X'5F') or a fullwidth underscore character (X'EFBCBF') refers to one Unicode character, and a halfwidth percent sign character (X'25') or a fullwidth percent sign character (X'EFBC85') refers to zero or more Unicode characters.
    That's why I said that in theory this should work.
    ---
    "It does not work" is not a valid problem statement.

  10. #10
    Join Date
    Apr 2009
    Posts
    6
    Let me first thank you for your help!!!

    Do you have one db2 database (utf-8) available to make the same test? Maybe i missed something on the configuration... It seems to me very unlikely that IBM made a bug like this, but... The use of utf-8 is quite normal these days i guess...

  11. #11
    Join Date
    Apr 2009
    Posts
    6
    Hello,

    Just for the record. I tried the same tests on a db2 v9.5.2 and the results are just as expected in theory. So, it is really a (major!!!!) bug from IBM db2 v9.5.0!!!

    Thanx for all :-)

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •