  1. #1
    Join Date
    Aug 2003
    Location
    India
    Posts
    262

    Unanswered: Informix GLS Queries...

    Dear All,

    I have created my database with both the DB and CLIENT locales set to utf8. I have a few questions about it:

    1. Does the actual Japanese/Chinese data get inserted into the database, and how can I view it through dbaccess?
    2. Will the current column width be enough to hold Japanese/Chinese data? For example, I have a name column that stores 20 chars. When I store Japanese data, will it still hold 20 characters, or will I have to increase the size of the column?
    3. How do I sort on a column that holds data in many languages? For example, I have rows in French, Spanish, Japanese and Chinese. How do I select the sort order for a particular language? Is there an environment variable for it?

    Please advise. Thanks in advance.

    Best Regards,

    Lloyd

  2. #2
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    1. You can't do this in dbaccess, because dbaccess cannot display UTF-8 characters. You can set your client locale to something your dbaccess can display; GLS will then convert the UTF-8 data to that locale and display the proper characters.
    2. Yes it will. This works fine.
    3. There is a very good section in the GLS manual about how to sort. It all depends on which datatype you used to store the Unicode characters: NCHAR-based or CHAR-based.

    If you really want to see how IDS stores the data, I wrote a small Java program that you can have. Java is Unicode-based and can display UTF-8 symbols perfectly. It even shows the data in hex.
    rws
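
    To illustrate why the sort order in point 3 depends on the collation in use, here is a minimal client-side Java sketch. It only contrasts code-point order with a locale-aware collation; it does not reproduce what the database server does for CHAR or NCHAR columns, and the class and method names (SortOrderDemo, byCodePoint, byLocale) are made up for this example.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of Informix or the thread's test program.
final class SortOrderDemo {

    // Sorts by String.compareTo, i.e. by UTF-16 code unit values.
    // Accented letters such as U+00E9 end up after 'z'.
    static List<String> byCodePoint(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    // Sorts with the collation rules of the given locale.
    static List<String> byLocale(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        Collator collator = Collator.getInstance(locale);
        Collections.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // "\u00e9clair" is "eclair" with an acute accent on the first letter.
        List<String> words = Arrays.asList("zebra", "\u00e9clair", "apple");
        System.out.println(byCodePoint(words));
        System.out.println(byLocale(words, Locale.FRENCH));
    }
}
```

    With code-point ordering the accented word sorts last; with the French collator it sorts between "apple" and "zebra". Which of the two behaviours a given column type gets on the server is what the GLS manual section mentioned above covers.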

  3. #3
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    Thanks for your feedback. Regarding storing the data, I am afraid it does not get stored. I created my database with DB_LOCALE and CLIENT_LOCALE set to en_us.utf8 in dbaccess. I created a table with a 20-char column, and I even tried changing it to nchar, but no luck. When I try to insert Japanese/Chinese data I can insert only 3-4 characters; the rest get truncated. In setnet32 I also tried setting DB_LOCALE and CLIENT_LOCALE to en_us.utf8. I want my database to support 5 languages. I can insert Spanish, German and French data properly; the problem is only with Japanese and Chinese data. Does that mean I will have to increase the size of my columns? I use the WinSQL tool to insert data. Can you send me your Java code so I can try it out? Please advise.

    Best Regards,

    Lloyd

  4. #4
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    How do you know the data gets truncated? You can't test this in dbaccess. You will need a UTF-8-enabled application.
    rws

  5. #5
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    I insert data through WinSQL Lite. Only part of the data gets inserted; the rest gets truncated. I then select the data again and view it in IE with the encoding set to UTF-8. That way I can see the data, and that is how I know it has been truncated. The feedback I got from a few people was that I will have to increase the size of my columns, and that the same goes for Oracle. Are there any other workarounds?

    Regards,

    Lloyd

  6. #6
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    I would suggest you write a little Java program, so that you are sure the data you want to insert really gets to the database.
    I wrote test programs like this myself.
    My IDS 9.40.UC2 was on Linux.
    My Java program ran on W2K with JDBC 2.21.JC5.
    See attached files.
    rws

  7. #7
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    Thanks for your feedback. I will run your program and test it out & let you know.

    Best Regards,

    lloyd

  8. #8
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    Thanks for your Java program, it's great. I have IDS 9.4 on W2K with JDBC 2.21 JC5. I could insert data successfully for most languages, except Japanese, Chinese and Korean, where the data gets inserted as "???" (small boxes); it is displayed the same way in the window too. The data also gets truncated. To insert data, I go to a site in the specific language, copy a particular Japanese/Chinese word, paste it into the Java program window and insert it. As Japanese/Chinese data is multi-byte it cannot be viewed properly, so I run a select query on my table, copy the results into Notepad, and save the file as HTML with UTF-8 encoding. Then I view the file in Internet Explorer. This way I can see the Chinese/Japanese characters, but the data is truncated.
    Any suggestions? Have you tried it on your side? Thanks in advance.

    Best Regards,

    Lloyd

  9. #9
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    I added a few keyboard layouts to my default one.
    I could even switch between Greek, German and Russian in one string. The characters were stored correctly.
    How did you insert the Chinese/Japanese characters?
    Did you change your keyboard layout?
    rws

  10. #10
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    Thanks for your prompt feedback. No, I did not change my keyboard layout. To insert Chinese/Japanese data I went to a Japanese/Chinese site, copied the contents, pasted them into the Java program window and inserted them. The data is displayed as junk on the screen, as the window does not support multi-byte chars, but it is successfully inserted into the database. To view the data I do a select from the table, copy the contents into Notepad, and save the file as HTML with UTF-8 encoding. Then I view the HTML file in Internet Explorer. This way I can see the data, but it has been truncated. For example, I copy some 10 Japanese characters, paste them into the Java program window and insert them, but when I retrieve them I can see only 7-8 chars. You can try it for yourself and let me know.

    Best Regards,

    Lloyd

  11. #11
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    Well - I don't really trust the Windows copy/paste capabilities, especially when Notepad is involved. How do you know that it's not Notepad that truncates the characters?
    I just tried it here.
    Switch to a Chinese keyboard (Chinese - Taiwan) with a traditional Unicode keyboard layout (non-US).
    When you then type '5677' in Word you get a Chinese character. Same for '5678'.
    When I tried this in my program and pressed 'show', it showed me that the two Chinese characters were stored in 6 bytes total:
    E5 99 B7 and E5 99 B8
    rws
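
    The byte values quoted above can be checked without a database. The following is a minimal sketch, not the attached test program (which is not reproduced in this thread); the class name Utf8Hex and the method toHex are made up for illustration. It encodes U+5677 and U+5678 as UTF-8 and prints the bytes in hex, separated by spaces.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not the test program attached to the thread.
final class Utf8Hex {

    // Returns the UTF-8 encoding of the text as upper-case hex bytes
    // separated by single spaces.
    static String toHex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(toHex("\u5677")); // E5 99 B7
        System.out.println(toHex("\u5678")); // E5 99 B8
    }
}
```

    Each of the two characters takes three bytes in UTF-8, which is where the 6 bytes total come from.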

  12. #12
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    Thanks once again for your reply. It's not Notepad that truncates the characters. I copy text from the website and paste it into the Java program window, which displays Chinese/Japanese characters as small boxes, and then I select insert. When I then select show, I can see that the number of boxes is reduced.
    For example, I copy a Japanese text of 10 chars and paste it into the Java program window, where it displays around 8 small boxes. I then select insert and then show, and it displays only 7 small boxes. That is how we know the data is truncated. Just to verify it, I then copy/paste into Notepad and view it as HTML. I have Win 2000 Professional. I cannot change my keyboard settings to Chinese - Taiwan as it is not in the dropdown list.
    Are there any other workarounds? Did you try copy/paste from a website and insert through the Java program? Thanks once again.

    Best Regards,

    Lloyd

  13. #13
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    Hi - beware that my test program creates a table with col2=char(20), meaning only 20 bytes.
    10 Japanese characters can take up 30 bytes...

    In the show window, the third column is the most important one: this is how IDS actually stores the data. (The many fff's are a bug in the Java program.) All the bytes are separated by a space.

    If you copy/paste, Windows will always take your current input locale into account.
    rws
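
    The arithmetic behind the 20-byte limit can be sketched in a few lines of Java. This is an illustration only: the class ByteBudget and its methods are hypothetical names, and fitToBytes shows how many whole characters fit into a byte budget, not how the server itself truncates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for estimating UTF-8 storage needs.
final class ByteBudget {

    // Number of bytes the text occupies when encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Longest prefix of whole code points whose UTF-8 encoding
    // fits into maxBytes.
    static String fitToBytes(String text, int maxBytes) {
        int used = 0;
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            int size = utf8Length(new String(Character.toChars(cp)));
            if (used + size > maxBytes) {
                break;
            }
            used += size;
            i += Character.charCount(cp);
        }
        return text.substring(0, i);
    }

    public static void main(String[] args) {
        // Ten hiragana characters, written as escapes to keep the source ASCII-only.
        String ten = "\u3042\u3044\u3046\u3048\u304a\u304b\u304d\u304f\u3051\u3053";
        System.out.println(ten.length());                 // 10 characters
        System.out.println(utf8Length(ten));              // 30 bytes
        System.out.println(fitToBytes(ten, 20).length()); // 6 characters fit in 20 bytes
    }
}
```

    Ten three-byte characters need 30 bytes, so a 20-byte column has room for only six of them. Sizing a column for N such characters therefore means allowing about 3 x N bytes.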

  14. #14
    Join Date
    Aug 2003
    Location
    India
    Posts
    262
    Hi Roelwe,

    That was very quick. Yes, col2 is 20 chars, which means it can store up to 20 bytes, and since Japanese characters are multi-byte the data gets truncated. But I was told by Informix tech support that this has been taken care of in ver. 9.4. So does that mean we have no alternative other than to increase the column length? Thanks once again, you have been a great help.

    Best Regards,

    Lloyd

  15. #15
    Join Date
    Aug 2002
    Location
    Belgium
    Posts
    534
    There have been a lot of changes for Unicode support in the 9.40 release.
    I don't know which support engineer told you this, but I am not aware of it.
    The only thing I see is what is stored in the database. When I reserve char(20), it is filled after 6 to 7 Chinese characters, which are translated into 18 to 21 bytes and stored that way (as you can see in the 'show' window).

    A good option would be to use lvarchar. The idea that it would decrease performance compared to char is no longer true. CPUs are a lot faster and disks are always the bottleneck...
    rws
