Page 1 of 2 12 LastLast
Results 1 to 15 of 29
  1. #1
    Join Date
    Mar 2009
    Posts
    15

    Unanswered: Reading Unicode from Oracle 10g database

    Hi, this is my first time posting here. I have been looking for answers to my problem for a week and I have not been able to find any answers.

    Problem: We had to add support for Polish and Vietnamese to some of our applications. I added two columns to a table and set the data type for both of them to nvarchar. I then used java jdbc connection to enter data into these columns. I opened Toad 9.7.2 to view the characters. I had to change the font on the data grid to a font that supports these characters sets. After I did that, I was able to view the characters as expected. So far so good.

    Next, I made an attempt to get these characters out. I decided to generate an xml file with the data and print it to IE through a servlet. When I looked at the XML, the characters have upside down ? in place of characters that are above unicode 255.

    Database NLS Configuration:
    NLS_LANGUAGE AMERICAN
    NLS_TERRITORY AMERICA
    NLS_CURRENCY $
    NLS_ISO_CURRENCY AMERICA
    NLS_NUMERIC_CHARACTERS .,
    NLS_CALENDAR GREGORIAN
    NLS_DATE_FORMAT DD-MON-RR
    NLS_DATE_LANGUAGE AMERICAN
    NLS_CHARACTERSET WE8ISO8859P1
    NLS_SORT BINARY
    NLS_TIME_FORMAT HH.MI.SSXFF AM
    NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM
    NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR
    NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR
    NLS_DUAL_CURRENCY $
    NLS_NCHAR_CHARACTERSET AL16UTF16
    NLS_COMP BINARY
    NLS_LENGTH_SEMANTICS BYTE
    NLS_NCHAR_CONV_EXCP FALSE

    Java and Other Info:
    Java Runtime Env. 6
    Driver - ojdbc14.jar thin client
    Server - WebSphere 6.1.0.15 with Connection Polling
    OS - Windows XP Service Pack 3

    I set my server's JVM file encoding to UTF-8 using -Dfile.encoding=UTF-8
    through the admin console. This allowed me to eventually get polish characters into a PDF. When I turn it off, I get #s in the PDF in place of non ascii characters.

    I have tried using Java's getBytes("UTF-8") and many variations of this.
    I also tried to get a CharacterStream and read it one character at a time but got UnicodeToAscii exception.


    I wanted to check to see if the characters were in the database as I expected by removing Toad from the equation.

    I opened SQL Plus and ran SELECT DUMP([column name]) from [table name]
    I then started to break down the dump:
    Typ=1 Len=336:
    0,0,0,77, =M
    0,0,0,97, =A
    0,0,0,109=m,
    0,0,0,32,=space
    0,0,0,100,=d
    0,1,0,66,=l with line through it (this should be 0,0,0,322)
    0,0,0,117=u,
    0,0,0,103=g,
    0,0,0,105=i,
    0,0,0,32,=space
    0,0,0,99,=c
    0,0,0,105=i,
    0,1,0,5,=a with hook
    0,0,0,103,
    0,0,0,32,=space
    0,0,0,109,
    0,0,0,117,
    0,0,0,115,0,0,0,122,
    0,1,0,25,
    0,0,0,32,=space
    0,0,0,119,

    I have spent a week on this and I opened an SR with Oracle. They could not figure out the problem and essentially left me hanging.

    I also tried casting the prepared statement to an OraclePreparedStatement and got a clas cast exception. WebSphere wraps prepared statements (WSJdbcPreparedStatement) for the conneciton pooling.

    This post is basically my last resort. Any help would be greatly appreciated. I feel like it should be a simple thing to do...getting unicode characters out of the database using java...

    Does any implicit conversion of the characters happend in the JDBC adapter? I should make a correction about above. When I view the characters in IE, they appear as blocks. The 0,1,0,66 line in the dump displays as two blocks. I also created a new database using UTF-8, and when I got those string out of the database, the characters displayed as upside down questionmarks and the dump for that showed 0,191.

    Thank you,

    Chris
    Last edited by cmack13; 03-20-09 at 11:27.

  2. #2
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    As you can see the correct data in TOAD I would strongly assume that it is a problem of your application not the database. You might want to try a Java based SQL tool as well just to be sure.
    But in my experience it's the native Windows apps that have more problems with UTF8 database than the Java apps. So if TOAD shows them correctly, things should be fine database wise.

    I decided to generate an xml file with the data and print it to IE through a servlet
    Did you set the encoding of the servlet's output correctly?

    When I looked at the XML
    How did you look at the XML?

  3. #3
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >I opened SQL Plus and ran SELECT DUMP([column name]) from [table name]
    Not that this will necessarily get you any farther, but the ASCIISTR function will display the "specicial" characters in pure printable ASCII characters.

    If you get no viable solution here, you might want to try the
    Welcome to The Oracle FAQ | Oracle FAQ forums

    I wish I could help more but I try to avoid Java whenever possible.
    It appears to me you have a data presentation problem & not a storage problem.
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

  4. #4
    Join Date
    Mar 2009
    Posts
    15
    I set the response.setContentType("text/xml");
    and response.setCharacterEncoding("UTF-8");
    and I put <?xml version="1.0" encoding="UTF-8"?>
    at the beginning of the xml structure.

    When I open SQL Plus and try to view the characters, I see the square blocks. My initial thought is that the application is using windows cp1252 encoding, but then after looking at the dump, It appears that the characters are getting converted some how.

    The character ł has an integer value of 322 and unicode value 142. How come the dump shows 0,1,0,66 ? The dump removes the concern of the encoding of the viewer since it is displaying integer values.

    I agree with the comment that what is in the database is probably correct. The problem is, what I get out is not. Could it be the medium I am using to view the characters that is causing the problem? I tested that question by hardcoding some translated code from translate.google.com into my java code and putting into the xml and I can view it all just fine.

    I'm sure other people are doing this, I am just stumped as to why the characters are not coming out correctly.

    Thanks for your response by the way, I appreciate it a lot!!

  5. #5
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by cmack13
    When I open SQL Plus and try to view the characters, I see the square blocks.
    The Windows commandline does not display UTF8 characters (I think there is a codepage that might handle some foreign characters, but definitely not the full set of UTF8 characters). That might also be caused by the font used.

    Could it be the medium I am using to view the characters that is causing the problem? I tested that question by hardcoding some translated code from translate.google.com into my java code and putting into the xml and I can view it all just fine.
    So it's somewhere between loading the data from the database and sending it to the browser.

    Are you sure IE would display an UTF-8 encoded XML file properly? What if you write the file on the server and the manually view it e.g. using Notepad (which can handle UTF8 quite well)

    Firefox has a menu item to manually set the encoding of the displayed file, so maybe that could be used to "debug" this issue.

  6. #6
    Join Date
    Mar 2009
    Posts
    15
    I am generating the xml on the fly programatically. IE has options to change the encoding of the page. It definitely displays all of the polish characters when I hard code them in my java code. But when I try to get them out of the DB, it displays the blocks in place of the extended characters.

    I suspect that there is some conversion going on in the jdbc adapter but I do not know how to configure the jdbc client to use UTF8, or Unicode. I thought since it is Java and java by default encodes its string in Unicode that there would be little or no conversion required.

    I tried to set the ACP nls_lang on my windows machine once and I will not try that again because my machine would not boot up afterwords. I am not sure if I have to change the NLS_LANG on my windows machine and if so, I do not know the correct way to do it.

  7. #7
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by cmack13
    I am generating the xml on the fly programatically. IE has options to change the encoding of the page. It definitely displays all of the polish characters when I hard code them in my java code. But when I try to get them out of the DB, it displays the blocks in place of the extended characters.
    Then it must be the way you write the output. Java is unicode and I had never had a problem reading unicode data from the database.

    So, how do you do the writing?
    Do you use a Writer initialized with the proper encoding?

  8. #8
    Join Date
    Mar 2009
    Posts
    15
    This is how I write it out to the browser:


    response.setContentType("text/xml");
    response.setCharacterEncoding("UTF-8");

    ServletOutputStream os = response.getOutputStream();
    os.write(xml);//xml is a byte array encoded as UTF-8
    os.flush();
    ....


    Reading from Result Set:

    I have tried the following and then some:

    condition.setForemanLanguagePo(rs.getString("FOREM AN_LANGUAGE_PO"));

    condition.setForemanLanguagePo(new String(rs.getBytes("FOREMAN_LANGUAGE_PO"),"UTF-8"));

    condition.setForemanLanguagePo(new String(rs.getString("FOREMAN_LANGUAGE_PO").getByte s("UTF-8")));

    What method do you use? Are you using any connection polling managed by your server? Are you using jdbc thin or oci?
    Last edited by cmack13; 03-20-09 at 14:42.

  9. #9
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by cmack13
    response.setContentType("text/xml");
    response.setCharacterEncoding("UTF-8");

    ServletOutputStream os = response.getOutputStream();
    os.write(xml);//xml is a byte array encoded as UTF-8
    The "byte array" is the problem. The XML should be stored in a String (how do you obtain that array anyway? Why don't you simply use getString() on your ResultSet)

    Once you have the XML in a String and it still doesn't work, try this:
    Code:
    Writer w = new OutputStreamWriter(os, "UTF-8");
    w.write(xml);
    I think this thread should be moved to a Java forum...

  10. #10
    Join Date
    Mar 2009
    Posts
    15
    I don't think the write method is not the problem. I am using the same write method with the hard coded polish text and it appears fine in IE. I also configured my debug console to support polish text on system.outs and the text appears fine when hard coded in the java code, but when I get it from the db it has the upside down ?s in it.

    I had put an SR into Oracle and they couldn't figure it either.

    I appreciate your effort in trying to help, but I don't believe this is a basic java issue. There is osme type of implicit conversion going in the ojdbc driver. It has to be. Either that, or the data is not entered in the database correctly. Did you see the comments about the dump? Why would the dump in SQL Plus show the wrong values for the character and Toad show the characters correctly?

  11. #11
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Sorry, but I do believe it's an error in your program. And I think it's utterly wrong to deal with character strings as bytes.

    How to you obtain that byte[] array?

    Most probably when converting the string to a byte array some kind of wrong conversion happens String.getBytes() for example uses the default OS encoding and not Unicode which will definitely give you a wrong result. Do use Strings to handle character data otherwise you'll always end up with some strange conversion problems.

    I don't know unicode codepoints well enough in order to know whether the dump is correct or not.

    Oracle's JDBC driver does the conversion correctly. I have never had a problem with it in the last 6 years. And I dealt with arabic, czech and japanese characters.

    Do display the data using a Java based query tool to verify that the JDBC is not the problem.

  12. #12
    Join Date
    Mar 2009
    Posts
    15
    The xml is a string and I use the xml.getBytes("UTF-8"); to get the xml.

    I am just fried on this. I have put a lot of time into it and I have not been able to figure it out. It may be a coding issue or a server config issue.

    I had to set my JVM's file encoding to UTF-8 in order to display the characters in a PDF. Would that cause any issues?

    I appologize if I seem snippy. Like I said I have been working on this for a while and I have tried all the getByte("") methods as soon as I get the string from the result set.

    I will post a snippit of the xml.

    I am doing some translation on the fly using Google translater. We have some text that is entered into our data base manually and some that is staic. The language I translate on the fly is displaying correctly. The language I get from the database does not. All of the text is in the same xml file. So when it shows up in the browser, the translated text looks great but the database text does not.

  13. #13
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by cmack13
    TLike I said I have been working on this for a while and I have tried all the getByte("") methods as soon as I get the string from the result set.
    The question is: Why are you doing that?
    Why don't you simply deal with String objects where everything is taken care of?

    Do try to output the real String you obtained from the database in the servlet rather than using the byte array. I can image the OutputStream doing some fancy encoding stuff on top which breaks the binary data that you send.

  14. #14
    Join Date
    Mar 2009
    Posts
    15
    I have also tried it without rs.getString("COLUMN_NAME") and I get the upside question marks.

    XML out to browser:

    What comes from DB:
    - <Text condition_id="260" hideLine="true" sequence_group="A" sequence_number="1">
    <pt>Mam dżugi ciżg muszż wstawiż do bazy danych. Pozwól nam zobaczyż, jak wiele znaków to.</pt>
    </Text>


    In the same XML what I translate on the fly:
    - <Text condition_id="163094" hideLine="false" sequence_group="A" sequence_number="1">
    <pt>Parapet Wall DetailFurnish i zainstalować 6 "szeroki laminowany wzmocnienie obwodu pasa u podstawy ściany. Zajęcie dokonuje się w drodze do 1" szeroki polimeru łatami bar lub 2 3 / 8 "okrągłego szwu płyt, które są mocowane mechanicznie przy maksymalnym z dnia 6 "na środku. W pełni zgodne z EPDM membraną i w górnej części ściany przy użyciu przedpiersie neoprenu wiązania kleju. Dostarczyć i zainstalować nową metalową nasadkę radzenia sobie jak wskazano w Sheet Metal Accessories.</pt>
    </Text>

  15. #15
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    OK, I'm out of ideas (although I still think it must be something in your application).

    Did you try a Java based query tool to verify the database content?

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •