Results 1 to 8 of 8
  1. #1
    Join Date
    Sep 2002
    Location
    INDIA
    Posts
    3

    Unanswered: PROBLEMS IN CONVERTING DATABASE from US&ASCII to UTF8

    CONVERTING DATABASE from US&ASCII to UTF8
    ------------------------------------------------------------

    I am currently converting my database character set from the US7ASCII
    character set to a UTF8 character set. The client applications are all Java
    ( which by default uses UCS2 ) via the JDBC 1.2 thin drivers.

    What I did was I changed the character set of the database to UTF8,
    I also changed the national character set to UTF8. This was done through the ALTER DATABASE CHARACTER SET UTF8 command.

    I started geting the following exception
    "java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv"
    ;

    Narrowing down the problem it was observed many rows where some columns
    had characters beyond the ASCII set ( like an e with an acute on top ).
    These were a plain VARCHAR2 columns.

    I looked at the production database ( which is in US7ASCII ) to see if it had been like
    that before, and it was showing fine there.


    Of course, the original problem is that the non-ASCII character got inserted
    into the database in the first place, when the character set was still
    US7ASCII. The JDBC driver then receives ( 201 ) and ( 86 ) as a single
    character, thereby failing because it is not a valid UTF8 character(!?).


    So my questions are:

    1) Does anybody know of any way to look for non-ASCII characters for a
    specified set of tables .... and then convert them into the proper UTF8
    encoding??


    I then tried exporting that table from the US7ASCII, but having my
    NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8 ... and then import that
    data into the UTF8 database, also having NLS_LANG=AMERICAN_AMERICA.UTF8 and
    NLS_CHAR=UTF8 during the import.

    That non-ASCII character, the e with an acute, became the letter h and many more changed that way.

    So my alternative of exporting the US7ASCII data as UTF8 did not work either.

    I also tried exporting the US7ASCII data as US7ASCII and then importing it as UTF8
    (with NLS_LANG=AMERICAN.AMERICA.UTF8) but it does not allow me to ( I got some IMP error ).

    Any other ideas how to properly convert the database to UTF8 from US7ASCII character set without getting the data corrupted?

  2. #2
    Join Date
    Apr 2002
    Location
    California, USA
    Posts
    482

    Cool

    No database version, so assume the latest - 9i. There is a tool called - Character Set Scanner, which you can run to test how your db CS will convert to the new CS. It creates report for possible conversion errors. It does not do the actual conversion, so no worries there.

    You can also request backport to 8.1.7 from the Oracle Support. How to use it is described Here


    Hope that helps,

    clio_usa
    OCP - DBA

    .
    .
    .

  3. #3
    Join Date
    Sep 2002
    Location
    China
    Posts
    5

    Lightbulb Some suggestions

    Suggest you adopt this way to chage db character set
    Update sys.props$
    set value$='UTF8'
    where name='NLS_CHARACTERSET'

    then, restart oracle to activate the change

    I don't know what the oracle version used is, but you should use above 8.1.7

  4. #4
    Join Date
    Sep 2002
    Location
    INDIA
    Posts
    3

    Re: Some suggestions

    Originally posted by merry_prince
    Suggest you adopt this way to chage db character set
    Update sys.props$
    set value$='UTF8'
    where name='NLS_CHARACTERSET'

    then, restart oracle to activate the change

    I don't know what the oracle version used is, but you should use above 8.1.7

    Hi,

    I have tried all of the combinations, still it does not work. Here is my findings with the following settings:


    1)
    Export Server: US7ASCII
    Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well

    Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
    Import Server: UTF8

    RESULT: Acute e became h


    2)
    Export Server: US7ASCII
    Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well

    Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
    Import Server: UTF8

    RESULT: IMP 00016 error


    3)
    Export Server: US7ASCII
    Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8

    Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
    Import Server: UTF8

    RESULT: Acute E became h


    4)
    Export Server: US7ASCII
    Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8

    Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
    Import Server: UTF8

    RESULT: Acute e became h


    5)

    Tried using Update sys.props$
    set value$='UTF8'
    where name='NLS_CHARACTERSET'

    RESULT: Acute e shows properly but it gives problem in the application
    "java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv"

    Looking further it was observed the following:
    when you try this command on a column 'city' in a table which contains 'Genèva' (note the acute e after n), it shows

    command: select length(city), vsize(city),substr(city,4,1),city from cities

    Result: 4 6 èva Genèva

    if you see the value of substr(city,4,1) , you will see the problem. Also note that the length shows 4 and size shows 6.


    6)
    Actually the above (point no. 5) is similar to changing the character set of the database with 'ALTER DATABASE CHARACTER SET UTF8'. Same problem is observed then too.


    7)
    I have also tried to with another method, that is by changing the third byte of the export file which specifies the character set, to the UTF8 code by editing the export file with a Hexdecimal editor. After import the same problem has been observed as defined in (5) and (6) above.



    Any other idea as to how this can be accomplised?

    Thanx
    Ashok

  5. #5
    Join Date
    Sep 2002
    Location
    INDIA
    Posts
    3

    Re: Some suggestions

    Originally posted by merry_prince
    Suggest you adopt this way to chage db character set
    Update sys.props$
    set value$='UTF8'
    where name='NLS_CHARACTERSET'

    then, restart oracle to activate the change

    I don't know what the oracle version used is, but you should use above 8.1.7

    Hi,

    I have tried all of the combinations, still it does not work. Here is my findings with the following settings: (I am using Oracle 8.1.7)


    1)
    Export Server: US7ASCII
    Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well

    Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
    Import Server: UTF8

    RESULT: Acute e became h


    2)
    Export Server: US7ASCII
    Export Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well

    Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
    Import Server: UTF8

    RESULT: IMP 00016 error


    3)
    Export Server: US7ASCII
    Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8

    Import Client: did not set NLS_LANG / NLS_CHAR, so presumably US7ASCII as well
    Import Server: UTF8

    RESULT: Acute E became h


    4)
    Export Server: US7ASCII
    Export Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8

    Import Client: NLS_LANG=AMERICAN_AMERICA.UTF8 and NLS_CHAR=UTF8
    Import Server: UTF8

    RESULT: Acute e became h


    5)

    Tried using Update sys.props$
    set value$='UTF8'
    where name='NLS_CHARACTERSET'

    RESULT: Acute e shows properly but it gives problem in the application
    "java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv"

    Looking further it was observed the following:
    when you try this command on a column 'city' in a table which contains 'Genèva' (note the acute e after n), it shows

    command: select length(city), vsize(city),substr(city,4,1),city from cities

    Result: 4 6 èva Genèva

    if you see the value of substr(city,4,1) , you will see the problem. Also note that the length shows 4 and size shows 6.


    6)
    Actually the above (point no. 5) is similar to changing the character set of the database with 'ALTER DATABASE CHARACTER SET UTF8'. Same problem is observed then too.


    7)
    I have also tried to with another method, that is by changing the third byte of the export file which specifies the character set, to the UTF8 code by editing the export file with a Hexdecimal editor. After import the same problem has been observed as defined in (5) and (6) above.



    Any other idea as to how this can be accomplised?

    Thanx
    Ashok

  6. #6
    Join Date
    Nov 2003
    Location
    SPAIN-BALEARIC ISLANDS
    Posts
    1

    This work for me in linux and 8.1.7 oracle version

    Basic steps for the conversion of database set of characters in linux and Oracle 8.1.7:


    1- First, we should verify what type of characters we have in our database, for it, we note the value of NLS_LANG (don't try to modify !!!) that will need but late, with the command :

    [oracle@...]#printenv

    2 - Now we carry out an export of all the database : (with the database started) :

    [oracle@...]#exp userid= sys file= full_db. dmp full=yes

    If the command is correct, it will ask the password of sys.

    3 - Erase the present database, with the tool dbassist. IMPORTANT: Take note about "Global Database Name" and SID. (With sqlplus :select * from global_name

    4 - Modify the .bashrc/.bash_profile files, at the NLS_LANG variable with the UTF8 new characterset (example: export NLS_LANG=SPANISH_SPAIN.UTF8). Then, create a new database with the tool dbassist, with the same one SID and Global Database Name, and with the new type of characters UTF8.

    5 - Once it created the new database, we must verify that the type of characters are adequate:
    [oracle@...]#sqlplus "/ as sysdba"
    SQL> select * from nls_database_parameters;

    In concrete, the varibles :

    NLS_CHARACTERSET
    NLS_NCHAR_CHARACTERSET

    6 - Modify the passwords of SYS and SYSTEM with svrmgrl:

    [oracle@...]#svrmgrl
    SVRMGR> alter user system identified by the_same_old_password;
    SVRMGR> alter user sys identified by the_same_old_password;


    7 - Export NLS_LANG, with the old type of characterset.By example :

    [oracle@...]#export NLS_LANG=SPANISH_SPAIN.WE8ISO8859P1

    8 - Import the database:

    [oracle@...]#imp userid=sys buffer=1000000 file=full_db.dmp full=yes commit=yes log= fullimp. log ignore=yes
    9 - Remenber return to the new and correct NLS_LANG .By example :

    [oracle@...]#export NLS_LANG=SPANISH_SPAIN.UTF8

    And that' all !!!

  7. #7
    Join Date
    Dec 2001
    Posts
    80
    Quote Originally Posted by merry_prince
    Suggest you adopt this way to chage db character set
    Update sys.props$
    set value$='UTF8'
    where name='NLS_CHARACTERSET'

    then, restart oracle to activate the change

    I don't know what the oracle version used is, but you should use above 8.1.7
    This method is not recommended, since one of my customer used that before, after changing the NLS_CHARACTERSET, they can not do exp.

    Using character set follow by alter database character set is the formal method since it is docemented in Oracle manual.

  8. #8
    Join Date
    Dec 2010
    Posts
    1

    this worked for me

    I just use the correct driver version for oracle, I was using 10.2.0.2 and by replacing with the correct driver version for the dms (11.2.0.1.0) it worked!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •