  1. #1
    Join Date
    Jan 2002
    Posts
    16

    Unanswered: BCP character conversion

    I used bcp to load data into a database with the SQL_Latin1_General_Cp1_CI_AS collation. The input data contains an embedded ® character (character code 174). I did not specify a code page with the -C parameter, and the character was converted to « (character code 171). I then reran the bcp with -C1252 and with -CRAW, and both preserved the correct character. -C437 and -COEM change the character to «.
    Why did this happen? I thought the data would be converted correctly without any code page specification.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Different code pages map binary values to glyphs (the graphic symbols that humans know and love) differently. One binary value can map to many different glyphs using different code pages.
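
    To make that concrete, here is a minimal Python sketch (not bcp itself) that decodes the byte from the original post under two code pages. The codec names "cp1252" and "cp437" are Python's names for Windows ANSI code page 1252 and OEM code page 437; the variable names are purely illustrative.

```python
# Decode the same byte under two code pages to see which glyph each assigns.
raw = bytes([174])  # the byte value from the original post (0xAE)

as_ansi = raw.decode("cp1252")  # Windows ANSI code page 1252
as_oem = raw.decode("cp437")    # OEM (DOS) code page 437

# Print the two interpretations side by side, with their Unicode code points.
print(f"cp1252: {as_ansi} (U+{ord(as_ansi):04X})")
print(f"cp437:  {as_oem} (U+{ord(as_oem):04X})")
```

    The byte never changes; only the table used to interpret it does.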

    If bcp doesn't know which code page to use for translation, you get "pot luck", especially for characters that aren't well defined. Typically, you want the code page that created the data. Occasionally, you want the code page that was intended (or at least used) to view the data. Because of the potpourri of mappings supported by the different code pages, the business of getting data from point A to point B has grown yet another potentially "interesting" twist to amuse those of us who do the moving!

    -PatP

  3. #3
    Join Date
    Jan 2002
    Posts
    16
    Thanks for the quick reply.
    BOL states:
    "When bulk copying data using native or character format, bcp, by default, converts character data to:

    OEM code page characters when exporting data from an instance of Microsoft® SQL Server™.

    ANSI/Microsoft Windows® code page characters when importing data into an instance of SQL Server. "

    So wouldn't the bcp in use code page 1252 by default? That should be equivalent to -C1252.

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Well where did the data come from?
    Brett
    8-)

    It's a Great Day for America everybody!

    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  5. #5
    Join Date
    Jan 2002
    Posts
    16
    The bcp ran from my workstation, which uses code page 437, if that's your question.

  6. #6
    Join Date
    Aug 2002
    Location
    Prague
    Posts
    77
    From the Books Online topic on bcp: "OEM Default code page used by the client. This is the default code page used by bcp if -C is not specified."
    From that, I suppose that SQL Server interpreted your file as being in OEM code page 437. mojza

  7. #7
    Join Date
    Jan 2002
    Posts
    16
    I guess I still don't understand why the character was changed during the bcp. I can view it correctly from my workstation, which is 437, but if I bcp using -C437 or without -C (which uses the default OEM code page), it gets converted. I think I'm missing something.

  8. #8
    Join Date
    Aug 2002
    Location
    Prague
    Posts
    77
    In which editor can you see that character correctly? In an ANSI editor (e.g. Notepad) or in an OEM editor (e.g. Edit)? mojza

  9. #9
    Join Date
    Jan 2002
    Posts
    16
    Correctly in Notepad or TextPad, not correctly in Edit. So bcp, running in a command window, is using 437, which changes the character to «?

  10. #10
    Join Date
    Aug 2002
    Location
    Prague
    Posts
    77
    Then, in my opinion, your file was created in ANSI code page 1252 (it looks right in Notepad), and bcp interprets it as code page 437 (the default client OEM code page). That corrupts the extended characters that the two code pages encode differently, unless you tell bcp to interpret the file as 1252 or to skip translation altogether (RAW). Check out this Microsoft article. It has a good explanation and excellent examples. mojza

    http://support.microsoft.com/default...b;en-us;199819
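
    The round trip described above can be modelled in a few lines of Python. This is only a sketch of the behaviour reported in this thread, not how bcp is implemented: simulate_import is a hypothetical helper, and it assumes the server stores the column in code page 1252 (which is what the Cp1 in SQL_Latin1_General_Cp1_CI_AS refers to).

```python
def simulate_import(data: bytes, client_cp: str, server_cp: str = "cp1252") -> bytes:
    """Hypothetical model of the import: read the bytes using the client
    code page, then store the resulting characters in the server code page."""
    text = data.decode(client_cp)   # how the file's bytes are interpreted
    return text.encode(server_cp)   # how the characters are stored

source = bytes([174])  # the ® byte as written by an ANSI (1252) editor

stored_default = simulate_import(source, "cp437")  # like no -C, or -C437 / -COEM
stored_1252 = simulate_import(source, "cp1252")    # like -C1252

print(stored_default[0], stored_default.decode("cp1252"))
print(stored_1252[0], stored_1252.decode("cp1252"))
```

    Under these assumptions, reading the file as 437 stores byte 171, which displays as « in code page 1252, matching the symptom in the first post. Reading it as 1252 leaves byte 174 untouched.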

  11. #11
    Join Date
    Jan 2002
    Posts
    16
    Thanks for your help. That article definitely helped explain things. I also looked at the nls files for 437 and 1252, and character 174 (offset x0178) maps to ® in the 1252 file and « in the 437 file.
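
    For anyone without the nls files to hand, a similar comparison can be sketched with Python's built-in codecs, assuming its "cp437" and "cp1252" tables match the Windows ones for the bytes of interest. The glyph helper and the differing list are illustrative names, not part of any tool mentioned here.

```python
def glyph(byte_value: int, codec: str) -> str:
    """Return the character a single byte maps to in the given code page.
    Bytes the code page leaves undefined come back as U+FFFD."""
    return bytes([byte_value]).decode(codec, errors="replace")

# Bytes 0-127 are plain ASCII in both code pages; the upper half is where they diverge.
differing = [b for b in range(128, 256) if glyph(b, "cp437") != glyph(b, "cp1252")]

print("174 in cp1252:", glyph(174, "cp1252"))
print("174 in cp437: ", glyph(174, "cp437"))
print("upper-half bytes that differ:", len(differing))
```

    Any byte in that list is at risk when a 1252 file is loaded as 437 or the other way round.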
    Again, thanks.
