Results 1 to 5 of 5
  1. #1
    Join Date
    May 2011
    Posts
    3

    Unanswered: XML encoding and collation

    Hi,

    I am new to generating XML files and the whole concept of XML-encoding and SQL Server-collations. The SQL Server 2000 database is in Latin1_General_CI_AS collation. I must use data from the database to create an XML-file.

    In the example XML-files I found, I copied the first line to my XML file:<?xml version="1.0" encoding="UTF-8"?>

    I did some testing and all XML's were conform the XSD. The other company could process them all.

    Now that we went to production, the other company says they can not process my XML-file. I investigated and found that all the elements in the XML-file are good, except for those that contain French characters. like é, è, ...

    I found an element with value :
    <Name><![CDATA[Caf,]]></Name>
    While the value in the database is 'Café'

    So the "é" ended up wrong in the file, and it makes the XML unreadable. It gets displayed as a comma in one editor and as a small square in another one.

    I don't know what I must do to make a correct XML-file. In my feeling it has something to do with the encoding and the collation, but I am just guessing. Should I change encoding="UTF-8" to something else or should I cast the source column to something else than Latin1_General_CI_AS? I found this as an example:
    SELECT CAST('abc' AS varchar(5)) COLLATE French_CS_AS
    Or should I do something totally different?

    Please help.
    If possible, I would also like to know a good source of information (URL) about XML-encoding and their relationship with Collations (if there is one). So I can understand why your solution works and mine failed.
    Last edited by KepETie; 05-19-11 at 14:40. Reason: Addes "SQL Server 2000"

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    To start with, use NCHAR, NTEXT, NVARCHAR datatypes with international data, not CHAR, TEXT, or VARCHAR. That might solve part of your problem.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    May 2011
    Posts
    3
    Thank you for your reaction.

    I changed the type of the temporary table that holds the XML lines to NVARCHAR. The generated XML file still contains 'Caf,'. When I do a select on that column, it shows 'Café'.

    I found SQL Server stores NVARCHAR in UTF-16. I already changed the first line to
    <?xml version="1.0" encoding="UTF-16"?>

    I played with the BCP that writes the content of the temporary table to a file.
    I changed the -c attribute to first -w (didn't help), -n (I can now see 'Café', but all CRLF's are gone now, text looks weird) and -N (I can now see 'Café', but all CRLF's are gone, rest of text looks better).

    XMLSpy says XML files generated with -n or -N are not even well-formed

  4. #4
    Join Date
    May 2011
    Posts
    3
    I found the solution:

    The first XML-line should be:
    <?xml version="1.0" encoding="ISO-8859-1"?>

    And the BCP command arguments should be:
    Code:
    SET @bcpCommand = 'bcp "' + @SQLStr + '" queryout "' + @FileName + '" -T -c -C ACP -S xxx\yyy'
    EXEC master..xp_cmdshell @bcpCommand
    Adding the -C ACP forces BCP to use the ANSI/MicrosoftWindows (ISO 1252) Code Page.

    I did notice the difference between encoding="ISO-8859-1" and ISO 1252. I'm not sure if my solution is 100% correct, but it works for now.

    This helped me a lot.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Thanks for coming back to post a solution. It may help others with the same issue.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •