I am new to generating XML files and the whole concept of XML-encoding and SQL Server-collations. The SQL Server 2000 database is in Latin1_General_CI_AS collation. I must use data from the database to create an XML-file.
In the example XML-files I found, I copied the first line to my XML file:<?xml version="1.0" encoding="UTF-8"?>
I did some testing and all XML's were conform the XSD. The other company could process them all.
Now that we went to production, the other company says they can not process my XML-file. I investigated and found that all the elements in the XML-file are good, except for those that contain French characters. like é, è, ...
I found an element with value : <Name><![CDATA[Caf,]]></Name>
While the value in the database is 'Café'
So the "é" ended up wrong in the file, and it makes the XML unreadable. It gets displayed as a comma in one editor and as a small square in another one.
I don't know what I must do to make a correct XML-file. In my feeling it has something to do with the encoding and the collation, but I am just guessing. Should I change encoding="UTF-8" to something else or should I cast the source column to something else than Latin1_General_CI_AS? I found this as an example: SELECT CAST('abc' AS varchar(5)) COLLATE French_CS_AS
Or should I do something totally different?
If possible, I would also like to know a good source of information (URL) about XML-encoding and their relationship with Collations (if there is one). So I can understand why your solution works and mine failed.
Last edited by KepETie; 05-19-11 at 13:40.
Reason: Addes "SQL Server 2000"
I changed the type of the temporary table that holds the XML lines to NVARCHAR. The generated XML file still contains 'Caf,'. When I do a select on that column, it shows 'Café'.
I found SQL Server stores NVARCHAR in UTF-16. I already changed the first line to <?xml version="1.0" encoding="UTF-16"?>
I played with the BCP that writes the content of the temporary table to a file.
I changed the -c attribute to first -w (didn't help), -n (I can now see 'Café', but all CRLF's are gone now, text looks weird) and -N (I can now see 'Café', but all CRLF's are gone, rest of text looks better).
XMLSpy says XML files generated with -n or -N are not even well-formed