Results 1 to 7 of 7
  1. #1
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4

    Answered: hidden characters in generated XML (SSIS)

    Hi,

    I don't know if this is the right place to post this question, as it is SSIS related, but here I go.

    I have a few SSIS (2012) packages that create 3 different unicode XML-files and FTP's them to a remote location. Two of the XML- files are flawless, the third cannot be processed as it violates XML well formed rules. When you open the flawed XML file you can't see anything out order, but invisibly it half way contains the hidden characters FF FE.

    The heading <?xml version="1.0" encoding="UCS-2"?> is specified in the Flat File Destination. The XML body is generated in an OLE DB Source with a FOR XML statement.
    The offending characters FF FE come right after the header <?xml version="1.0" encoding="UCS-2"?> and right before the first XML-element.
    The file also contains FF FE right at its beginning, but that is the BOM (Byte Order Mark).

    I investigated the first character generated by the FOR XML statement. That is '<', as it should be. So The FF FE is not generated by the FOR XML statement.
    I have deleted the Flat File Destination building block and copy-and-pasted the Flat File Destination building block element from a package that generates well-formed XML files. It didn't help, the characters FF FE keep appearing a second time. So the FF FE don't come from the Flat File Destination building block neither.
    No one is responsible for the appearance of the second #FFFE in the resulting XML file, yet those disrupting bytes are there.

    How can I prevent or remove those hidden characters?

    I have looked for programs that can search and replace characters from files, like fnr, but they can't remove "hidden" bytes.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  2. Best Answer
    Posted by Wim

    "Found it!
    Code:
    (Get-Content \\servername\home\long_path...\data\FTP\LTB.xml) | ForEach-Object { $_ -replace "\uFEFF<root>", "<root>" } | Set-Content \\\servername\home\long_path...\data\FTP\LTB2.xml -Encoding UNICODE
    "


  3. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    PowerShell runs on nearly every Windows Server running today (pre-installed on 2008 and later, installable on 2003), and PowerShell can find and replace or remove non-printing characters like you've described.

    If you want another quick/free/easy way to fix this, AWK (more specifically gawk.exe from http://gnuwin32.sourceforge.net/packages/gawk.htm) does a splendid job.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #3
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Pat,

    Thank you for your reaction.

    Behold my very first PowerShell script:
    Code:
    (Get-Content \\servername\home\long_path...\data\FTP\LTB.xml) | ForEach-Object { $_ -replace "\uFFFE<root>", "<root>" } | Set-Content \\servername\home\long_path...\data\FTP\LTB2.xml
    The result is not as I expected:
    - The resulting file is no longer unicode; the file size shrunk from 28 k to 14 k.
    - The string with the hidden character "#FFFE<root>" is replaced by "?<root>" and not by "<root>".
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  5. #4
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Code:
    (Get-Content \\servername\home\long_path...\data\FTP\LTB.xml) | ForEach-Object { $_ -replace "\uFFFE<root>", "<root>" } | Set-Content \\servername\home\long_path...\data\FTP\LTB2.xml -Encoding UNICODE
    Preserves the unicode encoding, but finds nothing to replace.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  6. #5
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Found it!
    Code:
    (Get-Content \\servername\home\long_path...\data\FTP\LTB.xml) | ForEach-Object { $_ -replace "\uFEFF<root>", "<root>" } | Set-Content \\\servername\home\long_path...\data\FTP\LTB2.xml -Encoding UNICODE
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  7. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Glad that I could provide a useful idea!

    If that was your first PowerShell script, you've got real potential! PowerShell is Microsoft's preferred (required) scripting environment, the things that you can do with PowerShell are amazing and the vast majority of those things are really easy!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  8. #7
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Quote Originally Posted by Pat Phelan View Post
    Glad that I could provide a useful idea!
    Thank you again!

    Quote Originally Posted by Pat Phelan View Post
    If that was your first PowerShell script, you've got real potential!
    Google did help a lot
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •