Results 1 to 7 of 7
  1. #1
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4

    Unanswered: sp_xml_preparedocument can only process untyped XML

    I am exploring the capabilities of SQL Server 2008R2 to shred XML files into tables. All works fine and smooth. See the code below: (Normally it reads from a file, for testability I included test-xml data in the code.)
    Code:
    DECLARE @x xml --(dbo.DoctorPatients)
    DECLARE @hdoc INT
    
    --SELECT @x = H
    --FROM OPENROWSET (BULK '\\server\path\to\xml\file\HealthCenter.xml', SINGLE_BLOB) AS HealthCenter(H)
    SET @x = N'
    <HealthCenter xmlns="http://www.w3schools.com">
      <Patients>
        <Patient>
          <Id>1</Id>
          <FirstName>John</FirstName>
          <LastName>Doe</LastName>
          <Age>28</Age>
          <DoctorID>1</DoctorID>
        </Patient>
      </Patients>
    </HealthCenter>
    '
    --check content of @x
    SELECT @x as xmlData
    
    EXEC sp_xml_preparedocument @hdoc OUTPUT, @x
    
    -- INSERT INTO destinationTable(Id, FirstName, LastName, Age, DoctorId)
    SELECT 	Id, FirstName, LastName, Age, DoctorId
    FROM OPENXML (@hdoc, '/HealthCenter/Patients/Patient', 1)
    WITH(
    	Id		INT		'Id',
    	FirstName	VARCHAR(50)	'FirstName',
    	LastName	VARCHAR(50)	'LastName',
    	Age		INT		'Age',
    	DoctorID	INT		'DoctorID'
    )
    
    EXEC sp_xml_removedocument @hdoc
    Now we have to process some larger XML files. Shredding one file into a table takes about 2.5 minutes.
    - We would like to add XSD verification to the XML.
    - I read that by "typing" an XML file with its XSD schema, the XML object in the database takes less place to store and can be processed faster.

    Adding the XSD schema to the database.
    Code:
    CREATE XML SCHEMA COLLECTION dbo.DoctorPatients AS
    N'<?xml version="1.0" encoding="UTF-16"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://www.w3schools.com"
    xmlns="http://www.w3schools.com"
    elementFormDefault="qualified">
    
      <xs:element name="HealthCenter">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Patients">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Patient" type="PatientDetails" minOccurs="0" maxOccurs="unbounded"/>
                </xs:sequence> 
              </xs:complexType>
            </xs:element>
          </xs:sequence> 
        </xs:complexType>
      </xs:element>
    
      <xs:complexType name="PatientDetails">
        <xs:sequence>
          <xs:element name="Id" type="xs:integer"/>
          <xs:element name="FirstName" type="xs:string"/>
          <xs:element name="LastName" type="xs:string"/>
          <xs:element name="Age" type="xs:string"/>
          <xs:element name="DoctorID" type="xs:integer"/>
        </xs:sequence>
      </xs:complexType>
    </xs:schema>
    ' 
    GO
    By adding the XSD name after the variable declaration of @x, one can assign that XSD to the XML (basically removing the comment marks "--" in the first line of the upper most code block:
    DECLARE @x xml --(dbo.DoctorPatients)

    Code:
    DECLARE @x xml (dbo.DoctorPatients)
    DECLARE @hdoc INT
    ...
    I thought that would be all I had to do, to both have the XSD check and speed things up. But this is what I got:

    Msg 6628, Level 16, State 1, Procedure sp_xml_preparedocument, Line 1
    sp_xml_preparedocument can only process untyped XML. Cast the input value to XML or to a string type.


    How can I shred the data from the validated XML file (typed XML) into the database?

    What speed improvement may we expect comparing shredding an untyped XML file into a table versus doing the same starting from typed XML?


    PS: I have deleted a large part of the XML and XSD. I hope everything is still well formed.
    Last edited by Wim; 09-13-13 at 04:20. Reason: Replaced "load XML file" by "shred XML file into the database"
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    I'm not very familiar with XML in SQL Server-land but...

    Create a staging table that uses your new XML schema definition
    Code:
    CREATE TABLE dbo.wim_xmltest (
       foo xml (dbo.DoctorPatients)
    );
    Insert a valid piece of XML (according to schema)...
    Code:
    INSERT INTO dbo.wim_xmltest (foo)
      VALUES (
    N'
    <HealthCenter xmlns="http://www.w3schools.com">
      <Patients>
        <Patient>
          <Id>1</Id>
          <FirstName>John</FirstName>
          <LastName>Doe</LastName>
          <Age>28</Age>
          <DoctorID>1</DoctorID>
        </Patient>
      </Patients>
    </HealthCenter>
    ');
    Code:
    (1 row(s) affected)
    Insert an invalid piece of XML (according to schema)...
    Code:
    INSERT INTO dbo.wim_xmltest (foo)
      VALUES (
    N'
    <HealthCenter xmlns="http://www.w3schools.com">
      <gvee>
        <Patient>
          <Id>1</Id>
          <FirstName>John</FirstName>
          <LastName>Doe</LastName>
          <Age>28</Age>
          <DoctorID>1</DoctorID>
        </Patient>
      </gvee>
    </HealthCenter>
    ');
    Code:
    Msg 6965, Level 16, State 1, Line 2
    XML Validation: Invalid content. Expected element(s): '{http://www.w3schools.com}Patients'. Found: element '{http://www.w3schools.com}gvee' instead. Location: /*:HealthCenter[1]/*:gvee[1].
    HTH
    George
    Home | Blog

  3. #3
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Code:
    DECLARE @xml table (
       foo xml (dbo.DoctorPatients)
    );
    
    DECLARE @hdoc int
          , @x    xml;
    
    BEGIN TRY
      INSERT INTO @xml (foo) VALUES (N'
      <HealthCenter xmlns="http://www.w3schools.com">
        <Patients>
          <Patient>
            <Id>1</Id>
            <FirstName>John</FirstName>
            <LastName>Doe</LastName>r
            <Age>28</Age>
            <DoctorID>1</DoctorID>
          </Patient>
        </Patients>
      </HealthCenter>
      ');
    END TRY
      BEGIN CATCH
        RAISERROR ('Invalid XML', 18, 19);
      END CATCH
    
    SET @x = (
              SELECT TOP 1
                     foo
              FROM   @xml
             );
    
    SELECT @x As xml_data;
    
    EXEC sp_xml_preparedocument @hdoc OUTPUT, @x
    
    SELECT 	Id, FirstName, LastName, Age, DoctorId
    FROM OPENXML (@hdoc, '/HealthCenter', 1)
    WITH(
    	Id		INT		'Id',
    	FirstName	VARCHAR(50)	'FirstName',
    	LastName	VARCHAR(50)	'LastName',
    	Age		INT		'Age',
    	DoctorID	INT		'DoctorID'
    );
    
    EXEC sp_xml_removedocument @hdoc;
    George
    Home | Blog

  4. #4
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Hi George,

    Thank you for your reply.

    What you are basically doing is making a typed XML by validating the XML-file against the XSD and when it's valid, then shred it further as an untyped XML.

    The idea was to load the XML into the database (be it in a variable or in a table is insignificant) and have it validated against the XSD at the same time. Then, as a typed XML can be stored and processed more efficiently, use that typed XML to shred it into the database.

    I saw the code to do it with value() and node() and the like. But I can't get my head around it how to get all data into the database. In the examples I saw they use [1] to select the first element, or [2] to get the second one. I want to shred them all and I have no idea how to do that.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Just as an aside; the third parameter in OPENXML() should be a 2 in your case (attribute-centric)
    Code:
    SELECT 	*
    FROM OPENXML (@hdoc, '/HealthCenter/Patients/Patient', 2)
    WITH (
       Id        int         'Id'
     , FirstName varchar(50) 'FirstName'
     , LastName  varchar(50) 'LastName'
     , Age       int         'Age'
     , DoctorID  int         'DoctorID'
    );
    George
    Home | Blog

  6. #6
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Had another little play this morning... Hope this script is of some use!

    It utilises the .nodes() and .value() constructors to shred multiple sources all at once:

    Code:
    DECLARE @xml table (
       surrogate_key int IDENTITY(9,37)
     , foo           xml (dbo.DoctorPatients)
    );
    
    INSERT INTO @xml (foo)
      VALUES (N'
      <HealthCenter xmlns="http://www.w3schools.com">
        <Patients>
          <Patient>
            <Id>1</Id>
            <FirstName>John</FirstName>
            <LastName>Doe</LastName>
            <Age>28</Age>
            <DoctorID>1</DoctorID>
          </Patient>
          <Patient>
            <Id>2</Id>
            <FirstName>Jane</FirstName>
            <LastName>Doe</LastName>
            <Age>27</Age>
            <DoctorID>1</DoctorID>
          </Patient>
        </Patients>
      </HealthCenter>
    ')
    , (N'
      <HealthCenter xmlns="http://www.w3schools.com">
        <Patients>
          <Patient>
            <Id>999</Id>
            <FirstName>Batman</FirstName>
            <LastName></LastName>
            <Age>12300</Age>
            <DoctorID>1</DoctorID>
          </Patient>
        </Patients>
      </HealthCenter>
      ');
    
    ;WITH XMLNAMESPACES 
    (
        DEFAULT 'http://www.w3schools.com'
    )
    SELECT x.surrogate_key
         , x.foo As unshredded
         , patients.z.value('(Id)[1]'       , 'int')         As id
         , patients.z.value('(FirstName)[1]', 'varchar(50)') As firstname
         , patients.z.value('(LastName)[1]' , 'varchar(50)') As lastname
         , patients.z.value('(Age)[1]'      , 'int')         As age
         , patients.z.value('(DoctorID)[1]' , 'int')         As doctorid
    FROM   @xml As x
     CROSS
      APPLY foo.nodes('/HealthCenter/Patients/Patient') As patients (z)
    George
    Home | Blog

  7. #7
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Hi George,

    Thank you for the follow-up.

    I can't do it right now. I will try your solution in a few days.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •