Results 1 to 7 of 7
  1. #1
    Join Date
    Jan 2003
    Posts
    47

    Unanswered: How to parse XML file

    Hi All,

    I have a XML file of the following format receives from one of our external applications. I have a requirement to extract distributorid,sellingind for each distributor element and store the same into a table having two columns(distributorid,sellingind). The size of the xml file could be upto 2-3 GB. Instead of using java, I would like to use db2 in case if it supports. I had gone thru the documentation and identified 2 approaches.

    1. DB2 XML extender
    2. Federate using XML Wrapper

    I had gone thru the documentation available over the internet, I still not understood which way is the better approach.

    Any help or documenatation is appreciated.

    <?xml version="1.0" encoding="UTF-8"?><DistributorDomain xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="www.nc.com/schemas/Distributor" xmlns:ACCTSUM="www.nc.com/schemas/AccountSummary" xmlnsstCrd="www.nc.com/schemas/DistributionCredential" xmlnsstLoc="www.nc.com/schemas/DistributionLocation" xmlnsstOrgn="www.nc.com/schemas/DistributionOrganization" xmlnsTD="www.nc.com/schemas/Standards">
    <Distributor>
    <DistributorId>12345</DistributorId>
    <Person>
    <SellInd>true</SellInd>
    </Person>
    </Distributor>
    <Distributor>
    <DistributorId>12346</DistributorId>
    <Person>
    <SellInd>true</SellInd>
    </Person>
    </Distributor>
    <Distributor>
    <DistributorId>12347</DistributorId>
    <Person>
    <SellInd>true</SellInd>
    </Person>
    </Distributor>




    Thanks
    Shri
    p.srinivasarao

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    The largest column size for a CLOB in DB2 (which is where DB2 XML Extender would store the XML) is 2GB (unless it has been changed in DB2 9). The largest column size that can be logged (CLOBs can be defined as NOT LOGGED) is 1 GB.

    XML Extender will probably not be supported beyond version 9 or 10, since its functionality has been replaced by "Pure XML" in V9. So use it at your own risk.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    May 2003
    Posts
    113
    Quote Originally Posted by Marcus_A
    The largest column size for a CLOB in DB2 (which is where DB2 XML Extender would store the XML) is 2GB (unless it has been changed in DB2 9). The largest column size that can be logged (CLOBs can be defined as NOT LOGGED) is 1 GB.

    XML Extender will probably not be supported beyond version 9 or 10, since its functionality has been replaced by "Pure XML" in V9. So use it at your own risk.

    totally agree with Marcus_A. XML extender is not the best choice if you will deal a lot with XML.

    For sandbox purpose, you can try DB2 v9(the expression version is free, I think) and use the 'pure xml' feature. You can store a XML data that larger than 2G. Because by theory, there is no size limitation for XML data though there are system limitation.

    Then use XMLTABLE function to split the document into smaller piece before query on the data.



    By all means, there is no cheap/easy way to query on a 2G~3G document.

    Your case is one of the perfect example for XMLTABLE, the following stmt is from DB2/zos

    INSERT INTO YourTargetTable distributorid,sellingind AS
    SELECT
    FROM XMLTABLE('/Distributor' PASSING CAST(? AS XML)
    COLUMNS "DistributorId" INTEGER,
    "sellingind" CHAR(5)) X

    (1) CAST(? as XML) will bring in your document(in text format),
    (2) '/Distributor' will grab each 'Distributor' element from root, which will contruct to each row of the result table X
    (3) COLUMNS "DistributorId" INTEGER,
    "sellingind" CHAR(5)) X
    will get the context of "DistributorId" and "sellingind" element from that particular 'Distributor'.
    (4) eventrually, the insert from sub-select will put that data into your regular relational table.

    I am pretty positive that DB2 can deal with 1GB document. It will be a challenge for 3GB. But if you are a big customer, DB2 will make it work for you.
    Last edited by nidm; 10-12-07 at 13:07.

  4. #4
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by nidm
    (1) CAST(? as XML) will bring in your document(in text format),

    I am pretty positive that DB2 can deal with 1GB document. It will be a challenge for 3GB. But if you are a big customer, DB2 will make it work for you.
    The data type in the client will be CLOB or so, which introduces the 2GB limitation again.

    Your only chance to handle documents bigger than 2GB would be the XML wrapper - if that wrapper supports it (You should give that a try/read the manual.) From a client application, you have to split the document yourself first.

    p.s: The 2GB LOB limit has not been lifted (yet).
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  5. #5
    Join Date
    May 2003
    Posts
    113
    Quote Originally Posted by stolze
    The data type in the client will be CLOB or so, which introduces the 2GB limitation again.

    Your only chance to handle documents bigger than 2GB would be the XML wrapper - if that wrapper supports it (You should give that a try/read the manual.) From a client application, you have to split the document yourself first.

    p.s: The 2GB LOB limit has not been lifted (yet).
    CAST(? as xml) bind in the hostvar as XML(not lobs) using streaming tech, which don't have such limitation. But I don't know whether any one tries it before.

  6. #6
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by nidm
    CAST(? as xml) bind in the hostvar as XML(not lobs) using streaming tech, which don't have such limitation. But I don't know whether any one tries it before.
    The CAST happens inside the SQL engine (DB2). The only thing that could possibly help is - as you mention - XML host variables. However:
    1. With embedded SQL, you declare XML host variables like this: "SQL TYPE IS XML AS CLOB(n) <hostvar_name>" The base type is CLOB and you have inherently all LOB limitations applied. (http://publib.boulder.ibm.com/infoce...c/t0021516.htm)
    2. The LOB limits are also applicable if you use the XMLPARSE()/XMLSERIALIZE() function because we would transfer the data as LOBs (or only VARCHAR) between applicatian and DB2.
    3. "SQL limits" (http://publib.boulder.ibm.com/infoce...c/r0001029.htm) also state that a serialized XML document may not exceed the 2G limit. Bringing this together with the fact that the XML content must be in its serialized form in the application (http://publib.boulder.ibm.com/infoce.../c0023466.htm), we have tho 2GB limit overall.


    Unless I missed something, I still claim that XML documents must not exceepd 2G in size...
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  7. #7
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Or try this variant:
    Code:
    INSERT INTO YourTargetTable (distributorid,sellingind)
    SELECT d, s
    FROM XMLTABLE('$X/Distributor' passing CAST(? AS XML) as X
    	columns D INTEGER path 'DistributorId',
    	        S CHAR(5) path 'Person/SellInd') AS x
    As has already been said, the XML file to be passed in the "?" can be at most (a bit less than) 2GB large; the above query can safely be called several times, on pieces of the original file, but the "?" must each time be a full, valid XML file. Hence, in cutting apart 1.99 GB pieces of the file, you will have to add an extra <?xml ...> header between some consecutive
    "</Distributor>"
    and
    "<Distributor>"
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •