If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > How to parse XML file

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 10-11-07, 21:23
palchuri palchuri is offline
Registered User
 
Join Date: Jan 2003
Posts: 47
How to parse XML file

Hi All,

I have a XML file of the following format receives from one of our external applications. I have a requirement to extract distributorid,sellingind for each distributor element and store the same into a table having two columns(distributorid,sellingind). The size of the xml file could be upto 2-3 GB. Instead of using java, I would like to use db2 in case if it supports. I had gone thru the documentation and identified 2 approaches.

1. DB2 XML extender
2. Federate using XML Wrapper

I had gone thru the documentation available over the internet, I still not understood which way is the better approach.

Any help or documenatation is appreciated.

<?xml version="1.0" encoding="UTF-8"?><DistributorDomain xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns="www.nc.com/schemas/Distributor" xmlns:ACCTSUM="www.nc.com/schemas/AccountSummary" xmlnsstCrd="www.nc.com/schemas/DistributionCredential" xmlnsstLoc="www.nc.com/schemas/DistributionLocation" xmlnsstOrgn="www.nc.com/schemas/DistributionOrganization" xmlnsTD="www.nc.com/schemas/Standards">
<Distributor>
<DistributorId>12345</DistributorId>
<Person>
<SellInd>true</SellInd>
</Person>
</Distributor>
<Distributor>
<DistributorId>12346</DistributorId>
<Person>
<SellInd>true</SellInd>
</Person>
</Distributor>
<Distributor>
<DistributorId>12347</DistributorId>
<Person>
<SellInd>true</SellInd>
</Person>
</Distributor>




Thanks
Shri
__________________
p.srinivasarao
Reply With Quote
  #2 (permalink)  
Old 10-12-07, 00:07
Marcus_A Marcus_A is offline
Registered User
 
Join Date: May 2003
Location: USA
Posts: 5,196
The largest column size for a CLOB in DB2 (which is where DB2 XML Extender would store the XML) is 2GB (unless it has been changed in DB2 9). The largest column size that can be logged (CLOBs can be defined as NOT LOGGED) is 1 GB.

XML Extender will probably not be supported beyond version 9 or 10, since its functionality has been replaced by "Pure XML" in V9. So use it at your own risk.
__________________
M. A. Feldman
IBM Certified DBA on DB2 for Linux, UNIX, and Windows
IBM Certified DBA on DB2 for z/OS and OS/390
Reply With Quote
  #3 (permalink)  
Old 10-12-07, 11:55
nidm nidm is offline
Registered User
 
Join Date: May 2003
Posts: 113
Quote:
Originally Posted by Marcus_A
The largest column size for a CLOB in DB2 (which is where DB2 XML Extender would store the XML) is 2GB (unless it has been changed in DB2 9). The largest column size that can be logged (CLOBs can be defined as NOT LOGGED) is 1 GB.

XML Extender will probably not be supported beyond version 9 or 10, since its functionality has been replaced by "Pure XML" in V9. So use it at your own risk.

totally agree with Marcus_A. XML extender is not the best choice if you will deal a lot with XML.

For sandbox purpose, you can try DB2 v9(the expression version is free, I think) and use the 'pure xml' feature. You can store a XML data that larger than 2G. Because by theory, there is no size limitation for XML data though there are system limitation.

Then use XMLTABLE function to split the document into smaller piece before query on the data.



By all means, there is no cheap/easy way to query on a 2G~3G document.

Your case is one of the perfect example for XMLTABLE, the following stmt is from DB2/zos

INSERT INTO YourTargetTable distributorid,sellingind AS
SELECT
FROM XMLTABLE('/Distributor' PASSING CAST(? AS XML)
COLUMNS "DistributorId" INTEGER,
"sellingind" CHAR(5)) X

(1) CAST(? as XML) will bring in your document(in text format),
(2) '/Distributor' will grab each 'Distributor' element from root, which will contruct to each row of the result table X
(3) COLUMNS "DistributorId" INTEGER,
"sellingind" CHAR(5)) X
will get the context of "DistributorId" and "sellingind" element from that particular 'Distributor'.
(4) eventrually, the insert from sub-select will put that data into your regular relational table.

I am pretty positive that DB2 can deal with 1GB document. It will be a challenge for 3GB. But if you are a big customer, DB2 will make it work for you.

Last edited by nidm; 10-12-07 at 12:07.
Reply With Quote
  #4 (permalink)  
Old 10-13-07, 04:34
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by nidm
(1) CAST(? as XML) will bring in your document(in text format),

I am pretty positive that DB2 can deal with 1GB document. It will be a challenge for 3GB. But if you are a big customer, DB2 will make it work for you.
The data type in the client will be CLOB or so, which introduces the 2GB limitation again.

Your only chance to handle documents bigger than 2GB would be the XML wrapper - if that wrapper supports it (You should give that a try/read the manual.) From a client application, you have to split the document yourself first.

p.s: The 2GB LOB limit has not been lifted (yet).
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #5 (permalink)  
Old 10-13-07, 13:44
nidm nidm is offline
Registered User
 
Join Date: May 2003
Posts: 113
Quote:
Originally Posted by stolze
The data type in the client will be CLOB or so, which introduces the 2GB limitation again.

Your only chance to handle documents bigger than 2GB would be the XML wrapper - if that wrapper supports it (You should give that a try/read the manual.) From a client application, you have to split the document yourself first.

p.s: The 2GB LOB limit has not been lifted (yet).
CAST(? as xml) bind in the hostvar as XML(not lobs) using streaming tech, which don't have such limitation. But I don't know whether any one tries it before.
Reply With Quote
  #6 (permalink)  
Old 10-14-07, 05:52
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Quote:
Originally Posted by nidm
CAST(? as xml) bind in the hostvar as XML(not lobs) using streaming tech, which don't have such limitation. But I don't know whether any one tries it before.
The CAST happens inside the SQL engine (DB2). The only thing that could possibly help is - as you mention - XML host variables. However:
  1. With embedded SQL, you declare XML host variables like this: "SQL TYPE IS XML AS CLOB(n) <hostvar_name>" The base type is CLOB and you have inherently all LOB limitations applied. (http://publib.boulder.ibm.com/infoce...c/t0021516.htm)
  2. The LOB limits are also applicable if you use the XMLPARSE()/XMLSERIALIZE() function because we would transfer the data as LOBs (or only VARCHAR) between applicatian and DB2.
  3. "SQL limits" (http://publib.boulder.ibm.com/infoce...c/r0001029.htm) also state that a serialized XML document may not exceed the 2G limit. Bringing this together with the fact that the XML content must be in its serialized form in the application (http://publib.boulder.ibm.com/infoce.../c0023466.htm), we have tho 2GB limit overall.

Unless I missed something, I still claim that XML documents must not exceepd 2G in size...
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #7 (permalink)  
Old 10-14-07, 09:23
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
Or try this variant:
Code:
INSERT INTO YourTargetTable (distributorid,sellingind)
SELECT d, s
FROM XMLTABLE('$X/Distributor' passing CAST(? AS XML) as X
	columns D INTEGER path 'DistributorId',
	        S CHAR(5) path 'Person/SellInd') AS x
As has already been said, the XML file to be passed in the "?" can be at most (a bit less than) 2GB large; the above query can safely be called several times, on pieces of the original file, but the "?" must each time be a full, valid XML file. Hence, in cutting apart 1.99 GB pieces of the file, you will have to add an extra <?xml ...> header between some consecutive
"</Distributor>"
and
"<Distributor>"
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On