Results 1 to 7 of 7
  1. #1
    Join Date
    Dec 2003
    Posts
    5

    Question Unanswered: Complex SQL*LOADER/Transx questions

    I have a tough problem I am trying to solve and any guru advice would be most appreciated. I have a hierarchical XML structure I want to <B>Jack</B>
    <FOO_BAR>ingest into oracle. I DO NOT wish to store this data as an XMLType- I actually want to parse out these tables into various defined table schemas.

    Here is a quick simplified example of XML:

    <QUEEN_FOO>
    <ATT1>1</ATT1>
    <ATT2>ARIZONA</ATT2>
    <FOO>
    <A>1.1</A>
    <B>Kasumi</B>
    <FOO_BAR>
    <X>42</>
    <Y>NW</Y>
    </FOO_BAR>
    </FOO>
    <FOO>
    <A>2.1</A>

    <X>24</>
    <Y>CA</Y>
    </FOO_BAR>
    </FOO>
    <FOO>
    <A>3.1</A>
    <B>Diane</B>
    <FOO_BAR>
    <X>33</>
    <Y>DE</Y>
    </FOO_BAR>
    </FOO>
    …snip
    </QUEEN_FOO>
    -----------------------------------------------------------------
    Simplified Schema:

    Table QUEEN_FOO {
    QUEEN_ID number primary key(oracle generated sequence),
    ATT1 number,
    ATT2 VARCHAR;
    }


    Table FOO {
    FOO_ ID number primary key(oracle generated),
    QUEEN_ID number foreign key,
    A number,
    B VARCHAR;
    }

    Table FOO_BAR {
    FOO_ ID number foreign key,
    A number,
    B VARCHAR;
    }

    Notice that the primary key id’s do not exist until inserted into the database. The relationship integrity in the XML file is entirely based on the tree hierarchy. It is the problem of getting these foreign keys that is causing me the biggest headache.

    I have looked at the xdk tool Transx for this purpose, but unfortunately the XSL required to transform this very large XML document eludes me. I am not very experienced with XSL and combined with the difficulties of getting the foreign keys I am not certain XSL can be easily made to do what I am asking of it. At best I would have the transform the XML and stuff the tables containing the primary keys that will be used in the other tables as foreign keys- then do another transform that has some embedded sql calls to get the appropriate values. To further complicate this Transx or rather DOM doesn’t really like large files and Transx as far I can tell expects one table per XML file. This means the XSL required to trans for the document would need multiple input .xml files for the table schema and multiple output xml files- 1 for each table. I suppose these problems could be gotten around some what with multiple transforms against the large xml data inputs, but that sounds extra slow.

    The combined weight of the problems above leads me to consider SQL*Loader, which should be able to take a configurable input and allows for some manipulation of the data while it is being stuffed. Do any of the gurus of this group think that SQL*Loader is a viable option? Does anyone have a clever ideas of how to approach this problem?

    Thanks in advance,

    Lorien

  2. #2
    Join Date
    Jul 2003
    Posts
    2,296
    I would do this:

    1. Create a temporary table with just a description_text field varchar2(500)
    2. sqlload EVERYTHING into that temp table, (with a sequenced numbered column too)
    3. write a procedure that selects every line, sorts through it, and then inserts into the appropriate tables.

    at worst I would start by thinking of how I could group things together.
    this can be done if you add a sequenced column with an insert trigger in order to number all your rows.

    this way if you need to group by '<QUEEN_FOO>' you can select the sequence number of the first '<QUEEN_FOO>' and then select the max(seq_no) of the next ending/closeing '</QUEEN_FOO>'.

    Now you have two sequence numbers and all the data that is between those numbers belongs together (I assume). Now you have something to work with. If I know each '<QUEEN_FOO>' group is between sequences 50 and 75 then I can select:

    PHP Code:
    select xml_txt 
    into v_att1
    from temp_xml 
    where seq_no 
    >= 49 and seq_no <= 75
    and xml_txt like '<ATT1>%';

    select xml_txt 
    into v_att2
    from temp_xml 
    where seq_no 
    >= 49 and seq_no <= 75
    and xml_txt like '<ATT2>%';

    insert into queen_foo (
      
    att1,
      
    att2 )
    values (
      
    v_att1,
      
    v_att2 ); 
    see?
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  3. #3
    Join Date
    Dec 2003
    Posts
    5
    Duck,

    I see what you are getting at and it indeed looks like a solution to the problem I described. Unfortunately, I guess I did not describe my problem fully- my bad. To further complicate things see the following:


    <QUEEN_FOO/>
    <att1>1</att1>
    <att2>2</att2>
    <FOO>
    <a>1.1</a>
    <b>bla</b>
    <FOO_BAR>
    <x>tom</x>
    <y>53</y>
    </FOO_BAR>
    </FOO>
    <FOO>
    <a>2.2</a>
    <b>crap</b>
    <FOO_BAR>
    <x>hmm</x>
    <y>8888</y>
    <FOO_BAR_KID>
    <info>something</info>
    </FOO_BAR_KID>
    </FOO_BAR>
    </FOO>
    <FOO>
    <a>3.3</a>
    <b>blabla</b>
    <FOO_BAR>
    <x>sue</x>
    <y>33</y>
    </FOO_BAR>
    <SIB_OF_FOO_BAR>
    <i>25</>
    <j>dave</j>
    <k>drat</k>
    </SIB_OF_FOO_BAR>
    </FOO>
    </QUEEN_FOO>

    I over simplified the example-it did not descibe the fact that some extra relationships may exists in certain cases. This is part of the extra difficulty of the problem. Fixed lengths can not be generated to walk to whole table. Though once a table delimiter is found the number of attributes in a given table is fix .i.e. every attribute that is in a table will have an element in the XML file, even if that element contains no data. I think the problem is fully described now.

    Sorry about that,

    Lorien

  4. #4
    Join Date
    Jul 2003
    Posts
    2,296
    this can still be done the way I described.

    Am I correct that there could be any number of 'FOO's inside a QUEEN_FOO?

    That is okay, you can run a loop.
    Let's start where I left off.

    You know that the QUEEN_FOO encompanses sequence 50-75, but you do not know how many FOOS you have in there. No problem.

    First, find the first FOO in the QUEEN_FOO:
    PHP Code:
    select min(seq_no
    into v_FOO_start
    from temp_xml
    where seq_no 
    50 and seq_no 75
    and xml_txt '<FOO>'
    Now you know what line your FOO starts at and you say that you know that it will definitely have a certain number of columns or whatever.

    Now we need the first FOO_BAR within that FOO
    PHP Code:
    select min(seq_no)
    into v_FOOBAR_start
    from temp_xml
    where seq_no 
    v_FOO_start
    and xml_txt '<FOO_BAR>'
    once you have those starting points you can run loops to capture and load each FOO and FOO_BAR for everyone of your QUEEN_FOOs.

    Let me know what I missed if I missed some important aspect.
    Last edited by The_Duck; 12-18-03 at 16:57.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  5. #5
    Join Date
    Dec 2003
    Posts
    5

    Lightbulb

    Duck,

    Your assertions are correct and I think you idea solves part of the problem. I can indeed loop through each of the foos and the do further searches for other tables located within the sequence numbers of foo foo.next.

    I am still not sure how I am going to get the foreign key that will be supplied from the parents primary id(create on insert of foo). I guess if I populate a current foo element at the beginning of the loop process then I could set the foreign key for a child by doing a quick select on the id field of foo and pulling out the highest id. If that is the case then I think we have a solution. Oh, happy day! Now all I have to do is figure out the sql*loader syntax to make it all happen.

    Is this what you envisioned as well?

    Thaks again,

    Lorien

    P.S> BTW: Your name wouldn't happen to be Chris would it? I think I remember a Oracle Guru named Chris that had a nickname of Duck.

  6. #6
    Join Date
    Jul 2003
    Posts
    2,296
    I am not Chris, I am only a Duck. Sorry

    for the sqllloader part, just load every line as one long text column with the second column being the sequence number.

    Granted, this is just how I would do it.

    create the temp table:
    PHP Code:
    create table temp_xml (
      
    xml_text  varchar2(500),
      
    xml_seq  number); 
    create the sequence trigger:
    PHP Code:
    drop sequence temp_seq;

    create sequence temp_seq;

    create or replace trigger xml_sequence
        before insert on TEMP_XML
        
    for each row
            
    declare
                
    v_seq number;
            
    begin
                select temp_seq
    .nextval into v_seq from dual;
            if 
    INSERTING then
                
    :new.seq_no := v_seq;
            
    end if;
    end;

    now create a sqlloader control file:
    PHP Code:
    LOAD DATA
    INFILE        
    'xml.txt'    -- Input data file name
    BADFILE        
    'xml.bad'    -- Bad record file name
    DISCARDFILE    
    'xml.dis'    -- Discard file name
    DISCARDMAX    10000000    
    -- Maximum Number for failed discarded records

    APPEND INTO TABLE TEMP_XML
    (
     
    xml_text CHAR

    load your data, write your procedure, have fun.

    keep in mind if you have constraint concerns, then first create a copy of all the tables you will be inserting into with no constraints, load/augment your data into those tables and then move the data afterwards into your real tables.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  7. #7
    Join Date
    Dec 2003
    Posts
    5

    Smile

    Thank you for your help Duck!


    Lorien

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •