  1. #1
    Join Date
    Aug 2003
    Posts
    25

    Unanswered: Processing of million records

Hi The_Duck,

I want your help on this.

I have a cursor whose rowcount is 1,000,000 rows. If I divide those 1,000,000 rows into batches of 100,000, I get 10 batches, and I want to process those 10 batches in turn: batch 1 covers rows 1 - 100,000, batch 2 covers rows 100,001 - 200,000, batch 3 covers rows 200,001 - 300,000, and so on until batch 10, at which point all 1,000,000 rows have been processed.
Please help me. I want to do this because when I process all 1,000,000 records at once, my processing is very, very slow; it took 10 hours. So I want to cut them into chunks of, say, 100,000.


    Thank you,
    YNOEL

  2. #2
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
One way which is quick and easy is to take the min and max values of a column which is indexed (as the first column in the index). Check that the data distribution for the column is fairly even, i.e. if you divide the column values into 10 buckets of size (max - min)/10, you get roughly the same number of rows in each bucket.

Then all you have to do in your cursor is have a loop which gets each bucket in turn, i.e. WHERE x BETWEEN min + i*bucket_size AND min + (i+1)*bucket_size.
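For example, a minimal PL/SQL sketch of that bucket loop, assuming an indexed numeric column ID on a hypothetical table MY_TABLE (the UPDATE here just stands in for whatever per-bucket processing you do):

DECLARE
  v_min  NUMBER;
  v_max  NUMBER;
  v_size NUMBER;
BEGIN
  SELECT MIN(id), MAX(id) INTO v_min, v_max FROM my_table;
  v_size := CEIL((v_max - v_min + 1) / 10);          -- 10 buckets
  FOR i IN 0 .. 9 LOOP
    UPDATE my_table                                  -- your per-bucket work goes here
       SET processed_flag = 'Y'
     WHERE id >= v_min + i * v_size
       AND id <  v_min + (i + 1) * v_size;
    COMMIT;                                          -- one transaction per bucket
  END LOOP;
END;
/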

Make sure you assign a large rollback segment for your cursor as well.

Another thing: if you can do your processing with one SQL statement rather than a cursor, it should be quicker.

Also, don't forget to use bulk binds if you can, as they can make a big improvement in performance.
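A rough sketch of what bulk binds look like (BULK COLLECT with a LIMIT, plus FORALL); the table and flag column are again invented for illustration:

DECLARE
  TYPE t_ids IS TABLE OF my_table.id%TYPE;
  v_ids t_ids;
  CURSOR c IS SELECT id FROM my_table WHERE processed_flag = 'N';
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO v_ids LIMIT 10000;     -- 10,000 rows per fetch
    EXIT WHEN v_ids.COUNT = 0;
    FORALL i IN 1 .. v_ids.COUNT                     -- one context switch for the whole batch
      UPDATE my_table SET processed_flag = 'Y'
       WHERE id = v_ids(i);
    COMMIT;
  END LOOP;
  CLOSE c;
END;
/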

    Alan

  3. #3
    Join Date
    Jul 2003
    Posts
    2,296
    Alan, are you The_Duck now???

Anyway, I just read on asktom.com (not askduck.com, that will be coming soon, NOT!) about processing updates or deletes of millions of rows. It was a very interesting article/discussion.

    The point was that if this is a batch update job and the assumption is no one is looking/using the data while you are manipulating it, then you can do the following:

1. use create table as select to build the new data, with nologging and parallel as needed (CTAS already does a direct-path load, so the append hint is only needed for the insert variant).

    example: create table temp_table nologging as select columns1, columns2 from original_table where customer_id < 3000000;

    2. add all constraints and indexes to the temp table as on the original table

3. drop the original table, rename the temp table to the original table name, then re-enable logging (see the sketch below).

    4. all set.
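A minimal sketch of steps 2 and 3, with hypothetical index/constraint names carried over from the example above:

create index temp_table_idx1 on temp_table (customer_id) nologging;
alter table temp_table add constraint temp_table_pk primary key (customer_id);
drop table original_table;
rename temp_table to original_table;
alter table original_table logging;  -- step 3: turn logging back on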

    I'll try to find the link to the article.

    Found it. It is a great read.
    http://asktom.oracle.com/pls/ask/f?p...407993912330,Y

That link didn't seem to work. Go here and then click 'How to Update Millions of rows':
    http://asktom.oracle.com/pls/ask/f?p...1939435868::NO
    Last edited by The_Duck; 10-21-03 at 12:10.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  4. #4
    Join Date
    Aug 2003
    Posts
    25
    Hi The_Duck,

Nice joke! If you ever launch askduck.com, I will be your no. 1 fan.

Anyway, as I understand the asktom example, he updates only one column on a base table. What if I need to update columns with values from another table? I tried the statement below, but it took 6 hours so I cancelled it.

    Can you help me...

connect ora01/mis@psdb;
alter table pcs_header nologging;
alter table pcfwork nologging;
insert /*+ append parallel (pcs, 12) */ into pcs_header_temp pcs
  (part_no, prodn_date, need, defective_qty, other_requested_qty,
   supplied_qty, incoming_qty, trans_qty_to_whse, adjusted_whse_stock,
   computed_whse_stock, line_invty_adjustment, computed_line_invty,
   computed_lacking_qty, zero_out_qty)
select /*+ parallel (c, 12) */
       a.part_no, a.prodn_date, a.need, b.defective_qty, b.other_requested_qty,
       b.supplied_qty, b.incoming_qty, b.trans_qty_to_whse, b.adjusted_whse_stock,
       b.computed_whse_stock, b.line_invty_adjustment, b.computed_line_invty,
       b.computed_lacking_qty, b.zero_out_qty
  from (select d.part_no, c.prodn_date, c.need
          from pcfwork c, parts d
         where c.part_no = d.part_no
            or c.part_no = d.art_part_no
            or c.part_no = d.cust_part_no
            or c.part_no = d.supp_part_no
            or c.part_no = d.cae_part_no) a,
       pcs_header b
 where b.part_no = a.part_no
   and b.prodn_date = a.prodn_date;
commit;

    thanks again,
    ynoel


    Special Thanks to Alan

  5. #5
    Join Date
    Oct 2003
    Posts
    706

    Re: Processing of million records

Originally posted by YNOEL:
"I have a cursor whose rowcount is 1,000,000 rows. If I divide those 1,000,000 rows into batches of 100,000, I get 10 batches, and I want to process those 10 batches in turn... When I process all 1,000,000 records at once, my processing is very, very slow; it took 10 hours. So I want to cut them into chunks of, say, 100,000."

    When processing millions of rows, it's a good idea to break the process into a series of transactions each covering only a few thousand rows each. The process should also be designed to be restartable so that if a failure occurs in transaction #152 the processing can be resumed at that point. (What IBM called "checkpoint / restart.")
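A minimal sketch of that idea in PL/SQL, assuming a one-row control table BATCH_CHECKPOINT(LAST_ID) and an indexed numeric key; all names here are invented:

DECLARE
  v_last NUMBER;
  v_max  NUMBER;
BEGIN
  SELECT last_id INTO v_last FROM batch_checkpoint;  -- resume point survives a crash
  SELECT MAX(id) INTO v_max FROM my_table;
  WHILE v_last < v_max LOOP
    UPDATE my_table
       SET processed_flag = 'Y'
     WHERE id > v_last AND id <= v_last + 5000;      -- a few thousand rows per transaction
    v_last := v_last + 5000;
    UPDATE batch_checkpoint SET last_id = v_last;    -- checkpoint commits with the chunk
    COMMIT;
  END LOOP;
END;
/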

    It's also a good idea, if you can do it, to turn off indexes for the duration. It is true of any DBMS in the world that if it has to update every single index for every row (as normally it would have to do), it's gonna be doggy-slow. Yet if all those updates are done first and then the computer has to rebuild the index from scratch, that's unexpectedly fast. (It can sort the data and use it to build the B-tree, e.g., in one pass.)
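In Oracle terms, one hedged way to do that (the index name is hypothetical):

alter index my_table_idx1 unusable;
alter session set skip_unusable_indexes = true;  -- let DML proceed past the unusable index
-- ... run the batch updates/inserts here ...
alter index my_table_idx1 rebuild nologging;     -- one-pass rebuild afterwards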
    ChimneySweep(R): fast, automatic
    table repair at a click of the
    mouse! http://www.sundialservices.com

  6. #6
    Join Date
    Oct 2003
    Location
    Bangalore, India
    Posts
    5

    Hi

    Hi All,
I have a table A, and soon after creating it, 1 million records are loaded into it. It then runs in a real-time environment where it grows by at least half a million rows each day; however, a cleanup application also deletes half a million rows every day. Whenever a new record is inserted, a continuously listening application detects it, runs a SELECT against table A, sends the selected rows to a table B, and then updates table A. There may be 3 or 4 such listeners.

The SELECT and UPDATE operations are taking a lot of time, and the initial insert of 1 million records takes 11 hours to complete, so what would a workable architecture be? I tried sending the ROWID to a queue/pipe after inserting into table A and then using it for the SELECT and UPDATE, but it is still taking a lot of time. Please suggest a method.
    Thanks and regards,
    Sandy..

  7. #7
    Join Date
    Jul 2003
    Posts
    2,296
    ynoel

    - If you run it using the insert, you should make the temp table nologging.
    example:
    alter table pcs_header_temp NOLOGGING;

- add indexes to the tables you are selecting from, covering the columns in the where clauses
i.e.: create index temp_p on pcs_header (part_no, prodn_date) tablespace indx nologging;

    analyze table ...


- you want to create the temp table as you select into it.
Also, use parallel if you have more than one CPU.

    example (you can change column names as you like with "a.part_no as PART_NO"):
    PHP Code:
create /*+ APPEND */ table pcs_header_temp NOLOGGING
as
SELECT --/*+ parallel (c, 12) */
       a.part_no, a.prodn_date, a.need,
       b.defective_qty, b.other_requested_qty, b.supplied_qty,
       b.incoming_qty, b.trans_qty_to_whse, b.adjusted_whse_stock,
       b.computed_whse_stock, b.line_invty_adjustment, b.computed_line_invty,
       b.computed_lacking_qty, b.zero_out_qty
  FROM (select d.part_no, c.prodn_date, c.need
          from pcfwork c, parts d
         where c.part_no = d.part_no
            or c.part_no = d.art_part_no
            or c.part_no = d.cust_part_no
            or c.part_no = d.supp_part_no
            or c.part_no = d.cae_part_no) a,
       pcs_header b
 where b.part_no = a.part_no
   and b.prodn_date = a.prodn_date;
    - The_Duck
    you can lead someone to something but they will never learn anything ...

  8. #8
    Join Date
    Aug 2003
    Posts
    25
    Hi The_Duck,

    THANK YOU VERY MUCH !!! You're great....
Look at my SQL below; it takes 27 minutes now. Thanks for your help.

    SQLWKS> create index pcs_header_idx1 on pcs_header(part_no, prodn_date)
    2> tablespace indx nologging;
    Statement processed.
    Parse 154.44 (Elapsed) 0.00 (CPU)
    Execute/Fetch 0.01 (Elapsed) 0.00 (CPU)
    Total 154.45 0.00
    SQLWKS> commit
    SQLWKS> ANALYZE TABLE pcs_header COMPUTE STATISTICS;
    Statement processed.
    Parse 217.51 (Elapsed) 0.00 (CPU)
    Execute/Fetch 0.00 (Elapsed) 0.00 (CPU)
    Total 217.51 0.00
    SQLWKS>
    SQLWKS> select num_rows, blocks, empty_blocks from dba_tables
    2> where table_name = 'PCS_HEADER';
    NUM_ROWS BLOCKS EMPTY_BLOC
    ---------- ---------- ----------
    1726461 12267 174
    1 row selected.
    Parse 0.00 (Elapsed) 0.00 (CPU)
    Execute/Fetch 0.23 (Elapsed) 0.00 (CPU)
    Total 0.23 0.00
    SQLWKS>
    SQLWKS> create /*+ APPEND */ table pcs_header_temp NOLOGGING
    2> as
    3> SELECT a.part_no, a.prodn_date, a.need, b.defective_qty, b.other_requested_qty,
    4> b.supplied_qty, b.incoming_qty, b.trans_qty_to_whse, b.adjusted_whse_stock,
    5> b.computed_whse_stock, b.line_invty_adjustment, b.computed_line_invty,
    6> b.computed_lacking_qty, b.zero_out_qty
    7> FROM pcfwork a, pcs_header b, parts d
    8> WHERE a.part_no IN ( d.part_no, d.art_part_no, d.cust_part_no, d.supp_part_no, d.cae_part_no)
    9> AND b.part_no = a.part_no
    10> AND b.prodn_date = a.prodn_date;
    Statement processed.
    Parse 1645.18 (Elapsed) 0.00 (CPU)
    Execute/Fetch 0.00 (Elapsed) 0.00 (CPU)
    Total 1645.18 0.00
    SQLWKS>

    ynoel
    Last edited by ynoel; 10-23-03 at 23:37.

  9. #9
    Join Date
    Jul 2003
    Posts
    2,296
    awesome!
    that is a huge improvement.

keep in mind you can also put indexes on the tables in your subquery
to help it run faster, if you don't already have them.
It looks like the tables in your subquery are:
pcfwork and parts.

    I'll let you work out the details on those indexes.
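As a starting point only (the column choices here are guesses from the WHERE clause in your insert, so check them against your actual access paths):

create index pcfwork_idx1 on pcfwork (part_no, prodn_date) tablespace indx nologging;
-- the OR conditions on parts usually want separate single-column indexes:
create index parts_idx1 on parts (part_no) tablespace indx nologging;
create index parts_idx2 on parts (art_part_no) tablespace indx nologging;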
    - The_Duck
    you can lead someone to something but they will never learn anything ...
