  1. #1
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5

    Unanswered: Ideas to speed up delete process

    DB2 8.2 (FP 10) on linux.

    I have a table with nearly 122 million rows. I am testing a "batch" process that will delete 4.6 million of those rows, and then later on will
    delete over 500,000 corresponding rows from a parent table. Right now I delete 10,000 rows at a time, and it takes just over 5 hours to delete the 4.6 million rows. If I leave the FK to the parent table in place when I delete from the parent table, it takes about 5 hours just to delete 5,000 rows. Needless to say, we cannot have a production system processing this for over 100 hours.

    So I came up with the idea of altering the FK to the parent table to "NOT ENFORCED" before the delete. It now takes only about 30 minutes to delete all 500,000+ rows, 10,000 at a time. But when I reinstate the FK to "ENFORCED" after the delete, that step takes 7.5 hours. Although this is a lot better, I am looking for a way to speed up both the deletion of the 4.6 million rows and the reinstatement of the FK. Since this will be moved into production, which is a 24/7 operation, I cannot use "NOT LOGGED INITIALLY" -- we cannot afford a corrupted table if the process fails.
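    For reference, the enforcement toggle described above is a pair of ALTER statements in DB2; a rough sketch (table and constraint names here are hypothetical):

    ```sql
    -- Stop enforcing the FK before the mass delete (names are placeholders)
    ALTER TABLE child_table ALTER FOREIGN KEY fk_to_parent NOT ENFORCED;

    -- ... run the batched deletes here ...

    -- Turn enforcement back on; DB2 revalidates the existing rows at this
    -- point, which is where the 7.5 hours is being spent
    ALTER TABLE child_table ALTER FOREIGN KEY fk_to_parent ENFORCED;
    ```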

    Any ideas would be greatly appreciated.
    Thanks,
    Andy

  2. #2
    Join Date
    Feb 2004
    Location
    Copenhagen
    Posts
    220

    Why FK?

    Hi Andy

    In my experience, using explicit FKs in environments with
    a large amount of data is not a very good idea.

    Maybe you could have your function or SP check
    the existence of the foreign value on insert and dispense
    with the FK binding entirely.

    Otherwise, you may have to divide your table along
    some convenient variable.

    Otherwise you may have to divide your table along
    some convenient variable

    Good Luck
    Kristian K. Hansen
    Project Supervisor
    National Board of Health

  3. #3
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    Are there indexes on the FK columns in the child table?

    Sathyaram
    Visit the new-look IDUG Website, register to gain access to the excellent content.

  4. #4
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    There is one index on the child table. The child table has a many-to-one relationship with the parent table. That index, which is the primary key of the child table, incorporates the PK of the parent table plus 2 more columns -- in order, the parent's PK first, then the 2 other columns. So you would think that with the FK being enforced, deleting a row from the parent table would trigger an index scan on the child table. But that does not appear to be the case.

    Tank's idea of not having the FK may be valid for our situation, at least for this batch process. The entire batch takes 15 hours, half of which is spent when the FK is turned back on. I might be able to get away with dropping the FK before
    the batch process and recreating it when it is complete.
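    The bracketing statements for that drop-and-recreate approach would look roughly like this (constraint and column names are hypothetical):

    ```sql
    -- Before the batch window: drop the constraint entirely
    ALTER TABLE child_table DROP CONSTRAINT fk_to_parent;

    -- ... batch process runs ...

    -- After the batch: recreate it (as ENFORCED, this revalidates all
    -- existing child rows, so expect a long-running statement)
    ALTER TABLE child_table
      ADD CONSTRAINT fk_to_parent FOREIGN KEY (parent_id)
          REFERENCES parent_table;
    ```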

    Andy

  5. #5
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    If you intend to drop the FKs, I would still recommend defining 'NOT ENFORCED' FK Constraints ... This will help the optimizer to understand the data relationship ...

    Going back to your original question, how is the data deleted? Is it a vanilla DELETE statement with a WHERE clause, or is it more complex? You may consider optimizing the way the records to delete are selected.
    Also, check the optimizer plan for the DELETE statement ... You may be able to identify why the index is not being used for the DELETE on the child table.
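    One way to capture that plan, sketched with the standard DB2 explain tables (this assumes they have already been created, e.g. from EXPLAIN.DDL; the predicate is a placeholder):

    ```sql
    -- Populate the explain tables with the access plan for the child delete
    EXPLAIN PLAN FOR
      DELETE FROM child_table WHERE parent_id = 12345;
    ```

    The captured plan can then be formatted with the db2exfmt tool and inspected for a table scan on the child table where an index scan was expected.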


    Sathyaram
    Visit the new-look IDUG Website, register to gain access to the excellent content.

  6. #6
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    Sathyaram,
    This "Batch" process is really a program that will move a subset of the data from one DB to another. For each table with data that will be moved:
    1) create a global temp table LIKE the table
    2) insert into it the data that will be moved
    3) export the data to IXF
    4) import the data into destination DB
    5) delete the data from table based on temp table.
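    Steps 3 and 4 above correspond to the DB2 CLP EXPORT and IMPORT commands; a rough sketch (file and table names are placeholders):

    ```sql
    -- On the source database: dump the rows staged in the temp table
    EXPORT TO moved_rows.ixf OF IXF SELECT * FROM session.temp_moved;

    -- On the destination database: bring them in
    IMPORT FROM moved_rows.ixf OF IXF INSERT INTO target_table;
    ```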

    The delete process works like this:

    create a second temp table like the first
    while the first temp table has rows
    {
        INSERT INTO secondTempTable SELECT * FROM firstTempTable FETCH FIRST 10000 ROWS ONLY
        DELETE FROM realTable WHERE (every column....) IN (SELECT * FROM secondTempTable)
        DELETE FROM firstTempTable WHERE (every column...) IN (SELECT * FROM secondTempTable)
        COMMIT
    }
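    Spelled out as SQL for a single pass, with hypothetical table and column names (note that the batch table has to be emptied before it is refilled, otherwise later passes would re-select the first batch; the multi-column IN shown here may need to be rewritten as EXISTS on some DB2 levels):

    ```sql
    -- Clear the batch table from the previous pass
    DELETE FROM session.secondTempTable;

    -- Stage the next 10,000 rows to be deleted
    INSERT INTO session.secondTempTable
      SELECT * FROM session.firstTempTable
      FETCH FIRST 10000 ROWS ONLY;

    -- Remove them from the real table and from the work list
    DELETE FROM real_table t
      WHERE (t.col1, t.col2) IN
            (SELECT s.col1, s.col2 FROM session.secondTempTable s);

    DELETE FROM session.firstTempTable f
      WHERE (f.col1, f.col2) IN
            (SELECT s.col1, s.col2 FROM session.secondTempTable s);

    COMMIT;
    ```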


    All the processing is done in the correct order so that referential integrity
    is taken into account.

    Thanks for your help,
    Andy

  7. #7
    Join Date
    Apr 2004
    Posts
    54
    What about using partitioned tables in ESE Viper?

  8. #8
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    Since these are production databases and Viper is only in beta right now, I do not think that is even an option.

    Andy

  9. #9
    Join Date
    Mar 2005
    Posts
    108
    In my opinion, you don't need the temp tables. They add too much extra work, especially when you have a subquery. Try the following:

    1. export to ... select * from your_table where ... (a search condition for the records to be removed);
    2. load into your target table, instead of import;
    3. delete from your_table
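    As DB2 commands, that suggestion would look roughly like this (file and table names are hypothetical, and the WHERE clauses are left as placeholders):

    ```sql
    -- 1. Export the rows to be removed
    EXPORT TO removed.ixf OF IXF
      SELECT * FROM your_table WHERE ...;

    -- 2. LOAD into the target (faster than IMPORT, but the target table
    --    is unavailable to other work while the load runs)
    LOAD FROM removed.ixf OF IXF INSERT INTO target_table;

    -- 3. Remove them from the source
    DELETE FROM your_table WHERE ...;
    ```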

  10. #10
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    This "Batch" process moves data based on one criterion, starting at a particular table; all data related to that data is moved as well. This goes about 10 levels deep in the relational hierarchy.

    1) Using temp tables actually speeds up this process. If they were not used, I would have predicates containing subselects of many tables, working up the relational hierarchy all the way to the base table.
    [where x in (select y from ... where x in (select y from ...)) -- this can be very tedious and does not perform well.] Instead, by using temp tables, I only have to check the immediate "parent" temp table. I put indexes on the temp tables, and this goes really fast. The entire export takes less than 1 hour in my tests.
    2) Load is not an option, since the target tables need to be available to other processes.
    3) The temp tables are necessary. If any data gets added after the export but before the delete, I do not want to delete that data as well just because it also meets the export predicate. I only delete the data that is in the temp tables, so any data added in between is left alone.
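    The temp-table-plus-index combination from point 1 can be sketched as follows (DB2 allows indexes on declared temporary tables; all names here are hypothetical):

    ```sql
    -- Declared global temp table shaped like the real table
    DECLARE GLOBAL TEMPORARY TABLE session.temp_child
      LIKE child_table
      ON COMMIT PRESERVE ROWS NOT LOGGED;

    -- Index it on the join column so the parent-level lookups stay fast
    CREATE INDEX session.ix_temp_child
      ON session.temp_child (parent_id);
    ```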


    I want to thank Tank and Sathyaram for their comments. I have been able to take the delete process from 15 hours down to an acceptable 2.75 hours. Making the FKs "NOT ENFORCED" greatly improved the process.

    Andy
