  1. #1
    Join Date
    Nov 2007
    Posts
    265

    Unanswered: Best way to delete a huge number of rows?

    Hi,

    What's the best way to delete a substantial number of rows from a table? This particular table is HUGE, so I can't export, drop, and import the necessary data like I have done for other tables in our db. Is the ideal way to create a stored procedure with startdate and enddate parameters, pass these params to the DELETE statement, and then COMMIT? Or would you recommend some other method? Thanks!!

  2. #2
    Join Date
    Aug 2008
    Location
    Toronto, Canada
    Posts
    2,368
    Can you export what you want to keep and then load/replace? LOAD builds pages and writes them to the table, so it's fast and involves very minimal logging.

  4. #4
    Join Date
    Nov 2007
    Posts
    265
    Quote Originally Posted by db2girl View Post
    Can you export what you want to keep and then load/replace? LOAD builds pages and writes them to the table, so it's fast and involves very minimal logging.
    Unfortunately, I cannot since this table is humongous.. deleting rows is the only way to go..

  5. #5
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by db2user24 View Post
    Unfortunately, I cannot since this table is humongous.. deleting rows is the only way to go..
    I don't see how this is related to the table size (by the way, "humongous" means different things to different people...). Would you rather run your delete SP for a couple of months?
    ---
    "It does not work" is not a valid problem statement.

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    This has been discussed several times.

    I have written stored procedures to do deletes on very large tables. The stored procedure has input parameters to specify how many minutes the SP should run, or how many rows it should delete, before it ends, so that I could run it during non-prime-time hours on a scheduled basis for as many nights as it took to delete all the required rows (based on other input criteria, such as a date range).

    The stored procedure opens a cursor based on the criteria for the rows to be deleted. The cursor is defined with the WITH HOLD option, so it is not closed when a commit is issued. The cursor is also defined WITH UR to minimize any lock contention.

    The SP fetches rows from the cursor, and if the row is to be deleted, the SP issues a separate delete based on the PK. Do not use "delete where current of cursor".

    After a certain number of deletes (as specified by another input parameter), the SP does a commit. This releases locks on any rows that have been deleted. I usually do somewhere between 100 and 1000 deletes in a single commit interval. Committing less often will make the SP only slightly faster and could degrade concurrency. Even if concurrency is not a big issue, I would not go higher than 5000 deletes in a commit interval. Make sure you have one final commit at the end of your SP; depending on your SP logic, the last batch of deletes may not have been committed yet.
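    To make this concrete, here is a minimal SQL PL sketch of that approach (not Marcus's actual code). The table BIG_TABLE, its primary key ID, its date column CREATED, and all procedure and parameter names are hypothetical; only the row-count limit is shown, and the minutes-based limit is omitted for brevity.
    Code:
    CREATE PROCEDURE DELETE_BY_DATE (IN p_start DATE, IN p_end DATE,
                                     IN p_max_rows INTEGER, IN p_commit_count INTEGER)
    LANGUAGE SQL
    BEGIN
      -- hypothetical names throughout; assumes p_commit_count > 0
      DECLARE v_id BIGINT;
      DECLARE v_deleted INTEGER DEFAULT 0;
      DECLARE v_at_end INTEGER DEFAULT 0;
      -- WITH HOLD keeps the cursor open across commits;
      -- WITH UR minimizes lock contention on the scan
      DECLARE c1 CURSOR WITH HOLD FOR
        SELECT id FROM big_table
        WHERE created BETWEEN p_start AND p_end
        WITH UR;
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_at_end = 1;

      OPEN c1;
      del_loop: LOOP
        FETCH c1 INTO v_id;
        IF v_at_end = 1 OR v_deleted >= p_max_rows THEN
          LEAVE del_loop;
        END IF;
        -- any extra "should this row be deleted" checks would go here;
        -- delete by primary key, not WHERE CURRENT OF the cursor
        DELETE FROM big_table WHERE id = v_id;
        SET v_deleted = v_deleted + 1;
        IF MOD(v_deleted, p_commit_count) = 0 THEN
          COMMIT;  -- release locks on the rows deleted so far
        END IF;
      END LOOP;
      CLOSE c1;
      COMMIT;  -- final commit for the last partial batch
    END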

    Also make sure you are doing archive log maintenance on a regular basis so that the archive log path does not fill up.
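    For example, one way to do that from the CLP might be PRUNE HISTORY with the AND DELETE option, which also physically removes the associated log archives (the database name and timestamp below are placeholders):
    Code:
    db2 connect to mydb
    db2 "prune history 20080101 and delete"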

    I would set the following environment variable to increase concurrency:

    db2set DB2_SKIPDELETED=ON (you will need to restart the instance)
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    If you plan to take a backup after the DELETE operation, you could also turn on NOT LOGGED INITIALLY for the table and cut down on the logging overhead.
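    A minimal CLP sketch of what that could look like, assuming the table was created with the NOT LOGGED INITIALLY attribute (the table and column names are hypothetical):
    Code:
    -- all statements must run in the SAME unit of work (autocommit off);
    -- if the transaction fails, the table can be left unusable, so back up first
    UPDATE COMMAND OPTIONS USING c OFF;
    ALTER TABLE big_table ACTIVATE NOT LOGGED INITIALLY;
    DELETE FROM big_table WHERE created BETWEEN '2008-01-01' AND '2008-06-30';
    COMMIT;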
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  8. #8
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by stolze View Post
    If you plan to take a backup after the DELETE operation, you could also turn on NOT LOGGED INITIALLY for the table and cut down on the logging overhead.
    Given that the table in question is apparently extremely large, make sure you update your resume before you try this.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  9. #9
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by Marcus_A View Post
    Given that the table in question is apparently extremely large, make sure you update your resume before you try this.
    You are probably right. If something goes wrong during the delete, you're in deep trouble restoring the data.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  10. #10
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    And this option
    Code:
    ALTER TABLE EMPLOYEE LOCKSIZE TABLE
    is that not relevant for this issue? Also activate "infinite logging" (LOGSECOND = -1) and go-and-delete.
    Just hope that you do not have to roll back after a couple of hours
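    For reference, a hedged sketch of the commands involved (the database name mydb is a placeholder; note that LOGSECOND -1 requires archive logging, i.e. LOGARCHMETH1 not OFF):
    Code:
    db2 "ALTER TABLE employee LOCKSIZE TABLE"
    db2 "UPDATE DB CFG FOR mydb USING LOGSECOND -1"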

  11. #11
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by dr_te_z View Post
    And this option
    Code:
    ALTER TABLE EMPLOYEE LOCKSIZE TABLE
    is that not relevant for this issue? Also activate "infinite logging" (LOGSECOND = -1) and go-and-delete.
    Just hope that you do not have to roll back after a couple of hours
    You would only want to lock the table if no one else is using it. Besides, DB2 will escalate to table lock automatically if it runs out of memory for row locks.

    It seems to me that infinite logging only works if you have enough disk space to hold all the logs for a single transaction. That seems unlikely in the case of the OP. Maybe I am missing something? Also, if this is an HADR database, infinite logging is not allowed.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  12. #12
    Join Date
    Jan 2009
    Location
    Zoetermeer, Holland
    Posts
    746
    Quote Originally Posted by Marcus_A View Post
    DB2 will escalate to table lock automatically if it runs out of memory for row locks.
    That's true; you know it is going to happen, so why not set it yourself and save all the initial locking overhead?

    Quote Originally Posted by Marcus_A View Post
    infinite logging only works if you have enough disk space to hold all the logs for a single transaction.
    Also true, but the alternative is calculating the right number of secondary logs. When that is not big enough, you will have room on disk but the transaction will fail after all. And when you do set it big enough, you still have the same problem of needing enough disk to hold all those logs, so... I like infinite logging.

  13. #13
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by dr_te_z View Post
    That's true; you know it is going to happen, so why not set it yourself and save all the initial locking overhead?

    Also true, but the alternative is calculating the right number of secondary logs. When that is not big enough, you will have room on disk but the transaction will fail after all. And when you do set it big enough, you still have the same problem of needing enough disk to hold all those logs, so... I like infinite logging.
    None of those is the proper solution IMO. The best way is to write a SP as I described above.

    You are assuming that no one else needs to use the database while the deletes are being done, and I was not assuming that. I don't know which is the actual situation for the OP.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  14. #14
    Join Date
    Nov 2007
    Posts
    265
    Thanks Marcus, you are right in your assumption... I would prefer to have something running during off-peak hours so it doesn't affect the performance of our app during regular business hours... a sproc sounds like the way to go.

  15. #15
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    As an alternative to the cursor approach suggested by Marcus, you can also use

    Code:
    delete from table2 where pkcol in (select pkcol from table2 where time between t1 and t2 fetch first 1000 rows only)
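    If you want to repeat that until the whole date range is purged, here is a hedged SQL PL sketch of a driver loop (assuming a DB2 level that accepts FETCH FIRST in a subquery; the procedure name is made up, and the table and column names follow the example above):
    Code:
    CREATE PROCEDURE purge_chunked (IN t1 TIMESTAMP, IN t2 TIMESTAMP)
    LANGUAGE SQL
    BEGIN
      DECLARE v_rows INTEGER DEFAULT 1;
      WHILE v_rows > 0 DO
        DELETE FROM table2
          WHERE pkcol IN (SELECT pkcol FROM table2
                          WHERE time BETWEEN t1 AND t2
                          FETCH FIRST 1000 ROWS ONLY);
        GET DIAGNOSTICS v_rows = ROW_COUNT;  -- rows deleted in this chunk
        COMMIT;  -- keep each unit of work small
      END WHILE;
    END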
    Visit the new-look IDUG Website, register to gain access to the excellent content.
