  1. #1
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10

    Unanswered: Best way to Delete Records from a Massive Table (heap)

    So I've stumbled across an audit table on one of our systems that has reached a hearty 180M rows in size.

    The table is a heap (no indexes whatsoever).

    Each record has a datetime value indicating when it was created.

    I need to delete everything that was created prior to the last 6 months; what is my best plan of attack?
    George
    Home | Blog

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    I've thought of a few options, but none without their downsides...
    1. Simple batched delete based on the datetime column value
    2. Create a clustered index on the datetime column, then perform a batched delete process
    3. Select the last 6 months' data into a temp table, truncate the main table and then insert back from the temp table
    4. As above but with a permanent table

    Downsides
    1. Each batch is going to take a good old while to complete due to the lack of an index
    2. Index creation is going to be very time consuming. Could potentially build it in tempdb to lower the impact?
    3. The initial select could (read: probably will) take more time than the window of opportunity allows
    4. As above but with less impact on tempdb?

    I think option 1 might be best as I can just do a few batches a night until we reduce the problem enough.
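    For illustration, a minimal sketch of what option 1 could look like, assuming the datetime column is called datecreated (as in the query further down the thread) and a batch size of 10,000:
    Code:
    DECLARE @rows int = 1;

    WHILE @rows > 0
    BEGIN
        -- each DELETE still has to scan the heap, so keep batches modest
        DELETE TOP (10000)
        FROM   dbo.some_log
        WHERE  datecreated < DATEADD(MONTH, -6, GETDATE());

        SET @rows = @@ROWCOUNT;   -- capture immediately; the next statement resets it

        WAITFOR DELAY '00:00:05'; -- optional breather so other activity gets a look-in
    END
    The batch size, delay and 6-month cut-off are all guesses to be tuned on the actual system.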

    Either way, none of this is going to be a quick process. In theory I can have some downtime (ideally schedule the task as an out of hours job), but our window of opportunity is only 2-3 hours a day.
    George
    Home | Blog

  3. #3
    Join Date
    Sep 2006
    Location
    Surrey, UK
    Posts
    995
    Provided Answers: 2
    If you're going to have to do this again in the future, I'd suggest creating an index on this table during one window of downtime. Then I'd probably look at creating a table to hold a cut-off date, and scheduling a job to purge records from the audit table that were created before this date. You could either amend this date manually, or use another job to bring it forward by a week each week (or whatever period suits you).
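    A rough sketch of that layout, with made-up object names, might be:
    Code:
    -- Holds the purge cut-off; amend by hand or via a second scheduled job
    CREATE TABLE dbo.audit_purge_config (cutoff_date datetime NOT NULL);
    INSERT dbo.audit_purge_config (cutoff_date) VALUES (DATEADD(MONTH, -6, GETDATE()));

    -- The purge job reads the cut-off and deletes older rows (batched, as in post #2)
    DECLARE @cutoff datetime = (SELECT TOP (1) cutoff_date FROM dbo.audit_purge_config);
    DELETE TOP (10000) FROM dbo.some_log WHERE datecreated < @cutoff;  -- repeat until @@ROWCOUNT = 0

    -- A second weekly job could slide the cut-off forward
    UPDATE dbo.audit_purge_config SET cutoff_date = DATEADD(WEEK, 1, cutoff_date);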
    10% of magic is knowing something that no-one else does. The rest is misdirection.

  4. #4
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    Make another table with the same structure, copy the last six months' data into it, truncate the source table, then repopulate the source from the new table. Create a nightly purge job and keep the table down to six months of audit data.
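    In rough T-SQL (table and column names assumed from elsewhere in the thread), something like:
    Code:
    -- 1. Copy the rows worth keeping into a new table with the same structure
    SELECT *
    INTO   dbo.some_log_keep
    FROM   dbo.some_log
    WHERE  datecreated >= DATEADD(MONTH, -6, GETDATE());

    -- 2. Truncate the source (minimally logged, near-instant)
    TRUNCATE TABLE dbo.some_log;

    -- 3. Repopulate the source from the copy, then tidy up
    INSERT dbo.some_log
    SELECT * FROM dbo.some_log_keep;

    DROP TABLE dbo.some_log_keep;
    Step 1 is the expensive part, since the 180M-row heap has to be scanned to find the keepers.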

  5. #5
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Going forward I was planning on using a moving window of 6 months and scheduling a cleanup job each weekend (or more frequently if appropriate).

    Speaking of which; I may be able to wangle more downtime over a weekend...
    George
    Home | Blog

  6. #6
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Quote Originally Posted by corncrowe View Post
    Make another table with the same structure, copy the last six months' data into it, truncate the source table, then repopulate the source from the new table. Create a nightly purge job and keep the table down to six months of audit data.
    The following query has been running for half an hour without result...
    Code:
    SELECT Count(*)
    FROM   dbo.some_log WITH (NOLOCK)
    WHERE  datecreated >= '2013-01-01'
    Perhaps I need to batch the copy to the second table?
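    If the copy does have to be batched, one hedged option is to carve it up by date range rather than row count, since the heap has no key to batch on. Something like:
    Code:
    -- Copy one day at a time into the second table (names assumed)
    DECLARE @from datetime = DATEADD(MONTH, -6, GETDATE());
    DECLARE @to   datetime;

    WHILE @from < GETDATE()
    BEGIN
        SET @to = DATEADD(DAY, 1, @from);

        INSERT dbo.some_log_keep
        SELECT *
        FROM   dbo.some_log
        WHERE  datecreated >= @from
        AND    datecreated <  @to;

        SET @from = @to;
    END
    Each iteration still scans the heap, so this spreads the load out rather than reducing the total work.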
    George
    Home | Blog

  7. #7
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    I am with corncrowe, with one or two little edits. A heap that size is going to have data scattered all over the disk, which is the cause of your current difficulties. Any new table you move the data into ought to be clustered, and I would use relatively small set-based batches of 5 or 10 thousand records, just because your selects are going to drag and drag.

    Is the table partitioned? I would hope so for something of that size. If so, you could move all of the data into a new table really fast with partition switching, then cluster the original table and move back only the data you want, in small set-based batches run against the monster heap.
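    To make the partition-switching idea concrete, a very rough sketch (it assumes the table really is partitioned on datecreated via a scheme I've called ps_monthly, and the column list is invented):
    Code:
    -- Staging table on the same partition scheme and with the same structure as the source
    CREATE TABLE dbo.some_log_staging
    (
        datecreated datetime NOT NULL,
        detail      varchar(max) NULL      -- plus whatever other columns the real table has
    ) ON ps_monthly (datecreated);

    -- Switching is a metadata-only operation, so it is near-instant per partition
    ALTER TABLE dbo.some_log SWITCH PARTITION 1 TO dbo.some_log_staging PARTITION 1;
    -- ...repeat for each populated partition

    -- The original table is now empty, so clustering it is cheap
    CREATE CLUSTERED INDEX IX_some_log_datecreated ON dbo.some_log (datecreated);

    -- Finally, move just the last 6 months back in small set-based batches
    If it isn't partitioned, a whole-table ALTER TABLE ... SWITCH TO an empty staging table should, as far as I know, give the same near-instant "empty it out" effect.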
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Ernest Hemingway, A Farewell To Arms.

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Loop through it, deleting 10,000 records at a time.
    Then put a nightly batch job on it to delete old records regularly.
    You don't want to run one massive delete, because that will lock the table up and possibly lock up the application that is creating the audit records.
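    For the scheduling side, a sketch of the nightly job via SQL Agent (the job, database and procedure names are all made up; the proc would wrap a batched delete like the one in post #2):
    Code:
    USE msdb;

    EXEC dbo.sp_add_job         @job_name  = N'Purge old audit rows';

    EXEC dbo.sp_add_jobstep     @job_name  = N'Purge old audit rows',
                                @step_name = N'Batched delete',
                                @subsystem = N'TSQL',
                                @database_name = N'YourDb',
                                @command   = N'EXEC dbo.usp_purge_old_audit_rows;';

    EXEC dbo.sp_add_jobschedule @job_name  = N'Purge old audit rows',
                                @name      = N'Nightly',
                                @freq_type = 4,                -- daily
                                @freq_interval = 1,
                                @active_start_time = 010000;   -- 01:00

    EXEC dbo.sp_add_jobserver   @job_name  = N'Purge old audit rows';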
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  9. #9
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    <aside>
    New job, unfamiliar with this system, only became aware of this problem when things started falling over
    </aside>

    Done some digging. We currently only have 4 months of data in this table, with a couple of erroneous older records, which implies to me that there's already something [potentially] tidying this stuff up.

    This isn't a massive company and this system doesn't have *that* much activity i.e. the number of records in this table doesn't match the activity I expect at all.

    I think there's a bigger underlying problem here that I need to look into first.


    Cheers for all the replies; I will crack on and get back soon
    George
    Home | Blog

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Call me old-fashioned, but if I can take the table offline then I would:
    1. Export data to text file
    2. Extract "interesting" rows using a tool like awk or PowerShell
    3. Sort extracted data outside of SQL
    4. Truncate table
    5. Load extract file into table
    If I had to leave the table online while "thinning" it:
    1. Open a fast-forward cursor on SELECT * FROM nastyTable
    2. While valid rows
    3. If it ain't interesting, delete it
    4. If necessary, COMMIT to manage locking
    5. Fetch another row
    6. end of while loop
    7. Close up the cursor
    The advantage to this is that you can let it run for hours (if your log can tolerate that).
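    A hedged translation of that loop into T-SQL (note it needs an updatable cursor rather than FAST_FORWARD for WHERE CURRENT OF to work; the column name is assumed from earlier posts):
    Code:
    DECLARE @cutoff datetime = DATEADD(MONTH, -6, GETDATE());
    DECLARE @dt datetime, @n int = 0;

    DECLARE audit_cur CURSOR LOCAL FOR
        SELECT datecreated FROM dbo.some_log;

    BEGIN TRAN;
    OPEN audit_cur;
    FETCH NEXT FROM audit_cur INTO @dt;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @dt < @cutoff
            DELETE FROM dbo.some_log WHERE CURRENT OF audit_cur;

        SET @n += 1;
        IF @n % 10000 = 0
        BEGIN
            COMMIT;       -- release locks and let the log clear down periodically
            BEGIN TRAN;
        END

        FETCH NEXT FROM audit_cur INTO @dt;
    END

    COMMIT;
    CLOSE audit_cur;
    DEALLOCATE audit_cur;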

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Quote Originally Posted by gvee View Post
    This isn't a massive company and this system doesn't have *that* much activity i.e. the number of records in this table doesn't match the activity I expect at all.

    I think there's a bigger underlying problem here that I need to look in to first.
    I'm dealing with a situation where the vendor, when updating, say, 20 columns in a table, executes 20 separate UPDATE statements.
    While looping through the records one at a time.
    You can see how this would play havoc with any delta auditing.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Uff-da, your case (twenty separate updates on a single row) is something that developers seem to love, but DBAs usually hate!

    I've seen cases where the DBA is handed a pile of stuff like gvee has, and a cursor is the slow boat to China that gets the job done without bringing the whole system to its knees. It is very ugly, but it can dodge the bullets associated with both row locking and massive updates that would cripple a "zero downtime" system.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  13. #13
    Join Date
    Aug 2004
    Location
    Dallas, Texas
    Posts
    831
    Quote Originally Posted by blindman View Post
    Loop through it, deleting 10,000 records at a time.
    Then put a nightly batch job on it to delete old records regularly.
    You don't want to run one massive delete, because that will lock the table up and possibly lock up the application that is creating the audit records.
    Need to follow up with a log backup and shrink of the log files, or they will get really big and performance may suffer.
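    For example (a sketch only; the database name, backup path and logical log file name are placeholders):
    Code:
    -- Back up the log so the space inside it can be reused (FULL/BULK_LOGGED recovery)
    BACKUP LOG YourDb TO DISK = N'X:\Backups\YourDb_log.trn';

    -- If the physical file itself has ballooned, shrink it back to a sane size
    DBCC SHRINKFILE (YourDb_log, 1024);   -- target size in MB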
