Results 1 to 11 of 11
  1. #1
    Join Date
    Nov 2008
    Posts
    3

    Unanswered: Fastest way to delete from a large table

    Hi guys,

    I have a problem with deleting from a very large table. The table is a log of some sort with a combined clustering index on the columns type and time. I want to delete all files older than some date. It seems to me that there are two ways to do this:

    1. Create a loop that deletes all rows for each type, using the index already on the table.
    2. Create a new index for time and delete all in one command.

    My question is which is faster? And of course if you know of any other magical way of doing this.

    Thx.

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    #2 won't work I'm afraid unless you are deleting ~3% or less of the table.

    I would go with #1, taking advantage of the clustered index columns.

    Code:
    ...WHERE type = @type AND date <= @date
    If the table is offline for this, it might be quicker to delete all nonclustered indexes first and rebuild them once the job is complete.

    EDIT - corrected error (changed nonclustered to clustered)
    Last edited by pootle flump; 11-03-08 at 04:45.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    Nov 2008
    Posts
    3
    ok, thanks I'm going to go with #1.

  4. #4
    Join Date
    Apr 2007
    Posts
    183
    Keep deletes in small batches if possible to minimize log file growth.

    Code:
    WHILE @@ROWCOUNT > 0
    	DELETE	TOP (1000)
    		f
    	FROM	(
    			SELECT	ROW_NUMBER() OVER (PARTITION BY Type ORDER BY date) AS recID
    			FROM	Table1
    			WHERE	date < @date
    		) AS f

  5. #5
    Join Date
    Nov 2008
    Posts
    3
    I made some tests with the method Peso suggested and my method with creating a cursor to delete each type separate. The cursor way was much faster. I tested it on a table with 6,152,721 rows. With cursor it took 1:04 and with the other 3:39.

    I'll post the code so you can comment:
    First with cursor
    Code:
     
    declare CustList cursor for
    SELECT type from TypeTable
    OPEN CustList
    FETCH NEXT FROM CustList 
    INTO @type
    WHILE @@FETCH_STATUS = 0
    BEGIN
        delete from logTable where type= @type AND time <'11-04-2008'
        FETCH NEXT FROM CustList INTO @type 
    END
    CLOSE CustList
    DEALLOCATE CustList
    Next without cursor
    Code:
    WHILE @@ROWCOUNT > 0
    DELETE	TOP (1000)
    	f
    FROM	(
    		SELECT ROW_NUMBER() OVER (PARTITION BY type ORDER BY time) AS recID
    		FROM	logfloats
    		WHERE time <'11-04-2008'   
    	) AS f

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Yep - I would expect that. If you look at the plans, #1 will seek on the index, #2 will scan. In effect you are batching the deletes in #1 anyway so you get the double whammy of seeking on the index and using Peso's batches.

    The above assumes that type is the leading column in the index.

    Actually Peter - I've reread your query - what is the point of the ROW_NUMBER()? You don't use it...
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    I feel like it's a replacement for an ORDER BY clause?
    George
    Home | Blog

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    ok...so what purpose does imposing order on the delete serve?
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    Top without order is meaningless, and if you were deleting in order of the index it may be faster?!!?/121?


    I don't ruddy know!
    George
    Home | Blog

  10. #10
    Join Date
    Apr 2007
    Posts
    183
    My mistake. I thought you were going to delete duplicates...

    Try this
    Code:
    WHILE @@ROWCOUNT > 0
    	DELETE	TOP (1000)
    	FROM	logFloats
    	WHERE	Time < '11-04-2008'

  11. #11
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by zetkopf
    The table is a log of some sort with a combined clustering index on the columns type and time
    Peter - that is going to crawl if the CI is (type, time) compared to the OP's code. In effect, his code batches the deletes anyway.
    Testimonial:
    pootle flump
    ur codings are working excelent.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •