  1. #1
    Join Date
    Oct 2003

    Unanswered: How do I speed up an UPDATE?

    I have a single UPDATE statement that has been running for 13+ hours and I have no idea of when it will complete.

    Recipients has 80 million records
    UpdatedStagingRecipients has 34 million records.

    Why would this possibly take so long? Is there anything that I can do at all?

    UPDATE Recipients
    	SET Recipients.First = UpdatedStagingRecipients.First
    	, Recipients.Last = UpdatedStagingRecipients.Last
    	, Recipients.StreetAddress = UpdatedStagingRecipients.StreetAddress
    	, Recipients.City = UpdatedStagingRecipients.City
    	, Recipients.State = UpdatedStagingRecipients.State
    	, Recipients.Postal = UpdatedStagingRecipients.Postal
    	, Recipients.Country = UpdatedStagingRecipients.Country
    	, Recipients.DOB = UpdatedStagingRecipients.DOB
    	, Recipients.Obscene = UpdatedStagingRecipients.Obscene
    	, Recipients.Gender = UpdatedStagingRecipients.Gender
    	, Recipients.IPv4 = UpdatedStagingRecipients.IPv4
    	, Recipients.NameSourceID = UpdatedStagingRecipients.NameSourceID
    	, Recipients.NameLine = UpdatedStagingRecipients.NameLine
    	, Recipients.AddressSourceID = UpdatedStagingRecipients.AddressSourceID
    	, Recipients.AddressLine = UpdatedStagingRecipients.AddressLine
    	, Recipients.RecordCreationSourceID = UpdatedStagingRecipients.RecordCreationSourceID
    FROM Recipients
    INNER JOIN UpdatedStagingRecipients
    	ON (Recipients.UserName = UpdatedStagingRecipients.UserName
    	AND Recipients.DomainID = UpdatedStagingRecipients.DomainID);

    TRUNCATE TABLE UpdatedStagingRecipients;

  2. #2
    Join Date
    Jun 2003
    Do you have indexes on both tables on (DomainID, UserName)?
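
    If not, something along these lines would cover the join key on both sides (a sketch only; the index names are made up, and on the big target table a clustered primary key may already serve this purpose):

    ```sql
    -- Sketch: index the join key (DomainID, UserName) on both tables
    -- so the UPDATE's join does not have to scan 80M + 34M rows.
    CREATE INDEX IX_Recipients_DomainID_UserName
        ON Recipients (DomainID, UserName);

    CREATE INDEX IX_UpdatedStagingRecipients_DomainID_UserName
        ON UpdatedStagingRecipients (DomainID, UserName);
    ```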

  3. #3
    Join Date
    Feb 2004
    Are any of the columns not NULL-able?

    Is the distribution of values for IPv4 relatively even? Could that column be indexed?

    Is the distribution for Last relatively even? Could it be indexed?

    The biggest problem I see: even if the update averages 500 rows per second (which implies very stout hardware), it will process at most 1.8 million rows per hour. For the 34 million matching rows, that works out to about 19 hours!

    If you can use a non-NULL-able column with a reasonably even distribution of values, you can cut the job into smaller pieces. Cutting the job up adds some overhead per piece, but it gives you feedback on progress and a much better ability to manage the process, and the reduced load on the transaction log helps speed things along. The net result should be a job that is easier to manage, and possibly faster overall.
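
    For example (a sketch only; it assumes Postal is non-NULL-able and reasonably evenly distributed, and elides most of the SET list), the job could be run once per range of that column:

    ```sql
    -- Sketch: run the same UPDATE once per slice of a well-distributed,
    -- non-NULL-able column (Postal is assumed here), committing each slice.
    UPDATE Recipients
        SET Recipients.First = UpdatedStagingRecipients.First
        , Recipients.Last = UpdatedStagingRecipients.Last
        -- ... remaining columns as in the original statement ...
    FROM Recipients
    INNER JOIN UpdatedStagingRecipients
        ON Recipients.UserName = UpdatedStagingRecipients.UserName
       AND Recipients.DomainID = UpdatedStagingRecipients.DomainID
    WHERE Recipients.Postal >= '00000'
      AND Recipients.Postal <  '10000';  -- next run: '10000' to '20000', etc.
    ```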


  4. #4
    Join Date
    Oct 2003
    Some suggestions:

    (1) If possible, do the work as a series of UPDATEs, not just one, and COMMIT the transaction between each unit. You might run 34,000 transactions, each of which occupies only 1,000 rows' worth of undo information. And if the process dies after finishing 33,990 of them, you know exactly where and how to resume. (And you can "resume.")
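
    On SQL Server that loop might look roughly like this (a sketch only; the batch size is arbitrary, most SET columns are elided, and it consumes the staging table as it goes, which matches the TRUNCATE the original job runs anyway):

    ```sql
    -- Sketch: apply the update in 1,000-row batches, committing each one.
    -- Rows are deleted from staging as they are applied, so after a crash
    -- the loop can simply be restarted and will pick up where it left off.
    DECLARE @rows int = 1;
    WHILE @rows > 0
    BEGIN
        BEGIN TRANSACTION;

        -- Carve one batch of keys off the staging table.
        SELECT TOP (1000) UserName, DomainID
        INTO #batch
        FROM UpdatedStagingRecipients;

        UPDATE r
            SET r.First = s.First
            , r.Last = s.Last
            -- ... remaining columns as in the original statement ...
        FROM Recipients AS r
        INNER JOIN UpdatedStagingRecipients AS s
            ON r.UserName = s.UserName AND r.DomainID = s.DomainID
        INNER JOIN #batch AS b
            ON s.UserName = b.UserName AND s.DomainID = b.DomainID;

        -- Remove the applied batch so it is not picked up again.
        DELETE s
        FROM UpdatedStagingRecipients AS s
        INNER JOIN #batch AS b
            ON s.UserName = b.UserName AND s.DomainID = b.DomainID;

        SET @rows = @@ROWCOUNT;
        COMMIT TRANSACTION;

        DROP TABLE #batch;
    END
    ```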

    (2) Consider locking the table for exclusive access during the update. Almost 50% of the rows are going to be updated. The DBMS may be writing changes back to disk wastefully.
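
    On SQL Server, for instance, a table hint can request that lock up front (a sketch; TABLOCKX is the assumed mechanism, and the SET list is elided):

    ```sql
    -- Sketch: take an exclusive table lock for the whole statement, so the
    -- engine escalates once instead of managing millions of row locks.
    UPDATE r
        SET r.First = s.First
        -- ... remaining columns as in the original statement ...
    FROM Recipients AS r WITH (TABLOCKX)
    INNER JOIN UpdatedStagingRecipients AS s
        ON r.UserName = s.UserName AND r.DomainID = s.DomainID;
    ```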

    (3) Sort the input table in ascending order of the join key (DomainID, UserName). Don't make the computer do an unordered JOIN; prepare the input in a separate step.

    (4) Drop or disable every index you can. It is much faster to rebuild an index on 80 million records once, after all the updates are complete, than it is to update that index 34 million times!
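
    On SQL Server that might look like this (a sketch; the index names are made up, and whatever index backs the (DomainID, UserName) join key must of course stay enabled):

    ```sql
    -- Sketch: disable nonclustered indexes before the big UPDATE, then
    -- rebuild each one a single time afterwards. Do NOT disable the index
    -- on (DomainID, UserName) -- the UPDATE's join depends on it.
    ALTER INDEX IX_Recipients_Postal ON Recipients DISABLE;
    ALTER INDEX IX_Recipients_Last   ON Recipients DISABLE;

    -- ... run the UPDATE ...

    ALTER INDEX IX_Recipients_Postal ON Recipients REBUILD;
    ALTER INDEX IX_Recipients_Last   ON Recipients REBUILD;
    ```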
