Results 1 to 6 of 6
  1. #1
    Join Date
    Mar 2009
    Posts
    27

    Unanswered: Updating 99714810 records without Set Rowcount?

    OK, so you guys have been a great resource in the past and I'm a little in over my head again...

    I've got a table (Table A) that currently has 99,714,810 records in it since 1/2010 and will continue to grow until June. And I've been told to do an update on it weekly! YEOUCH!

    I'll be pushing something like 40-100K records (Table B) to this server and then using them to update a column on Table A. Table B has an clustered index on the column used for the join.

    Table A is not under my control and has only a non-clustered, non-unique index on the column used in the join. There is no clustered index on Table A. Adding one is not an option.

    Running a straight update took 11.5 hours.

    What was recommended was using the following:

    set rowcount 1000000
    select 1
    while @@rowcount > 0
    begin
    -- Update CDI_RID on rows that have a match
    UPDATE A
    SET A.CDI_RID = B.CDI_RID
    FROM dbo.TableB B
    JOIN .dbo.TableA A
    ON A.ContactHistoryID = B.ContactHistoryID
    WHERE A.CDI_RID is NULL
    End

    This did make things run much faster. However, SET ROWCOUNT is not supported in 2008. So, I rewrote this as represented below, but then it took 9 hours again. Recommendations? What am I missing here? Transactions?

    DECLARE @Tbl_MaxID BIGINT
    DECLARE @Loop_MinID BIGINT
    DECLARE @Loop_MaxID BIGINT

    BEGIN

    SET NOCOUNT ON;

    SET @Tbl_MaxID = (SELECT MAX(ContactHistoryID) FROM dbo.TableA)
    SET @Loop_MinID = (SELECT MIN(ContactHistoryID) FROM dbo.TableA)
    SET @Loop_MaxID = @Loop_MinID + 500000

    -- Update records in batches of 500000
    WHILE (@Tbl_MaxID >= @Loop_MinID)
    BEGIN

    -- Update CDI_RID on rows that have a match
    UPDATE A
    SET A.CDI_RID = B.CDI_RID
    FROM dbo.TableB B
    INNER JOIN (SELECT *
    FROM dbo.TableA
    WHERE ContactHistoryID
    BETWEEN @Loop_MinID AND @Loop_MaxID) AS A
    ON A.ContactHistoryID = B.ContactHistoryID
    WHERE CH.CDI_RID IS NULL;

    -- Bump up row set
    SET @Loop_MinID = @Loop_MaxID + 1
    SET @Loop_MaxID = @Loop_MaxID + 500001

    END

  2. #2
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    If you simply did:
    Code:
    UPDATE A
    SET A.CDI_RID = B.CDI_RID
    FROM dbo.TableB B
    where a.ContactHistoryID = b.ContactHistoryID
    what do you come up with for an estimated showplan? Does the optimizer want to do a scan on table A?

    Supposing it does, I would try something along the lines of
    Code:
    declare @SomeMaxID int (?)
    set @SomeMaxID = 50000
    
    while (exists (select * from Table B))
      begin
        UPDATE A
        SET A.CDI_RID = B.CDI_RID
        FROM (select * from dbo.TableB where ContactHistoryID < @SomeMaxID) B
        where a.ContactHistoryID = b.ContactHistoryID
    
       delete tableB where ContactHistoryID < @SomeMaxID
    
       set @SomeMaxID = @SomeMaxID + 50000
      end
    See if you can get the execution plan for the individual updates. If even one tablescan sneaks in, then it will likely take 9 hours for just that one tablescan.

    EDIT: Another thing to try would be to make sure that statistics have been updated on TableB before the update, so the optimizer is assured of getting the correct distribution of values.

  3. #3
    Join Date
    Mar 2009
    Posts
    27
    What I have done is wrap the update statement in a transaction. Using sets of 500K rows for each transaction runs about 3 minutes per cycle--which adds up to about 9 hours to work through all the rows. Since it can cycle that quickly, I'm guessing it is not doing a table scan of the big table, TableA.

    Unfortunately, Table B is the small table with the clustered index--and it would not work to delete rows from that table as part of this process. Not all rows in Table B will be in Table A, er, yet. So I have to know which rows were updated and then lug that back.

  4. #4
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Does either the small update, or the larger update use the index on the big table?

  5. #5
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    How many rows do you have "WHERE A.CDI_RID is NULL"? If you have a maintenance window when you can kick users off, you could move the qualifying rows into a temporary table, update them there, then bulk insert them back into A...
    ---
    "It does not work" is not a valid problem statement.

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    You could also try using MERGE instead of UPDATE - see if it comes up with a better plan.
    ---
    "It does not work" is not a valid problem statement.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •