I need to update a large table, about 55 million rows, without filling the transaction log, in the shortest time as possible.
The goal is to alter the table and change the data type for Text column from VARCHAR(7900) to NVARCHAR(MAX).
Since I cannot do it with an ALTER TABLE statement (it would fill up the transaction log) I'm thinking to:
- rename column Text in Text_OLD
- add Text column of type NVARCHAR(MAX)
- copy values in batches from Text_OLD to Text
The table is defined like:
create table DATATEXT(
rID INTEGER NOT NULL,
sID INTEGER NOT NULL,
pID INTEGER NOT NULL,
cID INTEGER NOT NULL,
err TINYINT NOT NULL,
multi TINYINT NOT NULL,
count INTEGER NOT NULL,
repos BIGINT NOT NULL,
Sequence ROWVERSION NOT NULL
CONSTRAINT pk_data PRIMARY KEY( rID, sID, pID, cID )
exec sp_indexoption 'DATATEXT.pk_data', 'AllowPageLocks', FALSE;
CREATE INDEX IX_CID ON DATATEXT( cID );
exec sp_indexoption 'DATATEXT.IX_CID', 'AllowPageLocks', FALSE;
CREATE INDEX IX_Sequence ON DATATEXT( Sequence ) WITH FILLFACTOR = 100;
exec sp_indexoption 'DATATEXT.IX_Sequence', 'AllowPageLocks', FALSE;
I've thought about a stored procedure doing this but I'd like an advice on how to copy values in batch from Text_OLD to Text.
The code I would start with (doing just this part) is the following, but maybe there are more efficient ways to do it, or at least there's a better way to select @startSeq in the WHILE loop (avoiding to select a bunch of 100000 sequences and later selecting the max).
declare @startSeq timestamp
declare @lastSeq timestamp
select @lastSeq = MAX(sequence) from [DATATEXT] where [Text] is null
select @startSeq = MIN(Sequence) FROM [DATATEXT] where [Text]is null
BEGIN TRANSACTION T1
WHILE @startSeq < @lastSeq
with sequenceBunch as (
select top 100000 Sequence from [DATATEXT] where [Text] is null order by Sequence)
select @startSeq = MAX(Sequence) from sequenceBunch
UPDATE [DATATEXT] SET [Text]=[Text_OLD] where Sequence < @startSeq
COMMIT TRANSACTION T1
If you can, set the recovery mode to simple as this will log less information than other options.
Also, if you can, periodically truncate the log (when in simple mode) so as not to fill it.
It doesn't make a vast difference but I think I would add the new column with a different name, rather than renaming the existing column first. This is only because a rollback might be easier (don't rock the boat until you really have to).
You are right to do the operation in batches. However I might change your loop logic slightly (again, this is mostly personal preference)
SET NOCOUNT ON;
DECLARE @rc int = 937;
WHILE @rc > 0
UPDATE TOP (10000) DATATEXT
SET Text_New = "Text"
WHERE Text_New IS NULL
AND "Text" IS NOT NULL
SET @rc = @@RowCount;