  1. #1
    Join Date
    Oct 2003
    Posts
    268

    Unanswered: Tuning an "UPDATE" statement on a large data set?

    I'm updating the name data in a large user database with the UPDATE statement below. The staging table was bulk loaded from a flat file and contains 10 million records; the production table (Recipients) contains 15 million. The statement produces correct results, but this single UPDATE took ten hours to run, which is far too long. While it was running the server was clearly 100% disk bound, with almost no CPU activity. We've just upgraded RAM from 1GB to 2GB, but we expect data sizes to grow significantly and we can't keep adding RAM. Absolutely nothing else runs on this server. Any ideas how I can optimize this?

    UPDATE Recipients
    SET [First] = Stages.[First],
        [Last] = Stages.[Last]
    FROM Stages
    INNER JOIN Recipients
        ON Stages.UserName = Recipients.UserName
        AND Stages.DomainID = Recipients.DomainID
    WHERE
        (CASE WHEN Stages.[First] IS NULL THEN 1 ELSE 0 END
         + CASE WHEN Stages.[Last] IS NULL THEN 1 ELSE 0 END)
        <=
        (CASE WHEN Recipients.[First] IS NULL THEN 1 ELSE 0 END
         + CASE WHEN Recipients.[Last] IS NULL THEN 1 ELSE 0 END)

    Here's the text execution plan. I've annotated it with the percentages from the graphical execution plan:

    |--Clustered Index Update(OBJECT:([Recipients].[dbo].[Recipients].[PK_Recipients]), SET:([Recipients].[First]=[Stages].[First], [Recipients].[Last]=[Stages].[Last]))
         |--Top(ROWCOUNT est 0)
              |--Sort(DISTINCT ORDER BY:([Bmk1000] ASC))
    14%            |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Stages].[DomainID], [Stages].[UserName])=([Recipients].[DomainID], [Recipients].[UserName]), RESIDUAL:([Recipients].[UserName]=[Stages].[UserName] AND [Recipients].[DomainID]=[Stages].[DomainID]))
    25%                 |--Clustered Index Scan(OBJECT:([Recipients].[dbo].[Stages].[IX_Stages]), ORDERED FORWARD)
    61%                 |--Clustered Index Scan(OBJECT:([Recipients].[dbo].[Recipients].[PK_Recipients]), ORDERED FORWARD)

    Everything I've heard on the subject suggests you change the index scans to index seeks. How do I do this?

    Any other tuning advice is greatly appreciated.

    Here are the exact statements I used to create the tables:

    CREATE TABLE Recipients (
    ID INT IDENTITY (1, 1) NOT NULL,
    UserName VARCHAR (50) NOT NULL,
    DomainID INT NOT NULL,
    First VARCHAR (24) NULL,
    Last VARCHAR (24) NULL,
    StreetAddress VARCHAR (32) NULL,
    City VARCHAR (24) NULL,
    State VARCHAR (16) NULL,
    Postal VARCHAR (10) NULL,
    SourceID INT NULL,

    CONSTRAINT PK_Recipients PRIMARY KEY CLUSTERED (DomainID, UserName)
    )

    CREATE TABLE Stages (
    ID INT NULL,
    UserName VARCHAR(50) NOT NULL,
    DomainID INT NULL,
    Domain VARCHAR(50) NOT NULL,
    First VARCHAR(24) NULL,
    Last VARCHAR(24) NULL,
    StreetAddress VARCHAR(32) NULL,
    City VARCHAR(24) NULL,
    State VARCHAR(24) NULL,
    Postal VARCHAR(10) NULL
    )
    CREATE CLUSTERED INDEX IX_Stages ON Stages (DomainID, UserName)

  2. #2
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    My guess is the optimizer chose the merge join because it's the most efficient...

    And I'd guess it's right...

    You're doing 10 million....seeks out to 15 million would probably be worse...

    And because there's so much activity...it's paging heavily in and out...

    Did you look at the virtual memory on the box?

    Betcha it was HUGE...

    I would break it up into smaller update sets in a loop...something like the sketch below.

    Use UserID or name and determine an even distribution of data...say 1 million rows each.

    Then BEGIN and COMMIT each block.

    You'll free up a lot of resources...trans log, memory, paging, etc.
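
    A minimal sketch of that kind of loop, batching on DomainID ranges (the variable names and the batch width of 1000 are made up; pick a width that lands somewhere near a million rows per pass):

    DECLARE @lo INT, @hi INT, @max INT
    SELECT @lo = MIN(DomainID), @max = MAX(DomainID) FROM Stages

    WHILE @lo <= @max
    BEGIN
        SET @hi = @lo + 999    -- batch width: tune to your data distribution

        BEGIN TRANSACTION

        UPDATE Recipients
        SET [First] = Stages.[First], [Last] = Stages.[Last]
        FROM Stages
        INNER JOIN Recipients
            ON Stages.UserName = Recipients.UserName
            AND Stages.DomainID = Recipients.DomainID
        WHERE Stages.DomainID BETWEEN @lo AND @hi
            -- ...plus the original NULL-count filter...

        COMMIT TRANSACTION

        SET @lo = @hi + 1
    END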

    Did you set your trans log to unlimited growth? What's it at now?

    Hell...we used to blow out trans logs in 6.5 all the time....
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak Corral
    Radio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  3. #3
    Join Date
    Oct 2003
    Posts
    268
    The transaction log is at 2618 MB. It is allowed unlimited growth. Is this a problem? Is this consuming RAM?

    I'm not using explicit transactions at all, so I don't see why I would use COMMIT or why that would help.

    Of course, Virtual Memory is maxed out. Even after upgrading to 2GB, SQL Server is using everything.

    Are you saying that changing an "index scan" to an "index seek" is a bad idea? I would like to try that but I can't figure out how; the correct indexes seem to be already in place.

  4. #4
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Of course it's using the transaction log...and a 2.6 gig log is pretty darn big.

    What if the statement failed? It would have to roll back, and it uses the log for that.

    What's your recovery model set to?

    When did you last take a backup (and if not simple, how often do you dump the logs)?

    I guess you could try to force an index hint (there's a sketch of the syntax below)...and in this case, yes, I think a scan is appropriate over a seek...because of the merge join...

    It's basically slamming your 2 tables side by side and then scanning row by row for matches...given the volume I think it's the right thing to do...

    It's gonna scan the 10 million anyway...it has to...

    So would you rather go
    row 1: scan 15 million
    row 2: scan 15 million
    row 3: scan 15 million
    row 4: scan 15 million
    ...
    row 10,000,000: scan 15 million

    OR

    Slam table 1 next to table 2 in key order
    Scan
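
    If you really want to test the seek plan anyway, the knob is a join hint (a sketch, 2000-era syntax; the original WHERE clause is omitted here):

    UPDATE Recipients
    SET [First] = Stages.[First], [Last] = Stages.[Last]
    FROM Stages
    INNER LOOP JOIN Recipients    -- LOOP forces nested loops (seeks) in place of the merge join
        ON Stages.UserName = Recipients.UserName
        AND Stages.DomainID = Recipients.DomainID
    -- ...original WHERE clause goes here unchanged...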

    I'm just saying it's a lot of data to slam together, and I would think if you broke it up into batches it would be better...

    Also you need to take into account that you're probably incurring overhead from having to grow the tranny log. What's the percentage growth?

    If it's small it'll be doing that all day...

    You need to manage the log better.....
    Brett

  5. #5
    Join Date
    Oct 2003
    Posts
    268
    The transaction log growth is set to 10% with an unlimited max. I don't care too much about the transaction log itself (it's only used for rollbacks and recoveries). There is plenty of disk space, so that isn't a huge concern; but RAM of course is. What can I do to minimize the RAM that the transaction log consumes?

    Quote Originally Posted by Brett Kaiser
    Of course it's using the transaction log...and a 2.6 gig log is pretty darn big

    Also you need to take into account that you're probably incurring overhead from having to grow the tranny log. What's the percentage growth?

    If it's small it'll be doing that all day...

    You need to manage the log better.....

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Back up your database, and then turn off logging prior to running your process.

    Also consider breaking your process into several queries that independently address the various possible criteria combinations, such as this one:

    UPDATE Recipients
    SET [First] = Stages.[First], [Last] = Stages.[Last]
    FROM Stages
    INNER JOIN Recipients
        ON Stages.UserName = Recipients.UserName
        AND Stages.DomainID = Recipients.DomainID
    WHERE Stages.[First] IS NOT NULL
        AND Stages.[Last] IS NOT NULL
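
    Or another combination from the same decomposition, as a sketch (staging has a First but no Last, so under your original NULL-count rule it only qualifies when the Recipients row is missing at least one of the two):

    UPDATE Recipients
    SET [First] = Stages.[First], [Last] = Stages.[Last]
    FROM Stages
    INNER JOIN Recipients
        ON Stages.UserName = Recipients.UserName
        AND Stages.DomainID = Recipients.DomainID
    WHERE Stages.[First] IS NOT NULL
        AND Stages.[Last] IS NULL
        AND (Recipients.[First] IS NULL OR Recipients.[Last] IS NULL)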

    These may run faster than trying to do the whole thing all at once with that non-sargable WHERE clause.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    Quote Originally Posted by blindman
    and then turn off logging prior to running your process.
    What the heck are you talking about?
    Brett

  8. #8
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Silly me. I was thinking of old-time SQL Server, where you could actually CONTROL your environment. (Sigh of nostalgia...)
    blindman

  9. #9
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by Brett Kaiser
    You need to manage the log better.....
    So what can I do to manage the log better?

  10. #10
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    What's your backup strategy?

    And your recovery model?

    As far as managing the log: you want to perform your update in batches...

    ...and then COMMIT each batch. What you're doing now is causing the log to get huge...and perhaps tempdb too...

    With smaller sets of updates committed, the log and tempdb won't get so huge...

    You need to do it in a loop.

    Understand?

    You might want to set it up to dump the log after every batch as well...
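
    Something like this between batches (a sketch; the database name comes from your plan output, and the dump path is made up):

    -- throw away the inactive portion of the log (SQL Server 2000 syntax; breaks the log chain)
    BACKUP LOG Recipients WITH TRUNCATE_ONLY

    -- or, if you need point-in-time recovery, dump it to disk instead:
    BACKUP LOG Recipients TO DISK = 'D:\dumps\Recipients_log.bak'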
    Brett

  11. #11
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    You know what might be the best option?

    Create a view of the join and select all of the columns.


    Back up the database
    bcp out the view
    TRUNCATE the table
    bcp in the data file

    Should be the best bet....

    Those are all minimally logged...
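
    Roughly like this (a sketch; the view name, server name, and file path are made up, and the COALESCE is a stand-in for whatever merge rule you settle on -- your original CASE logic would need a bit more):

    CREATE VIEW dbo.RecipientsMerged
    AS
    SELECT r.ID, r.UserName, r.DomainID,
           COALESCE(s.[First], r.[First]) AS [First],   -- prefer the staging value when present
           COALESCE(s.[Last], r.[Last]) AS [Last],
           r.StreetAddress, r.City, r.State, r.Postal, r.SourceID
    FROM Recipients r
    LEFT JOIN Stages s
        ON s.UserName = r.UserName
        AND s.DomainID = r.DomainID

    Then from the command line (-n is native format, -E keeps the IDENTITY values on the way back in):

    bcp "Recipients.dbo.RecipientsMerged" out recipients.dat -n -T -S MYSERVER
    bcp "Recipients.dbo.Recipients" in recipients.dat -n -E -T -S MYSERVER

    ...with the backup before the bcp out, and the TRUNCATE TABLE Recipients run in between.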
    Brett

  12. #12
    Join Date
    Oct 2003
    Posts
    268
    I've revised the process: rather than importing and merging lots of little files individually, I'm going to import ALL the files and then do one huge merge/dedupe process. This should work...

    Thanks for the feedback, you two!
