Results 1 to 11 of 11

Thread: T-SQL Cursors?

  1. #1
    Join Date
    Oct 2003
    Posts
    268

    Unanswered: T-SQL Cursors?

    I have a stored proc that merges records from an undeduped table to a deduped table. It is running really slowly. Merging 70 million records against a deduped 70 million is taking 115 hours and counting on decent hardware. Very slow.

    I suspect there is significant room for optimization here. Some of my ideas:

    - Write/update back to the cursor rather than executing separate UPDATE statements.
    - Try a dynamic cursor instead of a READ ONLY cursor.
    - Dump new elements to a separate table or a file and then do a single large INSERT.

    Anyone else think any of these ideas will work? Can anyone think of something better?

    BTW, I've tried to replaced the procedural cursor code with set based UPDATES/INSERTS but the short version of the story is that that route just didn't pan out. I know that is very common optimization advice.

    I've made minor simplifications to the code:
    - Took out code to handle last line
    - Took out progress logging code
    - Removed some DECLARE statements. These are needed to run but it should be obvious what they were supposed to be.

  2. #2
    Join Date
    Oct 2003
    Posts
    268
    Code:
    CREATE PROCEDURE mergeStagingIntoRecipients AS
    
    -- Large performance savings on operations with many INSERT/UPDATE statements
    SET NOCOUNT ON
    
    -- Variables that hold the merged and deduped data for the current UserNmae/DomainID combination.
    DECLARE @mergedEmailUser VARCHAR(50)
    DECLARE @mergedDomainID INT
    DECLARE @mergedExistingDomainID INT
    DECLARE @mergedFirstName VARCHAR(24)
    DECLARE @mergedLastName VARCHAR(24)
    DECLARE @mergedStreetAddress VARCHAR(32)
    DECLARE @mergedCity VARCHAR(24)
    DECLARE @mergedState VARCHAR(24)
    DECLARE @mergedPostal VARCHAR(10)
    DECLARE @mergedCountry VARCHAR(16)
    DECLARE @mergedDob SMALLDATETIME
    DECLARE @mergedObscene BIT
    DECLARE @mergedGender CHAR(1)
    DECLARE @mergedIPv4 INT
    DECLARE @mergedNameSourceID INT
    DECLARE @mergedNameSourceLine INT
    DECLARE @mergedAddressSourceID INT
    DECLARE @mergedAddressSourceLine INT
    DECLARE @mergedRecordCreationSourceID INT
    DECLARE @mergedNumNameNULLs INT
    DECLARE @mergedNumAddressNULLs INT
    
    -- Variables to hold the data read from the cursor
    DECLARE @rowEmailUser VARCHAR(50)
    DECLARE @rowDomainID INT
    DECLARE @rowFirstName VARCHAR(24)
    DECLARE @rowLastName VARCHAR(24)
    DECLARE @rowStreetAddress VARCHAR(32)
    DECLARE @rowCity VARCHAR(24)
    DECLARE @rowState VARCHAR(24)
    DECLARE @rowPostal VARCHAR(10)
    DECLARE @rowCountry VARCHAR(16)
    DECLARE @rowDob SMALLDATETIME
    DECLARE @rowObscene BIT
    DECLARE @rowGender CHAR(1)
    DECLARE @rowIPv4 INT
    DECLARE @rowSourceID INT
    DECLARE @rowSourceLine INT
    DECLARE @rowNumNameNULLs INT
    DECLARE @rowNumAddressNULLs INT
    
    DECLARE @existingDomainID INT
    -- Omitted other @existing... DECLARE statements for brevity (over 10K posting limit)
    
    DECLARE stagingCursor CURSOR READ_ONLY FOR
    	SELECT
    		Stages.UserName
    		, Stages.DomainID
    		, Stages.First
    		, Stages.Last
    		, Stages.StreetAddress
    		, Stages.City
    		, Stages.State
    		, Stages.Postal
    		, Stages.Country
    		, Stages.DOB
    		, Stages.Obscene
    		, Stages.Gender
    		, Stages.IPv4
    		, Stages.SourceID
    		, Stages.Line
    		, 	(CASE WHEN Stages.[First] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Stages.[Last] IS NULL THEN 1 ELSE 0 END) AS NumNameNULLs
    		,	(CASE WHEN Stages.[StreetAddress] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Stages.[City] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Stages.[State] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Stages.[Postal] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Stages.[Country] IS NULL THEN 1 ELSE 0 END) AS NumAddressNULLs
    		, Recipients.DomainID AS ExistingDomainID
    		, Recipients.First AS ExistingFirst
    		, Recipients.Last AS ExistingLast
    		, Recipients.StreetAddress AS ExistingStreetAddress
    		, Recipients.City AS ExistingCity
    		, Recipients.State AS ExistingState
    		, Recipients.Postal AS ExistingPostal
    		, Recipients.Country AS ExistingCountry
    		, Recipients.DOB AS ExistingDOB
    		, Recipients.Obscene AS ExistingObscene
    		, Recipients.Gender AS ExistingGender
    		, Recipients.IPv4 AS ExistingIPv4
    		, Recipients.NameSourceID AS ExistingNameSourceID
    		, Recipients.NameLine AS existingNameSourceLine
    		, Recipients.AddressSourceID AS ExistingAddressSourceID
    		, Recipients.AddressLine AS existingAddressSourceLine
    		, Recipients.RecordCreationSourceID AS ExistingRecordCreationSourceID
    		, 	(CASE WHEN Recipients.[First] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Recipients.[Last] IS NULL THEN 1 ELSE 0 END) AS ExistingNumNameNULLs
    		,	(CASE WHEN Recipients.[StreetAddress] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Recipients.[City] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Recipients.[State] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Recipients.[Postal] IS NULL THEN 1 ELSE 0 END
    			+ CASE WHEN Recipients.[Country] IS NULL THEN 1 ELSE 0 END) AS ExistingNumAddressNULLs
    	FROM Stages
    	LEFT OUTER JOIN Recipients ON (Stages.UserName = Recipients.UserName AND Stages.DomainID = Recipients.DomainID)
    	ORDER BY Stages.UserName, Stages.DomainID
    
    OPEN stagingCursor
    FETCH NEXT FROM stagingCursor INTO
    	@rowEmailUser, @rowDomainID, @rowFirstName, @rowLastName, @rowStreetAddress, @rowCity, @rowState, @rowPostal, @rowCountry
    	, @rowDob, @rowObscene, @rowGender, @rowIPv4, @rowSourceID, @rowSourceLine, @rowNumNameNULLs, @rowNumAddressNULLs
    	, @existingDomainID, @existingFirstName, @existingLastName, @existingStreetAddress, @existingCity, @existingState, @existingPostal, @existingCountry
    	, @existingDob, @existingObscene, @existingGender, @existingIPv4
    	, @existingNameSourceID, @existingNameSourceLine, @existingAddressSourceID, @existingAddressSourceLine, @existingRecordCreationSourceID, @existingNumNameNULLs, @existingNumAddressNULLs
    
    WHILE @@FETCH_STATUS = 0
    BEGIN
    	IF (@mergedEmailUser = @rowEmailUser AND @mergedDomainID = @rowDomainID) BEGIN
    		-- Merge a staging record with the current UserName/DomainID combination
    		IF (@rowNumNameNULLs < @mergedNumNameNULLs) BEGIN
    			SET @mergedNumNameNULLs = @rowNumNameNULLs
    			SET @mergedNameSourceID = @rowSourceID
    			SET @mergedNameSourceLine = @rowSourceLine
    
    			SET @mergedFirstName = @rowFirstName
    			SET @mergedLastName = @rowLastName
    		END ELSE IF @mergedNameSourceID IS NULL BEGIN
    			SET @mergedNameSourceID = @rowSourceID
    			SET @mergedNameSourceLine = @rowSourceLine
    		END
    
    		IF (@rowNumAddressNULLs < @mergedNumAddressNULLs) BEGIN
    			SET @mergedNumAddressNULLs = @rowNumAddressNULLs
    			SET @mergedAddressSourceID = @rowSourceID
    			SET @mergedAddressSourceLine = @rowSourceLine
    
    			SET @mergedStreetAddress = @rowStreetAddress
    			SET @mergedCity = @rowCity
    			SET @mergedState = @rowState
    			SET @mergedPostal = @rowPostal
    			SET @mergedCountry = @rowCountry
    		END ELSE IF @mergedAddressSourceID IS NULL BEGIN
    			SET @mergedAddressSourceID = @rowSourceID
    			SET @mergedAddressSourceLine = @rowSourceLine
    		END
    		IF (@rowDob IS NOT NULL) BEGIN
    			SET @mergedDob = @rowDob
    		END
    		IF (@rowObscene = 1) BEGIN
    			SET @mergedObscene = 1
    		END
    		IF (@rowGender IS NOT NULL) BEGIN
    			SET @mergedGender = @rowGender
    		END
    		IF (@rowIPv4 IS NOT NULL) BEGIN
    			SET @mergedIPv4 = @rowIPv4
    		END
    		IF (@mergedRecordCreationSourceID IS NULL) BEGIN
    			SET @mergedRecordCreationSourceID = @rowSourceID
    		END
    
    		FETCH NEXT FROM stagingCursor INTO
    			@rowEmailUser, @rowDomainID, @rowFirstName, @rowLastName, @rowStreetAddress, @rowCity, @rowState, @rowPostal, @rowCountry
    			, @rowDob, @rowObscene, @rowGender, @rowIPv4, @rowSourceID, @rowSourceLine, @rowNumNameNULLs, @rowNumAddressNULLs
    			, @existingDomainID, @existingFirstName, @existingLastName, @existingStreetAddress, @existingCity, @existingState, @existingPostal, @existingCountry
    			, @existingDob, @existingObscene, @existingGender, @existingIPv4
    			, @existingNameSourceID, @existingNameSourceLine, @existingAddressSourceID, @existingAddressSourceLine, @existingRecordCreationSourceID, @existingNumNameNULLs, @existingNumAddressNULLs
    	END ELSE BEGIN
    		IF (@mergedEmailUser IS NOT NULL) BEGIN
    			-- Finished iterating 1+ records of a UserName/DomainID combination. INSERT/UPDATE the merged record
    			if (@mergedExistingDomainID IS NULL) BEGIN
    				-- New record. Insert
    				INSERT INTO Recipients(UserName, DomainID, First, Last, StreetAddress, City, State, Postal, Country, DOB, Obscene, Gender, IPv4, NameSourceID, NameLine, AddressSourceID, AddressLine, RecordCreationSourceID)
    					VALUES (@mergedEmailUser, @mergedDomainID, @mergedFirstName, @mergedLastName, @mergedStreetAddress, @mergedCity, @mergedState, @mergedPostal, @mergedCountry
    					, @mergedDob, @mergedObscene, @mergedGender, @mergedIPv4, @mergedNameSourceID, @mergedNameSourceLine, @mergedAddressSourceID, @mergedAddressSourceLine, @mergedRecordCreationSourceID)
    			END ELSE BEGIN
    				-- Existing record. Update
    				UPDATE Recipients
    					SET First = @mergedFirstName
    					, Last = @mergedLastName
    					, StreetAddress = @mergedStreetAddress
    					, City = @mergedCity
    					, State = @mergedState
    					, Postal = @mergedPostal
    					, Country = @mergedCountry
    					, DOB = @mergedDOB
    					, Obscene = @mergedObscene
    					, Gender = @mergedGender
    					, IPv4 = @mergedIPv4
    					, NameSourceID = @mergedNameSourceID
    					, NameLine = @mergedNameSourceLine
    					, AddressSourceID = @mergedAddressSourceID
    					, AddressLine = @mergedAddressSourceLine
    					, RecordCreationSourceID = @mergedRecordCreationSourceID
    				WHERE UserName = @mergedEmailUser AND DomainID = @mergedDomainID
    			END
    		END
    
    		-- Reached a new UserName/DomainID combination. Set current data.
    		SET @mergedEmailUser = @rowEmailUser
    		SET @mergedDomainID = @rowDomainID
    		SET @mergedExistingDomainID = @existingDomainID
    
    		SET @mergedFirstName = @existingFirstName
    		SET @mergedLastName = @existingLastName
    		SET @mergedStreetAddress = @existingStreetAddress
    		SET @mergedCity = @existingCity
    		SET @mergedState = @existingState
    		SET @mergedPostal = @existingPostal
    		SET @mergedCountry = @existingCountry
    		SET @mergedDob = @existingDob
    		SET @mergedObscene = ISNULL(@existingObscene, 0)
    		SET @mergedGender = @existingGender
    		SET @mergedIPv4 = @existingIPv4
    		SET @mergedNameSourceID = @existingNameSourceID
    		SET @mergedNameSourceLine = @existingNameSourceLine
    		SET @mergedAddressSourceID = @existingAddressSourceID
    		SET @mergedAddressSourceLine = @existingAddressSourceLine
    		SET @mergedRecordCreationSourceID = @existingRecordCreationSourceID
    		SET @mergedNumNameNULLs = @existingNumNameNULLs
    		SET @mergedNumAddressNULLs = @existingNumAddressNULLs
    
    		-- Don't advance cursor. Next iteration through the loop will process the first staging record.
    	END
    END
    CLOSE stagingCursor
    DEALLOCATE stagingCursor

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Wanna make it faster?

    Don't use a cursor!
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by blindman
    Wanna make it faster?

    Don't use a cursor!
    LOL! We tried that before; finally got it working and the performance didn't pan out.

  5. #5
    Join Date
    Apr 2004
    Location
    Kansas City, MO
    Posts
    734
    That's because you didn't do it right.
    MeanOldDBA
    derrickleggett@hotmail.com
    When life gives you a lemon, fire the DBA.

  6. #6
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by RogerWilco
    I have a stored proc that merges records from an undeduped table to a deduped table. It is running really slowly. Merging 70 million records against a deduped 70 million is taking 115 hours and counting on decent hardware. Very slow.
    Thanks for your help so far everyone.

    What I think I will do is:
    - INSERT new records into a new unindexed table
    - INSERT updated records into another unindexed table.
    Both of the above should be very fast. Then after the cursor logic, do one major INSERT and one major UPDATE. That sounds like the best approach.

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'll be really surprised if you find a tolerable answer using cursors. This is the kind of problem that I've only been able to solve by converting it from row oriented (client/server like code) to set oriented (relational code). Let us know how this turns out for you.

    -PatP

  8. #8
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    OK...I gotta know 3 things.

    1. What do you store in obscene?

    2. Is there smoke eminating out of any of your boxes yet?

    3. Where do you work? And can I commute there?
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  9. #9
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by Brett Kaiser
    OK...I gotta know 3 things.

    1. What do you store in obscene?

    2. Is there smoke eminating out of any of your boxes yet?

    3. Where do you work? And can I commute there?
    LOL

    1) Obscene is a bit column that flags users who entered "inappropriate" data. If they entered a street address of "123 Go to Hell Blvd" or something like that.

    2) We've burned out a few hard drives already (RAID 5 so we are OK)

    3) I work in Austin, TX

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'll hazard guesses on the first two.

    1) I would expect that obscene is a boolean or Yes/No column, much like sex.
    2) I wouldn't expect the boxes to be smoking in any way, shape, or form.
    3) Sorry, no I have clue on these.

    -PatP

  11. #11
    Join Date
    Oct 2003
    Posts
    268
    Quote Originally Posted by RogerWilco
    - INSERT new records into a new unindexed table
    - INSERT updated records into another unindexed table.

    Both of the above should be very fast. Then after the cursor logic, do one major INSERT and one major UPDATE. That sounds like the best approach.
    Does anyone think that logging to a flat file and then loading through bcp would be faster or slower than logging to an unindexed table? T-SQL doesn't have native flat file access commands (have to go through shell commands).

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •