Here is a simplified version of what I'm trying to do:
Name .Date ..Created ....Modified ..Job
John ..5/5/2003 5/10/2003 .300
Pete ..5/8/2003 5/15/2003 .315 delete me!
Pete ..5/8/2003 5/15/2003 5/17/2003 315
Sam ..7/7/2003 7/10/2003 7/15/2003 .350 delete me!
Sam ..7/7/2003 7/10/2003 7/18/2003 .350
I am hoping to come up with a query that will ..
Figure out the records in which all fields match except the MODIFIED field. Where this happens I want to do the following:
1) Delete one of the records that matches but has a blank MODIFIED field
2) Delete one of the records that matches but has a MODIFIED field that is greater than a previous MODIFIED field (Record was modified and then modified again)
There are about 110,000 stored on a SQL Server table and there are very few duplicates. If I pull all records that I don't want deleted, it will take a very long time to update these records that are already stored on the table.
I decided to run some performance tests just for shits.
I created a table using this script:
CREATE TABLE #temp (f1 INT, f2 INT)
DECLARE @x INT
DECLARE @rx INT
SET @x = 1
WHILE @x < 110000
SET @rx = ROUND(RAND() * 100, 0)
INSERT INTO #temp(f1, f2)
VALUES (@x, @rx)
SET @x = @x + 1
So now I have a 110,000 record table with one field ranging 1-110,000 and another field of random numbers ranging from 0 to 100.
This query takes about 28 seconds (give or take a second) to execute.
Then I ran this query, which is identicle to the one you said you have a performance issue with:
SELECT * FROM #temp WHERE f1 = (SELECT MAX(f1) FROM #temp t1 WHERE t1.f2=#temp.f2)
DROP TABLE #temp
As expected, this query returned 101 records with the highest f1 value for a given f2 random value.
This query took 1 second to run, including the drop. Now granted, I probably recieved a nominal performance gain using a temp table, regardless, I don't think 1-2 seconds is so bad. If 2 seconds is STILL not good enough, you could compile the whole thing into an sp and regain a little time.