If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > Microsoft SQL Server > keep duplicate with highest score fuzzy grouping

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-22-12, 17:52
Dorf Dorf Dorf Dorf is offline
Registered User
 
Join Date: Jan 2012
Posts: 7
keep duplicate with highest score fuzzy grouping

Hi i have recently decided to dedupe my data but i am having a problem after running fuzzy grouping with the query on updating which duplicate to keep

_key_in is unique, _key_out is the duplicates so for example:

_key_in , _key_out , name , score , dedupe
1 , 1 , ron , 10 , purge
2 , 1 , ronn , 15 , keep
3 , 3 , john , 5 , keep
4 , 4 , matt , 15 , keep
5 , 4 , mat , 10 , purge
6 , 4 , matt , 15 , purge

I want to keep the _key_out with the higher score by setting the field de_dupe to 'keep' and the remainder to 'purge'. The score can also be the same within a duplicate so in the case it is the same i just need to keep one it doesnt matter which one. The query i have below nearly works but it marks duplicates with the same score as keep.

If someone could please take a look at it and help me fix it up it would be greatly appreciated

Code:
UPDATE 	b
SET 	b.dedupe_result = 'keep'
FROM 	
	[BusinessListings].[dbo].[MongoOrganisationACTM1Destination] b
INNER JOIN
(
	SELECT 
           Distinct(_key_out) AS _key_out,
           quality_score 
	FROM [BusinessListings].[dbo].[MongoOrganisationACTM1Destination] t1
	WHERE quality_score = 
		(
			SELECT MAX(quality_score) 
			FROM [BusinessListings].[dbo].[MongoOrganisationACTM1Destination] 
			WHERE _key_out = t1._key_out)
		)
) d ON d._key_out = b._key_out AND d.quality_score = b.quality_score
Reply With Quote
  #2 (permalink)  
Old 01-22-12, 19:43
kordirko kordirko is offline
Registered User
 
Join Date: Jan 2012
Posts: 67
Hi,

it would be really helpful if you provided us CREATE TABLE statements and also INSERTs statemens with sample data,
here is an example:
Code:
go
create table Mongo(
_key_in int primary key,
_key_out int,
name varchar(100), 
score int,
dedupe varchar(100)
);
go
insert into mongo values(1 , 1 , 'ron' , 10 , 'purge' );
insert into mongo values(2 , 1 , 'ronn' , 15 , 'keep' );
insert into mongo values(3 , 3 , 'john' , 5 , 'keep' );
insert into mongo values(4 , 4 , 'matt' , 15 , 'keep' );
insert into mongo values(5 , 4 , 'mat' , 10 , 'purge' );
insert into mongo values(6 , 4 , 'matt' , 15 , 'purge' );



Try this query it should meet your requirements:
Code:
go
merge into mongo mon
using (
	select _key_in,
	case ROW_NUMBER() OVER (PARTITION BY _key_out ORDER BY score desc) 
		 when 1 then 'keep'
		 else 'purge'
	end new_dedupe
	from Mongo
) new
ON (mon._key_in = new._key_in )
WHEN MATCHED THEN UPDATE
   SET mon.dedupe = new.new_dedupe
;

Code:
go
select * from Mongo;

_key_in     _key_out    name                   score       dedupe
----------- ----------- ---------------------- ----------- ---------
1           1           ron                    10          purge
2           1           ronn                   15          keep
3           3           john                   5           keep
4           4           matt                   15          keep
5           4           mat                    10          purge
6           4           matt                   15          purge

(6 row(s) affected)

Last edited by kordirko; 01-22-12 at 19:59.
Reply With Quote
  #3 (permalink)  
Old 01-22-12, 23:15
Dorf Dorf Dorf Dorf is offline
Registered User
 
Join Date: Jan 2012
Posts: 7
thanks kordirko that worked perfectly
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On