| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |

01-22-12, 17:52
|
|
Registered User
|
|
Join Date: Jan 2012
Posts: 7
|
|
|
keep duplicate with highest score fuzzy grouping
|
|
Hi i have recently decided to dedupe my data but i am having a problem after running fuzzy grouping with the query on updating which duplicate to keep
_key_in is unique, _key_out is the duplicates so for example:
_key_in , _key_out , name , score , dedupe
1 , 1 , ron , 10 , purge
2 , 1 , ronn , 15 , keep
3 , 3 , john , 5 , keep
4 , 4 , matt , 15 , keep
5 , 4 , mat , 10 , purge
6 , 4 , matt , 15 , purge
I want to keep the _key_out with the higher score by setting the field de_dupe to 'keep' and the remainder to 'purge'. The score can also be the same within a duplicate so in the case it is the same i just need to keep one it doesnt matter which one. The query i have below nearly works but it marks duplicates with the same score as keep.
If someone could please take a look at it and help me fix it up it would be greatly appreciated
Code:
UPDATE b
SET b.dedupe_result = 'keep'
FROM
[BusinessListings].[dbo].[MongoOrganisationACTM1Destination] b
INNER JOIN
(
SELECT
Distinct(_key_out) AS _key_out,
quality_score
FROM [BusinessListings].[dbo].[MongoOrganisationACTM1Destination] t1
WHERE quality_score =
(
SELECT MAX(quality_score)
FROM [BusinessListings].[dbo].[MongoOrganisationACTM1Destination]
WHERE _key_out = t1._key_out)
)
) d ON d._key_out = b._key_out AND d.quality_score = b.quality_score
|
|

01-22-12, 19:43
|
|
Registered User
|
|
Join Date: Jan 2012
Posts: 67
|
|
Hi,
it would be really helpful if you provided us CREATE TABLE statements and also INSERTs statemens with sample data,
here is an example:
Code:
go
create table Mongo(
_key_in int primary key,
_key_out int,
name varchar(100),
score int,
dedupe varchar(100)
);
go
insert into mongo values(1 , 1 , 'ron' , 10 , 'purge' );
insert into mongo values(2 , 1 , 'ronn' , 15 , 'keep' );
insert into mongo values(3 , 3 , 'john' , 5 , 'keep' );
insert into mongo values(4 , 4 , 'matt' , 15 , 'keep' );
insert into mongo values(5 , 4 , 'mat' , 10 , 'purge' );
insert into mongo values(6 , 4 , 'matt' , 15 , 'purge' );
Try this query it should meet your requirements:
Code:
go
merge into mongo mon
using (
select _key_in,
case ROW_NUMBER() OVER (PARTITION BY _key_out ORDER BY score desc)
when 1 then 'keep'
else 'purge'
end new_dedupe
from Mongo
) new
ON (mon._key_in = new._key_in )
WHEN MATCHED THEN UPDATE
SET mon.dedupe = new.new_dedupe
;
Code:
go
select * from Mongo;
_key_in _key_out name score dedupe
----------- ----------- ---------------------- ----------- ---------
1 1 ron 10 purge
2 1 ronn 15 keep
3 3 john 5 keep
4 4 matt 15 keep
5 4 mat 10 purge
6 4 matt 15 purge
(6 row(s) affected)
|
Last edited by kordirko; 01-22-12 at 19:59.
|

01-22-12, 23:15
|
|
Registered User
|
|
Join Date: Jan 2012
Posts: 7
|
|
|
|
thanks kordirko that worked perfectly
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|