# Thread: Duplicate record Count

1. Registered User
Join Date
Jul 2010
Posts
10

## Unanswered: Duplicate record Count

Hello,

Any help or advice is really appreciated

I have the following table

Table A
vendorname

walmart
walmartinc
walmartinc
cubfoods
cubfoods
cubfood
asper
apser

I need the view the count of the duplicated vendors. if I perform the following

select vendorname,count(*) as duplicates from vendors
group by vendorname having count(*) >1

the following o/p is displayed

Vendorname duplicates
walmartinc 2
cubfoods 2

As we can see that (walmart, walmart inc),(asper,apser) are the same vendors; but because of the extra character or number its been considered unique; and because of which I am unable to specify the correct number of unique and duplicate vendors in the vendor table of nearly 15000 records.

please advice me how can i avoid this problem; is there a way to compare two columns based on the first 7 characters.

Thank you

2. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
Originally Posted by tammy2512
As we can see that (walmart, walmart inc),(asper,apser) are the same vendors; but because of the extra character or number its been considered unique
Maybe you can see that, but without knowing how you make that determination isn't clear.
Originally Posted by tammy2512
is there a way to compare two columns based on the first 7 characters.
Definitely.

If you need to retain the current values for the "vendor" column, create a duplicate column to hold the original values. I'll explain that process in a minute.

Compute the current counts using:
Code:
```SELECT Count(*), vendor
FROM [Table A]
GROUP BY vendor
ORDER BY vendor```
Examine this output of this statement to find the "duplicates" using your criteria. Each time you find a pair of values that meet your "duplicate" criteria, replace the values with the smaller count with the values of the larger count. For example
Code:
```UPDATE [Table A]
SET vendor = 'walmartinc'
WHERE  'walmart' = vendor```
Once this process is complete, you'll have a valid count.

If you need to keep the original values, add a colum to hold those values.
Code:
```SELECT *, vendor AS [Original vendor]
INTO [my new table name]
FROM [Table A]```
-PatP

3. Registered User
Join Date
Jul 2010
Posts
10
Thank you so much for your reply

Examine this output of this statement to find the "duplicates" using your criteria. Each time you find a pair of values that meet your "duplicate" criteria, replace the values with the smaller count with the values of the larger count.

Most of the time I will not be aware of the duplicate criteria; for example for vendor walmart, i will not know that there is vendor walmartinc; then how am i supposed to mention the duplicate criteria in the following code; becuase i deal with more then 50k records sometimes

UPDATE [Table A]
SET vendor = 'walmartinc'
WHERE 'walmart' = vendor

Thank you for your time
Last edited by tammy2512; 08-20-10 at 21:12.

4. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
If you don't know how to tell if walmart and walmartinc are duplicates, there is no possibility of writing code that can do it. What you have written tells me that human inspection is required, because code could never reliably tell that asper and apser are the same vendor.

-PatP

5. Registered User
Join Date
Nov 2004
Posts
1,428
You can also create a translation table:
Code:
```GivenName	CorrectName
Walmart		Walmart
Wallmart	Walmart
WalmartInc	Walmart
...
asper		asper
apser		asper
...```
Do a join on the GivenName and use the CorrectName in the GROUP BY. This way you don't alter the data people have put in. Some may insist that their "Walmart" or "Walmart inc" is the only right way of writing that name.

To populate the translation table, check for DISTINCT GivenNames that are not yet present in the translation table. First add all the correct names, by entering it in both GivenName and CorrectName. (*) Then run your script again. Now you will get the alternative names that you will have to connect with the correct name, and put (a number of) them in GivenName and CorrectName. Continue from (*) until all used vendor names can be matched with the correct vendor name.

Each time, before running the report, you can check your table on the occurrence of new GivenNames that are not yet present in the translation table (use the script you wrote before). The maintenance of the translation table is a manual chore.

The best way to deal with this is to limit the possible domain of values for the Vendor column by using a selection list from witch users can select the vendor name. But that implies control over the data entry application.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•