# Thread: Generate a CLuster ID based on two other columns

1. Registered User
Join Date
Jul 2011
Posts
2

## Unanswered: Generate a CLuster ID based on two other columns

Hello guys,

I have the following table with the following data:

create table tbl1
(
ClusterID VARCHAR(20) null,
ID1 VARCHAR(20) null,
ID2 VARCHAR(20) null
)
GO

insert into tbl1 tbl1 values
(null, '1', '2');
insert into tbl1 tbl1 values
(null, '2', '3');
insert into tbl1 tbl1 values
(null, '1', '3');
insert into tbl1 tbl1 values
(null, '4', '5');
insert into tbl1 tbl1 values
(null, '5', '6');
insert into tbl1 tbl1 values
(null, '6', '7');
GO

Is there a way to generate some sort of a Cluster ID based on the columns ID1 and ID2?

The result should be something like this:

ClusterID ID1 ID2
-------------------------
CL1 1 2
CL1 2 3
CL1 1 3
CL2 4 5
CL2 5 6
CL2 6 7

Any help is much appreciated.

Thank you!
Andrei

2. Registered User
Join Date
Jul 2011
Posts
1

## Update Statement

So it looks like the ClusterID of CL1 will be used when ID1 & ID2 are 3 or under otherwise use CL2. Could you use case in an update statement like below and then set the rules to what you need by changing the case expression.

Code:
```update tbl1
set ClusterID =
(CASE
when id1 <= 3 and ID2 <=3 THEN 'CL1'
ELSE 'CL2'
END
)```

3. Resident Curmudgeon
Join Date
Feb 2004
Location
In front of the computer
Posts
15,579
You haven't given enough information to create a general solution. Without knowing what determines the value of your "Clustered ID", the best I can do is:
Code:
```UPDATE tbl1
SET ClusterID = CASE
WHEN  ('1' = ID1 AND '2' = ID2)
OR ('2' = ID1 AND '3' = ID2)
OR ('1' = ID1 AND '3' = ID2)
THEN 'CL1'
WHEN  ('4' = ID1 AND '5' = ID2)
OR ('5' = ID1 AND '6' = ID2)
OR ('6' = ID1 AND '7' = ID2)
THEN 'CL2'
END```
Odds are good that this isn't what you want, but you'll need to explain the problem better for me to create a better solution.

-PatP

4. Registered User
Join Date
Jul 2011
Posts
2
Guys, thanks for the interest in solving this problem. Apologies for not giving enough information. I have it in my head and sometimes I have the wrong impression that just a few details are enough. I will try again :-)

I have a table in my database with company information. Unfortunately it does contain duplicated records. I was giving the task to find a solution to remove these duplicates. The problem is that the duplicates are not exact and I used some fuzzy algorithms to identify them.

This is a sample of my table with company info:

ID CompanyName
------------------------
1 Alfa Inc
2 Alfa
3 Alfa Co.
4 Beta New York
5 Beta Incorporated
6 Beta Co
7 Beta Oy
8 Omega

After deduping I got the following table:

ClusterID ID1 ID2
-------------------------
CL1 1 2
CL1 2 3
CL1 1 3
CL2 4 5
CL2 5 6
CL2 7 5

I will explain it a bit:

The company with ID=1 is the same as the one with ID = 2 (Alfa Inc and Alfa). At the same time, the company with ID = 2 is the same as the company with ID=3. Also, ID=1 is the same with ID=3. That means that the first 3 lines should get the same CLusterID. The same with the next three lines (company name = Beta). They should all get a new Cluster ID => CL2.

It is not a general rule that the Cluster is given to only 3 records. It can be only 2 or 5, or etc.

Thanks