Results 1 to 4 of 4
  1. #1
    Join Date
    Jul 2011
    Posts
    2

    Unanswered: Generate a CLuster ID based on two other columns

    Hello guys,

    I have the following table with the following data:

    create table tbl1
    (
    ClusterID VARCHAR(20) null,
    ID1 VARCHAR(20) null,
    ID2 VARCHAR(20) null
    )
    GO

    insert into tbl1 tbl1 values
    (null, '1', '2');
    insert into tbl1 tbl1 values
    (null, '2', '3');
    insert into tbl1 tbl1 values
    (null, '1', '3');
    insert into tbl1 tbl1 values
    (null, '4', '5');
    insert into tbl1 tbl1 values
    (null, '5', '6');
    insert into tbl1 tbl1 values
    (null, '6', '7');
    GO

    Is there a way to generate some sort of a Cluster ID based on the columns ID1 and ID2?

    The result should be something like this:

    ClusterID ID1 ID2
    -------------------------
    CL1 1 2
    CL1 2 3
    CL1 1 3
    CL2 4 5
    CL2 5 6
    CL2 6 7

    Any help is much appreciated.

    Thank you!
    Andrei

  2. #2
    Join Date
    Jul 2011
    Posts
    1

    Update Statement

    So it looks like the ClusterID of CL1 will be used when ID1 & ID2 are 3 or under otherwise use CL2. Could you use case in an update statement like below and then set the rules to what you need by changing the case expression.

    Code:
    update tbl1
    set ClusterID =
    	(CASE
    	 when id1 <= 3 and ID2 <=3 THEN 'CL1'
    	 ELSE 'CL2'
    	 END
    	)

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    You haven't given enough information to create a general solution. Without knowing what determines the value of your "Clustered ID", the best I can do is:
    Code:
    UPDATE tbl1
       SET ClusterID = CASE
          WHEN  ('1' = ID1 AND '2' = ID2)
             OR ('2' = ID1 AND '3' = ID2)
             OR ('1' = ID1 AND '3' = ID2)
             THEN 'CL1'
          WHEN  ('4' = ID1 AND '5' = ID2)
             OR ('5' = ID1 AND '6' = ID2)
             OR ('6' = ID1 AND '7' = ID2)
             THEN 'CL2'
          END
    Odds are good that this isn't what you want, but you'll need to explain the problem better for me to create a better solution.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #4
    Join Date
    Jul 2011
    Posts
    2
    Guys, thanks for the interest in solving this problem. Apologies for not giving enough information. I have it in my head and sometimes I have the wrong impression that just a few details are enough. I will try again :-)

    I have a table in my database with company information. Unfortunately it does contain duplicated records. I was giving the task to find a solution to remove these duplicates. The problem is that the duplicates are not exact and I used some fuzzy algorithms to identify them.

    This is a sample of my table with company info:

    ID CompanyName
    ------------------------
    1 Alfa Inc
    2 Alfa
    3 Alfa Co.
    4 Beta New York
    5 Beta Incorporated
    6 Beta Co
    7 Beta Oy
    8 Omega


    After deduping I got the following table:

    ClusterID ID1 ID2
    -------------------------
    CL1 1 2
    CL1 2 3
    CL1 1 3
    CL2 4 5
    CL2 5 6
    CL2 7 5

    I will explain it a bit:

    The company with ID=1 is the same as the one with ID = 2 (Alfa Inc and Alfa). At the same time, the company with ID = 2 is the same as the company with ID=3. Also, ID=1 is the same with ID=3. That means that the first 3 lines should get the same CLusterID. The same with the next three lines (company name = Beta). They should all get a new Cluster ID => CL2.

    It is not a general rule that the Cluster is given to only 3 records. It can be only 2 or 5, or etc.

    Thanks

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •