Results 1 to 3 of 3
  1. #1
    Join Date
    Jul 2010
    Posts
    21

    Unanswered: Select duplicate which exists in two columns

    Hi,

    Last time, I did asked about selecting duplicate rows that only involve with a column. This time i need to find duplicate rows which exist in two or more columns. Now consider I have this example of data:
    Code:
    user +  logtime +   item
    -----+----------+---------------
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 12
    1    | 12:00    | 32
    2    | 12:00    | 101
    3    | 12:00    | 101
    3    | 12:00    | 111
    3    | 12:00    | 112
    4    | 12:00    | 18
    4    | 12:00    | 17
    5    | 12:00    | 231
    5    | 12:00    | 231
    5    | 12:00    | 241
    6    | 12:00    | 52
    I tried something like this:
    Code:
    SELECT 
        * 
    FROM tbl_test
    WHERE (user, item) 
                IN (SELECT user, item 
    	         FROM tbl_test t2 
    	         GROUP BY user, item 
    	         HAVING COUNT(*) > 1 );
    
    Result:
    user +   logtime+   Item
    -----+----------+---------------
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 35
    5    | 12:00    | 231
    5    | 12:00    | 231
    This code works well. However, this is not the result that i want. My expected result should be like this:
    Code:
    user +   logtime+   item
    -----+----------+---------------
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 12
    1    | 12:00    | 32
    2    | 12:00    | 101
    3    | 12:00    | 101   ----> item same, user different
    3    | 12:00    | 111
    3    | 12:00    | 112
    4    | 12:00    | 18
    4    | 12:00    | 17     ----> user same, item different
    5    | 12:00    | 231
    5    | 12:00    | 231
    5    | 12:00    | 241
    It should include all the user or item that has duplicates.
    e.g 1: user 1 got five rows with three different items. All rows will be included because they have same user id.
    Code:
    user +   logtime+   item
    -----+----------+---------------
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 35
    1    | 12:00    | 12
    1    | 12:00    | 32
    .
    .
    .
    e.g 2: Since item is same for user 2 and 3, it will be selected.
    Code:
    user +   logtime+   item
    -----+----------+---------------
    .
    .
    2    | 12:00    | 101
    3    | 12:00    | 101
    3    | 12:00    | 111
    3    | 12:00    | 112
    .
    .
    .
    How can i get this right? Thanks in advance.

  2. #2
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    1: user is a reserved word, so, when using it as a field name, it needs to be wrapped with quotes ("user"). Or, rename the field to something like user_id...

    You said: "It should include all the user or item that has duplicates."

    Your code works for all the user AND item, in combination, that has duplicates.

    I think you're going to need to issue two separate subqueries (one by item, one by user,) and use UNION to join the results. Something along the lines of:

    Code:
    Select TOT.* from
    (
    SELECT user_id, logtime, item
    FROM tbl_test 
    WHERE item 
                IN (SELECT  t2.item 
                 FROM tbl_test t2 
                 GROUP BY t2.item 
                 HAVING COUNT(t2.item) > 1 ) 
    
    UNION ALL
    SELECT user_id, logtime, item
    FROM tbl_test 
    WHERE user_id 
                IN (SELECT  t3.user_id 
                 FROM tbl_test t3 
                 GROUP BY t3.user_id 
                 HAVING COUNT(t3.user_id) > 1 )
    ) TOT
    ORDER BY TOT.user_id, TOT.item
    ;
    You may need to use UNION ALL...
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  3. #3
    Join Date
    Jul 2010
    Posts
    21
    Hi,

    Thanks for the solution. I also found a solution by coding it like this:
    Code:
    SELECT * 
    FROM 
           (SELECT *,
             ROW_NUMBER() over (PARTITION BY user_id,item) as row_num
             FROM  tbl_test
            )
    WHERE row_num < 3
    Just for sharing.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •