Page 1 of 2 12 LastLast
Results 1 to 15 of 21
  1. #1
    Join Date
    May 2008
    Posts
    97

    Unanswered: Distinct on Join: Stange Behaviour

    Hi Guys,

    I'm trying to retrieve all columns in a table filtered by 1 distinct column. If someone can help me understand why I am getting back all rows that would be great!

    So I read a tutorial to do this: Here if it matters
    SELECT * FROM #FILTEREDATRS AS A
    RIGHT JOIN
    (SELECT distinct distcol FROM #FILTEREDATRS) AS DISTTABLE
    ON
    A.distcol = DISTTABLE.distcol

    Info on the tables I'm using:
    #FILTEREDATRS table contains 10,000 rows
    (SELECT distinct distcol FROM #FILTEREDATRS) subquery contains 403 rows

    My final result returned with 10 000 rows containing duplicate 'distcol'! This is so very very strange.. how could the full entity of my query return 10 000 rows when the table I'm RIGHT JOINING (The subquery) only have 403! Its almost like no query was performed...

    Did I miss something?? If you could answer that I would be really happy!
    Last edited by Eric the Red; 07-14-11 at 19:11.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    The RIGHT OUTER JOIN guarantees that the right row will be returned whether or not there is a matching left row... The columns from the LEFT row might be NULL if there is no match, but any right rows will be present.

    Based on your query, every row from the A table will be returned as long as there is at least one matching row in the DISTTABLE.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    May 2008
    Posts
    97
    Quote Originally Posted by Pat Phelan View Post
    The RIGHT OUTER JOIN guarantees that the right row will be returned whether or not there is a matching left row... The columns from the LEFT row might be NULL if there is no match, but any right rows will be present.

    Based on your query, every row from the A table will be returned as long as there is at least one matching row in the DISTTABLE.

    -PatP
    Is it also safe to say that if Distable is 403 rows, then the max rows my full query should return is 403 rows?

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by Eric the Red View Post
    Is it also safe to say that if Distable is 403 rows, then the max rows my full query should return is 403 rows?
    No. Execute this and see what happens:
    Code:
    DROP TABLE #a
    GO
    DROP TABLE #disttable
    GO
    SELECT 1 AS distcol
       INTO #a
    UNION SELECT 2 UNION SELECT 3
    
    SELECT * INTO #disttable FROM #a
    
    SELECT *
       FROM #a AS a
       RIGHT JOIN #disttable AS disttable
          ON (a.distcol = disttable.distcol)
    
    INSERT INTO #a SELECT * FROM #a
    
    SELECT *
       FROM #a AS a
       RIGHT JOIN #disttable AS disttable
          ON (a.distcol = disttable.distcol)
    
    INSERT INTO #a SELECT * FROM #a
    
    SELECT *
       FROM #a AS a
       RIGHT JOIN #disttable AS disttable
          ON (a.distcol = disttable.distcol)
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Quote Originally Posted by Eric the Red
    My final result returned with 10 000 rows containing duplicate 'distcol'! This is so very very strange.. how could the full entity of my query return 10 000 rows when the table I'm RIGHT JOINING (The subquery) only have 403! Its almost like no query was performed...
    The 403 table contains ALL possible values of the base table distcol column. There is not one distcol that will not be present in the 403 table.

    The RIGHT OUTER JOIN will at least return all the 403 rows in the table at the right side, even if it wouldn't be able to find a match with the left table.

    If the left table would have 403 matching records, the result would still be 403 rows.

    But now the left table has 10 000 rows, and they will all match with one record at the right side, by definition, because the 403 records were extracted out of the 10 000 base table to start with.

    So you will indeed end up with the same amount of records you started with.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  6. #6
    Join Date
    May 2008
    Posts
    97
    Quote Originally Posted by Pat Phelan View Post
    No. Execute this and see what happens:
    Code:
    DROP TABLE #a
    GO
    DROP TABLE #disttable
    GO
    SELECT 1 AS distcol
       INTO #a
    UNION SELECT 2 UNION SELECT 3
    
    SELECT * INTO #disttable FROM #a
    
    SELECT *
       FROM #a AS a
       RIGHT JOIN #disttable AS disttable
          ON (a.distcol = disttable.distcol)
    
    INSERT INTO #a SELECT * FROM #a
    
    SELECT *
       FROM #a AS a
       RIGHT JOIN #disttable AS disttable
          ON (a.distcol = disttable.distcol)
    
    INSERT INTO #a SELECT * FROM #a
    
    SELECT *
       FROM #a AS a
       RIGHT JOIN #disttable AS disttable
          ON (a.distcol = disttable.distcol)
    -PatP
    Output1
    ---------
    1 1
    2 2
    3 3

    Output2
    ---------
    1 1
    1 1
    2 2
    2 2
    3 3
    3 3

    Output3
    ---------
    1 1
    1 1
    1 1
    1 1
    2 2
    2 2
    2 2
    2 2
    3 3
    3 3
    3 3
    3 3

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    See how even though there are only three rows in the #DISTTABLE, the number of result set rows changes directly as the number of rows in #a changes. The SELECT statements that produce the output in my example exactly match the statement in your original question.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  8. #8
    Join Date
    May 2008
    Posts
    97
    Quote Originally Posted by Pat Phelan View Post
    See how even though there are only three rows in the #DISTTABLE, the number of result set rows changes directly as the number of rows in #a changes. The SELECT statements that produce the output in my example exactly match the statement in your original question.

    -PatP
    EDIT: I now understand this part you made me run. So how do I make my output only 403 rows?

    Quote Originally Posted by Wim View Post
    But now the left table has 10 000 rows, and they will all match with one record at the right side, by definition, because the 403 records were extracted out of the 10 000 base table to start with.

    So you will indeed end up with the same amount of records you started with.
    Ok I think I understand this problem. So what solution can you recommend? My goal is to show records in the #FILTEREDATRS table where the 'distcol' column is unique..
    Last edited by Eric the Red; 07-15-11 at 10:29.

  9. #9
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Quote Originally Posted by Eric the Red View Post
    My goal is to show records in the #FILTEREDATRS table where the 'distcol' column is unique..
    If you mean by 'unique', as occurring only once in #FILTEREDATRS, this should work:
    Code:
    SELECT A.* 
    FROM #FILTEREDATRS AS A
    	INNER JOIN (SELECT Distcol, count(*)
    			FROM #FILTEREDATRS
    			GROUP BY Distcol
    			HAVINg COUNT(*) = 1
    			) AS DISTTABLE ON
    		A.distcol = DISTTABLE.distcol
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  10. #10
    Join Date
    May 2008
    Posts
    97
    Quote Originally Posted by Wim View Post
    If you mean by 'unique', as occurring only once in #FILTEREDATRS.
    That's exactly it. The only table I have right now is the

    #FILTEREDATRS table: so to eliminate the 'table doesn't exist' errors from your query, I had to run parts of the query to get table A and Disttable.

    Anyway, past those error I got,

    No column was specified for column 2 of 'DISTTABLE'.
    So that you know most of the columns in my Distable are text values.

    I really appreciate the help and I'll see if I can use something from it

  11. #11
    Join Date
    Mar 2007
    Location
    Holmestrand, Norway
    Posts
    332
    Code:
    count(*) as SomeColumnName
    Ole Kristian Velstadbråten Bangås - Virinco - MSSQL.no - Facebook - Twitter

  12. #12
    Join Date
    May 2008
    Posts
    97
    Quote Originally Posted by roac View Post
    Code:
    count(*) as SomeColumnName
    EDIT ::: Did it. But the query is not returning all columns (403) from DISTTABLE now only returning 78. So I'm missing distcol values in my final result

    MAN. This sql shit gets complicated.......
    Last edited by Eric the Red; 07-15-11 at 12:39.

  13. #13
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    Quote Originally Posted by Eric the Red View Post

    MAN. This sql shit gets complicated.......
    A day to learn. A lifetime to master.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  14. #14
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Got it working but returning less than 403 distinct rows. I'm at 72 Rows with this entire query.
    Those 72 rows you got in the result set are rows with a Distcol value that only occurs once in the table with 10 000 records.

    If you want to have 403 rows, run
    Code:
    SELECT DISTINCT Distcol 
    FROM #FILTEREDATRS
    If you want to have "unique" records, you must run
    Code:
    SELECT DISTINCT * 
    FROM #FILTEREDATRS
    The number of records will be a number between (and including) 403 and 10 000.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  15. #15
    Join Date
    May 2008
    Posts
    97
    Quote Originally Posted by Wim View Post
    Those 72 rows you got in the result set are rows with a Distcol value that only occurs once in the table with 10 000 records.
    I needed something slightly different: It doesn't matter how many times discol appears in the table with 10 000 records. BUT I only want to see only the 1st time this records apears in that table. Slightly different..



    If you want to have 403 rows, run
    Code:
    SELECT DISTINCT Distcol 
    FROM #FILTEREDATRS
    If I run this great! BUT i don't have all the other fields from the large 10 000 row table. And if I add a "distinct discol, * " then the distinct is working on the entire columns! Bad..

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •