Results 1 to 14 of 14
  1. #1
    Join Date
    Mar 2003
    Posts
    130

    Question Unanswered: filtering records from a search without using "select distinct"

    This is a contrived example so I may not have it totally correct. I have three tables with the following structure:

    Code:
    Table1
    ------
    Table1_Id int (PK)
    Table1_Field1 varchar(50)
    Table1_Field2 varchar(50)
    
    
    Table2
    ------
    Table2_Id int (PK)
    Table1_Id int (FK)
    
    Table3
    ------
    Table3_Id int (PK)
    Table2_Id int (FK)
    Table3_Field1 varchar(50)
    Table3_Field2 varchar(50)

    There is a one to many relationship from Table1 to Table2 as well as a one to many relationship from Table2 to Table3.

    I have a proc right now that searches for Table1 records with fields from Table1 and Table3. Table2 is in the join section only. The field from Table3 is not returned in the result set so I have duplicate records from Table1.

    The only way I know of to prevent the duplicates is to use a select distinct but I've been told this isn't a good idea...suggested that I use a [cross][outer]apply. Any idea on how I can do this?

    Here is a sample stored proc.


    Code:
    create proc spTable1_Search
        @Table1_Id int = null,
        @Table3_Field1 varchar(50) = null
    as
    
        select distinct
            t1.Table1_Id
            , t1.Table1_Field1
            , t1.Table1_Field2
        from Table1 t1
            inner join 
                Table2 t2 on t2.Table1_Id = t1.Table1_Id
            inner join 
                Table3 t3 on t3.Table2_Id = t2.Table2_Id
        where
            (@Table1_Id is null or t1.Table1_Id = @Table1_Id)
            and
                (@Table3_Field1 is null or t3.Table3_Field1 @Table3_Field1)
    
    end

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by wey97 View Post
    The only way I know of to prevent the duplicates is to use a select distinct but I've been told this isn't a good idea...
    whoever told you that is wrong

    DISTINCT has a specific purpose, and this seems like it is one of those situations tailor-made for it

    you could also use GROUP BY, but that amounts to the same thing
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    I wonder what was the reasoning behind that "bad idea" opinion... And I'm not even sure how APPLY would help here. You could try something along these lines as an alternative:
    Code:
    select whatever from table1 where exists (
     select 1 from table2, table3 where table1.t1id = table2.t2id and ...
    )
    ---
    "It does not work" is not a valid problem statement.

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Quote Originally Posted by wey97 View Post
    The only way I know of to prevent the duplicates is to use a select distinct but I've been told this isn't a good idea.
    I will kind of support the guys about use of DISTINCT. As they say this is wrong, but with a little refinement this would amount to best practice.

    A better way to phrase it would be that you should exhaust every other method of removing distinct rows before resorting to using DISTINCT. You should also know exactly why you have duplicate rows and be able to justify why they are appearing. DISTINCT is often used by (with the greatest of respect) inexperienced developers to "make the results right". The reason this is not preferred is that DISTINCT puts (an often unnecessary) load on query.

    I would absolutely support n_i's method here. I always put any existence checks in a an EXISTS clause and never reference the table in the FROM clause. To me it is more logically sound, consistent and it also allows the optimiser to come up with more efficient plans.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Mar 2003
    Posts
    130
    Quote Originally Posted by r937 View Post
    whoever told you that is wrong

    DISTINCT has a specific purpose, and this seems like it is one of those situations tailor-made for it

    you could also use GROUP BY, but that amounts to the same thing
    That's interesting since (like in this case) people think of DISTINCT as an ugly term but GROUP BY is used often.


    Quote Originally Posted by n_i View Post
    I wonder what was the reasoning behind that "bad idea" opinion... And I'm not even sure how APPLY would help here. You could try something along these lines as an alternative:
    Code:
    select whatever from table1 where exists (
     select 1 from table2, table3 where table1.t1id = table2.t2id and ...
    )
    I have another problem. The query should have been left join. Let me make this a little more concrete and it may make more sense.


    Code:
    Product
    ---------------
    ProductId int
    ProductName varchar(50)
    
    
    ProductList
    --------------
    ProductListId int
    ProductId int
    ProductShipId int
    
    ProductShip
    ------------
    ProductShipId int
    ProductShipName varchar(50)
    There may not be product listing rows for all products and there may not be product ship rows for all product listings.

    How can I modify the "where exists" above to work this way?
    Last edited by wey97; 03-04-10 at 11:27.

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by wey97 View Post
    I have another problem.
    you have more than just one other problem...

    Quote Originally Posted by wey97 View Post
    ... and there may not be product ship rows for all product listings.
    that's wrong

    what you've described is a classic many-to-many relationship, with ProductList as the association or relationship table (with a completely useless ProductListId column, by the way)

    as such, it makes no sense whatsoever for there to exist a row in this table with a value for the ProductId column and no value for the ProductShipId column

    by the way, your first "dumbed down" example had a hierarchical one-to-many-to-many structure, not a one-to-many-to-one (i.e. many-to-many) relationship

    i'm betting what you've just shown isn't the real schema either
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Mar 2003
    Posts
    130
    Quote Originally Posted by r937 View Post
    you have more than just one other problem...

    that's wrong

    what you've described is a classic many-to-many relationship, with ProductList as the association or relationship table (with a completely useless ProductListId column, by the way)

    as such, it makes no sense whatsoever for there to exist a row in this table with a value for the ProductId column and no value for the ProductShipId column

    by the way, your first "dumbed down" example had a hierarchical one-to-many-to-many structure, not a one-to-many-to-one (i.e. many-to-many) relationship

    i'm betting what you've just shown isn't the real schema either
    It would be nice if you'd calm down a bit. I don't think the nasty tone is necessary to get your point across.
    Quote Originally Posted by wey97
    This is a contrived example...
    So I wouldn't have to post my entire table structure.

    I made a mistake in the second table. Should have been:

    Code:
    Product
    ---------------
    ProductId int
    ProductName varchar(50)
    
    
    ProductList
    --------------
    ProductListId int
    ProductId int
    
    
    ProductShip
    ------------
    ProductShipId int
    ProductListId int
    ProductShipId varchar(50)

  8. #8
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    It's Rudy's turn to play bad cop. I get to play good cop.

    there's an error in ProductShip (3rd column)
    Testimonial:
    pootle flump
    ur codings are working excelent.

  9. #9
    Join Date
    Mar 2003
    Posts
    130
    Sorry.

    Code:
    Product
    ---------------
    ProductId int
    ProductName varchar(50)
    
    
    ProductList
    --------------
    ProductListId int
    ProductId int
    
    
    ProductShip
    ------------
    ProductShipId int
    ProductListId int
    ProductShipInvtId varchar(50)

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by wey97 View Post
    It would be nice if you'd calm down a bit. I don't think the nasty tone is necessary to get your point across.
    calm down? nasty? you ain't seen nothing yet

    perhaps you should be less sensitive

    i think you've proved my point about the folly of "dumbing down" the example -- half a dozen (and counting) posts just to settle on what the tables look like and where your problem might be

    in any case, the "[cross][outer]apply" still doesn't, um, apply

    i'd go with the EXISTS solution
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Mar 2003
    Posts
    130
    Quote Originally Posted by r937 View Post
    calm down? nasty? you ain't seen nothing yet

    perhaps you should be less sensitive

    i think you've proved my point about the folly of "dumbing down" the example -- half a dozen (and counting) posts just to settle on what the tables look like and where your problem might be

    in any case, the "[cross][outer]apply" still doesn't, um, apply

    i'd go with the EXISTS solution
    This is getting hilarious. I guess "post god" status gives you a 'get out of jail free card' to be a jerk. A normal user and the thread would be locked.

    I don't think posting my entire table structure would be helpful with a lot of columns that won't make sense to anyone. My employer may not be comfortable with that anyway.

    GROUP BY has been approved so go away now.

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by wey97 View Post
    ... so go away now.
    no, ~you~ go away

    calling someone a jerk for being frank just reveals your insecurities

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Mar 2003
    Posts
    130
    Where's the "ignore" button?

  14. #14
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    This has been great, but I don't always have time for this during the workday. I was wondering if you guys would be coming to a city near me and if you have a late show.


    Now, back on topic. Distinct can be an ugly word, because it always causes a sort. A group by, I know in DB2 and probably the others, can sometimes avoid a sort IF proper indexing is in place to support it.
    Though as was already mentioned you should try to remove duplicates via your query where possible. In a different design of the first set of tables you mentioned, such as:
    table1
    table_id1(pk)
    ...

    table2
    table_id1
    table_id2
    ....
    pk = table_id1
    table_id2
    fk = table_id1

    table3
    table_id1
    table_id2
    table_id3
    ....
    pk = table_id1
    table_id2
    table_id3
    fk = table_id1
    table_id2
    You could do the check by looking for just the min/max of the next table in line or an open ended date for the current row of data, things of that sort. Also, with this sort of design you could just leave out table2 in the first query you mentioned.

    Also, I think you and Rudy got off on wrong foot. He does supply quite a bit of help to folks out here and was just trying to point out with the contrived stuff, you had incorrect info and that does take everyones time. You could supply the true tables but truncated to what you need to get point across, or rename the columns before posting

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •