Results 1 to 5 of 5
  1. #1
    Join Date
    Oct 2009
    Posts
    5

    Unanswered: Why use a where clause when filters can go in join clause?

    Hello All,

    This is a general question from a report designer with 7 or so months of SQL experience:

    I write my queries with the ANSI 92 "join...on" syntax and typically put all my equi-join conditions in the "on" clause and my filters in the where clause. In the case of outer joins I obviously put my filters in the join clause. I like putting my filters in the where clause because I feel that it's more readable that way.

    My team has one BI genius who gives us pointers every now and then, and recently he told my supervisor that it's better to put the majority of conditions in the on clause exclusively. This leads to on clauses like below:

    Code:
    join crm.party_role prl
             on pe.id = prl.party_id
             and prl.party_type_role_code = 'stu'
             and pe.created_on >= '01-JAN-2010'
    If I can put all of the filters in the on part of the join clause like above... then what is the purpose (in ANSI 92 syntax) of having a where clause at all? Are there really any performance benefits of doing it the way above as opposed to this below?:

    Code:
    join crm.party_role prl
             on pe.id = prl.party_id
           ............
    
    WHERE   prl.party_type_role_code = 'stu'
             and pe.created_on >= '01-JAN-2010'
    Thanks in advance for the feedback!

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    It's all about the order in which predicates are applied. Join predicates are evaluated first, then the WHERE clause predicates are evaluated against the result of the join(s). This is precisely why you "obviously put my filters in the join clause".

    These days query optimizers are often intelligent enough to "push down" WHERE clause predicates to an earlier stage of the query execution if it can be determined that they will not affect the join logic. In the time of yore this may not have been the case, and a programmer would perform the push-down manually, by moving predicates around, to help the optimizer.

    You can compare execution plans of the two query variants to see if the manual push-down makes any difference in your situation.
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    Oct 2009
    Posts
    5
    Hi Nick,

    Thanks for your reply. That but about the optimizer is my whole point though. I'm reasonably sure that id the columns are indexed that the evaluation of joins/filters will always be close to optimal.

  4. #4
    Join Date
    Oct 2009
    Posts
    5
    Sorry about that somewhat short and misspelled reply. I'm typing on a mobile device and some sort of glitch caused that to post prematurely.

    Anyway. I guess my point is that unless I'm giving optimizer hints, won't the query have the same execution plan regardless of where I place my joins/filters and in what order I place them.

    Let me apologize if I'm wrong. I'm an mis major who has only been doing this for 6 months. I don't really need to tune my queries as they run at most once a day. In fact, most of them are one off for ad hoc requests.

  5. #5
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Query execution plan is secondary to the logic of the query. You have shown that you understand that when you place predicates appropriately when using outer joins. The same principle applies regardless of the join type though; you join the sets first, then filter whatever is needed.

    When the query gets to the optimizer, it may realize that with inner joins some predicates (e.g. those that involve only one set) can be pushed down and applied before the join, thus potentially improving performance. However, in many cases the optimizer is unable to make a decision and leaves such predicates for after the join, thus negatively affecting performance.

    In your example "pe.created_on >= '01-JAN-2010'" could confuse the optimizer, while as a human being you would know that the condition could be applied before the join, reducing the number of rows that need to be joined.

    That's why I said that, unless you look at the plans of both queries, it's impossible to say whether the two query variants will perform differently, while it's clear that logically they are equivalent.

    Hope this all makes sense.
    ---
    "It does not work" is not a valid problem statement.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •