Results 1 to 12 of 12
  1. #1
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132

    Post Unanswered: JOIN vs Agregations

    Hi,
    Maybe the topic have already been touched
    in that case don't hesitate to give me a link on
    the corresponding thread.

    I don't know the history concerning the managment
    of joined tables through JOIN conditions or WHERE
    statements.
    But I was wondering if ANSI plans to manage aggregations
    with JOIN syntaxes (like subqueries when we deal with
    effectives dates for example) or if the future is given to
    the subqueries

    So do we have to keep using the JOIN syntax for his
    lisibility or finally use only subqueries or both ?

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    i'm not sure i understand your question, but i think the answer is "both"

    it would be best if you could give an actual example of the sql
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Sep 2003
    Location
    The extremely Royal borough of Kensington, London
    Posts
    778
    There is a problem with your question, which is you are comparing two completely different concepts. It is therefore not possible to give you an answer specifically addressing your question, however, I can comment on the use of aggregate functions and their relationship with the Cartesian product of two sets.

    The Cartesian product of two sets is not required in order to use an aggregate function. Conversely, one is not required to use aggregate functions in producing the Cartesian product of two sets.
    Bessie Braddock: Winston, you are drunk!
    Churchill: And Madam, you are ugly. And tomorrow, I'll be sober, and you will still be ugly.

  4. #4
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    Sorry I have badly presented my question
    Let me give you an example

    Let's imagine those two tables
    Code:
    +------------+
    | employee   |
    +------------+
    | empl_id    |
    | first_name |
    | last_name  |
    | company_id |
    +------------+
    
    +-----------+
    | empl_job  |
    +-----------+
    | empl_id   |
    | eff_date  |
    | job_title |
    +-----------+
    Let's imagine we'd like to get the current job title
    for each employees of the company xxx
    Is it better to do

    Code:
    SELECT
       a.first_name AS "FirstName"
     , a.last_name AS "LastName"
     , b.job_title AS "JobTitle"
    FROM
     employee AS a
     JOIN empl_job AS b ON (
      a.empl_id = b.empl_id
      b.eff_date = (
       SELECT MAX(eff_date)
       FROM empl_job AS c
       WHERE b.job_id = c.job_id
         AND a.empl_id = c.empl_id
         AND eff_date <= SYSDATE
      )
     )
    WHERE
     a.company_id = xxx
    or

    Code:
    SELECT
       a.first_name AS "FirstName"
     , a.last_name AS "LastName"
     , b.job_title AS "JobTitle"
    FROM
     employee AS a
     JOIN empl_job AS b ON (a.empl_id = b.empl_id)
    WHERE a.company_id = xxx
       AND b.eff_date = (
        SELECT MAX(eff_date)
        FROM empl_job AS c
        WHERE b.job_id = c.job_id
         AND a.empl_id = c.empl_id
         AND eff_date <= SYSDATE
       )
      )
    or

    Code:
    SELECT
       a.first_name AS "FirstName"
     , a.last_name AS "LastName"
     , b.job_title AS "JobTitle"
    FROM
       employee AS a
     , empl_job AS b
    WHERE a.company_id = xxx
       AND a.empl_id = b.empl_id
       AND b.eff_date = (
        SELECT MAX(eff_date)
        FROM empl_job AS c
        WHERE b.job_id = c.job_id
         AND a.empl_id = c.empl_id
         AND eff_date <= SYSDATE
       )
      )
    ?

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    so it appears that your question is "which approach is better?"

    the answer is: test them and see, each dbms is different

    by the way i can think of a few more variations of SQL for your problem
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    With your experience,
    What syntax do you considere the most maintainable ?
    Which is the most portable ? and/or the most performant ?
    And if I have to specify a DB, let's choose Oracle

  7. #7
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    they are all equally maintainable

    the most portable one is the one which does not use proprietary SQL like SYSDATE

    performant? you will have to test them on your particular dbms
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  8. #8
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    The one that is easiest to maintain is the one that is easiest to understand by someone else. So maybe a few comments may help to explain what a subquery is doing or why certain constructs are used in specific places.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  9. #9
    Join Date
    Mar 2007
    Location
    636f6d7075746572
    Posts
    770
    What about:
    Code:
    SELECT
       a.first_name AS "FirstName"
     , a.last_name AS "LastName"
     , b.job_title AS "JobTitle"
    FROM
     employee AS a
     JOIN empl_job AS b ON (a.empl_id = b.empl_id)
     JOIN (
      SELECT empl_id, MAX(eff_date) as latest
      FROM empl_job AS c
      GROUP BY empl_id
      ) as x ON x.empl_id = a.empl_id AND x.latest = b.eff_date
    )
    WHERE a.company_id = xxx
    note: I think the above works the same as the others (i've not tested it).

  10. #10
    Join Date
    Oct 2004
    Location
    Paris, FRANCE
    Posts
    132
    Indeed this one seems good to maintain
    Generally I prefere always use JOINs because for example
    if you want to exclude a table you simply need to comment
    the concerned JOIN part instead of having to seach and
    comment all the conditions in the WHERE.

  11. #11
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Speaking about readability & maintainability: I prefer using "temporary views", also known as "common table expressions" (or CTEs):
    Code:
    WITH
    a(empl_id, "FirstName", "LastName") AS
    ( SELECT empl_id, first_name, last_name FROM employee
      WHERE company_id = xxx
    )
    ,
    b(empl_id, eff_date, "JobTitle") AS
    ( SELECT empl_id, eff_date, job_title FROM empl_job )
    ,
    x(empl_id, latest) AS
    (
      SELECT empl_id, MAX(eff_date)
      FROM empl_job
      GROUP BY empl_id
    )
    SELECT  a."FirstName", a."LastName", b."JobTitle"
    FROM
     a INNER JOIN b ON a.empl_id = b.empl_id
       INNER JOIN x ON x.empl_id = a.empl_id AND x.latest = b.eff_date
    )
    Performance should ideally be left to the DBMS' optimizer: no matter which formulation is used, the implementation (& performance) will be optimal.
    (Reality is still different, of course, but DBMSs are evolving towards the ideal world ...)
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Careful: The SQL standard mandates that CTEs are evaluated sequentially and before the main part of the SELECT statement. Of course, this only means that an implementation (DBMS) must produce a result as if this evaluation order was used - the implementation can do whatever it likes as long as the correct results are returned.

    Therefore, while the rule to use CTEs is surely not a bad idea in many cases, I would just like to urge everyone else to not consider this as a hard rule.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •