Page 1 of 2 12 LastLast
Results 1 to 15 of 17
  1. #1
    Join Date
    May 2003
    Posts
    8

    Unanswered: Performance problem on OUTER JOIN

    I'm looking for help with problems with very slow running queries. These have been narrowed down to an issue with OUTER JOINs.

    For example,
    FROM
    ...
    TAB1
    LEFT OUTER JOIN TAB2 ON TAB1.COL1 = TAB2.COL2,
    ...
    where there are indexes on both TAB1.COL1 and TAB2.COL2
    The query can be speeded up (though it loses rows) by changing this to
    FROM
    ...
    TAB1
    INNER JOIN TAB2 ON TAB1.COL1 = TAB2.COL2,
    ...

    The performance difference is a factor of around 20. I have looked at the explain plans for the 2 versions and the difference is that when the LEFT OUTER JOIN is used, a table scan of TAB2 is performed whereas for the INNER JOIN, the index on TAB2 is used.

    The problem does not relate just to this one query but seems to be inherent in the use of OUTER JOINs. Do you know of any way we can tune DB2 so that is uses indexes to process OUTER JOINs or is just the way the database works?

    Bruce

  2. #2
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    I have in production many outer joins, but I have not experienced diference between inner and outer join.
    Please check if the both sqls returns the same amount of data. If there is outer join with 20 times more records it can last more time.

    I have tested the outer and inner join on Windows 2000 and on Linux S/390 (IBM mainfraime).

    Hope this helps,
    Grofaty

  3. #3
    Join Date
    May 2003
    Posts
    8
    Hi,

    The difference in the number of rows returned is not enormous - maybe 30% less.

    This system is running on Windows NT4 - which shouldn't be a million miles away from 2000. If you don't have a problem, it sounds like there is something wrong with my configuration.

    Any ideas?

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    In case you haven't already done so, Reorg both tables/indexes and do Runstats.

    Also, is this in a cursor? Are you actually fetching all rows, or stopping before the end of file on the cursor?

  5. #5
    Join Date
    May 2003
    Posts
    8
    Hi,

    I'm pretty sure that there has been a recent REORG/RUNSTATS but I did another on both tables just to be sure. It didn't make any difference.

    The data is not being fetched via a cursor - all rows are being returned.

    There is one other clue. The problem was first noticed after we upgraded from DB2 7.2 FP6 to FP9. I couldn't find anything relevant in the release notes but there could be a connection.

    Bruce

  6. #6
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1

    Re: Performance problem on OUTER JOIN

    Originally posted by neutrinoit
    I'm looking for help with problems with very slow running queries. These have been narrowed down to an issue with OUTER JOINs.

    For example,
    FROM
    ...
    TAB1
    LEFT OUTER JOIN TAB2 ON TAB1.COL1 = TAB2.COL2,
    ...
    where there are indexes on both TAB1.COL1 and TAB2.COL2
    The query can be speeded up (though it loses rows) by changing this to
    FROM
    ...
    TAB1
    INNER JOIN TAB2 ON TAB1.COL1 = TAB2.COL2,
    ...

    The performance difference is a factor of around 20. I have looked at the explain plans for the 2 versions and the difference is that when the LEFT OUTER JOIN is used, a table scan of TAB2 is performed whereas for the INNER JOIN, the index on TAB2 is used.

    The problem does not relate just to this one query but seems to be inherent in the use of OUTER JOINs. Do you know of any way we can tune DB2 so that is uses indexes to process OUTER JOINs or is just the way the database works?

    Bruce
    Well, it's not a perf problem! LEFT joins are not the same as INNER joins or RIGHT joins... They all have their respective purpose... You don't "optimize" a query by replacing a LEFT join with an INNER one!

    Since all rows from the "left" table in a LEFT join are present in the result set, DB2 (as Oracle, MySQL, InterBase, Informix, Firebird, SQL Server, Name_Your_Favorite_RDBMS_Here, etc.) has to do a table scan...

    Hope this Helps.

  7. #7
    Join Date
    May 2003
    Posts
    8
    Yes I realise they are different. I wasn't trying to optimise by replacing the LEFT JOIN by an INNER JOIN - just trying to narrow down the problem.

    I don't understand the last bit. It there is an index on the column being joined on, why does DB2 have to do a table scan. Isn't all the data it needs is in the index?

    Bruce

  8. #8
    Join Date
    May 2003
    Posts
    8
    PS If you read the original post, the table scan was on the right table not the left!!

  9. #9
    Join Date
    Sep 2002
    Location
    Montreal, Canada
    Posts
    224
    Provided Answers: 1
    Originally posted by neutrinoit
    PS If you read the original post, the table scan was on the right table not the left!!
    Oups! Quite strange indeed!

  10. #10
    Join Date
    Jan 2003
    Posts
    1,605
    Hi,

    Please read the general guide lines for DB2 performance on topic:
    How can I improve DB2 database performance?
    http://dbforums.com/showthread.php?threadid=782441

    Maybe you can fine something usefull for your problem.

    Hope this helps,
    Grofaty

  11. #11
    Join Date
    Jun 2003
    Location
    Canada
    Posts
    37

    Re: Performance problem on OUTER JOIN

    Without seeing the whole SQL statement and knowing more about the tables and indexes involved, it is somewhat difficult to give you definitive directions.

    Still, here are some immediate thoughts you may want to consider on your quest:
    - If this segment is part of a nested query, other sub-queries may restrict the optmizer from selecting the index desired;
    - The syntax of the complete statement may force the WHERE clause to be performed in Stage 2 after the join has completed;
    - in determining the most cost effective join order, DB2 has its hands tied more when it comes to OUTER JOINs than in the case of INNER JOINs;
    - The optimizer will not use the index when the query can be executed more efficiently without using an index (the table has a small number of rows or using the existing indexes might require additional I/O - based on the cardinality of the index and the cluster ratio of the index) -> check the clusterratio! ;
    - Depending on the size of the tables, CPU utilization, etc., the optimizer choosing a tablescan instead of an indexed search may actually have no observable impact at all...So the difference in performance may not be coming form the tablescan ;-)
    - Ambiguous cursors could be another reason but you said the query is defning a cursor.

    Otherwise, in general, grofaty says, there should not be a fundamental performance difference in serving an INNER JOIN vs. an OUTER JOIN.

    Hope this helps a notch.

    Julius

  12. #12
    Join Date
    Mar 2003
    Posts
    343
    I disagree with the statement that there should not be a difference in performance between an inner and an outer join. Let us take the example in question. When db2 is asked to do an inner join it only has to get the rows that matched on the joined columns from tab1 and tab2. However, when it has to do a left outer join, it has to get all rows in tab2 which match tab1 on join criteria and then all rows in tab2 which do not match. In order to do that, it would have to scan the entire tab2 relevant index and then get the rows - in this case, it would obviously make sense to do a table scan instead of an index scan.

  13. #13
    Join Date
    May 2003
    Posts
    8
    Thanks for that. I think that I understand the problem now.

    In this particular case I know which rows on the LEFT side of the join have a corresponding row in the RIGHT table. The join column in the LEFT table contains either a unique key into the joined table or NULL. Is there any way of expressing this in SQL without using a LEFT OUTER JOIN?

    Bruce

  14. #14
    Join Date
    Apr 2003
    Posts
    191
    Hi Bruce,

    the normal way to code around a left join is using UNION ALL.

    If you know from the left table if there is a corresponding row on the right table then going with union all should be very much ok.

    Johann

    Originally posted by neutrinoit
    Thanks for that. I think that I understand the problem now.

    In this particular case I know which rows on the LEFT side of the join have a corresponding row in the RIGHT table. The join column in the LEFT table contains either a unique key into the joined table or NULL. Is there any way of expressing this in SQL without using a LEFT OUTER JOIN?

    Bruce

  15. #15
    Join Date
    May 2003
    Posts
    8
    OK - I see how to do it now. Thanks everyone for your help.

    Bruce

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •