Results 1 to 2 of 2
  1. #1
    Join Date
    Sep 2002
    Posts
    456

    Unanswered: Query performance

    OS390 V. 7
    DB2 UDB

    Hi Guys,
    I have been playing around with some queries the last couple of days trying to improve their performance. I have "discovered" that using outer joins can have a very negative impact on performance.

    The situation we have encountered occurs with several queries with the following format:

    Select col1, col2, col3
    FROM tab1 C left join tab2 b
    ON A.ID = B.ID
    INNER JOIN TAB3 c
    ON C.ID = A.ID
    WHERE A.CONDITION= VALUE

    I have tried rearranging the order in which the tables were joined, but the results were always the same.

    When I look at the explain, it shows that DB2 is producing a materialized table. This table is an 'intermediate' place to store the temporary data. DB2 cannot resolve some joins in a single step. I think it resolves the inner join into the materialized table and then works on the outer join. My understanding of the problem is that the materialized table has no indexes, so the resulting tablespace scans are very expensive - especially if you have a large number of rows in that materialized table (which we do).

    To resolve this problem, as there appears to be no way to improve this query with the LEFT JOIN, I have gone back to using a UNION. The first half of the statement picks up all the rows for the inner join (tables A and C) that do not have any related entries in table B. The second half picks up those rows that have related entries in A, B and C (i.e. both parts of the UNION have only inner joins).

    The UNION statement looks like the following:
    SELECT TABLEA_columns, TABLEB_columns, '-----', '-----', ...
    FROM TABLEA A, TABLEB B
    WHERE A.ADDR_ID = B.ADDR_ID
    AND NOT EXISTS
    (SELECT 1
    FROM TABLEC C
    WHERE C.ADDR_ID = B.ADDR_ID)
    AND A.CLIENT_ID = :hv
    UNION
    SELECT TABLEA_columns, TABLEB_columns, TABLEC_columns
    FROM TABLEA L, TABLEB M, TABLEC N
    WHERE L.ADDR_ID = M.ADDR_ID
    AND M.ADDR_ID = N.ADDR_ID
    AND L.CLIENT_ID = :hv

    Using the UNION approach, the estimated cost of the query went down by two orders of magnitude (i.e. 1/100th of the cost, from E+06 to E+04). It's still expensive for an online query, but it's much better than it was.

    A bit of research in the IBM manuals and on the DB2 users list revealed nothing of use. Other shops have experienced the problem and the only solution suggested seems to be implementing the query with a UNION.

    ---------------------------------------------------------------------

    Am I right in assuming all this or there is another way to write this query?

    Thanks in advance

    Paul

  2. #2
    Join Date
    Jun 2002
    Location
    UK
    Posts
    525
    I have had similar problems. I found that the optimiser would evaluate the joins before applying the conditions in the WHERE predicate. It seemed to help if I expanded the conditions in the join syntax.

    Select col1, col2, col3
    --FROM tab1 c left join tab2 b
    FROM tab1 a left join tab2 b
    ON A.ID = B.ID
    INNER JOIN TAB3 c
    ON C.ID = A.ID
    -- WHERE A.CONDITION= VALUE
    AND A.CONDITION = VALUE

    HTH

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •