  1. #1
    Join Date
    Jul 2002
    Location
    ATL
    Posts
    170

    Unanswered: How to avoid Nested Loop join

    I have a query that runs for hours, and when I ran an explain on it, I see that it spends a lot of time on a nested loop join. How can I prevent the optimizer from using a nested loop join?

    I cannot rewrite the query, as it is generated by a third-party tool. As a DBA, what changes can I make to the system catalogs so that the optimizer does not choose an NLJOIN?

    thx
    koganti

  2. #2
    Join Date
    May 2003
    Posts
    113
    An index on the joined columns should be helpful.
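    For example, a rough sketch (the schema, table, and column names here are made up purely for illustration):
    Code:
    -- index each side of the join predicate, e.g. BIG_TAB.CUST_ID = SMALL_TAB.CUST_ID
    -- (run as a CLP script, e.g. db2 -tvf create_ix.sql)
    CREATE INDEX MYSCHEMA.BIG_TAB_IX1 ON MYSCHEMA.BIG_TAB (CUST_ID);
    CREATE INDEX MYSCHEMA.SMALL_TAB_IX1 ON MYSCHEMA.SMALL_TAB (CUST_ID);
    -- collect statistics so the optimizer knows about the new indexes
    RUNSTATS ON TABLE MYSCHEMA.BIG_TAB AND INDEXES ALL;
    RUNSTATS ON TABLE MYSCHEMA.SMALL_TAB AND INDEXES ALL;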

  3. #3
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Koganti,
    Nested loop joins are what you are after. According to IBM it is the most efficient type of join. As mentioned already, perhaps you do not have an index on the tables that matches the columns you are joining between the tables. How many columns of the index are matching? Are you performing a tablespace scan on the outer table of the join, and is that causing your long run times? For anyone to properly give you advice, you really need to provide more info (SQL, table and index definitions, the explain).

  4. #4
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Quote Originally Posted by dav1mo
    According to IBM it is the most efficient type of join.
    Where did you read that? I'm asking because such a statement is simply not true in general.

    There are (many) cases where nested loop joins are the best choice. But there are also (many) situations where other joins, like hash joins or sort-merge joins, are way better. The DB2 optimizer takes all join types into consideration and chooses the access plan most suitable for the query.

    For anyone to properly give you advice, you really need to provide more info (SQL, table and index definitions, the explain).
    That's right. Without knowing those details (and also knowing which DB2 version on which platform is used), no one can provide a good answer.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  5. #5
    Join Date
    Jul 2002
    Location
    ATL
    Posts
    170
    Sorry guys, I was not able to post the SQL and explain output as they are very large.

    But the NLJOIN is the most expensive operation for me. I have all the right indexes on all the columns that are joined. I have tables with 5 million rows, while the other tables are pretty empty. I still index the empty tables and mark them as volatile. I am just trying to figure out if I can do anything on the DB side, or update any stats, to cheat the optimizer into not choosing an NLJOIN.

  6. #6
    Join Date
    Jul 2002
    Location
    ATL
    Posts
    170
    I am on AIX 5.3, DB2 V8.1 FixPak 6, and the query is generated by PeopleSoft.

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    The most efficient join method is usually the one that DB2 chooses, assuming that the statistics are correct and the correct query optimization level is used (this is one thing you could try to change with "set current query optimization 7" or even "set current query optimization 2").

    However, all other things being equal, if DB2 chooses a nested loop join, and DB2 chooses it correctly, then the query will probably run faster than a similar (but slightly different) query that uses some other kind of join method. I think that is what dav1mo is trying to say.
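    For example, from a CLP session, something along these lines (just a sketch; the statement itself is whatever PeopleSoft generates):
    Code:
    -- try a lower optimization class for this session, then re-run the explain
    SET CURRENT QUERY OPTIMIZATION 2;
    -- ... explain/run the generated query here ...
    -- 5 is the default class; switch back when done
    SET CURRENT QUERY OPTIMIZATION 5;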
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  8. #8
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    With a relatively small *outer* table (in terms of # rows), and an index on the join columns of the *inner* table, an NLJ is indeed optimal.

    The optimizer will typically use other join methods when the tables become (much) larger (in terms of # rows) but not too wide, since e.g. merge join requires the tables to be accessed in a sorted way.

    This sorted access can be achieved by an index access; but if another index is already used for filtering, the sorted order will have to be obtained by an additional physical sort.

    I'm assuming you're just joining two tables, i.e., the NLJ is not the last step in a multi-table join.
    If you really want a merge join, try the following:
    * suppose filtering on tableA is on columns colA1 and colA2,
    * filtering on tableB is on columns colB1 and colB2,
    * the join condition is colA3 = colB3 AND colA4 = colB4
    Then the join needs either the two indexes (colA3, colA4, colA1, colA2) and (colB3, colB4, colB1, colB2), or the two indexes (colA4, colA3, colA1, colA2) and (colB4, colB3, colB1, colB2) (see the sketch below).
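    A minimal sketch of what that could look like (the index names are made up; pick whichever column order matches your filtering):
    Code:
    -- one matching index per table: join columns first, then the filter columns
    CREATE INDEX tabA_ix1 ON tableA (colA3, colA4, colA1, colA2);
    CREATE INDEX tabB_ix1 ON tableB (colB3, colB4, colB1, colB2);
    -- if possible add the CLUSTER keyword so these become the clustering indexes
    -- (see the remark further below), e.g.:
    --   CREATE INDEX tabB_ix1 ON tableB (colB3, colB4, colB1, colB2) CLUSTER;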

    Make sure the cardinality statistics for your tables are correct! Default statistics typically report cardinalities so small that an NLJ never looks suboptimal to the optimizer.

    Also, if the two indexes are not the cluster indexes, a more complex join method (hybrid join) will be needed to avoid sorting, so if possible try to make those two indexes clustering.
    Last edited by Peter.Vanroose; 12-21-07 at 03:34.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  9. #9
    Join Date
    Jan 2003
    Posts
    1,605
    Quote Originally Posted by koganti
    Sorry guys, I was not able to post the SQL and explain output as they are very large.
    Hi,
    click on "Post Reply" button and there is "Manage Attachment" button and attach file to forum. Also please post SQL and also tables definitions (db2 describe table table_name). Also post indexes on tables: (db2 select * from syscat.indexes where tabname='TABLE_NAME') also post primary keys and foreign keys definitions. Also post explain.

    Before running the explain, collect statistics on all tables used in the SQL:
    db2 runstats on table table_name with distribution and detailed indexes all
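    Put together, a sketch of the commands from the CLP (the schema and table names are placeholders):
    Code:
    db2 describe table MYSCHEMA.MYTABLE
    db2 "select indname, colnames, uniquerule from syscat.indexes where tabname = 'MYTABLE'"
    db2 runstats on table MYSCHEMA.MYTABLE with distribution and detailed indexes all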

    P.S.: There is no magic command to solve the problem. You need to analyze explain output.
    Hope this helps,
    Grofaty

  10. #10
    Join Date
    Jan 2003
    Posts
    1,605
    Quote Originally Posted by stolze
    There are (many) cases where nested loop joins are the best choice. But there are also (many) situations where other joins, like hash joins or sort-merge joins, are way better. The DB2 optimizer takes all join types into consideration and chooses the access plan most suitable for the query.
    Hi,
    I agree. It depends on what kind of SQL and data model you have. Some complex SQLs use a hash join because it is more suitable. DB2 just uses the join method that is optimal, case by case. But a hash join is one possible indicator that there is a missing index on a table.
    Hope this helps,
    Grofaty

  11. #11
    Join Date
    Aug 2008
    Posts
    1

    Solution Here

    I understand exactly the issue the poster described, especially coming from Oracle/Informix to work with DB2.

    In warehousing environments it's not always possible to make sure staging indexes align exactly with all other tables, especially when you may be interfering with user analysis indexes.

    The following query takes forever to return results because of the sorting...
    ------------------------------------------------------
    Select
        BigTable1.Column
    From BigTable1
    Left Outer Join BigTable2
        On BigTable1.UnIndexedKey = BigTable2.IndexedKey
    ------------------------------------------------------
    DB2, for some fun reason, decides in many situations to spend plenty of time sorting the BIG table before it does anything, which in my opinion is a short-term waste of time.

    So after some painstaking fun waiting for queries to return rows... I went back to a tried-and-true screamer of a technique which avoids some painful sorts when you know you don't want them.

    ---------------------------------------------------------------------
    Select
        (Select Column
         From BigTable2
         Where BigTable2.IndexedKey = BigTable1.UnIndexedKey) As Column
    From BigTable1
    ---------------------------------------------------------------------
    The second approach yielded immediate results by the thousands per second and did not exhibit the 10-to-15-minute wait on sorting across millions of rows shown by the first query. The end-to-end net throughput to return all 100 million rows was faster using method two than the sort approach, because of the overhead of the up-front processing.

    I'm actually a little disappointed that the DB2 optimizer doesn't approach the first query better than this, or at least allow a user to supply a directive in the query SQL to avoid the sort.

    * This was also just a pain when testing queries on large volumes... hitting execute and having to wait 10 to 15 minutes to see how fast it was going to run is a real pain, especially with just two tables.

    I'm sure some of this may have to do with how my client's DBA is collecting statistics on the data, but... I am still a bit frustrated at the inability to override the optimizer with a hint in inline SQL.
    ----------------------------------------------------------------------

    * I hope this helps some others feeling some pain from the "sort all the time" algorithm.

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Have you opened a PMR with IBM support to get this addressed? For sure, there are situations where the DB2 optimizer does not choose the best plan, for various reasons. For those cases, your best shot is to work with the developers to get the issue resolved.

    The other approaches are:
    • There are specific classes/courses that teach how to write SQL statements and work around issues where the DB2 optimizer doesn't choose the right plan.
    • You can use optimizer guidelines to force the optimizer to a specific plan (see the sketch below).
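    For example, on DB2 for LUW levels that support optimization profiles (9.1 and later, so newer than the V8.1 mentioned earlier in this thread), a statement-level guideline can be appended to the query roughly like this (the table and correlation names are made up; check the optimization-profiles documentation for your exact release, and note that profile support may need to be enabled first):
    Code:
    -- ask for a hash join instead of a nested loop join between these two tables
    SELECT B1.SomeColumn
    FROM BigTable1 B1
    LEFT OUTER JOIN BigTable2 B2
      ON B1.UnIndexedKey = B2.IndexedKey
    /* <OPTGUIDELINES>
         <HSJOIN>
           <TBSCAN TABLE='B1'/>
           <TBSCAN TABLE='B2'/>
         </HSJOIN>
       </OPTGUIDELINES> */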

    That it takes a long time for "just two tables" is a strange statement. Of course it can take a long time - it all depends on the query itself (and what the optimizer makes with that). The number of tables is not the deciding factor.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  13. #13
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by timswiley
    I'm actually a little disappointed that the DB2 optimizer doesn't approach the first query better than this, or at least allow a user to supply a directive in the query SQL to avoid the sort.
    I completely understand your disappointment.
    Still, (1) DB2 *does* implement the query in the best possible way (based on the optimizer's knowledge), and
    (2) DB2 allows passing hints/directives together with a query (though not as flexibly as e.g. Oracle does).

    1. Indeed, it's the (lack of correct) statistics that makes the optimizer choose the "wrong" implementation. Based on its knowledge of the data and the system, the "sort" looks faster/cheaper (in I/O and CPU) than the nested loop.
    2. Rewrite the query such that you force a nested loop join.
    This query rewrite is an art on its own, so I agree it's not a user-friendly aspect of DB2.
    But this is not a bug in DB2, rather a feature: suppose the data changes (e.g., the tables grow or shrink) so the statistics change. Then the implementation will change again (e.g. from nested loop join to merge scan join with sorting) without you having to rewrite the query over and over again. So DB2's philosophy is: "the optimizer knows better than the query writer", exactly in the spirit of the original SQL and RDBMS design back in the '70s!

    To come back to your query: rewrite it as follows and it will (most likely) run faster (i.e., without sorting):
    Code:
    Select
        BigTable1.Column
    From BigTable1
    Left Outer Join BigTable2
        On BigTable1.UnIndexedKey = BigTable2.IndexedKey
    OPTIMIZE FOR 1 ROW
    (Beware: this is not a "one size fits all" solution: you need to analyze the problem (merge scan instead of nested loop), then rewrite the query such that the "better" implementation becomes more attractive to the optimizer.)
    That's by the way exactly what you did, by using a subquery.
    Both solutions have the same drawback: a nested loop join is forced, even when the data changes such that a sort-based join becomes optimal.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/
