Results 1 to 7 of 7
  1. #1
    Join Date
    Mar 2003
    Posts
    69

    Unanswered: db2 sql Query - Replace "Not in"

    Hi,
    I have a sql query which is taking around 4 minutes to execute. We are using the os/390 db2 v6.1 and connecting using the db2connect v 7.1.

    The costly part of the query is "not in". We would like to remove this and I know that replacing "in" to "=" yields good results. How should I implement the same for "not in".

    Bellow is an example which simulates my sql query.

    "select app_no from table1 a, table2 b where a.app_no = b.app_no and
    b.stat_f not in ('0','1','2','3','4','5') "

    the stat_f can contain any value including blank.

    Please advise me.

    Thanks,
    Anto.

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    How about:
    b.stat_f > '5' and b.stat_f < '1'

    Not sure if the above would actually run faster.

    How large is the joined table? How many rows do you expect to return. If the number of returned rows is small compared to the table size, you might benefit from an index on b.stat_f.

    I assume you have indexes on a.app_no and b.app_no?

  3. #3
    Join Date
    Mar 2003
    Posts
    69
    Yes, app_no is the primary key. So I have the index for it.
    Also I can not mention the > 5 and <1 as I have some char's also.
    Basically bellow are the values which I am expecting should be placed under the "not in". Sorry for not mentioning in the previous post. My sql query is,


    "select app_no from table1 a, table2 b where a.app_no = b.app_no and
    b.stat_f not in ('0','1','2','3','4','5','6','7','8','A','B','C',' D') "

    I would say most of the times I need to check whether stat_f is blank, but some times it may be the other conditions.

    table1 has around ten million records and table2 also have around ten million records.
    And the number or rows it returns may be between hundred to eight thousand.
    We already have index for bgck_stat_f also.

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    What are the valid values you are looking for? Try to put them in one or more 'between' statements rather than excluding the ones you mentioned.

    You could define b.stat_f as the clustering index for the table, but that might not be appropriate based on other query or transaction requirements.

  5. #5
    Join Date
    Mar 2003
    Posts
    69
    Yes, between has significant improvement. I am wondering what is the difference does the db2 engine makes for the "in" "between" and "=". Beacause, when I use "=" it is very fast compared with "in".

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    You would have to do an explain and see if the "not in" precluded the use of the index on b.stat_f compared to "=" or "between"

  7. #7
    Join Date
    Jun 2003
    Location
    Canada
    Posts
    37
    NOT IN, NOT LIKE and NOT BETWEEN predicates pretty much preclude the optimizer from using a matching index scan. Always try to use positively formulated predicates.

    If you know the complete domain of the possible b.stat_f values, and the total set is only a couple values more than the list given in your NOT IN predicate, you may chose to use IN (...) instead. This way the optimiser will use the matching index scan approach if you have an appropriate index defined. I know this obscures the business logic somewhat, but if performance is a significant issue, one can possibly live with such compromises.

    Of course, if this is THE critical statement to run against the b table, clustering on the stat_f field will further improve your situation.

    Cheers,

    Julius

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •