Results 1 to 4 of 4
  1. #1
    Join Date
    Aug 2003
    Posts
    13

    Unanswered: IN or EXISTS or JOIN?

    In oracle performance tuning documents, JOIN is often perferred compared with IN and EXISTS. This is because JOIN can use index to speed up data selection. If we can achieve our goal by any of the three methods, we should use JOIN.( Is this ture? )

    However sometimes JOIN is not suitable to express an EXISTS logic. See the following statements:

    (1)
    SELECT t1.name
    FROM customers t1, call_records t2
    WHERE t1.id=t2.id

    (2)
    SELECT t1.name
    FROM customers t1
    WHERE EXISTS
    (SELECT * FROM call_records t2
    WHERE t1.id=t2.id)

    (3)
    SELECT t1.name
    FROM customers t1
    WHERE id
    IN (SELECT id from call_records)

    Suppose we have indices on id column of both tables.

    Suppose call_records has millions of records and one customer can have a lot of call records. What we want is to find active customers, that is, customers appear in the call_records AT LEAST ONCE.

    In the above three SQL statements, (2) and (3) are the same and return exactly what we want. But (1) is different from (2) and (3) in that (1) will return quite a lot duplicate names. We can get the right result by removing the duplicates by using SELECT DISTINCT, which will cause a sort on the result set.

    The questions are here. Which method is more efficient, JOIN or EXISTS(IN), in this situation? And between EXISTS and IN, which is preferred?
    How and when is the index choosed to be used?

    Thanks

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1

    Re: IN or EXISTS or JOIN?

    Originally posted by crops
    In oracle performance tuning documents, JOIN is often perferred compared with IN and EXISTS. This is because JOIN can use index to speed up data selection. If we can achieve our goal by any of the three methods, we should use JOIN.( Is this ture? )

    However sometimes JOIN is not suitable to express an EXISTS logic. See the following statements:

    (1)
    SELECT t1.name
    FROM customers t1, call_records t2
    WHERE t1.id=t2.id

    (2)
    SELECT t1.name
    FROM customers t1
    WHERE EXISTS
    (SELECT * FROM call_records t2
    WHERE t1.id=t2.id)

    (3)
    SELECT t1.name
    FROM customers t1
    WHERE id
    IN (SELECT id from call_records)

    Suppose we have indices on id column of both tables.

    Suppose call_records has millions of records and one customer can have a lot of call records. What we want is to find active customers, that is, customers appear in the call_records AT LEAST ONCE.

    In the above three SQL statements, (2) and (3) are the same and return exactly what we want. But (1) is different from (2) and (3) in that (1) will return quite a lot duplicate names. We can get the right result by removing the duplicates by using SELECT DISTINCT, which will cause a sort on the result set.

    The questions are here. Which method is more efficient, JOIN or EXISTS(IN), in this situation? And between EXISTS and IN, which is preferred?
    How and when is the index choosed to be used?

    Thanks
    The three queries may work differently, but they all can make use of indexes - it's not only the JOIN query that can use an index as you say.
    Which is best depends on the relative amounts of data in the tables. If there are many customers and few call_records, and only a minority of customers have call records, then the IN will outperform the EXISTS (call_records is the driving table). If there are a small number of customers and many call_records, then the EXISTS is probably better tan the IN (customers is the driving table).

    The advantage of the JOIN query is that it does not force a particular access path for the query. The optimizer can choose to drive from either table based on the statistics it has about them. This is preferable if the distribution of data between the tables can change over time, or if you are not in a position to know it. Using IN or EXISTS is fine as long as you know that the access path is the most appropriate for the query.

    In pseudo-code, IN is like this:

    Code:
    FOR r IN (SELECT DISTINCT is FROM call_records) LOOP
      SELECT name FROM customers WHERE id = r.id
      Display name
    END LOOP
    EXISTS is like this:

    Code:
    FOR r IN (SELECT * FROM customers) LOOP
      SELECT 1 FROM call_records WHERE id = r.id AND ROWNUM=1
      IF FOUND THEN
        Display name
      END IF
    END LOOP
    JOIN could be like either of the above, or something else entirely (e.g. hash join).

  3. #3
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Which method is more efficient will depend on your data, how your tables and indexes are configured, which optimiser your running, how your optimizer/database is configured and the version of oracle your running.

    So the best way to tune a query where performance is an issue is to try different approaches until you find the best (and look at actual stats not just the explain plan estimates AND dont forget to run the query more than once so you can see how it behaves when the data is cached and when it isnt cached). This will involve using production data and running the query on your production environment if possible (just because the query runs fine on development doesnt mean its going to run ok on production).

    So never assume that any one method is always the best.

    Alan

  4. #4
    Join Date
    Jul 2003
    Posts
    2,296
    - Open a sql session.

    - Make sure timed_statistics = TRUE (show parameter timed_statistics)

    - type: alter session set sql_trace=TRUE;

    - run all three query options

    - exit session

    - goto udump directory and find the trace file you just created (I always look by date/time to find the one I just made, is there any other way?)

    - tkprof the file: tkprof trace_file_name new_file_name

    - examine contents of your new file and determine which query ran the most efficiently

    - Pat yourself on the back at a job well done.
    - The_Duck
    you can lead someone to something but they will never learn anything ...

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •