# Thread: "order by clause" - query optimization problem

1. Registered User
Join Date
Feb 2004
Posts
12

## Unanswered: "order by clause" - query optimization problem

I am facing a weird problem. Basically I have a table with huge data and I have to execute a query which involves a left outer join on this table followed by some conditions.
say i have two tables T1 and T2 my query is:

select * from (select T1.a A1, T1.b A2 from T1,T2 where T1.a = T2.b (+) and T1.b = T2.c (+) and (some conditions......) order by T1.a) where rownum<=300;

Since the driving table (T1) has around 25 lakh rows and T2 has 8 lakh rows, the above query is very time consuming [it takes around 3 minutes] .
So we tried to optimize it in the following way:

The conditions were querying table T1.

select * from (select T1.a A1, T1.b A2 from T1 where (conditions ...)),T2 where A1 = T2.b (+) and A2 = T2.c (+) and rownum<=300;

The above query generates the similar execution plan. It still uses T1 as the driving table [instead of the sub query]. Still takes 4 minutes to complete. However if I modify the query as

select * from (select T1.a A1, T1.b A2 from T1 where (conditions ...) order by T1.a),T2 where A1 = T2.b (+) and A2 = T2.c (+) and rownum<=300;

it takes result set [view] as the driving table for NESTED LOOPS operation. Now the query executes in 30 seconds.

I would like to know why "order by clause in subquery" is making such a huge difference in execution plan.

Vinay100.

2. Registered User
Join Date
Apr 2004
Location
Posts
249

## Use of the Word "Lakh"

The word "lakh" is used in Indian English to mean 100,000 (one hundred thousand)

It is not a common word in the English language and therefore, many people on web sites with a potentially global reach will not understand it.

Please use commonly used words to make it easy for you to get your message across.

Re the increase in speed when using order by:

Is it possible that column "a" is not used in the other where (conditions ...) clause for table T1? If, additionally, column "a" has an index on it, then the order by will force the use of the index, thus speeding up the query.

Ravi

3. Registered User
Join Date
Feb 2004
Posts
12
Hi Ravi,
the column a1 has an index on it. It is the primary key of the table. The original tables are different [i cannot post them] .Following is the sample test scenario:

First query [with order by clause]

select a1,a2,a3,b.b1 b1,b.b2 b2,b.b3 b3 from (select A.a1 a1,a.a2 a2,a.a3 a
3 from A a where a.a3=30 order by a.a1),B b where a1=b.b1(+) and a2 = b.b2(+) and rownum<=20;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 COUNT (STOPKEY)
2 1 NESTED LOOPS (OUTER)
3 2 VIEW
4 3 SORT (ORDER BY)
5 4 TABLE ACCESS (FULL) OF 'A'
6 2 TABLE ACCESS (BY INDEX ROWID) OF 'B'
7 6 INDEX (UNIQUE SCAN) OF 'SYS_C001663' (UNIQUE)

-------------------------------------------------------------------------
second query [ without order by clause]

select a1,a2,a3,b.b1 b1,b.b2 b2,b.b3 b3 from (select A.a1 a1,a.a2 a2,a.a3 a3 from A a where a.a3=30),B b where a1=b.b1(+) and a2 = b.b2(+) and rownum<=20;

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE
1 0 COUNT (STOPKEY)
2 1 NESTED LOOPS (OUTER)
3 2 TABLE ACCESS (FULL) OF 'A'
4 2 TABLE ACCESS (BY INDEX ROWID) OF 'B'
5 4 INDEX (UNIQUE SCAN) OF 'SYS_C001663' (UNIQUE)

4. Registered User
Join Date
Apr 2004
Location
Posts
249
Since your query is using the column a.a3 in the where clause, it may help to create a non-unique index on this column, if feasible. Of course, creating an index on a table that has many different queries running against it may affect the other queries. Also, inserts, updates and deletes will be slightly slower.

In your explain plan, the order by is not really not speeding up the query. Run the two queries one or two more times and check the performance.

Ravi

5. Registered User
Join Date
Feb 2004
Posts
12
Ravi,
The order by clause is causing a lot of difference. If you notice, in the case with order by clause, the input to NESTED LOOP (OUTER) IS A VIEW. This contains only few rows whereas in the other case [without the clause] it is the entire table. Correct me if I am wrong.

6. Registered User
Join Date
Apr 2004
Location
Posts
249
If you look at the explain plan outputs, and read three lines from the bottom up, they are the same.

I'll guess here and say that either b1 or (b1,b2) is the uniique key of table b here.
When you sort within the query, you are ordering the data selected from "a" by the unique key columns of "b". Since the index on table b has to work on pre-sorted rows, the actual execution becomes faster.

The increase in speed you see is not the result of just the order by clause, but a result of the fact that the order by clause orders it in a way that can be used by the next join/where clause.

With or without the order by clause, the set of data that is used for the "nested loop" operation should be the same.

If you are really not interested in the ordered set, and are only interested in getting a fiixed number of rows, you may try to add an additional clause inside the first inline view (select from a). Like this:

HTML Code:
```
select a1,a2,a3,b.b1 b1,b.b2 b2,b.b3 b3
from
(select A.a1 a1,a.a2 a2,a.a3 a3
from A a
where a.a3=30 order by a.a1
and rownum <= 20 ),  -- note new clause here
B b
where a1=b.b1(+)
and a2 = b.b2(+)
and rownum<=20;

```

This should work the same as either of your queries but should be much faster.

Ravi

7. Registered User
Join Date
Feb 2004
Posts
12
I have one more point to make here. Please let me know if I am not thinking in the right direction.

For the NESTED LOOP operation in case of outer join, optimizer will choose one table as the driving table. Now for EACH ROW OF THE DRIVING TABLE it tries to match the condition in the second table.

In case of the first execution plan the driving table is a view which contains less number of rows [may be 100 as compared to the total rows in table A which is around 2,000,000].

In case of second the driving table is the entire table A, so the operation will loop over the entire table[2,000,000 rows]. So 'order by' clause makes some difference I think.

Vinay.

8. Registered User
Join Date
Apr 2004
Location
Posts
249
In both cases, the nested loop will operate only on the subset of data returned by the inline view (select ... from a where ...).

SQL is a declarative language. That is, you declare what data you want, and the DBMS gets it for you. In general, you do not specify the order of operations.

Order by has elements of procedural logic. It says, first get all the data that matches the specified criteria, then sort it.

As such, with or without an order by, the same identical subset of rows of table "a" will be returned before table "b" is processed.

In both cases, first there will be a full scan of table a. The nested loops operation will only loop over the subset of data returned by the inline select, not the full table. At least, that will be so in the case under discussion.

Ravi

9. Registered User
Join Date
Apr 2003
Location
Greenville, SC (USA)
Posts
1,155
Throw 2 cents worth in ...

If you are returning a lot of rows and sorting them ... I have a feeling that you might be doing some "disk sorts" ... Ensure that your sort_area_size and the default next size of the temporary tablespace are equal....

If your sort_area_size is small, you might look at increasing it also, but try to keep the next extent in the temporary tablespace equal to the sort_area_size

HTH
Gregg

10. Registered User
Join Date
Feb 2004
Posts
12
I tried it but got the same results.

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•