If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Fetch first in combination with order by

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 07-01-08, 07:12
SilencerandLois SilencerandLois is offline
Registered User
 
Join Date: Feb 2008
Posts: 25
Fetch first in combination with order by

Hi!
If you use a "order by" statement in DB2, the result is sorted via an external sort.
When I use "fetch first x Rows only" in combination with the "order by" statement, is there also done an external sort, and afterwards the x upper tuples of the result are fetched? Or are both operators combined, and the input relation is sorted e.g. via HeapSort of size x.
This would be more efficient than using an external sort, especially for small x

Perhaps, someone has a clue!

Regards,
Martin
Reply With Quote
  #2 (permalink)  
Old 07-01-08, 08:13
ARWinner ARWinner is offline
Registered User
 
Join Date: Jan 2003
Posts: 3,575
I do believe that the sort for the ORDER BY is done first, then the FETCH FIRST is done. This can be confirmed by using Visual Explain.

Andy
Reply With Quote
  #3 (permalink)  
Old 07-01-08, 08:20
SilencerandLois SilencerandLois is offline
Registered User
 
Join Date: Feb 2008
Posts: 25
Hi Andy,
should be done by visual explain
Using fetch first doesn't change the visual representation of the access path...

Martin
Reply With Quote
  #4 (permalink)  
Old 07-01-08, 08:29
ARWinner ARWinner is offline
Registered User
 
Join Date: Jan 2003
Posts: 3,575
Well you can confirm it by running two queries like:

select * from table order by x fetch first 100 rows only
select * from table order by x desc fetch first 100 rows only

If the table has more than 100 rows then the two queries will contain different data.

Andy
Reply With Quote
  #5 (permalink)  
Old 07-01-08, 08:56
SilencerandLois SilencerandLois is offline
Registered User
 
Join Date: Feb 2008
Posts: 25
Hi Andy,
thats right what you are saying.
But I asked, if the implementation of the combination of both command (order by and fetch first) is different:
- ordering and fetching are two separate operators, which are not merged during exectution
- ordering and fetching are merged in one operator during execution


The result of both is the same, but the second one could be more efficient.

Martin
Reply With Quote
  #6 (permalink)  
Old 07-01-08, 09:31
dav1mo dav1mo is offline
Registered User
 
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
Martin,
They are 2 separate pieces of the work. First your entire result set is sorted, then you fetch the first n rows. As you said this is not the most efficient way, but DB2 has to sort in the order you requested, before it can give you the number of rows you want, even if it means sorting several million of 'em.
Dave
Reply With Quote
  #7 (permalink)  
Old 07-01-08, 09:38
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
I don't know the answer myself to your specific question, but I do know that DB2 has many, many different sort implementations for the different situations (short rows, long rows, sort by integer/string/float, sort huge number of rows based on external sort, sort in main memory only, ...). So my guess would be that DB2 is smart enough to take advantage of the FETCH FIRST if possible. I.e. DB2 may not allocate the space to hold millions of rows in the sorted result set before finding the top 100 of them. (Of course, that can make a huge difference in performance when evaluating the ORDER BY.)

Btw, the two queries above will NOT return the same results. The first gives the top 100 rows while the second gives the last 100 rows (in reverse order).
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development

Last edited by stolze; 07-01-08 at 09:52.
Reply With Quote
  #8 (permalink)  
Old 07-02-08, 08:45
umayer umayer is offline
Registered User
 
Join Date: Dec 2005
Posts: 273
DB2 may take advantage of an index to avoid a sort. If there is no index containig the columns of the ORDER BY clause, what alternative has DB2 to find the requested rows, other than to sort the complete resultset ?

If there is a FETCH FIRST nn ROWS ONLY, DB2 will implicitly add OPTIMIZE FOR nn ROWS.
Reply With Quote
  #9 (permalink)  
Old 07-02-08, 10:06
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
@umayer: I am not sure if you are asking a rhetorical question or not...

What I was referring to is the actual implementation of such a sort operation. For example, if you do a FETCH FIRST 100 ROWS ONLY on a table with 1 million rows, DB2 only has to allocate enough sort space to hold 100 rows. Of course, all 1 million rows have to be considered. But you could also pipe all rows through the sort algorithm and only keep the first 100 rows found so far - and forget all others. Those 100 rows have a much, much higher chance to fit into main memory, while all 1 million rows may not and DB2 may have to resort to an external sort algorithm that spills to disk.

You will already have a worst-case run time of O(n log k), where n is the total number of rows and k the number of rows to be fetched, and we have n << k. Compared to O(n log n) for regular sort algorithms. (Those theoretical upper boundaries are ignoring disk I/O, as we all know.)

p.s: Again, I do not know for sure whether DB2 does such an optimization. But since it is pretty obvious, my assumption is that it is implemented.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
  #10 (permalink)  
Old 07-02-08, 10:51
SilencerandLois SilencerandLois is offline
Registered User
 
Join Date: Feb 2008
Posts: 25
Hi together!
Thanks a lot for the good replies!
I think, stolze says the correc thing!
I measured the performance of the following to queries:
Code:
SELECT * FROM TEST.X1000EQ
order by a desc fetch first 10 rows only;
Code:
SELECT * FROM TEST.X1000EQ
order by a desc;
Table x1000EQ has 1000000 tuples!

I executed both queries 160 times with the following result:
The first one takes about 1.09 seconds to execute.
The second one takes about 3.57 seconds to exeucte.

As a conclusion, I would say: DB2 does optimized something!

Martin
Reply With Quote
  #11 (permalink)  
Old 07-08-08, 12:35
dav1mo dav1mo is offline
Registered User
 
Join Date: Dec 2007
Location: Richmond, VA
Posts: 782
I think you left something out of your test. Try fetching the 10 rows without the order by.
SELECT * FROM TEST.X1000EQ
fetch first 10 rows only;
That should give you a better idea of the difference. I would tend to think the increased cost of your query has nothing to do with the ordering, but with the amount of data you are retrieving, 999,990 more rows of data. I, also, think you are ordering all of the 1000000 tuples of data in your first select as 1.09 seconds to fetch 10 rows is kinda long.
Dave
Reply With Quote
  #12 (permalink)  
Old 07-08-08, 12:48
stolze stolze is offline
Registered User
 
Join Date: Jan 2007
Location: Jena, Germany
Posts: 2,662
Use "db2batch" and specify that all rows should be fetched but not written to the output.
__________________
Knut Stolze
IBM DB2 Analytics Accelerator
IBM Germany Research & Development
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On