Results 1 to 12 of 12
  1. #1
    Join Date
    Feb 2008
    Posts
    25

    Unanswered: Fetch first in combination with order by

    Hi!
    If you use a "order by" statement in DB2, the result is sorted via an external sort.
    When I use "fetch first x Rows only" in combination with the "order by" statement, is there also done an external sort, and afterwards the x upper tuples of the result are fetched? Or are both operators combined, and the input relation is sorted e.g. via HeapSort of size x.
    This would be more efficient than using an external sort, especially for small x

    Perhaps, someone has a clue!

    Regards,
    Martin

  2. #2
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    I do believe that the sort for the ORDER BY is done first, then the FETCH FIRST is done. This can be confirmed by using Visual Explain.

    Andy

  3. #3
    Join Date
    Feb 2008
    Posts
    25
    Hi Andy,
    should be done by visual explain
    Using fetch first doesn't change the visual representation of the access path...

    Martin

  4. #4
    Join Date
    Jan 2003
    Posts
    4,292
    Provided Answers: 5
    Well you can confirm it by running two queries like:

    select * from table order by x fetch first 100 rows only
    select * from table order by x desc fetch first 100 rows only

    If the table has more than 100 rows then the two queries will contain different data.

    Andy

  5. #5
    Join Date
    Feb 2008
    Posts
    25
    Hi Andy,
    thats right what you are saying.
    But I asked, if the implementation of the combination of both command (order by and fetch first) is different:
    - ordering and fetching are two separate operators, which are not merged during exectution
    - ordering and fetching are merged in one operator during execution


    The result of both is the same, but the second one could be more efficient.

    Martin

  6. #6
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    Martin,
    They are 2 separate pieces of the work. First your entire result set is sorted, then you fetch the first n rows. As you said this is not the most efficient way, but DB2 has to sort in the order you requested, before it can give you the number of rows you want, even if it means sorting several million of 'em.
    Dave

  7. #7
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I don't know the answer myself to your specific question, but I do know that DB2 has many, many different sort implementations for the different situations (short rows, long rows, sort by integer/string/float, sort huge number of rows based on external sort, sort in main memory only, ...). So my guess would be that DB2 is smart enough to take advantage of the FETCH FIRST if possible. I.e. DB2 may not allocate the space to hold millions of rows in the sorted result set before finding the top 100 of them. (Of course, that can make a huge difference in performance when evaluating the ORDER BY.)

    Btw, the two queries above will NOT return the same results. The first gives the top 100 rows while the second gives the last 100 rows (in reverse order).
    Last edited by stolze; 07-01-08 at 10:52.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  8. #8
    Join Date
    Dec 2005
    Posts
    273
    DB2 may take advantage of an index to avoid a sort. If there is no index containig the columns of the ORDER BY clause, what alternative has DB2 to find the requested rows, other than to sort the complete resultset ?

    If there is a FETCH FIRST nn ROWS ONLY, DB2 will implicitly add OPTIMIZE FOR nn ROWS.

  9. #9
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    @umayer: I am not sure if you are asking a rhetorical question or not...

    What I was referring to is the actual implementation of such a sort operation. For example, if you do a FETCH FIRST 100 ROWS ONLY on a table with 1 million rows, DB2 only has to allocate enough sort space to hold 100 rows. Of course, all 1 million rows have to be considered. But you could also pipe all rows through the sort algorithm and only keep the first 100 rows found so far - and forget all others. Those 100 rows have a much, much higher chance to fit into main memory, while all 1 million rows may not and DB2 may have to resort to an external sort algorithm that spills to disk.

    You will already have a worst-case run time of O(n log k), where n is the total number of rows and k the number of rows to be fetched, and we have n << k. Compared to O(n log n) for regular sort algorithms. (Those theoretical upper boundaries are ignoring disk I/O, as we all know.)

    p.s: Again, I do not know for sure whether DB2 does such an optimization. But since it is pretty obvious, my assumption is that it is implemented.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  10. #10
    Join Date
    Feb 2008
    Posts
    25
    Hi together!
    Thanks a lot for the good replies!
    I think, stolze says the correc thing!
    I measured the performance of the following to queries:
    Code:
    SELECT * FROM TEST.X1000EQ
    order by a desc fetch first 10 rows only;
    Code:
    SELECT * FROM TEST.X1000EQ
    order by a desc;
    Table x1000EQ has 1000000 tuples!

    I executed both queries 160 times with the following result:
    The first one takes about 1.09 seconds to execute.
    The second one takes about 3.57 seconds to exeucte.

    As a conclusion, I would say: DB2 does optimized something!

    Martin

  11. #11
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    I think you left something out of your test. Try fetching the 10 rows without the order by.
    SELECT * FROM TEST.X1000EQ
    fetch first 10 rows only;
    That should give you a better idea of the difference. I would tend to think the increased cost of your query has nothing to do with the ordering, but with the amount of data you are retrieving, 999,990 more rows of data. I, also, think you are ordering all of the 1000000 tuples of data in your first select as 1.09 seconds to fetch 10 rows is kinda long.
    Dave

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Use "db2batch" and specify that all rows should be fetched but not written to the output.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •