Results 1 to 13 of 13
  1. #1
    Join Date
    Mar 2008
    Posts
    3

    Unanswered: Difficult query problem - can oracle hints save the day?

    Greetings! This is my first post. I found this place while googling. I have an interesting problem that maybe someone here can help me solve. I'll reduce it to a highly contrived version, so as to avoid as much noise as possible.

    Let's say I have a table with just two columns: "thingger_date" and "thingger_value". And let's say there is an index on just "thingger_date". If I run the following query:

    Code:
    select 
        thingger_date 
    from 
        the_table 
    where 
        thingger_date between :startDate and :endDate
        and rownum < 26
    This will give me the first 25 dates, sorted increasingly since it comes directly out of a range index. However, if I were to change it to get a field that is not in the index as well:

    Code:
    select 
        *
    from 
        the_table 
    where 
        thingger_date between :startDate and :endDate
        and rownum < 26
    Then, all of a sudden, there is no guarantee of ordering. The only explanation I can think of is that as oracle hashes the rowids into the table, it ignores the order it's getting the data back in.

    The question is, is there a way to override this behaviour? That is, I'd like the data coming back from the second query to be guaranteed to be sorted by date.

    Now, you might be thinking, "just get rid of the rownum and stick an 'order by' in there"... Unfortunately, sorting the full data set and just shaving off the top 25 rows would be a no-go, due to performance reasons (there might be a very large number of rows within the date range that is queried).

    Another possibility is to do a nested query, where the inner one uses just the index fields, which will guarantee the right results in this case. However, remember, this is a highly-contrived example. What if I wanted to add a "and thingger_value = 'foo bar baz hahahaha'"? Filtering on the top 25 results would not give me 25 final results, and it might be impractical to index that value field...

    Is there a way to specify the execution plan of that query in such a way that it maintains the row order that came out of that index when it retrieves the rest of the data from the table? [I hope this made sense]

  2. #2
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Try this:
    Code:
    SELECT * 
    FROM (SELECT thingger_date, other_column
          FROM the_table 
          WHERE thingger_date between :startDate and :endDate
          and rownum < 26)
    ORDER BY other_column

  3. #3
    Join Date
    Dec 2007
    Posts
    253
    Quote Originally Posted by replicon
    This will give me the first 25 dates,
    No, it will give you 25 random rows. At this point in time they happen to be the rows that you are expecting, but after time and dml that will change.
    sorted increasingly since it comes directly out of a range index.
    You are under a misapprehension here. The rows are NOT returned in a sorted order. Any order that you view is purely coincidental There is 1 (very very well documentated and oft-discussed)over-riding rule to sorting.
    If you do not have an order by clause in your statement, you do NOT have a sorted resultset. This is a rule that is immutable. It matters not one jot what you see on the screen in front of you. No Order By, No sorted resultset can be guaranteed.

    Then, all of a sudden, there is no guarantee of ordering.
    There never was.
    The only explanation I can think of is that as oracle hashes the rowids into the table, it ignores the order it's getting the data back in.
    Nope, this is because you have not included an Order By in your SQL statement.
    The question is, is there a way to override this behaviour?
    That is, I'd like the data coming back from the second query to be guaranteed to be sorted by date.
    Use an order by if you want an ordered set of results.

    Now, you might be thinking, "just get rid of the rownum and stick an 'order by' in there"... Unfortunately, sorting the full data set and just shaving off the top 25 rows would be a no-go, due to performance reasons (there might be a very large number of rows within the date range that is queried).
    Since you are not properly querying for a Top N analysis, this point is moot. To get Top N You Must Order the results.

    Another possibility is to do a nested query, where the inner one uses just the index fields, which will guarantee the right results in this case. However, remember, this is a highly-contrived example. What if I wanted to add a "and thingger_value = 'foo bar baz hahahaha'"? Filtering on the top 25 results would not give me 25 final results, and it might be impractical to index that value field...
    once again, a moot point as your initial query is not an accurate Top N analysis.
    Is there a way to specify the execution plan of that query in such a way that it maintains the row order that came out of that index when it retrieves the rest of the data from the table? [I hope this made sense]
    Use Order By. Sorry to bang on, and I'm sure that you will come back with something like "but because the index stores it in sorted ... blah blah" Sorry, No. The rule remains.
    Top N analysis is
    Code:
    SELECT col_list
    FROM (subquery ordered by the appropriate column)
    WHERE appropriate row restriction to get the top x results.

  4. #4
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by pablolee
    If you do not have an order by clause in your statement, you do NOT have a sorted resultset. This is a rule that is immutable. It matters not one jot what you see on the screen in front of you. No Order By, No sorted resultset can be guaranteed.
    Not entirely true. If the table is an index organized table the rows are stored sorted according to the primary key. Which also means that they will be returned in a sorted fashion, even if the SELECT didn't specify an ORDER BY.

    But that is the only case I know where this the case.

  5. #5
    Join Date
    Mar 2008
    Posts
    3
    So wait, what you're saying is, if I have a table with millions of rows, indexed by date, I can't get the oldest row within a large date range without doing the practical equivalent of a full index scan (in the worst case)? That sounds highly impractical... There has to be a better way.

  6. #6
    Join Date
    Dec 2007
    Posts
    253
    Not entirely true
    Yes, entirely true. Have a look here:
    The horse's mouth

  7. #7
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by replicon
    So wait, what you're saying is, if I have a table with millions of rows, indexed by date, I can't get the oldest row within a large date range without doing the practical equivalent of a full index scan
    Sure you can. If you have an index, and your ORDER BY sorts on the column that is indexed, Oracle will happily use that index. If that is the only column in your select it won't even bother reading the table, but will only read in the index values. I would assume that
    Code:
    SELECT whatever
    FROM the_table
    WHERE the_date_col = (SELECT min(the_date_col) FROM the_table)
    will make use of an index on the_date_col and will not do a FTS (unless it expects a large number of rows returned by the condition).
    And by the way: a full table scan is not always a bad thing!

    Quote Originally Posted by pablolee
    Yes, entirely true. Have a look here:
    Hmm. Tom Kyte does not speak about IOTs there. And my point is only about IOTs.
    The manual says:
    Quote Originally Posted by manual
    If an ORDER BY clause only references the primary key column or a prefix of it, then the optimizer avoids the sorting overhead, as the rows are returned sorted on the primary key columns.
    And that's precisely what I was talking about.

    Having said that: I would probably still include the ORDER BY with an IOT, just to be sure
    Last edited by shammat; 03-02-08 at 15:12.

  8. #8
    Join Date
    Dec 2007
    Posts
    253
    If an ORDER BY clause only references the primary key column or a prefix of it, then the optimizer avoids the sorting overhead, as the rows are returned sorted on the primary key columns.
    Yes, the optimizer can skip the order by. But that still does not guarantee a sorted output. I knew that Tom dealt with this directly, I just couldn't find it for my last post. here

  9. #9
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    OK, thanks.
    It seems the Oracle manual is a bit unclear here.
    I'm glad I always use the ORDER BY, so no surprised for me

  10. #10
    Join Date
    Dec 2007
    Posts
    253
    Good stuff, I've now bookmarked that thread on this box. It'll be the only thing that I post next time a see this question.

  11. #11
    Join Date
    Mar 2008
    Posts
    3
    Thanks for all the excellent responses so far!

    I get the impression that my contrived example might have been a bit... too contrived.

    Really, what it comes down to is, I have a very large table, and an index on a date field, and I want to be able to throw a web UI in front of that to paginate over the results on a date range, 25 entries at a time, without having to waste resources retrieving throw-away data.

    Maybe I need to be smarter about it and break up the query in software, so if my date range is "november to december", I only query for "nov 1" and then I could sort it and will only have to do two queries when I hit a boundary date range... it just sucks that the Oracle optimizer doesn't have a way of either figuring this out, or being told via hints how to do the right thing in this case.

  12. #12
    Join Date
    Nov 2003
    Posts
    2,935
    Provided Answers: 12
    Quote Originally Posted by replicon
    Maybe I need to be smarter about it and break up the query in software, so if my date range is "november to december", I only query for "nov 1"
    You could use a query with
    Code:
    WHERE extract(month from date_col) = 11 
    AND extract(year from date_col) = 2007
    You can create a function based index to support that query and the optimizer will use it.
    You did not give any figures so I cannot even guess how large that result set (limited for one month) will be, and if it's worthwhile further reducing its size.

  13. #13
    Join Date
    Dec 2007
    Posts
    253
    Simply search askTom.oracle.com for the term 'pagination' There are various methodologies and code samples to help you perform pagination. No need to reinvent the wheel.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •