Results 1 to 13 of 13
  1. #1
    Join Date
    Jul 2004
    Posts
    12

    Question Unanswered: Oracle Query Optimisation

    I am pulling data from Oracle into Microsoft SQL Server (using MSSQL OPENQUERY() to send a query to the Oracle "linked server").

    This job runs every 15 minutes and queries the Oracle database for numerous tables, all of which have a change-date column.

    It is important that the query is as efficient as possible (Oracle database is a large order fulfilment system) and we do not pull more rows than we need to.

    We do not need all columns from all tables, so we restrict to only pulling the columns we need, and only on rows that have recent changed-date.

    We also limit the number of rows. We store MAX(changed_date) from the previous run, and the MAX value of the first PK column [for that MAX changed_date] as the point to resume the next batch from.

    The query we send to Oracle looks like this:
    Code:
    SELECT *
    FROM
    (
    	SELECT
    		PICK_LIST_NUM,
    --		WAREHOUSE_CODE,
    		SALES_DOCUMENT_NUM,
    		SALES_ITEM_NUM,
    		STATUS,
    		PRODUCT_CODE,
    		ORDER_DESPATCH_QUANTITY,
    		DESPATCH_NUM,
    		ORDER_TYPE,
    		WAREHOUSE_CODE,
    		CHANGE_DATE
    	FROM	OracleDbName.PICK_DESPATCH_ITEM
    Code:
    	WHERE	1=1
    (1)		AND STATUS NOT IN (1, 2, 3)
    	
    (2)		AND (
    (3)		    CHANGE_DATE > 123456
    		    OR (
    			    CHANGE_DATE = 123456
    (4)			AND PICK_LIST_NUM >= 54321
    			)
    		)
    	
    (5)	ORDER BY
    		  CHANGE_DATE
    		, PICK_LIST_NUM
    		, ORDER_TYPE
    		, PRODUCT_CODE
    		, SALES_DOCUMENT_NUM
    		, SALES_ITEM_NUM
    		, WAREHOUSE_CODE
    ) X
    WHERE ROWNUM <= 10000
    Some specifics in case you have any comments:

    Columns which we do not require, such as WAREHOUSE_CODE, are included in the SELECT list to be self-documenting, but are commented out

    (1) There may be some table-specific WHERE clause exclusions, but usually not

    (2) This block constraints the results to only show rows more recent than the previous batch:

    (3) checking that the changed_date ("123456" in this example, seconds since 01-01-1970) is more recent than last time

    (4) or the same as the last cut-off changed_date but where the First PK field is greater than the last cutoff ("54321" in this example)

    (5) ORDER BY first column is changed_date (so we get a clean cutoff between batches) and then by ALL the PK columns (although, I suppose, technically only the first PK column is actually important for repeatability)

    Lastly, I use ROWNUM to limit the number of rows returned so we don't cause a bottleneck. (It is common during busy periods that the transfers do not "catch their tail" until about 2am the following morning). This does require that I used a derived table though (although, another benefit is that MSSQL OpenQuery() sometimes barfs on syntax that the transport-layer attempts to parse and cannot handle, and the normal solution to that is to wrap a derived table anyway ...)

    I have no control over indexes etc. that are available in the Oracle database, but I can make suggestions. I believe that CHANGE_DATE is indexed on most of the tables that I reference (but if there is a query that I could use to check that it would be helpful).

    There are a couple of VIEWs that we have to pull data from. These have no Change_Date column and are hopelessly slow by comparison to the above table-direct queries that we process - typically taking 4~5 minutes to pull about 50,000 rows (these views are things like Stock Level so I presume complicated behind-the-scenes checking reserved stock and forward deliveries etc., and price lists checking for prices that have "current" dates etc.) but any suggestions on improving them would be appreciated (just wriing this perhaps exposing the various Change-Dates in the VIEW would enable me to reduce the number of rows that are included, and allow us to pull a more reasonable batch size instead of having to pull all-rows-every-time)
    Last edited by kristenbm; 06-02-10 at 05:42. Reason: Cut & Paste typo

  2. #2
    Join Date
    Mar 2010
    Location
    Vienna, Austria
    Posts
    149
    My suggestions would be:

    1.) suggest a concatenated index on change_date and pick_list_number
    2.) Depending on the selectivity of the STATUS column, an index there might be useful, too
    3.) What is the outer SELECT for (if you want to limit the output to 10000 rows, you can do it in the inner SELECT) ?
    4.) Are you aware, that the "WHERE ROWNUM <= 10000" predicate only limits the output, but DOES NOT LIMIT the query (because there is an "ORDER BY" the entire result set has to be built. The "ROWNUM < whatever" only results in limiting the DELIVERY of the already selected result set to 10000 rows) ?
    5.) Consider binding variables instead of literals in (3) and (4). It will improve the parsing performance.
    "There is always an easy solution to every problem - neat, plausible, and wrong."
    -- H.L. Mencken

  3. #3
    Join Date
    Jul 2004
    Posts
    12
    1 & 2 - thanks for that. Much the same as MS SQL in that regard, so I can understand it at least!

    3) Dunno why I have a wrapper-Select. I expect I had trouble somewhere along the line trying to do it with a single SELECT. (MS SQL OpenQuery() has that problem if something [usually database drivers] tries to Parse the SQL, and "helpfully" detects a syntax error, without actually passing the whole query to Oracle to deal with. I'll try removing the wrapper.)

    4) Didn't realise that. Is there a way to LIMIT the output, retaining the ORDER BY, but getting Oracle to do the minimum work?

    e.g. Does just getting the PKs using ORDER BY on CHANGE_DATE + PK and ROWNUM < 10000, and then JOINing that back to the original table result in less work?

    5) If you have time could you provide an example, or a hint where I can look in DOCs online pls?

    Thanks.

  4. #4
    Join Date
    Mar 2010
    Location
    Vienna, Austria
    Posts
    149
    ad 4.) The order by clause MUST be applied on ALL resulting rows from the other predicate(s). There cannot be a way around it, because you have to evaluate the entire result set "in order to order".

    e.g. Does just getting the PKs using ORDER BY on CHANGE_DATE + PK and ROWNUM < 10000, and then JOINing that back to the original table result in less work?
    You could, however, use a - more or less dirty - trick:

    If Oracle can satisfy a query out of the index, it doesn't access the data itself.
    The improvement would be, that you read less blocks by only scanning the index instead of scanning the index and then access the data by the rowid delivered by the index scan.

    suppose you have a concatenated index on CHANGE_DATE, STATUS, PICK_LIST_NUM and the PK column (lets call it PK_COL), you could build an inline view (i called it "qq") and the query would be:

    Code:
    with qq as
       select PK_COL
       from   <your_table>
       where  status not in (1,2,3)
       and    change_date > <yourdate>
              or (change_date = <your_date> and pick_list_num >= <whatever>)
       where  rownum < 10000
      SELECT
          		a.PICK_LIST_NUM,
          --		WAREHOUSE_CODE,
          		a.SALES_DOCUMENT_NUM,
          		a.SALES_ITEM_NUM,
          		a.STATUS,
          		a.PRODUCT_CODE,
          		a.ORDER_DESPATCH_QUANTITY,
          		a.DESPATCH_NUM,
          		a.ORDER_TYPE,
          		a.WAREHOUSE_CODE,
          		a.CHANGE_DATE
      FROM	<your_table> a
           , qq          b
      WHERE	a.PK_COL = b.PK_COL
    	ORDER BY
        		  a.CHANGE_DATE
        		, a.PICK_LIST_NUM
        		, a.ORDER_TYPE
        		, a.PRODUCT_CODE
        		, a.SALES_DOCUMENT_NUM
        		, a.SALES_ITEM_NUM
        		, a.WAREHOUSE_CODE
    ;
    In this case, qq should only do an index range scan and the table access (including the ordering) would be performed only on the 10000 rows you joined
    (I didn't test the code, but it should work)

    ad 5)
    A part of the so called System Global Area (in short: SGA) is the library cache, where the system - once a sql statement is parsed - stores the statement and the execution plan.
    So, if the same statement is run again, Oracle does not need to parse it again, but just grabs the already calculated execution plan out of the library cache.

    Since Oracle is doing a simple string-compare you will get different parsing behavior:
    with literals the query would be parsed each time, with variables only the first time (this might not make the big difference with this query, but with complex queries, where the parsing can take seconds it matters):

    example literals:

    query above


    example variables:
    Code:
    variable vdatestr   varchar2(16);
    variable vstat1      number;
    variable vstat2      number;
    variable vstat3      number;
    variable vpln         number;
    variable vmaxrows number;
    
    begin
      :vdatestr  := '02/06/2010';
      :vstat1     := 1;
      :vstat2     := 2;
      :vstat3     := 3;
      :vpln        := 54321;
      :vmaxnum := 10000;
    end;
    
    with qq as
       select PK_COL
       from   <your_table>
       where  status not in (:vstat1, :vstat2, :vstat3)
       and    change_date > to_date(:vdatestr,'DD/MM/YYYY')
              or (change_date = to_date(:vdatestr,'DD/MM/YYYY') and pick_list_num >= :vpln)
       where  rownum < :vmaxnum
      SELECT
          		a.PICK_LIST_NUM,
          --		WAREHOUSE_CODE,
          		a.SALES_DOCUMENT_NUM,
          		a.SALES_ITEM_NUM,
          		a.STATUS,
          		a.PRODUCT_CODE,
          		a.ORDER_DESPATCH_QUANTITY,
          		a.DESPATCH_NUM,
          		a.ORDER_TYPE,
          		a.WAREHOUSE_CODE,
          		a.CHANGE_DATE
      FROM	<your_table> a
           , qq          b
      WHERE	a.PK_COL = b.PK_COL
    	ORDER BY
        		  a.CHANGE_DATE
        		, a.PICK_LIST_NUM
        		, a.ORDER_TYPE
        		, a.PRODUCT_CODE
        		, a.SALES_DOCUMENT_NUM
        		, a.SALES_ITEM_NUM
        		, a.WAREHOUSE_CODE
    ;
    In the latter case, the sql statement would not be parsed again for different values of status, change_date pick_list_num and/or rownum

    You'll find more info on bind variables in the "SQL*Plus® User's Guide and Reference" for your release on the Oracle Website (i.e. for 11.2: http://download.oracle.com/docs/cd/E...e10823/toc.htm)
    "There is always an easy solution to every problem - neat, plausible, and wrong."
    -- H.L. Mencken

  5. #5
    Join Date
    Jul 2004
    Posts
    12
    I'm familiar with both concepts - but not in Oracle, nor the Oracle syntax - so this will be a great boost, and I think I'm competent enough to implement it! Thanks very much for your help and detailed answers.

    Final question (hopefully!)

    Do you think execution plan will still be in the cache if this only runs every 15 minutes? Its probably a "how long is this bit of string" question, depending on how little!! memory the Oracle box has and how many different queries it sees, but it may be a bit pointless me changing all the code to conform if there is zero chance of a cached execution plan surviving in the cache that long

  6. #6
    Join Date
    Mar 2010
    Location
    Vienna, Austria
    Posts
    149
    Do you think execution plan will still be in the cache if this only runs every 15 minutes?
    This depends on a couple of things including the Oracle Version.

    I usually know on the whole what's going on in the lib caches of the production instances I'm reponsible for, and so will probably the DBA on your Oracle site (the name of the configuration parameter is "shared_pool_size", btw).

    If the query you posted was not just an example for something much more complex, the gain in parsing might be too small to justify rewriting an existing application, however.
    "There is always an easy solution to every problem - neat, plausible, and wrong."
    -- H.L. Mencken

  7. #7
    Join Date
    Jul 2004
    Posts
    12
    Good point. This is a representative query. Single table, WHERE clause mostly just looking for rows with a recent CHANGED_DATE; just a few of the queries have additional constraints in the WHERE clause (like the STATUS column in this example)

    The couple of queries that target VIEWs might be worth some further work to try to keep the execution plan in the cache as the VIEWs represent very complex queries on the underlying tables.

  8. #8
    Join Date
    Mar 2010
    Location
    Vienna, Austria
    Posts
    149
    ... as the VIEWs represent very complex queries on the underlying tables
    Let's keep that in proportion:

    If there is - for whatever reason - a poor plan on the view-query, we are talking factors (I've literally seen weeks turn into minutes just by setting a hint).
    When we are discussing parsing, we are talking - at least in your case - milliseconds.
    This might be a factor 1000 to the nanoseconds of memory operations, but then again, its 4-6 miliseconds - most of it I/O waits - once every 15 minutes.

    I guess, the system will be able to absorb the additional heavy workload of approx. 0.6 cpu seconds per day ... ;-)

    One of the boring truths of tuning is, that 98% of the wasted time in databases is lost because of poor data design or poorly written code.

    And fixing that is by far not as sexy as giving this undocumented configuration parameter you found in one of Jonathan Lewis' books the final twist and be the hero of the day.

    The latter happens, but rarely enough ...
    "There is always an easy solution to every problem - neat, plausible, and wrong."
    -- H.L. Mencken

  9. #9
    Join Date
    Jul 2004
    Posts
    12
    Ah ... all death, no glory ... I'm familiar with that!

    Sounds like there is a difference with MS SQL in the way that VIEWs are handled.

    In MS SQL a query on a VIEW, in effect, takes all the code from the view's definition into the new query and then an execution plan is constructed across the whole lot. If the VIEW is complex, but the QUERY simple, the construction of the execution plan can take a relatively long time.

    Have I understood correctly that in Oracle the execution plan is created for the VIEW itself?, so querying "SELECT * FROM MyView" doesn't have to build an execution plan for all the logic within the VIEW's definition, just the query of the View itself?

    If so my work is done! My query on the view, itself, is very simple.

  10. #10
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    do the records really need to be processed in order? Why not get rid of the date check and just go for the status? It would seem that you should only have new stuff being in one of those status's and all old stuff would, I guess, be a higher number. This would, also, ensure that none get left behind, when you change your date and time to look at.
    Dave

  11. #11
    Join Date
    Jul 2004
    Posts
    12
    Thanks. I need records that have changed since the last time the query was run. CHANGED_DATE is the only indicator of that.

    For some tables I only need a subset of data, hence for those there are also some specific columns I can set criteria on to narrow the query - like the STATUS column in this example. For most tables I need all records, so CHANGED_DATE is the only criteria.

    I'm not bothered to have them in order, but I do need to limit the number of rows retrieved - it is not uncommon, for example, for millions of customer records to be updated with the latest media code representing some mailshot they have been sent. Pulling all those across the WAN in one go would not be good!

    So my solution was to retrieve a "batch" ordered by CHANGE_DATE and the PK columns so that I could have a repeatable query that I could "resume" from the last record received.

    But I'm open to ideas that would perform better whilst allowing me to batch the downloads; this method is just what I have dreamt up ... happy to learn of any better methods, or methods that perform better under Oracle.

  12. #12
    Join Date
    Jul 2004
    Posts
    12
    Meant to ask this before:

    Is there some sort of LIMIT statement in Oracle? (I'm thinking of PostgreSQL / MySQL syntax)

    SELECT Col1, Col2, ...
    FROM MyTable
    ORDER BY PKCol1, PKCol2, ...
    LIMIT 10000

    or some other gadget e.g. intended for Front-end Paging, which might be processed as part of the SELECT, rather than filtering the retrieved data (as ROWNUM does)?

  13. #13
    Join Date
    Aug 2003
    Location
    Where the Surf Meets the Turf @Del Mar, CA
    Posts
    7,776
    Provided Answers: 1
    >Is there some sort of LIMIT statement in Oracle? (I'm thinking of PostgreSQL / MySQL syntax)
    NO

    closest is
    where ROWNUM < 1000
    You can lead some folks to knowledge, but you can not make them think.
    The average person thinks he's above average!
    For most folks, they don't know, what they don't know.
    Good judgement comes from experience. Experience comes from bad judgement.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •