  1. #1
    Join Date
    Dec 2003
    Location
    Varna
    Posts
    20

    Question Unanswered: Approach to optimising aggregation queries (GROUP BY)

    Can anybody give me a hint - how can I speed up "GROUP BY" queries?

    Why doesn't grouping take advantage of indexes on the grouped columns?

    Any light is highly appreciated!
    Prob Solver

  2. #2
    Join Date
    Nov 2006
    Location
    Indianapolis
    Posts
    90
    (fast=don't use them?)

    Group by shouldn't have much to do with the query plan.

    Is there a where clause? How much data is the query going to have to go through to get the results?

  3. #3
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    You can speed up GROUP BY clauses considerably if you have an index on the same columns as the GROUP BY clause (in the same order) and the columns are NOT NULL. If you are grouping large result sets (say, more than hundreds of thousands of rows) you can get very big performance gains. You can tell whether your index is being used to eliminate the sort step because the execution plan will show SORT GROUP BY NOSORT.
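
    A minimal sketch of the idea on a made-up table (table, index and column names are purely illustrative, and the optimizer may still prefer a full scan depending on stats):

    -- both grouping columns declared NOT NULL so the index covers every row
    create table t
    ( GRP_A NUMBER not null,
      GRP_B NUMBER not null,
      AMOUNT NUMBER
    );

    -- index on exactly the GROUP BY columns, in the same order
    create index IX_T_GRP on t (GRP_A, GRP_B);

    -- if the optimizer walks the index, the plan shows SORT GROUP BY NOSORT
    explain plan for
    select GRP_A, GRP_B, sum(AMOUNT)
    from t
    group by GRP_A, GRP_B;

    select * from table(dbms_xplan.display);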

    Alan

  4. #4
    Join Date
    Dec 2003
    Location
    Varna
    Posts
    20

    Question

    Thanks for the replies, but I see that details are essential
    so - here is a case:


    >>Oracle version:
    10gR2


    >>DataBase Schema Fragment:
    create table ENTITY_IN_DOC
    ( DOC_ID INTEGER not null,
    ENTITY_ID INTEGER not null,
    ENTITY_COUNT INTEGER not null,
    DOC_DATE DATE
    );

    create bitmap index IX_ENTDOC_DOC on ENTITY_IN_DOC (DOC_ID);
    create index IX_ENTDOC_DOCDATE on ENTITY_IN_DOC (DOC_DATE, ENTITY_ID);
    create index IX_ENTDOC_ENTITY on ENTITY_IN_DOC (ENTITY_ID);
    create unique index IX_ENTDOC_ENTITY1 on ENTITY_IN_DOC (ENTITY_ID, DOC_ID);

    >>Query Plan - Estimated (Case 1):
    SELECT /*+ INDEX( ed IX_ENTDOC_DOCDATE ) */
    ed.doc_date, ed.entity_id, sum(ed.entity_count) entity_count
    FROM core_db_entity_in_doc ed
    GROUP BY ed.doc_date, ed.entity_id
    /*
    SELECT STATEMENT, GOAL = ALL_ROWS Cardinality=91288 CPU cost=6313895671 IO cost=820570 Cost=820999
    SORT GROUP BY NOSORT Cardinality=91288 CPU cost=6313895671 IO cost=820570 Cost=820999
    TABLE ACCESS BY INDEX ROWID Object name=CORE_DB_ENTITY_IN_DOC Cardinality=1150347 CPU cost=6313895671 IO cost=820570 Cost=820999
    INDEX FULL SCAN Object name=IX_ENTDOC_DOCDATE Cardinality=1150347 CPU cost=256604022 IO cost=3821 Cost=3838
    */
    >>NOTE: "CPU cost=6313895671"


    >>Query Plan - Estimated (Case 2):
    SELECT
    ed.doc_date, ed.entity_id, sum(ed.entity_count) entity_count
    FROM core_db_entity_in_doc ed
    GROUP BY ed.doc_date, ed.entity_id
    /*
    SELECT STATEMENT, GOAL = ALL_ROWS Cardinality=91288 CPU cost=1364869214 IO cost=2721 Cost=2814
    HASH GROUP BY Cardinality=91288 CPU cost=1364869214 IO cost=2721 Cost=2814
    TABLE ACCESS FULL Object name=CORE_DB_ENTITY_IN_DOC Cardinality=1150347 CPU cost=261698059 IO cost=620 Cost=638
    */
    >>NOTE: "CPU cost=1364869214"


    As I see it - involvement of the appropriate index in the "group by" calculation makes things worse. But this is only the beginning. In real world cases I must add some filtering - before grouping - which will completely prevent the usage of an index for grouping.

    As for avoiding "group by" - !!!
    It's an analytical system, so group bys are 60% of what it does.
    Prob Solver

  5. #5
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    In your particular case the use of the index is much worse than the full table scan (though you should never go by the execution plan cost alone - put tracing on to get the real stats), so Oracle is doing the right thing.
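
    For example, one common way to get the real stats in 10g (just a sketch - adapt to however you normally trace; the tkprof output file name is an example):

    -- trace the session and format the trace file with tkprof
    alter session set sql_trace = true;
    -- ... run the query ...
    alter session set sql_trace = false;
    -- then: tkprof <your_trace_file>.trc out.txt sys=no

    -- or pull actual row counts straight from the cursor cache
    select /*+ gather_plan_statistics */ ed.doc_date, ed.entity_id, sum(ed.entity_count)
    from core_db_entity_in_doc ed
    group by ed.doc_date, ed.entity_id;

    select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));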

    Another way to tune your query may be to create an index on (DOC_DATE, ENTITY_ID, ENTITY_COUNT) (maybe with compress 1).
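
    Something along these lines (a sketch - the index name is just an example, and you should verify with real stats whether it pays off):

    -- "covering" index: the query can be answered from the index alone,
    -- skipping the TABLE ACCESS BY INDEX ROWID step
    create index IX_ENTDOC_DOCDATE_CNT on ENTITY_IN_DOC (DOC_DATE, ENTITY_ID, ENTITY_COUNT) compress 1;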

    Alan

  6. #6
    Join Date
    Dec 2003
    Location
    Varna
    Posts
    20
    O.k. - I've done some more testing so:

    1) Adding "ENTITY_COUNT" (the aggregated value column) to the index doesn't improve the situation

    2) Making the index "IX_ENTDOC_DOCDATE" of type BITMAP brought some advantage compared to the B-Tree version:
    SELECT STATEMENT, GOAL = ALL_ROWS Cardinality=91288 CPU cost=510149567 IO cost=8445 Cost=8479
    SORT GROUP BY NOSORT Cardinality=91288 CPU cost=510149567 IO cost=8445 Cost=8479
    TABLE ACCESS BY INDEX ROWID Object name=ENTITY_IN_DOC Cardinality=1150347 CPU cost=510149567 IO cost=8445 Cost=8479
    BITMAP CONVERSION TO ROWIDS
    BITMAP INDEX FULL SCAN Object name=IX_ENTDOC_DOCDATE
    >>NOTE: "CPU cost=510149567" - better than grouping with no index usage/ BUT: "IO cost=8445" worse than no index usage

    So the BITMAP version of the index didn't help either.
    Prob Solver

  7. #7
    Join Date
    Nov 2006
    Location
    Indianapolis
    Posts
    90
    I was of course joking about avoiding group bys...

    In real world cases I must add some filtering - before grouping - which will completely prevent the usage of an index for grouping.
    That's what you want to index on - your most used where clauses... to limit unnecessary i/o.

    Reading large amounts of data via indexes is expensive; at some point the full table scan becomes more cost effective. CBO tries to guess what that point is based on your init parms, stats, etc.

    Your original SQL says "look at every row in the table". In my experience, for large reads an FTS is in many cases a better plan than traversing an index.
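
    For instance (purely illustrative - the date range is an assumption about your typical filter):

    -- with a selective filter, an index leading on DOC_DATE (the existing
    -- IX_ENTDOC_DOCDATE, or the wider one suggested above) limits the rows read
    SELECT ed.doc_date, ed.entity_id, sum(ed.entity_count) entity_count
    FROM core_db_entity_in_doc ed
    WHERE ed.doc_date >= DATE '2007-01-01'
      AND ed.doc_date <  DATE '2007-02-01'
    GROUP BY ed.doc_date, ed.entity_id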

  8. #8
    Join Date
    Dec 2003
    Location
    Varna
    Posts
    20

    Lightbulb

    When I originally posted my question, I thought there was a streamlined approach to constructing aggregating SQLs which I had simply missed. Now I'm starting to think it is too dependent on the situation.

    Here are two things I found while scrambling around and testing SQLs:

    1) Aggregate SQLs use an enormous amount of CPU cycles, but they are low on I/O load - so it's something you can live with. But still - the less, the better.

    2) In a monster SQL I had to optimise, I got a positive result (a speed-up) by reducing the number of tables whose result set is aggregated. The monster SQL was constructed on the idea that the fewer rows it aggregates, the better.
    My redesign was to remove some of the tables and apply their restrictions to the already aggregated result. The aggregation was then over a larger result set, but with a cleaner and more streamlined query plan - which proved to matter more.
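
    For illustration, a rough sketch of that kind of rewrite (the DOCS / DOC_TYPES tables and their columns are made up - only the ENTITY_IN_DOC data comes from the real schema):

    -- before: join everything, then aggregate the whole joined result
    -- SELECT d.region_id, ed.entity_id, sum(ed.entity_count)
    -- FROM core_db_entity_in_doc ed
    --   JOIN docs d ON d.doc_id = ed.doc_id
    --   JOIN doc_types dt ON dt.type_id = d.type_id
    -- WHERE dt.is_active = 'Y'
    -- GROUP BY d.region_id, ed.entity_id

    -- after: aggregate the big table on its own first, then apply the other
    -- tables' restrictions to the (much smaller) aggregated result
    SELECT d.region_id, agg.entity_id, sum(agg.entity_count)
    FROM (SELECT ed.doc_id, ed.entity_id, sum(ed.entity_count) entity_count
          FROM core_db_entity_in_doc ed
          GROUP BY ed.doc_id, ed.entity_id) agg
      JOIN docs d ON d.doc_id = agg.doc_id
      JOIN doc_types dt ON dt.type_id = d.type_id
    WHERE dt.is_active = 'Y'
    GROUP BY d.region_id, agg.entity_id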

    Joining more than 3 tables has the potential to mislead the query planner.
    Prob Solver
