Results 1 to 11 of 11

Thread: group by

  1. #1
    Join Date
    Mar 2004
    Location
    Warsaw / Poland
    Posts
    73

    Unanswered: group by

    Hello,

    I got following problem:

    Code:
    SQL> select ID, max(IDName) from t group by ID;
    
    10071 rows selected.
    
    Elapsed: 00:00:00.09
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=10071 Bytes=171207)
       1    0   SORT (GROUP BY NOSORT) (Cost=26 Card=10071 Bytes=171207)
       2    1     INDEX (FULL SCAN) OF 'TECH_NM1' (NON-UNIQUE) (Cost=26 Card=210669 Bytes=3581373)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
           1488  consistent gets
              0  physical reads
              0  redo size
         202340  bytes sent via SQL*Net to client
           7880  bytes received via SQL*Net from client
            673  SQL*Net roundtrips to/from client
              0  sorts (memory)
              0  sorts (disk)
          10071  rows processed
    It's ok. But
    Code:
    SQL> select ID, glue(IDName) from t group by ID;
    
    10071 rows selected.
    
    Elapsed: 00:00:22.05
    
    Execution Plan
    ----------------------------------------------------------
       0      SELECT STATEMENT Optimizer=CHOOSE (Cost=26 Card=10071 Bytes=171207)
       1    0   SORT (GROUP BY) (Cost=26 Card=10071 Bytes=171207)
       2    1     INDEX (FULL SCAN) OF 'TECH_NM1' (NON-UNIQUE) (Cost=26 Card=210669 Bytes=3581373)
    
    Statistics
    ----------------------------------------------------------
              0  recursive calls
              0  db block gets
            818  consistent gets
              0  physical reads
              0  redo size
        2231606  bytes sent via SQL*Net to client
           7880  bytes received via SQL*Net from client
            673  SQL*Net roundtrips to/from client
              1  sorts (memory)
              0  sorts (disk)
          10071  rows processed
    Glue is user defined aggregate (similiar to STRAGG from AskTom). What it depends on to process SORT (GROUP BY NOSORT) or SORT (GROUP BY)? And maybe there's other reason for slowing down second query?

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    I don't know what causes the sort, but I suspect that most of the performance issues will relate to your "glue" function, which is being performed on 200,000 rows of data i.e. 10,000 rows per second, 0.0001 seconds per row! It only needs a small inefficiency in that function to have a dramatic effect on performance.

    Fancy posting the code?

  3. #3
    Join Date
    Mar 2004
    Location
    Warsaw / Poland
    Posts
    73
    Here goes the code:
    Code:
    CREATE OR REPLACE TYPE GlueImpl AS OBJECT
    (
      l_x VARCHAR2(4000), -- our string 
      STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT GlueImpl) RETURN NUMBER,
      MEMBER FUNCTION ODCIAggregateIterate(self IN OUT GlueImpl, p_value IN VARCHAR2) RETURN NUMBER,
      MEMBER FUNCTION ODCIAggregateTerminate(self IN GlueImpl, p_returnValue OUT VARCHAR2, p_flags IN NUMBER) RETURN NUMBER,
      MEMBER FUNCTION ODCIAggregateMerge(self IN OUT GlueImpl, ctx2 IN GlueImpl) RETURN NUMBER
    );
    /
    Code:
    CREATE OR REPLACE TYPE BODY GlueImpl IS 
    
    STATIC FUNCTION ODCIAggregateInitialize(sctx IN OUT GlueImpl) 
    RETURN NUMBER IS 
    BEGIN
      sctx := GlueImpl('');
      
      RETURN ODCIConst.Success;
    END;
    
    MEMBER FUNCTION ODCIAggregateIterate(self IN OUT GlueImpl, p_value IN VARCHAR2) 
    RETURN NUMBER IS
    BEGIN
      IF self.l_x IS NULL THEN
        self.l_x := p_value;
      ELSE 
        self.l_x := self.l_x || ',' || p_value;
      END IF; 
      RETURN ODCIConst.Success;
    END;
    
    MEMBER FUNCTION ODCIAggregateMerge(self IN OUT GlueImpl, ctx2 IN GlueImpl) 
    RETURN NUMBER IS
    BEGIN
      NULL;
      RETURN ODCIConst.Success;
    END;
    
    MEMBER FUNCTION ODCIAggregateTerminate(self IN GlueImpl, p_returnValue OUT VARCHAR2, p_flags IN NUMBER) 
    RETURN NUMBER IS
    BEGIN
      p_returnValue := self.l_x;
      RETURN ODCIConst.Success;
    END;
    END;
    /
    And aggregate
    Code:
    CREATE OR REPLACE FUNCTION Glue (
      p_x VARCHAR2
    ) 
    RETURN VARCHAR2 
    PARALLEL_ENABLE AGGREGATE USING GlueImpl;

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Well, it's hard to see how you could simplify your code much further

    So perhaps this is just an unavoidable overhead of user-defined aggregate functions, which are sure to be somewhat slower than built-in agregate functions. As I said before, your ODCIAggregateIterate function gets called 200,000 times, which works out at around 10,000 calls per second - which sounds quite impressive actually!

  5. #5
    Join Date
    Mar 2004
    Location
    Warsaw / Poland
    Posts
    73
    I've tried to remove if statement but that didn't help

    You said ,,So perhaps this is just an unavoidable overhead of user-defined aggregate functions'' - yes, of course there must be some. But thing which really interests me is the additional sort although the index is present. I think this is the problem, and I can't figure out why is it executed.

    Anyway, thank you for your time.

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    I agree that the only way to know for sure is to resolve the sort issue, but I don't believe it will turn out to be the cause of the problem: 20 seconds to sort 10,000 rows? I don't think so.

    How about verifying the impact of the sort by forcing the first query to do one? e.g.

    select ID, max(IDName) from t group by ID order by 2;

    If that takes 20 seconds then it is the sort; if it is still under a second, it isn't!

  7. #7
    Join Date
    Mar 2004
    Location
    Warsaw / Poland
    Posts
    73
    I've duplicated message...and I can't drop it.
    Last edited by HrabiaGrzegorz; 12-21-04 at 07:53.

  8. #8
    Join Date
    Mar 2004
    Location
    Warsaw / Poland
    Posts
    73
    I've done this right after posting previous message. And, as you said ,,20 seconds to sort 10,000 rows? I don't think so.''
    So that is problem with user defined aggregate itself. Bad news.

    But maybe there is solution to my problem, which you'll (or anyone else) suggest:
    I have a table, which holds data from external application.
    Code:
    create table t (ID NUMBER, IDType VARCHAR2(20), IDName VARCHAR2(255), IDValue VARCHAR2(645));
    One ID has some attributes - IDNames with their values IDValues (IDType isn't important). Set of IDNames can change since it depends on that external application. I need to create routine, which would enable usage of this table. Let's now try to select all of id's with their IDValues as columns:
    One approach is to use outer joins -
    Code:
    SELECT t1.id, t2.IDValue AS attr1, t3.IDValue AS attr2, ...
      FROM (
        SELECT DISTINCT id FROM t
      ) t1
        LEFT OUTER JOIN t t2 ON t.id = t2.id AND t2.IDName = attr1
        LEFT OUTER JOIN t t3 ON t.id = t3.id AND t3.IDName = attr2
        ...
    But when we need lot of attributes it isn't acceptable.
    Another approach was to (assuming ,,&'' is special char which isn't used in IDName/IDValue)
    Code:
    SELECT ID,
        DECODE(INSTR(glue(IDName || '&' || IDValue || '&'), 'attrname' || '&'), 0, NULL,
          SUBSTR(glue(IDName || '&' || IDValue || '&'), 
            INSTR(glue(IDName || '&' || IDValue || '&'), 'attrname' || '&') + LENGTH('attrname' || '&'), 
            INSTR(glue(IDName || '&' || IDValue || '&'), '&', 
              INSTR(glue(IDName || '&' || IDValue || '&'), 'attrname' || '&') + LENGTH('attrname' || '&')
            ) - INSTR(glue(IDName || '&' || IDValue || '&'), 'attrname' || '&') - LENGTH('attrname' || '&')
          )
        ) AS attrname
      FROM t
      GROUP BY ID
    But this one is very slow when we need few attributes.

    I'm almost sure this problem has already been solved many times...

  9. #9
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Yuk - the dreaded "EAV" model!!!

    What you need is a dynamic PIVOT query. I have a package called PIVOT that can do that - follow the link to my blog in my signature line to get the source of package PIVOT and PARSE (which is used by PIVOT).

    Say your table T currently contains this:
    Code:
    SQL> select * from t;
    
            ID IDTYPE               IDNAME     IDVALUE
    ---------- -------------------- ---------- ----------
             1                      COLOUR     RED
             1                      PATTERN    CHECK
             1                      SIZE       42
             2                      COLOUR     BLUE
             2                      SIZE       38
             3                      COLOUR     GREEN
             3                      PATTERN    STRIPE
    
    7 rows selected.
    What you need is a query like this:
    Code:
    SQL> select ID
      2  ,      MAX(DECODE(IDNAME,'COLOUR', IDVALUE)) as "COLOUR"
      3  ,      MAX(DECODE(IDNAME,'PATTERN', IDVALUE)) as "PATTERN"
      4  ,      MAX(DECODE(IDNAME,'SIZE', IDVALUE)) as "SIZE"
      5  from   T
      6  group by ID
      7  order by ID;
    
            ID COLOUR     PATTERN    SIZE
    ---------- ---------- ---------- ----------
             1 RED        CHECK      42
             2 BLUE                  38
             3 GREEN      STRIPE
    The PIVOT package can generate such a query dynamically, and either print the query or return a ref cursor to the result set like this:
    Code:
    SQL> var rc refcursor
    SQL> begin
      2    :rc := pivot.pivot_cursor
      3      ( 'ID'
      4      , 'IDNAME'
      5      , 'T'
      6      , 'IDVALUE'
      7      , NULL
      8      , 'MAX'
      9      );
     10  end;
     11  /
    
    PL/SQL procedure successfully completed.
    
    SQL> print rc
    
            ID COLOUR     PATTERN    SIZE
    ---------- ---------- ---------- ----------
             1 RED        CHECK      42
             2 BLUE                  38
             3 GREEN      STRIPE
    Note that the columns after ID are being generated dynamically by the function based on the current data in the table, so if we add a new IDName we get another column:
    Code:
    SQL> insert into t values (3,null,'QUALITY','A1');
    
    1 row created.
    
    SQL> begin
      2    :rc := pivot.pivot_cursor
      3      ( 'ID'
      4      , 'IDNAME'
      5      , 'T'
      6      , 'IDVALUE'
      7      , NULL
      8      , 'MAX'
      9      );
     10  end;
     11  /
    
    PL/SQL procedure successfully completed.
    
    SQL> print rc
    
            ID COLOUR     PATTERN    QUALITY    SIZE
    ---------- ---------- ---------- ---------- ----------
             1 RED        CHECK                 42
             2 BLUE                             38
             3 GREEN      STRIPE     A1
    I can't vouch for the performance, but it shouldn't be too bad I believe.

  10. #10
    Join Date
    Mar 2004
    Location
    Warsaw / Poland
    Posts
    73
    Dynamic pivot query...it was obvious ...
    Yes, this package seems to be exactly what I need. I like it. Thank you!
    Also, it does the query in ~4 sec while my proc needs ~20 sec. It's easy to choose the winner now

    You said ,,Yuk - the dreaded "EAV" model!!!'' - I've read your blog site before, I agree this is not good idea for storing data. But in this specific situation I cannot change it to create table t (id, idtype, idname1, idname2, ..., idnameN) because I don't know how many attributes will be produced by that application metioned above.

    Great thanks, Tony!

  11. #11
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Provided Answers: 1
    Yes, it could well be that your situation is actually the one where EAV makes sense! i.e. the requirements really are that generic.

    Glad you like the package.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •