If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > DB2 > Aggregating numeric rows - summary table

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 12-05-07, 12:49
scottr8 scottr8 is offline
Registered User
 
Join Date: Nov 2004
Posts: 7
Question Aggregating numeric rows - summary table

I have the following SQL which is creating large temporary table to satisfy various joins. The data in some of these tables has grown an as such what was running in serveral hours is running in several days. Does anyone see and obivious alternative to this SQL:
Code:
insert into
                product_ed_month
                (
                        customer_id
                        ,product_id
                        ,identifier
                        ,month
                        ,edition_id
                        ,sub_sessions
                        ,queries
                )
        select
                customer_id
                ,product_id
                ,identifier
                ,month
                ,edition_id
                ,count(*) as sub_sessions
                ,sum(queries) as queries
        from
        (
                select
                        st.customer_id
                        ,st.identifier
                        ,er.product_id
                        ,st.month
                        ,pe.edition_id
                        ,erss.event_record_id
                        ,count(*) as queries
                from
                        session_tmp st
                        ,event_record er
                        ,event e
                        ,event_attribute ea
                        ,attribute a
                        ,product_edition pe
                        ,event_record erss
                        ,event ess
                where
                        st.month >= '10/01/2007'
                        and st.month <= '10/30/2007'
                        and st.session_event_id = er.session_event_id
                        and er.product_id = pe.product_id
                        and er.event_id = e.event_id
                        and e.event_name = 'event_query'
                        and er.event_record_id = ea.event_record_id
                        and ea.attribute_id = a.attribute_id
                        and a.attribute_name = 'edition'
                        and rtrim(ea.attribute_value) = pe.edition_id
                        and pe.edition_id != 'UNDEF'
                        and st.session_event_id = erss.session_event_id
                        and erss.event_id = ess.event_id
                        and ess.event_name like 'event_type%'
                        and er.product_id = erss.product_id
                group by
                        st.customer_id
                        ,er.product_id
                        ,st.identifier
                        ,st.month
                        ,pe.edition_id
                        ,erss.event_record_id
        ) as q
        group by
                customer_id
                ,product_id
                ,identifier
                ,month
                ,edition_id
        ;

Last edited by gvee; 12-05-07 at 13:57. Reason: [CODE] tags added to retain formatting and read-ability
Reply With Quote
  #2 (permalink)  
Old 12-05-07, 13:18
amthomas amthomas is offline
Registered User
 
Join Date: May 2005
Location: San Antonio, Texas
Posts: 134
hehe, can you edit your post and put that in some code tags to make it easier to read? (just a suggestion)

do you have a query analyzer in whatever system you are using? That might be the fastest way to determine what your slow spots are.

I imagine there is a lot of sequential scanning going on there which could make it slow. I would be wary of all those inner joins there. But... im not to experienced in this area.
__________________
Vi veri veniversum vivus vici
By the power of truth, I, a living man, have conquered the universe
Reply With Quote
  #3 (permalink)  
Old 12-05-07, 13:36
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
looking at the GROUP BY clauses, you might consider WITH ROLLUP (if your database supports this, and i'm betting it does, because the date formats and RTRIM suggest that it's microsoft sql server) to avoid one level of nesting

and i would immediately rewrite the inner query with proper JOIN syntax

this --
Code:
and e.event_name = 'event_query'
and er.event_record_id = ea.event_record_id
and ea.attribute_id = a.attribute_id
and a.attribute_name = 'edition'
sounds suspiciously like an EAV scheme, and these are notoriously slow
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #4 (permalink)  
Old 12-05-07, 14:09
scottr8 scottr8 is offline
Registered User
 
Join Date: Nov 2004
Posts: 7
I'm using DB2 on linux version 8. I'll have to propose your suggestion to our developers.
Reply With Quote
  #5 (permalink)  
Old 12-05-07, 14:17
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
sorry, i don't think WITH ROLLUP works in DB2

would you like me to move this thread to the DB2 forum? this is the ANSI SQL forum
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #6 (permalink)  
Old 12-05-07, 14:29
scottr8 scottr8 is offline
Registered User
 
Join Date: Nov 2004
Posts: 7
DB2 does support ROLLUP when defining a SUMMARY TABLE.

If the DB2 forum would be better environment to ask this question, please move it by all means.

Thanks for your help for far.
Reply With Quote
  #7 (permalink)  
Old 12-23-07, 15:32
Peter.Vanroose Peter.Vanroose is offline
Registered User
 
Join Date: Sep 2004
Location: Belgium
Posts: 1,079
What about the following single GROUP BY ?
If I'm not mistaken, it returns the same result as the double one:
Code:
insert into product_ed_month
  (customer_id,product_id,identifier,month,edition_id,sub_sessions                    ,queries)
 select
                st.customer_id
               ,er.product_id
               ,st.identifier
               ,st.month
               ,pe.edition_id
               ,count(DISTINCT erss.event_record_id) as sub_sessions
               ,count(*) as queries
from
               session_tmp st
              ,event_record er
              ,event e
              ,event_attribute ea
              ,attribute a
              ,product_edition pe
              ,event_record erss
              ,event ess
where
             st.month BETWEEN '2007-10-01' AND '2007-10-30'
         and st.session_event_id = er.session_event_id
         and er.product_id = pe.product_id
         and er.event_id = e.event_id
         and e.event_name = 'event_query'
         and er.event_record_id = ea.event_record_id
         and ea.attribute_id = a.attribute_id
         and a.attribute_name = 'edition'
         and rtrim(ea.attribute_value) = pe.edition_id
         and pe.edition_id != 'UNDEF'
         and st.session_event_id = erss.session_event_id
         and erss.event_id = ess.event_id
         and ess.event_name like 'event_type%'
         and er.product_id = erss.product_id
group by
         st.customer_id
        ,er.product_id
        ,st.identifier
        ,st.month
        ,pe.edition_id
I would also get rid of the "RTRIM" in the WHERE condition -- this makes it stage-1 and hence indexable, while it does not change the condition.
__________________
--_Peter Vanroose,
__IBM Certified Database Administrator, DB2 9 for z/OS
__IBM Certified Application Developer
__ABIS Training and Consulting
__http://www.abis.be/

Last edited by Peter.Vanroose; 12-23-07 at 15:39.
Reply With Quote
  #8 (permalink)  
Old 01-02-08, 10:26
scottr8 scottr8 is offline
Registered User
 
Join Date: Nov 2004
Posts: 7
Thanks for your help with this. I think the application developers will be looking at restructing the SQL. Those two GROUP BY's do seem redundant.
With help from IBM support "statistical views" were created to influence the optimizier. The original explain plan deduced that there would be 500 resultant rows were in reality the were 750,000 rows so the plan generated wasn't quite correct.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On