Page 1 of 3 123 LastLast
Results 1 to 15 of 34
  1. #1
    Join Date
    Mar 2012
    Posts
    120

    Unanswered: need to speed up queries

    Hi,
    I'm using DB2 10.1 ESE on Linux.

    My application runs 3 queries that are similar and all take long time to complete.


    The queries structure is:

    select t.col2, t.col3, t.col4 from MYSCHEMA.BIG_DATA_TABLE t , SESSION.ENABLED_SYSTEMS S where t.col1=s.col1 and t.col5=1

    SESSION.ENABLED_SYSTEMS is a temporary table that is shared among all the 3 queries, it contains just an integer and has a size up to 300K rows.

    MYSCHEMA.BIG_DATA_TABLE is a different table for the 3 queries, but in all the cases it contains a big number of rows (60 million the first, 30 million the second, 20 million the third).

    In two cases all the involved columns are integer or smallint, just in the first case col3 is varchar(8000).

    I tried a "index only access" approach and created an index for each of the 3 tables like:
    create index MYSCHEMA.BIG_DATA_TABLE_IDX on MYSCHEMA.BIG_DATA_TABLE (col1,col5,col2,col3,col4)

    (in order of decreasing cardinality for the predicate columns, couldn't use include clause because the index cannot be unique)

    but the queries take too much time (from 2 to 4 minutes), I need to dramatically cut this time.

    Could you please suggest how can achieve this?

    Thank you!

  2. #2
    Join Date
    Aug 2001
    Location
    UK
    Posts
    4,650
    Have you generated explain plan to see where the bottleneck is ?

    Have you tried an index on SESSION.ENABLED_SYSTEMS ?
    Visit the new-look IDUG Website , register to gain access to the excellent content.

  3. #3
    Join Date
    Jul 2009
    Location
    NY
    Posts
    963

    Lightbulb Try the following

    select t.col2, t.col3, t.col4 from MYSCHEMA.BIG_DATA_TABLE t , SESSION.ENABLED_SYSTEMS S where t.col1=s.col1 and t.col5=1
    ==> Try the both of following queries

    1.
    Code:
    select t.col2, t.col3, t.col4 
    from (select distinct col1 from 
      SESSION.ENABLED_SYSTEMS) S
    , MYSCHEMA.BIG_DATA_TABLE  t   
    where t.col1=s.col1 and t.col5=1
    2.
    Code:
    select t.col2, t.col3, t.col4 
    from SESSION.ENABLED_SYSTEMS S
    , TABLE
    (select distinct t1.col2, t1.col3, t1.col4 
    MYSCHEMA.BIG_DATA_TABLE  t1   
    where t1.col1=s.col1 and t1.col5=1 ) T
    Lenny

  4. #4
    Join Date
    Mar 2012
    Posts
    120
    @sathyaram_s

    the index on SESSION.ENABLED_SYSTEMS didn't give particular enhancement

    about the explain: total cost is 86369.5 of which 60396.3 is spent in the MYSCHEMA.BIG_DATA_TABLE_IDX scan and only 1913.75 in the temp table scan

    @Lenny

    I'll try it asap, I'll let you know

  5. #5
    Join Date
    Mar 2003
    Posts
    280
    Your approach looks fine, is statistics up to date for the involved tables / indexes? I would aim for an index on the session table and create it with collect sampled detailed statistics

  6. #6
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    How about one or both of these examples?

    Example 1:
    Code:
    CREATE INDEX myschema.big_data_table_idx2
     ON myschema.big_data_table
     ( col5 , col1 , col2 , col3 , col4 )
    Note: the sequence of columns is different from your index MYSCHEMA.BIG_DATA_TABLE_IDX.


    Example 2:
    Code:
    SELECT col2 , col3 , col4
     FROM  myschema.big_data_table AS t
     WHERE t.col5 = 1
       AND EXISTS
           (SELECT 0
             FROM  session.enabled_systems AS s
             WHERE s.col1 = t.col1
           )
    Note: IN predicate is also possible instead of EXISTS predicate.

    .

  7. #7
    Join Date
    Aug 2008
    Posts
    147
    You can get some very useful information using the db2exfmt commnand . It give some good info , which may help speed the query. There is some usage information on : DB2 Tuning Toolkit
    ---------------------------------
    www.sqlserver-dba.com
    www.dba-db2.com

  8. #8
    Join Date
    Mar 2012
    Posts
    120
    I tried all the suggested solutions but only got an improvement of few seconds...

    About db2exfmt, I also obtained section actuals but didn't find anything interesting....

    Is there a way to know if the time is spent doing I/O or the required pages are already in the bufferpool?

  9. #9
    Join Date
    Mar 2012
    Posts
    120
    Some info that may help

    More or less the same time of the query is the same as this other one:
    select t1.col2, t1.col3, t1.col4 from MYSCHEMA.BIG_DATA_TABLE t1
    where t1.col5=1

    and also the rows returned are the very similar (20 to 30 million)

    I took the db2exfmt for this very small query and the access is done with the index proposed by tonkuma.

    Furtermore I see :

    Objects Used in Access Plan:
    ---------------------------

    Schema: MYSCHEMA Name: IND1
    Type: Index
    Last statistics update: 2013-03-16-14.58.56.384524
    Number of rows: -1
    Number of buffer pool pages: -1

    Distinct row values: No
    Tablespace name: USERSPACE1
    Tablespace overhead: 6.725000
    Tablespace transfer rate: 0.320000
    Prefetch page count: 32
    Container extent page count: 32
    Index clustering statistic: 0.997389
    Index leaf pages: 30219
    Index tree levels: 3
    Index full key cardinality: 35536675
    Base Table Schema: MYSCHEMA
    Base Table Name: BIG_DATA_TABLE
    Columns in index:
    COL5(A)
    COL2(A)
    COL3(A)
    COL4(A)
    Last edited by Robert1973; 03-16-13 at 10:42.

  10. #10
    Join Date
    Mar 2012
    Posts
    120
    Other info using the procedure in the link provided by JackVamvas.

    IXSCAN: (Index Scan)
    Cumulative Total Cost: 27673.1
    Cumulative CPU Cost: 3.06737e+10
    Cumulative I/O Cost: 15263.1
    Cumulative Re-Total Cost: 9870.04
    Cumulative Re-CPU Cost: 3.05793e+10
    Cumulative Re-I/O Cost: 0
    Cumulative First Row Cost: 14.115
    Estimated Bufferpool Buffers: 15263.5

    Arguments:
    ---------
    CUR_COMM: (Currently Committed)
    TRUE
    LCKAVOID: (Lock Avoidance)
    TRUE
    MAXPAGES: (Maximum pages for prefetch)
    15109
    PREFETCH: (Type of Prefetch)
    SEQUENTIAL,READAHEAD
    ROWLOCK : (Row Lock intent)
    SHARE (CS/RS)
    SCANDIR : (Scan Direction)
    FORWARD
    SKIP_INS: (Skip Inserted Rows)
    TRUE
    TABLOCK : (Table Lock intent)
    INTENT SHARE
    TBISOLVL: (Table access Isolation Level)
    CURSOR STABILITY


    Objects Used in Access Plan:
    ---------------------------

    Schema: MYSCHEMA
    Name: IND1
    Type: Index
    Time of creation: 2013-03-16-14.55.22.841232
    Last statistics update: 2013-03-16-14.58.56.384524
    Number of columns: 4
    Number of rows: 35536675
    Width of rows: -1
    Number of buffer pool pages: 92550
    Distinct row values: No
    Tablespace name: USERSPACE1
    Tablespace overhead: 6.725000
    Tablespace transfer rate: 0.320000
    Source for statistics: Single Node
    Prefetch page count: 32
    Container extent page count: 32
    Index clustering statistic: 0.997389
    Index leaf pages: 30219
    Index tree levels: 3
    Index full key cardinality: 35536675
    Index first key cardinality: 2
    Index first 2 keys cardinality: 493616
    Index first 3 keys cardinality: 2016901
    Index first 4 keys cardinality: 35536675
    Index sequential pages: 30218
    Index page density: 99
    Index avg sequential pages: 30218
    Index avg gap between sequences:0
    Index avg random pages: 0


    Maybe the problem can be that the index is not in the bufferpool after the first query run..... I think if I can avoid any I/O at least I can solve the problem starting from the second run of each query ...

    Please note I'm using IBMDEFAULTBP with 250000 32K pages

  11. #11
    Join Date
    Nov 2011
    Posts
    334
    Could you plz publish the output of db2exfmt and db2batch for these two queris?
    db2batch -d <dbname> -i complete -o e yes p 5 r 0 -f query file
    db2exfmt -d <dbname> -1

  12. #12
    Join Date
    Mar 2012
    Posts
    120
    attached you can find the requested output for one of the query (the other is similar)

    thanks!
    Attached Files Attached Files

  13. #13
    Join Date
    Nov 2011
    Posts
    334
    I think the step cosuming the most of time is fetch.
    * Prepare Time is: 0.059704 seconds
    * Execute Time is: 0.044291 seconds
    * Fetch Time is: 218.857541 seconds
    * Elapsed Time is: 218.961536 seconds (complete)

    fetch is a single thread operation, it coverts the data from the internal format to SQLDA format. it is a very cpu intensive operation.
    Your fetch speed is 127381 rows per second. it is fast enough.
    So if you what to improve the performance, you can :
    1、use fetch first xxx rows only
    2、put the result into a result table
    3、improve your cpu speed ( need a better one)。

  14. #14
    Join Date
    Mar 2012
    Posts
    120
    Thanks fengsun.

    I need fetching all the rows and cannot improve cpu speed.
    I'd leave an ad hoc table for the results as the ultimate option.

    What I was thinking was using a dedicated bufferpool and tablespace, so at least when I run the same query the second time I have all required data in the bufferpool and avoid any I/O.
    This way I could strongly improve the query from the second run on...

    Do you think it's a good idea?
    If so, how should I define this bufferpool? Should I use blocks? Which page size and size do I need? Which object should I put in the new tablespace?

  15. #15
    Join Date
    Nov 2011
    Posts
    334
    of course you could use a dedicated bufferpool and tablespace.
    but i think performance will not be improved so much, because the bottleneck is
    on fetch operation.
    if you want to to this ,you can :
    1、create a seperately tablespace in pagesize 32k
    2、put the table BIG_DATA_TABLE and indexes into the tablespace
    3、and defined a bufferpool large enough for the tablespace.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •