Results 1 to 8 of 8
  1. #1
    Join Date
    May 2005
    Posts
    2

    Question Unanswered: query performance issue

    Hi

    I'm having some performance issues with a select query, it takes about 8 seconds to complete, which seems way to long. Doing some testing it looks like this part takes a fair share of that time:


    Code:
    	(
    		(HI."symbol" in ('MMM', 'AAA') AND ST."country" = 'US') OR
    		(ST."ref" in ('MMM','AAA'))
    	)
    The size of the return set varies quite a bit but is often between 3 and 10,000 rows (in the test taking 8s the return set was 3 rows).

    Query
    Code:
    SELECT
    	HI.*,
    	ST."country",
    	ST."ref",
    	US."userName"
    FROM
    	HISTORIC AS HI,
    	STATIC AS ST,
    	USERS AS US
    WHERE
    	US."userName" in ('test','bla') AND
    	HI."userID" = US."userID"	AND
    	HI."staticID" = ST."staticID"	AND
    	(
    		(HI."symbol" in ('MMM', 'AAA') AND ST."country" = 'US') OR
    		(ST."ref" in ('MMM','AAA'))
    	) AND
    	HI."date" between date('2008-01-01') and date('2008-01-04')
    Anyone who can help me tweak this query?

    Regards
    Andreas

    Extra information

    Row size of tables (if that's interesting):
    Users is a mostly static table (loaded every day) with ~4,000 rows
    Static is also a static table loaded every day with ~800,000 rows
    Historic updates during the day, when I did my test it had ~1,500,000 rows, but it will grow much larger.

    Indexes are available on all tables as follows (each row is a separate index)
    Historic
    • userID
    • staticID
    • symbol
    • date

    Static
    • staticID
    • country
    • ref

    Users
    • UserName
    • userID

  2. #2
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Your choice of indexes doesn't make much sense (ok, it may not be your choice...) Try the db2advis tool to see if it comes up with better indexes.

    I'll leave it to the ever patient Herr Stolze to remind you about providing all relevant details, such as your platform and DB2 version, table definitions, etc.
    ---
    "It does not work" is not a valid problem statement.

  3. #3
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Try eliminating the OR predicate by splitting the query in two parts and linking them together with UNION.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  4. #4
    Join Date
    May 2005
    Posts
    2
    DB2 v8.1.3.96 FixPak 10 running on some version of solaris (not sure which)

    thanks, this helped a lot. I fixed the indexes mess (couldn't find db2advis but reading the log from db2expln gave me some hints) and rewrote the query as:

    Code:
    SELECT
    	HI.*,
    	ST."country",
    	ST."ref",
    	US."userName"
    FROM
    	HISTORIC AS HI,
    	STATIC AS ST,
    	USERS AS US
    WHERE
    	US."userName" in ('test','bla') AND
    	HI."userID" = US."userID"	AND
    	HI."staticID" = ST."staticID"	AND
    	ST."ref" in ('MMM','AAA')        AND
    	HI."date" between date('2008-01-01') and date('2008-01-04')
    UNION
    SELECT
    	HI.*,
    	ST."country",
    	ST."ref",
    	US."userName"
    FROM
    	HISTORIC AS HI,
    	STATIC AS ST,
    	USERS AS US
    WHERE
    	US."userName" in ('test','bla') AND
    	HI."userID" = US."userID"	AND
    	HI."staticID" = ST."staticID"	AND
    	HI."symbol" in ('MMM', 'AAA')  AND 
            ST."country" = 'US'              AND
    	HI."date" between date('2008-01-01') and date('2008-01-04')
    and now it takes less than a second.

    Is there something more I can do with the query to make it go even faster (especially when increasing the number of elements in the "in" lists)?

    also I noticed that the optimizer plan changed quite drastically when running runstats, would it be sane to run runstats on the static tables after they are created each morning?

  5. #5
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    For any table that each day is dropped and recreated, and then loaded, you should run runstats after that process. I would recommend this syntax:

    runstats on table <table-name> with distribution on key columns and detailed indexes all;

    If the data for these tables (which you drop and recreate each day) is substantially the same each time after you load them, you could pull the stats with a db2look (using option to create update statements on the stats) and simply run the update stats each day (instead of actually doing a runstats).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  6. #6
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    Is there something more I can do with the query to make it go even faster (especially when increasing the number of elements in the "in" lists)?
    Use UNION ALL instead of UNION. It saves DB2 the burden of sorting the resultset and removing all doubles.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  7. #7
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by Wim
    Use UNION ALL instead of UNION. It saves DB2 the burden of sorting the resultset and removing all doubles.
    That will work if you know there are no duplicate rows (based on specific knowledge of the data), but it changes the symantics of the original query in this case.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  8. #8
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    That will work if you know there are no duplicate rows (based on specific knowledge of the data), but it changes the symantics of the original query in this case.
    It would be possible to secure no duplication(and not changing the symantics of the original query) by modifiyng and adding extra predicates.
    (I don't know the performance of this updated query would be better than another queries or not.)

    For predicate:
    Code:
    	(
    		(HI."symbol" in ('MMM', 'AAA') AND ST."country" = 'US') OR
    		(ST."ref" in ('MMM','AAA'))
    	)
    Let
    P := HI."symbol" in ('MMM', 'AAA')
    Q := ST."country" = 'US'
    R := ST."ref" in ('MMM','AAA')
    The predicate is:
    (P AND Q) OR R
    := (P AND Q) OR ((P OR NOT P) AND R)
    := (P AND Q) OR ((P AND R) OR (NOT P AND R))
    := ((P AND Q) OR (P AND R)) OR (NOT P AND R)
    := (P AND (Q OR R)) OR (NOT P AND R)
    both predicates of outmost OR have no duplication, because they are ANDed with P and NOT P.

    Replace P, Q, and R with actual predicates,
    result
    (
    HI."symbol" in ('MMM', 'AAA') AND
    ( ST."country" = 'US' OR ST."ref" in ('MMM','AAA') )
    )
    OR
    (
    HI."symbol" NOT in ('MMM', 'AAA') AND
    ST."ref" in ('MMM','AAA')
    )

    Query using UNION ALL would be
    Code:
    SELECT
    	.....
    FROM
    	.....
    WHERE
    	US."userName" in ('test','bla') AND
    	HI."userID" = US."userID"	AND
    	HI."staticID" = ST."staticID"	AND
    	HI."symbol" in ('MMM', 'AAA')	AND
    	( ST."country" = 'US' OR ST."ref" in ('MMM','AAA') )	AND
    	HI."date" between date('2008-01-01') and date('2008-01-04')
    UNION ALL
    SELECT
    	.....
    FROM
    	.....
    WHERE
    	US."userName" in ('test','bla') AND
    	HI."userID" = US."userID"	AND
    	HI."staticID" = ST."staticID"	AND
    	HI."symbol" NOT in ('MMM', 'AAA')	AND
    	ST."ref" in ('MMM','AAA')	AND
    	HI."date" between date('2008-01-01') and date('2008-01-04')

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •