Results 1 to 6 of 6
  1. #1
    Join Date
    Jul 2004
    Posts
    5

    Unanswered: Performance Issues with Java Prepared Statments and Timestamps

    Hello everybody,

    in our company's production environment, we are suffering from severe performance problems with our DB2 database. Our Setup is the following:

    Fujitsu Siemens Server with 4 CPUs (2GHz Xeon), 8 GB RAM
    DB2 Workgroup Edition 8.1.5
    Windows 2000 Advanced Server.

    While tuning our web application, which is written in Java, I noticed a strange issue with Prepared Statements: When we are using a Prepared Statement which has a Timestamp as a host variable, it takes significantly longer to execute the statement than without the host variable.

    For illlustration, I wrote the example program below. The Statement without the host variable takes about 100 ms in a test environment to execute 10 times. (I filled the table with 15000 rows with test data). The other statement takes more than one second to execute 10 times.

    Why is this so and what can be done???

    Any help is appreciated, I'm despreate!!!

    Greetings, Beegee.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Timestamp;
    import java.util.Calendar;
    import java.util.Properties;

    public class Main {

    public static void main(String[] args) {

    try {
    System.out.println("Connecting...");
    Class.forName("com.ibm.db2.jcc.DB2Driver");
    Connection con = DriverManager.getConnection("jdbc:db2://localhost:50000/test_tcp","db2admin","db2admin");

    System.out.println("Done.");

    // PreparedStatement ps = con.prepareStatement("select id from table1 where created_at < '2004-07-22-02.20.00.000000'");
    PreparedStatement ps = con.prepareStatement("select * from table1 where created_at < ?");

    Calendar c = Calendar.getInstance();
    c.set(2004, 07, 22, 2, 20, 0);
    long ts = c.getTimeInMillis();

    long start = Calendar.getInstance().getTimeInMillis();

    ps.setTimestamp(1, new Timestamp(ts));

    for( int i = 0; i < 100; i++)
    ps.execute();

    long end = Calendar.getInstance().getTimeInMillis();

    System.out.println("Execute lasted " + (end-start) + " ms.");
    }
    catch( Exception e ) {
    e.printStackTrace();
    }
    }
    }

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    I know very little about Java, but I would try one of the following:

    1. If all 10 inserts need to have the same timestamp, then issue the following SQL statement to retrieve it:

    SELECT current timestamp from SYSCAT.DUMMY1

    2. If each insert can have its own timestamp, then use "current timestamp" instead of the host variable, or define the column as NOT NULL WITH DEFAULT and do not insert that column (DB2 will put the current timestamp in the column when a row is inserted).
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Jul 2004
    Posts
    5
    Hi Marcus,

    thanks for your answer. If I get you right, you suggest not to use timestamps as host variables.

    However, this is no valid solution for our data model. We use timestamps in a number of tables and users connect to the database through a web application.

    Therefore, we must be flexible with parameters in statements issued by the web application, since we can't hardcode any possible combination of parameters into the application.

    I don't want to emphasize the Java aspect to much, so, generally speaking, to avoid the prepare/execute model isn't a valid option either, because the application's persitence framework works with prepared statements exclusively.

    Although I didn't try with ODBC or a stored procedure yet (anything that can do prepare/execute), I don't think this is a real Java issue, but more a matter of the way DB2 manages prepared statements with host variables (which we just can't avoid).

    Again, thanks for your answer.

    Rüdiger.

  4. #4
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Looks like optimizer tricks to me. When you supply a literal timestamp the optimizer seems to be selecting a more efficient plan than in the case where the timestamp value is not known at the preparation time.

    Try comparing the execution plans for both statements.

  5. #5
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    No, I wasn't suggesting that you not use host variables for a timestamp on a query. I thought the problem was in retrieving the system date (I told you I didn't know much about Java). If your problem SQL statement is:

    select * from table1 where created_at < ?

    then in some cases DB2 will make certain assumptions about what percent of the table needs to be read to satisfy this query. If the value is actually hard-coded (instead of host variable), then DB2 has a better idea of the number of rows that might be returned.

    I assume you have an index on "created_at"? If so, then DB2 might not use the index with a less than (<) predicate unless you give it some hints that the number of rows returned will be relatively small. If the number of rows returned is relatively large (for example, half the table), and the table is not clustered on "create-at", then DB2 may not use the index on this column becasue it faster to scan the whole table without the index.

    First, make sure your table is reorged on a reasonable interval. If your table has a clustering index, then you should reorg on that index. You may have the wrong clustering index defined, but that is another matter.

    Next, make sure you do comprehensive runstats on the table and key columns with distribution of key columns. This is very important.

    For your SQL statement, you can try adding the optimize clause:

    select * from table1 where created_at < ? optimize for 1 row

    This will tell the optimizer that, even though you are using a less than predicate (<), the number or rows returned is small and that DB2 should use an index on that column (if there is one).

    You can test the access paths with the visual explain to see how it will work, with and without the optimize clause (but make sure you do the reorg and runstats with distribution).

    Let us know how it works out.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  6. #6
    Join Date
    Jul 2004
    Posts
    5

    Smile Distribution on key columns rules!

    Thanks for your most valuable insight . The distribution on key columns did the trick.

    Although our production crew does runstats where necessary on a regularly basis, there was quite a cluster ratio on on of the concerned tables.
    When I did the runstats in my test environment (after copying the data from the production database via the export/load utitlity), the performance gain was 80% and the gap between preparing the statement with and without host variables has completely vanished!

    Greetings, Beegee

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •