Results 1 to 11 of 11
  1. #1
    Join Date
    Aug 2012
    Posts
    45

    Unanswered: Best method for date comparison

    Greetings all. I have LUW 9.2 on my windows machine, and our mainframe is running z/OS on an IBM z114. In just about every table where there is a date or timestamp column, the columns are defined as CHAR(10) and CHAR(26) respectively, and I'm looking for the most efficient way to evaluate date expressions in a WHERE clause. It would seem to me it would be best to store a date value as a date format, but our system has been around since the 1990's, and I can't speak to why dates are stored as strings. I'm coming from the SQL Server world, so I don't know what you would call the DB2 equivalent of SQL Server's execution plan, but what ever it is, I don't have access to it, so I can't do any testing. I have noticed that if I need to select rows from a table where some dt_col > '2012-01-01', say, when there are many millions of rows in the table the queries seem to run faster if I use DATE(dt_col) > '2012-01-01'. I am wondering if this is because without the DATE() function, there is a difference in how DB2 does the date comparison. Today I tried the same thing with a timestamp column, and it seemed to run a bit faster if I used the TIMESTAMP() function. Now, I am also wondering if maybe some part of the access path was cached from the first couple of times I ran the query, and maybe that accounted for what I perceived to be a decrease in execution time. Using the below as an example, would anyone be willing to explain which where clause is most efficient, and why?

    Code:
    WITH Dates (TS_as_string) AS
    	(SELECT CAST('2013-04-01 00:00:00.000000' AS CHAR(26)) FROM SYSIBM.SYSDUMMY1 UNION
    	 SELECT '2013-04-01 12:00:00.000000' FROM SYSIBM.SYSDUMMY1 UNION
    	 SELECT '2013-04-01 20:00:00.000000' FROM SYSIBM.SYSDUMMY1)
    SELECT
    	TS_as_string
      FROM Dates
     WHERE TS_as_string > '2013-04-01 01:00:00.000000' --<<<< Is this okay?
     -- WHERE TIMESTAMP(TS_as_string) > '2013-04-01 01:00:00.000000' --<<<< Or is this more efficient?
     -- WHERE TIMESTAMP(TS_as_string) > TIMESTAMP('2013-04-01 01:00:00.000000') --<<<< Or maybe this?
    Thank you

    Greg

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    You are making it too complicated. Dates/Timestamps are only strings within an application program. Within an SQL statement, no need to convert them to anything.

    Code:
    create table test_table (
    test_id integer not null,
    created_ts timestamp not null,
    last_update_ts timestamp);
    
    Select test_id from test_table where last_update_ts > current_timestamp - 2 days;
    
    Select test_id from test_table where last_update_ts > '2013-04-01-00.27.00.000000';
    [always use ISO format for date and timestamp in an SQL statement if possible]
    
    Select test_id from test_table where created_ts < (last_update_ts - 7 days):
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Marcus, I think you missed a key point from the original post.
    Quote Originally Posted by gsnidow View Post
    In just about every table where there is a date or timestamp column, the columns are defined as CHAR(10) and CHAR(26) respectively, and I'm looking for the most efficient way to evaluate date expressions in a WHERE clause.
    With that said, I have to say that choosing to store dates as character values strikes me as a very poor practice. This leads me to some questions:

    • Are the character values consistently formatted (ISO preferred but consistent required)?
    • Are the columns indexed?
    • Good heavens, WHY USE CHARACTERS???

    Depending on your answers, there are a couple of ideas that I can offer but I'll hold those until I know enough to be specific!

    You have my condolences!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  4. #4
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by Pat Phelan View Post
    Marcus, I think you missed a key point from the original post.

    With that said, I have to say that choosing to store dates as character values strikes me as a very poor practice. This leads me to some questions:

    • Are the character values consistently formatted (ISO preferred but consistent required)?
    • Are the columns indexed?
    • Good heavens, WHY USE CHARACTERS???

    Depending on your answers, there are a couple of ideas that I can offer but I'll hold those until I know enough to be specific!

    You have my condolences!

    -PatP
    I am not sure he is storing them as CHAR in DB2. Those are the correct host variable definitions inside an application program.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  5. #5
    Join Date
    Aug 2012
    Posts
    45
    Quote Originally Posted by Pat Phelan View Post
    Marcus, I think you missed a key point from the original post.

    With that said, I have to say that choosing to store dates as character values strikes me as a very poor practice. This leads me to some questions:

    • Are the character values consistently formatted (ISO preferred but consistent required)?
    • Are the columns indexed?
    • Good heavens, WHY USE CHARACTERS???

    Depending on your answers, there are a couple of ideas that I can offer but I'll hold those until I know enough to be specific!

    You have my condolences!

    -PatP
    Pat, first, the answers to your questions.
    1) Yes, dates are always YYYY-MM-DD, and timestamps are always YYYY-MM-DD HH:MMS.123456 (the little face thingy is not intended, it should be "colonS"). Always 10 and 26 characters. Now, I don't have a way to check what DB2 says they are, but in our documentation, they are defined as char. And, we actually do have a few fields that were added more recently, and they are defined as date. I have not seen any timestamp columns defined as datetime. It really defies logic in my humble opinion. I mean, back in the 90's I would think every iota of disk space that could be saved would be highly coveted, and it would make more sense to store the dates as dates and not strings. But, alas, I can't change what happened almost 20 years ago.

    2) The timestamps are always indexed from what I can see. And again, I don't have the ability to go looking at the indexes, I'm only relying on the documentation, and can sort of verify by looking at the sort order of a result set when no order by is given.

    3) I could not speak to that one. There is all kinds of stuff in there that seems to defy some basic rules.

    I hope this gives you enough to go on. Thank you.

    Greg

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by gsnidow View Post
    Pat, first, the answers to your questions.
    1) Yes, dates are always YYYY-MM-DD, and timestamps are always YYYY-MM-DD HH:MMS.123456 (the little face thingy is not intended, it should be "colonS"). Always 10 and 26 characters. Now, I don't have a way to check what DB2 says they are, but in our documentation, they are defined as char. And, we actually do have a few fields that were added more recently, and they are defined as date. I have not seen any timestamp columns defined as datetime. It really defies logic in my humble opinion. I mean, back in the 90's I would think every iota of disk space that could be saved would be highly coveted, and it would make more sense to store the dates as dates and not strings. But, alas, I can't change what happened almost 20 years ago.

    2) The timestamps are always indexed from what I can see. And again, I don't have the ability to go looking at the indexes, I'm only relying on the documentation, and can sort of verify by looking at the sort order of a result set when no order by is given.

    3) I could not speak to that one. There is all kinds of stuff in there that seems to defy some basic rules.

    I hope this gives you enough to go on. Thank you.

    Greg
    I think you need to get the DDL for the table and post it here:
    db2look -d <database-name> -t <table-name> -e
    I think this will work on a DB2 z/OS table if accessed via a DB2 Windows/Linux client with DB2 Connect (or gateway).

    If that does not work, use the DB2 Windows Control Center or Data Studio to view the DDL (when connected to mainframe DB2).

    If neither of the above works, ask a DBA for the table DDL. Forget about "documentation".

    Also, there is no datetime in DB2, only the following:
    • date
    • time
    • timestamp


    However, it is true that DB2 z/OS Version 1 did have date or timestamp columns, but no one would have used char(10) for a date back in those days.
    Last edited by Marcus_A; 04-03-13 at 11:43.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  7. #7
    Join Date
    Aug 2012
    Posts
    45
    Quote Originally Posted by Marcus_A View Post
    Also, there is no datatime in DB2, only the following:
    • date
    • time
    • timestamp


    However, it is true that DB2 z/OS Version 1 did have date or timestamp columns, but no one would have used char(10) for a date back in those days.
    Thank you Marcus. I will see if I can get DDL. As for the datetime, I was thinking of SQL Server, my bad. And, I should have indicated datetime2, which is YYYY-MM-DD HH;MM;DD.123456 (colons intentionally replaced with semi-colons to avoid the emoticon popping up). That is what is called timestamp in DB2, I'm gathering.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    If you can run queries against the DB2 instance you can probably get the information that we need to keep this conversation moving using:
    Code:
    SELECT TBCREATOR, TBNAME, NAME
    ,  COLTYPE, LENGTH, SCALE
       FROM SYSIBM.SYSCOLUMNS
       WHERE  'YourTableName' = TBNAME
    If you already have a linked server set up within Microsoft SQL, you can use:
    Code:
    SELECT *
       FROM  OpenQuery(LinkedServerDB2, 'SELECT TBCREATOR, TBNAME, NAME
    ,     COLTYPE, LENGTH, SCALE
          FROM SYSIBM.SYSCOLUMNS
          WHERE  'YourTableName' = TBNAME) AS z
    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Aug 2012
    Posts
    45
    Quote Originally Posted by Pat Phelan View Post
    Code:
    SELECT TBCREATOR, TBNAME, NAME
    ,  COLTYPE, LENGTH, SCALE
       FROM SYSIBM.SYSCOLUMNS
       WHERE  'YourTableName' = TBNAME
    Pat, you are indeed correct, I am able to run the above. Also, either you or Marcus indicated above I should not rely on the documentation, and this proved to be a very apt statement. Below is the column information

    TS_ROW_CREATE TIMESTMP 10 0

    It is indeed a timestamp. That really makes me wonder about all the documentation we have out there. I also find it interesting that many of the system table are basically the same as SQL Server. I guess IBM really is the father of them all. With that question cleared up, is there any merit to my assumption that some part of the access path is cached, making it seem my queries may be running faster when I employ a TIMESTAMP() or DATE() function? At this point I'm thinking the execution time is a simple function of the number of rows, usually in the tens of millions in each table. Thanks to both of you for the insight.

    Greg

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Every SQL database that I've used has exactly the same answer for query speed: It depends (with a tip of the cap to Paul Randal).

    DB2 running on zOS tends to have databases that are staggeringly large by Microsoft SQL standards even today. The underlying structures (even down to VSAM and SMS) are significantly more complex, as is the underlying hardware architecture. The sheer number of "moving pieces" in this kind of puzzle make it hard to precisely predict performance. When you add the number of users/processes/etc. that typically run on a zOS machine, the complexity ramps up another notch or three.

    If you could get your local systems programmer and the lead DBA, give them some good tools to monitor/predict workload, and a lot of time to work with the system to understand how it behaves under load, then there would be a good chance that together they could give you accurate predictions of query run time. That is a pot full of time, money, etc. to invest in an answer that probably isn't all that helpful in the grand scheme of things!

    As with most databases, if you understand the plan that the database uses to process a query you may well be able to coax better performance by tweaking that query. For your CTE example, that won't help much but for queries against tables with tens of millions of rows there is some hope for improvement!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  11. #11
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by gsnidow View Post
    Pat, you are indeed correct, I am able to run the above. Also, either you or Marcus indicated above I should not rely on the documentation, and this proved to be a very apt statement. Below is the column information

    TS_ROW_CREATE TIMESTMP 10 0

    It is indeed a timestamp. That really makes me wonder about all the documentation we have out there. I also find it interesting that many of the system table are basically the same as SQL Server. I guess IBM really is the father of them all. With that question cleared up, is there any merit to my assumption that some part of the access path is cached, making it seem my queries may be running faster when I employ a TIMESTAMP() or DATE() function? At this point I'm thinking the execution time is a simple function of the number of rows, usually in the tens of millions in each table. Thanks to both of you for the insight.

    Greg
    1. For best performance do not use TIMESTAMP() or DATE() function unless you need to. Please refer to my query samples I posted above on how to use dates and timestamps within an SQL statement.
    2. The reason your doc says CHAR(10) for DATE and CHAR(26) for TIMESTAMP is that those are the correct data types and lengths when the values are returned to a application program such as COBOL, java, C, etc. But when you are working with inside of an SQL statement, you should use the native data type whenever possible and not convert them to strings and back unless absolutely required.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •