Page 1 of 2 12 LastLast
Results 1 to 15 of 21
  1. #1
    Join Date
    Jun 2012
    Posts
    9

    Unanswered: Need Help - SQL query in cleansing the data

    Hi, I have to write a query to resolve the following problem and would appreciate this form help.

    The table columns are like this:

    mem_id, enrol_dt, term_dt, updt_ts. The mem_id has multiple entries. One example is:

    mem_id enrol_dt term_dt updt_date_ts
    2001 12/05/2011 NULL 12/06/2011 09:45:30
    2001 05/05/2012 NULL 05/10/2012 10:40:30

    The term_dt has NULL on both the records. Here the term_dt with the old updt_date_ts value should become the recent enrol_dt - 1. So the term_dt for the old record would be 05/04/2012. I am kind of new with DB2, tried with simillar Oracle queries and it is not giving the result.

    I would appreciate your help.

  2. #2
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    What DB2 version/release/fix pack and platform OS are you using?

    Too little sample data.
    Please publish more sample/test data(three or more mem_id, some have one row for a mem_id, some have two or more rows for a mem_id)
    and expected results from the sample/test data.

    tried with simillar Oracle queries and it is not giving the result.
    Please publish cocrete/exact query you tried.
    Did you got error message? Or, did you got unexpected results?

  3. #3
    Join Date
    Jun 2012
    Posts
    9
    Thanks for the response. The DB2 version is V9.7.500.702. There are almost 200 mem_id's with 2 records where the term_dt on the both the records are null. I want to write a query in which the term_dt of the old record based on the updt_date_ts should become recent enrol_dt of that mem_id - 1 day. The logic I have is:
    select mem_id, term_dt from table where term_dt is NULL group by mem_id having updt_date_ts=min(updt_date_ts). Here I want to add the term_dt as enrol_dt - 1 day from the latest updt_date_ts but I am stuck in between. Thanks for your help in advance.

  4. #4
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    There are almost 200 mem_id's with 2 records where the term_dt on the both the records are null.
    Do you mean that there are another group of mem_id's(e.g. one recoerd for a mem_id, two records for a mem_id at least one term_dt is not null, more than three records for a mem_id, ...)
    or all mem_id's are with 2 records where the term_dt on the both the records are null?

    Please extract some data(I hope twenty or thirty records) for sample/test data, and publish the data and expected results from the data.
    The easy way to supply sample data may be one of these...
    (1) by a common-table-expression, like
    WITH
    sample_data(mem_id , enrol_dt , term_dt , updt_date_ts) AS (
    VALUES
    (... , ... , ... , ....)
    , (... , ... , ... , ....)
    , (... , ... , ... , ....)
    ...
    )

    (2)A CREATE TABLE statement and an INSERT statement.


    This request(or wish) is to reduce number of exchanges with you,
    by supplying sample queries which would work well and satisfy your full requirements considering most of exceptional data.



    By the way,
    I'm interesting how your query was working on Oracle.
    You wrote
    tried with simillar Oracle queries ...
    Last edited by tonkuma; 06-12-12 at 12:11.

  5. #5
    Join Date
    Nov 2011
    Posts
    334
    maybe tonkuma will give a better answner, but i think you can use the lag function to do this:
    Code:
    update ( select t.*, 
           row_number()  over ( partition by mem_id order by updt_date_ts desc) as rn,
           lag(enrol_dt) over ( partition by mem_id order by updt_date_ts desc ) - 1 days as prev_entrol_dt 
           from table t) 
     set term_dt = prev_entrol_dt
     where rn = 2

  6. #6
    Join Date
    Jun 2012
    Posts
    9
    Hi Tonkuma,

    My query never worked and I was stuck in the middle on resolving this. To answer your question, all mem_id's are with 2 records where the term_dt on the both the records are null.

  7. #7
    Join Date
    Jun 2012
    Posts
    9
    Hi Fengsun,

    Thanks for the help. Will try and update you

  8. #8
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    Quote Originally Posted by SAN2004 View Post
    Hi Tonkuma,

    My query never worked and I was stuck in the middle on resolving this. To answer your question, all mem_id's are with 2 records where the term_dt on the both the records are null.
    Did your query worked on Oracle? Or, you couldn't a solution on Oracle?

    If "all mem_id's are ... the term_dt ... are null",
    why did you showed such query in your earlier post?
    select mem_id, term_dt from table where term_dt is NULL ...
    It suggested me the existence of non-null values of term_dt.


    I want to confirm that the following two constraints are guaranteed in your table by someways.
    (1) All term_dt are null.
    (2) All mem_id's have two rows.
    Last edited by tonkuma; 06-12-12 at 13:44.

  9. #9
    Join Date
    Jun 2012
    Posts
    9
    Yes, you are right. When I query it gives the result shows mem_id and NULL values on the term_dt

  10. #10
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    You didn't answer to my questions...
    Quote Originally Posted by tonkuma View Post
    ...
    ...

    I want to confirm that the following two constraints are guaranteed in your table by someways.
    (1) All term_dt are null.
    (2) All mem_id's have two rows.
    Even if they were obvious for you, I don't know without your answer.

    And third question is
    (3)
    If For all mem_id
    (enrol_dt in newer updt_date_ts) >= (enrol_dt in older updt_date_ts), like
    Code:
    MEM_ID      ENROL_DT   TERM_DT    UPDT_DATE_TS              
    ----------- ---------- ---------- --------------------------
           2001 2011-12-05 -          2011-12-06-09.45.30.000000
           2001 2012-05-05 -          2012-05-10-10.40.30.000000
           2002 2012-04-05 -          2012-04-10-09.45.30.000000
           2002 2012-04-06 -          2012-04-11-10.40.30.000000
    then answer 'Yes'.

    or
    If For some mem_id
    (enrol_dt in newer updt_date_ts) < (enrol_dt in older updt_date_ts), like
    Code:
    MEM_ID      ENROL_DT   TERM_DT    UPDT_DATE_TS              
    ----------- ---------- ---------- --------------------------
           2001 2011-12-05 -          2011-12-06-09.45.30.000000
           2001 2012-05-05 -          2012-05-10-10.40.30.000000
           2002 2012-04-08 -          2012-04-10-09.45.30.000000
           2002 2012-04-03 -          2012-04-13-10.40.30.000000
    then answer 'No'


    Please answer to my three questions by Yes/No.
    (1) Yes/No
    (2) Yes/No
    (3) Yes/No

    These questions may be related to simplicity of queries and execution performance, in my mind.

  11. #11
    Join Date
    Feb 2008
    Location
    Japan
    Posts
    3,483
    I tried some queries using very small test data on DB2 9.7.5 on Windows/XP.

    Test data:
    Code:
    ------------------------------ Commands Entered ------------------------------
    CREATE TABLE SAN2004.test_table
    ( mem_id       INTEGER   NOT NULL
    , enrol_dt     DATE      NOT NULL
    , term_dt      DATE
    , updt_date_ts TIMESTAMP NOT NULL
    , PRIMARY KEY (mem_id , enrol_dt)
    );
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    
    ------------------------------ Commands Entered ------------------------------
    INSERT INTO SAN2004.test_table
    VALUES
      ( 2001 , '12/05/2011' , NULL , TIMESTAMP('12/06/2011' , '09:45:30') )
    , ( 2001 , '05/05/2012' , NULL , TIMESTAMP('05/10/2012' , '10:40:30') )
    
    , ( 2002 , '04/05/2012' , NULL , TIMESTAMP('04/10/2012' , '09:45:30') )
    , ( 2002 , '04/06/2012' , NULL , TIMESTAMP('04/11/2012' , '10:40:30') );
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    
    ------------------------------ Commands Entered ------------------------------
    SELECT * FROM SAN2004.test_table;
    ------------------------------------------------------------------------------
    
    MEM_ID      ENROL_DT   TERM_DT    UPDT_DATE_TS              
    ----------- ---------- ---------- --------------------------
           2001 2011-12-05 -          2011-12-06-09.45.30.000000
           2001 2012-05-05 -          2012-05-10-10.40.30.000000
           2002 2012-04-05 -          2012-04-10-09.45.30.000000
           2002 2012-04-06 -          2012-04-11-10.40.30.000000
    
      4 record(s) selected.

    Assuming
    (2) All mem_id's have two rows.
    Yes.

    Example 1: fengsun2's query.
    Code:
    ------------------------------ Commands Entered ------------------------------
    update ( select t.*,
           row_number()  over ( partition by mem_id order by updt_date_ts desc) as rn,
           lag(enrol_dt) over ( partition by mem_id order by updt_date_ts desc ) - 1 days as prev_entrol_dt 
           from SAN2004.test_table t) 
     set term_dt = prev_entrol_dt
     where rn = 2;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Access plan of Example 1.
    Code:
    Original Statement:
    ------------------
    update
       (select t.*, row_number() over (partition by mem_id order by updt_date_ts
               desc) as rn, lag(enrol_dt) over (partition by mem_id order by
               updt_date_ts desc) - 1 days as prev_entrol_dt
       from SAN2004.test_table t) set term_dt = prev_entrol_dt
    where rn = 2
    
    
    Optimized Statement:
    -------------------
    ...
    ...
    
    Access Plan:
    -----------
            Total Cost:             37.8931
            Query Degree:           1
    
                      Rows
                     RETURN
                     (   1)
                      Cost
                       I/O
                       |
                        2
                     UPDATE
                     (   2)
                     37.8931
                        5
                   /---+----\
                  2            4
               FETCH    TABLE: SAN2004
               (   3)     TEST_TABLE
               22.7547        Q1
                  3
             /---+----\
            2            4
         FILTER   TABLE: SAN2004
         (   4)     TEST_TABLE
         7.61338
            1
           |
            4
         TBSCAN
         (   5)
         7.60539
            1
           |
            4
         SORT
         (   6)
         7.60448
            1
           |
            4
         TBSCAN
         (   7)
         7.60195
            1
           |
            4
     TABLE: SAN2004
       TEST_TABLE
           Q2

    I tried to remove ORDER BY clause.

    Assuming
    (2) All mem_id's have two rows.
    Yes.
    (3) For all mem_id
    (enrol_dt in newer updt_date_ts) >= (enrol_dt in older updt_date_ts)
    Yes.

    Example 2:
    Code:
    ------------------------------ Commands Entered ------------------------------
    UPDATE
          (SELECT t.*
                , MIN(updt_date_ts)
                     OVER( PARTITION BY mem_id ) AS old_updt_date_ts
                , MAX(enrol_dt)
                     OVER( PARTITION BY mem_id ) AS recent_enrol_dt
            FROM  SAN2004.test_table t
          )
       SET term_dt = recent_enrol_dt - 1 DAY
     WHERE updt_date_ts = old_updt_date_ts
    ;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Access plan of Example 2.
    The cost is less than Example 1 and using an index(corresponding to primary key).
    Code:
    Original Statement:
    ------------------
    UPDATE
       (SELECT t.* , MIN(updt_date_ts) OVER(PARTITION BY mem_id) AS
               old_updt_date_ts , MAX(enrol_dt) OVER(PARTITION BY mem_id) AS
               recent_enrol_dt
       FROM SAN2004.test_table t) SET term_dt = recent_enrol_dt - 1 DAY
    WHERE updt_date_ts = old_updt_date_ts
    
    
    Optimized Statement:
    -------------------
    ...
    ...
    
    Access Plan:
    -----------
            Total Cost:             10.0366
            Query Degree:           1
    
                               Rows
                              RETURN
                              (   1)
                               Cost
                                I/O
                                |
                               0.16
                              UPDATE
                              (   2)
                              10.0366
                               1.32
                            /---+----\
                         0.16           4
                        FETCH    TABLE: SAN2004
                        (   3)     TEST_TABLE
                        8.82556        Q1
                         1.16
                      /---+----\
                   0.16           4
                  FILTER   TABLE: SAN2004
                  (   4)     TEST_TABLE
                  7.61337
                     1
                    |
                     4
                  FETCH
                  (   5)
                  7.60562
                     1
               /----+----\
              4             4
           IXSCAN    TABLE: SAN2004
           (   6)      TEST_TABLE
          0.0331883        Q2
              0
             |
              4
       INDEX: SYSIBM
     SQL120613054452310
             Q2

    Assuming
    (2) All mem_id's have two rows.
    Yes.
    (3) For all mem_id
    (enrol_dt in newer updt_date_ts) >= (enrol_dt in older updt_date_ts)
    No.

    Replace test data:
    Code:
    ------------------------------ Commands Entered ------------------------------
    DELETE FROM SAN2004.test_table;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    
    ------------------------------ Commands Entered ------------------------------
    INSERT INTO SAN2004.test_table
    VALUES
      ( 2001 , '12/05/2011' , NULL , TIMESTAMP('12/06/2011' , '09:45:30') )
    , ( 2001 , '05/05/2012' , NULL , TIMESTAMP('05/10/2012' , '10:40:30') )
    
    , ( 2002 , '04/08/2012' , NULL , TIMESTAMP('04/10/2012' , '09:45:30') )
    , ( 2002 , '04/03/2012' , NULL , TIMESTAMP('04/13/2012' , '10:40:30') );
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Example 3:
    Code:
    ------------------------------ Commands Entered ------------------------------
    UPDATE
          (SELECT t.*
                , MIN(updt_date_ts)
                     OVER( PARTITION BY mem_id ) AS old_updt_date_ts
                , MAX( CASE updt_date_ts
                       WHEN MAX(updt_date_ts)
                               OVER( PARTITION BY mem_id )
                       THEN enrol_dt
                       END
                     )
                     OVER( PARTITION BY mem_id ) AS recent_enrol_dt
            FROM  SAN2004.test_table t
          )
       SET term_dt = recent_enrol_dt - 1 DAY
     WHERE updt_date_ts = old_updt_date_ts
    ;
    ------------------------------------------------------------------------------
    DB20000I  The SQL command completed successfully.
    Access plan of Example 3.
    The cost is a little higher than Example 2 and using an index(corresponding to primary key).
    Code:
    Original Statement:
    ------------------
    UPDATE
       (SELECT t.* , MIN(updt_date_ts) OVER(PARTITION BY mem_id) AS
               old_updt_date_ts , MAX(
       CASE updt_date_ts
       WHEN MAX(updt_date_ts) OVER(PARTITION BY mem_id)
       THEN enrol_dt END) OVER(PARTITION BY mem_id) AS recent_enrol_dt
       FROM SAN2004.test_table t) SET term_dt = recent_enrol_dt - 1 DAY
    WHERE updt_date_ts = old_updt_date_ts
    
    
    Optimized Statement:
    -------------------
    ...
    ...
    
    Access Plan:
    -----------
            Total Cost:             10.0437
            Query Degree:           1
    
                               Rows
                              RETURN
                              (   1)
                               Cost
                                I/O
                                |
                               0.16
                              UPDATE
                              (   2)
                              10.0437
                               1.32
                            /---+----\
                         0.16           4
                        FETCH    TABLE: SAN2004
                        (   3)     TEST_TABLE
                        8.83261        Q1
                         1.16
                      /---+----\
                   0.16           4
                  FILTER   TABLE: SAN2004
                  (   4)     TEST_TABLE
                  7.62042
                     1
                    |
                     4
                  FETCH
                  (   5)
                  7.60562
                     1
               /----+----\
              4             4
           IXSCAN    TABLE: SAN2004
           (   6)      TEST_TABLE
          0.0331883        Q2
              0
             |
              4
       INDEX: SYSIBM
     SQL120613054452310
             Q2
    Last edited by tonkuma; 06-13-12 at 14:11. Reason: Replace Example 2(add - 1 DAY to a SET clause).

  12. #12
    Join Date
    Nov 2011
    Posts
    334
    yeah, it is good job, especially for example 2 !
    the cost of example 2 and 3 is less than my SQL。
    But how about the answer of question 2 is no。There might be more than 2 rows
    in one mem_id partition.
    I can change my sql slightly to perform that:
    Code:
    update ( select t.*,
           row_number()  over ( partition by mem_id order by updt_date_ts desc) as rn,
           lag(enrol_dt) over ( partition by mem_id order by updt_date_ts desc ) - 1 days as prev_entrol_dt 
           from SAN2004.test_table t) 
     set term_dt = prev_entrol_dt
     where rn > 1;

  13. #13
    Join Date
    Jun 2012
    Posts
    9
    Hi Tonkuma,

    Sorry for the delay in response as I was off yesterday. The enrol_dt is always equal or greater than the old enrol_dt. If the date is same then the differentiation factor is the update_date_ts. We are going to implement SCD type 2 date range once we correct the data so that in the future this problem can be avoided. Thanks a bunch for your time and dedication on this, I really mean it. I will try the queries and will get back to you.

    Thanks a lot again

  14. #14
    Join Date
    Jun 2012
    Posts
    9
    Hi,

    When I try to use the MIN(updt_date_ts) OVER ( PARTITION BY mem_id ) I am getting the following error, please help:

    ERROR: ATTEMPT TO USE NEW FUNCTION BEFORE NEW FUNCTION MODE
    SQLCODE = -4700,
    SQLSTATE=56038, DRIVER=3.63.108

  15. #15
    Join Date
    Jun 2012
    Posts
    9
    Hi,

    Looks like DB2 ZOS doesn't support OLAP functions. Can you please suggest any other solution. Thanks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •