Results 1 to 12 of 12
  1. #1
    Join Date
    May 2008
    Posts
    6

    Unanswered: Database Design - Logging History

    Hello,

    I was wondering if someone could help me with a design of a database. Everyday I run certain scripts that update the database with certain information. This information is recorded in my tables with a "date" attached to every entry. The problem with this is that this information is usually the same from day to day, and is just copied in my tables with a new "date" field.

    Although this is working for me right now, as you may imagine, the database fills up relatively quickly. I was wondering if it is possible to "track" the changes to the current data everyday, without having to copy the non-changing data everytime.

    I, however, want to be able to run queries on the data at any point in history. For example, I want to be able to view the "status" of the database in say Dec 1st, 2005.

    If you guys have any suggestions please let me know,

    Thanks

  2. #2
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by petachok
    I want to be able to run queries on the data at any point in history. For example, I want to be able to view the "status" of the database in say Dec 1st, 2005.
    This is certainly possible.
    An example:
    Your "base" table has two columns, "pk" (not changing) and "data". Rows can be inserted, deleted, and updated (just field "data").
    Create a "history" table with four columns: "pk", "timestamp", "oldval", "code".

    An insert trigger on the base table will insert the following row in the history table: (base.pk, current timestamp, NULL, 'I').
    A delete trigger will insert (base.pk, current timestamp, base.data, 'D').
    An update trigger will insert (base.pk, current timestamp, base.data, 'U').
    In both cases, "base.data" is of course the old, no longer existing value.

    From the base table today, and the history table with info from (say) three years, any point in time for any base.pk can be reconstructed by starting from the current row in base and traversing the history table (only the entries with that pk) in reverse chronological order.
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  3. #3
    Join Date
    May 2008
    Posts
    6
    Thanks Peter,

    Would you be able to suggest how one would go about "reconstructing" the data from the history table? I understand that I could create an object in a program that collects the data from the base table, then goes through the history table and updates the object. But then I wouldn't be able to use SQL to query that data, I'd have to do it from within the program.

    Is there an SQL way to reconstruct a temporary historical base table and then just query into it?

    Thanks for your help

  4. #4
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by petachok
    Would you be able to suggest how one would go about "reconstructing" the data from the history table?
    Sure.
    We have a stored procedure with a query doing exactly this.
    Actually, it stores one more type of "history" entries viz. "notification" entries (code 'N'); the stored procedure lists all pk's with their data at the time of last notification and with the current data, and this only for those pk's for which the data changed.
    If accepted at IDUG, you'll be able to hear all details in Washaw in October ;-)

    Roughly speaking, for the simplified setup in my previous posting, the query would be something like:
    Code:
    SELECT oldval, timestamp
    FROM   history_table
    WHERE  pk = :pk
      AND  timestamp >= :timestamp_requested
    UNION ALL
    SELECT data, CURRENT TIMESTAMP
    FROM   base_table
    WHERE  pk = :pk
    ORDER BY 2 
    FETCH FIRST ROW ONLY
    where : pk and :timestamp_requested are the user input, and the returned value is the corresponding data at that time. If you want a full list of pk's (and their old and current data) for the changed ones only, the following query would do:
    Code:
    SELECT t.pk, h.olddata, t.data
    FROM   history_table h INNER JOIN base_table t ON t.pk = h.pk
    WHERE  h.timestamp =
           ( SELECT MIN(timestamp)
             FROM   history_table
             WHERE  pk = t.pk
               AND  code <> 'N'
               AND  timestamp > (SELECT MAX(timestamp)
                                 FROM   history_table
                                 WHERE  code = 'N')
           )
    (Just from memory; there could be some incorrectnesses here and there...)
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  5. #5
    Join Date
    Nov 2004
    Posts
    1,427
    Provided Answers: 4
    An insert trigger on the base table will insert the following row in the history table: (base.pk, current timestamp, NULL, 'I').
    A delete trigger will insert (base.pk, current timestamp, base.data, 'D').
    An update trigger will insert (base.pk, current timestamp, base.data, 'U').
    In both cases, "base.data" is of course the old, no longer existing value.
    I have made a similar system. With the difference, that the most recent history record always stores the current data:

    Using Peter's example:
    An insert trigger on the base table will insert the following row in the history table: (base.pk, current timestamp, base.data, 'I').
    A delete trigger will insert (base.pk, current timestamp, base.data, 'D').
    An update trigger will insert (base.pk, current timestamp, base.data, 'U').
    In both cases, "base.data" is the (new) value of base_table.

    It simplifies the query a bit, as an extra UNION with the base tabel is unnecessary:
    Code:
    SELECT oldval, timestamp
    FROM   history_table
    WHERE  pk = :pk
      AND  timestamp >= :timestamp_requested
    ORDER BY 2 
    FETCH FIRST ROW ONLY
    Problem with reconstructing the situation at a certain point in history is that of performance, as you need to select that one record that comes right before the requested_timestamp, by fiddling with a number of MAX(TIMESTAMP)'s.

    This certainly becomes cumbersome if you have to get results from a number of joins between n master-detail tables: with what set of data - of that person at that time in history, with his/her name, address, financial situation, ... of that moment - was that decision taken?

    I think if the history table would be expanded with an extra field, giving the end_timestamp of the validity of that record would help a lot.

    Using Peter's example:
    Your "base" table has two columns, "pk" (not changing) and "data". Rows can be inserted, deleted, and updated (just field "data").
    Create a "history" table with five columns: "pk", "timestamp", "end_timestamp", "oldval", "code".

    An insert trigger on the base table will insert the following row in the history table: (base.pk, current timestamp, '9999-12-31 23:59:59', base.data, 'I').
    A delete trigger will
    1) insert (base.pk, current timestamp, current timestamp, base.data, 'D').
    2) update most recent record of history with same pk, set "end_timestamp" = current timestamp
    An update trigger will
    1) insert (base.pk, current timestamp, '9999-12-31 23:59:59', base.data, 'U').
    2) update most recent record of history with same pk, set "end_timestamp" = current timestamp
    In both cases, "base.data" is the (new) value of base_table.

    So any complex query containing a lot of JOINS, would include only an extra "AND :timestamp_requested BETWEEN timestamp AND end_timestamp" condition.

    Peter, what did you mean by "notification entries (code 'N')"? I didn't get that one.
    With kind regards . . . . . SQL Server 2000/2005/2012
    Wim

    Grabel's Law: 2 is not equal to 3 -- not even for very large values of 2.
    Pat Phelan's Law: 2 very definitely CAN equal 3 -- in at least two programming languages

  6. #6
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by Wim
    Peter, what did you mean by "notification entries (code 'N')"? I didn't get that one.
    This is a rather specific part of our stored procedure logic, i.e., used by the application that interrogates the history table.
    It's a way to "remember" when someone most recently called the application, so that his next call will continue from that point in time on, i.e., the application will report on any differences between the most recent "N" entry and "now".
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  7. #7
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by Wim
    ... Create a "history" table with five columns: "pk", "timestamp", "end_timestamp", "oldval", "code".
    Good idea, indeed. This complicates the triggers (since "old" history records have to be updated) but it gives indeed a much better performance for the SELECT queries on the history table. Thanks for the suggestion!

    (As always, it's a trade-off: if one has lots of base table changes (i.e., lots of trigger execution) and almost no reads from the history table, the original design is better, while in the "normal" (equally read + write of history records) situation, having the extra column logic will probably be more performant.)
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  8. #8
    Join Date
    May 2008
    Posts
    6
    Hey thanks guys, this is really helping me. I am going to go with the suggested table structure, as it's exactly what I'm looking for. Just one question though.

    I update my tables daily, running through every single primary key (and some new ones) in them. Most of the time, the updates are unnecessary since the data stays the same.

    Is there a way to check if the data is changed so I don't take any action (and activate the triggers) if it is the same.

    Or would I have to check that manually inside the program, and then forward to a particular action. (I am using DB2)

    Thanks again.

  9. #9
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by petachok
    Is there a way to check if the data is changed so I don't take any action (and activate the triggers) if it is the same.

    Or would I have to check that manually inside the program, and then forward to a particular action.
    If it's specific to that particular application, just test there (and don't execute the update when no change). If this logic applies to all potential applications, the trigger can be modified (with a WHEN condition) to only run in case old & new are different:
    Code:
    CREATE TRIGGER trname
    AFTER UPDATE OF data ON base_table
    REFERENCING OLD AS o NEW AS n
    FOR EACH ROW MODE DB2SQL
    WHEN (n.data <> o.data)
    INSERT INTO history_table VALUES (....)
    (This is DB2 for z/OS syntax. FOR LUW, remove the "MODE DB2SQL".)
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

  10. #10
    Join Date
    May 2008
    Posts
    6
    Hey,

    I implemented the structure you guys described (with start and end timestamp because that seems to make queries simpler).

    A small problem I have is with the triggers. Right now I am nesting a select within an Update, which sort of takes a while. Can you guys suggest a better way?

    Here is how my delete trigger looks:
    UPDATE hist_table SET end_timestamp=CURRENT TIMESTAMP WHERE pk=o.pk AND end_timestamp=(SELECT MAX(end_timestamp) FROM hist_table WHERE pk=o.pk);
    INSERT INTO hist_table VALUES (o.pk, o.data, CURRENT TIMESTAMP, CURRENT TIMESTAMP, 'D');

    Thanks

  11. #11
    Join Date
    May 2008
    Posts
    6
    Also, if I understand correctly, there is no need to insert the Deletion history item with both timestamp and endtimestamp equaling CURRENT TIMESTAMP. Or is there some reason for that that?

    And would this be a correct way to delete outdated items from the history table (to prevent it from growing too far):
    - Delete all records where end_timestamp < time you want to delete items from
    - Update all the records where timestamp < time with timestamp = the time past which you don't want to store history

    Thanks again guys.

  12. #12
    Join Date
    Sep 2004
    Location
    Belgium
    Posts
    1,126
    Quote Originally Posted by petachok
    Here is how my delete trigger looks:
    UPDATE hist_table SET end_timestamp=CURRENT TIMESTAMP
    WHERE pk=o.pk AND end_timestamp=(SELECT MAX(end_timestamp) FROM hist_table WHERE pk=o.pk);
    INSERT INTO hist_table
    VALUES (o.pk, o.data, CURRENT TIMESTAMP, CURRENT TIMESTAMP, 'D');
    I would rather do the following:
    Code:
    UPDATE hist_table SET end_timestamp=CURRENT TIMESTAMP
    WHERE pk=o.pk AND end_timestamp='9999-12-31';
    INSERT INTO hist_table
    VALUES (o.pk, o.data, CURRENT TIMESTAMP, '9999-12-31', 'D');
    --_Peter Vanroose,
    __IBM Certified Database Administrator, DB2 9 for z/OS
    __IBM Certified Application Developer
    __ABIS Training and Consulting
    __http://www.abis.be/

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •