  1. #1
    Join Date
    Jun 2006
    Posts
    2

    Version Control of Data in SQL DB

    Hi, I'd like to know the best approach for version control of data stored in an SQL database. By version control I mean never losing any data saved to the database (so updates would instead result in 'new records' with an incremented version number). Can anyone help?

    PS. Apologies if my intentions are less than clear.

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    A more common approach to achieve that would be to have regular tables without version numbers for the current data, and a history table per regular table populated by triggers after updates and deletions, with a timestamp of when the update/delete occurred. For example:

    create table emp (empno int primary key, name varchar(30), dob date);
    create table emp_hist (empno int, name varchar(30), dob date, update_timestamp timestamp, primary key(empno, update_timestamp));

    This meets your requirements without burdening every query with references to a latest version of each "record", which would hit performance significantly.
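
    As a rough sketch of the trigger side of this, assuming SQLite driven from Python's sqlite3 module (the thread names no particular database, and trigger syntax differs between vendors). The trigger names emp_hist_upd and emp_hist_del and the sample rows are made up for illustration:

```python
import sqlite3

# Assumption: SQLite syntax. Other databases spell triggers differently.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table emp (empno int primary key, name varchar(30), dob date);
create table emp_hist (empno int, name varchar(30), dob date,
                       update_timestamp timestamp,
                       primary key (empno, update_timestamp));

-- Hypothetical trigger: copy the OLD row into the history table on update.
create trigger emp_hist_upd after update on emp
begin
    insert into emp_hist
    values (old.empno, old.name, old.dob, strftime('%Y-%m-%d %H:%M:%f', 'now'));
end;

-- Hypothetical trigger: same again when a row is deleted.
create trigger emp_hist_del after delete on emp
begin
    insert into emp_hist
    values (old.empno, old.name, old.dob, strftime('%Y-%m-%d %H:%M:%f', 'now'));
end;
""")

conn.execute("insert into emp values (1, 'Smith', '1970-01-01')")
conn.execute("insert into emp values (2, 'Jones', '1980-02-02')")
conn.execute("update emp set name = 'Smythe' where empno = 1")
conn.execute("delete from emp where empno = 2")

# Queries against the live table never touch the history rows.
current = conn.execute("select empno, name from emp order by empno").fetchall()
history = conn.execute("select empno, name from emp_hist order by empno").fetchall()
```

    One caveat with this sketch: the timestamp here only has millisecond resolution, so two changes to the same empno within the same millisecond would violate the emp_hist primary key.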

  3. #3
    Join Date
    Jun 2006
    Posts
    2
    Thanks andrewst. This will definitely improve query performance against the 'live' data. I'm predicting large/frequent amendments; are there any approaches that combine the hard disk space saving capabilities of CVS (i.e. only saving differences) whilst retaining the ability to query data using standard SQL?

    -- I realise I may be asking for a lot here!

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    I have never heard of such a facility in SQL, no.

  5. #5
    Join Date
    Oct 2002
    Location
    Baghdad, Iraq
    Posts
    697
    Quote Originally Posted by craigmcdonnell
    Thanks andrewst. This will definitely improve query performance against the 'live' data. I'm predicting large/frequent amendments; are there any approaches that combine the hard disk space saving capabilities of CVS (i.e. only saving differences) whilst retaining the ability to query data using standard SQL?

    -- I realise I may be asking for a lot here!
    Maybe. First off, you need to understand the longest common subsequence (LCS) algorithm. Here's a straightforward implementation: http://search.cpan.org/~tyemq/Algorithm-Diff-1.1901/lib/Algorithm/Diff.pm

    Well, how it works isn't so important as what it returns, which is a diff: it tells you that to get from list A to list B you need to remove certain lines and insert others.
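
    To make "what it returns" concrete, here is a minimal sketch of a diff built on a longest common subsequence table. It is not the CPAN module's code; the function names lcs_table and diff and the return format (line numbers to remove, plus lines to insert and where) are assumptions chosen for illustration:

```python
def lcs_table(a, b):
    """lengths[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    lengths = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                lengths[i][j] = lengths[i + 1][j + 1] + 1
            else:
                lengths[i][j] = max(lengths[i + 1][j], lengths[i][j + 1])
    return lengths


def diff(a, b):
    """Return (removals, insertions) that turn list a into list b.

    removals:   1-based line numbers of a to drop
    insertions: (line number of a that the new line follows, text);
                0 means "before the first line"
    """
    lengths = lcs_table(a, b)
    removals, insertions = [], []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Line is part of the common subsequence: keep it.
            i += 1
            j += 1
        elif lengths[i + 1][j] >= lengths[i][j + 1]:
            # Dropping a[i] loses nothing from the common subsequence.
            removals.append(i + 1)
            i += 1
        else:
            # b[j] has no counterpart in a: insert it after line i of a.
            insertions.append((i, b[j]))
            j += 1
    removals.extend(range(i + 1, len(a) + 1))            # leftover lines of a
    insertions.extend((len(a), text) for text in b[j:])  # leftover lines of b
    return removals, insertions


old = ["alpha", "beta", "gamma"]
new = ["alpha", "gamma", "delta"]
removals, insertions = diff(old, new)
```

    For these two lists the sketch reports that line 2 is removed and "delta" is inserted after line 3.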

    So you need a table called BaseVersions, with VersionNumber, LineNumber and whatever data each line has.

    You could represent diffs in two tables, Removals (ParentVersion and LineNumber) and Insertions (ParentVersion, LineNumber and whatever data). To make insertions easier, I'd allow line numbers like 2.1 and 2.1.1. (How you insert before the first line is a bit tricky, though.)

    The query to get Version 2 would simply be BaseVersions MINUS the removals, UNION the insertions. (MINUS is Oracle's spelling; standard SQL calls it EXCEPT.)
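
    A minimal sketch of that query, assuming SQLite (which spells the operator EXCEPT) driven from Python. The Data column, the sample rows, and the choice of a REAL LineNumber are assumptions for illustration; REAL handles a line number like 3.1 but not one like 2.1.1, which would need a different encoding:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table BaseVersions (VersionNumber int, LineNumber real, Data varchar(100));
create table Removals (ParentVersion int, LineNumber real);
create table Insertions (ParentVersion int, LineNumber real, Data varchar(100));

-- Version 1 is stored in full; version 2 is stored only as a diff against it.
insert into BaseVersions values (1, 1, 'alpha'), (1, 2, 'beta'), (1, 3, 'gamma');
insert into Removals values (1, 2);
insert into Insertions values (1, 3.1, 'delta');
""")

# EXCEPT and UNION are applied left to right: (base EXCEPT removed) UNION inserted.
# Removals carries no Data column, so it is joined back to the base rows
# to produce rows of the same shape for EXCEPT.
VERSION_2 = """
select LineNumber, Data from BaseVersions where VersionNumber = 1
except
select b.LineNumber, b.Data
  from BaseVersions b join Removals r on r.LineNumber = b.LineNumber
 where b.VersionNumber = 1 and r.ParentVersion = 1
union
select LineNumber, Data from Insertions where ParentVersion = 1
order by 1
"""

# Keep only the line text, in line-number order.
version_2 = [data for _, data in conn.execute(VERSION_2)]
```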

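    Each later version is found the same way from the one before it, so you chain up to, say, 20 such queries to reach an intermediate version. Then every 20 versions you write out another base version, so no lookup has to replay more than 20 diffs. (This is the same principle as key frames in digital video.)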
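
    The key-frame idea can be sketched in plain Python, independent of any database. Everything named here is hypothetical: the VersionStore class, the interval of 3 (the post suggests 20; 3 keeps the example short), and the use of tuples such as (2, 1) to stand for a line number like 2.1, chosen because tuples sort in the order you'd want:

```python
KEYFRAME_INTERVAL = 3  # assumption for the example; the post suggests 20


class VersionStore:
    """Stores a full copy every KEYFRAME_INTERVAL versions and diffs in between."""

    def __init__(self, first_version):
        self.bases = {1: dict(first_version)}  # version number -> full copy
        self.diffs = {}                        # parent version -> (removals, insertions)
        self.latest = 1

    def commit(self, removals, insertions):
        """Record a new version as a diff against the latest one."""
        parent = self.latest
        self.diffs[parent] = (set(removals), dict(insertions))
        self.latest = parent + 1
        if (self.latest - 1) % KEYFRAME_INTERVAL == 0:
            # Write out another base version so later lookups start from here.
            self.bases[self.latest] = self.get(self.latest)
        return self.latest

    def get(self, version):
        """Rebuild a version from the nearest earlier base plus the diffs after it."""
        base = max(v for v in self.bases if v <= version)
        lines = dict(self.bases[base])
        for parent in range(base, version):
            removals, insertions = self.diffs[parent]
            for line_number in removals:
                del lines[line_number]
            lines.update(insertions)
        return lines


def text_of(store, version):
    """Line text of a version, ordered by line number."""
    lines = store.get(version)
    return [lines[key] for key in sorted(lines)]


store = VersionStore({(1,): "alpha", (2,): "beta", (3,): "gamma"})
store.commit(removals=[(2,)], insertions={(3, 1): "delta"})      # version 2
store.commit(removals=[], insertions={(1, 1): "alpha-bis"})      # version 3
store.commit(removals=[(1,)], insertions={})                     # version 4, new base
store.commit(removals=[], insertions={(4,): "epsilon"})          # version 5
```

    With an interval of 3, full copies exist for versions 1 and 4 only, and rebuilding version 5 replays a single diff on top of the version 4 base.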
