Quote:
|
Originally Posted by craigmcdonnell
Thanks andrewst. This will definitely improve query performance against the 'live' data. I'm predicting large/frequent amendments; are there any approaches that combine the hard disk space saving capabilities of CVS, (i.e. only saving differences,) whilst retaining the ability to query data using standard SQL?
-- I realise I may be asking for a lot here!
|
Maybe. First off, you need to understand the longest common subsequence algorithm. <a href="http://search.cpan.org/~tyemq/Algorithm-Diff-1.1901/lib/Algorithm/Diff.pm">Here's a straightforward implementation.</a>
Well, how it works isn't so important as what it returns, which is a diff: it tells you that to get from list A to list B you need to remove certain lines and insert others.
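To make "what it returns" concrete, here's a rough Python sketch of a longest-common-subsequence diff. It is not the linked Perl module; the function name `lcs_diff` and the return shape (1-based line numbers to remove, plus "insert after line N" pairs, with N = 0 meaning before the first line) are my own assumptions for illustration.

```python
def lcs_diff(a, b):
    """Return (removals, insertions) that turn list a into list b.

    removals:   1-based line numbers of a to delete.
    insertions: (anchor, text) pairs; insert text after line `anchor` of a
                (anchor 0 means "before the first line", an assumed convention).
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the longest common subsequence of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    removals, insertions = [], []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            # Line is part of the common subsequence: keep it.
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            removals.append(i + 1)
            i += 1
        else:
            insertions.append((i, b[j]))
            j += 1
    # Whatever is left over in a is removed; whatever is left in b is appended.
    removals.extend(range(i + 1, n + 1))
    insertions.extend((n, line) for line in b[j:])
    return removals, insertions


old = ["alpha", "beta", "gamma"]
new = ["alpha", "delta", "gamma"]
removals, insertions = lcs_diff(old, new)
```

For the two lists above, line 2 is removed and "delta" is inserted after line 2, which is the kind of output the Removals and Insertions tables would store.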
So you need a table called BaseVersions, with VersionNumber, LineNumber and whatever data each line has.
You could represent diffs in two tables, Removals (ParentVersion and LineNumber) and Insertions (ParentVersion, LineNumber and whatever data). To make insertions easier, I'd allow line numbers like 2.1 and 2.1.1. (How you insert before the first line is a bit tricky, though.)
The query to get Version 2 would simply be the BaseVersions rows for Version 1, MINUS the removals, UNION the insertions. (MINUS is Oracle's spelling; standard SQL calls it EXCEPT.)
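Here's a minimal sketch of that query, run through Python's built-in `sqlite3` module so it's self-contained. The table and column names follow the ones above; the single `Data` column, the TEXT line numbers, the sample rows and the helper names (`build_demo`, `next_version`, `line_key`) are assumptions for the demo. SQLite spells MINUS as EXCEPT, and it evaluates compound operators left to right, so the statement reads as (base EXCEPT removals) UNION insertions.

```python
import sqlite3


def line_key(line_number):
    # Sort '2.1' between '2' and '3' by comparing the dotted parts as integers.
    return tuple(int(part) for part in line_number.split("."))


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE BaseVersions (VersionNumber INTEGER, LineNumber TEXT, Data TEXT);
        CREATE TABLE Removals     (ParentVersion INTEGER, LineNumber TEXT);
        CREATE TABLE Insertions   (ParentVersion INTEGER, LineNumber TEXT, Data TEXT);

        INSERT INTO BaseVersions VALUES (1, '1', 'alpha'), (1, '2', 'beta'), (1, '3', 'gamma');
        INSERT INTO Removals     VALUES (1, '2');
        INSERT INTO Insertions   VALUES (1, '2.1', 'delta');
    """)
    return conn


def next_version(conn, parent):
    """Rows of version parent+1, assuming `parent` is stored as a base version."""
    rows = conn.execute("""
        SELECT LineNumber, Data FROM BaseVersions WHERE VersionNumber = :p
        EXCEPT
        SELECT b.LineNumber, b.Data
          FROM BaseVersions b
          JOIN Removals r ON r.LineNumber = b.LineNumber
         WHERE b.VersionNumber = :p AND r.ParentVersion = :p
        UNION
        SELECT LineNumber, Data FROM Insertions WHERE ParentVersion = :p
    """, {"p": parent}).fetchall()
    return sorted(rows, key=lambda row: line_key(row[0]))


conn = build_demo()
version_2 = next_version(conn, 1)
```

The sorting is done in Python because dotted line numbers stored as text don't sort correctly with a plain ORDER BY; a real schema would need some other ordering scheme.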
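To reach a later version you chain these queries, one per intermediate version, so the cost grows with the distance from the base. Cap the chain at, say, 20 versions: every 20 versions you write out another base version. (This is the same principle as key frames in digital video.)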
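The key-frame idea can be sketched in plain Python, without the database, to show the bookkeeping. Everything here is hypothetical: the `VersionStore` class, the interval constant and the dict-of-lines representation are just one way to illustrate it, not a recommended implementation.

```python
KEYFRAME_INTERVAL = 20  # assumed value, taken from the "say, 20" above


class VersionStore:
    def __init__(self, first_version):
        # Full snapshots ("key frames"), keyed by version number.
        self.bases = {1: dict(first_version)}
        # Diffs keyed by parent version: (removed line numbers, inserted lines).
        self.diffs = {}
        self.latest = 1

    def commit(self, removed, inserted):
        """Record a diff against the latest version and return the new version number."""
        parent = self.latest
        self.diffs[parent] = (set(removed), dict(inserted))
        self.latest = parent + 1
        # Every KEYFRAME_INTERVAL versions, also store a full snapshot.
        if (self.latest - 1) % KEYFRAME_INTERVAL == 0:
            self.bases[self.latest] = self.reconstruct(self.latest)
        return self.latest

    def reconstruct(self, version):
        """Start from the nearest earlier base and apply each diff in turn."""
        base = max(v for v in self.bases if v <= version)
        lines = dict(self.bases[base])
        for parent in range(base, version):
            removed, inserted = self.diffs[parent]
            for line_number in removed:
                lines.pop(line_number, None)
            lines.update(inserted)
        return lines


store = VersionStore({"1": "alpha"})
for n in range(2, 46):
    # Each new version just adds one line, to keep the demo easy to follow.
    store.commit(removed=[], inserted={str(n): f"line {n}"})
```

With this interval, snapshots land at versions 1, 21 and 41, so no reconstruction ever applies more than 19 diffs.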