Results 1 to 10 of 10
  1. #1
    Join Date
    Nov 2009
    Posts
    5

    Document Table Record Versioning

    Hey All,

    We have a table named "document" with following fields:
    documentnum, documenttypenum, description, filename, uploadedon, createdby

    We would like to add versioning capability to the document records. New functionality would involve a) creating a new document b) replacing a document with a newer version c) deleting the latest version and making the older version current d) getting the newest version of a document

    What would be the best way to structure the database table(s) so that the queries for the above operations would not just be simple but also run fast?

    Please advise!

    Thank you,
    Joe

  2. #2
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by joehansen
    What would be the best way to structure the database table(s) so that the queries for the above operations would not just be simple but also run fast?
    Couldn't you just have a timestamp in the record and then the current document is the one with the latest timestamp. If you delete this record then the next latest timestamp becomes current. This would be simple to code and pretty fast to run.

    To get the current document for id 123:
    Code:
    select doc
    from   Documents
    where  id = 123
           and timestamp_field = (
                select max( timestamp_field )
                from   Documents
                where  id = 123 )

  3. #3
    Join Date
    Nov 2009
    Posts
    5
    Thanks for the reply, Mike!

    I am torn between having one single document table with a timestamp field or having two tables, document table along with the documentarchive table.

    Wouldn't you think the SQL SELECT queries would be simpler and would perform better if I had two tables?

    Thank you,
    Joe

  4. #4
    Join Date
    Aug 2009
    Posts
    68
    Hi…
    IMO, using two tables gives no benefit. You shall have to move records back and forth from one table to the other one. Of course, any simple query on "DocumentTable" will run faster than Mike's query, because the subselect on each row. But if your query grabs only a few rows, you'll never see the difference.
    In your place, I would use Mike's design. If your queries aren't fast enough then you could consider:
    - using select distinct if possible.
    - add a boolean column to mark last version
    HTH
    Laurent

  5. #5
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    Wikipedia has a decent article on temporal database design. Usually you would add a ValidStart and ValidEnd datetime columns to reflect when the row was current. The current record has a ValidEnd datetime of some specific date in the distant future so it is easy to find.

    If you are going to have a huge number of document revisions (more than 20 per document) then I would go with an archive table, otherwise keeping it in one table is best.

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Normally I would recommend an archive table, but in your case records are not being "permanently archived", and can be change back to an active state. So, the single table may be your best option. Note though, that a single table with start and stop columns will not enable you to determine the history of when a document was in use (if it had multiple start and stop times), while an archive table would.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Nov 2002
    Location
    Jersey
    Posts
    10,322
    OK,

    I would have a Document table, With Effective and Termination Dates (null), with a version Number for the document that is incremented by 1 any time the document was modified (via a trigger)

    I would also move all data images to a history table (via a trigger) when anything on the row was modified

    MOO
    Brett
    8-)

    It's a Great Day for America everybody!

    dbforums Yak CorralRadio 'Rita
    dbForums Member List
    I'm Good Once as I ever was

    The physical order of data in a database has no meaning.

  8. #8
    Join Date
    Nov 2009
    Posts
    5
    Thank you, Brett Kaiser, blindman, MarkATrombley and Laurent for your kind replies. You all have been consistent in saying that a single table would serve my purpose best. So here's the key document table columns that I am looking at right now:

    documentkey (unique)
    documentnum
    activefrom
    activeuntil
    version (optional)
    latest (default true; optional)

    Thanks for all your help guys,
    Joe

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    So here's the key document table columns that I am looking at right now:

    documentkey (unique)
    documentnum
    activefrom
    activeuntil
    version (optional)
    latest (default true; optional)
    Few questions :
    • Surely the latest field is unnecessary as you'd just look for either look for records where the current date is between the from and until dates. If the field defaults to true does this mean you might have more than one record with the same documentkey being marked as the latest? If it's optional does this mean that no record for a given documentkey might be marked as the latest? Doesn't this make the code a bit complex? I still prefer the idea of just using a single date field.
    • Having from and until dates is mildly faster than just using a single date but you now have the hassle of updating the old record's until date every time you add a new version. To better choose what method to use we really should know how many documents you're planning on storing, how many updates you're expecting on each document and how many versions you're intending to keep.
    • The version field also seems unnecessary as it can simply be calculated by counting the number of records for that document key.
    • Why isn't documentnum the key?
    Just my 2c

  10. #10
    Join Date
    Aug 2009
    Posts
    68
    Something is yet unclear to me in the desired behavior: what you want is:
    -- a stack, i.e. returning to version n-2 will throw away n and n-1.
    -- an history, i.e. returning to version n-2 deletes nothing but "duplicates" n-2 to n+1

    Mike's design (and my contribution) apply to a stack. Mark's is more about an history. And your final design seems to me a mix of the two !

    If an history is wanted, I would use two tables:
    Table: DocumentSource (needed if you want to store docs in the database)
    docsourceid PK
    docTitle
    Lastupdated date (filled by trigger)
    document blob

    Table: DocumentVersion
    documentSourceFK (FK to documentSource)
    documentnum (common to all versions of the same document)
    activefrom
    activeuntil

    Last version is identified with activeuntil being null. DocumentSourceFK could be an external link if you don't store docs in the database.
    Here, each time version change, activeuntil is updated in last version record and a new DocumentVersion record is added, pointing to a new DocumentSource or an existing one.

    Hope this helps
    Laurent

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •