If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Document Table Record Versioning

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-12-09, 17:39
joehansen joehansen is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Document Table Record Versioning

Hey All,

We have a table named "document" with following fields:
documentnum, documenttypenum, description, filename, uploadedon, createdby

We would like to add versioning capability to the document records. New functionality would involve a) creating a new document b) replacing a document with a newer version c) deleting the latest version and making the older version current d) getting the newest version of a document

What would be the best way to structure the database table(s) so that the queries for the above operations would not just be simple but also run fast?

Please advise!

Thank you,
Joe
Reply With Quote
  #2 (permalink)  
Old 11-12-09, 19:39
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
Originally Posted by joehansen
What would be the best way to structure the database table(s) so that the queries for the above operations would not just be simple but also run fast?
Couldn't you just have a timestamp in the record and then the current document is the one with the latest timestamp. If you delete this record then the next latest timestamp becomes current. This would be simple to code and pretty fast to run.

To get the current document for id 123:
Code:
select doc
from   Documents
where  id = 123
       and timestamp_field = (
            select max( timestamp_field )
            from   Documents
            where  id = 123 )
Reply With Quote
  #3 (permalink)  
Old 11-12-09, 19:50
joehansen joehansen is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Thanks for the reply, Mike!

I am torn between having one single document table with a timestamp field or having two tables, document table along with the documentarchive table.

Wouldn't you think the SQL SELECT queries would be simpler and would perform better if I had two tables?

Thank you,
Joe
Reply With Quote
  #4 (permalink)  
Old 11-13-09, 10:48
Pyrophorus Pyrophorus is offline
Registered User
 
Join Date: Aug 2009
Posts: 68
Hi…
IMO, using two tables gives no benefit. You shall have to move records back and forth from one table to the other one. Of course, any simple query on "DocumentTable" will run faster than Mike's query, because the subselect on each row. But if your query grabs only a few rows, you'll never see the difference.
In your place, I would use Mike's design. If your queries aren't fast enough then you could consider:
- using select distinct if possible.
- add a boolean column to mark last version
HTH
Laurent
Reply With Quote
  #5 (permalink)  
Old 11-13-09, 11:06
MarkATrombley MarkATrombley is offline
Registered User
 
Join Date: Jul 2009
Location: Michigan
Posts: 125
Wikipedia has a decent article on temporal database design. Usually you would add a ValidStart and ValidEnd datetime columns to reflect when the row was current. The current record has a ValidEnd datetime of some specific date in the distant future so it is easy to find.

If you are going to have a huge number of document revisions (more than 20 per document) then I would go with an archive table, otherwise keeping it in one table is best.
Reply With Quote
  #6 (permalink)  
Old 11-13-09, 11:26
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Normally I would recommend an archive table, but in your case records are not being "permanently archived", and can be change back to an active state. So, the single table may be your best option. Note though, that a single table with start and stop columns will not enable you to determine the history of when a document was in use (if it had multiple start and stop times), while an archive table would.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #7 (permalink)  
Old 11-13-09, 12:21
Brett Kaiser Brett Kaiser is offline
Window Washer
 
Join Date: Nov 2002
Location: Jersey
Posts: 10,303
OK,

I would have a Document table, With Effective and Termination Dates (null), with a version Number for the document that is incremented by 1 any time the document was modified (via a trigger)

I would also move all data images to a history table (via a trigger) when anything on the row was modified

MOO
__________________
Brett
8-)

It's a Great Day for America everybody!

dbforums Yak CorralRadio 'Rita
dbForums Member List
I'm Good Once as I ever was

The physical order of data in a database has no meaning.
Reply With Quote
  #8 (permalink)  
Old 11-13-09, 17:49
joehansen joehansen is offline
Registered User
 
Join Date: Nov 2009
Posts: 5
Thank you, Brett Kaiser, blindman, MarkATrombley and Laurent for your kind replies. You all have been consistent in saying that a single table would serve my purpose best. So here's the key document table columns that I am looking at right now:

documentkey (unique)
documentnum
activefrom
activeuntil
version (optional)
latest (default true; optional)

Thanks for all your help guys,
Joe
Reply With Quote
  #9 (permalink)  
Old 11-14-09, 05:11
mike_bike_kite mike_bike_kite is offline
vaguely human
 
Join Date: Jun 2007
Location: London
Posts: 2,519
Quote:
So here's the key document table columns that I am looking at right now:

documentkey (unique)
documentnum
activefrom
activeuntil
version (optional)
latest (default true; optional)
Few questions :
  • Surely the latest field is unnecessary as you'd just look for either look for records where the current date is between the from and until dates. If the field defaults to true does this mean you might have more than one record with the same documentkey being marked as the latest? If it's optional does this mean that no record for a given documentkey might be marked as the latest? Doesn't this make the code a bit complex? I still prefer the idea of just using a single date field.
  • Having from and until dates is mildly faster than just using a single date but you now have the hassle of updating the old record's until date every time you add a new version. To better choose what method to use we really should know how many documents you're planning on storing, how many updates you're expecting on each document and how many versions you're intending to keep.
  • The version field also seems unnecessary as it can simply be calculated by counting the number of records for that document key.
  • Why isn't documentnum the key?
Just my 2c
Reply With Quote
  #10 (permalink)  
Old 11-14-09, 12:31
Pyrophorus Pyrophorus is offline
Registered User
 
Join Date: Aug 2009
Posts: 68
Something is yet unclear to me in the desired behavior: what you want is:
-- a stack, i.e. returning to version n-2 will throw away n and n-1.
-- an history, i.e. returning to version n-2 deletes nothing but "duplicates" n-2 to n+1

Mike's design (and my contribution) apply to a stack. Mark's is more about an history. And your final design seems to me a mix of the two !

If an history is wanted, I would use two tables:
Table: DocumentSource (needed if you want to store docs in the database)
docsourceid PK
docTitle
Lastupdated date (filled by trigger)
document blob

Table: DocumentVersion
documentSourceFK (FK to documentSource)
documentnum (common to all versions of the same document)
activefrom
activeuntil

Last version is identified with activeuntil being null. DocumentSourceFK could be an external link if you don't store docs in the database.
Here, each time version change, activeuntil is updated in last version record and a new DocumentVersion record is added, pointing to a new DocumentSource or an existing one.

Hope this helps
Laurent
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On