Results 1 to 13 of 13
  1. #1
    Join Date
    Jul 2009
    Posts
    32

    Database Design Question. - pic to help explain

    I working on this project code.bildr.org ( safari and firefox only) that works a little like SVN. It's a wiki for code (very different than github).

    You have a project with files and folders, and when you change a file, it saves it. Right now you can only see the history of individual files because of how it works.

    I want to be able to save and remember the state of the entire project, so when a folder is added or a file changed or deleted, I can go through the history, and flip through so I can go back to an older state when files/folders were added, changed or deleted. It's a lot like SVN in a way.

    I understand how it would work, but I cant figure out the database. Here is a pic I made to try to explain what I need.

    Click image for larger version. 

Name:	DB.jpg 
Views:	41 
Size:	100.5 KB 
ID:	9810

    I figure save states need to keep track of the files in the project, but if 2 files change and 5 do not, I want to only write new file entries for the ones that changed, and just note that this save state uses old files.

    It's weird to me because it is like saying save 1 is associated with file 1,2,3,4 but save state 2 is associated with 1,2,5,6,7. The overlap is what is throwing me through a loop.

    I thought about making a table that tracks this, but it seems incredibly wasteful as I would have tons of row entries for every save.

    Any help would be most appreciated.

    -Adam
    Last edited by ameyer; 07-10-09 at 08:02.

  2. #2
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I think you could do this with the following tables and fields :

    Author: id, name, email, pwd, etc
    Project: id, author_id, name, descr etc
    Files:id, project_id, name, file_type, file_date, status_type, author_id, txt, descr, change_comment

    If a user wants to download the live code for a project then the system would look for all the code associated with the project and just take the records with the latest date that have a live status.

    A few small points:
    • status_type would hold something like dev, beta, live. This will need a lookup table.
    • file_type would hold something like php code
    • descr would hold a description of the code or project
    • change_comment would hold a list of changes made
    You might want to :
    • implement version numbers with dates and lists of changes
    • be able to store lists of bugs against a project or file
    • store feedback by users against a project or file
    • have a picture for a project
    • store the whole file each time or you could just store the changes
    • compress files before storing in the database if required

    Mike

  3. #3
    Join Date
    Jul 2009
    Posts
    32
    Thanks Mike.

    What would happen if a user wanted to grab an old version? How would the system know what files/folders are associated with that version? Or revert to an older version?

    As for features. This system is actually only the back end of a single feature of the site. The rest is all driven on mediawiki (wikipedia's software). So commenting and users etc will all be done through that.

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Personally I'd just use a date ie give me the live project code as of "1 Mar 2009". You'd then just ignore files older than this date. I mentioned versioning but that's a whole new bunch of additions and wouldn't really add to the overall functionality. You could always just list the versions along with the release dates and let the user enter that date. If you do want versions then you'd need a new table:
    Version : project_id, version_name, date
    but you'd need to decide on what to do if a project leader doesn't enter new (or any) versions and whether files should be linked in the database to a version or whether it will be done by date alone.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Right. If you associate each individual file change with a date, find the latest change for any file as of any given date is not difficult. At the point, the concepts of "save states" or "snapshots" is irrelevant, because your point-in-time recoverability supersedes them.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Jul 2009
    Posts
    32
    I guess im missing something.

    If I delete a file from a new version. Or I add one half way through, how do I know what files were part of that save state?

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Let's say your project consists of 3 files
    Code:
    1 Jan 2009 initial release of live code
          file_1
          file_2
          file_3
    
    on 1 Feb 2009 do a live release of just one file
          file_2
    
    on 1 Mar you release live code for other 2 programs
          file_1
          file_3
    Now if you want to download the whole project you'll pull the latest vesion of each file (1st Mar for files 1 and 3, 1st Feb for file 2).

    If you want to download the project as it stood in the middle of Feb then you'd pull the latest files at this point (1st Jan for files 1 and 3, 1st Feb for file 2).

    If you want to get rid of the file that was released in Feb then I guess you could just delete it. Personally I think it should stay there otherwise you are messing with history and bad thinks happen (ask Doctor Who).

    The code to pull the latest files for a given date might be something like this (this is just a quick guess) :
    Code:
    select file
    from   Files f.name
    where  f.project_id = $project_id
           and f.file_date = ( select max(d.file_date)
                               from   Files d
                               where  d.project_id = f.project_id
                                      and d.id = f.id
                                      and d.file_date < $supplied_date )

  8. #8
    Join Date
    Jul 2009
    Posts
    32
    I really appreciate this. Thank you.

    Quote Originally Posted by mike_bike_kite
    If you want to download the project as it stood in the middle of Feb then you'd pull the latest files at this point (1st Jan for files 1 and 3, 1st Feb for file 2).
    So that's the issue I see. What if in Jan it is files 1,2,3 but Feb is file 2,3,4
    Then I need to know not to include file 1 when looking at February's state.

    But what keeps track of what files to include, and what not to include.

  9. #9
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    If you want the ability to get rid of files then why not just include an empty file_txt field in files for this file in Feb. The SQL can be altered to just select non empty files in the outer section of the SQL and that should work for you.
    Code:
    select file
    from   Files f.name
    where  f.project_id = $project_id
           and f.txt > ""
           and f.file_date = ( select max(d.file_date)
                               from   Files d
                               where  d.project_id = f.project_id
                                      and d.id = f.id
                                      and d.file_date < $supplied_date )
    It's well worth it to just create some simple tables, insert the example data and try the SQL. It should produce the correct results for a given date - no promises though as I've only looked at the problem for a few minutes while you have hopefully been looking at it for longer.

  10. #10
    Join Date
    Jul 2009
    Posts
    32
    Ok.. I see what you are saying.

    So later on when there are thousands of non used files will it be an issue?

  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by ameyer
    So later on when there are thousands of non used files will it be an issue?
    So far you haven't given any idea of the number of file changes you'll get per day, the size of these files or how much history you want to keep. If you want to store the contents of the files in the database and you want to keep a history for all files then so be it. You can reduce the storage requirements by compressing files before saving them or just saving the differences from one version to the next. Storing thousands of records isn't much of an issue to databases, some folks are storing billions of records. Correct indexes allow you to jump to the correct record instantly.

  12. #12
    Join Date
    Jul 2009
    Posts
    32
    Id imagine that some projects will have 10s of edits a day. The files them selves will most often be only in the hundreds of lines of code.

    Im trying to imagine how I would/could revert to an earlier state without making copies of the code with this. I was thinking I could do it this way... but I assume it is not a good idea, but im not sure why not.


    Code:
    id		save_state	save_file
    1		1		1
    2		1		2
    3		1		3
    4		1		4
    5		2		1
    6		2		5
    7		2		6
    8		2		7
    9		2		4
    10		3		8
    11		3		5
    12		3		9
    13		3		7
    14		3		10
    15		4		11
    16		4		9
    17		4		7
    18		4		12
    
    SELECT * FROM save, files WHERE save_state=$save AND save_file = file_id

  13. #13
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I'd store the complete files each time at 1st and get your system working from there. There's no point complicating the problem until you've got something working first.

    As a next stage I'd then look at the diff and ed commands in Unix shell (assuming you're on Unix) to see how you could just save the differences between one version and the next rather than storing the whole file each time.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •