Results 1 to 7 of 7
  1. #1
    Join Date
    Apr 2006
    Location
    Austin, Texas
    Posts
    3

    Archiving rows -- to move or to tag?

    I've been making web sites for some time and just moved into database-driven applications. I have searched for a good answer to a problem that must be common, but I guess that either my search skills aren't that good or there is no easy solution to this problem.

    We want to archive information. Instead of deleting it, we just want to hide it from most searches, unless the user is specifically wants also to search for retired entries.

    In this case, we have a list of software programs that our company uses and some information (how to install, troubleshoot, etc.) about each software program. Well, when a piece of software -- especially software on a server -- gets retired and we know for sure that no one anywhere will ever use this again, we want to take it off the list so no one thinks we're still using it somewhere, but keep all that work of writing its installation guides, design, and troubleshooting just in case we have a question later about it.

    We store the info in a Postgres database, and make the user-view pages of it with PHP. So, we can (1) move the row to another table, called "software_archive" or something, or (2) add a column called "status" to the table and rewrite every PHP script to show only the rows where the status is not like "inactive" or whatever.

    Each of these seems to be a chore. If I just move the row to an archive table, that seems easier at first but I would have to somehow make sure any changes to the structure of the first table is mirrored in the structure of its archive table. If I tag instead of move, well, a lot of web pages (PHP scripts) use this table, so we would have to find and change each one.

    It seems to me that modern databases would let you mark a row a "inactive" or "hidden" instead of wholly deleting it, and return it only for queries that explicitly say to show all rows, not just the "online" ones.

    What is the most common way of handling this?

  2. #2
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    IMO, it's much cleaner to simply mark the field as being archived, rather than move it to a separate table, especially since you're using a database server like Postgres. This way, your front end don't have to know the data structure (where the data is located) as it is all in the same table. Yes, you need to include the status field in your queries, or include it in your view definition.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  3. #3
    Join Date
    Apr 2006
    Location
    Austin, Texas
    Posts
    3
    Quote Originally Posted by loquin
    IMO, it's much cleaner to simply mark the field as being archived, rather than move it to a separate table, especially since you're using a database server like Postgres. This way, your front end don't have to know the data structure (where the data is located) as it is all in the same table. Yes, you need to include the status field in your queries, or include it in your view definition.
    Thank you, loquin. I agree with you. I like marking it by another field instead of moving it to another table.

    Does using a view instead of the real table slow down the query?

    Another answer I have found outside this forum is to use a function, which would include the "WHERE status = 'active'," instead of rewriting the whole query each time in the code.

  4. #4
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    Quote Originally Posted by ambient
    Does using a view instead of the real table slow down the query?
    No. But, you'll want to be sure and index the status field, else, every time you query or view, the server has to scan all the records.

    Quote Originally Posted by ambient
    Another answer I have found outside this forum is to use a function, which would include the "WHERE status = 'active'," instead of rewriting the whole query each time in the code.
    Wouldn't you still have to rewrite the queries?

    One advantage of writing a stored procedure in PostgreSQL is that it could be written to accept an optional parameter, with a default value. I would probably use a boolean parameter here with a name along the lines of ReturnAllRecords, with a default value of False.

    This way, the stored procedure would be used to return all the active records, and IF you pass the optional property of true to the SP, it would return all records, active or archived.
    Last edited by loquin; 04-11-06 at 14:32.
    Lou
    使大吃一惊
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert


  5. #5
    Join Date
    Apr 2006
    Location
    Austin, Texas
    Posts
    3
    Quote Originally Posted by loquin
    Wouldn't you still have to rewrite the queries?
    Yes, sorry. For old scripts, I would have to rewrite the query regardless. For new scripts, however, a function would keep it more elegant.
    One advantage of writing a stored procedure in PostgreSQL is that it could be written to accept an optional parameter, with a default value. I would probably use a boolean parameter here with a name along the lines of ReturnAllRecords, with a default value of False.
    I didn't know about stored procedures in PostgreSQL. I will look into its documentation about it. Thanks.

  6. #6
    Join Date
    Aug 2004
    Posts
    330
    In order to keep multiple "versions" of the same row, you would need to extend the primary key. A good candidate for this would be an effective timestamp. However, if the table has referential integrity, then you have to migrate the additional key fields to child tables and to any recursive foreign keys you have.

  7. #7
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249

    No need to change front end PHP code for existing functionality

    One possible solution is by making small changes to the database that effectively make it unnecessary to change the front end PHP code.
    • Add a column to the existing table with the status field.
    • Provide an index on it for fast lookup, if necessary.
    • Rename the table to something else, for example, tab_x becomes tab_x_old
    • Create a view called tab_x that selects "active" records only. The view should not have the status column.
    As a result of these changes, the PHP code still sees a data structure called tab_x but it is now a view with structure identical to the original database table. Now, there is no need to change the PHP code since the view only has the newer info, not the archived info.

    For the new functionality considered, when the user wants to select "all" the data, use the table instead of the view to perform the queries. Since this is a new feature, you have to change things anyway.


    That is why I prefer my front-end code to communicate using views only, never direct access to tables. That way, any changes I want to make to the table structure will not affect the front-end code, since I can make my view based on the new structures. As far as the front-end code is concerned, nothing would have changed.

    HTH

    Ravi

    Edit: Corrected the spelling pf "preefer" to "prefer" in last paragraph.
    Last edited by rajiravi; 04-13-06 at 02:06.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •