  1. #1
    Join Date
    Mar 2011
    Posts
    6

    Unanswered: PostgreSQL source code

    Hi there,

    My name is Aaron and I am new to this forum.
    I am after a couple of things but would like to take it step by step.
    First, I have never used PostgreSQL, which complicates matters. I have been assigned a project that involves investigating the deletion process within PostgreSQL.

    I would like to ask how to go about this. From my research I have found that PostgreSQL does not securely delete data from its tables, rows, columns or databases. Instead, the data is removed from the user's view and stays hidden until it is overwritten by new data.

    What I would like to do is delete some data and then locate it, using the source code or any other means, since deleted data still remains in parts of the DBMS.

    If anyone understands what I am talking about, please advise on how I can achieve this. I have not installed any version of PostgreSQL yet, so it does not really matter which version I use, as long as I find a way to carry out this investigation.

    Cheers
    Aaron

  2. #2
    Join Date
    May 2008
    Posts
    277
    Originally posted by aaronenabs:
    From my research I have found that PostgreSQL does not securely delete data from its tables, rows, columns or databases. Instead, the data is removed from the user's view and stays hidden until it is overwritten by new data.
    And where, exactly, did you discover this? If true, it would be sort of contrary to the whole point of having a database in the first place.

    What I would like to do is delete some data and then locate it, using the source code or any other means, since deleted data still remains in parts of the DBMS.
    The source code is, thankfully, freely available from their website.

  3. #3
    Join Date
    Nov 2006
    Posts
    82
    Originally posted by aaronenabs:
    From my research I have found that PostgreSQL does not securely delete data from its tables, rows, columns or databases. Instead, the data is removed from the user's view and stays hidden until it is overwritten by new data.
    Well, it is not quite like that. To satisfy the ACID rules, Postgres uses MVCC to keep different versions of the same row (uncommitted changes to a row are visible only in the transaction that made them).
    When you delete a row (and commit that delete), it is not automatically removed from disk; it is only tagged as deleted. Postgres then uses the vacuum tool to remove such tagged rows.
    In the Postgres sources, the file include/utils/tqual.h contains a macro, HeapTupleSatisfiesVisibility, which is responsible for hiding rows already tagged as deleted. If you modify that macro to always return true, you will see all data that has not been vacuumed, including rows already tagged as deleted. But I don't know why you would want to do that.
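
    To make the tagging idea concrete, here is a minimal Python sketch of it. It is a toy model, not PostgreSQL's code: the names ToyHeap, RowVersion and show_all are invented for illustration, only xmin and xmax echo real tuple header fields, and real visibility checks also involve snapshots and in-progress transactions, which are ignored here. Passing show_all=True plays the role of a visibility check that always returns true.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class RowVersion:
    """One stored version of a row (hypothetical, simplified)."""
    data: str
    xmin: int                   # id of the transaction that inserted it
    xmax: Optional[int] = None  # id of the transaction that deleted it, if any


class ToyHeap:
    """Toy table: DELETE only tags rows, VACUUM physically drops them."""

    def __init__(self) -> None:
        self.rows: List[RowVersion] = []
        self.committed: Set[int] = set()

    def insert(self, xid: int, data: str) -> None:
        self.rows.append(RowVersion(data, xmin=xid))

    def delete(self, xid: int, data: str) -> None:
        # Tag matching rows as deleted; nothing is removed from storage.
        for row in self.rows:
            if row.data == data and row.xmax is None:
                row.xmax = xid

    def commit(self, xid: int) -> None:
        self.committed.add(xid)

    def _deleted(self, row: RowVersion) -> bool:
        return row.xmax is not None and row.xmax in self.committed

    def visible(self, row: RowVersion) -> bool:
        # Visible if the insert committed and no committed delete tagged it.
        return row.xmin in self.committed and not self._deleted(row)

    def scan(self, show_all: bool = False) -> List[str]:
        return [r.data for r in self.rows if show_all or self.visible(r)]

    def vacuum(self) -> None:
        # Drop rows tagged by a committed delete.
        self.rows = [r for r in self.rows if not self._deleted(r)]


heap = ToyHeap()
heap.insert(1, "alice")
heap.insert(1, "bob")
heap.commit(1)
heap.delete(2, "bob")
heap.commit(2)

print(heap.scan())               # ['alice']
print(heap.scan(show_all=True))  # ['alice', 'bob']
heap.vacuum()
print(heap.scan(show_all=True))  # ['alice']
```

    In this model the deleted row stays readable through the unfiltered scan until vacuum runs, which is the window the original question is about.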

  4. #4
    Join Date
    May 2008
    Posts
    277
    It occurs to me that the sarcasm intended in my original post may not have been obvious.

    To expand on rski's point, this isn't just "hiding data from the user interface", but inherent to how PostgreSQL manages table data, maintains ACID compliance, and allows concurrent access to data. It's the very opposite of "not secure", whatever you may mean by that.

    More information's available here.

  5. #5
    Join Date
    Nov 2006
    Posts
    82
    You are right, I did not notice your sarcasm.

  6. #6
    Join Date
    May 2008
    Posts
    277
    That's what I get for trying to be too clever on the interwebs.

  7. #7
    Join Date
    Mar 2011
    Posts
    6
    Rski,

    Thanks for that. So, from my understanding of what you said:

    the row is only tagged as deleted and is actually removed by the vacuum command? And if vacuum has not been run, this data tagged as deleted can still be accessed?

    As I said, this relates to research by Patrick Stahlberg on threats to privacy in database systems. He states that databases do not securely delete data.

    Thanks

  8. #8
    Join Date
    Nov 2006
    Posts
    82
    But I think some other databases work that way too, because of performance issues. If every delete statement had to trigger hard disk operations (to 'securely' delete data from a database you have to remove it from the files on the hard disk), database performance would be very poor.

  9. #9
    Join Date
    Mar 2011
    Posts
    6
    Well, the research paper states
    that PostgreSQL was checked on these points:

    delete physically overwrites "No"
    delete creates free space "No"

    So I am trying to repeat this experiment, but I would need to get into the source code and gain an understanding of how everything works, or at least most of it.

    Thanks

  10. #10
    Join Date
    May 2008
    Posts
    277
    Originally posted by aaronenabs:
    He states that databases do not securely delete data.
    It's no less secure than deleting anything else off a hard disk: operating systems generally delete files by simply marking the space they occupy as "free", and for all intents and purposes they cease to exist. However, the data itself remains on the disk until physically overwritten. The same principle applies here. Sure, there are ways you can retrieve unvacuumed rows in PostgreSQL (just as you can recover "deleted" data off of a disk), but it's not a simple matter of issuing a SELECT command. And regardless, that data would continue to be present on the physical drive even after it's been "vacuumed" by PostgreSQL (unless it's been physically overwritten).
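
    As a rough illustration of that principle, here is a small Python sketch of a disk that deletes by marking space free. ToyDisk and its methods are made up for this example and do not correspond to any real file system or PostgreSQL API; the point is only the difference between freeing a block and overwriting it.

```python
from typing import List, Optional


class ToyDisk:
    """A fixed set of blocks plus a table recording which ones are in use."""

    def __init__(self, n_blocks: int, block_size: int = 8) -> None:
        self.block_size = block_size
        self.blocks: List[bytes] = [bytes(block_size) for _ in range(n_blocks)]
        self.in_use: List[bool] = [False] * n_blocks

    def write(self, payload: bytes) -> int:
        # Take the first free block (raises ValueError if the disk is full).
        idx = self.in_use.index(False)
        self.blocks[idx] = payload.ljust(self.block_size, b"\0")[: self.block_size]
        self.in_use[idx] = True
        return idx

    def delete(self, idx: int) -> None:
        # Ordinary delete: mark the block free, leave its bytes alone.
        self.in_use[idx] = False

    def secure_delete(self, idx: int) -> None:
        # Overwrite the bytes first, then mark the block free.
        self.blocks[idx] = bytes(self.block_size)
        self.in_use[idx] = False

    def read(self, idx: int) -> Optional[bytes]:
        # What a normal user sees: freed blocks no longer exist.
        return self.blocks[idx] if self.in_use[idx] else None

    def raw_read(self, idx: int) -> bytes:
        # What someone with access to the raw device sees.
        return self.blocks[idx]


disk = ToyDisk(n_blocks=2)
slot = disk.write(b"secret")
disk.delete(slot)
print(disk.read(slot))      # None
print(disk.raw_read(slot))  # b'secret\x00\x00'
```

    After delete, the payload is gone for ordinary reads but still present in the raw block until a later write reuses it or secure_delete zeroes it.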

    delete physically overwrites "No"
    delete creates free space "No"
    If you read the link I posted, you'll see that these are conscious design decisions that are explained in the documentation.

    If you're looking for disk security, then encrypt the drive and don't let malicious people have access to it.

  11. #11
    Join Date
    Mar 2011
    Posts
    6
    Thanks, futurity.

    Yes, it is much the same as a hard disk. The document I read was just trying to state that there might be privacy issues within databases, and pointed out how databases tag rows as deleted but do not actually delete the data.

    Sure, there are ways you can retrieve unvacuumed rows in PostgreSQL (just as you can recover "deleted" data off of a disk), but it's not a simple matter of issuing a SELECT command.
    Is there any way you can advise me on how to retrieve unvacuumed rows, or point me to documents that would help? You have really assisted me and I am grateful.
    Thanks

  12. #12
    Join Date
    Mar 2011
    Posts
    6
    Hi,

    I have been looking through the experiment paper, and it says that PostgreSQL keeps 100% of its expired records in the DB slack (database slack), and that its trend line is superimposed on that of the expired records.

    I guess what I am trying to do is insert a couple of records, delete them, and then look in the DB slack before running a vacuum to see if I can retrieve the deleted data.

    Can anyone advise me on how to get into the DB slack, or point me to documents on how to locate it?

    Thanks

  13. #13
    Join Date
    May 2008
    Posts
    277
    Do a Google search for accessing/retrieving/restoring deleted rows. The PostgreSQL mailing lists -- where many of the developers hang out -- will probably be your most useful resource.

  14. #14
    Join Date
    Aug 2009
    Location
    Olympia, WA
    Posts
    337
    Some pg contrib modules may come in handy here.
    Specifically pageinspect, which allows you to load a raw page of data, from which you can access the raw data for unvacuumed deleted rows.

    And pgstattuple, which will give you information about how many pages are in the table and the percentage of dead rows.

    But what's important here is that deleted data is certainly no more vulnerable than live data. While it may persist for a while, depending on your db settings, a dead row is every bit as secure as a live one... and much harder to get to.
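
    A hedged sketch of how those two modules might be used, written as Python that only holds the SQL text and filters rows that have already been fetched. It assumes both modules are installed; 'mytable', the block number 0, the helper items_with_xmax and the sample rows are all made up for illustration, and no database is contacted. Note that a nonzero t_xmax is only a hint that a row was deleted: it can also be set by a row lock, an update, or a delete that was later rolled back.

```python
from typing import Dict, List, Optional

# Queries to run in psql; 'mytable' is a hypothetical table name.
PAGE_ITEMS_SQL = """
SELECT lp, t_xmin, t_xmax, t_ctid
FROM heap_page_items(get_raw_page('mytable', 0));
"""

DEAD_STATS_SQL = """
SELECT dead_tuple_count, dead_tuple_percent
FROM pgstattuple('mytable');
"""

Row = Dict[str, Optional[object]]


def items_with_xmax(rows: List[Row]) -> List[Row]:
    """Keep page items whose t_xmax is set, i.e. candidates for deleted rows."""
    flagged = []
    for row in rows:
        xmax = row.get("t_xmax")
        # Unused line pointers come back with NULL header fields.
        if xmax is not None and int(xmax) != 0:
            flagged.append(row)
    return flagged


# Made-up rows shaped like the output of PAGE_ITEMS_SQL.
sample_rows: List[Row] = [
    {"lp": 1, "t_xmin": "1001", "t_xmax": "0"},
    {"lp": 2, "t_xmin": "1001", "t_xmax": "1002"},
    {"lp": 3, "t_xmin": None, "t_xmax": None},
]

print([r["lp"] for r in items_with_xmax(sample_rows)])  # [2]
```

    Running the first query before and after a VACUUM on the table would show whether the flagged items are still on the page.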

  15. #15
    Join Date
    Mar 2011
    Posts
    6
    Hi guys,

    I have done a little bit of research into the matter and found an "easy" way to do this, which I want to run by you.

    "(1) shutdown your database and backup your data;
    (2) change HeapTupleSatisfiesVisibility(),
    just let it return "true", which means, it will treat everything as visible,
    including deleted rows; compile the kernel;
    (3) restart your database find out the data you want - you may select them into another table;
    (4) revert the changes, and restart your database"

    What do you think of this approach? My problem is that I am not familiar with PostgreSQL, so I would not even know where to look within the source code to try to retrieve these dead rows.
