  1. #1
    Join Date
    Feb 2010
    Posts
    11

    Unanswered: Nonrepeatable Read and Phantom Read possible?

    Hello

    I'm using PostgreSQL 9.3, and I have the following two types of stored procedures (only the statements are given):

    Code:
    INSERT INTO table1 (col1, col2, col3, col4, col5) 
    SELECT $1, $2, $3, $4, $5
    WHERE EXISTS (SELECT id FROM table2 q WHERE q.id = $1) RETURNING table1.id;
    Code:
    DELETE FROM table1 m WHERE m.id = (
        SELECT m.id FROM table1 m 
        WHERE   m.col1 = $1 
        AND     coalesce($2, m.col2) = m.col2
        AND     coalesce(m.col3, $3) = $3 
        ORDER BY m.date ASC 
        LIMIT 1
        FOR UPDATE
    ) RETURNING m.id, m.col1, m.col2, m.col3, m.col4, m.col5, m.col6;
    The $x placeholders represent input parameters to the stored procedures. I omitted the function declarations.

    Are a Nonrepeatable Read and a Phantom Read possible with these two stored procedures?

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Based on the PostgreSQL 9.3 documentation, you haven't provided enough information to answer this question. In that case, I have to answer "It depends on the factors cited in the web page, but those side effects can probably occur".

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Feb 2010
    Posts
    11
    What information do you need?

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Please follow the link that I provided and read the PostgreSQL documentation page. It describes the problems and conditions needed to cause them at a high level. If parts of the page are unclear, please quote the troublesome parts here and ask questions... I can't know enough about your systems to provide complete insight, but I'll do my best to answer specific questions for you!

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Feb 2010
    Posts
    11
    Ok, here we go.

    Let's say we have two concurrent transactions which want to delete the same row, and we are at the Read Committed isolation level. In Read Committed, each transaction sees only committed data.

    Let's say transaction A starts first, then transaction B comes in, then transaction A deletes the row and commits. Now, transaction B cannot commit because the row is deleted. What does transaction B do? Will it roll back and throw an exception?

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    While other versions may be problematic, in the case of PostgreSQL 9.x there is no problem.

    If A deletes the target row before B notices that the row exists, there is no problem because B will never see the row at all.

    If A deletes the target row after B notices that the row exists, a DELETE operation that deletes no rows is not considered an error (see http://www.postgresql.org/docs/9.0/s...ql-delete.html for confirmation), so the transaction will continue without a problem.

    Either way, in PostgreSQL 9.3 both the A and B transactions proceed happily!
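
    For concreteness, the second case can be sketched as two concurrent psql sessions at the default Read Committed level (the id value is made up):

    Code:
    -- Session A:
    BEGIN;                              -- default READ COMMITTED
    DELETE FROM table1 WHERE id = 42;   -- takes the row lock

    -- Session B (concurrently):
    BEGIN;
    DELETE FROM table1 WHERE id = 42;   -- blocks on A's row lock

    -- Session A:
    COMMIT;                             -- the row is now gone

    -- Session B (unblocked): its DELETE re-checks the row, sees it was
    -- deleted, and reports "DELETE 0" -- no error is raised.
    COMMIT;                             -- succeeds, having deleted nothing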

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  7. #7
    Join Date
    Feb 2010
    Posts
    11
    Thank you very much for the explanation.

    So both transactions A and B work well, but let's assume that the transactions are in a stored procedure and this stored procedure returns the row (and deletes it). Will the return value of the stored procedure that deletes a row which no longer exists just be NULL?

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    If the stored procedure extracts, temporarily stores, and then returns the values from the row those values won't become NULL just because the row was deleted. The values are probably invalid to the application after the row is deleted in the sense that those values no longer exist in the database, but becoming invalid won't automajikally make the values become NULL.

    A Stored Procedure is basically application code that is stored and executed within the database engine. Nearly every database engine that supports Stored Procedures is less powerful and expressive than a language like C#, PHP, or Smalltalk, but a Stored Procedure can exploit its vastly greater/faster access to the database engine and the data, making it a very powerful tool for a developer willing to learn to use Stored Procedures effectively. There's no magic, and Stored Procedures can return garbage just like any other code... Nothing prevents a Stored Procedure from returning data that doesn't make sense (or maybe never made sense), because like everything else in an application it is just code!
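
    In PL/pgSQL terms, the point above can be sketched like this (all names are hypothetical, modeled on the first post); the function returns whatever it captured, and it is up to the code to signal that nothing was deleted:

    Code:
    CREATE OR REPLACE FUNCTION take_one(p_col1 integer)
    RETURNS table1
    LANGUAGE plpgsql AS $$
    DECLARE
        r table1;
    BEGIN
        DELETE FROM table1 m
        WHERE m.id = (
            SELECT m2.id FROM table1 m2
            WHERE m2.col1 = p_col1
            ORDER BY m2.date ASC
            LIMIT 1
            FOR UPDATE
        )
        RETURNING m.* INTO r;

        IF NOT FOUND THEN
            -- Nothing matched: r is a NULL record, not leftover values.
            RAISE NOTICE 'no matching row for col1 = %', p_col1;
        END IF;
        RETURN r;
    END;
    $$;

    Because the DELETE ... RETURNING captures the values and the deletion happens in the same statement, the caller can check FOUND (or a NULL result) rather than receiving column values for a row that no longer exists.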

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Feb 2010
    Posts
    11
    So in this case it is better to use Repeatable Read, so that the stored procedure will not return garbage but the transaction will fail and roll back?

  10. #10
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'm pretty sure that I understand what you are asking, but I'm not certain.

    If I understand what you're asking, there is no isolation setting that will prevent the stored procedure from returning the invalid data. If the A and B Stored Procedures run in separate contexts they can't constrain each other unless they take explicit locks on rows (a very bad idea), and that constraint will only last while the procedures are running... Once the Stored Procedures return to their respective clients, all bets are off.

    I would resolve this conflict by combining the A and B stored procedures into a single procedure that does the work of both A and B. That doesn't prevent some other code from interfering in an obscure way, but it will drastically limit the exposure and probably completely resolve your problem.

    The biggest problem that I'm having with trying to help you is that I don't have insight into the whole process that you're doing. You really need someone who can function as an Application or System Architect. With an understanding of the whole system, these questions would be easy to answer. With only the portion that you have presented here I can't offer you solid, actionable advice.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  11. #11
    Join Date
    Feb 2010
    Posts
    11
    Thanks, that was what I meant.

    You say that explicitly locking rows is a bad idea. Why? As you can see in my first post, I'm locking the row with FOR UPDATE. Should I remove this?

    DELETE FROM table1 m WHERE m.id = (
        SELECT m.id FROM table1 m
        WHERE   m.col1 = $1
        AND     coalesce($2, m.col2) = m.col2
        AND     coalesce(m.col3, $3) = $3
        ORDER BY m.date ASC
        LIMIT 1
        FOR UPDATE
    ) RETURNING m.id, m.col1, m.col2, m.col3, m.col4, m.col5, m.col6;

    And a second question: if I use Repeatable Read instead of Read Committed, will the performance (throughput) be worse, assuming that most operations are on different rows?
    Last edited by Helveticus; 10-20-14 at 12:49.

  12. #12
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    When you lock a row for update and then make the change and free the lock, that's not a problem. When you lock a row and return control to some other code with the row locked, there is ample opportunity for a "lunch time lock" where a lock is taken and not released for minutes or hours... This is good for data integrity, but really hard on people's patience!
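
    A minimal sketch of the good pattern: the FOR UPDATE lock is taken and released inside one short transaction, never held while control returns to the client (table and values are hypothetical):

    Code:
    BEGIN;
    SELECT id FROM table1 WHERE id = 42 FOR UPDATE;  -- lock the row
    UPDATE table1 SET col2 = 'x' WHERE id = 42;      -- make the change
    COMMIT;  -- lock released here; never hand control back mid-transaction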

    The level of transaction isolation (Read Committed versus Repeatable Read) is rarely the determining factor in terms of how long code runs. I would not worry about this until you have evidence to support that the Isolation Level is causing problems.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  13. #13
    Join Date
    Feb 2010
    Posts
    11
    So then I can just use Repeatable Read to be on the safe side if it does not affect performance that much.

  14. #14
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I'm not a trusting soul, I like to prove the things that I depend upon on a routine basis.

    Run your process at least three times using both Repeatable Read and Read Committed. For each isolation level, discard the fastest and slowest timings (to reduce the chances of an anomaly throwing off your average), then average the remaining values.

    Compare the two averages, and then you can make an informed decision driven by your own measurements. If the difference doesn't matter to you, then use whichever Isolation Level makes you happy. If the difference matters to you, then you can go ahead with confidence and know exactly why you chose one over the other!
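
    The "drop the fastest and slowest, average the rest" rule can itself be done in SQL; the timings below are invented purely for illustration:

    Code:
    WITH runs(ms) AS (VALUES (120.0), (95.0), (101.0), (99.0), (250.0))
    SELECT avg(ms) AS trimmed_avg_ms
    FROM (SELECT ms FROM runs
          ORDER BY ms
          OFFSET 1                                -- drop the fastest run
          LIMIT (SELECT count(*) - 2 FROM runs)   -- and the slowest
         ) t;
    -- averages 99.0, 101.0 and 120.0, ignoring the 95.0 and 250.0 outliers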

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.
