  1. #1
    Join Date
    Feb 2014
    Posts
    4

    Unanswered: conceptual question around updates and commits

    A general question about how PostgreSQL handles updates and commits in relation to performance.

    We have a jar file whose job is to process updates to patient records sent from other systems. We receive messages from EMR systems that contain patient demographic information. These messages are triggered by all sorts of events in their systems. We have a 40% duplicate message rate and a process that de-dups them prior to processing. Even so, many of these messages contain data that is identical to the data stored in our DB, but our processor runs an update and commit anyway. I want to avoid running these unnecessary updates and commits to save server resources and reduce row locking. We currently receive about 200 of these messages a minute, and each can result in updates to single rows in 4 tables.

    Here is a very basic single-table example of my question, to get at the update and commit behavior specifically.

    Currently we're doing this (again, kept very basic for conceptual purposes):

    select 5000 records
    loop
        begin
        update table set column = 'data' where id = 'ID';
        commit;
    next


    I want to do something like this...

    select 5000 records
    loop
        begin
        update table set column = 'data' where id = 'ID' and column != 'data';
        commit;
    next

    or this...

    select 5000 records
    loop
        select count(id) from table where id = 'ID' and column != 'data';
        if count(id) = 1 then
            begin
            update table set column = 'data' where id = 'ID';
            commit;
        else
            skip
        end if
    next
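
    The guarded-update pattern (the second example) can be sketched end to end in Python, with sqlite3 standing in for PostgreSQL; the table and column names are hypothetical. The extra `and column != 'data'` predicate makes duplicate messages fall through with no row written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO patient VALUES ('A1', 'DAVID')")
conn.commit()

# Incoming messages: the first duplicates what is already stored.
messages = [("A1", "DAVID"), ("A1", "DAVE")]

written = 0
for pid, name in messages:
    cur = conn.execute(
        "UPDATE patient SET name = ? WHERE id = ? AND name != ?",
        (name, pid, name),
    )
    written += cur.rowcount  # 0 for the duplicate, 1 for the real change
conn.commit()

print(written)  # 1 -- only the genuinely changed record was written
```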


    I hope this makes sense, and I look forward to the replies.

    Thanks,

    David

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    I suppose that the ultimate answer to this question depends on the version of PostgreSQL running on your server, but for more than 99.9% of the cases that I've seen:

    An UPDATE statement of the form:
    Code:
    UPDATE myTable
       SET myColumn = 'This' 
       WHERE  1 = id
          AND 'This' != myColumn;
    will produce the correct response... no change will occur to the row or the database file.

    In 100% of the PostgreSQL servers that I've seen, a COMMIT statement with no change to the rows is effectively a NO-OP... It does nothing, in a really fancy way!

    To sum this up, you might find a bizarre outlier somewhere, but odds are good that the plan you are using will produce extremely efficient and effective code. I'm not as experienced with PostgreSQL as I am with some other SQL dialects, but I can't remember seeing a version/instance of the server that would handle your proposed code inefficiently.
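
    One caveat worth adding to the `!=` form (my addition, not something raised above): if myColumn can be NULL, then `'This' != myColumn` evaluates to NULL and the row is silently skipped. PostgreSQL's NULL-safe comparison is `myColumn IS DISTINCT FROM 'This'`. The sketch below shows the effect using Python's sqlite3, whose `IS NOT` operator plays the same NULL-safe role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.execute("INSERT INTO t VALUES (1, NULL)")  # column starts out NULL

# Plain != : NULL != 'This' evaluates to NULL, so the row never matches.
cur = conn.execute("UPDATE t SET col = 'This' WHERE id = 1 AND col != 'This'")
print(cur.rowcount)  # 0 -- the NULL row is silently skipped

# NULL-safe comparison (PostgreSQL spelling: col IS DISTINCT FROM 'This').
cur = conn.execute("UPDATE t SET col = 'This' WHERE id = 1 AND col IS NOT 'This'")
print(cur.rowcount)  # 1 -- now the NULL row is caught and updated
```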

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Feb 2014
    Posts
    4
    Thanks for the reply. I ran some simulations on real data comparing:

    UPDATE myTable
    SET myColumn = 'This'
    WHERE 1 = id

    Versus

    UPDATE myTable
    SET myColumn = 'This'
    WHERE 1 = id
    AND 'This' != myColumn;

    I found that, on average, the first example (where the data was updated) took 183ms to execute, versus 53ms using the extended WHERE clause that skips the update. I ran each statement 10 times, 3 seconds apart, and averaged the recorded times. The second statement was fairly consistent at around 50ms, while the first varied from the 60s to roughly 200ms.
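
    For anyone wanting to reproduce this kind of measurement, a minimal harness looks something like the following. This uses an in-memory sqlite3 database as a stand-in, so the absolute numbers won't match a live PostgreSQL environment; the table and column names are hypothetical:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.executemany("INSERT INTO t VALUES (?, 'This')", [(i,) for i in range(5000)])
conn.commit()

def timed(sql):
    """Run the statement once per id and return elapsed milliseconds."""
    start = time.perf_counter()
    for i in range(5000):
        conn.execute(sql, (i,))
    conn.commit()
    return (time.perf_counter() - start) * 1000

# Unconditional: every row is rewritten even though nothing changes.
plain = timed("UPDATE t SET col = 'This' WHERE id = ?")
# Guarded: the predicate filters out already-identical rows.
guarded = timed("UPDATE t SET col = 'This' WHERE id = ? AND col != 'This'")

print(f"unconditional: {plain:.1f} ms, guarded: {guarded:.1f} ms")
```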

    Thanks,

    David

  4. #4
    Join Date
    Feb 2014
    Posts
    4
    Thanks for your reply. I wrote a reply to this two weeks ago but I don't see it posted here, strange. I think I remember seeing a message that a moderator would need to approve my reply.

    Basically I wanted to post a useful response, so I posted some data from a time study I did.

    I ran this statement 10 times and got an average response time of 183ms in a live data environment.

    UPDATE myTable
    SET myColumn = 'This'
    WHERE 1 = id;

    Then I ran it this way 10 times and got an average of 53ms:

    UPDATE myTable
    SET myColumn = 'This'
    WHERE 1 = id
    AND 'This' != myColumn;

    So on average it was about 3 times faster the second way. I hope that is helpful to someone out there.

    Thanks,

    David

  5. #5
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    With PostgreSQL, readers shouldn't block writers; it keeps multiple versions of each row and reconciles them, rather than locking the row, editing it, then releasing it. (This is called MVCC, for Multiversion Concurrency Control.)

    Are both id and myColumn indexed?

    In any event, the second approach should take less time, as it checks the condition before attempting to write potentially redundant data, and reading data almost always takes less time than writing it. When writing/updating data, essentially:
    1. a new version of the row is created
    2. the new data is written to the WAL (Write Ahead Log) file
    3. THEN the data is updated in the new row version
    4. the two versions of the row are reconciled
    5. finally, assuming they could be reconciled, the transaction is completed.
    If the data is already there and you don't check first (option 1), everything takes place as if the row needs updating: the full write still happens even though nothing has logically changed.
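
    That cost is easy to see from the driver's changed-row count: an unconditional UPDATE reports the row as written even when the new value is identical to the old, while the guarded form reports zero. A sketch with sqlite3 standing in for PostgreSQL (PostgreSQL's `UPDATE n` command tag behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'This')")

# Unconditional: the row is rewritten even though the value is identical.
cur = conn.execute("UPDATE t SET col = 'This' WHERE id = 1")
print(cur.rowcount)  # 1 -- a full write happens anyway

# Guarded: the predicate filters the row out, so no write occurs.
cur = conn.execute("UPDATE t SET col = 'This' WHERE id = 1 AND col != 'This'")
print(cur.rowcount)  # 0 -- nothing to do, nothing written
```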
    Last edited by loquin; 02-19-14 at 16:17.
    Lou
    使大吃一惊 ("to greatly astonish")
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Midler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert

