Results 1 to 12 of 12
  1. #1
    Join Date
    Apr 2009
    Posts
    2

    Unanswered: Insert or Update in one statement ?

    Hi all of you PostgreSQL gurus

    In order to greatly boost the speed of synchronization of our stand-alone electronic medical record application, which uses PostgreSQL as a local db and regularly synchronizes with a centralized db, I am looking for a special feature, possibly a new one :

    I need an "updateinsert" SQL statement which combines the functionalities of insert and update

    It should perform an insert if the record does not yet exist in PostgreSQL and otherwise it should do an update on all fields, or vice versa.

    (These record have an unique primary key)

    Our current solution involves reading the primary keys of all local observation records into a sorted array in memory, and then doing a binary search with the primary keys of freshly downloaded observation records in that array, to decide whether they need to be inserted or updated.

    All observation records of one medical records are inserted/updated in one transaction (typically a couple of hundred observation records in several thousand medical records).

    That proved to be faster than doing a select in PostgreSQL beforehand, to decide whether to perform an insert or update.

    But if PostgreSQL could be enhanced with a new "updateinsert" statement (implemented on a low enough level to be equally performant as an update or an insert), we would stand to gain a lot of badly needed performance.

    Any takers ?

    Bart Viaene, IDEWE, Belgium

  2. #2
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Our current solution involves reading the primary keys of all local observation records into a sorted array in memory, and then doing a binary search with the primary keys of freshly downloaded observation records in that array, to decide whether they need to be inserted or updated.
    What I usually do: just fire the UPDATE with the approriate WHERE clause. If it then signals that no rows have been updated, it's safe to do the insert.
    PG will check the existence of the row during an update anyway, so there is no need for you to do it before sending the update.

    If you expecte more rows to be inserted than updated, you can change the order and catch the exception that is thrown when inserting an existing row. You need to "wrap" the insert with a savepoint though, in order to be able to continue after the (somewhat expected) error.

    But yes, I agree: an UPDATE OR INSERT would be a very nice enhancement.

  3. #3
    Join Date
    Apr 2009
    Posts
    2
    Hi shammat

    I already suggested your proposal to our developer team before, but they were reluctant to adopt it, probably because the proposed "savepoint wrapping" did not nicely fit the current way of working, and adopted the binary search approach instead.

    But it is nice to have somebody who shares my original point of view.

    I was told that MySQL features such an "UPDATE OR INSERT" statement :

    AS Workshop MySQL insert a new row or update old one

    Any chances that this feature would come to PostgreSQL, too ?

  4. #4
    Join Date
    Nov 2003
    Posts
    2,933
    Provided Answers: 12
    Quote Originally Posted by fouture
    Any chances that this feature would come to PostgreSQL, too ?
    I don't know, I don't belong to the Postgres development team.

    Their TODO list has an item for this:

    Todo - PostgreSQL Wiki

    But as there is no official roadmap one cannot tell if or when this will be integrated.

  5. #5
    Join Date
    Jul 2010
    Posts
    2
    Quote Originally Posted by shammat View Post
    I don't know, I don't belong to the Postgres development team.

    Their TODO list has an item for this:

    Todo - PostgreSQL Wiki

    But as there is no official roadmap one cannot tell if or when this will be integrated.
    Hi. I`ve allready done the work in psql. Just visit my site. Its in polish, but I hope that the code is understandable - if not please post coments. WAPP :: PostgreSQL: rozwi?zanie problemu insert_or_update

  6. #6
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    You may want to consider what is done most often the insert or the update. Whichever one would be the most successful is the one you should do first it will speed your application up by not having to look at so many rows not found msgs, if insert is what occurs the most and not having to look at so many duplicate row msgs, if update is what mostly occurs. Otherwise sit back and wait for MERGE.
    Dave

  7. #7
    Join Date
    Aug 2009
    Location
    Olympia, WA
    Posts
    337
    I generally create ETL views that compares the source data with the target table. It makes it easier to see what is happening / needs to happen and to troubleshoot. On Postgres it'll have an action column specifying if the row needs inserted, updated or deleted.

    Code:
    -- Here is some untested sample code
    CREATE OR REPLACE VIEW etl.mrg_stuff AS
    SELECT COALESCE(s.id, t.id) AS id,
      s.title, s.price,
      CASE WHEN t.id IS NULL THEN 'insert'
      WHEN s.id IS NULL THEN 'delete'
      ELSE 'update' END AS action
    FROM stuff t
    FULL OUTER JOIN etl.src_stuff s ON t.id = s.id
    WHERE t.id IS NULL OR s.id IS NULL
      OR t.price != s.price
      OR t.title != s.title
      -- for values that can be null write a function to properly compare when 1 value is null;
    
    -- And a procedure to perform the actions
    CREATE FUNCTION merge_stuff
    RETURNS text AS
    $$
    DECLARE
      v_message  TEXT;
      v_count      INT;
    BEGIN
      INSERT INTO stuff (id, title, price)
      SELECT id, title, price
      FROM etl.mrg_stuff
      WHERE action = 'insert';
    
      GET DIAGNOSTICS v_count = ROW_COUNT;
      v_message := 'merge_stuff inserted ' || v_count || ' rows\n';
    
      UPDATE stuff t
      SET price = s.price, 
        title = s.title
      FROM etl.mrg_stuff s
      WHERE t.id = s.id
        AND s.action = 'update';
    
      GET DIAGNOSTICS v_count = ROW_COUNT;
      v_message := v_message || ' updated ' || v_count || ' rows\n';
    
      DELETE FROM stuff WHERE id IN (
        SELECT id FROM etl.mrg_stuff WHERE action = 'delete'
      );
    
      GET DIAGNOSTICS v_count = ROW_COUNT;
      RETURN v_message || ' deleted ' || v_count || ' rows';
    END;
    $$ LANGUAGE 'plgpsql';
    Then when Postgres adds MERGE, hopefully in 9.1, you'd change to something like so:
    Code:
    MERGE INTO stuff t
    USING etl.mrg_stuff s ON (t.id = s.id)
    WHEN MATCHED THEN UPDATE 
    SET price = s.price, 
      title = s.title
    DELETE WHERE s.action = 'delete' 
    WHEN NOT MATCHED THEN INSERT (id, title, price)
    VALUES (s.id, s.title, s.price);

  8. #8
    Join Date
    Dec 2007
    Location
    Richmond, VA
    Posts
    1,328
    Provided Answers: 5
    that is the most costly solution of all. I think Troy Coleman ran some tests and published data back when merge became available for DB2. I think his results were just mentioned recently on the DB2 forum here as well. Try googling him or db2utor.
    Dave

  9. #9
    Join Date
    Aug 2009
    Location
    Olympia, WA
    Posts
    337
    I don't work with DB2 but on Oracle merge can be parallelized and performs quite well. We'll have to wait and see how it will perform on Postgres.

  10. #10
    Join Date
    Jul 2010
    Posts
    2
    huh artacus - it couldnt be more costful. And what worse its hard to use when you have data distingushed over few servers and you want to insert_or_update specific table in one database using data from another (eg. using php to cooperate it).

    Just check out my code and see what it does.

    btw. I`d like to see some coments to my code. It works with no problems but Im a bit scarred if there any security hole.
    Please review it

  11. #11
    Join Date
    Aug 2010
    Posts
    2

    Nice work

    Hi wit3k,

    First nice work and your solution is working fine.

    When call it with Select insert_or_update(..) all is OK but I have a question about how to call the function into a spring batchUpdate environment.
    We are using the batchUpdate because we are making a lot of insert or update.

    The batchUpdate(String [] sql) don't allow ResultSet return on query call so did you know how to call it in batchUpdate environment?

    Thanks for your response
    Ludovic

    P.S. I have write a comment on your blog but it seems not be displayed/send so I write it her

  12. #12
    Join Date
    Aug 2010
    Posts
    2
    Quote Originally Posted by Ludovic View Post
    Hi wit3k,

    First nice work and your solution is working fine.

    When call it with Select insert_or_update(..) all is OK but I have a question about how to call the function into a spring batchUpdate environment.
    We are using the batchUpdate because we are making a lot of insert or update.

    The batchUpdate(String [] sql) don't allow ResultSet return on query call so did you know how to call it in batchUpdate environment?

    Thanks for your response
    Ludovic

    P.S. I have write a comment on your blog but it seems not be displayed/send so I write it her
    Ok I have found how to used in a batchUpdate().
    Code:
    update tmpTable set id=1 where 1=(
      select insert_or_update(...));

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •