  1. #1
    Join Date
    Mar 2012
    Posts
    120

    Unanswered: locking during inserts

    Hi all,
    I'm trying to optimize a bulk insert of rows on DB2 10.1 ESE from a CLI application.
    I cannot use LOAD, and I'm already committing every 1,000 inserts to help performance.
    I'm now worried about locking, so I'd like to prevent DB2 from taking any locks at all, since my application is the only one accessing the database.
    Is that possible?

    Thanks

  2. #2
    Join Date
    Apr 2012
    Posts
    1,034
    Provided Answers: 18
    If your application is the only application accessing the database, then can you explain the concern about locking?

  3. #3
    Join Date
    Oct 2009
    Location
    221B Baker St.
    Posts
    486
    I cannot use LOAD
    It may help if you explain why LOAD cannot be used.
    Last edited by papadi; 07-06-12 at 17:24.

  4. #4
    Join Date
    Mar 2012
    Posts
    120
    @db2mor
    Because I don't want DB2 to spend time on locking

  5. #5
    Join Date
    Apr 2012
    Posts
    1,034
    Provided Answers: 18
    That time spent on locking is negligible compared to your trickle-feed inserts!
    If performance is your problem, your design of logged inserts with a commit every 1,000 rows will guarantee poor performance. If you cannot use LOAD (for reasons you did not explain), then what's preventing you from either higher commit counts or (only if desperate) activating NOT LOGGED INITIALLY per transaction?
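
    Purely for illustration, a minimal sketch of the NOT LOGGED INITIALLY route, assuming the target table was created with the NOT LOGGED INITIALLY attribute and autocommit is off; the table and column names (MYSCHEMA.TARGET_TBL, ID, DESCR) are made up:

    -- WARNING: if this unit of work fails or is rolled back, the table can be
    -- left unusable, so this is only for jobs that can be re-run from scratch.
    ALTER TABLE myschema.target_tbl ACTIVATE NOT LOGGED INITIALLY;
    INSERT INTO myschema.target_tbl (id, descr) VALUES (1, 'example row');
    -- ... many more inserts in the same unit of work ...
    COMMIT;  -- logging resumes with the next unit of work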

  6. #6
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by Robert1973 View Post
    @db2mor
    Because I don't want DB2 to spend time on locking
    DB2 lock lists are stored in memory (unlike in some other databases), so locking may not be as much of an impact as you think. But if you want to reduce locking overhead, you can keep LOCKLIST and MAXLOCKS small (not AUTOMATIC) so that lock escalation to the table level happens sooner (after which no further row locks are taken), or you can take an explicit table lock in your SQL before the inserts.
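
    A minimal sketch of the explicit table lock option, assuming a made-up table MYSCHEMA.TARGET_TBL; the lock is released at COMMIT, so it has to be re-issued at the start of each unit of work:

    LOCK TABLE myschema.target_tbl IN EXCLUSIVE MODE;
    -- ... run the inserts for this unit of work ...
    COMMIT;  -- the table lock is released here

    -- Alternatively, make table-level locking the default for this table:
    ALTER TABLE myschema.target_tbl LOCKSIZE TABLE;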

    In most cases there are many other factors that have a much larger impact on insert performance than row-level locking.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  7. #7
    Join Date
    Mar 2012
    Posts
    120
    Thanks Marcus_A,
    I was wondering whether setting the isolation level to UR for the insert statements would prevent DB2 from acquiring any locks, or does that apply only to select statements?

    I'll run a benchmark test after acquiring an explicit table lock and see if there are improvements (from what you all said I don't expect too much, but maybe it's worth a try).

    Marcus, you said there are many other factors that may impact insert performance.
    I have already tried removing the indexes (recreating them after the inserts and running REORG and RUNSTATS), committing every 1,000 statements, using high CACHE values for sequences, and increasing LOGBUFSZ and the buffer pool size, but I still need to do better, hence my question about locking.

    Also, I cannot use array inserts, because my statement is not a plain SQL INSERT but a call to a stored procedure.

    In more detail, the stored procedure first tries an UPDATE on the table; if no rows were updated, it does an INSERT.
    Maybe that's why removing the indexes was not a good idea: the indexes are useful for the UPDATE (even when no row ends up being updated) but useless for the INSERT, so the overall effect is no performance improvement.
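
    For reference, a hedged sketch of that update-then-insert pattern in SQL PL; the procedure, table, and column names here are hypothetical, not the actual code:

    -- When run from the CLP, use an alternate statement terminator (e.g. db2 -td@).
    CREATE OR REPLACE PROCEDURE myschema.upsert_row (IN p_id    INTEGER,
                                                     IN p_descr VARCHAR(100))
    LANGUAGE SQL
    BEGIN
      DECLARE v_rows INTEGER DEFAULT 0;

      UPDATE myschema.target_tbl
         SET descr = p_descr
       WHERE id = p_id;

      GET DIAGNOSTICS v_rows = ROW_COUNT;   -- rows touched by the UPDATE

      IF v_rows = 0 THEN
        INSERT INTO myschema.target_tbl (id, descr) VALUES (p_id, p_descr);
      END IF;
    END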

    Could you please suggest any other parameter or factor that I can tune to improve performance in this case?

    By the way, I cannot use LOAD because I'm reading data from another database and I cannot do anything on that side.

    Thanks a lot

  8. #8
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Isolation level (including UR) only affects the locks taken when reading; an INSERT still acquires exclusive locks on the rows it inserts, regardless of the isolation level.

    I am not sure what restrictions you have in putting a solution together, but there are some things you might look at:

    • Not sure what you mean by "another database". Are you using DB2 Federation? Why can't you connect to database A and do EXPORT to file and then connect to database B to do LOAD or IMPORT?
    • If you can do an IMPORT, there is the INSERT_UPDATE option, which will UPDATE the row if it exists or INSERT it if it does not. The EXPORT and IMPORT can actually be initiated from a stored procedure if necessary.
    • LOAD does not have a MERGE option, but you can have rows rejected because of a duplicate PK placed in a discard (exception) table, and then have another process use that table to do the updates (you could use IMPORT with the INSERT_UPDATE option).
    • If you must process one row at a time without LOAD or IMPORT, then look at the MERGE SQL statement (a sketch of IMPORT INSERT_UPDATE and MERGE follows this list).
    • Removing all the non-PK indexes beforehand, and recreating them after the process might help. As you mentioned, you will need to keep the PK index to do MERGE (or individual UPDATE).
    • I would consider making the page size of the target table 16K or 32K, but I would not expect a huge difference in performance.
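
    To make the IMPORT and MERGE suggestions concrete, a hedged sketch with hypothetical file, table, and column names; the parameter markers in the MERGE would be bound by the CLI application:

    -- IMPORT with INSERT_UPDATE: updates rows whose PK already exists,
    -- inserts the rest (the target table must have a primary key).
    IMPORT FROM /tmp/extract.del OF DEL
      COMMITCOUNT 10000
      INSERT_UPDATE INTO myschema.target_tbl;

    -- Row-at-a-time upsert as a single MERGE statement:
    MERGE INTO myschema.target_tbl AS t
    USING (VALUES (CAST(? AS INTEGER), CAST(? AS VARCHAR(100)))) AS s (id, descr)
    ON (t.id = s.id)
    WHEN MATCHED THEN
      UPDATE SET t.descr = s.descr
    WHEN NOT MATCHED THEN
      INSERT (id, descr) VALUES (s.id, s.descr);
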
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  9. #9
    Join Date
    Apr 2012
    Posts
    1,034
    Provided Answers: 18
    You mentioned in the past that this is a one-time job, is this still the case?

    If you think that an Upsert (insert/update) requirement is justification for avoiding a load then you may be mistaken.


    If the source RDBMS can extract to either a delimited or fixed-width text file, you can get DB2 to LOAD that file (or files) into a staging table in DB2.
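
    A minimal sketch of that step, assuming the source system can write a comma-delimited extract to /tmp/extract.del and a hypothetical staging table MYSCHEMA.STAGING_TBL with matching columns already exists:

    LOAD FROM /tmp/extract.del OF DEL
      MESSAGES /tmp/staging_load.msg
      REPLACE INTO myschema.staging_tbl
      NONRECOVERABLE;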

    It all depends on the percentage of rows that are new, versus those that will be updated. Do you know the relative percentage of inserts versus updates?

    If a high percentage of your rows are new rows, the case for LOAD is really compelling from the performance viewpoint.

    Once the data is in a staging table, if the percentage of new rows is high, you can identify all the NEW rows (and perform a load-from-cursor into the final table). That leaves the rows to be UPDATED, which can then be done in large units of work. You can add dedicated indexes just to support this action.
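
    A hedged sketch of the load-from-cursor step for the new rows, run from the DB2 CLP; the table and column names are made up:

    DECLARE newrows CURSOR FOR
      SELECT s.id, s.descr
        FROM myschema.staging_tbl s
       WHERE NOT EXISTS (SELECT 1
                           FROM myschema.target_tbl t
                          WHERE t.id = s.id);

    LOAD FROM newrows OF CURSOR
      INSERT INTO myschema.target_tbl
      NONRECOVERABLE;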

    If a low percentage are new rows (with most being updates), then you can design a staging table (adding indexes if needed) that allows a bulked MERGE with large units of work.

    It seems that you built a CLI-based solution first, and now you're trying to force it to be something it cannot easily be. For bulk jobs, choose the right bulking approach and the payoff is excellent.

  10. #10
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by db2mor View Post
    Once the data is in a staging table, if the percentage of new rows is high, you can identify all the NEW rows (and perform a load-from-cursor into the final table). That leaves the rows to be UPDATED, which can then be done in large units of work. You can add dedicated indexes just to support this action.
    I like my idea described above better (do a LOAD INSERT with a discard/exception table for duplicate PKs, and then use that table to do the updates).
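
    A hedged sketch of that approach, with hypothetical names; MYSCHEMA.TARGET_EXC is an exception table defined with the same columns as the target (plus the optional timestamp/message columns), and the extract is assumed to have at most one row per key:

    -- LOAD inserts everything it can; rows rejected for duplicate keys are
    -- moved to the exception table during the delete phase.
    LOAD FROM /tmp/extract.del OF DEL
      MESSAGES /tmp/target_load.msg
      INSERT INTO myschema.target_tbl
      FOR EXCEPTION myschema.target_exc
      NONRECOVERABLE;

    -- Apply the rejected rows as updates in one large unit of work:
    MERGE INTO myschema.target_tbl AS t
    USING (SELECT id, descr FROM myschema.target_exc) AS e
    ON (t.id = e.id)
    WHEN MATCHED THEN
      UPDATE SET t.descr = e.descr;
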
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  11. #11
    Join Date
    Apr 2012
    Posts
    1,034
    Provided Answers: 18
    Yes, I see what you mean Marcus_A (although LOAD slows down when the % of exceptions is high, if memory serves).

    But the main point is that the original poster has a row-by-row design (fetch from source, transform, then upsert the target, committing every 1,000 rows) and wants to improve its performance. That design performs poorly for bulk actions, particularly one-time bulk jobs.

    It's a shame the original poster gave no hard facts about volumes, average row length, the nature of the source RDBMS, the percentage of inserts versus updates, and so on. Further suggestions would be possible with real numbers...

  12. #12
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    Robert, I would suggest you just write your application the straightforward way and use it, i.e. let DB2 do its regular locking and logging. Only if you really, really have a performance problem with that should you start to optimize.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  13. #13
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by db2mor View Post
    Yes, I see what you mean Marcus_A (although LOAD slows down when the % of exceptions is high, if memory serves).
    I am not sure about that, but the actual deletion of rows occurs after the LOAD phase. Here are the phases of a LOAD job:

    The load process consists of four distinct phases:

    1. Load - During the load phase, data is loaded into the table, and index keys and table statistics are collected, if necessary. Save points, or points of consistency, are established at intervals specified through the SAVECOUNT parameter in the LOAD command. Messages are generated, indicating how many input rows were successfully loaded at the time of the save point.
    2. Index Build - During the build phase, indexes are produced based on the index keys collected during the load phase. The index keys are sorted during the load phase, and index statistics are collected (if the STATISTICS USE PROFILE option was specified and the profile indicates that index statistics should be collected). The statistics are similar to those collected through the RUNSTATS command.
    3. Delete - During the delete phase, the rows that caused a unique or primary key violation are removed from the table. These deleted rows are stored in the load exception table, if one was specified.
    4. Index copy - During the index copy phase, the index data is copied from a system temporary table space to the original table space. This will only occur if a system temporary table space was specified for index creation during a load operation with the READ ACCESS option specified.
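
    For reference, a hedged example of where the parameters mentioned above appear in a LOAD command; the file and table names are hypothetical:

    LOAD FROM /tmp/extract.del OF DEL
      SAVECOUNT 100000                  -- consistency points roughly every 100,000 rows
      MESSAGES /tmp/load.msg
      REPLACE INTO myschema.staging_tbl
      STATISTICS USE PROFILE            -- needs a RUNSTATS profile on the table
      NONRECOVERABLE;
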
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  14. #14
    Join Date
    Mar 2012
    Posts
    120
    Quote Originally Posted by db2mor View Post
    You mentioned in the past that this is a one-time job, is this still the case?
    If you think that an Upsert (insert/update) requirement is justification for avoiding a load then you may be mistaken.
    Yes, it's still the case, but it's just one scenario; I mean, the same procedures will also have to handle the case in which rows are mostly updated.
    So, even though I'm having performance problems in the "bulk insert" case, these "upsert" procedures have to work in both cases (mostly inserts or mostly updates), since I don't know in advance which one it will be.
    And, as db2mor correctly remembers, the source database is not DB2 and I have the constraint that I cannot change anything on that side; this is the only reason I said I cannot use LOAD.
    I know it's not a good design, but unfortunately I have no way to change it.

    You also asked about volumes and average row length, but these are variable too (they depend on the source database, which can be of any size, and on the table).
    There are tables whose row length is 100 bytes, and others whose row length is 1 MB (in cases where a description text is one of the fields).

    Some more information that might help:
    I'm using automatic storage, I have just one big buffer pool (IBMDEFAULTBP) with a 32K page size, and I'm using the default user tablespace (no other user tablespaces defined).
    Maybe I can get a performance improvement by creating more buffer pools and more user tablespaces, one for each page size?

    Also @db2mor:
    you said "higher commit counts" could be something to try... what do you mean exactly?

    Thanks

  15. #15
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Provided Answers: 1
    Quote Originally Posted by Robert1973 View Post
    Maybe I can get a performance improvement by creating more buffer pools and more user tablespaces, one for each page size?
    Maybe, instead of changing random things, you could look at the monitor output and determine the actual performance bottleneck.
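
    For example (a hedged sketch, not a prescription), the time-spent monitor elements show whether the time goes to lock waits, log writes, or buffer pool I/O:

    -- Per-connection time-spent breakdown (values in milliseconds):
    SELECT application_handle,
           total_rqst_time,
           total_wait_time,
           lock_wait_time,
           log_disk_wait_time,
           pool_read_time,
           pool_write_time
      FROM TABLE(MON_GET_CONNECTION(NULL, -1)) AS m
     ORDER BY total_rqst_time DESC;

    -- Or get a ready-made report for a 60-second interval:
    CALL MONREPORT.DBSUMMARY(60);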
