Results 1 to 8 of 8
  1. #1
    Join Date
    Nov 2011
    Posts
    31

    Unanswered: Identity column - setting cache value high?

    During utility loads (LUW 9.7) on large tables that have an identity column (PK), we've noticed that load times can be substantially improved if we alter the table's identity cache from the default value of 20 to something very high — in our case, 10,000. IBM states that "When values are generated for the identity column, pre-allocating and storing values in the cache reduces synchronous I/O to the log." We've confirmed this. The only negative comment I've read is: "In the event of a database deactivation, either normally or due to a system failure, all cached sequence values that have not been used in committed statements are lost (that is, they will never be used). The value specified for the CACHE option is the maximum number of values for the identity column that could be lost in case of system failure."
    So in the event of an outage I could have a gap in the generated values for my identity column. My question: there must be some other disadvantages; otherwise, assuming you could live with the gaps, why wouldn't you always set this parameter high?
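
    For reference, the alter we run looks roughly like this (the table and column names here are just placeholders, not our real schema):
    Code:
    db2 "alter table sales_fact alter column sale_id set cache 10000"
    (run the utility LOAD)
    db2 "alter table sales_fact alter column sale_id set cache 20"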

    Greatly appreciate your insight.

  2. #2
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    There is no other disadvantage. However, remember that a database is deactivated whenever there are no applications connected to it, so gaps will not occur only when there is a system crash. You could explicitly activate the database to minimize this problem, but you can chew up a lot of numbers if the cache is set too high and the database becomes deactivated while you are doing system maintenance.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  3. #3
    Join Date
    Nov 2011
    Posts
    31
    Thanks Marcus, in our case we've 'explicitly' activated the database. An additional question: if we're doing a total DB refresh (150+ tables) and have set the cache parameter high on all of these tables, do you see any potential problem with memory utilization (i.e., DB2 has to store these values somewhere)?

  4. #4
    Join Date
    Apr 2012
    Posts
    1,035
    Provided Answers: 18
    If your data volumes are high enough and your identity column's data type is not BIGINT, you can run out of sequence values earlier than you might expect... and if the sequence can cycle, that might cause undesirable results.
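
    One way to keep an eye on the remaining headroom is a sketch against the SYSCAT.SEQUENCES catalog view (identity columns show up there as internal sequences with SEQTYPE = 'I'):
    Code:
    db2 "select seqschema, seqname, maxvalue - nextcachefirstvalue as headroom, cycle from syscat.sequences where seqtype = 'I'"
    If HEADROOM gets small and CYCLE is 'N', inserts will start failing; if CYCLE is 'Y', values will wrap, which is usually worse for a PK.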

  5. #5
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by chippib View Post
    Thanks Marcus, in our case we've 'explicitly' activated the database. An additional question: if we're doing a total DB refresh (150+ tables) and have set the cache parameter high on all of these tables, do you see any potential problem with memory utilization (i.e., DB2 has to store these values somewhere)?
    10,000 values x 8 bytes x 150 tables = 12 MB. But I have no idea where these are stored or how memory for the cache is allocated. You might want to open a PMR with DB2 Support to ask that question.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390

  6. #6
    Join Date
    Jan 2007
    Location
    Jena, Germany
    Posts
    2,721
    I don't know for sure either, but I would assume that DB2 only stores the cache size plus the current/next value. Basically, we are only talking about a few bytes per table, which I wouldn't start worrying about.

    Interestingly, the docs only talk about "system failures". This might suggest that unused values do not cause a gap on a regular database deactivation. However, I just tried it on 9.7, and the gap is there on a regular deactivation as well:
    Code:
    4 local:~/ $ db2 "create sequence s cache 10000"
    DB20000I  The SQL command completed successfully.
    5 local:~/ $ db2 "values next value for s"
    
    1
    -----------
              1
    
      1 record(s) selected.
    
    6 local:~/ $ db2 "values next value for s"
    
    1
    -----------
              2
    
      1 record(s) selected.
    
    8 local:~/ $ db2 connect reset
    9 local:~/ $ db2 terminate
    10 local:~/ $ db2 connect to test
    11 local:~/ $ db2 "values next value for s"
    
    1
    -----------
    10001
    
      1 record(s) selected.
    So you may want to use explicit activation/deactivation only.
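
    If you go that route, explicit activation/deactivation is just (database name is whatever yours is; I'm using TEST from the transcript above):
    Code:
    db2 activate database test
    db2 deactivate database test
    Keep in mind that an explicit DEACTIVATE still discards the unused cached values, so schedule it around your loads.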

    P.S.: I also tried reducing the cache value, but once sequence values have been cached, they don't come back, even if the cache size is reduced.
    Knut Stolze
    IBM DB2 Analytics Accelerator
    IBM Germany Research & Development

  7. #7
    Join Date
    Nov 2011
    Posts
    31
    Thank you all so much for your help. We had noticed that the "Load" challenges were associated with tables that are compressed.

    So, here are the observations:

    1 – If the table is not compressed, the identity cache value has no bearing. It could be as low as 2 or as high as 100,000 and the load times are still good.

    2 – If the table is compressed, then the identity cache value plays an important role. With the identity cache set to 10,000, we loaded 50,000,000 rows in 8 minutes 33 seconds, which works out to 97,466 rows/second. With the identity cache set to 20, we loaded 2,500,000 rows in 21 minutes 29 seconds, which works out to roughly 1,910 rows/second. The difference is obviously huge.

  8. #8
    Join Date
    May 2003
    Location
    USA
    Posts
    5,737
    Quote Originally Posted by chippib View Post
    Thank you all so much for your help. We had noticed that the "Load" challenges were associated with tables that are compressed.

    So, here are the observations:

    1 – If the table is not compressed, the identity cache value has no bearing. It could be as low as 2 or as high as 100,000 and the load times are still good.

    2 – If the table is compressed, then the identity cache value plays an important role. With the identity cache set to 10,000, we loaded 50,000,000 rows in 8 minutes 33 seconds, which works out to 97,466 rows/second. With the identity cache set to 20, we loaded 2,500,000 rows in 21 minutes 29 seconds, which works out to roughly 1,910 rows/second. The difference is obviously huge.
    Just curious whether you attempted multiple concurrent load jobs against the same table (by splitting the input file), and if so, what results you were able to achieve.
    M. A. Feldman
    IBM Certified DBA on DB2 for Linux, UNIX, and Windows
    IBM Certified DBA on DB2 for z/OS and OS/390
