#1
04-14-06, 08:33
ckercher
Join Date: Apr 2006
Posts: 6
long checkpoint duration/disk flush

Hello,

This is my first post to this forum, so please excuse me if I don't supply all the necessary information. Just let me know what additional information you need.

I am running Informix IDS v9.4FC5 on an HP 9000 with 4 processors running HP-UX 11i. We rebooted our server the other day, and immediately after Informix initialized we noticed long checkpoint durations. For the past few months our checkpoint duration has been either 0 or 1 second. Now, at peak load, our checkpoints take around 30 seconds. My users are complaining vehemently!

I have made some changes to the configuration file based on posts I read on this forum regarding "long checkpoint duration". The configuration changes have NOT corrected the problem. Below are the shared memory configuration settings, showing the values before and after my attempt to correct the problem.

What I can't understand is that NOTHING changed! We simply rebooted the server. We did not change any Informix configuration settings, add dbspaces, etc. Does anyone have any ideas?


# Shared Memory Parameters

LOCKS 100000 # Maximum number of locks
BUFFERS 256000 # Maximum number of shared buffers
NUMAIOVPS 2 # Number of IO vps
PHYSBUFF 32 # Physical log buffer size (Kbytes)
LOGBUFF 32 # Logical log buffer size (Kbytes)
CLEANERS 128 # Number of buffer cleaner processes (changed from 2)
SHMBASE 0x0 # Shared memory base address
SHMVIRTSIZE 128000 # initial virtual shared memory segment size
SHMADD 20000 # Size of new shared memory segments (Kbytes)
SHMTOTAL 0 # Total shared memory (Kbytes). 0=>unlimited
CKPTINTVL 300 # Check point interval (in sec)
LRUS 128 # Number of LRU queues (changed from 8)
LRU_MAX_DIRTY 2.000000 # LRU percent dirty begin cleaning limit
LRU_MIN_DIRTY 1.000000 # LRU percent dirty end cleaning limit
TXTIMEOUT 0x12c # Transaction timeout (in sec)
STACKSIZE 64 # Stack size (Kbytes)

#2
04-17-06, 14:08
mjldba
Join Date: Dec 2003
Location: North America
Posts: 139
Have you looked at your online.log file to see if anything changed?

Perhaps someone other than yourself with adequate permissions edited the onconfig file, and bouncing the IDS instance brought those unintended changes into effect.

#3
04-17-06, 14:18
ckercher
Join Date: Apr 2006
Posts: 6
Thank you for your reply.

There are only 2 people with adequate permissions (myself included) and we have not changed the HP/UX configuration or the Informix configuration files for well over a year. Things have been working just fine, so we didn't feel there was a need to tune the kernel or database.

#4
04-19-06, 12:56
artemka
Join Date: May 2004
Location: New York
Posts: 248
Please post your full $ONCONFIG file, your online.log file, and the output of onstat -p and onstat -F.

#5
04-19-06, 13:59
ckercher
Join Date: Apr 2006
Posts: 6
Requested Information

I have attached the online.log and onconfig files. Below you will find the results of the "onstat -p" and "onstat -F" commands.



onstat -p:

IBM Informix Dynamic Server Version 9.40.FC5 -- On-Line -- Up 1 days 07:02:36 -- 713280 Kbytes

Profile
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached
14706284 10531140 5983189370 99.75 838472 3262021 298477547 99.72

isamtot open start read write rewrite delete commit rollbk
3074331191 82737328 378778894 1466602748 141150282 1202813 97196 11705 0

gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs
0 0 0 0 0 0 0

ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes
0 0 0 94814.64 5168.81 301 901

bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans
1483966 0 209545907 0 0 1944 9181288 2230521

ixda-RA idx-RA da-RA RA-pgsused lchwaits
1252985 27670 8772545 10052847 16553892
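The two %cached columns in this profile can be reproduced from the raw counters. A minimal Python sketch, assuming the usual definition 100 * (bufreads - dskreads) / bufreads for reads and the same formula with bufwrits and dskwrits for writes; with the counters above it gives 99.75 and 99.72, matching onstat.

```python
def cached_pct(disk_ops: int, buffer_ops: int) -> float:
    """Share of buffer requests that did not need a disk operation, in percent."""
    if buffer_ops == 0:
        return 0.0
    return 100.0 * (buffer_ops - disk_ops) / buffer_ops


# Counters copied from the onstat -p profile above.
read_cached = cached_pct(disk_ops=14706284, buffer_ops=5983189370)  # dskreads, bufreads
write_cached = cached_pct(disk_ops=838472, buffer_ops=298477547)    # dskwrits, bufwrits

print(f"read  %cached = {read_cached:.2f}")   # 99.75
print(f"write %cached = {write_cached:.2f}")  # 99.72
```

High ratios like these only say that most requests were served from the buffer pool; they say nothing about how long the checkpoint writes take.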



onstat -F:

IBM Informix Dynamic Server Version 9.40.FC5 -- On-Line -- Up 1 days 07:03:03 -- 713280 Kbytes


Fg Writes LRU Writes Chunk Writes
0 0 391595

address flusher state data
c0000000238d5860 0 I 0 = 0X0
c0000000238d6098 1 I 0 = 0X0
states: Exit Idle Chunk Lru
Attached Files
onconfig.txt (10.5 KB)
online.zip (716.5 KB)

#6
04-19-06, 15:26
mjldba
Join Date: Dec 2003
Location: North America
Posts: 139
Are you sure you changed the number of LRUs & CLEANERS as stated in your original post, or did you revert to your original onconfig settings? The onconfig you posted shows 2 CLEANERS & 8 LRUs rather than 128 of each, and LRU_MAX_DIRTY of 10% and LRU_MIN_DIRTY of 5% rather than 2% and 1%.

I scanned the online log and can't find any reference to changes in the number of LRUs or CLEANERS; these changes would usually appear a few lines after "Informix Dynamic Server started" in the online.log.

Perhaps you're editing the wrong onconfig file, or someone changed the INFORMIXDIR or ONCONFIG environment variable so that it points to the wrong onconfig at start-up.
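One way to rule out the wrong-file theory is to resolve the path the engine reads ($INFORMIXDIR/etc/$ONCONFIG) and print the parameters actually in it. A minimal Python sketch; falling back to the name "onconfig" when ONCONFIG is unset is an assumption, and the /opt/informix paths in the check are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping


def onconfig_path(env: Mapping[str, str] = os.environ) -> Path:
    """$INFORMIXDIR/etc/$ONCONFIG; the "onconfig" fallback name is an assumption."""
    return Path(env["INFORMIXDIR"]) / "etc" / env.get("ONCONFIG", "onconfig")


def parse_onconfig(text: str) -> dict[str, str]:
    """Map parameter name -> raw value, ignoring '#' comments and blank lines."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)  # name, then the rest of the line as the value
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params
```

Run as the informix user on the server, `parse_onconfig(onconfig_path().read_text())` returns a dict in which `LRUS`, `CLEANERS`, `LRU_MAX_DIRTY` and `LRU_MIN_DIRTY` can be compared with the values you believe you set.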

It looks like you're doing only chunk writes and no LRU writes, so all page flushing occurs at checkpoints: at the 5-minute checkpoint interval (CKPTINTVL 300) and when the physical log reaches 75% full. No writing (LRU writes) occurs between checkpoints, so you have to find a way to increase your LRU writes.
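The onstat -F reading can be made explicit by turning the three write counters into percentages. A minimal Python sketch; dividing chunk writes by numckpts (301, from onstat -p) is only a rough average, since it assumes chunk writes happen during checkpoints and the two onstat snapshots were taken about 30 seconds apart.

```python
def write_mix(fg_writes: int, lru_writes: int, chunk_writes: int) -> dict[str, float]:
    """Percentage of writes by type, from the onstat -F counters."""
    total = fg_writes + lru_writes + chunk_writes
    if total == 0:
        return {"foreground": 0.0, "lru": 0.0, "chunk": 0.0}
    return {
        "foreground": 100.0 * fg_writes / total,
        "lru": 100.0 * lru_writes / total,
        "chunk": 100.0 * chunk_writes / total,
    }


# Counters from the onstat -F output above, plus numckpts from onstat -p.
mix = write_mix(fg_writes=0, lru_writes=0, chunk_writes=391595)
chunk_writes_per_ckpt = 391595 / 301  # rough average only

print(mix)                           # {'foreground': 0.0, 'lru': 0.0, 'chunk': 100.0}
print(round(chunk_writes_per_ckpt))  # 1301
```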

Based on your onconfig you have 32000 buffers per LRU (256000/8); you start flushing an LRU when 3200 (10%) of its buffers are dirty and stop flushing it when 1600 (5%) are left dirty.

With no LRU writes, you potentially have between 12800 and 25600 dirty buffers to flush every 5 minutes: 256000 buffers / 8 LRUs = 32000 per LRU; 32000 * .10 (start at 10%) = 3200; 32000 * .05 (stop at 5%) = 1600; 1600 * 8 LRUs = 12800 and 3200 * 8 LRUs = 25600 buffers, and that's a lot.
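The same arithmetic works for any BUFFERS / LRUS / LRU_MAX_DIRTY / LRU_MIN_DIRTY combination. A Python sketch that follows the reasoning in this post (it ignores that a queue can climb past its start threshold before a cleaner reaches it), comparing the posted onconfig with the 128 LRUs and 2%/1% values tried in the first post.

```python
def lru_dirty_window(buffers: int, lrus: int,
                     max_dirty_pct: float, min_dirty_pct: float) -> dict[str, float]:
    """Per-LRU start/stop thresholds and the instance-wide range of dirty buffers."""
    per_lru = buffers / lrus
    start = per_lru * max_dirty_pct / 100  # cleaning of a queue begins here
    stop = per_lru * min_dirty_pct / 100   # ...and ends here
    return {
        "per_lru": per_lru,
        "start": start,
        "stop": stop,
        "total_low": stop * lrus,
        "total_high": start * lrus,
    }


original = lru_dirty_window(256000, 8, 10.0, 5.0)    # posted onconfig
attempted = lru_dirty_window(256000, 128, 2.0, 1.0)  # values from the first post
print(original)
print(attempted)
```

With 128 queues at 2%/1%, each queue starts cleaning at 40 dirty buffers instead of 3200, so the checkpoint would be left with thousands rather than tens of thousands of dirty buffers.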

This document, written by Informix guru Art Kagel, helped me:

http://www.prstech.com/tips/art_kagel_tuning_tips.shtml

Last edited by mjldba; 04-20-06 at 07:46.

#7
04-19-06, 16:42
ckercher
Join Date: Apr 2006
Posts: 6
Since the configuration changes I documented in my initial posting did NOT decrease the checkpoint durations, we reverted to the original settings. The online.log I attached covers the 4 months of checkpoints before the driver installation and the period immediately after the driver was installed, so the configuration changes I made are not reflected in it. I have now attached the current log file, which DOES include those changes.

I will definitely read the information you referred to. My apologies for not explaining the information I provided more clearly.
Attached Files
online.zip (103.4 KB)

#8
04-20-06, 08:40
mjldba
Join Date: Dec 2003
Location: North America
Posts: 139
Tuning is far from a science ... it's very site-specific. I would never claim my way is perfect, but I've had good results by forcing more LRU writes (better for OLTP) than chunk writes (better for batch environments), and I never see foreground writes.

I have 248000 buffers, 127 LRUs & CLEANERS, LRU_MIN_DIRTY = 0, LRU_MAX_DIRTY = 1, checkpoints every 4 hours, and PHYSFILE = 20000.

Monitor the number of buffers per LRU and the current total number of dirty buffers using the summary lines at the end of the onstat -R output.
You can monitor the physical log (PHYSFILE) with onstat -l by looking at the 3rd & 4th lines under "Physical Logging" near the top.

I let the physical log trigger the flushing of dirty buffers automatically when it hits 75% full. LRU writes are constant and small (flushing of an LRU queue starts when 19 of its buffers are dirty), and checkpoint times are usually <= 1 second (never more than 2 seconds), which is unnoticeable.
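The 19-buffer figure follows from 248000 / 127, about 1953 buffers per queue, at LRU_MAX_DIRTY = 1%. A short Python check; rounding down is an assumption chosen because it reproduces the 19 quoted here, since the thread does not say how the engine rounds.

```python
import math


def flush_start(buffers: int, lrus: int, max_dirty_pct: float) -> int:
    """Dirty buffers in one LRU queue at which cleaning starts (rounded down here)."""
    return math.floor(buffers / lrus * max_dirty_pct / 100)


print(flush_start(248000, 127, 1))  # 19   (mjldba: 248000 buffers, 127 LRUs, 1%)
print(flush_start(256000, 8, 10))   # 3200 (ckercher's original onconfig)
```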

I'm using IDS 9.30.UC6 on AIX, and you're using IDS 9.4 on HP-UX, so you have greater flexibility & granularity with your LRU_MIN_DIRTY & LRU_MAX_DIRTY parameters (your config already shows fractional values such as 2.000000 and 1.000000).

Some parameters hurt you when they're set too big, some hurt you when they're set too small, and some hurt you when they're not modified in unison, so you've got to figure out what works best at your site.
Unfortunately, bouncing the engine is necessary, so you'll get one chance per day to make incremental changes in a controlled fashion.

Here's another site worth checking: http://www.oninit.com/

Good luck

Last edited by mjldba; 04-20-06 at 08:51.

#9
04-20-06, 09:47
ckercher
Join Date: Apr 2006
Posts: 6
First, I just wanted to thank you for taking time to provide me with this valuable information. I have a few questions as a result of your last post.

You stated you have checkpoints every 4 hours; you mean 4 minutes, right? The number that defines the checkpoint interval (CKPTINTVL) is in seconds.

I know what information to look at in the "onstat -R" and "onstat -l" output. But what should I be looking FOR? In your experience, what information from each of the "onstat" outputs should concern me?

Thanks again.

#10
04-20-06, 10:29
mjldba
Join Date: Dec 2003
Location: North America
Posts: 139
You're very welcome, and if I can help you a little, or point you in the right direction, then I've done my part to lend a hand. I've had no formal training aside from learning some lessons the hard way, reading documentation, and using bulletin-board resources like the two I listed, and I just found this one in some notes:

http://docs.rinet****/InforSmes/

Like I said, I let the physical log take care of large buffer flushes when it hits 75% full rather than scheduling checkpoints, so I have automatic checkpoints set up for 14,400 seconds = 4 hours. I use this method so that, if system activity is light, a checkpoint will take place during lunch time and the next will occur at quitting time or just beyond. I know it sounds rather unorthodox, but it was a method suggested in one of the URLs and it works for me.

Juggling the number of buffers, the number of LRUs & CLEANERS, and the LRU MIN/MAX parameters got me out of hot water with long checkpoints. Initially I tried checkpoints every 2 minutes, then every minute, and results were inconsistent under heavy activity because I sometimes still had 10-15 second checkpoints.

I have 2 telnet sessions running all day. One runs onstat -R -r 2 | grep queued to monitor the number of dirty buffers; the total never exceeds 1400 because LRU writes take care of flushing small numbers of buffers.
The other runs onstat -l -r 2 | grep 200035 to monitor how full the physical log is; 200035 is the number beneath phybegin on my system (it's unique in the whole onstat -l output).

My physical log is 20000 KB, which is 5000 4 KB pages, and flushes occur automatically when 3750 of those pages (75%) are used; I find this to be a manageable number.
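Both numbers in this reply are simple conversions: the 75% trigger point of the physical log in pages, and CKPTINTVL from seconds into hours and minutes. A Python sketch using the 4 KB page size and 75% threshold stated in the thread.

```python
def physlog_trigger_pages(physfile_kb: int, page_kb: int, full_pct: int = 75) -> int:
    """Pages in use at which the physical log reaches the 75% mark."""
    pages = physfile_kb // page_kb
    return pages * full_pct // 100


def ckpt_interval(seconds: int) -> str:
    """Render a CKPTINTVL value (in seconds) as hours, minutes and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


print(physlog_trigger_pages(20000, 4))  # 3750 of 5000 pages
print(ckpt_interval(14400))             # 4h 0m 0s (mjldba)
print(ckpt_interval(300))               # 0h 5m 0s (ckercher's CKPTINTVL 300)
```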

There's a useful script available at the oninit site: choose Informix Database, then Download (left-side menu), Scripts, Health Check. It's kind of basic, but it may point you in the right direction.

#11
04-20-06, 11:17
mjldba
Join Date: Dec 2003
Location: North America
Posts: 139
The Art Kagel document includes this reference regarding HP:

"HP/UX PA RISC has only four shared memory segment registers. If you have to access more than four shared memory segments concurrently, your process will become very slow. Informix needs access to at least: 1-Resident segment, 1-Virtual segment, and for each process 1-Message segment. If your engine has to add more than one additional virtual segment, the CPU VPs will become bogged down. If your application is already using its own shared memory and the engine allocates even one additional virtual segment, your application will bog down switching between the engine's shared memory and its own. "

onstat -g seg shows how many virtual ("V") segments you have. If you have more than one, SHMVIRTSIZE may be too small, causing the engine to allocate additional segments of SHMADD KB each.
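To see how quickly the four-register limit in the quote can be reached, here is a Python sketch under a simplified model: one initial virtual segment of SHMVIRTSIZE, extra segments of exactly SHMADD each, plus one resident and one message segment. SHMVIRTSIZE 128000 and SHMADD 20000 come from the posted config; the 170000 KB virtual-memory demand is a made-up figure for illustration only.

```python
import math


def virtual_segments(virtual_kb_needed: int, shmvirtsize_kb: int, shmadd_kb: int) -> int:
    """V segments: one of SHMVIRTSIZE, then extra ones of SHMADD (simplified model)."""
    if virtual_kb_needed <= shmvirtsize_kb:
        return 1
    return 1 + math.ceil((virtual_kb_needed - shmvirtsize_kb) / shmadd_kb)


def segments_needed(virtual_kb_needed: int, shmvirtsize_kb: int, shmadd_kb: int) -> int:
    """Resident + virtual + one message segment, the minimum listed in the quote."""
    return 1 + virtual_segments(virtual_kb_needed, shmvirtsize_kb, shmadd_kb) + 1


n = segments_needed(170000, 128000, 20000)  # hypothetical demand of 170000 KB
print(n, "segments;", "over" if n > 4 else "within", "the 4 PA-RISC segment registers")
```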

Last edited by mjldba; 04-20-06 at 13:52.

#12
04-20-06, 15:55
ckercher
Join Date: Apr 2006
Posts: 6
I have found where my problem lies; however, I have no idea how to correct it. I turned on TRACEFUZZYCKPT to monitor checkpoint performance more closely. The problem is not the number of dirty buffers to be flushed; it is dskflush(), which is the part of the checkpoint taking ALL the time. Only one problem... I can't find any information about it, so I have no idea what to do to correct it.

#13
04-21-06, 07:59
mjldba
Join Date: Dec 2003
Location: North America
Posts: 139

#14
02-24-09, 02:04
rootdbs
Join Date: Feb 2009
Posts: 51
Hi, ckercher
I think your problem is the very small PHYSFILE: only 6 MB! That is surprisingly small.
Increase it to 200 MB or 300 MB.
How much memory does your server have?
I see that your buffer pool is only 512 MB.
If you can increase it, do so.
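The 512 MB figure is BUFFERS times the page size. A Python sketch assuming a 2 KB page (common on HP-UX, but an assumption here) and taking the ~6 MB PHYSFILE at face value, since the attached onconfig is not reproduced in the thread; it compares how many page images a 6 MB and a 200 MB physical log hold at the 75% mark.

```python
PAGE_KB = 2  # assumption: 2 KB page size on this HP-UX instance


def buffer_pool_kb(buffers: int, page_kb: int = PAGE_KB) -> int:
    """Size of the buffer pool: one page per buffer."""
    return buffers * page_kb


def physlog_pages(physfile_kb: int, page_kb: int = PAGE_KB) -> tuple[int, int]:
    """(total pages, pages at the 75% mark) for a given PHYSFILE size in KB."""
    pages = physfile_kb // page_kb
    return pages, pages * 3 // 4


print(buffer_pool_kb(256000), "KB")    # BUFFERS 256000 from the posted config
for physfile_kb in (6 * 1024, 200 * 1024):  # ~6 MB (as reported) vs. suggested 200 MB
    print(physfile_kb, "KB ->", physlog_pages(physfile_kb))
```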

#15
09-17-09, 02:35
lianyong
Join Date: Sep 2009
Posts: 1