I have an Informix server whose instance periodically crashes due to a known bug. I cannot upgrade to the version in which the bug is fixed because of application compatibility. I would like to know whether, and how, I can set up an automatic notification by pager or email when this happens, rather than having application users call to report that they are getting database errors because the instance is down.
We currently monitor on AIX whether the oninit process is running by using grep, but we have two Informix instances on the machine, so one could be down while the other is still running, and an oninit process would still show up. I need to know if either one goes down.
Use the ALARMPROGRAM; it is normally triggered when the instance fails. In this script, check for a severity of 5 (Panic, IDS crash), which is passed as the first parameter. Of course, this only works if IDS still manages to call the alarm program, which it does most of the time.
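The alarm-program approach can be sketched as follows. This is a minimal sketch, not a drop-in script: the path /home/informix/etc/alarm_notify.sh and the address dba-pager@example.com are hypothetical, only "severity in the first argument" comes from the answer above, and treating arguments 2 to 4 as class ID, class message and specific message is an assumption based on the sample alarmprogram.sh that ships with IDS, so check it against your version.

```shell
#!/bin/sh
# Hypothetical alarm script, registered in $ONCONFIG with:
#   ALARMPROGRAM /home/informix/etc/alarm_notify.sh
# The engine passes the event severity as the first argument.
# Assumption: arguments 2-4 are the class ID, class message and
# specific message, in the order used by the sample alarmprogram.sh.

NOTIFY_TO="dba-pager@example.com"   # hypothetical recipient / pager gateway

# Succeed only when the severity means the instance has crashed (5).
is_fatal() {
    [ "$1" = "5" ]
}

# Build the notification body from the alarm arguments.
build_message() {
    printf 'Server: %s\nSeverity: %s\nClass: %s %s\nMessage: %s\n' \
        "${INFORMIXSERVER:-unknown}" "$1" "$2" "$3" "$4"
}

# Lower severities are ignored; only a crash sends mail.
if is_fatal "$1"; then
    build_message "$1" "$2" "$3" "$4" |
        mailx -s "Informix ${INFORMIXSERVER:-unknown} crashed (severity $1)" "$NOTIFY_TO"
fi
```

If you already use the shipped alarm program for other events, adding the severity check to it is probably safer than replacing it outright.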
To monitor specific instances, there is a dirty way to do the job. Create two symbolic links that point to the oninit executable (for example oninit_inst1 and oninit_inst2). Now start instance 1 with oninit_inst1 (and instance 2 with ....).
Now you will see several oninit_inst1 processes and several oninit_inst2 processes in the process list. With this you can monitor a specific engine.
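A cron check built on the symbolic-link trick could look like this. It is a sketch under assumptions: the links are named oninit_inst1 and oninit_inst2 as in the example, the script is saved under the hypothetical name check_oninit.sh (the final guard only runs the check under that name), and the recipient address is made up. It compares the program name from `ps -e -o args` instead of grepping the whole line, so a `grep oninit_inst1` process is not mistaken for a running engine.

```shell
#!/bin/sh
# Hypothetical cron check: assumes instance 1 was started as
# $INFORMIXDIR/bin/oninit_inst1 and instance 2 as oninit_inst2,
# both symbolic links to $INFORMIXDIR/bin/oninit.

NOTIFY_TO="dba-pager@example.com"   # hypothetical recipient

# Read command lines on stdin and print how many have a program name
# (first word, directory stripped) equal to $1.
count_procs() {
    awk -v want="$1" '{ n = split($1, parts, "/"); if (parts[n] == want) c++ }
                      END { print c + 0 }'
}

# Mail an alert for every link name that has no running process.
check_links() {
    for link in "$@"; do
        count=$(ps -e -o args | count_procs "$link")
        if [ "$count" -eq 0 ]; then
            echo "No $link processes found on $(hostname)" |
                mailx -s "Informix instance $link is down" "$NOTIFY_TO"
        fi
    done
}

# Only run the check when invoked under its own (hypothetical) name.
if [ "${0##*/}" = "check_oninit.sh" ]; then
    check_links oninit_inst1 oninit_inst2
fi
```

The links themselves could be created with something like `ln -s $INFORMIXDIR/bin/oninit $INFORMIXDIR/bin/oninit_inst1`, and likewise for the second instance; it is worth confirming on a test instance that the engine starts cleanly from the renamed link before relying on it.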
# check if the engine is On-Line; send mail if it is not
if ! "$INFORMIXDIR/bin/onstat" - | grep -qi on-line
then
    mail who_you_want_to <<-!EOF!
The sky is falling: the Informix engine is OFF-LINE
!EOF!
fi
Since you have two instances, the cron script needs to run the check once for each of them, setting the appropriate values of INFORMIXDIR, INFORMIXSERVER, and ONCONFIG each time. You could code both checks into one script.
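Both checks can be folded into one cron script along these lines. This is a minimal sketch: the server names, onconfig file names, the /opt/informix default for INFORMIXDIR and the recipient address are placeholders, and any `onstat -` output that does not contain "On-Line" (for example "shared memory not initialized", but also Quiescent mode) is treated as down.

```shell
#!/bin/sh
# Hypothetical cron wrapper that checks several instances in one run.
# Usage (names are examples):
#   check_informix.sh ids_inst1:onconfig.inst1 ids_inst2:onconfig.inst2
INFORMIXDIR=${INFORMIXDIR:-/opt/informix}   # assumption: adjust to your install
export INFORMIXDIR
NOTIFY_TO="dba-pager@example.com"           # hypothetical recipient

# Succeed if the onstat output on stdin reports the engine On-Line.
is_online() {
    grep -qi 'on-line'
}

# Point the environment at one instance and alert if it is not On-Line.
check_instance() {
    INFORMIXSERVER=$1
    ONCONFIG=$2
    export INFORMIXSERVER ONCONFIG
    if ! "$INFORMIXDIR/bin/onstat" - 2>&1 | is_online; then
        echo "Informix instance $INFORMIXSERVER is not On-Line on $(hostname)" |
            mailx -s "Informix $INFORMIXSERVER is DOWN" "$NOTIFY_TO"
    fi
}

# Each argument is SERVERNAME:ONCONFIG; with no arguments nothing runs.
for spec in "$@"; do
    check_instance "${spec%%:*}" "${spec#*:}"
done
```

Since AIX cron does not accept the `*/5` step syntax, a five-minute schedule would be written out explicitly, e.g. `0,5,10,15,20,25,30,35,40,45,50,55 * * * * /home/informix/etc/check_informix.sh ids_inst1:onconfig.inst1 ids_inst2:onconfig.inst2`; the path and interval are only examples.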
You can use the script /home/informix/etc/evidence.sh.
Check that your onconfig file contains this line:
SYSALARMPROGRAM /home/informix/etc/evidence.sh # System Alarm program path
If your server can send mail to the outside, you can set the following parameters in the script.
Starting from the default settings, change the RAS parameter to your address and turn on the SEND_RAS_MAIL parameter.
You can also put your own commands in /home/informix/etc/evidence.sh,
so you do not need to check every second whether your database is online.
ALARMPROGRAM is not called 100% of the time, especially after a hard fault at the operating-system level. Relying on it assumes that the engine knows it is going down well enough to call ALARMPROGRAM at all, and that the VP which calls ALARMPROGRAM is not the one that has died.