Thank you for the response, Andy.
Yes, it is already in our plan.
But the thing is, this issue occurs on only one server, which has 2 instances. Every database on this server shows the same issue.
We have other servers and instances at the same DB level and with the same configuration, and they have no issue.
The only difference I can see on this server is that all the tablespace containers are on NFS, while the other servers use local filesystem containers.
Am I missing anything else to look into?
Do the instances/databases that report this error have the fenced ID set up the same way as the instances/databases that work OK (i.e., was the fenced ID used when creating the instance, and is it part of the same group as the instance owner ID or not)? Compare how the fenced ID is set up, if it's used, between the working and non-working systems.
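One way to make that comparison is sketched below. It rests on an assumption that is not stated in this thread: that the fenced ID of an instance is the owner of the sqllib/adm/.fenced file under the instance home. The function names and the example path are hypothetical.

```shell
#!/bin/sh
# Print the owner of a file or directory (parses `ls -ld` output).
file_owner() {
  ls -ld "$1" | awk '{print $3}'
}

# Report the instance owner, the fenced ID, and the groups of both.
# Assumption: the fenced ID is the owner of sqllib/adm/.fenced.
report_fenced_id() {
  inst_home=$1
  inst_owner=$(file_owner "$inst_home/sqllib")
  fenced_id=$(file_owner "$inst_home/sqllib/adm/.fenced")
  echo "instance owner: $inst_owner"
  echo "fenced ID:      $fenced_id"
  # `id` lists the primary and secondary groups of each user.
  id "$inst_owner"
  id "$fenced_id"
}

# Usage (hypothetical instance home):
#   report_fenced_id /home/db2inst1
```

Running this on a working and a non-working server and comparing the two outputs should show whether the IDs or their groups differ.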
Okay, I see the difference between working and non-working servers. Thanks for pointing me to the right place. Here are the details.
On the working servers, the fenced user is the same as the instance owner, and
on the non-working server, they are different. However, the fenced user and the instance owner are both part of the same SYSADM group, but not the DBCTRL group. Could that be the reason?
Any system-level change (like a group change) takes a long time in our environment. Is there a way to work around this issue as the instance owner?
I suspect that the fenced ID doesn't have read access to some directory and/or file owned by the instance owner (only the owner has read access). Check whether the fenced ID can access all directories in $DB2HOME/sqllib and the $DB2HOME/sqllib/db2systm file (this is the dbm cfg file). As a test, you can log in as the fenced ID and cd to $DB2HOME/sqllib. If the permissions look OK, one workaround would be to drop and re-create the instance without using the fenced ID (if the fenced ID is not required). You can use db2cfexp/db2cfimp to save and re-import the config info.
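The access check could be scripted roughly as follows. This is only a sketch: list_unreadable is a hypothetical helper, and it assumes you are logged in as the fenced ID and that $DB2HOME points at the instance owner's home directory.

```shell
#!/bin/sh
# List every entry under a directory that the current user cannot read,
# plus every subdirectory the current user cannot enter.
list_unreadable() {
  find "$1" -print 2>/dev/null | while IFS= read -r path; do
    if [ ! -r "$path" ]; then
      echo "$path"
    elif [ -d "$path" ] && [ ! -x "$path" ]; then
      echo "$path"
    fi
  done
}

# Usage, while logged in as the fenced ID:
#   list_unreadable "$DB2HOME/sqllib"
#   [ -r "$DB2HOME/sqllib/db2systm" ] || echo "cannot read db2systm"
```

An empty result means the fenced ID can read everything find was able to reach. Anything printed is a candidate for the permission problem.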
Or open a PMR to report this problem and they can check if this is some known bug / suggest another workaround.
Finally, I was able to schedule downtime and had root drop and recreate the instances, and everything is back to normal. Below are the steps I followed.
1. Export the connectivity configuration information to an export profile (db2cfexp).
2. Stop replication on all environments
3. Have root perform the instance drop and recreation (db2idrop <instname>, then db2icrt -u <instname> <instname>).
4. Import the connectivity configuration file generated in step 1 (db2cfimp).
5. Start replication in all the environments
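For anyone repeating this, the steps above can be sketched as a dry-run script that only prints the commands. Several details are assumptions not given in this thread: the BACKUP export type for db2cfexp, the db2stop before the drop, the DB2 installation directory, and the profile file name. The replication steps are site-specific and are left as comments. Steps 1 and 4 run as the instance owner, and step 3 runs as root.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

recreate_instance() {
  inst=$1      # instance name, reused as the fenced ID as in step 3
  profile=$2   # export profile file (hypothetical name)
  db2dir=$3    # DB2 installation directory (assumption, site-specific)

  # Step 1 (instance owner): export the configuration profile.
  run db2cfexp "$profile" BACKUP
  # Step 2: stop replication here (site-specific, not shown).
  # Step 3 (root): drop and re-create the instance.
  # Assumption: the instance is stopped first.
  run db2stop
  run "$db2dir/instance/db2idrop" "$inst"
  run "$db2dir/instance/db2icrt" -u "$inst" "$inst"
  # Step 4 (instance owner): import the saved profile.
  run db2cfimp "$profile"
  # Step 5: start replication here (site-specific, not shown).
}

# Usage (hypothetical values):
#   recreate_instance db2inst1 /tmp/db2inst1.cfg /opt/ibm/db2/V11.5
```

Printing the commands first makes it easy to review the sequence with root before the downtime window.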