Unanswered: Replication stops after server reboot????
I have transactional replication set up between two dedicated servers. Server A is the PDC and Server B is a BDC (they are both Win2000 boxes). Both the servers are brand new, and replaced the two that were running like clock work (replication wise) for the last 12 months. I never had this problem with the old servers....
When the servers are shut down (as the case was a couple of weeks back with a power failure) or just recently when they were move to another room. Both servers boot up at the same time. Server B (which is the server holding the db being replicated) boots quicker and as a result replication fails and is then 'sucessfully stopped'. Unless I am aware of the server being rebooted and can monitor this potential problem, within 2 days the logfile grows to large and everything comes to a crashing halt.
I just remove replication, truncate and shrink the log, reset replication and we're away.... BUT I really need to know why it is happening in the first place. I figure there must be a setting that I have forgotten about or something.
Have you tried restarting replication agents before removing replication? Make sure sql agent is set up as auto start. I assume the distribution db are sitting on the publisher (server B). When dist server couldn't find subscriber (box A), it will auto stop all distribution agetns, which caused the distribution db to grow.
You can set up a job monitoring log reader and dist agent to see if they are stopped. Then run sp_start_job to automatically restart them.
When the log is full everything comes to a crashing halt. That's basically the reason why. You need to either backup the log or truncate it more often, at least once a day. Shrink it if need be. You really don't have to uninstall replicate to dump the log and shrink the log file. You can truncate the log and shrink it with replication running.
Joejcheng - yes I already do that, the transaction log is backed up and truncated every hour. The problem was when replication stops the transaction log grows (because it is retaining all transactions waiting to be replicated).
Richard - thank you! I had not even thought about just restarting the jobs (doh!). However the 2 times this has happened the transaction log has been around 15GB and the database controls a production system and they cannot afford to be offline for long (I usually just get them up and running and then reset replication the following evening). Can I start the jobs and have them continue work whilst it is pushing all the transactions across - or would it just never catch up because of the size of the transaction log?
And also thank you for mentioning sp_start_job - I will most certainly put that in place.
However I would still really like to know why this is happening in the first place, but I think I may have worked it out....
I set up replication from a script created on the previous system. One of the replication owners is XXXSqlAdmin, however in the new system this user is just SQLAdmin. I have noticed the error raised by this in the past (ie XXXSqlAdmin does not exist in this db), but didn't think it would cause any problems. But now I think I was very wrong. I changed the script before I ran it last night, no errors were returned. I will monitor the DB for the next week and then we will do a test where we shut down the servers and boot server B well before we boot server A.