On a couple of occasions recently, replication has failed on a couple of boxes due to unusually high network traffic.

While these were isolated incidents, it was frustrating not to know about them as soon as they happened; we only found out when the app noticed it was out of sync, which can take a while.

Are there any commonly used mechanisms for setting up alerts (via email, etc.) when replication fails?

It's only a single, simple master-slave setup: the master handles live transactions and the slave is used for reporting.
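The post doesn't name the database engine. Assuming MySQL, one common pattern is a small script, run periodically on the slave, that reads `SHOW SLAVE STATUS` and sends an email if either replication thread has stopped or the lag grows too large. The sketch below illustrates that idea under stated assumptions and is not a finished monitoring tool. The function names, the 300-second lag threshold, the addresses, and the local SMTP relay on `localhost` are all hypothetical. The column names (`Slave_IO_Running`, `Slave_SQL_Running`, `Seconds_Behind_Master`, `Last_IO_Error`, `Last_SQL_Error`) are the older MySQL ones. Newer releases use `SHOW REPLICA STATUS` with renamed columns.

```python
import smtplib
from email.message import EmailMessage
from typing import List, Mapping, Optional

# Hypothetical threshold; column names assume MySQL's SHOW SLAVE STATUS.
MAX_LAG_SECONDS = 300


def fetch_slave_status(cursor) -> Optional[dict]:
    """Run SHOW SLAVE STATUS on any DB-API 2.0 cursor; return the row as a dict."""
    cursor.execute("SHOW SLAVE STATUS")
    row = cursor.fetchone()
    if row is None:
        return None  # this server is not configured as a slave
    columns = [desc[0] for desc in cursor.description]
    return dict(zip(columns, row))


def replication_problems(status: Optional[Mapping[str, object]],
                         max_lag_seconds: int = MAX_LAG_SECONDS) -> List[str]:
    """Return human-readable problems found in one SHOW SLAVE STATUS row."""
    if not status:
        return ["SHOW SLAVE STATUS returned no row; replication is not configured"]
    problems = []
    io_state = status.get("Slave_IO_Running")
    # Anything other than "Yes" (e.g. "No", "Connecting") is treated as a failure.
    if io_state != "Yes":
        problems.append(f"IO thread not running ({io_state}): "
                        f"{status.get('Last_IO_Error') or 'no error text'}")
    sql_state = status.get("Slave_SQL_Running")
    if sql_state != "Yes":
        problems.append(f"SQL thread not running ({sql_state}): "
                        f"{status.get('Last_SQL_Error') or 'no error text'}")
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        # A NULL lag normally accompanies a stopped thread; report it on its
        # own only if the thread checks above found nothing.
        if not problems:
            problems.append("Seconds_Behind_Master is NULL")
    elif int(lag) > max_lag_seconds:
        problems.append(f"slave is {lag}s behind master (limit {max_lag_seconds}s)")
    return problems


def build_alert(problems: List[str], host: str, sender: str, recipient: str) -> EmailMessage:
    """Build a plain-text alert email listing the problems found."""
    msg = EmailMessage()
    msg["Subject"] = f"Replication alert on {host}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(problems))
    return msg


def send_alert(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Send the alert through an SMTP relay (assumed to be reachable, no auth)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A scheduler such as cron could run this every few minutes: pass the dict from `fetch_slave_status(conn.cursor())` (where `conn` comes from whatever MySQL driver is in use) to `replication_problems`, and call `build_alert` and `send_alert` only when the returned list is non-empty. Keeping the status check as a pure function makes it easy to test without a live server, and the same check could feed an existing monitoring system instead of email.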