Results 1 to 6 of 6
  1. #1
    Join Date
    Feb 2010
    Posts
    75

    Red face Unanswered: Need help with setting up alerts for replication

    Hi Everyone,

    We have occasional problems with transactional replication between 2 databases, and the problem is we dont notice right away. Usually the customer notices a few days after replication has stopped, and thats no good. We want to be notified as soon as there is any problem related to replication so we can go in and figure it out.

    Ive been playing with the alerts, and it LOOKS like I should be able to do what I want, but after several hours I still cannot make any of the alerts work.

    In my test environment, I'm forcing the following error, in order to try and test an alert: error 20598 "The row was not found at the Subscriber when applying the replicated command."

    This seems to be one of our more common errors, and also its really easy to produce, so it seemed like a good candidate for testing the alert.

    First off, I noticed that the agent profile was set to "Continue on data consistancy errors", which ignores error 20598. So I changed the profile to "Default", which doesnt skip any errors.

    Next I set up a simple Job that simply writes a string to a file. I tested the job by calling it manually and it works. Once i can prove that this job is actually being called, I will change it to email me and the team.

    Then I created a new alert in the publisher database which watches for error code 20598. Its response is to call the job I mentioned above. So when the error occurs, a string should be written to that file.

    Then I forced the 20598 error to occur. I deleted a row in the subscriber DB, and then changed the corresponding row in the publisher. This causes the error to occur. In my replication monitor, I see new entries saying "Skipped 1 error(s) when applying transactions at the Subscriber." I'm not sure why it says skipped, but I know that these entries are the 20598 error that I'm trying to produce.

    Now, there was nothing written to my text file, and I can see from the job history that my job was never called. If I look at the history of my test alert, it says "never occured".

    So something is screwy here. I forced the 20598 error to occur, and I have an alert that is watching for the 20598 error, but it never fires. Can anyone see what is wrong here?

    I also created an alert in the subscriber database, watching for the same error. This alert was not called either.

    Once I get this working, I will add all the other error numbers that I am interested in knowing about, and then I will make it so I receive an email whenever anything related to replication goes bad.

    Please help! I feel like I'm close to having this working but I dont see the problem. Very frustrating!
    Thanks,

    SQL Server 2008 R2
    Last edited by BeerOclock; 08-15-12 at 12:43.

  2. #2
    Join Date
    Feb 2010
    Posts
    75
    Just wanted to report back that I've spent a few more hours trying to get this working and still no luck. I'm really becoming unimpressed with how user unfriendly SQL server can be. I see no indication anywhere in the error logs or anywhere else, as to what I'm doing wrong.


  3. #3
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    Having to make a few assumptions here. There are multiple problems which can cause a "stop". Does that mean that the distribution job stops? Is it supposed to be continuous, or on a schedule?

    On a side note, are the subscribers allowed to update the replicated data? Not sure how you would get data inconsistencies without the subscriber actively mucking with the data, or much larger problems.

    Lastly, are all three moving parts (publisher, distributor, and subscriber) all SQL 2008 R2? Or is there another version loose in the system?

  4. #4
    Join Date
    Feb 2010
    Posts
    75
    Thanks for the response, MCrowley.

    Yes I figured there would be multiple things that can stop replication. Thats why as a first step I just wanted to prove it out by just having a simple replication problem (that I can easily produce) send out a test alarm. Once I have that working, then I can dig in to the various types of alarms, and figure out under what circumstances to raise an alarm.

    We have a single publisher database and a single subscriber database, and yes everything is 2008R2. Due to the constant stream of new data, both databases have to periodically delete old records. This is where the replication problem came in. Due to an unrelated deadlock, the deletion job on the publisher database failed, whereas the deletion job on the subscriber database succeded. This subsequently caused the "row not found on subscriber" replication error as the publisher was trying to insert or update a row that no longer existed.

    I'm sure you might have some suggestions on how to improve the arcitecture here, but for now I just want to figure out how to properly set up replication alerts.

    For some reason, even after I enabled all the default replication alerts, and even created a custom alert specifically to watch for error 20598, and then I caused error 20598 to happen. Even after doing all that, it appears that none of the alerts were raised. Looking at their history, they all say "last occurred: never".

  5. #5
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,800
    Provided Answers: 11
    First suggestion would be to not do the delete job on the subscriber. The deletes will find their way to the subscriber database on their own via the replication.

    As a poor man's solution, if it is the distribution job that stops all the time, you can add a job step to send an alert that the distribution job has stopped, and someone should have a look. The distribution job is a job like any other in SQL Agent. You should not alter any of the existing steps, except to replace the "exit reporting failure" with "Go to" your alert step.

  6. #6
    Join Date
    Feb 2010
    Posts
    75
    Quote Originally Posted by MCrowley View Post
    First suggestion would be to not do the delete job on the subscriber. The deletes will find their way to the subscriber database on their own via the replication.
    Yea I think we could accomplish this by enabling "replication of deletes", which is currently disabled. The reason we can't enable it is that the subscriber database is a history database, which retains records for a longer peroids for reporting purposes.

    Even if we did find a better way to do this, we have found that sooner or later different replication issues always crop up. Rather than attempting to have a perfectly architectured database that will never have any issues, I'd rather just get notified about replication issues. It should be a quick thing to set up, and I'd like to have that working while we strive for that perfect database

    Quote Originally Posted by MCrowley View Post
    As a poor man's solution, if it is the distribution job that stops all the time, you can add a job step to send an alert that the distribution job has stopped, and someone should have a look. The distribution job is a job like any other in SQL Agent. You should not alter any of the existing steps, except to replace the "exit reporting failure" with "Go to" your alert step.
    I dont know if its the distribution job that stops. I'm pretty sure that for the "row not found at subscriber" error, SQL server basically throws up a warning and keeps on replicating. So in this case I dont think I would get the alert.

    I'll defintly try your suggestion since it makes sense. However I'd really like to also get these replication alerts to work. You know the ones I mean right? Under SQL Server Agent > Alerts there are about 15 alerts there by default (although disabled by default). It seems to me the SQL team has already thought of the most common scenarios and all I should have to do is enable them.

    Thanks again for your time & help.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •