I would like to start a discussion on polling techniques and the database design issues that help or hinder them.
I'll be honest, my company uses polling for client updates, and we are hitting some database limitations. My purpose is to spark a discussion and hopefully learn something from it.
Yes, we are using polling, and no, we cannot replace it with an event-based mechanism: there is a large number of applications involved, and the original system design was not flexible enough to allow easy across-the-board changes. Then there is the small issue of recompiling, changing, and testing all those applications.
Currently, we have a central table (let's call it MAIN_REC) that stores all the system activity. Insert/update/delete triggers on this table populate another table (let's call it MAIN_UPDATE). MAIN_UPDATE contains the primary key of the MAIN_REC row and a datetime field recording when that record was last updated; this last_update value comes from evaluating getdate() as the MAIN_UPDATE row is written.
The applications that monitor the table for updates check the datetime field to determine which records have been updated. There is an index on that field.
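The setup in the two paragraphs above can be sketched roughly like this, using SQLite as a stand-in for Sybase (table and column names follow the post; the trigger bodies are my assumptions, and the delete trigger is omitted for brevity):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MAIN_REC (
    id   INTEGER PRIMARY KEY,
    data TEXT
);
-- One row per changed MAIN_REC record, stamped at trigger time
-- (SQLite's CURRENT_TIMESTAMP plays the role of Sybase's getdate()).
CREATE TABLE MAIN_UPDATE (
    rec_id      INTEGER,
    last_update TEXT
);
CREATE INDEX main_update_ts ON MAIN_UPDATE (last_update);

CREATE TRIGGER rec_ins AFTER INSERT ON MAIN_REC BEGIN
    INSERT INTO MAIN_UPDATE VALUES (NEW.id, CURRENT_TIMESTAMP);
END;
CREATE TRIGGER rec_upd AFTER UPDATE ON MAIN_REC BEGIN
    INSERT INTO MAIN_UPDATE VALUES (NEW.id, CURRENT_TIMESTAMP);
END;
""")

con.execute("INSERT INTO MAIN_REC VALUES (1, 'first')")
con.execute("UPDATE MAIN_REC SET data = 'changed' WHERE id = 1")
con.commit()

# One polling cycle: fetch everything stamped after the last watermark.
def poll(watermark):
    return con.execute(
        "SELECT rec_id, last_update FROM MAIN_UPDATE "
        "WHERE last_update > ? ORDER BY last_update", (watermark,)
    ).fetchall()

rows = poll("")   # empty watermark -> everything seen so far
print(len(rows))  # 2: one row from the insert, one from the update
```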
Now, everything works fine as long as you use page or table locking. Just for reference, this is on Sybase, but I think the problem applies to other RDBMSs. Anyway, when we switched the table's locking scheme to row locking (datarows on Sybase), we noticed that our polling cycle would sometimes skip some of the updates.
Our investigation turned up the following:
1. The getdate() that produces the timestamp stored in the datetime field is evaluated before the record is actually written to the table.
2. When multiple records are updated in one transaction, the timestamps on the resulting MAIN_UPDATE records cover the entire range from the beginning to the end of the transaction (as they should).
3. A transaction inserting a record into the update table does not lock the table's index, so other processes can insert rows at the same time.
4. Our polling logic relied on the write queue being FIFO. It accounted for the gap between evaluating getdate() and actually saving the record to disk by using the timestamp of the last retrieved record as the starting timestamp for the next poll.
Some of the points above describe assumptions our client applications made, and those assumptions held until we needed to switch to data-row locking for performance reasons.
Our problem occurred when one process saved multiple records into MAIN_REC inside a transaction (triggering updates to MAIN_UPDATE) while other processes updated single records in MAIN_REC (each creating an entry in MAIN_UPDATE). If the transaction had not been committed at the time of the polling cycle but the single-record update had, the polling logic would skip the records in the transaction, fully or partially, depending on the timing.
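The race can be made concrete with a toy model where timestamps and commit order are explicit (no real database involved; the record ids and timestamp values are invented for illustration):

```python
# Toy model of the skipped-update race: "committed" holds the
# MAIN_UPDATE rows that a polling SELECT can actually see.
committed = []   # visible rows: (rec_id, last_update)

# t=1: a multi-row transaction stamps its rows via getdate()...
txn_rows = [(10, 1), (11, 1)]
# ...but has not committed yet, so they are invisible to readers.

# t=2: an independent single-row update is stamped AND committed at once.
committed.append((20, 2))

def poll(watermark):
    """Return committed rows stamped after the watermark, oldest first."""
    return sorted(r for r in committed if r[1] > watermark)

# A polling cycle runs now: it sees only the single-row update and
# advances its watermark to that row's timestamp.
seen = poll(0)
watermark = max(ts for _, ts in seen)   # watermark is now 2

# t=3: the transaction finally commits -- but its rows carry
# timestamp 1, which is already BEHIND the watermark.
committed.extend(txn_rows)

missed = poll(watermark)   # the next cycle sees nothing new
print(seen)     # [(20, 2)]
print(missed)   # [] -- rows 10 and 11 are silently skipped
```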
If this is not too clear, which is quite possible, then instead of deciphering my gibberish, please describe any good database polling techniques you know of that support multiple producers and consumers of the updates.
Is there any chance that your queue reader still uses DB-Library? You mentioned that some of this code is quite old, and DB-Library doesn't respect row-level locking at all (except for the last few versions, which partially supported IRL). The "skipping" effect was a common problem at the time of the switchover.
One way to minimize this is to build two different indexes: a clustered index on id and changedate (to spread changes more or less randomly through the table), and another on changedate alone (to allow easier sequential processing of queue entries). Note that this will significantly relieve your problem, but it won't fix it!
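Roughly what that looks like, again with SQLite standing in for Sybase (SQLite has no clustered indexes, so a plain composite index on (rec_id, changedate) approximates the clustered one; all names here are my own, not from the original schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE MAIN_UPDATE (rec_id INTEGER, changedate TEXT);
-- Stand-in for the clustered (id, changedate) index: spreads inserts
-- across the index instead of hot-spotting the newest page.
CREATE INDEX upd_id_date ON MAIN_UPDATE (rec_id, changedate);
-- Index on changedate alone, for the poller's sequential range scans.
CREATE INDEX upd_date ON MAIN_UPDATE (changedate);
""")

names = sorted(r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"))
print(names)   # ['upd_date', 'upd_id_date']
```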
To fix the problem, you'll probably have to go back to page-level locking and then find a way to speed the processes up enough that the contention is tolerable. One good way to handle this is to make the queue reader use optimistic concurrency.
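By an optimistic queue reader I mean something along these lines: read candidate rows without holding locks, then "claim" each row with a conditional UPDATE and check the affected-row count to detect a lost race. A minimal sketch (SQLite again as the stand-in; the queue table, its columns, and the claimed flag are all invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY,"
            " payload TEXT, claimed INTEGER DEFAULT 0)")
con.executemany("INSERT INTO queue (payload) VALUES (?)",
                [("a",), ("b",)])
con.commit()

def drain():
    processed = []
    # Step 1: snapshot candidates -- no locks held while we work.
    for row_id, payload in con.execute(
            "SELECT id, payload FROM queue WHERE claimed = 0").fetchall():
        # Step 2: optimistic claim -- succeeds only if nobody beat us to it.
        cur = con.execute(
            "UPDATE queue SET claimed = 1 WHERE id = ? AND claimed = 0",
            (row_id,))
        if cur.rowcount == 1:          # we won the race; safe to process
            processed.append(payload)
        # rowcount == 0: another reader claimed it first; just move on.
    con.commit()
    return processed

first = drain()
second = drain()
print(first)    # ['a', 'b']
print(second)   # [] -- everything already claimed
```

The point of the conditional `WHERE ... AND claimed = 0` is that two readers can scan the same rows concurrently, but only one of them will ever get `rowcount == 1` for a given row.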
Yes, we are using the old DB-Library (Open Client 10.0.4), but the problem we are facing is a limitation of the database server. My guess is that row locking is meant to increase performance when multiple clients access the same table, especially when those tables have a lot of triggers, dependencies, and indexes. For example, we moved to row locking to reduce overall server deadlocking and to speed up access to those tables, including concurrent writing and reading. I guess the key there is concurrent writing and reading (indexes are not being locked). In other words, you win some, you lose some.
If anyone is interested, I can post some SQL that will demonstrate the problem.
In the short term, we did go back to the page locking scheme.