I would like to start a discussion on the polling techniques and the database design issues that help and hinder this.
I'll be honest, my company uses polling for client updates, and we are hitting some database limitations. My purpose is to spark a discussion and hopefully learn something from it.
Yes, we are using polling, and No, we can not replace it with event based mechanism, since there is a large number of applications involved, and original system design was not flexible enough to allow easy accross-the-board changes. Then there is a small issue of recompiling, changing and testing all those applications.
Anyways...
Currently, we have a central table (let's call it MAIN_REC) that stores all the system activity. This table populates another table (let's call it MAIN_UPDATE) through the (i/u/d) triggers. The new table contains the primary key on the MAIN_REC and the datetime field when this record was last updated. This last_update field is populated by evaluating the getdate() while populating the MAIN_UPDATE table.
The applications that monitor the table for updates check the datetime field to determine what records have been updated. There is an index on that field.
Now, everything works fine, as long as you use page or table locking. Just for refference, this is on Sybase, but I think this problem applies to other RDBMs. Anyways, when we switched the table locking schema to row locking (datarows on Sybase), we noticed that the our polling cycle would sometime skip some of the updates.
Our investigation turned up the following:
1. The timestamp stored in the datetime field was executed prior to actual writing of the record to the table.
2. When updating multiple records in a transaction, the timestamp on the various records in the MAIN_UPDATE will cover the entire range from the beginning to the end of the transaction (as it should).
3. A transaction that would insert a record into the update table would not lock the index on the table, which would allow other rows to be inserted through other processes at the same time.
4. Our polling logic relied that the writing queue is a FIFO. It accounted for the time discrepancy between evaluating getdate and actually saving the record to the disk by using the timestamp of the last retreived record as the beginning timestamp for the next poll.
Some of the points above state some of the assumptions that our client applications made, and those assumptions were holding until we needed to switch to data row locking, for performance reasons.
Our problem occured when one process would save multiple records into the MAIN_REC (which would trigger updates to MAIN_UPDATE) using the transaction, while other processes would update single records in MAIN_REC (which would create an entry in MAIN_UDATE). If the transaction has not been committed at the time of the polling cycle, but the update was, the polling logic would skip the records in the transactions, fully or partially, depending on the timeline.
If this is not too clear, which is quite possible, instead of understanding my jibberish, please describe any good database polling techniques that you know about that support multiple producers and consumers of the updates.