Concurrency problem: Use row locking or some other method?
I am working on an application that regularly dumps various URLs into a table. Elsewhere, another process routinely checks for new rows, fetches each URL, and then deletes the row. I'd like to make the URL-fetching part of the app multithreaded, but I don't know how to avoid concurrency issues. What's the best way to approach this problem?
I want to make sure that each URL is fetched only once, and that if an HTTP (or similar) error occurs while fetching a URL, the row is not deleted from the table.
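For what it's worth, the two requirements (each URL fetched exactly once, failed fetches kept for retry) amount to a claim/ack pattern regardless of the database: a worker atomically claims a row, and the row is only deleted after a successful fetch. A minimal in-process sketch of that pattern, where `UrlTable` and `fetch` are hypothetical stand-ins for the real table and HTTP calls:

```python
import threading

class UrlTable:
    """In-memory stand-in for the database table of pending URLs."""
    def __init__(self, urls):
        self._pending = set(urls)   # rows not yet claimed by any worker
        self._claimed = set()       # rows claimed but not yet deleted
        self._lock = threading.Lock()

    def claim(self):
        """Atomically move one URL from pending to claimed (the row-lock analogue)."""
        with self._lock:
            if not self._pending:
                return None
            url = self._pending.pop()
            self._claimed.add(url)
            return url

    def delete(self, url):
        """Remove the row for good after a successful fetch."""
        with self._lock:
            self._claimed.discard(url)

    def release(self, url):
        """Return the row to pending after a failed fetch, so it is retried."""
        with self._lock:
            self._claimed.discard(url)
            self._pending.add(url)

def worker(table, fetch, fetched):
    while (url := table.claim()) is not None:
        try:
            fetch(url)
        except Exception:
            table.release(url)   # fetch failed: keep the row
        else:
            fetched.append(url)  # list.append is thread-safe in CPython
            table.delete(url)

urls = [f"http://example.com/{i}" for i in range(100)]
table = UrlTable(urls)
fetched = []
fetch = lambda url: None  # hypothetical fetch that always succeeds here
threads = [threading.Thread(target=worker, args=(table, fetch, fetched))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert sorted(fetched) == sorted(urls)  # each URL fetched exactly once
```

In the real system the claim step would be a single atomic statement on the database side rather than an in-process lock, but the shape of the protocol is the same.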
(I'm using plpgsql stored procedures (PostgreSQL) and Python on the app end, if anyone cares, but this seems to be more of a general architectural question.)
I don't know PostgreSQL, but the RDBMS might be relevant here. For example, my RDBMS of choice is SQL Server, and my first thought would be to issue a DELETE statement with the OUTPUT clause, which deletes rows and returns the deleted rows as a result set in a single operation.
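PostgreSQL has a direct analogue: the `RETURNING` clause (available since 8.2), so `DELETE FROM urls RETURNING url` removes the pending rows and hands them back in one statement, leaving no window for another worker to grab the same rows. A small sketch using Python's bundled `sqlite3` module, which also supports `RETURNING` from SQLite 3.35 on; the `urls` table name and its contents are assumptions for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT PRIMARY KEY)")
conn.executemany("INSERT INTO urls VALUES (?)",
                 [("http://example.com/a",), ("http://example.com/b",)])

# Delete all pending rows and get them back in a single statement,
# so claiming and removing the work happen as one operation.
rows = conn.execute("DELETE FROM urls RETURNING url").fetchall()
claimed = sorted(r[0] for r in rows)
print(claimed)

remaining = conn.execute("SELECT count(*) FROM urls").fetchone()[0]
print(remaining)  # 0 -- the rows were removed by the same statement
```

Note that with this approach a crashed worker loses its claimed rows, since they are already deleted; to satisfy the "keep the row on fetch failure" requirement, the worker would need to re-insert failed URLs, or claim with an `UPDATE ... RETURNING` that flags rows instead of deleting them.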