How does autocommit (no client-side transactions) increase scalability?
The following quote comes from the following link, which discusses eBay's scale-out architecture and its use of autocommit. Why might they do that, when everything I've read, such as the documentation for both MySQL and its JDBC connector, says that autocommit is bad for performance because it causes more writes than necessary?
It is interesting that their new architecture basically gives up on transactional databases. They say eBay has "absolutely no client side transactions", "no distributed transactions", and "auto-commit for [the] vast majority of DB writes". Instead, they apparently use "careful ordering of DB operations". It sounds like mistakes happen in this system, because they mention running "asynchronous recovery events" and "reconciliation batch" jobs, which, I assume, means asynchronous processes that run over the database repairing inconsistencies.
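To make the "careful ordering of DB operations" idea concrete, here is a minimal sketch using Python's sqlite3 in autocommit mode. The schema and function names are hypothetical, not eBay's actual design; the point is only the ordering discipline: write dependent rows before the "visible" parent row, so a crash mid-sequence leaves harmless orphans rather than a parent pointing at missing data.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode: each
# statement commits on its own, with no surrounding transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE listing_detail (listing_id INTEGER, attr TEXT)")
conn.execute("CREATE TABLE listing (id INTEGER PRIMARY KEY, title TEXT)")

def create_listing(listing_id, title, attrs):
    # Careful ordering: insert the dependent detail rows first and the
    # parent row last.  If the application dies between statements, the
    # result is a few orphaned detail rows (invisible garbage for a
    # reconciliation batch job to sweep up), never a listing whose
    # details are missing.
    for attr in attrs:
        conn.execute("INSERT INTO listing_detail VALUES (?, ?)",
                     (listing_id, attr))
    conn.execute("INSERT INTO listing VALUES (?, ?)", (listing_id, title))

create_listing(1, "vintage camera", ["35mm", "working"])
print(conn.execute("SELECT COUNT(*) FROM listing_detail").fetchone()[0])  # 2
```

The "asynchronous recovery" jobs mentioned in the quote would then be periodic scans for detail rows with no matching listing row.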
If that is the case, how is eBay using autocommit to their advantage, and is everything else written on the topic wrong to suggest that autocommit is all bad?
There is often a dichotomy between performance and accuracy. Getting the right answer is difficult, and sometimes people seem to think that getting an answer is good enough, whether the answer is right, wrong, or irrelevant.
I would have expected more from a company like eBay, but based on personal experience from years ago with mid-level management at both eBay and PayPal, this might indeed be how they handle data integrity: a mental model something like "get the data now, we'll get it right someday", which frightens me.
In its purest form, autocommit takes the database out of the business of ensuring data integrity, and assumes that the application will manage integrity entirely on its own. This has the advantage of greatly improving performance, since the application never has to wait on the database because of locks and the like. It also means that unless the application carefully manages everything that accesses the database (including backup jobs), there is no way to guarantee data integrity.
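A short sketch of the risk being described, again using sqlite3 in autocommit mode as a stand-in (the table and values are made up). A logical unit spanning two statements can be left half-done under autocommit, whereas an explicit transaction lets the database undo the whole unit:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")

# Under autocommit, each statement commits immediately.  If the
# application dies between the debit and the credit, the money has
# simply vanished; only the application (or a repair job) can notice.
conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'a'")
# -- imagine a crash here, before the matching credit runs --
# conn.execute("UPDATE accounts SET balance = balance + 40 WHERE name = 'b'")
print(conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0])  # 60

# With an explicit transaction, the database enforces all-or-nothing:
conn.execute("BEGIN")
conn.execute("UPDATE accounts SET balance = balance - 40 WHERE name = 'a'")
conn.execute("ROLLBACK")  # simulated failure: the update is undone
print(conn.execute("SELECT SUM(balance) FROM accounts").fetchone()[0])  # 60
```

The first sum shows the inconsistency autocommit permits (40 units lost); the second shows that the rolled-back work left the data untouched.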
I'm not sure I understand: doesn't every statement occur within its own transaction, regardless of autocommit?
So let's say the majority of update scenarios touch only one table; in those cases, isn't leaving autocommit on fine? And in cases where they need to update multiple tables transactionally, they could explicitly start a transaction and then commit. Are there any problems with this approach?
The same goes for reads from multiple tables: as long as it is a single statement, wouldn't autocommit's transactional boundaries fall immediately before and after that statement, which is exactly what you want?