Hi... I am very interested in developing a small cluster at home for playing around with.
Since I will have 12 nodes for my active web servers (PHP/Python/Ruby/Apache), I need some database power to keep up with the onslaught of load they will face. I could go for a large quad-processor machine, but that would cost way too much.
So clustering seems the ideal solution; however, clustering databases of course isn't simple. I have been looking at OpenAFS and OpenGFS to see whether they are suitable, but I am not too sure that they are. Has anyone utilised them for database clustering before?
I understand there is replication, however I want to have all nodes able to accept database changes. I currently only use MySQL and am familiar with that, and am aware there can only be one master and multiple slaves which feed off the changes applied to the master only.
The only way I can think of to date to make a clustered database system is to use a message passing system between the servers. When a database receives a query which modifies the database, it sends a pre-parsed and compiled SQL query to the other servers to ensure they get the update. If any server fails to update itself, it makes itself unavailable and redirects its requests to other machines, or notifies the load balancer not to include it, until it has caught up with the other machines; it then makes itself available again.
This design ensures all nodes of the cluster are identical in configuration and makes it easy to add more machines, perhaps even with diskless mounts for the application side and an internal HDD for the database.
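To make the idea a bit more concrete, here is a rough sketch of it in Python. Everything in it is made up for illustration: the `Node` and `Cluster` classes, the `broken` flag used to fake a failure, and the use of in-memory SQLite databases instead of MySQL. It also ships plain SQL text rather than a pre-parsed query, and it funnels all writes through one coordinator object with one shared log, which a real design would have to avoid since that is itself a single point of failure.

```python
import sqlite3


class Node:
    """One database server in the hypothetical cluster."""

    def __init__(self, name):
        self.name = name
        self.db = sqlite3.connect(":memory:")
        self.applied = 0       # number of log entries this node has applied
        self.available = True  # the load balancer only routes to available nodes
        self.broken = False    # illustration-only switch to simulate a failure

    def apply(self, sql, params=()):
        if self.broken:
            raise sqlite3.OperationalError("simulated failure")
        with self.db:  # commits on success, rolls back on error
            self.db.execute(sql, params)
        self.applied += 1


class Cluster:
    """Passes every modifying statement to every available node."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.log = []  # ordered record of every write the cluster accepted

    def write(self, sql, params=()):
        self.log.append((sql, params))
        for node in self.nodes:
            if not node.available:
                continue
            try:
                node.apply(sql, params)
            except sqlite3.Error:
                # The node failed to update itself, so it drops out.
                node.available = False

    def catch_up(self, node):
        # Replay whatever the node missed, then let it serve requests again.
        for sql, params in self.log[node.applied:]:
            node.apply(sql, params)
        node.available = True

    def available_nodes(self):
        return [n.name for n in self.nodes if n.available]
```

For example, after `cluster = Cluster(["a", "b", "c"])`, setting `cluster.nodes[1].broken = True` before a `cluster.write(...)` takes node `b` out of `available_nodes()` until `catch_up` has replayed the missed statements. Note that every node still executes every write, which is the limitation described in the next paragraph.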
The problem here is that there is still not much improvement for database changes, except the fact that the parsing and compilation of the query is already done.
My other idea was XML databases. By using a collection of files, you can distribute them around a large distributed network filesystem. This, however, is subject to serious data loss should one node crash hard. This could be overcome by making the node doing the writing put the file on another few machines as well... say (1/(n machines) * 100)% of the machines in the cluster, just in case. The odds of all those machines crashing are pretty slim.
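As a back-of-the-envelope check on "pretty slim", here is a small Python sketch. The function names, the number of extra copies and the per-machine crash probability are all assumptions for illustration, and the calculation assumes machines crash independently, which is optimistic for a home cluster sharing one power supply and one network.

```python
import random


def pick_replicas(nodes, writer, copies):
    """Choose `copies` distinct nodes, other than the writer, to hold extra copies."""
    others = [n for n in nodes if n != writer]
    return random.sample(others, copies)


def loss_probability(p_crash, copies):
    """Chance that the writer and all of its replicas are lost together,
    assuming each machine crashes independently with probability p_crash."""
    return p_crash ** (copies + 1)
```

With 12 nodes, `pick_replicas(nodes, "node0", 2)` returns two other machines to write to, and if each machine had, say, a 5% chance of a hard crash in some period, `loss_probability(0.05, 2)` gives the chance of losing all three copies in that period under the independence assumption.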
As far as I can imagine, an XML database won't have to put a lock on every file in the cluster, so nearly all the files stay available for reading and writing at any time.
So that is what I have been thinking. Now, if anyone has had experience with, or has been thinking about, distributed database systems (whether they be RDBMS, OO or XML), I'd love to hear what you've got.
One more thing, I would like to set a series of requirements I would expect from a cluster of databases.
- no single point of failure for the whole cluster
hehe, this is subject to change... and I would like to hear about others' requirements too