Hello all,

It's good to be posting here for the first time. To quickly set the scene for my problem:

We're building a web app which will either be a) a big flop - no scaling issue or b) a huge success with a massive scaling problem (to the extent of sites like youtube, friendster, myspace etc). There won't really be much of a middle ground.

I'm fairly good with relational database design but I've never really had to deal with scaling issues due to what I've been building (small online shops, forums etc) so generally speaking, a single database server has been more than enough.

I've been looking in to database partitioning to scale things out horizontally - this sounds like the best way forward for our application as we will have a massive user base and once it's up and running there will be little we can do to stop the growth of the database so I like the idea of just adding on database servers as required without any complicated db clusters.

My questions are:

1. Is MySQL a good choice? I know about the new built in disk partitioning, but I'm talking about partitioning across multiple server - e.g. server a has users 1 to 1mill, server b 1mill to 2mill etc. I would happily consider PostGreSQL or other open source dbs. To some extend I would consider SQL Server but if we're talking 20/30 db servers this could get pricey.

2. How can I reconnect/reassemble data over partitions? Do I need to use replication or something like that?

3. What's the best way to know where my data is? The are certain parts of data that I can give a key to, to identify the server it's stored on. However, with something like users I really want to split them evenly across servers so when they login with a username/password - it could be on any of the servers. Is it best to maintain a separate lightweight db as an index, or hit each DB until I find it?

4. Has anyone got any experience partitioning a database at a later date? E.g. can I wait for it to become a problem and then migrate my nice single db to a multiple db style architecture. I'm interested to know the sort of headache this could course me when I have lots of people using the database!

Any help, pointers or links to further information would be greatly appreciated.