Objective: Build a Mysql database structure flexible enough to handle from 10
to 100 million members, and more if needed.
Description: Online dating portal with enhanced matchmaking and
browsing capabilities. The site allows members to create a profile with
detailed info. Additionally, each member can post their profile in several
clubs (adult, dating, alternative, etc...)
Hardware: RAID 5 and RAID 0 DUAL XEON 1.6Ghz 2GIG of RAM servers.
Structure: One main update database server, 3 replication select database
Situation: Unlike hotmail or other high capacity database sites. A dating site
should provide an extensive search tool, which would go through all its
members. The challenge is to create an database that will be
scalable and fast enough. The problem is how can we search over 100 million
Database structure: Please find attached in this email the current database
structure. This structure needs to be altered and scaled to accomodate up to
100 million members, and more if needed.
For a live demonstration of the site, please visit http://demo.iwantu.com!
Details: Our objective is to design a DB structure that will be able to handle
up to, and over, 10-100 million users. Each user will have his main user
information stored in "iwantu_user", his main profile information in
"iwantu_profile" and for each club profile in "iwantu_clubX" where X
designated which club it is (1=dating, 2=adult etc...)
The solution we have now consists of splitting each of these tables into
multiple smaller tables, ie:
Always keeping the most recent information in table numbered "1". The
problem arises when we are searching for users whose profiles contain
specific criteria. How do we go about getting the information from all
the needed tables when we don't know in which table the user is.
If we use a centralized table to store in which tables each user has his
information, we come across problems where we get 100 results from the
club tables, then from those results we narrow them further with the
profile tables...we might end up with 2 results, when we need to display
a certain number of results per page in the search.
Consulting: Please provide us with an estimate of the number of hours and
total amount required to come up with a plan.
You are using the wrong tool for this job. While it is *possible* to split data vertically as your schema suggests, the subsequent complexity introduced to the application code and the severe performance hit is, IMHO, unacceptable. I think other knowledgeable DBA's or database developers would agree that this situation should be avoided at all costs. A similar approach (chronological or segmented table naming schemes) sometimes makes sense for archival data storage, but never for live data. Your schema also contains a few other relational and normalization pitfalls, but that's another story...
If you really must support 100 million users you have no choice but to step up to Oracle, Informix, DB2 or a similar heavy-duty RDBMS. Postgres might do the trick - it can theoretically handle your requirements but I have no direct or anecdotal evidence that it can (or cannot) perform acceptably with huge tables.
Considering your deadline, a realistic compromise might be to abstract your database calls throughout the application so that you can migrate your backend later on (I'm assuming that you don't need to support 100 million users right away) as your user base grows. This would allow you start off with a relationally correct schema under MySQL, would streamiline your application code and would dramatically increase the system's response time. When you begin to run up against MySQL's capacity / performance limitations, you can move to a new database with minimal downtime (maybe a few minutes) if it's planned correctly.
Unfortunately, I'm booked solid at the moment so I can't really offer my services for this project but I am always interested in talking to potential future clients. I'm a former senior DBA / Developer for a leading national newspaper and now a freelance web application / database / ecommerce developer. I *might* be able to design a proper schema for your system within your time contraints. Email me at firstname.lastname@example.org if you're interested.
btw, you're hardware setup sounds great. I'm not sure if you can fully exploit its advantages with MySQL, but it will definitely yield a dramatic performance gain, to say the least.