Hi all, I'm new to posting on this forum but have been lurking for a while.
I am planning a new product and have some DB problems that I am hoping you can shed some light on.
Currently the front end is Laravel with a MySQL DB. I use phpresque and Redis to queue tasks for the system, and Laravel writes to the DB once a task is complete.
The problem I will be facing soon is scale. My current DB is around 3.5 GB, and that holds data (in related tables) on only 26,000 URLs.
Eventually I expect to hold data on approximately 300 million URLs.
If my calculations are correct, that works out to around 40 TB of data that needs to be in the DB.
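For anyone who wants to check my maths, here is the back-of-the-envelope calculation as a small Python sketch. It assumes storage grows linearly with the number of URLs, which ignores index growth and any per-table overhead, and it uses decimal units (1 TB = 1000 GB). The function name is just something I made up for this post.

```python
def projected_size_tb(current_size_gb, current_urls, target_urls):
    """Scale the current DB size linearly to the target URL count.

    Assumption: size per URL stays constant as the data set grows.
    """
    gb_per_url = current_size_gb / current_urls
    return gb_per_url * target_urls / 1000  # GB -> TB (decimal)


gb_per_url = 3.5 / 26_000
print(f"{gb_per_url * 1_000_000:.0f} KB per URL")  # roughly 135 KB
print(f"{projected_size_tb(3.5, 26_000, 300_000_000):.1f} TB projected")  # roughly 40.4 TB
```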
From what I have read, MySQL is not a realistic option at that size.
Firstly, I would like to hear your thoughts on which DB would be able to manage a data set of this size.
Secondly, I want the front end of the site to be able to serve this data in a reasonable timeframe. Processing a URL means gathering data about it from approximately 4 different APIs plus our own, and the results end up spread across a few different tables. My plan was that when a URL finishes processing, the total data set for it is collated and inserted into a sort of front-end DB, perhaps MongoDB or even Solr. However, I'm unsure how either performs at this kind of size.
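To make the collation step more concrete, here is a rough sketch of what I have in mind, written in Python for brevity. It is purely illustrative: the table names (`api_a`, `own_data`), the field names, and the function name are all made up, and the actual write to MongoDB or Solr is left out. It only shows rows from several related tables being merged into one document per URL.

```python
import json


def collate_url_document(url_row, source_rows):
    """Merge the rows for one URL into a single denormalized document.

    url_row:     dict for the URL itself (hypothetical fields: id, url)
    source_rows: dict mapping a source/table name to its list of row dicts
    """
    doc = {"_id": url_row["id"], "url": url_row["url"], "sources": {}}
    for source_name, rows in source_rows.items():
        # Drop the foreign key; it is redundant once everything is nested.
        doc["sources"][source_name] = [
            {key: value for key, value in row.items() if key != "url_id"}
            for row in rows
        ]
    return doc


example = collate_url_document(
    {"id": 1, "url": "http://example.com"},
    {
        "api_a": [{"url_id": 1, "score": 7}],
        "own_data": [{"url_id": 1, "title": "Example"}],
    },
)
print(json.dumps(example, indent=2))
```

The idea is that the front end would then fetch one document by URL rather than joining the related tables on every page load.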
So, if you had to deal with a relational DB of 40 TB, how would you advise going forward, or what should I be reading about? And once I have this data available in some sort of DB, how would you map it to a front-end DB that could return results to a web page within a reasonable timeframe?
So far I have seen that Postgres can possibly handle this size. Sharded MongoDB could possibly do the front end, but as I understand it reindexing knocks it offline. Also, if I want to run any kind of query (whatever the Mongo equivalent of a MySQL query is) over a DB this size, the performance might be lacking, so I would possibly have to use Hadoop with MapReduce for queries of that size.
I'm a bit of a noob with big data, so any help is much appreciated, and I'm glad to be part of the forum!