Quote:
|
Originally Posted by bboy_vitasoy
But querying around 10,000 records for one page isn't heavy on the server?
At what point should I start worrying about its load on the server, 10,000 records? 20,000?
|
Everything of course depends on the data volume, data size, hardware specs and your load. It is probably not a problem to do something like SELECT AVG(something) FROM whatever WHERE some_id=some_value, not even if there are thousands of rows that must be scanned each time the query runs. However, if you have a lot of different IDs (in your example, that would probably be product_id, so it depends on the number of products) and each of them have thousands of ratings, it might become a problem. In such a case, the server might need to read rows from disk often since it cannot cache them all. On the other hand, the problem might not be as big as you think. Consider a situation in which you have 500 products and 25,000 "ratings" for each product. The table simply holds pairs of (product, rating) values. Possibly you want a user id or something in there too, to enforce a maximum of one vote per user and product.
Okay, let's see:
The table contains (product, rating, user); product and user are BIGINTs (8 bytes each) and rating is TINYINT (1 byte). 17 bytes per row + some overhead. Say 50 bytes per row including indices. It's very likely that less space is required.
50 bytes x 500 products x 25,000 ratings per product = 600 MB. (I hope this is a correct answer; please correct me if not!) I think it's very likely that you can perform at least one SELECT AVG... query per second with such a setup and data volume on relatively modern hardware.
Quote:
|
Originally Posted by bboy_vitasoy
And if it does become a problem, what are some standard solutions to this problem?
|
Well, as you suggested: summary tables. You would probably want to (or at least I would!) have triggers for inserts, deletes and updates to make sure that the summary table(s) is kept consistent at all times. If you have a high SELECT/INSERT ratio, summary tables are less costly than reading thousands of rows each time. Instead, the query simply finds a row by primary key and reads the "value" column – in almost no time at all.