Unanswered: Easing server load for many-record queries
Hi, I'm thinking of starting a new website where people submit their own ratings of products. For example, users would give ratings of 8/10 or something.
Then I'd want to calculate averages of all the submitted ratings. But if there are thousands of ratings, won't calculating an average over thousands of records put a lot of load on the server?
Is there a less load-intensive way to do this?
I was thinking maybe having another table that holds the accumulated total of ratings. So if there are 5 people who rate the product, say their ratings are 9, 10, 10, 9, 8, then this other table will store the cumulative total (9+10+10+9+8) and the number of ratings (5). Then when I want to calculate the average, it will be easy on the server. Except every time someone adds a rating, I have to add it to the cumulative total and increment the number of ratings.
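The running-total idea can be sketched in a few lines (plain Python for illustration; the dict stands in for the extra table, and the names are made up):

```python
# Hypothetical sketch of the running-total idea: keep (total, count)
# per product and derive the average from them, instead of scanning rows.

totals = {}  # product_id -> [cumulative_total, rating_count]

def add_rating(product_id, rating):
    entry = totals.setdefault(product_id, [0, 0])
    entry[0] += rating   # add to the cumulative total
    entry[1] += 1        # bump the number of ratings

def average(product_id):
    total, count = totals[product_id]
    return total / count

for r in (9, 10, 10, 9, 8):   # the five ratings from the example
    add_rating(42, r)

print(average(42))  # -> 9.2
```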
Is that a viable solution? Is there a better way?
Thank you very much for any guidance you can offer. I'm new to database design.
But querying around 10,000 records for one page isn't heavy on the server?
At what point should I start worrying about the load on the server: 10,000 records? 20,000?
Everything depends, of course, on the data volume, data size, hardware specs and your load. It is probably not a problem to do something like SELECT AVG(something) FROM whatever WHERE some_id=some_value, not even if there are thousands of rows that must be scanned each time the query runs.

However, if you have a lot of different IDs (in your example, that would probably be product_id, so it depends on the number of products) and each of them has thousands of ratings, it might become a problem. In such a case, the server might need to read rows from disk often, since it cannot cache them all.

On the other hand, the problem might not be as big as you think. Consider a situation in which you have 500 products and 25,000 ratings for each product. The table simply holds pairs of (product, rating) values. Possibly you want a user id or something in there too, to enforce a maximum of one vote per user and product.
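As a rough, runnable illustration of that single-aggregate query (SQLite via Python here; the table and column names are assumptions, not your actual schema):

```python
import sqlite3

# Sketch of the (product, rating, user) table described above,
# with a composite key enforcing one vote per user and product.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE ratings (
        product_id INTEGER,
        user_id    INTEGER,
        rating     INTEGER,
        PRIMARY KEY (product_id, user_id)  -- one vote per user and product
    )
""")
con.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 100, 9), (1, 101, 10), (1, 102, 10), (1, 103, 9), (1, 104, 8)],
)

# The per-product average: one aggregate query over that product's rows.
(avg,) = con.execute(
    "SELECT AVG(rating) FROM ratings WHERE product_id = ?", (1,)
).fetchone()
print(avg)  # -> 9.2
```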
Okay, let's see:
The table contains (product, rating, user); product and user are BIGINTs (8 bytes each) and rating is a TINYINT (1 byte), so 17 bytes per row plus some overhead. Say 50 bytes per row including indices; it's quite possible that less space is required.
50 bytes x 500 products x 25,000 ratings per product = 625,000,000 bytes, or roughly 600 MiB. I think it's very likely that you can perform at least one SELECT AVG... query per second with such a setup and data volume on relatively modern hardware.
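The back-of-the-envelope estimate, multiplied through:

```python
# Re-running the size estimate from the post.
bytes_per_row = 50           # assumed: 17 bytes of data plus overhead and indices
products = 500
ratings_per_product = 25_000

total_bytes = bytes_per_row * products * ratings_per_product
print(total_bytes)                     # -> 625000000
print(total_bytes / 1024 ** 2)         # roughly 600 MiB
```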
Originally Posted by bboy_vitasoy
And if it does become a problem, what are some standard solutions to this problem?
Well, as you suggested: summary tables. You would probably want (or at least I would!) triggers for inserts, deletes and updates to keep the summary table(s) consistent at all times. If you have a high SELECT/INSERT ratio, summary tables are less costly than reading thousands of rows each time. Instead, the query simply finds a row by primary key and reads the "value" column, in almost no time at all.
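A trigger-maintained summary table along those lines might look like this (SQLite syntax for illustration; MySQL trigger syntax differs, and all names here are assumptions):

```python
import sqlite3

# Sketch: triggers keep rating_summary consistent with the ratings table
# on every insert, delete, and update.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE ratings (
        product_id INTEGER,
        user_id    INTEGER,
        rating     INTEGER,
        PRIMARY KEY (product_id, user_id)
    );
    CREATE TABLE rating_summary (
        product_id   INTEGER PRIMARY KEY,
        total        INTEGER NOT NULL,
        rating_count INTEGER NOT NULL
    );

    -- Insert: add the new rating to the running total.
    CREATE TRIGGER ratings_ai AFTER INSERT ON ratings BEGIN
        INSERT OR IGNORE INTO rating_summary (product_id, total, rating_count)
        VALUES (NEW.product_id, 0, 0);
        UPDATE rating_summary SET
            total        = total + NEW.rating,
            rating_count = rating_count + 1
        WHERE product_id = NEW.product_id;
    END;

    -- Delete: subtract the removed rating.
    CREATE TRIGGER ratings_ad AFTER DELETE ON ratings BEGIN
        UPDATE rating_summary SET
            total        = total - OLD.rating,
            rating_count = rating_count - 1
        WHERE product_id = OLD.product_id;
    END;

    -- Update: apply the difference between old and new values.
    CREATE TRIGGER ratings_au AFTER UPDATE OF rating ON ratings BEGIN
        UPDATE rating_summary SET
            total = total - OLD.rating + NEW.rating
        WHERE product_id = OLD.product_id;
    END;
""")

con.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 100, 9), (1, 101, 10), (1, 102, 10), (1, 103, 9), (1, 104, 8)],
)
con.execute("UPDATE ratings SET rating = 7 WHERE user_id = 104")
con.execute("DELETE FROM ratings WHERE user_id = 103")

# Reading the average is now a primary-key lookup, not a table scan.
total, count = con.execute(
    "SELECT total, rating_count FROM rating_summary WHERE product_id = ?",
    (1,),
).fetchone()
print(total / count)  # 9+10+10+7 = 36, over 4 ratings -> 9.0
```

The same idea works in MySQL with AFTER INSERT/UPDATE/DELETE triggers, or you can do the bookkeeping in application code inside the same transaction as the rating write.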