Results 1 to 4 of 4
  1. #1
    Join Date
    Feb 2006
    Posts
    5

    Unanswered: Easing server load for many-record queries

    Hi, I'm thinking of starting a new website where people submit their own ratings of products. For example, users would give ratings of 8/10 or something.

    Then I'd want to calculate averages of all the submitted ratings. But if there are thousands of ratings, won't it be lots of load on the server to calculate averages of thousands of records?

    Is there a less load-intensive method of this?
    I was thinking maybe having another table that holds the accumulated total of ratings. So if there are 5 people who rate the product, say their ratings are 9, 10, 10, 9, 8, then this other table will store the cumulative total (9+10+10+9+8) and the number of ratings (5). Then when I want to calculate the average, it will be easy on the server. Except every time someone adds a rating, I have to add that to the cumulative total, and the number of ratings.

    Is that a viable solution? Is there a better way?

    Thank you very much for any guidance you can offer. I'm new to database design.

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    A lot depends on what hardware and database engine you are using, but my guess is that you'd have nothing to worry about with less than ten thousand responses to any one question.

    -PatP

  3. #3
    Join Date
    Feb 2006
    Posts
    5
    I'm on a Linux hosting plan, using MySQL.

    But querying around 10,000 records for one page isn't heavy on the server?
    At what point should I start worrying about its load on the server, 10,000 records? 20,000?

    And if it does become a problem, what are some standard solutions to this problem?

  4. #4
    Join Date
    Apr 2004
    Location
    Europe->Sweden->Stockholm
    Posts
    71
    Quote Originally Posted by bboy_vitasoy
    But querying around 10,000 records for one page isn't heavy on the server?
    At what point should I start worrying about its load on the server, 10,000 records? 20,000?
    Everything of course depends on the data volume, data size, hardware specs and your load. It is probably not a problem to do something like SELECT AVG(something) FROM whatever WHERE some_id=some_value, not even if there are thousands of rows that must be scanned each time the query runs. However, if you have a lot of different IDs (in your example, that would probably be product_id, so it depends on the number of products) and each of them have thousands of ratings, it might become a problem. In such a case, the server might need to read rows from disk often since it cannot cache them all. On the other hand, the problem might not be as big as you think. Consider a situation in which you have 500 products and 25,000 "ratings" for each product. The table simply holds pairs of (product, rating) values. Possibly you want a user id or something in there too, to enforce a maximum of one vote per user and product.

    Okay, let's see:

    The table contains (product, rating, user); product and user are BIGINTs (8 bytes each) and rating is TINYINT (1 byte). 17 bytes per row + some overhead. Say 50 bytes per row including indices. It's very likely that less space is required.

    50 bytes x 500 products x 25,000 ratings per product = 600 MB. (I hope this is a correct answer; please correct me if not!) I think it's very likely that you can perform at least one SELECT AVG... query per second with such a setup and data volume on relatively modern hardware.

    Quote Originally Posted by bboy_vitasoy
    And if it does become a problem, what are some standard solutions to this problem?
    Well, as you suggested: summary tables. You would probably want to (or at least I would!) have triggers for inserts, deletes and updates to make sure that the summary table(s) is kept consistent at all times. If you have a high SELECT/INSERT ratio, summary tables are less costly than reading thousands of rows each time. Instead, the query simply finds a row by primary key and reads the "value" column – in almost no time at all.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •