I have tried searching for a solution, but I'm not quite sure how to formulate the search query to get the right results, so I'm asking a question instead. Hopefully someone can either answer it or point me in the right direction (an article or useful search terms =)
I have some experience in web development (PHP, ASP, etc.) and I've been using MySQL for quite some time for databases with primitive information.
Now I'm working on a game (AstroFighter.net) and I plan to record a high level of detail for players, such as time played, use of weapons (broken down by each weapon), kills, deaths, kills by weapon type, powerups collected... well, you get the point =)
I feel it is important that I design the database properly, and the most pressing decision at the moment (I think) is how to handle a game server uploading match results. The server will post a form to a PHP page that extracts all relevant player and game statistics and inserts them into the database.
I will also have clients post selected match data to the web server, to validate the match result reported by the game server and to increase the trustworthiness of the data, as a step towards preventing cheating.
Since clients will help validate the data, I need to store the entire match result for some time... and now, finally, my question. I have two scenarios in mind:
Scenario 1: I keep a simple match table with fields that identify the match that took place (e.g. timestamp, host IP, session key, etc.)
I then create a statistical database with very few fields, e.g. "id", "match_id", "player_id", "stat_id", "count".
The match_id refers to the match that took place. The player_id refers to the player a particular entry belongs to. The stat_id refers to the type of statistic (e.g. deaths, kills, laser shots fired, laser shots hit, etc.), and the count field contains the actual number.
This would mean that if I collect stats from 32 players, with about 300 statistics per player, 9,600 rows would be added to the table for every match... that sounds like a lot? Here is where my inexperience comes in: is that an acceptable number of records for such a lightweight table (only integers with foreign keys to ids)?
I can calculate the storage size fairly easily (I think), but not the impact of joining on match_id, player_id and stat_id for every record when I eventually want to deliver the stats back to each player.
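To make scenario 1 concrete, here is a rough sketch of the layout using SQLite via Python purely for illustration (the actual database would be MySQL, and the table and column names here are just the ones described above, nothing final):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (
    id          INTEGER PRIMARY KEY,
    played_at   TEXT,
    host_ip     TEXT,
    session_key TEXT
);
CREATE TABLE stats (
    id        INTEGER PRIMARY KEY,
    match_id  INTEGER NOT NULL REFERENCES matches(id),
    player_id INTEGER NOT NULL,
    stat_id   INTEGER NOT NULL,
    count     INTEGER NOT NULL
);
""")

conn.execute(
    "INSERT INTO matches (played_at, host_ip, session_key) VALUES (?, ?, ?)",
    ("2010-01-01 20:00:00", "192.0.2.1", "example-session"),
)
match_id = 1  # first auto-assigned id in an empty table

# 32 players x 300 statistics each = 9,600 rows per match.
rows = [
    (match_id, player_id, stat_id, 0)
    for player_id in range(32)
    for stat_id in range(300)
]
conn.executemany(
    "INSERT INTO stats (match_id, player_id, stat_id, count) VALUES (?, ?, ?, ?)",
    rows,
)

rows_per_match = conn.execute("SELECT COUNT(*) FROM stats").fetchone()[0]
print(rows_per_match)  # 9600
```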
Scenario 2: just one single match table containing a varchar(255) field for each statistic. This would be a nightmare to maintain, because I would have to create a database field for each type of statistic (about 300).
The idea would be to store a serialized array in each statistic field, e.g. kills: 12;43;12;55;12;5..., where I also record which player ID each index of the array belongs to.
This would mean only one table, but with very wide rows and lots of unused varchar(255) fields.
I don't like this idea, to be honest... tell me it is terrible, please =)
Is there a "proper" way to treat statistical information like this that I am totally missing? =)
I hope the above makes sense - and sorry for the lengthy post. It's difficult to explain without a fair amount of text.
Your 1st design is a variation of EAV (entity-attribute-value). Though it seems like a generalization, it's really putting unrelated numbers together in a table. Also, the volume of records you're expecting makes it an inefficient design. I strongly recommend AGAINST this design.
Your 2nd design has good and bad aspects. The good part is that the distinct measurements of matches will have their own fields, so you can easily query and aggregate over them. I know you don't want to define 300 fields, but this is what SQL databases are optimized for.
The bad part is the varchar(255) and your serialized array. There's no reason for a varchar; the value of a statistic is in its numerical value. If you're going to distinguish the fields, make them integers or longs where you can (i.e. wherever you have fixed precision), and singles or doubles where you have to.
However, rather than one table, I would look for related statistics, and try to make a moderate number of tables with a few fields each. For example, laser shots fired, laser shots hit, laser kills, are all related. Death by various means might be another table, and so on.
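A sketch of what one such group of related statistics could look like, again using SQLite via Python just for illustration (the laser_stats name and its columns are only an example of the grouping idea, not a prescribed schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE laser_stats (
    match_id    INTEGER NOT NULL,
    player_id   INTEGER NOT NULL,
    shots_fired INTEGER NOT NULL DEFAULT 0,
    shots_hit   INTEGER NOT NULL DEFAULT 0,
    kills       INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (match_id, player_id)
)
""")
conn.execute("INSERT INTO laser_stats VALUES (1, 7, 120, 43, 5)")

# With named integer columns, derived figures are plain SQL:
accuracy = conn.execute(
    "SELECT CAST(shots_hit AS REAL) / shots_fired FROM laser_stats "
    "WHERE match_id = 1 AND player_id = 7"
).fetchone()[0]
print(round(accuracy, 3))  # 0.358
```

The composite primary key (match_id, player_id) gives you exactly one row per player per match, which is what makes aggregation over matches straightforward.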
The serialized array should instead be stored as a link table, i.e. one with two columns, killer and victim, which together form the primary key.
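A minimal sketch of such a link table (SQLite via Python for illustration; note that the match_id and count columns here are my additions beyond the two-column core, so that repeated kills of the same victim within a match can be recorded):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE kills (
    match_id INTEGER NOT NULL,
    killer   INTEGER NOT NULL,
    victim   INTEGER NOT NULL,
    count    INTEGER NOT NULL DEFAULT 1,
    PRIMARY KEY (match_id, killer, victim)
)
""")

# Each kill event bumps the (killer, victim) pair for the match;
# update first, insert only if the pair doesn't exist yet.
for match_id, killer, victim in [(1, 7, 3), (1, 7, 3), (1, 3, 7)]:
    cur = conn.execute(
        "UPDATE kills SET count = count + 1 "
        "WHERE match_id = ? AND killer = ? AND victim = ?",
        (match_id, killer, victim),
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO kills (match_id, killer, victim) VALUES (?, ?, ?)",
            (match_id, killer, victim),
        )

player7_kills = conn.execute(
    "SELECT SUM(count) FROM kills WHERE match_id = 1 AND killer = 7"
).fetchone()[0]
print(player7_kills)  # 2
```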
Ideally, you should profile various designs under actual use conditions, use EXPLAIN on your queries, etc., in order to determine where to index and optimize. In MySQL, try the different storage engines. MyISAM is pretty efficient at insertions and table scans; you could use it as an input engine and process it at regular intervals to maintain an InnoDB table for querying purposes. This could mean submissions are not reflected immediately, but do you really need them to be?
For efficiency, you may need to duplicate some information, e.g. you may need to store totals in addition to individual events. If so, use transactions, and test properly under actual use conditions.
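The totals-plus-events idea can be sketched like this, once more with SQLite via Python for illustration (the table names and the record_kill helper are hypothetical); the point is that the event row and the running total are written in the same transaction, so a failure cannot leave them inconsistent:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE kill_events (
    match_id INTEGER NOT NULL,
    killer   INTEGER NOT NULL,
    victim   INTEGER NOT NULL
);
CREATE TABLE player_totals (
    player_id INTEGER PRIMARY KEY,
    kills     INTEGER NOT NULL DEFAULT 0
);
""")
conn.execute("INSERT INTO player_totals (player_id, kills) VALUES (7, 0)")

def record_kill(conn, match_id, killer, victim):
    # sqlite3's connection context manager commits on success
    # and rolls back on exception, so both writes succeed or neither does.
    with conn:
        conn.execute(
            "INSERT INTO kill_events (match_id, killer, victim) VALUES (?, ?, ?)",
            (match_id, killer, victim),
        )
        conn.execute(
            "UPDATE player_totals SET kills = kills + 1 WHERE player_id = ?",
            (killer,),
        )

record_kill(conn, 1, 7, 3)
record_kill(conn, 1, 7, 5)
total = conn.execute(
    "SELECT kills FROM player_totals WHERE player_id = 7"
).fetchone()[0]
print(total)  # 2
```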