I'm designing a database for my new project, a monitoring system. For the last two weeks I have been trying to figure out the best way to set up the database. I have designed a couple of small databases before, so I know the basics, but I have no experience with large databases.
Here is how the monitoring system will work: a client application on each server runs checks every 5 minutes and sends the data over HTTP to a PHP site using POST. The website then reads the data and inserts/updates the database accordingly. The number of checks sent in one HTTP request can vary between 30 and 100 (possibly more) per server. At the start there will be about 200 servers running these checks, so roughly 10,000 checks will arrive over the course of each 5-minute window. Soon after the program has been designed, the service will be sold to other companies, so the number of servers sending checks could grow to over 1,000, and possibly even over 10,000 (if it does get that high, I'm thinking of splitting it into three databases, each handling a range of servers, e.g. DB0 for company/server information, DB1 for checks for ServerID 1-2000, DB2 for checks for ServerID 2001-4000, etc.).
Also, logging for checks is to be available if the client is willing to pay for it, so this could range from no logging up to a month's worth of logging.
The database I'm using is MySQL, and so far this is what I have come up with:
Table - CheckTypes
ID - tinyint(4) - PK
Name - varchar(50)
Description - text
Type - enum('247','Daily')
Table - Checks
ID - int(11) - PK
CheckNum - tinyint(4) // Used to distinguish between the different checks of the same checktype
ServerID - mediumint(9) // The ID of the server the check was run on
CheckTypeID - tinyint(4) // This is the ID of the type of check
Num - smallint(6) //This is used for logging purposes
Date - datetime // The date and time that the check was run
Pass - smallint(1) // A boolean value indicating whether the check passed or not
IP - varchar(16) // the IP of the server that it was received from
Table - CheckInfo
CheckID - int(11) - PK // ID of the check to associate this info with
InfoValue - tinyint(2) - PK //ID of the check info description
Value - text // The actual value returned by the check
Table - CheckInfoDescriptions
CheckType - tinyint(4) - PK // ID of the check type to create the description for
InfoValue - tinyint(2) - PK // ID of the check info description
Description - varchar(100) // Description text
My main problem is regarding how the checks and the check info are recorded.
The reason the CheckInfo table is there is that a check can have a number of values. For example, the ping check has two values, Destination and Response. The drive space change check has Total space, Free space and Change (the change since the last time it was run).
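For concreteness, here is a sketch of how a single ping check might land in these tables. All of the IDs are made up for illustration (CheckTypeID 1 = Ping, InfoValue 1 = Destination, InfoValue 2 = Response):

```sql
-- Describe the two info values a ping check produces.
INSERT INTO CheckInfoDescriptions (CheckType, InfoValue, Description)
VALUES (1, 1, 'Destination'),
       (1, 2, 'Response');

-- One ping check...
INSERT INTO Checks (ID, CheckNum, ServerID, CheckTypeID, Num, Date, Pass, IP)
VALUES (1, 1, 1, 1, 0, '2008-12-25 12:56:00', 1, '10.10.10.4');

-- ...and its two CheckInfo rows, keyed by (CheckID, InfoValue).
INSERT INTO CheckInfo (CheckID, InfoValue, Value)
VALUES (1, 1, '192.168.1.2'),
       (1, 2, 'Reply from 192.168.1.2 in 0ms');
```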
Is the way I designed the tables correct? I have the following design ideas in mind:
Option 1 - Keep the current database structure and run a query along the lines of
WHERE CheckNum = [value]
ORDER BY Date DESC
then use the ID returned to update the Checks and CheckInfo tables.
I have a feeling this would be slow with the number of checks being made every 5 minutes.
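Spelled out, the Option 1 lookup might look something like the following. The SELECT list, LIMIT and index are my assumptions about what the full query would need:

```sql
-- Find the most recent check of this kind for a given server,
-- so its row and CheckInfo rows can be updated in place.
SELECT ID
FROM Checks
WHERE ServerID = 1
  AND CheckTypeID = 1
  AND CheckNum = 1
ORDER BY Date DESC
LIMIT 1;

-- Without a matching index this query scans the table; a composite
-- index like this one would keep the lookup cheap as the table grows:
CREATE INDEX idx_server_type_num_date
    ON Checks (ServerID, CheckTypeID, CheckNum, Date);
```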
Option 2 - Have a separate table for each check type, e.g. Table - Ping, Table - DriveSpaceChange, etc. As far as I understand, creating lots of near-identical tables like this should be avoided?
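Under Option 2, each check type would get its own table with its own value columns. A sketch, with illustrative column names and types:

```sql
-- One table per check type; the value columns differ per table.
CREATE TABLE PingChecks (
    ID          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ServerID    MEDIUMINT UNSIGNED NOT NULL,
    Date        DATETIME NOT NULL,
    Pass        TINYINT(1) NOT NULL,
    Destination VARCHAR(15),
    Response    VARCHAR(100),
    PRIMARY KEY (ID)
);
-- ...and similarly DriveSpaceChangeChecks with TotalSpace,
-- FreeSpace and SpaceChange columns, and so on per check type.
```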
Option 3 - Have one table for checks, with all the information about each check stored in a single string of the format InfoValue=Value&InfoValue2=Value2, and then do all the parsing work on the web server. Would this take a significant amount of load off the database server? Or is this not a suggested way of doing things? I have never seen anyone do something like this, possibly because it is not recommended? An example would be the following row in the Checks table:
ID=1, CheckNum=1, ServerID=1, CheckTypeID=1, Num=0, Date="2008-12-25 12:56:00", Pass=1, IP="10.10.10.4", Info="Destination=192.168.1.2&Response=Reply From 192.168.1.2 in 0ms"
As you have probably noticed, I'm not experienced at all in database design, but ANY help will be appreciated.
EDIT: Corrected a couple spelling mistakes, and added option numbers for easier referrals
At the start there will be about 200 servers running these checks, so roughly 10,000 checks will arrive over the course of each 5-minute window. Soon after the program has been designed, the service will be sold to other companies, so the number of servers sending checks could grow to over 1,000, and possibly even over 10,000
So what is it? 2,000 per minute? 2 million? You've got to decide what you're designing for.
Is the way I designed the tables correct?
You're using MySQL. So you pretty much threw correctness out the window.
Now, will it be fast enough to handle the number of transactions you're trying to throw at it?
I don't know. I'd say write yourself a test script for each of the different formats. See the max load you can get with each. Whichever one is fastest, go with that. Also look at how big your tables get. Hang on to that test suite, because someone (maybe even a client) is going to come along and say "I read on a forum that the squirrel-nut-tree design is waaay faster." They can't argue with empirical data.
My intuition is that with MySQL and what you're trying to do, I'd jam it all in one big table; just put all the info types in nullable columns. That's just the way that particular package works. Other DBMS's do a better job at joining and managing multiple tables. Your schema is an EAV model and that model is probably the worst way to do it. (No offense, everyone tries an EAV at some point.)
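A sketch of that one-big-table suggestion, with the per-check-type fields folded in as nullable columns. The column names and types here are illustrative, not part of the original design:

```sql
CREATE TABLE Checks (
    ID          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    ServerID    MEDIUMINT UNSIGNED NOT NULL,
    CheckTypeID TINYINT UNSIGNED NOT NULL,
    CheckNum    TINYINT UNSIGNED NOT NULL,
    Date        DATETIME NOT NULL,
    Pass        TINYINT(1) NOT NULL,
    IP          VARCHAR(15),
    -- Ping checks fill these, everything else leaves them NULL:
    Destination VARCHAR(15)  NULL,
    Response    VARCHAR(100) NULL,
    -- Drive space change checks fill these instead:
    TotalSpace  BIGINT NULL,
    FreeSpace   BIGINT NULL,
    SpaceChange BIGINT NULL,
    PRIMARY KEY (ID)
);
```

Every check type becomes one INSERT into one table, with no joins on read; the cost is a row full of NULLs and a schema change whenever a new check type adds a column.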