I hope I can get a bit of advice regarding database design. I'm a complete amateur at this, but being the only one with some skills, it's fallen to me to do the implementation. Hopefully I can explain my question in an organized fashion:
Information to track
There are five (though there will soon be more) surveys. Each survey has some slight variation. In short, the survey questions can be divided into three categories:
Biographical information (name, address, etc.)
Survey questions (e.g. "do you like filling out surveys?", and "do you own any pets?")
Item checkboxes (e.g. "Which of the following pets do you own Cat__ Dog__ Llama __ Platypus __)
The biographical information fields will never change and are identical across surveys.
The survey questions may change in the future, but right now half are all identical across all surveys and half vary between surveys, though there is some overlap.
The item checkboxes will definitely change over time. There is some overlap between surveys (i.e. "cat" appears on all surveys at the moment, but some have "cat" and "dog" while others have "cat" and "llama").
How it's set up now
There is a main table to collect the common biographical information, and a sub-table for each individual survey.
tbl_main has all biographical fields and all questions that are identical on all surveys
tbl_survey1 ... 2 ... 3 ... N have fields for those questions that vary and for the checkboxes of the respective survey. There is also a foreign key that relates the record to tbl_main.
This seems somehow inelegant, but I'm not quite proficient enough to say why. It feels wrong to have several fields repeating across sub-tables. What if I want a query that, for example, pulls all people in California who don't like surveys and have cats or llamas? I think the problems with that are somewhat self evident: I'd need to know/remember the fields of all tables in order to know which to include in the query.
But I'm not sure what the best way to redesign the structure would be (hence, why I'm posting here).
I'm leaning towards just two tables: One table with the unchanging, permanent biographical information in it. Then a second table with a field for each question, regardless of whether the question repeats across surveys. However... wouldn't doing it this way mean that over time, the second table will grow and grow and grow in the number of fields?
Or should each question get its own table? That is, a tbl_cats with the only field being the primary key of tbl_main to relate it to a responder. Each one-field table would get populated by a responder-ID. However... wouldn't that mean I'll eventually end up with hundreds of tables? Some complicated queries might be a bit, well, extra complicated, no?
So... are any of the above methods the "proper" way to design this structure? Is there another, better way I should go about this?
Oh, if it makes a difference, this database is built with PHP and MYSQL, interacting through the web.
Thanks for any help you may have—even if it's just an encouraging word!
I think you'll probably also need to look at Survey identification. A table which holds data about surveys at the survey level. An ID. A Description. Author(s). Date. Etc.
Then, you can define a 1-to-many relationship between surveys and biographical record.
a many-to-many relationship between surveys and questions
and, probably a one-to-many relationship between biographical record and responses
Ref the database tutorials here for some general info on RI and normalization.
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert
As far as storing survey identification information, I certainly don't want to seem obtuse but I can't recognize how such a table would fit into eventual queries.
At the moment, there is a "source" field in the main table, and it is part of the immutable biographical information. The code for source is pregnant with information—that is, each letter and digit that makes up the code has meaning. If we ever need to look at the information by source (or sources), that field will be the beginning of a WHERE clause.