I am currently in the process of (re)designing an application for a client, part of it is a survey system.
Fairly basic Question/Answer type details, but there are a number of issues I am running into.

1) While all the questions are multiple choice (this was a challenge to get them to agree to), the survey needs to allow for multiple responses to the same question (i.e tick all that apply).
2) The size of the member base (the people taking the survey) is rather large (about 800k and growing).
3) The questions are (somewhat) dynamic (i.e, an administrative user can add/change questions)

I am having trouble consolidating in an efficient manner the
difficulties arising from these few requirements.
My ideas so far have been:

The basic question/answer is fairly straight forward (but I am still open to suggestion):

Question
--------
id integer
text varchar

Answer
------
idea integer
question_id integer
text varchar

But I have a few different ideas on how to store the responses:

a) One Row per user:

user_id - integer (references user table)
question1 - integer (reference answer_id)
question2 - integer (reference answer_id)
...
etc
(one columns for each question)

The 2 problems I see with this design are:
- It means the admin user adding questions is changing the database schema
- Handling multiple answers per question isn't easy.

As a solution to the second part, I have considered making the id's of the answers binary numbers (or bits) (and _not_ unique across all question) and using a bitfield to store the response. I think some binary arithmetic could then be used to get the valid responses but, I don't know how efficient this would be for searcing the database, I don't think it could be indexed.

b) One Row Per User Per Question:

This is the (approx) current way the database is currently designed, and while this is the way I would have originally done it (as there is a many-many relationship), seeing the speed problems with the current system has made me think there might be a better way.

user_id - integer (references user table)
question_id - integer (references question_id) (probably redundant, if answer id is unique)
answer_id - integer (references answer_id)

The problem with this is primarily, as I mentioned, speed.

This part of the system is much more read intensive than write/update intensive, most of these records are only updated 3/4 times a year (per user, when the survey is re-issued) and is only inserted when a new member joins (about 200-300 a day).

But the queries used to read this data are significantly complex (they were taking 10-30 _mins_ on our old hardware with the current system, new hardware helped alot, 10-60 secs)
For example they might say, give me all members who answered 'Yes' to question1 and 'Three or more' to question 3 (but the criteria usually consists of 5-6 parameters)
Given this structure, each criteria requires a self-join on the response table (with some of the more complex queries, I have seen 300+ lines of SQL).


(Just some quick stats that might give you some idea about usage. Its currently running on MySQL)

Uptime: 50 days 2 hours 40 min 42 sec

> show status like 'key%requests';
Key_read_requests 64758388117
Key_write_requests 80976783

Members joined today: 263
There are 130 questions in the system averaging about 3 answers each.


So my question is (after all that), which of these two approaches do you think would be better in this system, or is there another method I am missing?