What I have:
3 surveys
Australia: 170 questions 125 completed surveys
Canada: 158 questions 150 completed surveys
USA: 183 questions 95 completed surveys
All answers are a rating 1 - 5 with the execption of 4 which includes (6)Other with a text field for the respondent to Specify
75% of the questions appear in all 3 surveys the other 25% are unique to the country.
I can simply lay it out as 3 "spread sheets" where there is 1 question for 1 field, however I have been told it would be better using a relational table layout.
I have many ideas of how to lay it out, but I think this is best that I can come up with, any opinions or better ideas would be greatly appreciated
Idea #1
CREATE TABLE respondents(r_RowID PRIMARY INTEGER KEY, id VARCHAR)
CREATE TABLE questions(q_RowID PRIMARY INTEGER KEY, question_number VARCHAR)
where the question number also identifies the survey i.e. 'au_001' or 'ca_001'
CREATE TABLE answers(a_RowID PRIMARY INTEGER KEY, answer INTEGER, other VARCHAR)
which would have 1 - 5 as data for the first 5 entries then 6, 6, 'Grass' - 7, 6, 'Weeds' and so on for all those that answered (6)Other and specified
CREATE TABLE relationship(r_RowID INTEGER, q_RowID INTEGER, a_RowID INTEGER)
Idea #1a (meld the questions, and split out the countries)
CREATE TABLE country(c_RowID PRIMARY INTEGER KEY, name VARCHAR)
and now the questions that are repeated in all 3 surveys can be added only once and the ones that are unique to a country can be added at the end
CREATE TABLE questions (q_RowID PRIMARY INTEGER KEY, question_number INTEGER)
then the relationship table would be
CREATE TABLE relationship(r_RowID INTEGER, Q_RowID INTEGER, a_RowID INTEGER, c_RowID INTEGER)
---------------------
I will need to do a lot of statistical analasys on all the data, individual countries, across countries and things like - find the average response for question #100 from people from Australia and USA who answered '4' to question #3 and '2' to question #57, but didn't answer '1' to question #65
Thank you for your input.