If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Database Server Software > MySQL > db design for many column

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-03-03, 09:51
punchard_don punchard_don is offline
Registered User
 
Join Date: Nov 2003
Posts: 2
db design for many column

Greetings all. I've just joined this forum, and here I am already asking questions.

I have a row-column text-file to convert to a web-searchable database. It looks like this:

user-1 user-record-1 col1 col2 col3... col50
user-1 user-record-2 col1 col2 col3... col50
...
user-1 user-record-500 col1 col2 col3... col50
user-2 user-record-1 col1 col2 col3... col50
user-2 user-record-2 col1 col2 col3... col50
...
user-2 user-record-500 col1 col2 col3... col50

There are 20,000 users. Each user has 500 user-records of 50-columns. Each column may contain a single digit variable from 1 to 9, or be empty.

As-is, that makes:

- 500x50x10=25,000 possible unique user-record/column/variable entries per user. Another table lists a unique 5-digit code for each of these possible combinations, but is not necessarily used. The number of filled-in columns generally ranges from 5,000 to 15,000.

- 20,000x500=10-million-records x 50-column table of single-digit or blank entries.

90% of queries will be of the form

SELECT COUNT(*) WHERE user-record = 'y' AND column-a = 'b'

where a, x, b are specified by the user through an html form.

My concern is optimized database architecture. I see four possibilities

1. An as-is copy.

2. Putting the 500-user-records x 50-columns per user into one single record per user of 2,500 columns, each of which may contain a 1-9 digit or be blank. Result is a 20,000 x 2,500 table containing 1-9.

3. Using the 25,000 unique combination identifier codes to create one column per possible combination and just enter "1" in corresponding columns. Result is a 20,000 x 25,000 table containing blank or "1".

4. A single "column codes" column per record, filled with a list of the 5-digit codes of all columns that were filled-in. Result is a 20,000 x 2 (primary key + code list) table containing lists of up to ~15,000 5-digit codes.

Not asking people to give me the answer. Would just like a quantitative method to determine the optimal choice, or at least an idea of the critical factors. Thanks, all.
Reply With Quote
  #2 (permalink)  
Old 11-03-03, 11:28
bstjean bstjean is offline
Registered User
 
Join Date: Sep 2002
Location: Montreal, Canada
Posts: 219
Re: db design for many column

Quote:
Originally posted by punchard_don
My concern is optimized database architecture. I see four possibilities

1. An as-is copy.

2. Putting the 500-user-records x 50-columns per user into one single record per user of 2,500 columns, each of which may contain a 1-9 digit or be blank. Result is a 20,000 x 2,500 table containing 1-9.

3. Using the 25,000 unique combination identifier codes to create one column per possible combination and just enter "1" in corresponding columns. Result is a 20,000 x 25,000 table containing blank or "1".

4. A single "column codes" column per record, filled with a list of the 5-digit codes of all columns that were filled-in. Result is a 20,000 x 2 (primary key + code list) table containing lists of up to ~15,000 5-digit codes.

Not asking people to give me the answer. Would just like a quantitative method to determine the optimal choice, or at least an idea of the critical factors. Thanks, all.
Perhaps if you would explain what this database is trying to represent we could help you more. I really doubt this flat table (with multiple records for the same entity) is optimal at all. Besides, a table with 50 columns is often a flag for poor normalization.

At first glance, you could have a separate table for those 50 columns with a FK pointing to the userid. But as I said, if you could explain what you're trying to do and send us the DESCRIBE for the tables we'd be able understand your problem better and provide more help
Reply With Quote
  #3 (permalink)  
Old 11-03-03, 13:54
punchard_don punchard_don is offline
Registered User
 
Join Date: Nov 2003
Posts: 2
I'm afraid there is no DESCRIBE statement yet because no tables exist yet - just the original flat-file.

How else to describe it.....

Imagine this: The database holds students' school exam results. By the end of their scholastic career, each person has taken 500 exams. Every exam had 50 multiple-choice questions numbered 1 to 9. We have records for 20,000 students of the form:

student1, exam1, answer1, answer2,..., answer50
student1, exam2, answer1, answer2,..., answer50
...
student1, exam500, answer1, answer2,..., answer50

student2, exam1, answer1, answer2,..., answer50
student2, exam2, answer1, answer2,..., answer50
...
student2, exam500, answer1, answer2,..., answer50

Now we want to compile statistics on who answered what. What percentage of total students who got question x in exam y right, also got question b of exam c right? In other words

SELECT COUNT(*) WHERE response-to-queston-x-on-exam-z=y AND response-to-queston-b-on-exam-d=c

and we want to do this over the web letting researchers specify the questions and answers they query on.

We could:

1. Design the database exactly as the flat-file and take the intersection of

SELECT COUNT(*) WHERE (exam=z AND answeri=y)
with
SELECT COUNT(*) WHERE (exam=d AND answerj=c)

2. Append all the records for a given student to make one 2,500 column record per student and search

SELECT COUNT(*) WHERE exami_answerj=y AND examb_answerc=d

3. Give each of the possible 25,000 answers to each multiple-choice question a unique 5-digit identifier and its own column, then just signify if that answer was given by a "1" in its column, and search

SELECT COUNT(*) WHERE ansxxxxx AND ansyyyyy

4. Your suggestion, make a separate table for each possible answer with studentID as the key, then take interstection of searches.

5. (4) makes me think of doig the same but only with question numbers, leaving the multiple-choice answer in the 2,500 tables and searching for the desired answer

But as I said, I'm hoping for some method or analysis by which to make/justify the choice. "I think XXX would be best" won't impress my boss much. Is there a way to choose - preferably quantitatively? Or does one just have to build, benchmark, and repeat?

Thanks again.
Reply With Quote
  #4 (permalink)  
Old 11-03-03, 17:44
aus aus is offline
Registered User
 
Join Date: Oct 2003
Location: Denver, Colorado
Posts: 137
I think that an answer table that looks like this would be good:

CREATE TABLE answers (
userid INT NOT NULL,
examnum INT NOT NULL,
answernum INT NOT NULL,
value CHAR(1) NOT NULL
);
ALTER TABLE answers ADD INDEX useridx(userid);
ALTER TABLE answers ADD INDEX examidx(examnum);
ALTER TABLE answers ADD INDEX ansidx(answernum);
ALTER TABLE answers ADD INDEX validx(value);

This way your SQL statemets would be simple and you would avoid the excessive number of columns. This will give you a normalized database and better performance. Your queries will look like this:

SELECT COUNT(*) FROM answers
WHERE examnum = 1 AND answernum = 2 AND value = 'b';

Having an index on each of the columns will allow very fast lookups. It would be unreasonable to create indices on each of the answer columns in the many-column table.

Getting your data into that structure from the flat-file will take a little bit of work. I would import it into the database into a table that looks exactly like the flat-file, then you can issue a series of fifty queries that look like this:

INSERT INTO answers
SELECT userid, examnum, %i, answer%i FROM tmpanswers;

Quote:
Originally posted by punchard_don
I'm afraid there is no DESCRIBE statement yet because no tables exist yet - just the original flat-file.

How else to describe it.....
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On