If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > database design newbie question

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 01-07-04, 08:35
mercinary mercinary is offline
Registered User
 
Join Date: Jan 2004
Posts: 7
database design newbie question

Disclaimer: I apologize if I have a lack of understanding, as I only have book-learned experience in databasing. With that being said....

I am on a team to re-design a database. The database contains information about a test plan (testing hardware and software). There is one feature of the current database that I find rather undesirable as far as database design is concerned, and I want to change it. Problem is, I don't know the best way to do so.

There is currently a table (subtest) which has standard fields such as the subtest description, and subtest id, but it also has fields for the project that the subtest is applicable to. This ends up looking something like so:

id ------- desc ----- etc ----- program1 ----- program2 -----program3 ............ programX
001 ----- blah -------- blah -------- yes ------- no ------- yes .......................



These fields right now keep getting added on with each new incoming program. Eventually, I see this getting out of hand. Additionally, as it is now, whenever a new program field is introduced, someone must manually go in and check off which subtests are "applicable" to that new program.

I assume it would make more sense (database design wise) to take those program fields out, consider the "program" attribute of the "subtest" entity to be a multivariable attribute, and make a single table containing subtest id, and the applicable programs....like so:

id program_name
001 program1
001 program3



But with around 5000 subtests, this seems like the above table could get very long, and very difficult to read.


Does anyone have any thoughts about how this might be best implemented? Any help would be GREATLY appreciated!
Reply With Quote
  #2 (permalink)  
Old 01-07-04, 09:17
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
your design change proposal is correct

better 5000 rows than 5000 columns

the sql is a lot simpler, too
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 01-07-04, 09:34
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
The 5000 columns option violates 1st normal form. As rudy says, better 5000 rows. It's readable with SQL.
__________________
visit: relationary

Last edited by certus; 01-07-04 at 09:43.
Reply With Quote
  #4 (permalink)  
Old 01-07-04, 09:41
mercinary mercinary is offline
Registered User
 
Join Date: Jan 2004
Posts: 7
confusion

I may have confused you. There would not be 5000 columns. There would be 5000 subtests i.e:

subtest_id -----
001
002
003
.
.
.
.
.
5000

Rather, I was saying that there ends up being these "applicable" fields in the table, which grow more and more (right now there is around 9, but that will only grow larger and larger). These fields are shown in my first post by "program1" "program2" etc.

The problem I see with my solution is that the multivariable attribute table "Programs" would get insanely long. For example, if half of the subtests were applicable to all 9 programs, you would have something like so:

subtest_id ----- program_name
001 Program1
001 Program2
001 Program3
001 Program4
001 Program5
001 Program6
001 Program7
001 Program8
001 Program9
002 Program1
002 Program2
002 Program3
002 Program4
002 Program5
002 Program6
002 Program7
002 Program8
002 Program9

And on and on and on. Thus, this table would get very long.

Is this more clear as to what my problem is?

Thanks for the quick replys by the way!!! I hope to hear back soon.

-Joshua Dion
Reply With Quote
  #5 (permalink)  
Old 01-07-04, 09:49
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
it's still a better design

for example, write the sql to count how many each subtest has

if you have denormalized the programs "sideways" on the row, then you will have to do something like this:

select subtest_id
, case when program1='yes' then 1 else 0 end
+ case when program2='yes' then 1 else 0 end
+ case when program3='yes' then 1 else 0 end
+ ...
from ...
group by subtest_id

whereas with a normalized design, it's

select subtest_id
, count(*)
from ...
where program='yes'
group by subtest_id

furthermore, that will not need maintenance, whereas the "sideways" queries will all need to be modified when you add another program
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #6 (permalink)  
Old 01-07-04, 09:54
andrewst andrewst is offline
Moderator.
 
Join Date: Sep 2002
Location: UK
Posts: 5,171
Re: confusion

So you are looking at maybe 5000 subtests * 10 projects, i.e. 50,000 rows.

That is still SMALL for a database (unless this is MS Access?). Tables with 100,000,000s of rows are big(gish).

Don't worry about it!
__________________
Tony Andrews
http://tinyurl.com/tonyandrews
Reply With Quote
  #7 (permalink)  
Old 01-07-04, 10:05
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Okay, I see where you're going.

You need three entities:

Program(ProgramID, ProgramName)

Subtest(SubtestID, SubtestName)

ProgramSubtest(ProgramID, SubtestID, IsPassed)

ProgramID and SubtestID are surrogate keys.
SubtestName is the name I gave for your numeric identifer for subtest
IsPassed is the name I gave for your yes/no values.

You can't escape having 5000 rows somewhere, but you can normalize and have one unique list of Programs, one unique list of Subtests and match Program to Subtest in ProgramSubtest. Your SQL will be much cleaner this way and after populating it the first time with existing data all later additions, modifications and deletions will be much smoother.
__________________
visit: relationary
Reply With Quote
  #8 (permalink)  
Old 01-07-04, 10:09
mercinary mercinary is offline
Registered User
 
Join Date: Jan 2004
Posts: 7
ok

First off, thanks to everyone for the quick replys!

It sounds like from r937's that I am right about the current design being undesirable, and my suggested change might be better conceptually as well as far as querys are concerned.

I'm going to add in a twist here that I didn't add in before (for simplicity sake).

Currently, two of the "applicable" fields refer to programs. The other 7 refer to sections. Now, the other fundemental problem I am seeing is that the sections listed in these columns are ALSO table values in the table "sections". To show you an example:

TABLE SECTIONS:
section_id ----- section_name
001 ------------ sec1
002 ------------ sec2
003 ------------ sec3

TABLE SUBTESTS
subtest_id ----- subtest desc ----- etc ----- program1 ----- program2 ----- sec1 ----- sec2 ----- sec3
001 ------------- blah -------------- blah ---- yes ------------ no ------------- no -------- yes ------ yes


This cannot be "correct" design either, can it? I see no point in having a sections table when the sections are used as attribute fields like this. Am I missing something?

Thanks once again!

-JD
Reply With Quote
  #9 (permalink)  
Old 01-07-04, 10:11
mercinary mercinary is offline
Registered User
 
Join Date: Jan 2004
Posts: 7
Talking additionally....

I don't know if it makes a difference, but "yes/no" only refer to if the subtest is applicable to a given section or a given program (not if the test has passed/failed).

details....

-JD

Last edited by mercinary; 01-07-04 at 10:13.
Reply With Quote
  #10 (permalink)  
Old 01-07-04, 10:23
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Program(ProgramID, ProgramName)

ProgramSection(ProgramSectionID, ProgramID, ProgramSectionName)

Subtest(SubtestID, SubtestName)

ProgramSectionSubtest(ProgramSectionID, SubtestID, IsApplicable)

In this case all Programs will have at least one section.
__________________
visit: relationary
Reply With Quote
  #11 (permalink)  
Old 01-07-04, 10:43
mercinary mercinary is offline
Registered User
 
Join Date: Jan 2004
Posts: 7
Thanks...

Cetrus,

Thanks for the info. That might work, although it is a bit off from what I had in mind.

Does anyone else have any thoughts?

My last little gotcha to add into this mess is that we would like to add yet another applicability entity that handles platforms. To make things a little clearer, hear is a look at how the hierarchy looks:

*Multiple Programs (program1, program2) These are added quarterly.

*Within each program, there are mutiple sections of testing (section1, section2, section3)

*Below sections are the Tests (I haven't discussed the tests table because I feel that I have that part worked out.....basically ignore this for now).

*Below the test table, we have subtests (the heart of everything). When a new program starts getting tested, currently a new field is added to the subtest table for that program. As previously mentioned, there are also fields for sections, but these sections rarely (if ever) change, so we can pretty much forget about worrying about new sections being added.

*A subtest can be tested against any number of platforms that are introduced in a program. We don't have anything implemented for this yet. I think this should somehow be similar to what we discussed above as far as applicability with programs. (such as another table containing subtest_id & platform_name)


I hope I'm not losing everyone along the way. I was trying to clarify, but I probably made things more confusing then ever.

I'll say it again, I REALLY appreciate all the help I am getting, as I want to design this database well. Help from the experts is what I need most!

-JD

Last edited by mercinary; 01-07-04 at 11:05.
Reply With Quote
  #12 (permalink)  
Old 01-07-04, 11:29
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
Explain platforms further.
__________________
visit: relationary
Reply With Quote
  #13 (permalink)  
Old 01-07-04, 11:33
mercinary mercinary is offline
Registered User
 
Join Date: Jan 2004
Posts: 7
platforms

Platforms are various model numbers that software is designed to run on. Each project has new model numbers (new hardware) and new software that runs on each of those. The software and hardware are slightly different accross platforms. Because of this, a subtest might be applicable to one platform but not to another.

Follow?

-JD
Reply With Quote
  #14 (permalink)  
Old 01-07-04, 12:27
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
This is a reach.

Program(ProgramID, ProgramName)

ProgramSection(ProgramSectionID, ProgramID, ProgramSectionName)

Subtest(SubtestID, SubtestName)

Platform(PlatformID, PlatformName)

ProgramSectionSubtest(ProgramSectionID, SubtestID, PlatformID, IsApplicable)

I used PlatformName to mean Model Number
__________________
visit: relationary
Reply With Quote
  #15 (permalink)  
Old 01-07-04, 12:41
certus certus is offline
Registered User
 
Join Date: Dec 2003
Location: Canada
Posts: 710
This is beginning to become larger that a forum of this nature provides. Each time I give a solution I am fed new entities and new rules. I don't appreciate it.

mer·ce·nar·y ( P ) Pronunciation Key (mūrs-nr)
adj.
Motivated solely by a desire for monetary or material gain.

That's not what motivates the people in this forum. Earn your own bread.
__________________
visit: relationary

Last edited by certus; 01-07-04 at 12:43.
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On