Page 1 of 2 12 LastLast
Results 1 to 15 of 19
  1. #1
    Join Date
    Jan 2004
    Posts
    7

    database design newbie question

    Disclaimer: I apologize if I have a lack of understanding, as I only have book-learned experience in databasing. With that being said....

    I am on a team to re-design a database. The database contains information about a test plan (testing hardware and software). There is one feature of the current database that I find rather undesirable as far as database design is concerned, and I want to change it. Problem is, I don't know the best way to do so.

    There is currently a table (subtest) which has standard fields such as the subtest description, and subtest id, but it also has fields for the project that the subtest is applicable to. This ends up looking something like so:

    id ------- desc ----- etc ----- program1 ----- program2 -----program3 ............ programX
    001 ----- blah -------- blah -------- yes ------- no ------- yes .......................



    These fields right now keep getting added on with each new incoming program. Eventually, I see this getting out of hand. Additionally, as it is now, whenever a new program field is introduced, someone must manually go in and check off which subtests are "applicable" to that new program.

    I assume it would make more sense (database design wise) to take those program fields out, consider the "program" attribute of the "subtest" entity to be a multivariable attribute, and make a single table containing subtest id, and the applicable programs....like so:

    id program_name
    001 program1
    001 program3



    But with around 5000 subtests, this seems like the above table could get very long, and very difficult to read.


    Does anyone have any thoughts about how this might be best implemented? Any help would be GREATLY appreciated!

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    your design change proposal is correct

    better 5000 rows than 5000 columns

    the sql is a lot simpler, too
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    The 5000 columns option violates 1st normal form. As rudy says, better 5000 rows. It's readable with SQL.
    Last edited by certus; 01-07-04 at 10:43.

  4. #4
    Join Date
    Jan 2004
    Posts
    7

    confusion

    I may have confused you. There would not be 5000 columns. There would be 5000 subtests i.e:

    subtest_id -----
    001
    002
    003
    .
    .
    .
    .
    .
    5000

    Rather, I was saying that there ends up being these "applicable" fields in the table, which grow more and more (right now there is around 9, but that will only grow larger and larger). These fields are shown in my first post by "program1" "program2" etc.

    The problem I see with my solution is that the multivariable attribute table "Programs" would get insanely long. For example, if half of the subtests were applicable to all 9 programs, you would have something like so:

    subtest_id ----- program_name
    001 Program1
    001 Program2
    001 Program3
    001 Program4
    001 Program5
    001 Program6
    001 Program7
    001 Program8
    001 Program9
    002 Program1
    002 Program2
    002 Program3
    002 Program4
    002 Program5
    002 Program6
    002 Program7
    002 Program8
    002 Program9

    And on and on and on. Thus, this table would get very long.

    Is this more clear as to what my problem is?

    Thanks for the quick replys by the way!!! I hope to hear back soon.

    -Joshua Dion

  5. #5
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    it's still a better design

    for example, write the sql to count how many each subtest has

    if you have denormalized the programs "sideways" on the row, then you will have to do something like this:

    select subtest_id
    , case when program1='yes' then 1 else 0 end
    + case when program2='yes' then 1 else 0 end
    + case when program3='yes' then 1 else 0 end
    + ...
    from ...
    group by subtest_id

    whereas with a normalized design, it's

    select subtest_id
    , count(*)
    from ...
    where program='yes'
    group by subtest_id

    furthermore, that will not need maintenance, whereas the "sideways" queries will all need to be modified when you add another program
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  6. #6
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171

    Re: confusion

    So you are looking at maybe 5000 subtests * 10 projects, i.e. 50,000 rows.

    That is still SMALL for a database (unless this is MS Access?). Tables with 100,000,000s of rows are big(gish).

    Don't worry about it!

  7. #7
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Okay, I see where you're going.

    You need three entities:

    Program(ProgramID, ProgramName)

    Subtest(SubtestID, SubtestName)

    ProgramSubtest(ProgramID, SubtestID, IsPassed)

    ProgramID and SubtestID are surrogate keys.
    SubtestName is the name I gave for your numeric identifer for subtest
    IsPassed is the name I gave for your yes/no values.

    You can't escape having 5000 rows somewhere, but you can normalize and have one unique list of Programs, one unique list of Subtests and match Program to Subtest in ProgramSubtest. Your SQL will be much cleaner this way and after populating it the first time with existing data all later additions, modifications and deletions will be much smoother.

  8. #8
    Join Date
    Jan 2004
    Posts
    7

    ok

    First off, thanks to everyone for the quick replys!

    It sounds like from r937's that I am right about the current design being undesirable, and my suggested change might be better conceptually as well as far as querys are concerned.

    I'm going to add in a twist here that I didn't add in before (for simplicity sake).

    Currently, two of the "applicable" fields refer to programs. The other 7 refer to sections. Now, the other fundemental problem I am seeing is that the sections listed in these columns are ALSO table values in the table "sections". To show you an example:

    TABLE SECTIONS:
    section_id ----- section_name
    001 ------------ sec1
    002 ------------ sec2
    003 ------------ sec3

    TABLE SUBTESTS
    subtest_id ----- subtest desc ----- etc ----- program1 ----- program2 ----- sec1 ----- sec2 ----- sec3
    001 ------------- blah -------------- blah ---- yes ------------ no ------------- no -------- yes ------ yes


    This cannot be "correct" design either, can it? I see no point in having a sections table when the sections are used as attribute fields like this. Am I missing something?

    Thanks once again!

    -JD

  9. #9
    Join Date
    Jan 2004
    Posts
    7

    Talking additionally....

    I don't know if it makes a difference, but "yes/no" only refer to if the subtest is applicable to a given section or a given program (not if the test has passed/failed).

    details....

    -JD
    Last edited by mercinary; 01-07-04 at 11:13.

  10. #10
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Program(ProgramID, ProgramName)

    ProgramSection(ProgramSectionID, ProgramID, ProgramSectionName)

    Subtest(SubtestID, SubtestName)

    ProgramSectionSubtest(ProgramSectionID, SubtestID, IsApplicable)

    In this case all Programs will have at least one section.

  11. #11
    Join Date
    Jan 2004
    Posts
    7

    Thanks...

    Cetrus,

    Thanks for the info. That might work, although it is a bit off from what I had in mind.

    Does anyone else have any thoughts?

    My last little gotcha to add into this mess is that we would like to add yet another applicability entity that handles platforms. To make things a little clearer, hear is a look at how the hierarchy looks:

    *Multiple Programs (program1, program2) These are added quarterly.

    *Within each program, there are mutiple sections of testing (section1, section2, section3)

    *Below sections are the Tests (I haven't discussed the tests table because I feel that I have that part worked out.....basically ignore this for now).

    *Below the test table, we have subtests (the heart of everything). When a new program starts getting tested, currently a new field is added to the subtest table for that program. As previously mentioned, there are also fields for sections, but these sections rarely (if ever) change, so we can pretty much forget about worrying about new sections being added.

    *A subtest can be tested against any number of platforms that are introduced in a program. We don't have anything implemented for this yet. I think this should somehow be similar to what we discussed above as far as applicability with programs. (such as another table containing subtest_id & platform_name)


    I hope I'm not losing everyone along the way. I was trying to clarify, but I probably made things more confusing then ever.

    I'll say it again, I REALLY appreciate all the help I am getting, as I want to design this database well. Help from the experts is what I need most!

    -JD
    Last edited by mercinary; 01-07-04 at 12:05.

  12. #12
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Explain platforms further.

  13. #13
    Join Date
    Jan 2004
    Posts
    7

    platforms

    Platforms are various model numbers that software is designed to run on. Each project has new model numbers (new hardware) and new software that runs on each of those. The software and hardware are slightly different accross platforms. Because of this, a subtest might be applicable to one platform but not to another.

    Follow?

    -JD

  14. #14
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    This is a reach.

    Program(ProgramID, ProgramName)

    ProgramSection(ProgramSectionID, ProgramID, ProgramSectionName)

    Subtest(SubtestID, SubtestName)

    Platform(PlatformID, PlatformName)

    ProgramSectionSubtest(ProgramSectionID, SubtestID, PlatformID, IsApplicable)

    I used PlatformName to mean Model Number

  15. #15
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    This is beginning to become larger that a forum of this nature provides. Each time I give a solution I am fed new entities and new rules. I don't appreciate it.

    mer·ce·nar·y ( P ) Pronunciation Key (műrs-nr)
    adj.
    Motivated solely by a desire for monetary or material gain.

    That's not what motivates the people in this forum. Earn your own bread.
    Last edited by certus; 01-07-04 at 13:43.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •