Page 1 of 2 12 LastLast
Results 1 to 15 of 27
  1. #1
    Join Date
    May 2009
    Posts
    10

    Question To first Normal Form or Not?

    Hi all!

    Here's a question for those wiser in the database design than I...is there ever justification in bending the rules with regards to First Normal Form?
    Here's the story that led me to wondering about this...

    THE SITUATION
    I'm designing PHP/MySQL application to help those who look after people with cancer.
    The aim is to collect information about what they needed care for. The hope is to build up information over time to inform decisions to improve their care.

    FORM ENTRY
    I used one of those "code generators" to build the a PHP form to enter the information required via listboxes & checkboxes. This is the (truncated) database it spat out:

    Table:admission_details
    --------------------------------------
    #idadmission_details (primary key)
    +patient_dob (date)
    +idconsultant (foreign key to the consultant table)
    +date_admission (date)
    +date_discharge (date)
    +site_appendix (TINYINT)
    +site_caecum (TINYINT)
    +site_ascending_colon (TINYINT)
    +site_hepatic_flexure (TINYINT)
    ....etc...

    THE IDEA
    So you can see... aside from having a primary key field inthe 1st field, fields 3-4 collects some other basic data.
    From Field 6 onwards, we're collecting information on "Where the cancer was" (via the site_* fields). Specifically, at the "PHP front end", these fields are checkboxes. If there was a cancer in say the appendix, it would be stored as "1" in the database. If not, the field will be NULL.
    You therefore can have multiple fields with "1" if you have multiple sites with cancer. If you have no sites at all with cancer, all site_* fields will have NULL in them.

    THE PROBLEM
    The problem I can see in the database design is the site_* recurring fields - this is in voilation of The First Normal Form right? Thing is, if I try to separate this out into a separate "Site" table, it breaks the nice single entry form you get with the schema described above.

    THE SOLUTION?
    There's only so many bits to the human body. So whilst the site_* fields do recur, there won't be any more (or not much) additional site_* columns required in the database design. That being the case, whilst it isn't strictly in the first normal form, would this database still be acceptable in terms of efficiency of operation ie: search & retrieval? Or if I stick with this, will I be asking for trouble when MySQL trolls through thousands of those records querying people who had site_appendix = 1 and site_caecum IS NULL etc?

    Looking forward to hearing your insight into this.

    Thanks!
    Kev

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Normalise to first normal form.
    Imagine the query to "Show all the people with cancer".
    You can still have your nice single entry form.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  3. #3
    Join Date
    May 2009
    Posts
    10
    Thanks for the reply pootle flump, I appreciate you reading through the post!

    By "Normalise to First Normal Form" I interpret that as meaning you think I should have a "Site" table and move all the site_* fields there? My problem with that is that I'm finding it a touch hard to reverse engineer the code to "import" that into the form and summary table (I'm still starting with regards to learning PHP & MySQL)..but that's a problem outside of the scope of this discussion.

    What is in scope however is your valid point about - Imagine the query to "Show all the people with cancer". In that case, I would run the following:
    Code:
    SELECT 
    site_colon_ascending_colon, site_colon_hepatic_flexure, site_colon_transverse_colon, 
    site_colon_splenic_flexure, site_colon_descending_colon, site_colon_sigmoid_colon, 
    site_colon_rectosigmoid, site_rectum
    FROM admission_details
    WHERE
    site_colon_ascending_colon = 1 OR site_colon_hepatic_flexure = 1 OR
    site_colon_transverse_colon = 1 OR site_colon_splenic_flexure = 1 OR
    site_colon_descending_colon = 1 OR site_colon_sigmoid_colon = 1 OR
    site_colon_rectosigmoid = 1 OR site_rectum = 1
    ;
    I've tested it, it works..but would that end up choking MySQL server with thousands of records? Typically however, queries wouldn't run into that sort of length as most users will limit it to records within a certain period.

    Thanks in anticipation of your insight.

  4. #4
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    This isn't a performance issue.
    You have to write that query. You also have to alter the schema of the table, and any queries, and any forms, whenever a new cancer site is added.
    This is also a single query - there will be many more.
    The problems of repeating groups are well documented. Database design and presentation are unrelated - your form should be thoroughly decoupled from the database, and certainly should never influence the database design.
    I can't help with the PHP.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Just FYI, the way I do this in other FEs is simply a "sub form" (whatever that might be) which is a list of all the sites (drawn from data in a table, not the schema), allowing the user to tick the relevant sites. Adding a new site is trivial (just an entry in a table). Determining who has cancer is trivial. Establishing who has cancer in three or more sites is trivial. etc.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  6. #6
    Join Date
    May 2009
    Posts
    10
    Ok..point taken. Stick to First Normal Form - is the problems with repeating columns isolated to:
    1) Having to add a new column when new "repeating fields" are needed and the HDD storage space associated with that?
    2) Having to recode the queries to allow for these new columns (though you still have some recoding to do in the case of the new field being a record in a second table instead of a column in the original first table)
    ?

    Basically I want to know to problems associated with repeating groups vs the time it'll back me to retool the program (this also affects the csv output of the program so it goes deeper than just data entry & table presentation).

    BTW..do you mean "Form Entries" in "..other FEs is simply a "sub form".."

  7. #7
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Nothing to do with HDD either.
    This is the best database design article I know on the internet.
    The Relational Data Model, Normalisation and effective Database Design

    There should be no additional coding required, so point 2 is invalid.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  8. #8
    Join Date
    May 2009
    Posts
    10
    Wow..thanks for the reference. Looks like good reading (long unfortunately..but much good reading is).

    Ok...as much as I expected..I'll just have to kick the code (and myself) around until it submits to working with First Normal Form then.

    Thanks again for your feedback on this..much as I would have liked to get away with not having a sites table, looks like I have to. Appreciate your willingness to share you experience.

    Now I just need to find way of downloading all the PHP & MySQL knowledge I need in one shot (I wish)...

  9. #9
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by KevT
    The problem I can see in the database design is the site_* recurring fields - this is in voilation of The First Normal Form right?
    Not really true. Any SQL table which does not permit nulls and has a key on it can be regarded as being in 1NF (ie it accurately represents some relation).

    1NF doesn't prohibit any particular attributes from a table but there are plenty of other reasons why your proposed design probably isn't a good one. The more attributes you use to represent similar information in one table the greater the risk that you'll need redundant code in queries and applications to deal with those attributes. Redundancy of data and resulting anomalies may be a problem too.

    Anith Sen wrote a great article on 1NF just recently:
    Facts and Fallacies about First Normal Form

  10. #10
    Join Date
    May 2009
    Posts
    10
    Thanks for your input dportas. The more experience on the issue, the better.

    Quote Originally Posted by dportas
    The more attributes you use to represent similar information in one table the greater the risk that you'll need redundant code in queries and applications to deal with those attributes. Redundancy of data and resulting anomalies may be a problem too.
    That's what I was talking about when I wrote about "site_* recurring fields" in the DB design I quoted as being in violation of 1NF. HOWEVER.....

    Quote Originally Posted by dportas
    Anith Sen wrote a great article on 1NF just recently:
    Facts and Fallacies about First Normal Form
    ..thanks for the reference. Interesting article, especially the section "The ambiguity of Repeating Groups". Whilst I still hold to the idea that the "site_* recurring fields" is a 1NF violation as I believe you and pootle flump do..that section got me wondering. Anith's example where you have date attributes looking like repeating groups when actually, they were logically applying to different concepts, that's what got me.
    In my case I wonder if you could argue: sure, logically the "site_appendix" & "site_caecum" fields are referring conceptually to "sites" for cancer and therefore should be placed in their own "sites" table. However, you could also argue that they are also in some ways independent of each other conceptually since the fields are there to record the presence (value=1) or abscence (value=NULL) of something in an area and you don't necessarily need to think of "sites" as being the thing that those fields share in common?

    I think perhaps that the above is just extending Anith's example too far? I think I still hold that I need to have a separate "sites" table.

    Thing is though, each record in the admission_details table does need to detail if there was (value=1) or wasn't (value=NULL) a cancer for each & every site. Therefore, if I were to have a "sites" table as well as a "admission_details" table, I will need a "site_positive" table to link the 1st 2 in terms of recording if cancer was present at each & every site?
    That is to say that the table "sites" looks like this...

    Code:
    Table: sites
    -----------
    idsites (primary key)
    site_name (VARCHAR eg: site_appendix etc.)
    And the "site_positive" table (marks off if a site was noted to have a cancer in a particular admission) will look like this:

    Code:
    Table: site_positive
    -------------------
    idadmission_details (foreign key from "admission_details" table)
    idsites (foreign key from "sites" table)
    The abscence of any records with idadmission_details=1 in the "site_positive" table indicates that no cancers were present for the admission identified as idadmission_details=1. Similarly, a record in the "site_positive" table with idadmission_details=2 & idsites=1 means that for the admission identified as idadmission_details=2, a cancer in idsite=1 was present.

    How's that design?

  11. #11
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by KevT
    Whilst I still hold to the idea that the "site_* recurring fields" is a 1NF violation as I believe you and pootle flump do..
    My point (and Aniths) is that a set of columns is NOT a repeating group. A repeating group means more than one value occuring in a single attribute position, which is impossible in SQL. Columns in a table therefore do not violate 1NF just because they have similar content or names.

    1NF is violated only if the table permits nulls or duplicate rows.

  12. #12
    Join Date
    May 2009
    Posts
    10
    Quote Originally Posted by dportas
    My point (and Aniths) is that a set of columns is NOT a repeating group. A repeating group means more than one value occuring in a single attribute position, which is impossible in SQL.
    Right. Your last sentence is the important once since I couldn't see how you could actually have more than one value in a single attribute position - it's impossible No problems there then...

    Quote Originally Posted by dportas
    1NF is violated only if the table permits nulls or duplicate rows.
    I was still going to interpret a NULL value as informative ie: "not present" as opposed to 1 which is "present". Still bad form? If so, what do you think of the 3 table design I propose in my last post?

    Thanks again.

  13. #13
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by KevT
    I was still going to interpret a NULL value as informative ie: "not present" as opposed to 1 which is "present
    I don't think that's a good idea at all. Why not represent the information with a value instead of a null? Nulls aren't values. Nulls are generally much more complex to deal with, which is why their use should preferably be minimised or avoided altogether.

    Your second, three-table design seems much more sensible to me.

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    impossible, eh? well, technically, you are right, i suppose
    Code:
    INSERT
      INTO products
         ( prod_name
         , prod_price
         , prod_categories )
    VALUES
         ( 'widget'
         , 9.37
         , '12,42,183' )
    there is more to normalization than slavish adherence to technicalities and arcane rules

    technically, the prod_categories value in this example is a single value, so it does not violate 1NF

    in practical terms, this design is b0rked

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by r937
    technically, the prod_categories value in this example is a single value, so it does not violate 1NF
    in practical terms, this design is b0rked
    Likely you are right: with the purpose that I expect you intended for the prod_categories column that's probably not a good design. As I already said, the fact that a table is in 1NF doesn't necessarily mean the design is "good enough" nor does it mean you should abandon common sense in your design.

    It's important to be clear about what 1NF means and does not mean. 1NF sets out a most basic underlying principle for relational database design. There's nothing slavish about applying it. In a relational system we wouldn't even have to worry about it but in a SQL system it's important and useful in order to avoid certain serious pitfalls. That's why it doesn't do any good to confuse the issue with irrelevant diversions.

    In any case your example isn't much of an explanation. Can you explain concisely just what rule you are applying that makes it undesirable to have the table in your example and others like it? I suspect you'll find a reliable definition very difficult to come up with or explain - genuinely arcane in fact. It may well turn out that you have a useful general principle in mind, but it has nothing to do with Normalization.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •