Results 1 to 7 of 7
  1. #1
    Join Date
    Nov 2006
    Posts
    3

    Basic question re database normalization

    Hi,

    I am working with a sharp but relatively inexperienced programmer on a PHP/MySQL database project. I'm no database expert but have some knowledge of database design and structure, although much of my knowledge is probably 20 years out of date.

    The project we're working on is content management and display of images and text on a website. Integrity and accuracy of the data is crucial. There are a bunch of tables that have to be accessed to display data on a given page, particularly when the user has requested a multi-tiered sort.

    I was taught (20 years ago) that data should be fully normalized, i.e., you should not store the same data (birth date or legal name, for example) in multiple tables to eliminate the possibility of synchronization problems (data being updated in one table but not in another, due to scripting error or write problem or whatever.)

    The programmer is arguing that most of the arguments for normalization are dated (storage and processing speed is no longer a big issue) and that data integrity is not an issue either.

    He says it's easier to code the application, and it's obviously faster to retrieve the data if he can copy data from the master record to other records multiple times. I can see that it will be much less load on the server to open one table and retrieve data and output it than to open 5 or 10 tables to assemble the data to be output.

    Speed and processor load is, I think, a relatively minor issue. The server is a fast dedicated box and will have maybe a couple thousand users at a time absolute tops. We are currently using a fully normalized database with several hundred simultaneous users and are not having any speed issues.

    My issue is in terms of proper database design. From a data integrity perspective, is it still standard procedure to fully normalize critical data, reducing the likelihood of coding errors that could cause sync problems with the same data in multiple tables? Are a lot of programmers now duplicating data between tables to speed access?

    Any thoughts/comments much appreciated.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    it does indeed sound like your guy is inexperienced

    he's right that storage and processing speed are no longer a "big" issue, but to denormalize for performance before a performance issue has appeared is just plain naive

    but data integrity not an issue? sorry, but it always is, and there's no "maybe" about it

    you either have data integrity, or you update your résumé

    if "a lot of programmers" are "duplicating data to speed access," my suspicion is that they don't know how to write an efficient join query
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Nov 2006
    Posts
    3
    Thanks so much for your helpful info.

    One clarification: He was saying that data integrity wasn't important, he said that there wasn't any decrease in data integrity by writing the same info (and potentially having to update said info) in multiple tables.

    While I can agree with that theoretically, I also know that writing the same data in multiple places is introducing more potential points of failure and/or sync problems, more likelihood of scripting bugs, etc., if you have to update multiple places where the data resides, particularly given that there are likely going to be multiple scripts updating the tables.

    I was looking to see if my viewpoint on that issue is "outdated" since I haven't really looked at database theory since the course I took 20+ years ago.

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    theoretically, you can obtain the number of cows in a herd by counting their legs and dividing by four, but i wouldn't do it that way

    sounds like you're not going to win any arguments with this guy -- of course, he's right, if you have redundant data all over the place, but always update every bit of it carefully, there won;t be any problem

    however, your viewpoint is not outdated
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Nov 2006
    Posts
    3
    Thank you very much.

    I just like to make sure before holding to a viewpoint that my information isn't outdated, and since it isn't, either the coder will get with the program, or we'll be looking for a new coder

    I really, really appreciate your insight.

    - Chip

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I have to second R937's opinion, emphatically.

    It is possible to keep information in many places updated. It is possible to never miss a single update in every script you write. Given enough time and sufficient thrust, you can make a pig fly, but that is still the exception rather than the rule.

    I insist that databases be normalized first, then only denormalize under a few very unusual circumstances. The programming staff still rails at this, they'd rather have things "easier" with columns duplicated as they see fit, and placed where they need fewer joins. The business analysts struggled at first, but now that they've seen the difference between properly normalized and non-normalized database designs, and have come to realize that there are lots fewer "fire-drills" with the normalized designs, the busines side is actually insisting on normalization even when that increases the up-front coding costs!

    -PatP

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Anyone with a modicum of experience with databases knows that data integrity is of utmost importance, and that normalization if the most effective means of maintaining data integrity.
    Since your colleague apparently does not know this, he obviously lacks even moderate database experience.
    Since he lacks database experience, he has no business dictating how the database will be designed.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •