If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Basic question re database normalization

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 11-05-06, 20:06
calchip calchip is offline
Registered User
 
Join Date: Nov 2006
Posts: 3
Basic question re database normalization

Hi,

I am working with a sharp but relatively inexperienced programmer on a PHP/MySQL database project. I'm no database expert but have some knowledge of database design and structure, although much of my knowledge is probably 20 years out of date.

The project we're working on is content management and display of images and text on a website. Integrity and accuracy of the data is crucial. There are a bunch of tables that have to be accessed to display data on a given page, particularly when the user has requested a multi-tiered sort.

I was taught (20 years ago) that data should be fully normalized, i.e., you should not store the same data (birth date or legal name, for example) in multiple tables to eliminate the possibility of synchronization problems (data being updated in one table but not in another, due to scripting error or write problem or whatever.)

The programmer is arguing that most of the arguments for normalization are dated (storage and processing speed is no longer a big issue) and that data integrity is not an issue either.

He says it's easier to code the application, and it's obviously faster to retrieve the data if he can copy data from the master record to other records multiple times. I can see that it will be much less load on the server to open one table and retrieve data and output it than to open 5 or 10 tables to assemble the data to be output.

Speed and processor load is, I think, a relatively minor issue. The server is a fast dedicated box and will have maybe a couple thousand users at a time absolute tops. We are currently using a fully normalized database with several hundred simultaneous users and are not having any speed issues.

My issue is in terms of proper database design. From a data integrity perspective, is it still standard procedure to fully normalize critical data, reducing the likelihood of coding errors that could cause sync problems with the same data in multiple tables? Are a lot of programmers now duplicating data between tables to speed access?

Any thoughts/comments much appreciated.
Reply With Quote
  #2 (permalink)  
Old 11-05-06, 22:09
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
it does indeed sound like your guy is inexperienced

he's right that storage and processing speed are no longer a "big" issue, but to denormalize for performance before a performance issue has appeared is just plain naive

but data integrity not an issue? sorry, but it always is, and there's no "maybe" about it

you either have data integrity, or you update your résumé

if "a lot of programmers" are "duplicating data to speed access," my suspicion is that they don't know how to write an efficient join query
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #3 (permalink)  
Old 11-05-06, 23:23
calchip calchip is offline
Registered User
 
Join Date: Nov 2006
Posts: 3
Thanks so much for your helpful info.

One clarification: He was saying that data integrity wasn't important, he said that there wasn't any decrease in data integrity by writing the same info (and potentially having to update said info) in multiple tables.

While I can agree with that theoretically, I also know that writing the same data in multiple places is introducing more potential points of failure and/or sync problems, more likelihood of scripting bugs, etc., if you have to update multiple places where the data resides, particularly given that there are likely going to be multiple scripts updating the tables.

I was looking to see if my viewpoint on that issue is "outdated" since I haven't really looked at database theory since the course I took 20+ years ago.
Reply With Quote
  #4 (permalink)  
Old 11-06-06, 04:01
r937 r937 is offline
SQL Consultant
 
Join Date: Apr 2002
Location: Toronto, Canada
Posts: 19,524
theoretically, you can obtain the number of cows in a herd by counting their legs and dividing by four, but i wouldn't do it that way

sounds like you're not going to win any arguments with this guy -- of course, he's right, if you have redundant data all over the place, but always update every bit of it carefully, there won;t be any problem

however, your viewpoint is not outdated
__________________
r937.com | rudy.ca
please visit Simply SQL and buy my book
Reply With Quote
  #5 (permalink)  
Old 11-06-06, 04:11
calchip calchip is offline
Registered User
 
Join Date: Nov 2006
Posts: 3
Thank you very much.

I just like to make sure before holding to a viewpoint that my information isn't outdated, and since it isn't, either the coder will get with the program, or we'll be looking for a new coder

I really, really appreciate your insight.

- Chip
Reply With Quote
  #6 (permalink)  
Old 11-06-06, 09:00
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
I have to second R937's opinion, emphatically.

It is possible to keep information in many places updated. It is possible to never miss a single update in every script you write. Given enough time and sufficient thrust, you can make a pig fly, but that is still the exception rather than the rule.

I insist that databases be normalized first, then only denormalize under a few very unusual circumstances. The programming staff still rails at this, they'd rather have things "easier" with columns duplicated as they see fit, and placed where they need fewer joins. The business analysts struggled at first, but now that they've seen the difference between properly normalized and non-normalized database designs, and have come to realize that there are lots fewer "fire-drills" with the normalized designs, the busines side is actually insisting on normalization even when that increases the up-front coding costs!

-PatP
Reply With Quote
  #7 (permalink)  
Old 11-06-06, 09:28
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
Anyone with a modicum of experience with databases knows that data integrity is of utmost importance, and that normalization if the most effective means of maintaining data integrity.
Since your colleague apparently does not know this, he obviously lacks even moderate database experience.
Since he lacks database experience, he has no business dictating how the database will be designed.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On