  1. #1
    Join Date
    Dec 2004
    Posts
    4

    Normalization dogma?

    As a web programmer I have noticed that the traditional concepts of database normalization aren't always the best solution for a particular problem, since much of it seems aimed at conserving storage space. Today, with storage space being cheap, the main problem in web programming is speed (the script-timeout and user-patience problem), so I have been forced to introduce some "sins" into my databases in order to speed up reports. (Sure, I can always recommend a new ultra-fast server to my customer, but my software is designed so that the customer uses the hosting of his or her choice...)
    I would like to know if other programmers/analysts have more information or experience with this.

    T. Fell

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    my experience is that you run serious risks violating normalization

    update anomalies, and the increased processing necessary to avoid them

    maintenance issues, and designs that are difficult to change

    the list is long

    you're right, on the web performance is most important, but denormalization is usually not the answer
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi Tomas,

    You say:
    the traditional concepts of database normalization aren't always the best solution for a particular problem, since much of it seems aimed at conserving storage space.
    That is a serious misunderstanding of the purpose of normalization. Space savings, if they occur as a result of normalization, are a by-product and not its main purpose.

    As Rudy (r937) alluded to in his reply, normalization provides much, much more than space savings. Some of the benefits are:

    1. Data integrity.
    2. One fact in one place.
    3. Ability to modify programs easily.
    If performance for generating reports is a concern, then you should look at other options, such as having a separate reporting schema, based on the normalized transaction database and loaded at regular intervals (once a day, once a week, etc., depending on your requirements).

    I have implemented a highly normalized database (at least third normal form) with users in Ireland and the database in Toronto, Canada. It, too, had a web-based UI, with a small pond (the Atlantic Ocean) separating the UI from the server. Yet performance was never an issue.

    Do look at your performance and see if you really need to de-normalize. Please remember that de-normalizing without understanding its risks can be very dangerous, as Rudy pointed out.

    Ravi

  4. #4
    Join Date
    Jul 2003
    Location
    Australia
    Posts
    217
    Quote Originally Posted by Tomas Fell
    As a web programmer I have noticed, that the traditional concepts of database normalization, aren't the best solution for a particular problem as most of it is aimed to the conservation of storage space. Today with the cheap storage space, I have noticed that the main problem in web programming is speed (the script-timeout and user patience problem), so I have been forced to introduce some "sins" in my databases in order to speed up reports (yes sure I can always recomend the new ultra fast server to my customer but as my software is designed as the customer uses the hosting of his/her choose...).
    Hi Tomas, I disagree with you. As Ravi said, the main purpose of normalization is to avoid update anomalies and preserve data integrity by putting each fact in ONLY ONE PLACE.

    Remember, denormalization means that whenever a piece of "functionally dependent" data changes, your program has to search for every occurrence of it and update all of them.
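
    As a rough illustration (the table and column names below are made up, not from this thread), the repeated fact forces extra updates that the normalized design never needs:

    -- Denormalized: the customer's address is copied onto every order row,
    -- so a change must be applied to every copy (miss one and the data disagrees with itself)
    UPDATE orders
    SET    cust_address = '12 New Street'
    WHERE  cust_id = 42;

    -- Normalized: the address lives in exactly one place
    UPDATE customers
    SET    address = '12 New Street'
    WHERE  cust_id = 42;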

    This will have to happen in many daily operations, which occur much more often than running reports; a report is usually run only once in a while.

    Which is more costly?

  5. #5
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Highly normalized databases are very effective for online transactions.

    However, as was pointed out, reporting can be slower due to the multiple joins needed to assemble data. This is gradually becoming less of a problem as servers grow more powerful.

    The solution in the meantime is to have an online system that is normalized for transactions, and which updates a denormalized reporting database or data warehouse at desired intervals. The reporting database is then used as needed.
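
    A minimal sketch of that pattern (the table and column names are hypothetical, purely for illustration): a batch job run at the chosen interval rebuilds the denormalized reporting table from the normalized transaction tables.

    -- Nightly batch: rebuild the reporting table from the normalized OLTP tables
    TRUNCATE TABLE rpt_daily_sales;

    INSERT INTO rpt_daily_sales (sale_date, customer_name, region_name, total_amount)
    SELECT o.order_date,
           c.name,
           r.region_name,
           SUM(oi.qty * oi.unit_price)
    FROM   orders o
    JOIN   customers   c  ON c.cust_id   = o.cust_id
    JOIN   regions     r  ON r.region_id = c.region_id
    JOIN   order_items oi ON oi.order_id = o.order_id
    GROUP  BY o.order_date, c.name, r.region_name;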

  7. #7
    Join Date
    Dec 2004
    Posts
    4

    Thanks

    Thanks for your comments. Sure, I can wait for processor speeds to catch up, and yes, it is very hard to keep the data consistent in a denormalized database, but for my web-based systems a 42-second query (MySQL) is prohibitive. Sure, my ego suffers from not being able to write a wonderful query with multiple joins, but it works for me.

  8. #8
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    A 42-second query could be a sign that your SQL or database needs tuning, rather than that you need to denormalise, which is an extreme thing to do.
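
    For illustration only (the tables, columns and index name here are hypothetical, not from this thread), "tuning" often starts with the query plan and an index on whatever the plan shows being scanned:

    -- Ask MySQL how it executes the slow query; a join type of "ALL" means a full table scan
    EXPLAIN
    SELECT o.order_id, c.name
    FROM   orders o
    JOIN   customers c ON c.cust_id = o.cust_id
    WHERE  o.order_date >= '2004-01-01';

    -- If the join or filter columns are unindexed, an index is often the whole fix
    CREATE INDEX ix_orders_cust_date ON orders (cust_id, order_date);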

  9. #9
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    I agree with Tony (andrewst) that a 42-second response time is a symptom of something else, not an inherent flaw in a well-normalized data model.

    How big is your database? What is the size of the table that you are accessing?

    When we developed a web-based application for a bank in Ireland, with the application server and database servers in Toronto, Canada, we had a database of 300,000 customers with 1.5 million accounts. Yet, it never took that long to query the database. Typically, a second or so was the total response time.

    Ravi
    Last edited by rajiravi; 01-09-05 at 02:40.

  10. #10
    Join Date
    Dec 2004
    Posts
    54

    Can you be 'too normal'

    Ravi,
    There have been a lot of good comments here. Normalization has a lot of benefits. Data quality is one.

    Another VERY IMPORTANT benefit not yet mentioned is minimizing the impact of change. When you design an application, I can guarantee you ONE THING: the business will change. Here is where you will see the benefit of a normalized data model. In many cases, changes to the database are made by simply adding rows.

    For example, normalizing people and roles. Business people often lump these together: a customer, a policy holder, a broker, an employee... If you separate these into a person playing a role, any new role that comes along is accommodated by adding a row to the role table (see the sketch below). If you design a table called 'customer' or 'broker' or 'employee', etc., then you're stuck when a new role comes along.
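
    A rough sketch of that shape (all names below are made up for illustration):

    -- One person, many possible roles; a brand-new role is just a new row in role
    CREATE TABLE person (
        person_id INT PRIMARY KEY,
        name      VARCHAR(100) NOT NULL
    );

    CREATE TABLE role (
        role_id   INT PRIMARY KEY,
        role_name VARCHAR(50) NOT NULL   -- 'customer', 'broker', 'employee', ...
    );

    CREATE TABLE person_role (
        person_id INT NOT NULL REFERENCES person (person_id),
        role_id   INT NOT NULL REFERENCES role (role_id),
        PRIMARY KEY (person_id, role_id)
    );

    -- Supporting a new 'policy holder' role later on: one INSERT, no schema change
    INSERT INTO role (role_id, role_name) VALUES (7, 'policy holder');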


    In many cases normalization costs a little more in development up front, since it takes more time to figure out the joins... but in the long run you'll gain that back easily, MANY TIMES OVER.

    I do agree with you somewhat in that, in some cases (not all), web designs, especially 'wizards', force a system user to follow rules that prevent data quality errors. For example, picking a 'state' from a drop-down list keeps them from making the data entry errors they could make typing a state into a text box.

    There are always shortcuts and 'down and dirty' ways of doing things... but they'll hurt you badly EVERY TIME.


    Vmusic

  11. #11
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    Well, I suppose it's to be expected that asking a question about denormalising on a database forum would get the expected response. In my experience there are a few rare circumstances when denormalising or storing derived data in a db is sensible, but they are few and far between. If there's a performance problem it's usually down to poor design or poor implementation.

    I would be concerned immediately with a 42-second response time; however, over the web that number is meaningless on its own. It could simply reflect the time it takes to return the data over the connection. What response time did you get after denormalising the data, and how certain are you that these times are repeatable (i.e. not affected by factors outside your control, e.g. bandwidth restrictions, server activity, network lockouts, etc.)?

    A 42-second response time would indicate to me that you have a query which is malformed; possible causes could be incorrectly specified joins, joins on unindexed columns, incorrect datatypes, or possibly indexes that are goosed. It could even be inherent in the way the system is deployed, e.g. the ISP has overloaded the server (web or data).
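
    To illustrate just one of those causes (the tables here are hypothetical), an incorrectly specified join with no join condition produces a Cartesian product, which can easily turn a sub-second query into one that runs for tens of seconds:

    -- Malformed: no condition relating orders to customers,
    -- so every order row is paired with every customer row
    SELECT o.order_id, c.name
    FROM   orders o, customers c
    WHERE  o.order_date >= '2004-01-01';

    -- Correctly specified join
    SELECT o.order_id, c.name
    FROM   orders o
    JOIN   customers c ON c.cust_id = o.cust_id
    WHERE  o.order_date >= '2004-01-01';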

    I've yet to see any query sent to MySQL or SQL Server or DB2, irrespective of data volumes, take more than 5 seconds to return. The actual return to the customer/viewer can take anywhere from a few seconds to 60 seconds or more, depending on the type of connection they have AND how busy the network or communication pathway(s) is/are.

    So then Tomas what improvement did you actually see after denormalising?

  12. #12
    Join Date
    Oct 2004
    Posts
    9
    You've got some good responses here. I can tell you from experience that denormalizing data can be your worst nightmare. As others have pointed out, what happens if you need to make some sort of change? As for data quality, don't get me started. Whenever you have a design that requires the same data in more than one place you ARE going to have problems. What is THAT going to do to performance and cost? And, don't forget the negative PR that can come out of it. Or the issues of disgruntled customers who have had problems because of that.

    Two real-world examples:

    Several years ago, after I gave birth to a child, my insurance carrier promptly paid all of the bills presented, except for one. When I finally spoke to a person, I discovered that their system required the age of the patient to be entered on each bill (rather than drawing it from a table with the data about the patient), no matter how many bills were presented. And, of course, there was a problem with the age: someone had put the *year of my birth* into the field for my AGE. So, of course, the system thought I should be covered by Medicare. (Some decent business rules would have caught the fact that it is highly unlikely that a woman of 65 would need coverage for nursery care of her newborn.) I'm still seeing the results of some of these design decisions in the fact that some information does not show up, or needs to be dug for, on their web site.

    Back in the days when Verizon (before it was Verizon) routinely delivered telephone books, we stopped receiving ours. It never occurred to me that the problem was an incorrect address, because we were getting the telephone bill and had no problems with service either, so I knew that they had the correct address. (This was before I learned about programming and normalized vs. non-normalized databases; I just assumed that common sense dictates you use the same address for everything, unless otherwise required for some reason.) Well, again, the problem was that they had the same data in different places, and they were misdelivering the books. I wonder what that little glitch cost them?

    Can the business afford problems like this?

  13. #13
    Join Date
    Dec 2004
    Location
    Broomfield, Colorado
    Posts
    16

    Got Data Quality?

    Interesting thread. Sounds like many agree that data quality is a major drawback of denormalized databases.

    Now turn your attention to a data warehouse, which contains derived and denormalized data: data quality becomes a huge issue there. I'll get to why I bring up data warehouses in a moment.

    One of the teams I work with consists of web developers, and we run into the performance issue all of the time. Yes, often it is a query in need of some serious tuning. And we do a pretty good job of monitoring/administering our database to maximize performance. And I always try to provide a normalized solution.

    But the reality is that sometimes Marketing comes up with a requirement that is challenging for our mix of hardware and _skill_sets_. Normalization is great... until we find that we have to sum 100,000 records to display a year-to-date profitability report for sales reps.

    To work around the harsh requirements, we are sometimes forced to employ "rollup" tables (yuck) that are updated nightly via batch programs (yuckier). So we are basically implementing a poor man's data warehouse in the operational system. And, at our choosing, we can monitor the data quality of the rollup tables with an automated process. When errors are discovered, we can resolve them and refine the processes (batch programs, procedures, etc.) so that it is all under control and pre-yuck.
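
    One hedged sketch of such an automated check (the rollup and detail tables below are invented for illustration): compare the rollup totals against the detail tables and flag any drift.

    -- Flag any sales rep whose year-to-date rollup total no longer matches the detail
    SELECT r.rep_id,
           r.ytd_profit  AS rollup_total,
           SUM(s.profit) AS detail_total
    FROM   rollup_rep_ytd r
    JOIN   sales s ON s.rep_id    = r.rep_id
                  AND s.sale_date >= '2005-01-01'
    GROUP  BY r.rep_id, r.ytd_profit
    HAVING r.ytd_profit <> SUM(s.profit);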

    johndz
    http://web.tampabay.rr.com/johndz

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    rollup tables (i call them "summary tables") updated nightly in batch are not yucky, they are gorgeous

    a tried and true method for delivering information in an efficient manner

    what's yucky about that?

    you might as well throw every data warehouse out the window for containing redundant data
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  15. #15
    Join Date
    Dec 2004
    Location
    Broomfield, Colorado
    Posts
    16
    I agree. Unfortunately, sometimes the batch processes are not written quite as I would like, so I get paged in the middle of the night.

    Just like any other solution, they need to be justified.

    johndz
    http://web.tampabay.rr.com/johndz
