05-03-11, 12:15 #1Registered User
- Join Date
- May 2011
I am a co-writer and administrator of a, rather, large database (mysql, php, ajax). It started as a hobby, but now I work at the IT department. I didn't have training (yet) in dba etc.
I hear a lot of talk about normalisation of relational databases. Because of a few comment I got about my own database I started to read about normalisation. And the more I read the more questions I got.
I seems that normalisation is about reducing redundant data. But what is redundant..? Is it data that 'accidentally' is duplicated or is it data that you deliberately duplicate. I can imaging situations that you want to duplicate. is doing that 'de-normalising'?
I've also been reading about 'noSql'. A concept that does not have normalisation an schemas. Reading that made me think about something else.
Being first introduced in 1970, isn't normalisation an old concept? created in a time that computers were slow and didn't have enough calculating 'powers' to deal with large amounts of data. These days 50.000 records is considered normal and also considered a 'small' database. I can imagine that back in the 70's the situation whas a bit different and the need for normalisation therefore greater.
And the programming languages are also developing. With ajax you can dynamically update your database and perform select query's, also something that couldn't be done in the past.
And foreign keys. Why not, instaed of creating keys on database level, create dynamic keys with, for instance, php? Selecting a record and based on a field selecting an other record from a different tabel or database. Keeps you very flexible.
Maybe I don't fully understand normalisation yet. But if someone has some thoughts on the above I would gladly hear (read) it.
05-03-11, 15:38 #2Resident Curmudgeon
Provided Answers: 54
- Join Date
- Feb 2004
- In front of the computer
Normalization boils down to "One piece of information is stored in only one place, except when it is used as a link to show connection to some other data". The idea is to save the programmer/admnistrator from going insane trying to figure out which of the N copies of a single piece of information is "correct".
For insance, you only want to store a customer's balance one place in a database. If you store the balance in the customer table, the order table, and the balance forward table, it becomes very difficult to determine the actual balance when there is a disagreement between them.
An example where you might want pseudo-duplicates might be on product price. There is one price associated with the product itself, but there could be a second copy of the price in the invoice item table because it is possible that someone (like a manger or owner) might have "adjusted" the price on an item for one customer or even for a specific invoice.
From a storage perspective, normallization only matters on small devices (such as bar code scanners or cell phones), it isn't a practical concern on servers. The saving of administator/programmer sanity is a concern (at least to me) everywhere!
-PatPIn theory, theory and practice are identical. In practice, theory and practice are unrelated.
05-04-11, 12:27 #3Registered User
Provided Answers: 5
- Join Date
- Dec 2007
- Richmond, VA
it is also a matter of performance. if you store the same data in 15 different tables and you have to insert it to all 15 tables, update it in all 15 tables, delete it from all 15 tables. you can see how the amount of work and the amount of code starts to pile up. Also, say the tables are owned by different groups within an application team, who is resposible for updating which tables? If the one group does not populate the data everywher, now the others have to have some way of getting the data from the other group's table(s) into their own. Now, how long of a lag is there between it getting from one table to another. Is it like those conversations I have with a customer service department, where they say I updated the database, it will be effective in 24 - 48 hours?