I am not completely new to databases and normalization but I still have alot more I could learn. However I am a student looking to my dissertation based on the idea of normalization
I was just wondering if any of you know of any software that will take SQL tables and check to see if they are in normal form?
If not, or even if yes, is this easily achievable?
I was also wondering with normalization do you have to know the names of the columns to be able to do it
For instance you could perform normalization on a table with names such as 'age' 'teacher' 'location' because you know how they relate to each other. Could you perform normalization on a table that had names '.' '/' '?' even though you have no idea what each of them are?
Any help would be greatly appreciated
PS I have looked around for the answer to this myself, I'm not being lazy. I've had no luck though so I was really hoping someone experienced could spare 5minutes and reply
Tell us what you think - I have my thoughts on this.
Was that aimed at me? or everyone?
And if you are refering to whether my question of whether it would be capable of being done completely free and independant of user input I guess the answer would be yes and no
Yes, to some extent it would. Its been a long time since I've really had a proper go at SQL and normalization but I remember 1NF can be done completely automatically as that is based on not having more than two entries in a single attribute (Sorry if i'm getting entries, attributes and so on mixed up. It has been a long time)
2nf and 3nf could be possible providing some form of AI method was used, however this would generally be going off the scope of my dissertation, but it would be possible. Saying that it would probably need user validation as there would be so many possible words that could be used.
If it was marketed towards a specific industry, that would make it somewhat easier.
So basically, you could do it independant from user interfernce based on dependencies, but this wouldn't make it 100% accurate. At Least I get that impression
I've just been reminding myself about 2NF from wiki (Seem to use that for everything now) and as it seems 2nf would be possible automatically but only to a certain percent of correctness, dont think it would be possible to get the table 100% 2nf without any form of user input, at least not on every table.
I must say, you are very good with all this help stuff
If you can understand the meaning of column names, you can infer some dependencies (i.e. a computer likely couldn't without a sophisticated semantic rules engine)
If you are presented the dependencies, you can normalise (but this isn't really what you are looking to present to the computer)
If we rule these two out in your problem, then the software would have to infer all this from a data set.
If a dependency is not represented in the data set, then the software cannot infer it.
If a dependency that does not exist is implied by the dataset, the software will decompose unnecessarily, or assign attributes to the wrong entity
So for this software to work it would require a "perfect" data set (i.e. one that not only represents every single dependency, but also does not imply any dependencies where they do not exist).
My only effort towards this was developing a script to identify all unique and independent composite keys in a data table. The task was partly an exercise in msqlbation, but I end up using it pretty frequently when analyzing tables for a new client project.
If it's not practically useful, then it's practically useless.
Given a relation schema and a set of dependencies I think you'll find that determining candidate keys and therefore achieving Boyce Codd Normal Form is proven to be a NP Complete problem. I don't have references to hand but you should find them in a decent theory book like the Alice Book (Google "foundations of databases").
2NF and 3NF are not really important but there are certainly formal methods and tools to achieve 5NF and 6NF. See Alice again, also Terry Halpin's Information Modelling and Relational Databases.
The idea of determining dependencies just from attribute names won't run. Human language is far too vague. AI won't help because except for trivial cases even human beings can't do it reliably without additional information. You could invent a formal grammar for attribute naming I suppose but if a person has to translate requirements first then why couldn't they just list the dependencies?