Results 1 to 14 of 14
  1. #1
    Join Date
    Jul 2009
    Posts
    7

    Why is data integrity so important

    Hi,

    We currently have DB's where the constraints are enforced through the API but the DB itself has no constraints.

    The problem is that because the DB does not enforce its own integrity, the data is of a low quality with plenty of unexpected issues.

    My boss has tasked me to explain why we should design our DB's with constraints instead of being enforced through the API.

    Can you help?

    Thanks

    Josh

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by joshfolgado
    My boss has tasked me to explain why we should design our DB's with constraints instead of being enforced through the API.
    inform your boss that because the database does not enforce its own integrity, the data is of a low quality with plenty of unexpected issues

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    This always gets my goat - why is enforcing relational integrity a "decision" that should be justified? The "courageous" decision was leaving it out in the first place.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  4. #4
    Join Date
    Jul 2009
    Posts
    7

    Thanks for your comments

    I know and agree that data integrity is totally essential and it is crazy that a person has to justify this.

    Not having data integrity does indeed mean that companies have unexpected errors that most times cannot be duplicated.

    But unfortunately I do have to justify that we need data integrity where database constraints enforce data quality instead of using API driven constraints.

    To give you an example, we even have fields called "sequence" to save at which order the records are inserted instead of using an auto-incrementing column that is also a primary key.

    I am looking for a document that proves that data integrity is very important and preferably this document follows some international standard.

    Thanks to all for your help

    Josh

  5. #5
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    Your sequence example is not related to integrity. Integrity ensures the data is valid, not that it is correct.

    Take your pick of pretty well any and all relational database design books and articles written in the last thirty years.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  6. #6
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    In fact, I wouldn't even bother with articles and standards.

    Pull out some of the crappy data and ask them "do you want nonsense like this in your database or do you want valid data you can use?". A semi competent business aware person isn't going to need to see a dry, academic article by Chris Date to know that garbage in the database is not going to support his business.
    Testimonial:
    pootle flump
    ur codings are working excelent.

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    If you do not enforce relational integrity in the database (not in the middle tier or application layer), then it is only a matter of "when", and not "if", your data becomes corrupted.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    You've asked two different questions. The one in the title of this thread and the one in the actual post.

    The first one is easy and you've answered it yourself: Integrity is important because without it you will get inconsistent answers from your data.

    The second question is why enforce integrity constraints in the database rather than the application. Most SQL DBMSs can only enforce a limited range of constraints. Even the most basic referential integrity support (foreign keys) is very limited in most SQL systems. So it's a fact that many integrity constraints are often enforced in application code or in other places outside the DBMS such as rules engines or a data integration tier.

    However, where it is possible to use the DBMS for constraint enforcement then there are important advantages to doing so:

    - Integrity logic only has to be implemented and maintained in one place (the database) rather than in many different applications.

    - By decoupling integrity constraints from the application code change control becomes more straightforward.

    - Integrity is (usually) guaranteed when implemented in the database, ie. it cannot be bypassed as easily as the application can.

    - Constraint enforcement in the database is often more efficient than is possible in the application because it doesn't have to be done over the network layer.

    - Constraints in the database are metadata available to the query optimiser, which may take advantage of them to make queries and other code more efficient.

    - Constraints in the database are also self-documenting for developers and are available to other tools such as data modelling software or integration software.

    But as I already said, it's almost inevitable that some or even most integrity constraints be enforced only through application code. Regretably that's the state of DBMS software in the industry today.
    Last edited by dportas; 07-28-09 at 13:35.

  9. #9
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas
    But as I already said, it's almost inevitable that some or even most integrity constraints be enforced only through application code. Regretably that's the state of DBMS software in the industry today.
    you made a number of very valuable and informative points

    and then you really harshed my mellow with that bummer of a conclusion

    it's downright depressing

    you're as much as saying that data integrity can't be completely accomplished without application integrity checks, leaving the inexperienced reader to conclude that he might as well do it all in the application

    the rule of thumb should be: do as much as you can with the database, and do only the bits that the database cannot do in the application (and these bits are usually obscure and rare)

    your tirade against the inadequacies of current relational database products is getting tiresome

    no offence intended towards you personally, of course

    Last edited by r937; 07-28-09 at 14:59.
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  10. #10
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Rudy,

    My conclusion is indeed depressing, I agree. But it is a widely recognised problem and one which the industry has spent billions fixing and continues to try to solve. It's not just me who thinks so, Bill Gates agrees (even though he wrongly thinks the relational model is to blame). It's a plain fact that businesses spend $millions on rules engines and data quality software because of the limitations of the DBMSs they use. So it's clearly a hot and very practical issue for lots of people.

    Gates talks up declarative modeling language effort | Developer World - InfoWorld
    Business Rules Community (BRCommunity.com): The world's most trusted resource for business rule professionals

    Of course I agree that constraints should be enforced in the database whenever possible but in practice many of them can't be. It isn't an obscure or rare problem at all. In fact the commonest such limitation occurs in most databases I know (even yours I suspect). A generic example would be to enforce the rule that an Order must include at least one Product, or an Invoice must have at least one detail line. That's a very basic business rule but many SQL systems can't do it at all. The ones that can require lots of messy procedural code and inefficient compromises.

  11. #11
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas
    ... BRCommunity.com
    omg, ron ross, i met him back in the '80s

    i have a ton of his newsletters in a box somewhere

    *now* i know where you're coming from


    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  12. #12
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas
    ...messy procedural code and inefficient compromises.
    i got yer compromise right here...

    use an INNER JOIN on all data retrievals

    (declaring a VIEW is neither messy nor inefficient)

    you will never see an order until it has a product, nor an invoice until it has a detail line

    vwalah

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  13. #13
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by r937
    i got yer compromise right here...

    use an INNER JOIN on all data retrievals
    I hope you are joking but I can't help stating the obvious in case anyone thinks you are serious. Not everyone here will recognise your sense of irony Rudy!

    Using INNER JOINs as a substitute for referential integrity means that users can enter inconsistent data but then it disappears and cannot be retrieved. Most good testers would consider that a bug. At least I hope so. If not, then you should probably hire some better testers.

    If it ever did make it into production then you would be multiplying the problems many-fold because the joins will exclude rows from every table that references the inconsistent rows you are hiding from users. Then maybe the data model or the queries change slightly, maybe some rows reappear and some don't. One day of course someone has to come along and clear up the mess and anyone who has done large scale data integration or migration projects will know what kind of pain is caused by unenforced constraints.

    Referential constraints and other business rules should always be enforced as far as is practical - either in the database or in application code or via some other enforcement mechanism.

    Don't rely on application code, or views, or queries just to conceal the problems in your data. That's definitely a worse option than putting constraints into application code.
    Last edited by dportas; 07-28-09 at 17:46.

  14. #14
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by dportas
    Referential constraints and other business rules should always be enforced as far as is practical
    [emphasis added]

    you're the man now, dog

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •