Results 1 to 14 of 14
  1. #1
    Join Date
    Sep 2014
    Posts
    7

    Post What's the worst that can happen with below database design? Immediately or in the fu

    Im wondering what's the worst that can happen with a database design like the one im entertaining below.

    ================================================== ========
    Im designing a micro framework which is targeted at making module development easier. For that, i am trying to make a general db design which will be usable for majority of potential data models, without having the module developer need to create new tables for their models. This will also allow me to use a generic model class which will serve all kinds of requests over itself, thereby giving the MF engine control over the process and making engine development easier.

    So in short, there will be 3-4 big tables with many differing types of fields with generic names.

    Here is the database schema - screenshot at imgur:

    http://i.imgur.com/FurgHXh.jpg

    Attribute keys of each table introduced to each other to allow for an inter-relationship in between tables.

    Field types can be spread out in any format appropriate; Maybe two of them longtext, one of them bigint or one of them tinyint one of them varchar.

    Naturally there will be numerous empty fields in tables for data models which dont use all of the fields. I will try to minimize the impact by using varchar with default value null for most text fields. And will try to minimize the impact of other types of fields. I dont think such size overhead will be too much impact where application will be used.

    I dont know how i should distribute the fields though. How many Longtexts in a given table, or how many INTs or any bigint etc. I dont know a statistic to make such an estimation for covering most used field combinations.
    ================================================== ==============
    Last edited by unity100; 09-04-14 at 18:26. Reason: corrected the screenshot

  2. #2
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Short answer:
    Horrible to maintain.
    Horrible to optimize.
    George
    Home | Blog

  3. #3
    Join Date
    Nov 2004
    Location
    out on a limb
    Posts
    13,692
    ...manky to understand
    ...manky to extend
    ...arguably may save time in developing in a OOP context
    ...manky to scale up
    ...looking at the data requires detailed knowledge of the application design from the outset. the data design doesn't help understanding what is going on
    but data integrity will be non existent
    the developer will have to do all the verification and data checking

    is a variation on the EAV model and similar.

    if your application is so small you don't need to worry about maintenance over time, or scaling up the data then by all means make a rod for your back. One of these days someone somewhere will come up with a fully OOP database (some are getting close but not quite 'there' yet). until then if its a serious large scale application stick with the tried and tested data storage and manipulation)
    I'd rather be riding on the Tiger 800 or the Norton

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Probably the worst thing that would happen as a result of this schema would be if someone developed an application that was either popular or necessary. There is nothing in the schema that allows the database to handle integrity issues (like foreign keys, check constraints, etc.) and there is nothing apparent that would allow optimization. The guy who builds this database will be forced to maintain it... Probably forever, since no one else that was smart enough to do the job could be dumb enough to accept that responsibility.

    I've seen people create databases like this, and still be doing the "care and feeding" of the beast years after they wanted to "move on" to other jobs. Some of those people have paid enormous sums to get others to re-architect those systems so that the systems could be scaled and maintained. I've always felt sorry for the people who've painted themselves into the corner like this, but I've also been willing to negotiate with the companies to find a fair price to get them out of the bind that they'd created.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  5. #5
    Join Date
    Sep 2014
    Posts
    7
    i should have explained some details before i posed the question, sorry.

    "Horrible to maintain"
    "manky to understand"
    "looking at the data requires detailed knowledge of the application design from the outset. the data design doesn't help understanding what is going on"


    Developers are not expected to interact with the db directly at all. Or even need to see the tables or any kind of sql console/manager. Everything will be handled through admin side utilities which interact with the db through models. there will be utilities for exporting/importing data into various formats. it would be even possible to prepare a utility which will create totally normalized tables from the existing data using the models and export/import data in that fashion. maybe even create placeholder tables for data models which perform quite slowly (easy to measure due to single point of access to data through single data object) and use them to serve various types of queries.

    "Horrible to optimize"

    some performance loss is definitely in order. my aim is to make it not as much as eav models.

    "manky to extend"

    While i very much think that having 4 inter relate-able tables will cover for a whopping amount of data models, in case more is needed for some reason new tables of the same fashion can be added. or, additional fields to existing tables, which should not be necessary in most cases.

    "arguably may save time in developing in a OOP context"

    quite a lot, and that is the main reason. not only it will greatly simplify defining models through an array easily, but also through the presence of the map of these models in application init time, any interaction with the database will be possible through modules and admin interface without anyone needing to get to the database.

    another major advantage is there will be one junction point (the db model class) over which all interactions with database flows through. this will make further development of core engine, logging, testing, debugging, and even implementing various caching forms quite easier.

    "the developer will have to do all the verification and data checking"

    its module developers' responsibility to deal with their own module, like in wordpress or similar ecosystems. their data and code should not mix with the core engine at all. separate database admins are things out of the financial means of people who will use this application. even for small businesses which may use this application.

    i WILL give the means to module developers to directly run their own queries, so creating their own tables and using them will be possible. but i want to evade and discourage that as much as possible for the reasons explained earlier.

    "One of these days someone somewhere will come up with a fully OOP database (some are getting close but not quite 'there' yet)"

    now the thing is, this is intended and designed for web applications, especially for lamp stack. there mysql dominates unquestionably, and it seems it will keep doing so into the future. even now, numerous different databases which are also usable on the web and which have different approaches (some are even making up for shortcomings of eav formats as i hear) exist, but its unwise to design on such databases while the majority of web hosts are offering mysql.

    "but data integrity will be non existent"

    This, i would like to hear about more.

  6. #6
    Join Date
    Sep 2014
    Posts
    7
    Quote Originally Posted by Pat Phelan View Post
    Probably the worst thing that would happen as a result of this schema would be if someone developed an application that was either popular or necessary. There is nothing in the schema that allows the database to handle integrity issues (like foreign keys, check constraints, etc.) and there is nothing apparent that would allow optimization.
    primary_key, attribute_key1, attribute_key2, and attribute_key3 are all designed to be usable as primary keys and foreign keys for respective tables, in conjunction with data_type and attribute_name (maybe i should rename it to attribute_type) to enforce uniqueness in between different data types.

  7. #7
    Join Date
    Sep 2014
    Posts
    7
    however to be honest, i dont want to set the foreign keys in the database schema while creating it - i want to handle that inside the data object. i know this is going to give some performance loss, but i want the design to be flexible. Ie, someone to be able to use tables 2 and 3 for their data model, if they want.

  8. #8
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I'm just curious at this point, but you seem to be very much OOP oriented and are looking to design an object store like an XML container. Have you ever done this in production before? If so, what kind of results, performance, and problems did you have?

    The reason that I'm asking is that you've gotten three responses that strongly suggested avoiding this design and no response that supports it. I'm trying to understand what is drawing you so strongly toward this design.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  9. #9
    Join Date
    Sep 2014
    Posts
    7
    Quote Originally Posted by Pat Phelan View Post
    I'm just curious at this point, but you seem to be very much OOP oriented and are looking to design an object store like an XML container.

    The reason that I'm asking is that you've gotten three responses that strongly suggested avoiding this design and no response that supports it. I'm trying to understand what is drawing you so strongly toward this design.
    -PatP
    What's drawing me to it is exactly what you yourself describe in the earlier paragraph, pat.

    I want to totally isolate the module development from directly interacting from database. even at the point of creating tables. Then handle all interactions through one single data object. Due to reasons of ease of coding, and also ease of controlling and extending the engine of the application.

    Have you ever done this in production before? If so, what kind of results, performance, and problems did you have?
    this is a good question, and it has a lot to do why i want to go for such a design.

    i indeed worked on many applications which had quite varied approaches to database design. from properly normalized tables to total eavs to hybrids and those who wanted you to access db directly and do everything yourself to those who created defined data objects and let you manipulate them.

    out of these, the ones which would be useful are oscommerce and wordpress. practically polar opposites.

    oscommerce is 10 year old spaghetti code. but, because its relational tables are well designed, it is able to cope up with 2 to 4 complex joins easily. Product listing, category listing and similar views have numerous such joins.

    And when you do a complex search spanning 2-3 tables with multiple criteria, the result comes in quite fast.

    Its so that you can comfortably run an oscommerce store with 70,000~ products (each of which have 2 or more additional rows in different tables - options and such -) in a cheap shared hosting like godaddy. can handle 2000~ unique visitors a day using the site and you wont have much problems.

    Wordpress on the other hand, uses a hybrid of eav and relational. all is well for doing simple listings, but when you search something with multiple criteria that is contained in metas, then you run into slowness with 2000-3000 posts in your database. However developing modules and plugins for wordpress was of course considerably easier.

    both use data objects to an extent. Oscommerce has order objects, wordpress has post objects and so on.

    Which are actually almost totally unnecessary since these data are just a collection of arrays if you look in detail. However it works well to objectify the data.

    The disadvantage of this in these apps was that, you frequently pulled and used unnecessary data. While wanting to update an order status, you would get a full order object and so on. wordpress has different functions for updating different parts of the post object, but this just adds complexity and overhead of code.

    At this point i should note that i am intending to enable data model to process PARTS of the object- ie, it will be able to update the product attribute of a product without taking the full product object. Over the data model of course.

    Accessing the data as an object is never as fast as accessing it through a direct sql query over well designed and normalized relational tables. To be honest i dont think the performance disparage in between relational and eav designs will decrease anytime soon.

    But, as the success of wordpress ecosystem shows, it is possible to forfeit a measure of performance, and still create applications - large and small - which can be used by a large variety and segment of users. Wordpress runs many setups now, from tiny blogs to digital download stores to cnn, reuters' websites. So in that regard, apparently the ease of development is a major factor in regard to an application or framework's ecosystem.

    So what i am doing is to implement the ease of development side of this, without going under the burden of eav/hybrid design performance, AND maintaining the relational database concept to a high degree.

    My foremost concern now - aside from any non-forethought major downside which you people may explain - is to create a distribution of field types which would cover most uses, but still not take too much space when the fields are empty/null.

  10. #10
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    One of the major problems you will find in saving all of the data as varchar will be in ordering. You may input data like this:
    Code:
    1
    2
    5
    10
    But, ordering by that field will give you
    Code:
    1
    10
    2
    5
    Dates get even worse, especially depending on some of the date formats out there. Is 1/31/2014 before or after 10/31/2014? That depends on where "/" gets sorted to in the (extended) alphabet. Then you get to start worrying over things like is "13/13/2014" really a date? Nothing on the database side will bark about that value. And if these things are going to be handled in the front end, then a conversion will have to be attempted, which will be more code on the front end, and lessens (perhaps not entirely) the "saves development time" argument.

  11. #11
    Join Date
    Sep 2014
    Posts
    7
    Quote Originally Posted by MCrowley View Post
    One of the major problems you will find in saving all of the data as varchar will be in ordering. You may input data like this:
    Code:
    1
    2
    5
    10
    But, ordering by that field will give you
    Code:
    1
    10
    2
    5
    Dates get even worse, especially depending on some of the date formats out there. Is 1/31/2014 before or after 10/31/2014? That depends on where "/" gets sorted to in the (extended) alphabet. Then you get to start worrying over things like is "13/13/2014" really a date? Nothing on the database side will bark about that value. And if these things are going to be handled in the front end, then a conversion will have to be attempted, which will be more code on the front end, and lessens (perhaps not entirely) the "saves development time" argument.
    no no, i wont have all data written as varchar. there will be an even distribution of fields. longtext, bigint, blob, date if applicable. i dont know how many per table yet, however there will be.

    there will be more varchar fields since these are needed quite often and in higher numbers in user facing websites.

  12. #12
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    There are only 3 things I am good at in software development. I make crap work faster, I can write large amounts crap in very short time periods, and I get called in to look at troubled projects. EAV models have lined my pockets for years because they always feed into my strengths.

    There was an engineer that became a software developer at the world's largest and most evil oil company. He built a project management system where the requirements was being able to handle an unlimited and undefinable amount of data points. I was brought in to look at performance issues he was having that he could not explain. I relayed the news and demonstrated to him that it was because of his "totally new and unique way" of doing design. I always find it unsettling to see grown men cry.

    RDBMs and set based theory have been settled science for decades now. There are clear ways of doing things right and wrong. Some of the people on this board have been doing database design for decades. They know what they are talking about most of the time.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  13. #13
    Join Date
    Sep 2014
    Posts
    7
    I dont know how i am failing to convey the fact that this is not an eav system.

    Its not even a hybrid like wordpress.

    It can be summarized as a relational table with generic fields, which are mappable to anything through the application.

  14. #14
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Quote Originally Posted by unity100 View Post
    It can be summarized as a relational table with generic fields, which are mappable to anything through the application.
    That is the definition of an EAV. You can eliminate the databasae level from your planned layout easily enough by using XML files and a simple folder hierarchy. With judicious use of file caching by the app or web server, this may even perform better than a database will for the same data load.

    The problems that I've never seen anyone solve is how to handle the inevitable integrity and performance issues.

    When you have more than a few dozen objects stored and have logical relationships amongst those objects, there will be corruption. It will be small at first, but will become unwieldly after some time passes. The design itself will fail a GAAP audit, so you won't be able to secure additional capital to fix the system once it is running.

    Because performance will be tied to the app/web servers and the inter-server network, there is very little that can be done to scale the design. There are some tricks that AWS and/or Azure can make available as stopgaps, but those will only relieve the problems for a very short period.

    If you read Bjarne Stroustrup's book with anecdotes on OOP (I don't remember the title of the book), it gives a great anaylysis of this kind of problem and why it isn't possible to solve the underlying problems.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •