Page 1 of 9 123 ... LastLast
Results 1 to 15 of 126
  1. #1
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527

    OTLT/EAV design - why do people hate it?

    Definitions
    OTLT=One True Lookup Table
    EAV=Entity Attribute Value (a big param table with associated meta data)

    The issue
    I'm trying to find out exactly why OTLT and EAV are commonly looked down upon by database designers. I feel they have a number of advantages in certain problem areas. I had some experience implementing such a system for storing all the semi static data that describes the configuration within a bank (servers, SAN, networks, printers, people, software etc).

    For the sake of argument lets assume that the EAV method allows us to keep meta data on any field. That we can the link any field to any type of thing we wish to store information about. All the params will be stored in a single table as varchar (I can feel you cringe from here). The lookups would similary be held in a single table (our OTLT) as varchar.

    Advantages of OTLT over multiple lookup tables

    • One easy to find table with all valid values for given fields. This means a maintenance screen can be added so these values can be kept up to date in the future.
    • No need to create a new table for new lookup values. This means the database schema doesn't change, it means no new code needs to be written and it means you don't need to schedule a weekend release for your code.
    • A single stored proc can be provided to test if a value is valid before adding it to your param table. This proc should be the only method of adding data to the database.
    • Fewer tables in a database usually means less code meaning less bugs.
    • All the features you provide in your OTLT will be available for all lookup values. So if you provide a method to order values in HTML forms then this facility will be available for all lookups. You could provide default values or transformations that might turn UNKNOWN values to NULL.
    • I can understand that they don't feel comfortable with storing the allowed values 1,2,3,4 in a varchar field within our OTLT. Would it help if we regard these values not as being integers but simply as valid values that are stored in our valid values table.


    Advantages of EAV over standard design databases

    • The screens can construct themselves from the meta data meaning no work is required to build screens (other than our generic one).
    • The screens for any type of data will be very similar making it easy to understand for users.
    • Meta data can provide appropriate integrity checking which will be used automatically by any input into the system whether a screen or a feed. Warnings can be logged rather than errors being raised which cause other data to be lost.
    • The system can be changed on the fly without releasing new code, UAT's and weekend releases. Most standard databases end up rotting slowly over time as the dev people decide whether to go through the above process or just just squeeze the data into fields that weren't designed for the new purpose.
    • Because of the ease of change this means the system can easily reflect what's currently in place rather than lagging 6 months behind while waiting for coders to catch up. This also means the project is less likely to get canned due to high on going costs.


    Ok I'd appreciate it if people reply with clear arguments rather than just labeling the method a beginners approach etc. If you've never used an OTLT/EAV system then please feel free to try out my little example and keep in mind it takes a matter of hours rather than months to set up a new system with feeds, screens etc. This is not a pitch as it's not for sale - I really do want to know why people are so rabid against OTLT and EAV.

    Mike

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    rabid? where did "rabid" come from?

    you are obvioulsy a programmer, so you see the programming advantages -- and let's be honest, they are considerable!!

    good luck to you, sir
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Actually I make my living designing databases and producing "solutions" - usually financial information or reporting systems using Sybase/MS SQL Server (and Oracle if I'm forced to). I just feel that the database should provide a service to the client and do what's required rather than being some mathematical model of perfection that will soon collapses in a heap the moment I move on to my next project

    Mike

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,566
    "Oh, that way madness lies; let me shun that."
    William Shakespeare, "King Lear", Act 3 scene 4
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I can feel people shaking their heads the moment I mentioned OTLT and EAV, all the web pages seem to suggest it's a path that shouldn't be followed. But the real madness is waiting for a few concreate arguments against the method when I can see so many advantages.

    Mike

    PS Blindman - why do you have a chess opening as your signature?
    PS tgata - there is a query language on the system mentioned above but it's not SQL - try help > reporting language

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,566
    Quote Originally Posted by mike_bike_kite
    I can feel people shaking their heads the moment I mentioned OTLT and EAV, all the web pages seem to suggest it's a path that shouldn't be followed. But the real madness is waiting for a few concreate arguments against the method when I can see so many advantages.
    Yes. You are right, and everybody else is probably mistaken.
    Quote Originally Posted by mike_bike_kite
    PS Blindman - why do you have a chess opening as your signature?
    Because I enjoy chess. It is a game of deep and unfathomable principles. The better you understand the principles, the better you play the game. If you violate the principles, you will find yourself in trouble later. If you follow the principles, you will be able to get out of a lot of sticky positions later on. Sometimes these principles can only be understood through experience, and cannot be easily explained to new players. Often, new players ignore the advice of more experienced players, and have to make these mistakes on their own before they come to an understanding of how the game should be played.
    Are you catching the analogy to OTLT, EAV, and database design?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Often, new players ignore the advice of more experienced players, and have to make these mistakes on their own before they come to an understanding of how the game should be played
    Sadly I'm a bit of an old player already (at chess and databases) and yet still can't see how the cons outweigh the pros for a certain class of application - say configuration management systems. I really am after solid arguments rather than people just saying they have a bad feeling in their water.

    Are you catching the analogy to OTLT, EAV, and database design?
    Probably well before I got to this sentence

    But why the evans gambit as opposed to any other opening?

    Mike

  8. #8
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by mike_bike_kite
    PS Blindman - why do you have a chess opening as your signature?
    it is a gambit

    like every gambit, you take it or you turn it down

    this thread is a gambit, by you

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,566
    No, some gambits are sound. This is more like a new player asking "Why not bring my queen out early? Its my most powerful piece so it makes sense to use her right away."
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    in all fairness to mike da bike, it seems he is asking "okay, but why is bringing my queen out early a bad idea?"

    and if i understand blindman, he's saying you have to write a lot more sql, and then you'll know why

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    In the cases where I have encountered a DBA who liked OTLT/EAV I maintained a logical model with all the look up tables accounted for and allowed the physical design to implement OTLT/EAV. However, as the processing power of hardware increases and the cost of having all the joins for separate lookup tables decreases I have been moving away from any implementation of OTLT/EAV. As a designer, it is just easier to manage and keeps the creation of new lookups out of the hands of the developer and user.

  12. #12
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Originally Posted by r937
    if i understand blindman, he's saying you have to write a lot more sql, and then you'll know why
    That wouldn't count as a conclusive argument to me even if I'd never written a line of SQL. However, having been in this game for 20 years, I still believe OTLT and EAV is the best answer to a certain class of problem. I'm simply looking for a clear cut argument that proves it's not. I won't be impressed by people simply saying that I'll learn in time or that they read somewhere that it's bad - I want to know why it's bad!

    Originally Posted by certus
    However, as the processing power of hardware increases and the cost of having all the joins for separate lookup tables decreases I have been moving away from any implementation of OTLT/EAV.
    As processing power goes up it should favour OTLT/EAV designs which are inherently slower than hand crafted designs. The big factor though is the cost of development & maintenance and that must favour OTLT/EAV - there's little coding required, no table design, and in many cases screens and feeds are built automatically.

    Originally Posted by certus
    As a designer, it [normal table design] is just easier to manage and keeps the creation of new lookups out of the hands of the developer and user.
    Who says the user should do any of this work - it should be fully in the designers hands. I'm sure most users would make a complete mess of building their own homes but that doesn't mean we should all live in caves - we get a builder in. And what can be easier than having all your lookups in one table, presumeably with a simple interface, that allows you to create any lookups that may be needed in the future. The alternative to OTLT is to create new tables each time, insert the values, write code to check against this table, test it, UAT it and then release it over the weekend. I know what seems easier to me!

    Mike

  13. #13
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    14,923
    I see OTLT and EAV as two wholly separate issues, so I'll deal with your thread as having two distinctly different questions.

    OTLT is just fine for very small systems, but it does not scale well. It always melts down on me before I reach a few thousand simultaneous users, often before I reach a few dozen. This is because it creates an arbitrary, artificial "hot spot" in your schema that becomes your application's "Achilles Heel" and your DBAs worst nightmare from a performance management perspective. Since I seem to always end up in the "hot seat" trying to support the production version of the system, I won't knowingly sign up for something that has consistantly proven to be a disaster for me in the past.

    As an analogy for both the postive and negative aspects of EAV, think about the hive structure that is used for the MS Windows registry.

    EAV is wonderfully convenient for the developer becuase the schema can be extended at will and as needed. This is logically just one step removed from the days of "flat file" storage, where every application could store what it wanted in whatever format it wanted to store things... Granted that EAV isn't quite as chaotic as flat files, but the difference is one of degree not of quality.

    EAV is a two-edged sword in this respect... The ease of use for the developer is because the schema is dynamic, just like XML. They can do whatever they want, at any point in time. This also means that they have to live with the potential chaos that this can bring, since a typographical error can cause application failure and as the schema grows and becomes more complex query time grows roughly at a Log(N) rate where N is the number of EAV nodes for well defined queries (which is not bad), and between N * Log(N) and N * N for ill-defined queries (which is REALLY bad).

    Compare OTLT and EAV to a Third-Normal-Form (TNF) schema with suitable indexing... You end up with more tables, but tables are cheap in my world. You end up with more indicies, but well designed indicies are really, really cheap in my world. TNF induces no artificial hot spots. The significant queries can almost always ride an index, so nearly all single row queries perform at very close to Log(N), and multiple row queries are at Log(N) for the first row and near zero for all subsequent rows.

    Besides all of that, you as a developer don't have to fight with your DBA nearly as much, so you can spend more time being productive and coding!

    -PatP

  14. #14
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Yeah, what he says. And what I said.

  15. #15
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Originally Posted by Pat Phelan
    OTLT is just fine for very small systems ... because it creates an arbitrary, artificial "hot spot" in your schema
    The OTLT is a lookup table so it's read rather than written. This should mean it will never get hot spots.

    The param table in the EAV could suffer from hot spots but it depends on how you build your indexes and how the batches shove the data into the system - one batch at a time should present no issues. If you have 1000's of users also trying to add data at the same time then you shouldn't really be using EAV.

    EAV designs are really for semi static data and not high volume transaction systems so hot spots shouldn't be an issue.


    Originally Posted by Pat Phelan
    As an analogy for both the positive and negative aspects of EAV, think about the hive structure that is used for the MS Windows registry.
    The registry is just a complete mess - while the meta data around a modern EAV system should make it difficult to determine whether you are using a TNF database or an EAV design.

    Originally Posted by Pat Phelan
    EAV is a two-edged sword in this respect... The ease of use for the developer is because the schema is dynamic, just like XML. They can do whatever they want, at any point in time. This also means that they have to live with the potential chaos that this can bring
    If you let your users / developers design your database then I suspect you'll get chaos whether using EAV or TNF.


    Originally Posted by Pat Phelan
    since a typographical error can cause application failure
    Why? Even if there isn't a front end that forces you to enter correct values for types ad params I'd just go back, delete the old param and add the correct param - then reschedule the batch to load the data.

    Originally Posted by Pat Phelan
    as the schema grows and becomes more complex query time grows roughly at a Log(N) rate where N is the number of EAV nodes for well defined queries (which is not bad), and between N * Log(N) and N * N for ill-defined queries (which is REALLY bad).
    I'm not that hot at maths but I think I understand what you're getting at. To extract a row from a TNF takes just the time of navigating the index (I assume B-trees give Log(N)). In EAV you need to get a row for each param but remember the params for an object should be next to each other in the param table so once you have the first param the rest will be in the same block - it's a logical read rather than a physical one. Of course EAV will always be slower than TNF but as computers and disks speed up I think this is becoming less of an issue.

    Originally Posted by Pat Phelan
    but tables are cheap in my world
    They're cheap to create but remember that nearly every table needs to have a method of getting data into it - either a screen or a feed. Each new screen/feed requires extra coding which means longer dev times, higher cost and usually more bugs. If a table is just to be maintained purely through manual SQL the data in this table will quickly become stale as the people who take over once you've gone will simply be to afraid to touch it (thats if they know about it).

    Lets say we have a TNF database up and running and we want to add a new table or alter the structure of an existing table. Making the change takes seconds but then we need to test the whole system, perhaps go through a UAT and finally schedule your change on live for some weekend in the future (usually the weekend of your daughters birthday). With an EAV design you just add the new type or param onto the system via a screen - end of story.

    More tables means more expense!

    Originally Posted by Pat Phelan
    Besides all of that, you as a developer don't have to fight with your DBA nearly as much, so you can spend more time being productive and coding
    I design databases (and their corresponding applications) for a living. Nearly always financial systems - nearly always TNF but just occasionally an EAV. DBA's in the UK tend to simply administer databases rather than design them and are usually very friendly

    Mike

    PS Good arguments by the way.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •