Results 1 to 9 of 9
  1. #1
    Join Date
    Mar 2010
    Posts
    1

    attribute data for many product types

    Hi all,

    I am in a bit of a quandary regarding my design for a website that has to keep track of thousands of different product types (e.g. cars, speakers, baby strollers, etc.) and store attributes (e.g. turning radius, maximum output, color, etc.) for each.

    My understanding is that if people are going to be searching for, say, all blue baby strollers, I'm best to have a table for each type of product (in addition to a "generic" product table that stores things common to all products, like name and brand). If it weren't for concerns about searching speed I'd be keen to use an EAV type of model, but I've been discouraged from that based on things I've read.

    This seems like a common problem, but I have yet to see a good answer for how to deal with what I'm trying to do. I know there are a ton of sites out there tracking this kind of data, but I'm finding it difficult to imagine that they've got thousands of different tables in order to have a "details" table for each product type. Is it possible they're all using EAV and that the EAV-naysayers, while technically correct, are exaggerating the performance loss of using that type of model? Or are they just using one big "product" table with columns covering every attribute, and just not worrying about all the empty values?

    Any help would be greatly appreciated. I just know there's a common way of doing this, and I'd really like someone to let me in on it!

  2. #2
    Join Date
    Feb 2010
    Posts
    16
    If you ever obtain a definitive answer, let me know..!

    Meanwhile, I found the following document helpful:

    http://pugs.postgresql.org/files/Hierarchial.pdf

    Cheers.

  3. #3
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Quote Originally Posted by neomech View Post
    EAV-naysayers, while technically correct, are exaggerating the performance loss of using that type of model?
    They are and they are not. For simple queries, such as "where type='speakers' and brand='somebrand'", there is no problem. However, if you go a couple of steps further, which is bound to happen, and look for "type='speakers' and 'max output' > 200 and size in ('large', 'medium')", and the consequences are often catastrophic.

    In my view, EAV model is a valid approach, as long as a) queries against are of a simpler kind; or b) you foresee performance issues and deal with them early on. Some of the approaches I saw were: materialized views; full text indexes; table partitioning; even indexing of the resulting HTML pages, where the search would not query the database, but rather "google" the pages generated by the application.
    ---
    "It does not work" is not a valid problem statement.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    The big danger in EAV (what us "Naysayers" are continually warning about), is doing the WHOLE DATABASE as EAV. If you needed one EAV table simply for tracking product attributes, that would probably be OK. Another alternative for dynamically defined fields is the XML datatype which is available in SQL Server.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by neomech View Post
    My understanding is that if people are going to be searching for, say, all blue baby strollers, I'm best to have a table for each type of product (in addition to a "generic" product table that stores things common to all products, like name and brand). If it weren't for concerns about searching speed I'd be keen to use an EAV type of model, but I've been discouraged from that based on things I've read.
    With EAV you shouldn't get any problems with search speed as you'd just search for entities of a particular type with a colour attribute of blue. This is one of the main indexes on the table and it will return the ids of all the blue baby strollers. If there are other attributes to filter by then you apply these filters to the set of ids above. You might well get performance issues if you try to generate traditional reports using the EAV model but I'm not sure you need to do this anyway as sales etc can be stored in normal 3NF tables.

    The 3NF alternative of using a 1000 tables may be technically correct but it would be a bitch to maintain adding new fields and tables each time your product types alter. I doubt if it would be very good performance wise either as you'd have to pick which table to use and find what fields are available on every SQL operation.

    The hierarchical model is an option work but you still have the issue of having to alter the schema to add new product types. You also have the performance issue of working out where your data will come from. Also as a developer you'll now have to think hard each time to work out where in your schema you'll find the data you need.

    The simple way to test the above schemas is to just draw each ER model on paper (limit it to hold just 10 product types) and then try writing some SQL to query it. Now imagine having 1000 product types. The EAV search query will look identical no matter how many product types you have so it seems like a simple choice to me.

    You won't get many folks recommending EAV on this forum as it's not a good approach in most situations and it does have a number of well documented drawbacks but it can solve your issue regarding how to manage 1000's of product types each with different (and perhaps changing) fields. The main thing is to be very careful about doing integrity checks manually as the normal checks available through the RDBMS wouldn't work ie domains, FK, types.

    Quote Originally Posted by n_i View Post
    They are and they are not. For simple queries, such as "where type='speakers' and brand='somebrand'", there is no problem. However, if you go a couple of steps further, which is bound to happen, and look for "type='speakers' and 'max output' > 200 and size in ('large', 'medium')", and the consequences are often catastrophic.
    I think these types of queries can be handled in EAV. Just pull the ids of all the entities of type "speakers". Then just iteratively filter by each criteria ie 'max output' > 200 and size in ('large', 'medium')". You can improve the performance by recording how restrictive each attribute is - size only has 3 values (large,medium and small) - then apply the more restrictive filters first. A better improvement is to apply the most restrictive filter when getting your initial set of ids. The EAV queries will always be slower than a dedicated query written to work on a single table but not catastrophically slow.

    Odd - I was expecting to be publicly stoned for supporting any application of EAV so I was quite careful how I wrote my words. Imagine my surprise when I pressed send and found n_i and blindman already supporting (a partial) EAV solution
    Last edited by mike_bike_kite; 03-16-10 at 16:19.

  6. #6
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Quote Originally Posted by mike_bike_kite View Post
    I think these types of queries can be handled in EAV. Just pull the ids of all the entities of type "speakers". Then just iteratively filter by each criteria ie 'max output' > 200 and size in ('large', 'medium')". You can improve the performance by recording how restrictive each attribute is - size only has 3 values (large,medium and small) - then apply the more restrictive filters first.
    Well, this isn't the worst case that can happen. Imagine an address table implemented as EAV, with each address element being a separate record. For a few million people. With potentially multiple addresses. And a search form that ORs every criterium that is entered and builds the query dynamically (or doesn't, which is even worse). And 300 concurrent users, each beginning every transaction with an address lookup.
    ---
    "It does not work" is not a valid problem statement.

  7. #7
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by n_i
    Well, this isn't the worst case that can happen. Imagine an address table implemented as EAV, with each address element being a separate record. For a few million people. With potentially multiple addresses.
    I totally agree that EAV would be very inefficient in that particular scenario but then I guess most of us (and certainly including me) would just use a standard 3NF table to store such data. EAV should only be considered as part of a solution if the world you're modelling is likely to change frequently. So isn't it fairer to compare EAV to a 3NF solution in just such a scenario - say the problem posted by the OP

  8. #8
    Join Date
    Jun 2003
    Location
    Toronto, Canada
    Posts
    5,516
    Quote Originally Posted by mike_bike_kite View Post
    EAV should only be considered as part of a solution if the world you're modelling is likely to change frequently. So isn't it fairer to compare EAV to a 3NF solution in just such a scenario - say the problem posted by the OP
    Sometimes you have to live with decisions made by others. I guess my point was that, in whatever scenario, EAV is a valid approach but can fail spectacularly if pushed hard and far enough. As long as you understand the implications, you don't have to religiously avoid using EAV.
    ---
    "It does not work" is not a valid problem statement.

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Recursive EAV is where things really get hairy. If you find yourself being drawn to that point, you should really consider an XML datatype instead.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •