Results 1 to 6 of 6
  1. #1
    Join Date
    Mar 2010
    Posts
    3

    Question Newbie Database Design Question, Can the database structure constantly change?

    Hi everyone!
    I have what might seem like a simple/stupid question, but I'm new so please bear with me.
    I'm trying to build a products database with the following relationships for a web application. Each product will have certain features that can change fairly rapidly and I'm wondering how to store the features information. Basically I have two ideas as shown below:

    IDEA ONE

    Tables:
    • Products(Product_ID[PK], Product_Name, Product_Description)
    • Features(Feature_ID[PK], Feature_Unit)
    • Product_Features(Product_Feature_ID, Product_ID[FK], Feature_ID[FK], Product_Features_Max, Product_Features_Min


    In this setup we have one Product_Features table with all data in it which means if I want to find all products with two features I have to do a join as follows (note this may be erroneous thinking on my part):
    SELECT Product_ID FROM Product_Features
    WHERE
    Product_ID IN (SELECT Product_ID FROM Product_Features WHERE Feature_ID = 1 AND Product_Features_Min > 1 )
    AND
    Product_ID IN (SELECT Product_ID FROM Product_Features WHERE Feature_ID = 2)

    Disadvantages:
    This can get really redundant since some of the products will have up 15 features and we might want to search on all of them. It just seems like bad planning to have to join a table to itself at the design phase.
    It also gets tricky if a feature has anything other than numerical values. For example if a product is available in several colors.


    IDEA TWO


    Tables:
    • Products(Product_ID[PK], Product_Name, Product_Description, Product_Archived)
    • Features(Feature_ID[PK], Feature_Unit, Feature_Description)
    • Product_Features(Product_Feature_ID, Product_ID[FK], Feature_1, Feature_2, Feature_3, etc (Not sure if this is necessary, but hold information as to whether a certain product has a feature with only yes or no values.
    • ..... (Lots of different individual feature columns)


    In this set up we break each feature into its own table for example, the Product_Color table would look like this (comma delimited, bold=column names, ends with ):
    Product_ID[FK], Product_Red, Product_Black, Product_White
    1, Y, Y, N
    2, N, Y, Y
    ... etc

    Disadvantages: If the features are changing fairly regularly the database structure would also be constantly changing (which I'm told is bad).

    QUESTIONS: Performance will also be very important, and I really don't know which would do better performance wise.

    NOTES: I honestly prefer option 2, it seems cleaner and more efficient to me, but I really want it to be right. I learned most of what I know about databases and design while I worked for a company that had a horrible data model (eg there were tables that had columns called date_added, date_modified and date_deleted that would all update when a record was modified, empty code tables, and absolutely no documentation -- most of the time to find data you had to use the testing application to reverse engineer data ). Since working there I am now terrified that I will fall into the same mistakes that they did in order to get things working.

    I'm completely open to any suggestions on how it should be set up!

    THANKS FOR ALL YOUR HELP!!!

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    idea two is far less attractive because (a) it violates the spirit of first normal form, and (b) it requires an ALTER TABLE when you need to add a new feature

    idea one is right way, but you don't need a join, nor multiple IN subqueries
    Code:
    SELECT Products.Product_Name
         , Products.Product_Description 
      FROM Products
    INNER
      JOIN ( SELECT Product_ID 
               FROM Product_Features 
              WHERE Feature_ID IN ( 1 , 2 ) /* list of feature ids */
             GROUP
                 BY Product_ID
             HAVING COUNT(*) = 2 /* number of required features */
           ) AS only_these
        ON only_these.Product_ID = Products.Product_ID
    this approach also allows you to select products which have a minimum number of features... for example, the IN list could contain 6 feature ids, and the HAVING clause could specify COUNT(*) >= 4

    you can't do that easily at all with joins
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Mar 2010
    Posts
    3
    Quote Originally Posted by r937 View Post
    idea two is far less attractive because (a) it violates the spirit of first normal form, and (b) it requires an ALTER TABLE when you need to add a new feature

    idea one is right way, but you don't need a join, nor multiple IN subqueries
    Code:
    SELECT Products.Product_Name
         , Products.Product_Description 
      FROM Products
    INNER
      JOIN ( SELECT Product_ID 
               FROM Product_Features 
              WHERE Feature_ID IN ( 1 , 2 ) /* list of feature ids */
             GROUP
                 BY Product_ID
             HAVING COUNT(*) = 2 /* number of required features */
           ) AS only_these
        ON only_these.Product_ID = Products.Product_ID
    this approach also allows you to select products which have a minimum number of features... for example, the IN list could contain 6 feature ids, and the HAVING clause could specify COUNT(*) >= 4

    you can't do that easily at all with joins
    Thank you so much!!! This makes sense.

    I have one more question, how do I deal with features that are not numerical, such as color? Would I have to add a row for every color in the features table?
    I guess i could add a feature group column which will allow me to group feature id's by type such as color, style etc?
    Last edited by temitayo_g; 03-05-10 at 12:17. Reason: an idea strcuck

  4. #4
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by temitayo_g View Post
    how do I deal with features that are not numerical, such as color? Would I have to add a row for every color in the features table?
    that's correct

    remember, your "feature_id" can represent any type of feature
    Features(Feature_ID[PK], Feature_Unit)
    clearly, Feature_Unit would have to be VARCHAR to hold colour names
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  5. #5
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by temitayo_g View Post
    I'm completely open to any suggestions on how it should be set up!
    Depends. I'd like to know what "features" are in this model. Are they really a collection of the same type of attribute, same data types, same constraints and logic that apply in each case? If so, then option 1 looks like it could be on the right track. Not sure why the min and max columns are necessary though.

    If however the features are truly different things with different data types or constraints or other logic then they are truly different attributes of a product. In that case option 2 looks much better and I would try to avoid option 1.

    In either case, don't be afraid of change because change is good. If a business isn't changing then it is dying. Data models are a representation of some portion of the business and therefore it is natural that data models must change too. The only question is how you manage change, not how to avoid it.

  6. #6
    Join Date
    Mar 2010
    Posts
    3
    Quote Originally Posted by dportas View Post
    Depends. I'd like to know what "features" are in this model. Are they really a collection of the same type of attribute, same data types, same constraints and logic that apply in each case? If so, then option 1 looks like it could be on the right track. Not sure why the min and max columns are necessary though.
    The reason that I preferred option 2, was because not all features will have the same data types. I didn't post the specifics is because of the nature of the products and I was afraid of getting banned. Here is an example that hopefully wont offend anyone.
    One feature is speed, not all products will have a speed setting, some will only have one speed and others may have 3. However on some products the speed functions are customizable and the min and max are to reflect that. For Example
    If a product has a max feature speed of 0 then it doesnt have a speed feature
    If has its min and max equal and not equal to 0 then speed is not customizable
    If it has its max > the min then it is customizable
    (guess there could be a Product_Customizable table that could store that information)

    On the other hand the unit, Max and Min mean nothing to a feature like color. In that case there would have to be a different feature id for every possible color. (note: I'm thinking that we would have to have Feature_Group Table and add a Feature_Group_ID to the features table that way we can always tell when a feature has a group associated with it. Which may solve the speed type problem)

    I hope I'm making sense. I'm getting to the point where I'm thinking aloud but it helps to have a sounding board.
    THANKS SO MUCH FOR THE HELP!

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •