Page 1 of 2 12 LastLast
Results 1 to 15 of 25
  1. #1
    Join Date
    Mar 2010
    Posts
    10

    Normalization Vs. Overdesign

    I'm working on redesigning a database of the "technically works but doesn't really work the way a database should" variety. I think that, at some point, my predecessor just gave up and crammed everything into one table. I'm trying to correct that.

    Some background information: Our product is customized to order. A set of customized options is referred to as a "Specification" and anywhere from 5-20 different components can be selected. Not all components are applicable for every spec.


    What I can't decide on is how far it really needs to be corrected. If I break out every component into its own table, I'm going to end up with a lot of tables that have <5 entries, some of which will consists of little more than an ID field and one additional information field. It is unlikely that any scalability will ever be needed for these properties, either.

    As an example, let's say the product is... shoes. I have a ShoeSpec table consisting of (SpecID, CustomerID, HeelID, LaceID, price). In reality, this is a much more involved product with a lot more traits per spec, but this should work for the sake of example. SpecID and CustomerID are both keys for a spec, as the price is dependent on both.

    I would then have the Lace table(LaceID, Lacetype), and Heel Table (HeelID,HeelType).

    The only data that will ever be in the lace table, though is
    ID|Type
    1|Laceless
    2|Round
    3|Flat

    And for heels
    ID|Type
    1|Stiletto
    2|Flat

    Question 1: How necessary and desirable is it to break all of these out like that, given their static nature and the limited amount of data? I feel like, even though this may be the proper way to do it, I'm making a mountain out of a mole hill.

    Question 2: Keeping with my shoe example, but adding one further degree of complexity, suppose I wanted to add aglets (that plastic bit on the end of a shoe lace) to the specification. I can see two ways to go about this:

    In both, I would first create an Aglet table (AgletID, color).

    What I'm not sure on after that is how to include that into a specification. An aglet, being ultimately a trait of a shoelace, could be added to the lace table, which would become: (LaceID, Lacetype,AgletID)

    Alternatively, I could add it directly to the specification table, which would now be (SpecID, CustomerID, HeelID, LaceID, AgletID, price)

    Finally, I could create a third table, ShoeLaceSpec(SLSID, LaceID, AgletID), which would then contain the assorted combinations of laces and aglets.

    If this were done, the Spec Table would now look like (SpecID, CustomerID, HeelID, SLSID, price).

    Basically, in this case, would it be best to make aglets a linked property of laces, make it just another aspect of the final product specification, or would it be best to combine the two into a sub-specification that is then used in place of either in the final product spec?

    Thoughts, advice, input? Am I overthinking this? I don't see any reason that any of the above wouldn't necessarily work, but I feel like there is probably an objectively correct way to go about it, and I'm partial to doing it that way.
    Last edited by zeroinsum; 03-03-10 at 18:31.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    "I'm trying to correct that."

    why?
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Mar 2010
    Posts
    10
    I hope you'll excuse my snark, but that... seems like an odd question coming from someone who apparently has written a book on the topic. However, I will answer it:

    For a variety of reasons.

    First of all, on sheer principle, its design is just objectively wrong in a lot of ways. For example, the order table contains fields named "Quanty1, Description1, Quantity2, Description2.." etc., all the way out 5 deep. This is how our bills of lading are currently populated. A completely separate table is used for actually entering the specification information, but for some reason isn't referenced for generating a bill of lading.

    This means there is needless work being done (in the form of duplicate data entry), and, additionally, more opportunities for mistakes to be made.

    Furthermore, from a more functional perspective, while at present it is adequate for tasks such as data entry and generating a bill of lading (though, as mentioned, that is needlessly complicated and error-prone in its present design), it doesn't work so well for, say, reporting, because the data entry is incredibly arbitrary in the present design. If I wanted to produce a report on high-heeled shoe sales, I can only do that by filtering for "highheel" on the product name and hoping that what I get in the report actually reflects reality.

    So, out of curiosity - your question seems to be implying that it's not worth correcting. Might I ask why that is?
    Last edited by zeroinsum; 03-03-10 at 19:17.

  4. #4
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    No, you are not overthinking this. The way you have designed it with aglet as a sub table of lace is correct, otherwise you might end up with a situation where they opted for no laces but specified aglets.

    Properly done normalization can create many tables that have less than 5 rows. If the answer is only YES or NO then you shouldn't create a table, but if there is any possibility the options could be added to then they should be their own table even if there are only two currently. I have seen tables like Gender with only two rows (Male and Female) get other rows added as the user finds other requirements (this was on a parts table where later Eyelet and Neither was added).

  5. #5
    Join Date
    Mar 2010
    Posts
    10
    Thanks for the input, Mark. One further question (actually, I'm sure I'll have many more questions to ponder as, like I said, the actual product specifications are a bit more involved than my example):

    Having settled on AgletID as a reference on the Lace Table, suppose that the customization was taken a step further, and the customer could specify different aglets for each end of the shoelace?

    At that point, it seems like I could flesh out the Aglet table from (AgletID, Color) to, say, (AgletID, LeftAgletColor, RightAgletColor) and the aglet table would then consist of all of the potential aglet combinations. This seems awkward to me, especially if there is no functional difference between a "left" aglet and a "right" aglet.

    Alternatively, I guess - correct me if this wouldn't work as I think it would - The Lace table could be changed to (LaceID, LaceType, LeftAgletID, RightAgletID), both of which would link to the Aglet table's AgletID field which, again, seems awkward to me.

    Is there a better way to handle that?
    Last edited by zeroinsum; 03-03-10 at 21:02.

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by zeroinsum View Post
    your question seems to be implying that it's not worth correcting. Might I ask why that is?
    you inferred, i didn't imply -- i was merely asking for details, and you provided some



    in my experience, a system that "technically works" is often best left alone, but you will of course be the final arbiter as to whether the cost and effort to change the system is worth it from an ROI perspective
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    I could see aglets having attributes of color, shape, and length. At least the first two would be foreign keys to tables containing possible aglet colors, and aglet shapes. The length could be either a numeric field or might be a key to a table if only specific lengths were allowed.

  8. #8
    Join Date
    Mar 2010
    Posts
    10
    That's very true, and I suppose that would further render (AgletID, LeftAgletColor, RightAgletColor) a rather improper way to do it.

    I have, I think, one last question before I'm ready to start building tables...

    I'm going to depart from my shoe example for this. I think it's not a great one for this next question.

    So, still a customizable product that is ultimately being assembled to spec.

    My last issue that I'm not sure how to deal with is the fact that there are two different product lines. These two items are the two base products - everything else is a modifier to these.

    While there are some traits that are qualitatively identical between the two, other traits are only applicable to one or the other.

    For instance, let's say the product is... containers. And the two major product lines are "buckets" and "boxes". So, for the sake of example, let's assume the following:

    -For any individual specification, a box Vs. a bucket is a mutually exclusive option. A specification cannot include both a box and a bucket.

    -They both have certain traits that are qualitatively the same (even if quantitatively different) - let's limit those to "volume" and the material they are made out of.

    -Both have a material field, but the materials lists are not the same. For example, say buckets can be made from metal or plastic. Boxes can be made from wood, cardboard, or plastic.

    -Buckets have a handle and a lid, boxes do not.

    So, here's what I need some advisement on:

    Question 1: Is it necessary, in this case, to have a buckets table and a boxes table? It seems like it is, for the sake of normalization, but it also seems like I should be able to have a "products" table for, well... my products. It seems rather ungainly to need a different table for each product.

    Question 2: Assuming I do split them into a buckets table and a boxes table - If I incorporate them both into one specification, since bucket vs. box is a mutually exclusive option at the specification level, I will have a lot of "Not applicable" fields. For instance, if a specification were for a box, the "lid" and "handle" fields would be not applicable.

    Does this mean I should further split specifications into BucketSpecs and BoxSpecs? While doing that eliminates the "Not applicable" problem, it seems to cause an additional problem or two.

    For instance, I have the Orders table (OrderID, Date, CustomerID) (there is more to it, of course, but I'm trying to keep things abbreviated), and the OrderDetails table (OrderDetailID, OrderID, SpecID, QTY) - that is the individual line items for a given order (which can consist of multiple specifications).

    If I split the bucket specs and the box specs into separate tables, it would then have to be (OrderDetailID, OrderID, BucketSpecID, BoxSpecID, QTY) which seems to leave me right back where I was before, in that I have two mutually exclusive fields.

    Additionally, splitting the specs up would necessitate two separate data entry controls for placing an order - one for boxes and one for buckets.

    Question 3: Materials. It's a shared trait, but the list is different for a box than for a bucket. Would I be better off creating separate tables for BucketMaterials and BoxMaterials, or would it be acceptable to use data validation methods to ensure a wooden bucket isn't spec'd out?

    So, how would you handle this? Unless I am missing something, it seems like I will be forced to employ a data validation check at some point along the way to ensure that mutually exclusive data is not being entered. If I split everything out (which, for normalization purposes seems like the right thing to do), I appear to get stuck with a slightly more cumbersome UI.

    Thoughts? Am I missing something obvious here?
    Last edited by zeroinsum; 03-04-10 at 13:27.

  9. #9
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    My thoughts:

    Order (OrderID, Date, CustomerID)
    OrderDetail (OrderDetailID, OrderID, ProductType [Box/Bucket], Qty)
    BucketDetail (BucketDetailID, OrderDetailID, Material, etc)
    BoxDetail (BoxDetailID, OrderDetailID, Material, etc)

    ProductTypeMaterialOptions (ProductType [Box/Bucket], MaterialID) -- Lists valid materials for each product type.

  10. #10
    Join Date
    Mar 2010
    Posts
    10
    That's actually fairly close to what I've come up with so far. The bucket and box detail tables aren't something I'd come up with yet. I'm trying to see how they fit in. I'm going to tinker for a bit, maybe post a diagram of where I'm at if I just can't get it sorted after a bit of that. Thanks again for your help, Mark.

  11. #11
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by zeroinsum
    Thoughts, advice, input? Am I overthinking this?
    What is it your database actually holds information about? Is it actually shoes? It just seems unwise to get recommendations on how to design a database but not tell us what the database is going to hold. There is also no idea of scale ie number of different types? how many layers there might be ie shoes - shoe laces - aglets (I've learnt a new word). Or how much data is going to be held. This might make a difference on the design.

    It would also help if you could provide more detail as to what benefits you want from the new design over your existing design. As Rudy has already pointed out - if there are no real benefits from all this change other than it being better normalised then someone at some point is going to ask why you spent a load of time and money making all these changes while running the risk of ending up with a system that might not be as effective as the original system.

    Can I also ask why you don't just have a single field holding a short description of the spec ie "black laced stileto shoes" and save yourself all the hassle?

    Mike

  12. #12
    Join Date
    Mar 2010
    Posts
    10
    What is it your database actually holds information about? Is it actually shoes? It just seems unwise to get recommendations on how to design a database but not tell us what the database is going to hold.
    It's not actually shoes, but the abstraction is close enough to be relevant. I'm also not looking to have anyone design it for me. What I'm looking for here is advice on good practices and proper design. I could fairly trivially redesign this to work adequately (far better than it presently does, at that). Redesigning it to be merely okay isn't really why I'm here.

    It would also help if you could provide more detail as to what benefits you want from the new design over your existing design. As Rudy has already pointed out - if there are no real benefits from all this change other than it being better normalised then someone at some point is going to ask why you spent a load of time and money making all these changes while running the risk of ending up with a system that might not be as effective as the original system.
    The primary benefit I'm expecting to get from this is data that is actually reliable, highly representative of reality, and is actually robust enough to be utilized beyond the limited capacity in which it is presently.

    As I said before, it currently "works" only in some regards. In others, it leaves a lot to be desired. An Excel sheet would "technically work" for this in some regards, too and, similarly, if I'm really bent on it, I *could* use a hammer to pound in a screw - that's hardly the right way to go about it, however.

    If nothing else, its current format requires duplicate data entry for every line item of every order, and that, all by itself, is ghastly.


    Can I also ask why you don't just have a single field holding a short description of the spec ie "black laced stileto shoes" and save yourself all the hassle?
    There WILL be a "shortdesc" field, but the inclusion of such is not some cureall - if it were, you may as well ask, "Why bother with normalization - or even relational databases at all, for that matter - when you can just jam everything into an Excel sheet and call it a day?" So, yes, I'll have a description field, but its function will be for easy recognition and differentiation of specs - a convenience - as a description really should be, not as a lazy method of storing individual datapoints. At present, a description field is basically all there is. That's not a very reliable way to keep track of information. If, for example, I want to know how many stiletto shoes we have sold, I have to hope that, for all of the (potentially thousands) of different specifications calling for a stiletto heel, nobody accidentally misspelled "stiletto" somewhere along the way.
    Last edited by zeroinsum; 03-04-10 at 18:31.

  13. #13
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    I for one am glad you are taking on a design and doing it right, not just good enough to get by. I have inherited too many badly designed databases.

    I also have no problem with using example products instead of your real ones. The design concepts hold true regardless of product. The important part is that we communicate well enough that you can get the design concepts you need and can apply them.

  14. #14
    Join Date
    Mar 2010
    Posts
    10
    Okay, having tinkered a bit, I think I must be missing your intent, Mark.

    Order (OrderID, Date, CustomerID)
    OrderDetail (OrderDetailID, OrderID, ProductType [Box/Bucket], Qty)
    BucketDetail (BucketDetailID, OrderDetailID, Material, etc)
    BoxDetail (BoxDetailID, OrderDetailID, Material, etc)

    ProductTypeMaterialOptions (ProductType [Box/Bucket], MaterialID) -- Lists valid materials for each product type.
    I think I follow you on most of that, but I lose you with the relationship between OrderDetail, BucketDetail, and BoxDetail. If I'm understanding your intention correctly, they would function as subtables of the OrderDetail table, connected by the orderdetailID as a foreign key. I can't really picture how that works in practice. I can see where something like (for example)...

    OrderDetail(OrderDetailID,BucketDetailID,BoxDetail ID,QTY)
    BucketDetail(BucketDetailID,MaterialID)
    BoxDetail(BoxDetailID,MaterialID)

    ....would work (well, except for that mutual exclusivity issue), but I can't quite grasp the other way around. Unless I'm missing some additional relationship, it would seem like the correct detail table (box vs bucket) would have to be dynamically chosen (?)

  15. #15
    Join Date
    Jul 2009
    Location
    Michigan
    Posts
    125
    When selecting an order you would select from Order and use an INNER JOIN to OrderDetail on Order.OrderID - OrderDetail.OrderID. You would also use a LEFT OUTER JOIN to BucketDetail on OrderDetail.OrderDetailID = BucketDetail.OrderDetailID and another LEFT OUTER JOIN to BoxDetail on OrderDetail.OrderDetailID = BoxDetail.OrderDetailID. Depending on the product type one of the LEFT OUTER JOINs will fail and one will succeed. The OrderDetail.ProductType column will tell you which of the joins should have worked, which in turn tells you which columns you are expecting to find information in, either BucketDetail columns or BoxDetail columns.

    Depending on your application you might do that as all one join as above, or just go as far as the OrderDetail table until the need for more information comes up - in which case you can use the ProductType column to know which table to go look for that information.

    So instead of having mutually exclusive columns you would have mutually exclusive tables. A specific OrderDetailID would only be found in one of the sub tables and not the others.

    Hopefully that is clearer.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •