Page 1 of 2 12 LastLast
Results 1 to 15 of 22
  1. #1
    Join Date
    May 2004
    Posts
    7

    Question on self-containment in databases

    I have a question about database design and I was wondering if anyone here could answer it.

    I'm working on a database which has "items". The catch is that these items can contain other items.

    Example: A bag of toy soldiers. Each soldier is considered an item, and the bag itself is considered an item. I have to express both items, as well as the fact that the bag contains the soldiers.

    Thus far I have two tables. ITEM and ITEM-ITEM (as bizzare as that sounds).

    ITEM is straight forward, just the ItemID and other related fields (name etc.). ITEM-ITEM has "ItemID:ItemID:No" where the first field is the container item (the bag), the second is the contained item (the soldiers) and the number is how many are held.

    I don't like this solution though, as it uses the same primary key twice and, to be honost, just looks wrong.

    What is the proper way to solve this?

  2. #2
    Join Date
    Mar 2001
    Location
    Lexington, KY
    Posts
    606
    Well, in your example there's nothing stopping you from having soldiers which contain other things (e.g. a meaningless assertion that a soldier contains something is possible in your schema).

    So maybe you could have three relations:
    CONTAINER_ITEMS( stuff ) <- bag would go here
    REGULAR_ITEMS( stuff ) <- soldiers would go here

    CONTAINED_IN( container_itemID, regular_itemID, etc. )

    Hmm.. That doesn't seem nice either..
    Thanks,

    Matt

  3. #3
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    It appears that there are two distinct concepts that you are interested in:

    1. Items; and
    2. Collections of Identical Items.

    And it looks like a customer may buy either a set of different items or a collection of items.

    Is there any difference between a "collection" of items and a customer buying 20 copies of a particular item? In other words, does a collection have a predefined number of identical items? For example, you can have a collection of either 10 or 20 or 30 toy soldiers but no other combination is possible.

    If this is true, then you can treat each such collection as a distinct "item" having a special type of item, collection, that lists the quantities.

    For example, one such data model would be:

    Table Item (Item_Id, Name, ....);
    Sample data:
    Row 1: 1, 'Toy Soldier', ...
    Row 2: 2, 'Barbie Doll', ..
    Row 3: 3, 'Toy Soldier Group (10)', ..
    Row 4: 4, 'Toy Soldier Group (20)', ...

    And a second table Collection (Collection_id, component_id, quantity,...)

    Both collection_id and component_id would refer to item id, and every item that is a collection would have exactly one row in collection table to identify what items it holds and how many.

    Sample data in collection table could be:
    (Collection_ID , Component_ID, Quantity, ...)
    Row 1: 3, 1, 10
    Row 2: 4, 1, 20

    This way, it is easy to find the total of any items sold.
    For example, to find the total "Toy Soldiers" sold,
    you would first get the item_ID for toy soldier,
    then, all rows in Collection that have component_Id pointing to "Toy Soldier" and the quantity,

    With this information you can query the "sales" table to find the total sales of "Toy Soldiers".

    In the interest of consistency, you could also treat every item as a collection, some of them clearly having only itself as component with a quantity of 1.

    Hope that was not too long. And hope it is useful to you.


    Ravi

  4. #4
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    In the original post, Oberiko said:

    I don't like this solution though, as it uses the same primary key twice and, to be honost, just looks wrong.

    Whenever we are dealing with a hierarchy, there is always a table that has two foreign keys refererring to the same parent table.

    This situation deals with a one-level hierarchy of parts. Therefore, the two foreign keys referring to the same parent table occurs here.


    Ravi


  5. #5
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Your dealing with a recursive structure. You only need one table to represent it.

    Item(item_id, parent_item_id, item_name)

  6. #6
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,


    There are two ways of handling recursive structures: using one table or using two tables.

    With the one table approach, you get a "self-referencing" entity. I do not like them mainly because there will be nulls in the parent_item_id column. That causes problems while querying.

    Additionally, in this case, it does not seem like a parent-child hierarchical relationship, but is a "collection" of identical items. Parent child relationships usually are not like this. A parent is almost never a collection of identical children.

    Therefore, my instinct is to go with the two table approach in this case. Additionally, the two table approach lets us track how many identical items make up a collection. For example, we can say that there are two collections of "toy soldiers", one package with ten toys and the other with 25 toys.

    In a single table approach, it is not clear to me how this information can be presented without changing the meaning of the item table.

    For example, if we include quantity in the item table, it now reads

    Item (item_id, parent_item_id, item_name, quantity)

    This looks like it is more than an Item entity. What exactly "quantity" means is not obvious from the table definition.

    In the two table approach, the meaning of the term quantity is more readily seen. Not crystal clear, but clearer.

    As I stated at the beginning of my first message in this thread,

    It appears that there are two distinct concepts that you are interested in:

    1. Items; and
    2. Collections of Identical Items.


    That was my rationale for the two table approach.

  7. #7
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    In the one table approach you do not have to have nulls, but a "null symbol". For example, wherever a null is to be used you programmatically use an asterisk "*" instead. Then your queries aren't hindered by nulls.

    Quantity is actually part of the name of the item Soldiers_1, Soldiers_10 and Soldiers_25 not a separate attribute. However I don't think you want to dedicate a row to each toy soldier. This is where item quantity actually comes in. Soldiers_1 quantity 10, Soldiers_10 quantity 1. You will have a row for each batch of soldiers that can have the quantity altered and you will have a row for each bag containing a batch that can have the quantity altered. However, if you go this way you must be able to "break" a bag if one soldier in the batch is sold. In otherwords, if a child has it's quantity reduced the parent disappears.
    Last edited by certus; 05-21-04 at 00:54.

  8. #8
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    I am a little bit surprised at the one table solution.

    Certus said: Quantity is actually part of the name of the item Soldiers_1, Soldiers_10 and Soldiers_25 not a separate attribute.

    If the names are to include quantity as well, then we are violating one of the fundamental principles of good database design: a column should mean only one thing. Now that this column also includes quantity, what will the name of this column be?

    Aside from that, I think you allude to the fact that if we change the quantity of items in a "Collection" item, then you will have to add a new item to the database.

    The more significant concern I have is this: How do you calculate the total soldiers sold?

    You must parse the name of the item. So now, you have a rule that does this parsing.

    Additionally, there must be a rule saying that the last part of the name of an item must include its quantity. Some data entry person may inadvertently violate this rule and then your queries are meaningless.

    This arises from the fact that quantity is implicit, not explicit.

    The query to calculate the total quantity sold becomes very difficult to write and easy to be subverted by improper naming of items.

    Information is now hidden, stored in programmers minds, but not directly in the system.

    This is what happens when one takes a simplistic approach.

    I prefer explicit solutions where the database has the information, rather than implicit solutions where knowledge is stored in programmers minds only.

    Ravi

  9. #9
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Maybe you should read my post again.

  10. #10
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    Certus, I have reread your email and have not received any further enlightenment.

    Coming back to the topic under discussion: could you please tell me if you disagree with the statement that the inclusion of the quantity in the name violates fundamental design principles?

    If this is indeed a violation of design principles, why should we use the one table approach? What is the benefit that we get that outweighs the problems that we face when violating fundamentals?

    Also, in the one table approach, how do you calculate the total quantity of "soldiers" or any other "atomic" item sold? That is not easy and involves parsing the name.

    Ravi

  11. #11
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Hi,

    In an earlier post, certus said:

    In the one table approach you do not have to have nulls, but a "null symbol". For example, wherever a null is to be used you programmatically use an asterisk "*" instead. Then your queries aren't hindered by nulls.

    Using "null symbols" for nulls is efectively the same. You end up writing more complicated queries to handle the "null symbol" case.

    Ravi

  12. #12
    Join Date
    Dec 2003
    Location
    Canada
    Posts
    710
    Inverted hierarchy

    One table

    Null symbol "None"
    Attached Thumbnails Attached Thumbnails soldier.gif  

  13. #13
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    Certus,

    I am a bit confused here.It looks like you are referring to a "sales" or "order_items" table with the structure displayed in the table in your post? (item, item_qty, child_item, child_item_qty) In addition, there will probably be an "order_id" column. Thus, the primary key will be a composite (order_id, item_id)


    In any case, you can clearly see that the child_item and child_item_qty are directly dependent on the item_id, which happens to be a part of the primary key, but not the whole key. This creates data integrity issues.

    If two orders are for the same item, the the data will look like this:
    OrderID1, Bag 10, 2, Soldier, 10
    OrderId2, Bag 10, 4, Soldier, 10

    As you can clearly see, "Soldier" and "10" appear in both rows and are dependent only on "Bag 10", not the order ID. As such, they are redundant.

    All these problems arise because your one table design violates fundamental design principles.
    I have yet to see any benefit from the one table approach that justifies this violation of fundamentals.

    As I get more explanations, I see only that the design is becoming more and more complicated, yielding no benefits in return.


    Ravi

  14. #14
    Join Date
    Mar 2001
    Location
    Lexington, KY
    Posts
    606
    I still think that three relations would be required -- because there's no way to specify which items can be a 'bag' and which are not, so you could logically have 'Soldiers' contains 10 Soldiers, which is obviously invalid.
    Thanks,

    Matt

  15. #15
    Join Date
    Apr 2004
    Location
    Toronto, Canada
    Posts
    249
    MattR writes:

    I still think that three relations would be required -- because there's no way to specify which items can be a 'bag' and which are not, so you could logically have 'Soldiers' contains 10 Soldiers, which is obviously invalid.

    Actually, the constraint can easily be imposed by requiring that the item_id and the parent_item_id are not the same. This constraint can be imposed as a check constraint on the table.

    Of course, this leads to the possibility that we can have a bag of bags.
    That would be handled by a trigger that said that if any item exists as a bag, it may not be used as a part of another bag.


    Ravi

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •