Results 1 to 6 of 6
  1. #1
    Join Date
    Feb 2009
    Location
    New Zealand
    Posts
    3

    Supertype/subtypes appropriate for shared behaviour?

    I'm trying to model a system where a number of entities have similar associations with other entities (common behaviour) and I'm interested in knowing how others have dealt with this.

    We have people, cases (which can have several people as members), and groups (which can also have several people as members). These entities can all have notes, meetings, and tasks. This could lead to a proliferation of join tables, redundancies, etc. etc. I'm wondering if anyone has evidence/experience to suggest whether using a supertype/subtype model in this case (where the subtypes are quite different) would have benefits over separate join tables. The supertype could be 'entity' and the subtypes would be people, cases, and groups.

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    that would work but would add a layer of obfuscation, er, abstraction, that you might not want

    i see nothing wrong with notes, meetings, and tasks tables for each of people, cases, and groups

    so you have 9 tables instead of 7, i think it's simpler
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Feb 2009
    Location
    New Zealand
    Posts
    3
    Thanks for your reply, Rudy. The 3 x 3 table option you suggest would certainly simplify some aspects, but the structural repetition of the x_notes, x_meetings, and x_tasks tables concerns me. Also, it would be reasonably common to need to query across a group of three such tables (e.g. list all tasks that...). I suppose using UNION isn't the end of the world. I just find myself going around and around when trying to decide how to model this.

  4. #4
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    I think I'd tend to go the other way and have a more generic table for People, Cases and Groups. I'd have a common id field and just use a type field to specify what type of entity it is. I'd use a Relationship table to say what people belong in what cases and groups. My reasoning is :
    • All notes, meetings and memos link with a single table
    • The resulting code would be simpler
    • The above also means FKs can be used (if that's important to you)
    • You have the option of having groups being made up of other groups etc

    This type of design would mean roughly 5 tables which is simpler still Just my 2c.

    What's the difference between cases and groups?

  5. #5
    Join Date
    Feb 2009
    Location
    New Zealand
    Posts
    3
    Quote Originally Posted by mike_bike_kite
    I think I'd tend to go the other way and have a more generic table for People, Cases and Groups. I'd have a common id field and just use a type field to specify what type of entity it is. I'd use a Relationship table to say what people belong in what cases and groups. My reasoning is :
    • All notes, meetings and memos link with a single table
    • The resulting code would be simpler
    • The above also means FKs can be used (if that's important to you)
    It most certainly is important to me to use FKs!

    Quote Originally Posted by mike_bike_kite
    • You have the option of having groups being made up of other groups etc

    This type of design would mean roughly 5 tables which is simpler still Just my 2c.
    I'm wary of generic tables like that - they tend to require columns that apply to one type and not another (e.g. first_name), lots of NULLs, and so on. Then you find that process X applies to one type and not another; type A can have related Zs, but type B can't; etc. etc. That's partly why I was leaning towards the supertype-subtype approach, but then Stéphane Faroult (The Art of SQL) got me concerned about what this might lead to:

    You can use subtypes incorrectly. As one of the reviewers remarked, having a kind of super-generic parent table that is referred to several times in the most innocuous query isn't a model for efficiency. Such a super-generic parent table is hammered by all queries if it stores vital information. Subtypes must be born of logical distinction, not of an ill-conceived desire to implement with tables a strong inheritance scheme inspired from object-oriented techniques.
    Quote Originally Posted by mike_bike_kite
    What's the difference between cases and groups?
    That's an interesting point that I hadn't considered - it would appear to be "obvious" what the difference is in the real world system that's being modelled, but perhaps they could be put together and distinguished by their type.

    (Please bear with me...) So far I've come up with five ways of modelling this:

    1. People, Cases, and Groups are all related to one Notes table, which has three foreign key fields (person_id, etc.). For: simple structure. Against: Needs NULLs and constraints to ensure only one FK in Notes is filled; many queries effectively require case statements. This was the first model I discarded because I believe that over time that kind of structure creates more and more headaches...

    2. People, Cases, and Groups relate to separate Person_notes, Case_notes, and Group_notes tables (as per Rudy's suggestion above). For: fairly easy to work with (I think the most complicated queries require UNIONs); FK constraints. Against: repeated structure; similar data (notes) in several tables; does not capture similar behaviour of the three main entities (and any common behaviour added in future requires more repeated structures).

    3. People, Cases, and Groups are all subtype tables related to a supertype Entities table, which in turn relates to Notes (which has entity_id as FK). For: FKs; reduces repetition. Against: adds another layer of abstraction; complicates queries; are they really subtypes?

    4. People, Cases, and Groups have their own join tables (People_notes, etc.) to a single Notes table. For: note data resides in a single table. Is this really just a special case of #2? It might avoid UNIONs for queries that require data from the various note types, but would probably require LEFT OUTER JOINs from Notes instead.

    5. Notes is a supertype of Person_notes, Case_notes, and Group_notes, which relate to People, Cases, and Notes respectively. In practice this is probably just a restatement of #4.

    Sorry to go on at length about this, but it seems to be an example of a situation that wouldn't be too uncommon yet doesn't seem to be very well covered in whatever I've read. I am very interested to see what people think about the various possibilities and am hoping it will be instructive (not just for me!). I think my main issue is that I (still) don't have enough experience to know whether going with model X above will lead to complication Y in circumstance Z.

  6. #6
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by cmbart
    I'm wary of generic tables like that
    For the moment would you mind including all the approaches that were suggested so that we can make a more informed decision. I've listed the tables in my approach below just to make it more clear. Obviously you know what other data goes into meetings etc. The Entities table would have all the fields for person, group and case but allow them to be null though the common fields across all types (like name) would be shared and not allow nulls :
    • Types: name, [apply_process_X , can_have_related_Z]
    • Entities: id, type_name, name, data ...
    • Relationships: id, parent_id
    • Notes: id, note data ...
    • Meetings: id, meeting data ...
    • Tasks: id, task data ...

    Quote Originally Posted by cmbart
    they tend to require columns that apply to one type and not another (e.g. first_name), lots of NULLs, and so on. Then you find that process X applies to one type and not another; type A can have related Zs, but type B can't; etc. etc.
    In the type table above I tried to show how to store this type of info. It might be worth thinking how you'd store this info in the other designs you have. To me it looks easier to store this information using the generic approach rather than having separate tables. Also consider what you need to do if the rules change once the system is live. In the example above you just change the data in the type table what would you do using your other methods?

    Quote Originally Posted by Stéphane Faroult
    Such a super-generic parent table is hammered by all queries if it stores vital information
    If we're talking about reads then it doesn't really matter as the table will just be cached. If we're talking about writes to the table then yes it will cause slightly more locking but then you have to ask yourself how many times a second you'll be adding, updating or deleting a user, group or case. I suspect it won't be enough to make any difference.

    Quote Originally Posted by cmbart
    So far I've come up with five ways of modelling this:
    I'll ignore the fact you missed out my approach and instead say it might be best to just draw the ER diagram on paper and then try some typical queries you might want to write in rough SQL (perhaps search for something, to display the members in a multi layered group, to display all the notes for a group or a person etc). The aim would be to see how easy it is to write the code using each method and then discuss your findings. You might also want to list the features of each design.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •