Results 1 to 7 of 7
  1. #1
    Join Date
    Nov 2011
    Posts
    3

    Unanswered: Please help me understand this schema, and validate my understanding

    Hi,

    Trying to understand the database schema from a PHP based simple blog app (tutorial), as captured with MySQL designer here.

    It models typical blogs (like Wikipedia, Typepad etc., but heavily simplified as a demo), when an User (typically the Author) writes Blog posts, and other Users add comments. For ease of classification/searching, the Blog has few Tags (similar to these forums here on dbforums.com).

    Few things that I am wondering about are:-

    1. The "Tags" table has no relationship with "Post" table, although a "tag" attribute is there. Isn't this a bad idea in the sense that, assume that the Author wrote 5 posts all containing some tag "xyz", which the author later modified to say "abc". This change is likely to be visible on the posts themselves, but in the tag-cloud (showing all available tags in this Blog), would still show "xyz", even though any related post is no longer available ?

    2. Why are the 2 relationships "Non-identifying" ? From what I understand the difference between identifying & non-identifying relations, is in the fact the case of identifying relationship, the existence of child entity (s.a. Post) is subject to condition that parent entity (s.a. Author) exists. Say, if an Author's account is deleted for some reason, I'd expect all of their Posts, the Comments on each of their Post, to also be deleted. So, the relationships should've been identifying.

    Of course, the blog web application seems to work fine with this data-model, but can I assume that the data-model is far from perfect, and that 2 points raised above, are valid points ?

    Thanks in anticipation.

    cheers,
    Jay.
    Last edited by jayachar88; 11-01-11 at 18:25. Reason: fix typos

  2. #2
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    that's a horrible design, i hope it isn't yours

    TEXT is a very large varchar column, so the column called tags in the posts table undoubtedly has values which consist of comma separated tag ids

    eeewwww

    regarding identifying relationships, they are called identifying if the child table's primary key includes the foreign key of the parent

    since all these tables use the ubiquitous "id" primary key (undoubtedly an auto_increment), therefore all relationships are non-identifying

    you asked, eh

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  3. #3
    Join Date
    Nov 2011
    Posts
    3
    Quote Originally Posted by r937 View Post
    that's a horrible design, i hope it isn't yours
    Well, it isn't. It's from a PHP framework's tutorial demo section. I suppose DB-design righteousness was one of the last concerns, and things we probably kept deliberately simple, although given my own lack of depth on DB-design has me wondering if I might've come up with anything better at all

    Quote Originally Posted by r937 View Post
    TEXT is a very large varchar column, so the column called tags in the posts table undoubtedly has values which consist of comma separated tag ids
    Bull's eye.

    Quote Originally Posted by r937 View Post
    regarding identifying relationships, they are called identifying if the child table's primary key includes the foreign key of the parent

    since all these tables use the ubiquitous "id" primary key (undoubtedly an auto_increment), therefore all relationships are non-identifying
    Ah, that's interesting (and very simple) interpretation of identifying vs non-identifying.

    If I may ask, are there any inherent disadvantages of designing the database using mostly non-identifying relationship's, vs identifying ones? Are there specific cases that are better suited for (or exclusive domain of) Identifying relationship ? Trying to understand the qualitative merits, without probably taking a full DB course (though I guess I should).

    Thanks for taking time to answer my query.

  4. #4
    Join Date
    Sep 2009
    Location
    San Sebastian, Spain
    Posts
    880
    The problem with having a large text field with comma separated values if how to search through these to find matching ones. The second issue is this particular model will not allow any indexing to be placed on it. The result is a full table scan of the posts table to eventually find the rows that match the tag searches.

    A better solution is to have 3 separate tables:

    Tags
    - tagid
    - tagname

    Posts
    - postid
    - data

    Posts_tags
    - tagid
    - postid

    This structure allows you to index the tagid to quickly return the list of posts that match.

    Though there is nothing technically wrong with having non-identifying relationships it will give you a massive headache when developing and supporting the application.
    Ronan Cashell
    Certified Oracle DBA/Certified MySQL Expert (DBA & Cluster DBA)
    http://www.it-iss.com
    Follow me on Twitter

  5. #5
    Join Date
    Nov 2011
    Posts
    3
    Thanks for answering, @it-iss.com

    Looks like my instinctive design was the better design, similar to what you've explained, although it was not founded on what is appropriate. I did design my first few sample database projects with identifying relationships, and tables whose sole job was to embody the relationship, with none or very few other attributes. My worry with that approach was, am I creating far too many tables, and would I need excessive JOIN operators ?

    Now coming to non-identifying relationships, I am hoping that they much be good for some niche cases, else why aren't they outright barred ? I hope someone can shed some light on this aspect as well.

  6. #6
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    in the many-to-many relationship table between tags and posts, the two relationships (back to the parent tags and posts tables) are identifying
    Code:
    CREATE TABLE post_tags 
    ( post_id INTEGER NOT NULL
    , tag_id INTEGER NOT NULL 
    , PRIMARY KEY ( post_id, tag_id )
    , INDEX tag_posts ( tag_id, post_id )
    , FOREIGN KEY ( post_id ) REFERENCES posts ( post_id )
    , FOREIGN KEY ( tag_id ) REFERENCES tags ( tag_id )
    );
    the relationships are identifying because the FKs are part of the PK

    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  7. #7
    Join Date
    Sep 2009
    Location
    San Sebastian, Spain
    Posts
    880
    Join operations can sometimes be a hindrance to performance, however, if we look at it from a database internals point of view, the tags definitions will be stored in a few data blocks (these are the minimum sized chunks of data that the database reads at any one time). If the data block is 1024 bytes, then you may end up having 10 blocks containing all of your data. The relationship table will have even more values in a single data block and again will contain every tag/post relationship.

    Indexes work by identifying efficiently the filtered data. It then attempts to read the data block containing the entire record. This potentially introduces two data block reads, however, if the index contains both the values you need to complete the join means that it will not need to attempt a second access to the database block containing the data.

    All the blocks in question are cached by the database server so that join operations are performed in memory which optimizes performance of the joins.
    Ronan Cashell
    Certified Oracle DBA/Certified MySQL Expert (DBA & Cluster DBA)
    http://www.it-iss.com
    Follow me on Twitter

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •