Unanswered: Please help me understand this schema, and validate my understanding
Trying to understand the database schema from a PHP based simple blog app (tutorial), as captured with MySQL designer here.
It models typical blogs (like Wikipedia, Typepad etc., but heavily simplified as a demo), when an User (typically the Author) writes Blog posts, and other Users add comments. For ease of classification/searching, the Blog has few Tags (similar to these forums here on dbforums.com).
Few things that I am wondering about are:-
1. The "Tags" table has no relationship with "Post" table, although a "tag" attribute is there. Isn't this a bad idea in the sense that, assume that the Author wrote 5 posts all containing some tag "xyz", which the author later modified to say "abc". This change is likely to be visible on the posts themselves, but in the tag-cloud (showing all available tags in this Blog), would still show "xyz", even though any related post is no longer available ?
2. Why are the 2 relationships "Non-identifying" ? From what I understand the difference between identifying & non-identifying relations, is in the fact the case of identifying relationship, the existence of child entity (s.a. Post) is subject to condition that parent entity (s.a. Author) exists. Say, if an Author's account is deleted for some reason, I'd expect all of their Posts, the Comments on each of their Post, to also be deleted. So, the relationships should've been identifying.
Of course, the blog web application seems to work fine with this data-model, but can I assume that the data-model is far from perfect, and that 2 points raised above, are valid points ?
Thanks in anticipation.
Last edited by jayachar88; 11-01-11 at 18:25.
Reason: fix typos
Well, it isn't. It's from a PHP framework's tutorial demo section. I suppose DB-design righteousness was one of the last concerns, and things we probably kept deliberately simple, although given my own lack of depth on DB-design has me wondering if I might've come up with anything better at all
Originally Posted by r937
TEXT is a very large varchar column, so the column called tags in the posts table undoubtedly has values which consist of comma separated tag ids
Originally Posted by r937
regarding identifying relationships, they are called identifying if the child table's primary key includes the foreign key of the parent
since all these tables use the ubiquitous "id" primary key (undoubtedly an auto_increment), therefore all relationships are non-identifying
Ah, that's interesting (and very simple) interpretation of identifying vs non-identifying.
If I may ask, are there any inherent disadvantages of designing the database using mostly non-identifying relationship's, vs identifying ones? Are there specific cases that are better suited for (or exclusive domain of) Identifying relationship ? Trying to understand the qualitative merits, without probably taking a full DB course (though I guess I should).
The problem with having a large text field with comma separated values if how to search through these to find matching ones. The second issue is this particular model will not allow any indexing to be placed on it. The result is a full table scan of the posts table to eventually find the rows that match the tag searches.
A better solution is to have 3 separate tables:
This structure allows you to index the tagid to quickly return the list of posts that match.
Though there is nothing technically wrong with having non-identifying relationships it will give you a massive headache when developing and supporting the application.
Looks like my instinctive design was the better design, similar to what you've explained, although it was not founded on what is appropriate. I did design my first few sample database projects with identifying relationships, and tables whose sole job was to embody the relationship, with none or very few other attributes. My worry with that approach was, am I creating far too many tables, and would I need excessive JOIN operators ?
Now coming to non-identifying relationships, I am hoping that they much be good for some niche cases, else why aren't they outright barred ? I hope someone can shed some light on this aspect as well.
Join operations can sometimes be a hindrance to performance, however, if we look at it from a database internals point of view, the tags definitions will be stored in a few data blocks (these are the minimum sized chunks of data that the database reads at any one time). If the data block is 1024 bytes, then you may end up having 10 blocks containing all of your data. The relationship table will have even more values in a single data block and again will contain every tag/post relationship.
Indexes work by identifying efficiently the filtered data. It then attempts to read the data block containing the entire record. This potentially introduces two data block reads, however, if the index contains both the values you need to complete the join means that it will not need to attempt a second access to the database block containing the data.
All the blocks in question are cached by the database server so that join operations are performed in memory which optimizes performance of the joins.