Hi All,

I'm wondering if any of you could help me out with some advice on a schema I'm developing. It's a bit weird - I'll try to explain as clearly as I can.

I'm working on a db to handle a variety of records for a tiny cms app. There are 3 basic types of record that can exist in this system. Documents, Hyperlinks, and Folders.

Now, given these basic 'types' of record, there are 3 areas of the content system: Records A, which is a list of documents in a tree, Records B, which is also a list of documents in a tree, and Records C, which is a flat list of documents and hyperlinks - no tree.

To make things a bit weird, the distinction between 'folders' and the other basic record types is not a strong one. Ie - in retrieving a resultset, I want a mixed resulset of folders and documents, ordered by date, etc.

So for example, a resulset from a node in RecordsA might look like this

Folder1
Document1
Document2
Folder2
Folder3
Document3

Given these requirements it seems it would be best to represent folders as a superset. From there I've come up with 2 basic schemas. The schemas represent the same information, but I'm unclear as to whether one is 'better' than the other. I'm hoping some of you might have some input...


Schema #1 - This schema follows all the rules of generalization hierarchy. (Documents are a subset of a record superset.) Like so...

TABLE RecordsA
id
parentid
sortorder
isfolder
createdate
title
shortdesc

TABLE RecordsA_Documents
id
objectid
body


TABLE RecordsB
id
parentid
sortorder
isfolder
showonhomepage
createdate
datevisible
title
shortdesc

TABLE RecordsB_Documents
id
objectid
body


Table RecordsC (
id
sortorder
islink
createdate
title
shortdesc

Table RecordsC_Documents (
id
objectid
body

Table RecordsC_Links (
id
objectid
link



The thing that bugs me about this, is I have 3 'documents' tables which are essentially identical.

As an alternative schema I have thought up the following...


TABLE documents
id
body

TABLE hyperlinks
id
link

TABLE RecordsA
id
parentid
recordtype
recordid
sortorder
datevisible
showonhomepage
createdate
title
shortdesc

TABLE RecordsB
id
parentid
recordtype
recordid
sortorder
createdate
title
shortdesc

TABLE RecordsC
id
recordtype
recordid
sortorder
createdate
title
shortdesc


The difference here is that there is no objectid in tables documents and hyperlinks. Instead I have a foreignkey 'recordid' in each of the supersets, which points to either documents or hyperlinks depending on the discriminator 'recordtype.' (In cases where recordtype indicates a folder, recordid would simply be NULL.)

To my mind this seems like the better schema. Not only have I combined all the documents into a single table, the supersets are more flexible. Ie, if I ever wanted to add links to the list described in recordsA, I wouldn't need to change the schema.

However, when I showed this to a database fellow he told me it was bad, and I shouldn't do it, because it breaks the rules of generalization hierarchy.

Hrm. This is a bit vague to me, I confess. Should I instead be doing something like the following...


TABLE documents
id
objecttype
objectid
body

TABLE hyperlinks
id
objecttype
objectid
link

TABLE RecordsA
id
parentid
recordtype
sortorder
datevisible
showonhomepage
createdate
title
shortdesc

TABLE RecordsB
id
parentid
recordtype
sortorder
createdate
title
shortdesc

TABLE RecordsC
id
recordtype
sortorder
createdate
title
shortdesc


Where now the subsets have a discriminator and an objectid. The objectid would point to recordsa, recordsb or recordsc, depending on the discriminator 'objecttype'. (I think this is still breaking generalization hierarchy, since the subtypes 'documents' & 'hyperlinks' can belong to more than one supertype. But maybe it is better...)

Anyhow, I'm at the point where I'm really not too sure which technique is 'proper' anymore. If any of you could offer some input as to which direction I should follow, I'd be deeply appreciative.

Thanks!