I am currently designing a database for a small business to track documents. Their scheme of issuing documents for every job is so complex that I ended up having a number of many-to-many "intermediary subtables" with consists of a mapping of the primary keys of two tables:
In fact, i have 15 of these tables. As I said, the design was a complex one to accommodate future modifications in the way the physical documents are handled. Therefore, some of these tables may hold only 1 row for the next year and some may not even contain a row up until the time the owner decides to re-design the database.
Owing to the complexity of the database design and to the number of tables to be tracked, I was thinking of doing an optimization:
where mapping_type shall indentify which kind of primary keys initial_id and secondary_id are. For example, if initial_id is an invoice number and secondary_id is a receipt number, mapping_type would be INV-RCPT. Unique codes shall be used for the other mapping types.
I am seriously thinking of adopting this scheme. Comments will be highly appreciated!
Why? Simply so that you hve less tables? But you will have thrown away referential integrity (foreign keys), and your queries will be no simpler (in fact, somewhat more complex). So what is the real benefit of doing this?
yes i agree. I also find that the use of outer joins will be a more of an occurance if im querying an all-encompassing single table - again due to filtering, similar to your previous point - although this is specific to my own experiences.
thanks andrew! well, yes, aside from the trivial benefit of having less tables, I was thinking of the occurence of some tables being wasted if not used. As I said, some of the tables may hold only 1 row for the next year and some may not even have the benefit of having any row at all.
Another thing is, the application will be heavily using a function to check if a particular M:M relationship exists. The scenario is actually like this: an order may have an invoice, and an invoice may have a receipt. Also, an order may be issued a receipt directly. The order is paid if a receipt for it exists. The M:M tables would be:
To see if the the order has been paid already, I have to see if an M:M for the particular order exists in the invoice_receipt or in the order_receipt table. If I have used the one-big-table approach, I would only checked one table only; whereas in this one, I first have to check invoice_receipt and if it does not exist, check order_receipt.
This scenario repeats at least 5 times on different combinations for the project.
And also on this note, I would like to ask: in balancing good database design (avoid redundancy, avoid null, etc) versus speed in accessing data (redundant data exists but is taken care of by the application's front-end), where and how do we draw the line?
It seems that you are being driven towards a super-generic, everthing M:M everything else model because the requirements have not been clearly analysed. It is difficult to comment with so little background knowledge, but for example if the existence of one receipt associated with an order means that it has been paid then what would be the point of creaing a second receipt for that order? In other words, it seems that each order may be associated with just one receipt.
It still isn't clear to me how having one table simplifies the problem. To see if an order 123 has a receipt the SQL will be something like this with your approach:
from table_mapping or
where or.initial_id = 123
and mapping_type = 'ORDER-RCPT'
from table_mapping oi
join table_mapping ir on ir.primary_id = oi.secondary_id
where oi.primary_id = 123
and oi.mapping_type = 'ORDER-INV'
and ir.mapping_type = 'INV-RCPT'
where order_id = 123
from order_invoice oi
join invoice_receipt ir on ir.invoice_id = oi.invoice_id
where oi.order_id = 123
I wouldn't be unduly concerned about creating a few 2-column tables that have no rows or 1 row - they aren't going to break the bank, are they?
Generic database designs are something I haved ranted about before here.
Regarding redundancy vs. speed: in a typical OLTP database (i.e. not a data warehouse) you must avoid redundancy and the associated potential for data corruption during updates. Performance should normally be fine with a normalised, non-redundant database design provided appropriate indexes are created, optimiser stats gathered, etc.
Actually, there can exist more than one receipt for every invoice or order as partial payment is allowed. But you hit the point: OTLT. I guess I'm just not comfortable with a lot of database tables for a very simple project. But thanks for the help and will now start reading your blog
to avoid later problems on the use of so many tables, you must fully understand the overall need of the company. This can be avoided with proper gathering of data before design of various entities needed with proper normalization needed with thorough knowledge on how they relate with each other. (entity relationship)
Later problems on design can be avoided if the real problem is fully understood and customer needs is fully met.
If you want to make a new design, please take time to investigate all other possible needs of your clients.