Results 1 to 10 of 10
  1. #1
    Join Date
    Jul 2006
    Posts
    4

    multiple M:M subtables or one big table?

    I am currently designing a database for a small business to track documents. Their scheme of issuing documents for every job is so complex that I ended up having a number of many-to-many "intermediary subtables" with consists of a mapping of the primary keys of two tables:
    e.g.
    Code:
    receipt_invoice
    -----------------
    receipt_no
    invoice_no
    In fact, i have 15 of these tables. As I said, the design was a complex one to accommodate future modifications in the way the physical documents are handled. Therefore, some of these tables may hold only 1 row for the next year and some may not even contain a row up until the time the owner decides to re-design the database.

    Owing to the complexity of the database design and to the number of tables to be tracked, I was thinking of doing an optimization:
    Code:
    table_mapping
    ---------------
    initial_id
    secondary_id
    mapping_type
    where mapping_type shall indentify which kind of primary keys initial_id and secondary_id are. For example, if initial_id is an invoice number and secondary_id is a receipt number, mapping_type would be INV-RCPT. Unique codes shall be used for the other mapping types.

    I am seriously thinking of adopting this scheme. Comments will be highly appreciated!

  2. #2
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    Why? Simply so that you hve less tables? But you will have thrown away referential integrity (foreign keys), and your queries will be no simpler (in fact, somewhat more complex). So what is the real benefit of doing this?

  3. #3
    Join Date
    Jun 2004
    Posts
    127
    Quote Originally Posted by andrewst
    your queries will be no simpler (in fact, somewhat more complex).
    Tony can I ask is this because forming queries that mainly comprise of self joins increase complexities in terms of the query statement size & additonal use of outer joins etc?

    i know from my own experience that self joins also hinder performance - because of serialization.

  4. #4
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    That could be a factor perhaps, but it wasn't what I meant. I meant that instead of this:
    Code:
    select r.x, i.y
    from receipts r
    join invoice_receipts ir on ir.receipt_id = r.receipt_id
    join invoices i on i.invoice_id = ir.invoice_id;
    ... one now has to write this:
    Code:
    select r.x, i.y
    from receipts r
    join table_mapping ir on ir.initial_id = r.receipt_id
    join invoices i on i.invoice_id = ir.secondary_id
    where ir.mapping_type = 'INV-RCPT';
    That's one extra WHERE condition per M:M link. Also, it's now easier to make a mistake: have I used initial_id/secondary_id the right way round?

  5. #5
    Join Date
    Jun 2004
    Posts
    127
    yes i agree. I also find that the use of outer joins will be a more of an occurance if im querying an all-encompassing single table - again due to filtering, similar to your previous point - although this is specific to my own experiences.

  6. #6
    Join Date
    Jul 2006
    Posts
    4
    thanks andrew! well, yes, aside from the trivial benefit of having less tables, I was thinking of the occurence of some tables being wasted if not used. As I said, some of the tables may hold only 1 row for the next year and some may not even have the benefit of having any row at all.

    Another thing is, the application will be heavily using a function to check if a particular M:M relationship exists. The scenario is actually like this: an order may have an invoice, and an invoice may have a receipt. Also, an order may be issued a receipt directly. The order is paid if a receipt for it exists. The M:M tables would be:
    order_invoice
    invoice_receipt
    order_receipt
    To see if the the order has been paid already, I have to see if an M:M for the particular order exists in the invoice_receipt or in the order_receipt table. If I have used the one-big-table approach, I would only checked one table only; whereas in this one, I first have to check invoice_receipt and if it does not exist, check order_receipt.

    This scenario repeats at least 5 times on different combinations for the project.

    And also on this note, I would like to ask: in balancing good database design (avoid redundancy, avoid null, etc) versus speed in accessing data (redundant data exists but is taken care of by the application's front-end), where and how do we draw the line?

  7. #7
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    It seems that you are being driven towards a super-generic, everthing M:M everything else model because the requirements have not been clearly analysed. It is difficult to comment with so little background knowledge, but for example if the existence of one receipt associated with an order means that it has been paid then what would be the point of creaing a second receipt for that order? In other words, it seems that each order may be associated with just one receipt.

    It still isn't clear to me how having one table simplifies the problem. To see if an order 123 has a receipt the SQL will be something like this with your approach:
    Code:
    select 1
    from   table_mapping or
    where  or.initial_id = 123
    and    mapping_type = 'ORDER-RCPT'
    union
    select 1
    from   table_mapping oi
    join   table_mapping ir on ir.primary_id = oi.secondary_id
    where  oi.primary_id = 123
    and    oi.mapping_type = 'ORDER-INV'
    and    ir.mapping_type = 'INV-RCPT'
    ... versus:
    Code:
    select 1
    from   order_receipt
    where  order_id = 123
    union
    select 1
    from   order_invoice oi
    join   invoice_receipt ir on ir.invoice_id = oi.invoice_id
    where  oi.order_id = 123
    I wouldn't be unduly concerned about creating a few 2-column tables that have no rows or 1 row - they aren't going to break the bank, are they?

    Generic database designs are something I haved ranted about before here.

    Regarding redundancy vs. speed: in a typical OLTP database (i.e. not a data warehouse) you must avoid redundancy and the associated potential for data corruption during updates. Performance should normally be fine with a normalised, non-redundant database design provided appropriate indexes are created, optimiser stats gathered, etc.

  8. #8
    Join Date
    Jul 2006
    Posts
    4
    Actually, there can exist more than one receipt for every invoice or order as partial payment is allowed. But you hit the point: OTLT. I guess I'm just not comfortable with a lot of database tables for a very simple project. But thanks for the help and will now start reading your blog

  9. #9
    Join Date
    Sep 2002
    Location
    UK
    Posts
    5,171
    You won't find it hard to keep up - I only add a new entry about twice a year!

  10. #10
    Join Date
    Jul 2006
    Posts
    3

    database design

    to avoid later problems on the use of so many tables, you must fully understand the overall need of the company. This can be avoided with proper gathering of data before design of various entities needed with proper normalization needed with thorough knowledge on how they relate with each other. (entity relationship)

    Later problems on design can be avoided if the real problem is fully understood and customer needs is fully met.

    If you want to make a new design, please take time to investigate all other possible needs of your clients.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •