Results 1 to 4 of 4
  1. #1
    Join Date
    Jul 2013
    Posts
    7

    Junction table size and performance considerations.

    Hello all,

    I've been working on a data model for some time now and I'm facing a difficult decision with regards to a junction table. I have 2 entity types:

    1. Building: may have somewhere in the order of 10-100 million entity instances.

    2. Property: may have about 100-1000 instances and each Property is classified through its association with a PropertyType entity type.

    The association between Building and Property is many-to-many (and qualified by the PropertyType ID if it makes any difference).

    Obviously, I need a junction table to implement the association between the two. However, considering the sheer amount of possible combinations, I could end up with 100 billion tuples in the junction table, which to my limited experience seems a lot, especially since the 2 tables will need to be joined frequently.

    I have a few questions:

    a. Am I right on my conclusion? Can >100 billion tuples be too much for a junction table? Do you think indexing will help sufficiently? :/

    b. The Property instances are part of the metamodel, so they're pretty much static. I was thinking of pre-combining them into a PropertyGroup type and relating PropertyGroup to Building instead. This is a N-1 relationship, which avoids lots of repetition, but on the downside every time I change my Property metadata I need to recalculate all the potential combinations of Properties and repopulate the PropertyGroup table. Plus, I'm getting a feeling this "flyweight-like pattern" messes up my data model on a conceptual level.

    c. The idea has crossed my mind to implement the N-N association by burying an array of Property primary keys into the 1-end (Building). I'm fully aware this violates 1st normal form, but is it something I could argue for considering my circumstances? :/

    That's it, all in all. Apologies for the long post and any insight would be very much appreciated!

  2. #2
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    100 billion tuples is far too many for text files (like CSV formatted files), but would be trivial for some database engines and hardware platforms. Indexing would be needed, and the degree to which that can help you is governed mostly by the database engine but also by other design issues that you haven't disclosed enough detail for me to offer an opinion. The short answer to both of these questions boils down to "it depends".

    Meddling with your data model, especially forcing meta-data into the user data is a disaster waiting for a victim to arrive... I wouldn't volunteer for that!

    Violating 1NF with that much data is such a bad idea that I can't even comment on it... Anything I write would be rude.

    You're on the right track with your proposed design. The changes sound like poor choices to me. In general, normalize your schema as far as you can and only denormalize after you've proved that it is required and then only to the extent which you can prove is actually required.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

  3. #3
    Join Date
    Jul 2013
    Posts
    7
    Hi Pat and thank you very much for your reply. Could you elaborate maybe a bit more on the danger of mixing data with metadata? I intuitively feel the same way, but I was wondering this: don't we every time we use a reference table basically mix "dynamic" data with "static" meta-data?

  4. #4
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    I can't answer this question now, it will take too much time to explain the background needed. I'll try to get an answer tonight, but the short answer is "no, I can't think of any time that I'd advise mixing normal data with metadata" and reference tables don't do that.

    -PatP
    In theory, theory and practice are identical. In practice, theory and practice are unrelated.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •