  1. #1
    Join Date
    Mar 2007
    Posts
    3

    Need expert opinions: Denormalizing a system-critical table (table would fail 1NF)

    I have a price table that contains simulated hourly prices. This table is used in every aspect of our model's database in one way or another.

    Currently, our table design includes 4 columns: Scenario (smallint), ScenDate (smalldatetime), ScenHour (tinyint), and Price (real). The table contains over 26 million records (500+ scenarios * ~2200 days * 24 hours) and is truncated/re-populated every business day.
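    For reference, here's the current layout as DDL (the table name is made up; the columns and types are as described):

        CREATE TABLE dbo.HourlyPrice (
            Scenario  smallint      NOT NULL,  -- 500+ simulated scenarios
            ScenDate  smalldatetime NOT NULL,  -- ~2200 forward-looking days
            ScenHour  tinyint       NOT NULL,  -- 0 through 23
            Price     real          NOT NULL   -- simulated hourly price
        );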

    I have a co-worker who has proposed a new table design: Scenario (smallint), ScenDate (smalldatetime), Hr0 (real), Hr1 (real), ... Hr23 (real). His reasoning behind this proposal was to improve the reload time each day, to improve query response time (~1M records instead of 26M records), and to save database space. His proposal succeeds in all three areas.
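    Sketched the same way, his proposed layout would look roughly like this (Hr2 through Hr22 elided):

        CREATE TABLE dbo.HourlyPriceWide (
            Scenario smallint      NOT NULL,
            ScenDate smalldatetime NOT NULL,
            Hr0      real          NOT NULL,
            Hr1      real          NOT NULL,
            -- ... Hr2 through Hr22 ...
            Hr23     real          NOT NULL
        );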

    However, by creating 24 price columns instead of 1 price column, we lose the ability to run ad-hoc queries that apply aggregate functions across the hours within a scenario/day. From the model's perspective, the table design doesn't matter, since the prices are flattened into a one-dimensional array in the application code anyway. From my perspective, if I ever have to run min/max/avg/stdev/etc. across a day or a month, I'm screwed.
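    To make that concrete, here's a daily average against each layout (using the made-up table names above; the wide version needs SQL Server 2005's UNPIVOT, or a 24-branch UNION ALL on SQL 2000):

        -- Normalized layout: one plain aggregate, at any granularity.
        SELECT Scenario, ScenDate, AVG(Price) AS AvgPrice
        FROM dbo.HourlyPrice
        GROUP BY Scenario, ScenDate;

        -- Wide layout: the 24 hour columns must be unpivoted back into rows first.
        SELECT Scenario, ScenDate, AVG(Price) AS AvgPrice
        FROM dbo.HourlyPriceWide
        UNPIVOT (Price FOR ScenHour IN (Hr0, Hr1, Hr23)) AS u  -- plus Hr2..Hr22
        GROUP BY Scenario, ScenDate;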

    I'm interested in hearing what you think. Thanks in advance!

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    A tough call, but I would not denormalize based on the information you've given.
    Denormalizing to improve loading time is bogus unless you are running close to the limits of your nightly maintenance window.
    How much performance gain do you get on your report queries between these two designs?

    Can we assume you have indexes on Scenario, ScenDate, and ScenHour?

    Why are you loading 2200 days every night? Where does the 2200 come from?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Mar 2007
    Posts
    3
    The reason the reload time matters to us is that this table is repopulated during business hours, not overnight. The model that uses these prices runs overnight and cannot be kicked off until the prices are calibrated and loaded into the database. Additional systems downstream of our model are also waiting overnight to process our output.

    The performance gain is substantial on a percentage basis (~65%), but in absolute terms it's only about 15 seconds... hardly worth it, in my opinion.

    Here's the kicker: this table has no indexes and no primary key. Through testing we've found that the time required to rebuild the primary key, the indexes, and/or the statistics on 26 million records outweighs the benefit of having the indexes in the first place. Sounds crazy, I know, but that's the reality of this table in our database.
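    Roughly, the trade-off we tested is querying the heap as-is versus paying for something like this after every reload (index name and key order are illustrative):

        TRUNCATE TABLE dbo.HourlyPrice;
        -- ... bulk load the ~26 million rows here ...
        CREATE CLUSTERED INDEX IX_HourlyPrice
            ON dbo.HourlyPrice (Scenario, ScenDate, ScenHour);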

    The 2200 days represent the next 6 years of hourly prices. The prices are forward looking and must be re-calibrated to the current day's future market information. The prices basically re-create the day's market environment by capturing underlying commodity prices, price volatilities, skews, etc.

    I hope that helps with your questions. Thanks for your response.

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    If it is only being rebuilt once per day, then saving 15 seconds is no reason for denormalizing.
    But I think you guys should seriously reconsider the architecture of your application. I just have trouble believing that it is necessary to pre-calculate these 26 million records every day, or that they are so dynamic that they all need to be recalculated every day. If you could solve this problem then you could use indexes and your performance would fly.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Mar 2007
    Posts
    3
    The prices are part of a Monte Carlo simulation model... so each individual price by itself is pretty much meaningless (each price has a 1-in-500 chance of happening)... it's all about the expected value of the prices, the distribution of the prices, etc. It would be useless for us to process only a subset of these prices each day.
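    The kind of statistic that matters looks like this against the normalized layout (a sketch, same made-up table name as above):

        -- Expected value and dispersion across the 500 scenarios,
        -- per simulated date and hour:
        SELECT ScenDate, ScenHour,
               AVG(Price)   AS ExpectedPrice,
               STDEV(Price) AS PriceStdDev
        FROM dbo.HourlyPrice
        GROUP BY ScenDate, ScenHour;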

    After thinking about this issue for a few days, I think I'm comfortable with denormalizing the table. We're also toying with the idea of increasing the model's number of scenarios from 500 to 1000 and adding other pricing regions (multiple pricing tables). This structure change is a must if those options become reality.

  6. #6
    Join Date
    Jun 2004
    Location
    Arizona, USA
    Posts
    1,848
    I would think that even partial indexing would be beneficial. By scenario, for instance, and possibly by date.
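    Something like this, for instance (index names made up):

        CREATE NONCLUSTERED INDEX IX_Scenario ON dbo.HourlyPrice (Scenario);
        CREATE NONCLUSTERED INDEX IX_ScenDate ON dbo.HourlyPrice (ScenDate);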
    Lou
    "To astonish greatly" (Chinese in the original)
    "Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
    "I have my standards. They may be low, but I have them!" - Bette Middler
    "It's a book about a Spanish guy named Manual. You should read it." - Dilbert

