If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > General > Database Concepts & Design > Need expert opinions: Denormalizing a system-critical table (table would fail 1NF)

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-30-07, 13:07
MystichromeVert MystichromeVert is offline
Registered User
 
Join Date: Mar 2007
Posts: 3
Need expert opinions: Denormalizing a system-critical table (table would fail 1NF)

I have a price table that contains simulated hourly prices. This table is used in every aspect of our model's database in one way or another.

Currently, our table design includes 4 columns: Scenario (smallint), ScenDate (smalldatetime), ScenHour (tinyint), and Price (real). The table contains over 26 million records (500+ scenarios * ~2200 days * 24 hours) and is truncated/re-populated every business day.

I have a co-worker that has proposed a new table design: Scenario (smallint), ScenDate (smalldatetime), Hr0 (real), Hr1 (real), .... Hr23 (real). His reasoning behind this proposal was to improve the reloading time each day, to improve query response time (1M records instead of 26M records), and to save database space. His proposal succeeds in all of these areas.

However, by creating 24 price columns instead of 1 price column, we loose the ability to run ad-hoc queries utilizing aggregate functions across the hours within a scenario/day. From the model's perspective, the table design doesn't matter since the prices are manipulated into a single dimension array through the application's code. From my perspective, if I ever have to use min/max/avg/stdev/etc for a day or month, I'm screwed.

I'm interested in hearing what you think. Thanks in advance!
Reply With Quote
  #2 (permalink)  
Old 04-02-07, 13:55
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
A tough call, but I would not denormlize based on the information you give.
Denormalizing to improve loading time is bogus unless you are running close to the limits of your nightly maintenance window.
How much performance gain do you get on your report queries between these two designs?

Can we assume you have indexes on Scenario, ScenDate, and ScenHour?

Why are you loading 2200 days every night? Where does the 2200 come from?
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #3 (permalink)  
Old 04-03-07, 14:22
MystichromeVert MystichromeVert is offline
Registered User
 
Join Date: Mar 2007
Posts: 3
The reason that re-loading time is important to us is that this table is repopulated during business hours, not over night. The model that uses these prices runs over night and cannot be kicked off until these prices are calibrated and loaded to the database. There are also additional systems downstream of our model that are waiting to process our output over night as well.

The gain in performance is substantial on a percentage basis (~65%), but that's only about 15 seconds in reality... hardly worth it in my opinion.

Here's the kicker... this table has no indexes / no primary key. Through testing we've found that the time required to rebuild the primary key, indexes, and/or the statistics on 26 million records outweighs the benefit of actually having the indexes in the first place. Sounds crazy, I know, but that's the reality of this table in our database.

The 2200 days represent the next 6 years of hourly prices. The prices are forward looking and must be re-calibrated to the current day's future market information. The prices basically re-create the day's market environment by capturing underlying commodity prices, price volatilities, skews, etc.

I hope that helps with your questions. Thanks for your response.
Reply With Quote
  #4 (permalink)  
Old 04-03-07, 16:20
blindman blindman is offline
World Class Flame Warrior
 
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
If it is only being rebuilt once per day, then saving 15 seconds is no reason for denormalizing.
But I think you guys should seriously reconsider the architecture of your application. I just have trouble believing that it is necessary to pre-calculate these 26 million records every day, or that they are so dynamic that they all need to be recalculated every day. If you could solve this problem then you could use indexes and your performance would fly.
__________________
If it's not practically useful, then it's practically useless.

blindman
www.chess.com: "sqlblindman"
Reply With Quote
  #5 (permalink)  
Old 04-04-07, 16:16
MystichromeVert MystichromeVert is offline
Registered User
 
Join Date: Mar 2007
Posts: 3
The prices are part of a monte carlo simulation model... so each individual price by itself is pretty much meaningless (each price has a 1 in 500 chance of happening)... it's all about the expected value of the prices, the distribution of the prices, etc. It would be useless for us to only process a subset of these prices everyday.

After thinking about this issue for a few days, I think I'm comfortable with denormalizing the table. We're also toying with the idea of increasing the model's number of scenarios from 500 to 1000 and adding other pricing regions (multiple pricing tables). This structure change is a must if those options become reality.
Reply With Quote
  #6 (permalink)  
Old 04-04-07, 16:49
loquin loquin is offline
Super Moderator
 
Join Date: Jun 2004
Location: Arizona, USA
Posts: 1,797
I would think that even partial indexing would be beneficial. By scenario, for instance, and possibly by date.
__________________
Lou
使大吃一惊
"Lisa, in this house, we obey the laws of thermodynamics!" - Homer Simpson
"I have my standards. They may be low, but I have them!" - Bette Middler
"It's a book about a Spanish guy named Manual. You should read it." - Dilbert

Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On