| |
|
If this is your first visit, be sure to check out the FAQ by clicking the link above.
You may have to register before you can post: click the register link above to proceed.
To start viewing messages, select the forum that you want to visit from the selection below.
|
 |
|

03-26-08, 15:08
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
|
Handling different time zones in a data warehouse.
|
|
What's the best/optimal method of handling time zones in a data warehouse?
Some ideas:
1. Add a FK column to the fact tables for each time zone. So if the requirement is to handle U.S. time zones; each fact table would have 4 FK columns to the date dimension table (pst_datekey, mst_datekey, cst_datekey, est_datekey).
2. Decide on a standard time zone (e.g. central or UTC) and base the date dimension on the central time zone. Then add FK columns to the date dimension, pointing back to itself for each additional time zone. So if we used central time as the standard, the date dimension table would have pst_datekey, mst_datekey, and est_datekey. The reporting app would need to perform a join from the fact table do the date, and then from date to itself.
Example query to pull data for 1/1/07 for Eastern time zone:
SELECT SUM(A.SomeMeasure)
FROM some_fact A
INNER JOIN date B ON A.date_key = B.date_key
INNER JOIN date C ON B.est_datekey = C.date_key
WHERE C.date='1/1/2007'
Open to suggestions.
Thanks,
-A
|
|

03-26-08, 16:07
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Fact table?
Dimension?
Oh...I get it. You are creating a dataMART, not a datawarehouse.
Because, in a datawarehouse I would just store GMT, or perhaps GMT and local time.
For datamarts, spin off whatever is simplest and clearest for you users. Perhaps spin off a different mart for each user depending on what time zone they are in.
Yet another person who has drank the Kimball Koolaid...
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

03-26-08, 16:51
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
|
|
It's used by the entire company and comprises of multiple functional areas. For the sake of this thread/question, lets assume it is a data warehouse (and yes, it was designed using Kimball's method).
Ok, so the date should be stored as GMT/UTC. But how do you recommend handling the time zones?
-A
|
|

03-26-08, 19:46
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
If you store GMT and local, why do you need the time zone?
Are you doing grouping or sorting by time zones?
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

03-27-08, 11:45
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
I need to display hourly data for an entire day; but based on the time zone, I need to select a different 24-hour chunk of data.
-A
|
|

03-27-08, 12:10
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
Based on the timezone of the user, or of the data originator?
What interface are you using? If you are basing the 24 hours on location of the user, then query off of GMT but shift the datetime parameters accordingly.
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|

03-27-08, 14:07
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
|
|
Timezones are a nightmare in a data warehouse, for many reasons. From the purely theoretical viewpoint, a timezone is implicitly an adjustment between the time standard (UCT) and the local observed time. In terms of what you can accurately measure and store, the timezone is wasted effort. From the practical standpoint, timezones get even harder to manage because they are full of exceptions.
If someone FORCES you to support timezones, get them to define which timezones, ahat kind of support, and what do you do when that timezone support forces you to either create unusable data or violate the local laws. This is not a pretty issue!
-PatP
|
|

03-27-08, 15:03
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
Quote:
|
Originally Posted by blindman
Based on the timezone of the user, or of the data originator?
What interface are you using? If you are basing the 24 hours on location of the user, then query off of GMT but shift the datetime parameters accordingly.
|
Based on the time zone of the user.
The fact/measures are calculated based on the standard time zone (in our case, central time). So when I join to the date dimension, I can't simply filter on the date column since that is the date in central time. Please refer to and provide feedback on the solutions I mentioned in my first post. Also, at the moment, the requirement is to support U.S. time zones.
-A
|
|

03-27-08, 15:08
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
Quote:
|
Originally Posted by Pat Phelan
If someone FORCES you to support timezones, get them to define which timezones, ahat kind of support, and what do you do when that timezone support forces you to either create unusable data or violate the local laws. This is not a pretty issue!
-PatP
|
Pat, you're right, it's a complete PITA. Regardless, it's something we do need to support. Requirement is to support the 4 U.S. time zones only. They realize data might be skewed during the day light saving time changes, and they're OK with that. I can't think of any other times when the data may get skewed due to the time zone.
-A
|
|

03-27-08, 15:36
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
Another possible solution is to create a time-zone dimension. This would require each row in the fact/measure table to be stored 4 times; once for each time zone with the proper date keys.
-A
|
|

03-27-08, 15:59
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
I'm gonna lay out the possible solutions again with pros/cons
1. Add a FK column to the fact tables for each time zone. So if the requirement is to handle U.S. time zones; each fact table would have 4 FK columns to the date dimension table (pst_datekey, mst_datekey, cst_datekey, est_datekey).
Pros: Efficient from a data storage perspective.
Cons: Based on time zone, the JOIN criteria changes. e.g. if the user is in PST time zone, the query needs to JOIN the fact/measure with the date dim. based on the pst_datekey column. Not efficient for development.
2. Decide on a standard time zone (e.g. central or UTC) and base the date dimension on the central time zone. Then add FK columns to the date dimension, pointing back to itself for each additional time zone. So if we used central time as the standard, the date dimension table would have pst_datekey, mst_datekey, and est_datekey. The reporting app would need to perform a join from the fact table do the date, and then from date to itself.
Example query to pull data for 1/1/07 for Eastern time zone:
SELECT SUM(A.SomeMeasure)
FROM some_fact A
INNER JOIN date B ON A.date_key = B.date_key
INNER JOIN date C ON B.est_datekey = C.date_key
WHERE C.date='1/1/2007'
Pros: Efficient on data storage side.
Cons: Requires extra self-join on date dim. for every query. Not efficient for report/SQL development since the join criteria would change based on time zone.
3. Create a time-zone dimension. This would require each row in the fact/measure table to be stored 4 times; once for each time zone with the proper date keys.
Pros: Efficient querying, and easy to write SQL against. Time-zone can be a drop-down on the front-end, or be tied to use, and the SQL is very simple. You simply join with the time-zone dim. to get the proper results from the fact/measure.
Cons: Inefficient data storage. Requires the data in the fact/measure to be duplicated for each time zone.
Option 2 doesn't provide any advantage over option 1. So really, it's between options 1 and 3. Please provide feedback, or additional ideas.
Thanks,
-A
|
|

03-27-08, 16:33
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
|
|

03-27-08, 20:31
|
|
Resident Curmudgeon
|
|
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
|
|
There are eleven time zones (EST. EDT, CST, CDT, Indiana, Sioux, MST, MDT, Arizona, PST, and PDT) that I know of rigth now in the lower 48 states. Which four are you using? Watch out for Indiana, because its a real joy (it is no where near as simple as it seems). If you have historical data watch out for Navajo, RR, and Atlantic.
Avoid the timezone dimensions... Include additional rows in your stock time dimension with different observed and UTC times, then do rollups on observed, UTC, or both as needed. This is vastly simpler to create, explain, and use than separate dimensions that dance under the user as they try to find answers.
-PatP
|
|

03-28-08, 00:14
|
|
Registered User
|
|
Join Date: Oct 2007
Location: Chicago, IL
Posts: 82
|
|
Thanks for the input Pat. I'll be going with option 1 on the hourly grain by adding more FKs from the fact/measure to the date dim. This would follow your suggestion of not creating a time zone dim. and it also follows the Kimball method.
I'm still looking for a solution on the fact/measure on the daily grain. I could add a time-zone attribute to the measure/fact table, and calculate for each time zone. And I'm thinking that if I go this route, wouldn't it make sense to just normalize that out into a timezone dim. for the daily measure/fact table so I'm not repeating the descriptive timezone string and just storing integer keys.
-A
|
|

03-28-08, 01:35
|
|
World Class Flame Warrior
|
|
Join Date: Jun 2003
Location: Ohio
Posts: 11,726
|
|
I agree with Kimball regarding storing both absolute and local (relative) time values. This is what I suggested earlier, and also handles situations where you may recieve data from two different sources which happen to be in the same time zone, but where the system clocks are not in synch.
I had to do this a few years back. We had systems sending us batches of data that may have been recorded at various times, so each system sent the time of the data transaction, plus the current system time. By comparing the remote system time to the local system time, we knew exactly how much to adjust the data transaction time to put them all on a standard clock.
I completely DISAGREE with Kimball's idea of separating the date and time portions into two different fields. This sounds like a recipe for a mess to me. You'd have to write code that would adjust the date portion any time adjustments to the time portion crossed midnight. And that is just asking for errors.
As far as creating a "calendar day dimension table", well, you've probably already figured out that I think Kimball is full of crap. This is the sort of nonsense that required splitting the date values in the first place, as he himself points out.
Ask yourself, what will you gain from doing this except create extra admin to maintain the date table?
__________________
If it's not practically useful, then it's practically useless.
blindman
www.chess.com: "sqlblindman"
|
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
|
|