Page 1 of 2 12 LastLast
Results 1 to 15 of 17
  1. #1
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82

    Handling different time zones in a data warehouse.

    What's the best/optimal method of handling time zones in a data warehouse?
    Some ideas:
    1. Add a FK column to the fact tables for each time zone. So if the requirement is to handle U.S. time zones; each fact table would have 4 FK columns to the date dimension table (pst_datekey, mst_datekey, cst_datekey, est_datekey).

    2. Decide on a standard time zone (e.g. central or UTC) and base the date dimension on the central time zone. Then add FK columns to the date dimension, pointing back to itself for each additional time zone. So if we used central time as the standard, the date dimension table would have pst_datekey, mst_datekey, and est_datekey. The reporting app would need to perform a join from the fact table do the date, and then from date to itself.
    Example query to pull data for 1/1/07 for Eastern time zone:

    SELECT SUM(A.SomeMeasure)
    FROM some_fact A
    INNER JOIN date B ON A.date_key = B.date_key
    INNER JOIN date C ON B.est_datekey = C.date_key
    WHERE C.date='1/1/2007'

    Open to suggestions.

    Thanks,

    -A

  2. #2
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Fact table?
    Dimension?

    Oh...I get it. You are creating a dataMART, not a datawarehouse.

    Because, in a datawarehouse I would just store GMT, or perhaps GMT and local time.

    For datamarts, spin off whatever is simplest and clearest for you users. Perhaps spin off a different mart for each user depending on what time zone they are in.

    Yet another person who has drank the Kimball Koolaid...
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  3. #3
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    It's used by the entire company and comprises of multiple functional areas. For the sake of this thread/question, lets assume it is a data warehouse (and yes, it was designed using Kimball's method).

    Ok, so the date should be stored as GMT/UTC. But how do you recommend handling the time zones?

    -A

  4. #4
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    If you store GMT and local, why do you need the time zone?
    Are you doing grouping or sorting by time zones?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  5. #5
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    I need to display hourly data for an entire day; but based on the time zone, I need to select a different 24-hour chunk of data.

    -A

  6. #6
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Based on the timezone of the user, or of the data originator?

    What interface are you using? If you are basing the 24 hours on location of the user, then query off of GMT but shift the datetime parameters accordingly.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  7. #7
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Timezones are a nightmare in a data warehouse, for many reasons. From the purely theoretical viewpoint, a timezone is implicitly an adjustment between the time standard (UCT) and the local observed time. In terms of what you can accurately measure and store, the timezone is wasted effort. From the practical standpoint, timezones get even harder to manage because they are full of exceptions.

    If someone FORCES you to support timezones, get them to define which timezones, ahat kind of support, and what do you do when that timezone support forces you to either create unusable data or violate the local laws. This is not a pretty issue!

    -PatP

  8. #8
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    Quote Originally Posted by blindman
    Based on the timezone of the user, or of the data originator?

    What interface are you using? If you are basing the 24 hours on location of the user, then query off of GMT but shift the datetime parameters accordingly.
    Based on the time zone of the user.

    The fact/measures are calculated based on the standard time zone (in our case, central time). So when I join to the date dimension, I can't simply filter on the date column since that is the date in central time. Please refer to and provide feedback on the solutions I mentioned in my first post. Also, at the moment, the requirement is to support U.S. time zones.

    -A

  9. #9
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    Quote Originally Posted by Pat Phelan
    If someone FORCES you to support timezones, get them to define which timezones, ahat kind of support, and what do you do when that timezone support forces you to either create unusable data or violate the local laws. This is not a pretty issue!

    -PatP
    Pat, you're right, it's a complete PITA. Regardless, it's something we do need to support. Requirement is to support the 4 U.S. time zones only. They realize data might be skewed during the day light saving time changes, and they're OK with that. I can't think of any other times when the data may get skewed due to the time zone.

    -A

  10. #10
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    Another possible solution is to create a time-zone dimension. This would require each row in the fact/measure table to be stored 4 times; once for each time zone with the proper date keys.

    -A

  11. #11
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    I'm gonna lay out the possible solutions again with pros/cons

    1. Add a FK column to the fact tables for each time zone. So if the requirement is to handle U.S. time zones; each fact table would have 4 FK columns to the date dimension table (pst_datekey, mst_datekey, cst_datekey, est_datekey).
    Pros: Efficient from a data storage perspective.
    Cons: Based on time zone, the JOIN criteria changes. e.g. if the user is in PST time zone, the query needs to JOIN the fact/measure with the date dim. based on the pst_datekey column. Not efficient for development.

    2. Decide on a standard time zone (e.g. central or UTC) and base the date dimension on the central time zone. Then add FK columns to the date dimension, pointing back to itself for each additional time zone. So if we used central time as the standard, the date dimension table would have pst_datekey, mst_datekey, and est_datekey. The reporting app would need to perform a join from the fact table do the date, and then from date to itself.
    Example query to pull data for 1/1/07 for Eastern time zone:

    SELECT SUM(A.SomeMeasure)
    FROM some_fact A
    INNER JOIN date B ON A.date_key = B.date_key
    INNER JOIN date C ON B.est_datekey = C.date_key
    WHERE C.date='1/1/2007'

    Pros: Efficient on data storage side.
    Cons: Requires extra self-join on date dim. for every query. Not efficient for report/SQL development since the join criteria would change based on time zone.

    3. Create a time-zone dimension. This would require each row in the fact/measure table to be stored 4 times; once for each time zone with the proper date keys.
    Pros: Efficient querying, and easy to write SQL against. Time-zone can be a drop-down on the front-end, or be tied to use, and the SQL is very simple. You simply join with the time-zone dim. to get the proper results from the fact/measure.
    Cons: Inefficient data storage. Requires the data in the fact/measure to be duplicated for each time zone.

    Option 2 doesn't provide any advantage over option 1. So really, it's between options 1 and 3. Please provide feedback, or additional ideas.

    Thanks,

    -A

  12. #12
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    Kimball's method of handling multiple time-zones: http://www.intelligententerprise.com...questid=525467

    -A

  13. #13
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    There are eleven time zones (EST. EDT, CST, CDT, Indiana, Sioux, MST, MDT, Arizona, PST, and PDT) that I know of rigth now in the lower 48 states. Which four are you using? Watch out for Indiana, because its a real joy (it is no where near as simple as it seems). If you have historical data watch out for Navajo, RR, and Atlantic.

    Avoid the timezone dimensions... Include additional rows in your stock time dimension with different observed and UTC times, then do rollups on observed, UTC, or both as needed. This is vastly simpler to create, explain, and use than separate dimensions that dance under the user as they try to find answers.

    -PatP

  14. #14
    Join Date
    Oct 2007
    Location
    Chicago, IL
    Posts
    82
    Thanks for the input Pat. I'll be going with option 1 on the hourly grain by adding more FKs from the fact/measure to the date dim. This would follow your suggestion of not creating a time zone dim. and it also follows the Kimball method.

    I'm still looking for a solution on the fact/measure on the daily grain. I could add a time-zone attribute to the measure/fact table, and calculate for each time zone. And I'm thinking that if I go this route, wouldn't it make sense to just normalize that out into a timezone dim. for the daily measure/fact table so I'm not repeating the descriptive timezone string and just storing integer keys.

    -A

  15. #15
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    I agree with Kimball regarding storing both absolute and local (relative) time values. This is what I suggested earlier, and also handles situations where you may recieve data from two different sources which happen to be in the same time zone, but where the system clocks are not in synch.
    I had to do this a few years back. We had systems sending us batches of data that may have been recorded at various times, so each system sent the time of the data transaction, plus the current system time. By comparing the remote system time to the local system time, we knew exactly how much to adjust the data transaction time to put them all on a standard clock.
    I completely DISAGREE with Kimball's idea of separating the date and time portions into two different fields. This sounds like a recipe for a mess to me. You'd have to write code that would adjust the date portion any time adjustments to the time portion crossed midnight. And that is just asking for errors.
    As far as creating a "calendar day dimension table", well, you've probably already figured out that I think Kimball is full of crap. This is the sort of nonsense that required splitting the date values in the first place, as he himself points out.
    Ask yourself, what will you gain from doing this except create extra admin to maintain the date table?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •