Page 1 of 3 123 LastLast
Results 1 to 15 of 42
  1. #1
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1

    Unanswered: Date table in data warehouse? Opinions...

    I'm reviewing a data warehouse design schema for a client that is following Kimball's data warehousing principles. One of the first things I noticed was a table of dates with expanded columns giving such information as the year, month, month name, fiscal year, quarter, etc for each date, They also have a surrogate key (int) for the date value. The fact tables store the surrogate key rather than the date value itself.
    They were very surprised when I questioned the purpose of this table, assuring me that Kimball was very strong on the concept of having a date dimension for each table.
    I don't see the purpose of a table containing nothing by derived date formats. I think they will get a bigger performance hit from having to link through the surrogate key than they would suffer from having to convert datevalues stored in the fact tables.
    Has anybody else ever seen this before? Does Kimball really advise this?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  2. #2
    Join Date
    Feb 2004
    Location
    San Antonio, TX
    Posts
    565
    the time dimension is extremely common and in most cases necessary for most cubes.

    this is something that kimball suggests as so does microsoft in course number 2074.

    i am curious as to what you mean by derived dates and "expanded:

    can you elaborate.

  3. #3
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    The columns in the table are something like this:

    DateID (Int) --The surrogate key that is used in the fact tables
    BusinessDate (DateTime)
    Year (Int)
    Month (Int)
    MonthName (Varchar)
    DayOfWeek (Int)
    .
    .
    .
    etc....

    There are about fifteen interpretations of the date value.

    It makes sense to me to include this in cubes that are spun off of the data warehouse, but I don't see any advantage to storing this data in the data warehouse schema.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  4. #4
    Join Date
    Feb 2004
    Location
    Poland
    Posts
    96
    It's posible include date in fact table but i always heard that solusion
    with
    DateID (Int) --The surrogate key that is used in the fact tables
    is beter

    You can share the dimension for example

  5. #5
    Join Date
    Mar 2004
    Location
    Toronto
    Posts
    28
    I implemented this idea in my current project. Here are a few rows and columns of the date table in place:

    Date# IntDate DOW DayName SlashDate
    114610 19991231 5 FRI 1999/12/31
    114611 20000101 6 SAT 2000/01/01
    114612 20000102 0 SUN 2000/01/02
    114613 20000103 1 MON 2000/01/03

    With this data, I don't need to use the DATEADD, DATEDIFF, CONVERT functions for date handling (which i always found to be awkward) Date math can be done by using the datenumber column. Also i can avoid the use of the datetime datatype, which i never liked myself. I have found that queries on specific dates and date ranges work better than using a datetime column.

    This has worked out great for me.

  6. #6
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    There are actually a whole bunch of reasons for using a date table in a data warehouse. There are a lot of things that influence fiscal years, quarters, reporting blocks, etc... Mergers and acquisitions are a colossal headache without the ability to change fiscal year end values almost "on the fly".

    While the date table seems strange at first, I've come to love the little beggar for the amount of back breaking work that it has saved me!

    -PatP

  7. #7
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    OK, I'll buy into the convenience of the date table.

    But why use a surrogate (INT) key on it?

    The smalldatetime datatype is the same size as int, so why not just use the "natural" key, and store the actual date value in the fact table as well? That way you are not forced to link to the date table unnecessarily.

    I tested performance against a table of 2,000,000 records, and I got response time 30% to 70% faster querying against a smalldatetime value in a fact table as opposed to linking the fact table to a date dimension table.

    So I think I'll recommend to these folks that they keep the date dimension table, but use smalldatetime as the primary key rather than creating a surrogate.

    Opinions?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  8. #8
    Join Date
    Feb 2004
    Location
    San Antonio, TX
    Posts
    565
    in a dimension table you typically have two key values
    a surrogate key for identity population during the etl into the cube and an application key which is the key value from the relational side
    the neccesity of the app key is to equate a dimension with it's relevant relational value whereas the surrogate key gurarantees uniqueness of the row only.
    you could use the app key as the pk in the dimension table but you would have to do a lot of work to align the app key to the data in the fact table
    this is not practical in incremental populations of the cube.

    but as you know by testing this each situation is different and the surrogate is just a guideline not a rule.

    I have a saying.
    "if it gets you home before 5:00 and you dont sacrifice quality, i'd piss on a spark plug if i thought it would help"
    Last edited by Ruprect; 06-03-04 at 01:09.

  9. #9
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Somewhere along the way "Star Schemas" became synonymous with data warehouses. Well, I've been looking at this all week and surfing the web for different opinions, and I've decided that is a bunch of bullcrap.
    Forcing a database into a star schema means by default that the format of the database is more important than the data it contains or the business function it is to serve, and that is just a back asswards philosophy. Star Schemas SUCK for data warehouses, but the problem is that nobody (Kimball, et al...) seems to be pushing for data warehouses. They are peddling DATA MARTS, or mere reporting databases, and calling them data warehouses.
    More than ten years ago I built my first database in Access, long before I ever heard the term Data Warehouse (it may have been before Inmon even coined it...). The database blended financial, employee, production, and asset data into a 3rd normal form. I put all the data I could find into it, and changed the schema as frequently as necessary to accomodate the business requirements. We use it for creating sales territories, upper management reporting, and advanced statistical trending such as multi-linear regression analysis. THAT was a data warehouse, and I dare anyone to force anything half as usesful into a bloody Star Schema, or even a Snowflake.
    I think Kimball and others are pushing the simplest cookie-cutter solutions because those are the easiest to sell. And I think a LOT of people are being led astray by this and the whole concept of the data warehouse is being watered down to glorified pivot tables.

    From now on, here is my list of data warehouse definitions:

    Data Warehouse: A database that collects data from MORE THAN ONE SYSTEM and ties the data together in 3rd normal form.

    Data Mart: A database (or cube) that presents a subset of data collected from more than one one system for fast and simple data retrieval (perhaps in a star schema).

    Reporting Database: A database that collects data from ONLY ONE system in order to split processing between OLTP and OLAP requirements.

    A STAR SCHEMA IS NOT A DATAWAREHOUSE! AT BEST, IT IS A DATA MART.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  10. #10
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    Quote Originally Posted by blindman
    More than ten years ago I built my first database in Access, long before I ever heard the term Data Warehouse (it may have been before Inmon even coined it...)
    "The origin of the data warehouse can be traced to studies at MIT in the 1970s... This helped justify the Information Center phenomenon in the 1980s that was motivated strongly by the desire to off-load the new and wildly unpredictable analytic load from the transactional systems platform... In 1988, Barry Devlin and Paul Murphy of IBM Ireland tackled the problem of enterprise integration head-on. They introduced the term "information warehouse" for the first time ... In 1991, Bill Inmon published his first book on data warehousing, ...the first widely available how-to guide on building a data warehouse."

    the above is from http://www.daman.org/company/articles/dwrealtime.htm

    the company i worked for in 1987 was one of the first DB2 installations in canada

    we called it our "data warehouse" and we did not get that terminology from Inmon

    we used QMF against DB2 tables loaded from proprietary application file formats

    oh man, those were the days...



    blindman, never mind the surrogate key, the smalldatetime idea makes a lot of sense
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Yeah, but this client will probably go with the surrogate key anyway, because "That's the way Kimball does it."

    All in all their design is pretty good, but another odd thing I found today was an integer column on each fact table that is always set to the value "1". This was to facilitate counting records using the SUM function instead of COUNT. Why not just use COUNT? "Kimball says..."
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  12. #12
    Join Date
    Jun 2004
    Posts
    1
    Blindman,
    Surrogate keys are perfectly legitimate and not just for date tables. Surrogate keys can increase performance immensely, especially when the natural key is alpha-numeric, etc. 2 kinds of perf. increase, 1) joining fact to dimensions based on a integer is mucho faster, 2) the storage space used for indices and performance of an index scan is greatly improved. Of course this is not noticed on systems of smaller scale, but when you have 300 million to 1 billion records or more, it adds up quickly. I saved a former company 46GB in index disk space, just by using surrogate keys. Also, surrogates are not exclusive to kimball, they have been used for a long time. When the natural key changes (it will), the surrogate key can remain in place and being used. Believe me, you don't want to have change keys on tables with 500 million records, once you do, you will be a big proponent of surrogates for any table. Just my thoughts, the client should not follow Kimball blindly, but, if they are comfortable and understand the design methodology (pros and cons) and it fits their requirements, they will like it. Cheers

  13. #13
    Join Date
    Apr 2002
    Location
    Toronto, Canada
    Posts
    20,002
    marc, that was lovely, but i'm sure blindman is well acquainted with surrogate keys and their function

    what about his problem of the surrogate key for a datetime?

    >> 1) joining fact to dimensions based on a integer is mucho faster

    yes, but smalldatetime is an integer. so it's just as fast

    and with a natural datetime key you don't have to join!!

    >> 2) the storage space used for indices and performance of an index scan is greatly improved.

    yes, but smalldatetime is the same size as integer, so with no surrogate key, it actually takes less space!

    >> When the natural key changes (it will),

    oh, puhlease

    the last time the calendar changed, people rioted in the streets, shouting "give us back our eleven days"

    trust me on this one, a natural key using a datetime value is just not gonna change any time soon

    no offence to you personally, marc, but people who say everything's gotta have a surrogate key are just as nuts as people who don't use them at all
    rudy.ca | @rudydotca
    Buy my SitePoint book: Simply SQL

  14. #14
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    All this time I thought surrogate keys were things you used when you were unable to have baby keys with your own key.

    And what is this fixation with data warehouses being FAST? Sure you want them to be as fast as possible, but its not like a production system that gonna cost you money if takes 30 seconds to insert every new record. In a data warehouse, speed should be second to functionality.
    If it took me two freakin' hours to get a report out of my old Access database, hell that was still two WEEKS faster than other other business divisions could do. And I could do that because I had collected the information beforehand not because I needed it, but because I anticipated needing it.
    Back then I followed this data warehouse philosphy: "Data is cheap to get, and expensive to throw away." If I'd tried to fit every data file I came across into a star schema I'd have had vaporlock.
    Last edited by blindman; 06-04-04 at 21:42.
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  15. #15
    Join Date
    Feb 2004
    Location
    In front of the computer
    Posts
    15,579
    Provided Answers: 54
    Quote Originally Posted by r937
    oh, puhlease

    the last time the calendar changed, people rioted in the streets, shouting "give us back our eleven days"

    trust me on this one, a natural key using a datetime value is just not gonna change any time soon
    Ours (the corporate one) has changed five times in seven years. The "first of the month" can be a really slippery thing, especially in the months that you have two (or more) of them!

    -PatP

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •