Abusing the date dimension: when does it make sense to NOT use it

The date dimension is a dimensional model staple. You will not find a data mart, data warehouse or any kind of a reporting database that does not have one form or another of a date dimension. The benefits of using a date dimension are numerous, and it’s a great tool for report developers, who can simply reference columns in the date dimension rather than look to write complex sql functions to perform date gymnastics. However, when it comes to the semantic layer, I suggest being much more judicious when implementing the date dimension. Overusing aliases in your semantic layer to represent many different dates, or using contextual separation of dates, can add significant complexity to your semantic model and reduce its maintainability dramatically. And the value of using the date dimension with many of the dates tracked in your fact may not be significant.

So, when does it make sense to NOT use your date dimension? The first condition that should exist for you to choose not to use your date dimension is the use of “smart keys” for your dates. Smart keys were very popular 30 years ago, and were used to embed business rules and logic into a numbering system that represented the rows values. This sounded like a great idea, until these business rules began changing. As the rules embedded in the smart keys started getting more dynamic, database designers faced a whole slew of new problems and headaches due to the need to make changes in existing keys. To avoid these problems, data warehouses use surrogate keys (or synthetic keys). These are “dumb” serial numbers that are used to encapsulate the “business keys” (legacy keys) and make handling of changes much easier. This is an almost bullet proof technique and you cannot go wrong with implementing surrogate keys in any dimension, except, the date dimension. This is the only exception that merits a diversion. Dates are not “slowly changing dimensions”, and most implementations of a date dimension are implemented at the grain of a day, and do not require multiple rows per day. This kind of implementation allows the use of “smart” date keys. Keys that follow the convention yyyymmdd (for example 20110419) are numeric, easy to produce and maintain, and very importantly, will allow report developers to deduct the date, without using the date dimension directly.

And this leads me to the second condition under which date dimensions should be considered not applicable in the semantic model: date proliferation. If your semantic model looks like this, before accounting for anything else but dates, you may want to reconsider your use of the date dimension.

Many subject areas require the use of many tens or dates per fact. Especially with so called “factless facts”, or facts associated with domains that are more process oriented then measurement oriented, it’s very common to find dates upon dates that can be used to analyze the process. Think about your sales process and how it is captured in your CRM: your lead generation date, your initial call to the lead date, your second call to the lead date, your first meeting date, your first quote date, your revised quote date, your po date, your booking date, your revenue recognition date, and so on and so forth. Or imagine a ticketing system used to track tickets escalation in a highly regulated environment where each step of the escalation process goes through several approvals, all happening at different dates. So, while it is easy to imagine a fact of this sort with 50-60 dates, would you really want, or need to create these many joins and aliases in your semantic model? Would that be productive and add value to your business users? Do they find it useful to know the date, week number in month, week number in year, month number, month name, year, quarter number, quarter name, weekday name, etc etc for each one of these dates? Or would you simply want to know the actual date? The common answer is that there are probably a handful to a dozen of dates that can benefit from a join to the date dimension, and for the rest, all that is needed is simply the ability to report the date. So, if your date dimension has a smart key, as described above, it would be simple to convert it to a date with a simple sql function. If more is needed, it would also be simple to join it back to a date dimension alias and add any kind of date permutation on it. This approach is also known as eating the pie AND having it…

This entry was posted in BI At Large, Data Warehousing and tagged , , , . Bookmark the permalink.

3 Responses to Abusing the date dimension: when does it make sense to NOT use it

  1. Irwin says:

    Hey Ron,

    I get what your saying but I would question the sanity/competency of a data modeler who creates separate dimensions for the types of dates you show in the star schema. These are not “dimensions” in the classic sense of the word. Dimensions are enduring aspects of the business, like product, customer and vendor.

    What you’re showing above are ATTRIBUTES and while perhaps important to distinguish, I do not believe they are analytically meaningful enough to merit their own dimension…in any layer, semantic or otherwise.


    • Ron Keler says:

      Thank you for your comment Irwin! I am in complete agreement with you. It’s just semantics (no punt intended…). These dates would be modeled as part of the facts, they are in fact, well, factual. The concept of using a “smart date key” to represent them in the fact table gives the semantic model and report designer the flexibility to choose whether they should be treated as singular attributes, without joining them to the date dimension, of if they merit a join (and an alias) to the date dimension table, as a significant business staple.

  2. Irwin says:

    Not a big fan of smart keys either as they invariably become bastardized. SAP frowns on them also. They don’t mind ranges for their natural keys but insist on no intelligences in individual key positions. That’s a general statement.

    When it comes to using these smart keys for dates. the only risk you run is the potential inconsistency of the YYYYMMDD format. Although. this can be controlled with edits on the front-end and/or ETL decodes, etc. at the back end). Other than that, it’s a logical choice as they are always unique and sequential and more or less behave like an SK. Barring any major, biblical-type catastrophe, it’s not likely the date (natural key) will ever be reorganized. If so, you’ll have bigger problems than wondering why you didn’t elect to go with SKs for these fields
    : – )


Leave a Reply

Your email address will not be published. Required fields are marked *