Historical reporting is common enough, but what are some ways to slice through your historical data in SQL Server Analysis Services (SSAS) Tabular? Tracking and including historical data or Slowly Changing Dimensions (SCDs) is common enough in data warehousing, and Business Intelligence as a whole, but putting it into an easily-digested form is always a new set of issues.
In this post, I will walk through some strategies we’ve used for integrating historical data into reporting and analytics solutions with SSAS Tabular, as well as some ways you can restrict this information to give your users a cleaner experience.
Slowly Changing Dimensions
In short, an SCD is a dimension or sub-dimension with some pattern around it for tracking history. For reference, you may want to read through some of the Kimball Group’s definitions for the various SCD “types”, which are the shorthand names for different patterns: 0 1 2 3 4 5 6 7
Typically, these patterns fulfill some combination of the following actions:
- Updating values in-place or creating “current” and “past” columns for a field
- Creating new rows to reflect new versions
- Creating dedicated history tables or “outriggers”
It can be helpful to leave a gap between the physical SCD type used to store and track the data in the warehouse or database, and the analogous SCD type that the users interact with in reports. The users may need one or more, potentially-conflicting, versions of the SCD behaviors, and these can often be overlooked initially, so storing “more” never hurts. Consider a basic user that would only need to see the current version of events, while an auditing user may need to inspect the historical relationships more thoroughly.
Sample Use Case
Building on the same sample as in our earlier Factless Fact post, let’s imagine we’re working with the managers at Sample Merchandise, Inc to deliver a reporting and analytical solution. Since the last posts, they’ve moved on to looking at how the different products are organized, and you’ve imported the “Product Dept” attribute which determines the department and organization of the item in the store, and the SCD Type 2 table that stores the history of this attribute, Product_Dept_History.
In this case, we’re assuming that the historical data is stored separately, or is separated during the data import process. You may need to assess how this assumption affects your own data, since these are not neccesarily split in a data warehouse. In general, though, separating “normal” attributes and history-tracked attributes can dramatically lower row sizes in the “normal” table, and prevent extra columns from being heavily duplicated in the “history” table, which can help with performance.
Figure 1. The basic data model being used for this example. The primary fact table is the Sale table, and Product_Class_History stores the Product_Dept attribute for each Product_ID by defining the first and last date for which the condition was true.
Sample results will be provided assuming the following historical relationships:
Table 1. Sample product department history data. Ranges are defined inclusively.
Current/Fixed Timepoint Data
If you only need to show the current data, or if you want to summarize your history data in the model instead of your source database, then there are a few simple options. As far as the user’s perspective goes, this is similar to reporting the attributes as a “Type 1” SCD. Since there are situations where the definition of “current” can be considered fixed, at least in respect to any scheduled refreshes of your cube, you can easily switch between a calculated column or measure. (For a more thorough comparison of Calculated Columns and Measures, SQLBI has a great discussion post.)
By far, the most direct approach is to add a calculated column. Since it’s a range search, you can’t use a LOOKUPVALUE() call directly, but one way to retrieve the value is in the DAX below, which is added to the Product table (and is why we know there’s only one ‘Product'[Product_ID] value):
Current_Product_Dept:= CALCULATE( VALUES(Product_Dept_History[Product_Dept]), FILTER( Product_Dept_History, Product_Dept_History[Effective_Start_Date] <= TODAY() // In this case, the range ends are inclusive && Product_Dept_History[Effective_End_Date] >= TODAY() && Product_Dept_History[Product_ID] = 'Product'[Product_ID] ) )
Now, this has the same limitations as any other calculated column, in that it won’t change unless the Product_Dept_History or Product tables are processed, or a Recalc operation is performed. However, as long as future data isn’t included, or updates aren’t infrequent, this shouldn’t be an issue. This can also be used for other fixed metrics as well; you could replace TODAY() with the last day of the previous year to materialize the departments at the prior end-of-year date.
As an alternative, let’s say that you can’t or don’t want to expose an extra calculated column for this task. In that case, you’d need to modify your measures to account for the appropriate historical filter context. For this example, since we need to modify a filter context, we’ll use the following measure as our baseline:
Product Count:= COUNTROWS('Product')
This measure will simply count the Product rows that belong in any particular flavor of Department field, according to the relationships we’ve already defined in the model. In this case, since there’s a bidirectional filter between the Product and Product_Dept_History tables (see Figure 1), if we drag this out onto a Pivot Table with the earlier Current_Product_Dept column, and the Product_Dept_History[Product_Dept] column, we see the following:
Table 2. The simple Product row count measure against the historical Product_Dept column, and the calculated Current_Product_Dept column. Note that duplicates are being counted since there were multiple historical entries, and so the grand totals can’t be found by summing the results directly.
If we apply the same type of filter that we used for Current_Product_Dept, we could get the following history-sensitive measure:
Current Product Count:= CALCULATE( COUNTROWS('Product'), FILTER( Product_Dept_History, Product_Dept_History[Effective_Start_Date] <= TODAY() && Product_Dept_History[Effective_End_Date] >= TODAY() ) )
And switching Product Count for the new Current Product Count, we can confirm the results:
|Current Product Count||Current_Product_Dept|
Table 3. A table showing a consistent view of Product_Dept assignment, using the new Current Product Count measure.
If filtering your history to a fixed point in time isn’t enough, then you’ll probably need to parameterize this operation to provide a sort-of “Type 2” SCD interaction. By creating a Parameter Table, you can allow the user to slice by some flavor of “reporting date” in order to show the relationships they need at a certain point in time.
In this case, since there is a fixed history and a small amount of data, the parameter table for this example will be to simply duplicate the existing Date table, Calendar, as the new table Historical_Calendar. This is at the day-resolution already, which matches the history resolution, so no additional changes are necessary. Check our earlier post on Relationships and Bridges if you’re looking for ideas on how to generate some of this inside your cube.
Similar to the fixed data, two of the most straightforward approaches involve precalculating items independent of your measures and contexts, or altering your measure contexts to include the logic. Precalculating the historical relationship to link between a new parameter table and your products should sound like building a bridge table, because that’s exactly what we can do.
You can generate the example table using the following DAX:
Materialized_Product_Dept_History:= FILTER( GENERATE(Product_Dept_History,'Historical_Calendar'), Product_Dept_History[Effective_Start_Date] <= 'Historical_Calendar'[Date] && Product_Dept_History[Effective_End_Date] >= 'Historical_Calendar'[Date] )
When combined with the new Historical_Calendar table, here is the updated data model, and the result of combining the new Product_Dept field on your bridge, the new Date parameter, and the original Product Count measure:
Figure 2. The tabular data model, with the addition of the Historical_Calendar parameter table, and Materialized_Historical_Product_Dept calculated bridge.
Table 4. Sample results from comparing the Historical_Calendar[Date], Materialized_Historical_Product_Dept[Product_Dept], and the original simple Product Count measure.
Of course, there are some downsides as well. Since the table generates the full set of combinations of Product and Date, if you don’t have a way to reduce these numbers, this table will get to be inordinately large. You will probably want to both reduce the range (i.e. only including the past year of history up to the current period) and the resolution (i.e. switching from daily history to monthly or weekly) in order to prevent this from impacting performance. However, you’ll also have to be careful that you always include every product in the table, or else you may see some unexpected behaviors from the unmatched members in your “bridge”.
If this only applies to certain measures, or you’d rather apply this by modifying your measure filter contexts, consider the following code:
Historical Calendar Product Count:= CALCULATE( COUNTROWS('Product'), FILTER( Product_Dept_History, Product_Dept_History[Effective_Start_Date] <= MAX('Historical_Calendar'[Date]) && Product_Dept_History[Effective_End_Date] >= MAX('Historical_Calendar'[Date]) ) )
(Note: This still requires the Historical_Calendar parameter table, but you can remove the Materialized_Historical_Product_Dept bridge table.)
Then you can construct a table from the original Product_Dept_History[Product_Dept] field, the parameter data from Historical_Calendar, and the new Historical Calendar Product Count measure to produce an identical (mostly) table:
|Historical Calendar Product Count||Date|
Table 5. Sample results from comparing the Historical_Calendar[Date], original Product_Dept_History[Product_Dept], and the filtered Historical Calendar Product Count measure.
Why is it different for the Grand Totals of the rows? A side effect of the DAX is that it sets up a default case. When working with the bridge table, the Grand Total meant that the Date filter was ignored, and so every Product that was ever Outerwear could be counted as Outerwear in the total. In contrast, the assumed date (MAX()) in this filtered measure means that the Grand Total reflects a particular version of the history instead, and you would need to account for this accordingly. Making the call for which Grand Total is right, though, is up to your specific case, and is outside the scope for this post.
You will need to add this code to any relevant measure, but this can also help to force you to consider the impact of the additional history relationship in each case, which can be helpful if it shouldn’t always apply. You will also need to make sure your default case is set up correctly for your situation, like was just discussed. In this case, I chose to use the MAX() function to ensure that there was always one value, but you can set up the logic to ignore this filter if the parameter table wasn’t used.
However, since this doesn’t need any extra tables besides the parameter, this can help keep your cube size, row counts, and relationship sizes all low, which can more than make up for any overhead.
You can also take advantage of the Roles and Perspectives features in Tabular to implicitly hide certain versions of the history.
For instance, let’s build up a new Role called “Current Only”. If we go into the Model > Roles… dialog, we could filter the version of history that the users assigned to this role would be able to see, such as by pre-filtering the Product_Dept_Hist table.
The full DAX snippet is as follows:
=Product_Dept_History[Effective_Start_Date] <= TODAY() && Product_Dept_History[Effective_End_Date] >= TODAY()
Alternatively, if we want to apply this to either the filtered or the bridge approach, we could rely on the As-Of date model and filter the Historical_Calendar table. If we assume that the Historical_Calendar captures data up until what would be “current”, we could apply the following filter, and then even hide the table completely using Perspectives. Try adding the DAX snippet below as a row filter in a Role, to the Historical_Calendar table, and creating a new Perspective to hide the table, under Model > Perspectives > Create and Manage.
=[Date] = MAX('Historical Calendar'[Date])
Then you can try either of the following combinations to see immediate results:
- The unfiltered Product Count measure and the Product_Dept field from the Materialized_Historical_Product_Dept calculated bridge table.
- The filtered Historical Calendar Product Count measure and the Product_Dept field from the Product_Dept_History table.
They both produce the results below, and with the new perspective in place, won’t expose or require the user to interact with the Historical_Calendar parameter table:
|Historical Calendar Product Count|
|Grand Total||5||Grand Total||5|
Tables 6-7. The sample table results of options (1) and (2), respectively.
So there you have it. Starting with some clean historical data, it doesn’t take much to provide either fixed point-in-time or dynamic/parameterized historical relationships. You can also choose between applying this as either a new filter context or a set of physical constraints in your model, and to even augment this with Perspectives and Roles to give users personalized views of history.
Do you have your own strategies for building history into your Tabular solutions? Is there a more complicated historical model you need to capture? Please feel free to tell us below!