While relatively simple data marts can be designed as true simple stars, more often than we like, our reporting data models contains snowflakes. Whatever you want to call these entities, they represent one basic problem for your reporting tools: Cartesian products. Let’s see why. In a simple snow flake model, your fact and dimension tables are joined in a “one way” relationship: one dimensional attributes can have zero or many facts. That’s it. This is a nice and clean design, and it makes aggregation of measures across facts a walk in the park. You never have to worry about cartesian products when you join facts and dimensional attributes, since your facts cannot be double counted by joining them to the dimension. Structurally, the joins look like this:
And they will yield data like this:
John Doe had two transactions, totaling $50 and in total the 4 transactions sum up nicely and with no issues to $85.
When you snow flake, you have to account for the possibility of creating a Cartesian product. Now, each dimensional row has a one-to-many relationship with an additional table, in effect creating a many-to-many relationship in your model. Now, your model looks like this:
And when you join addresses in with sales and clients, you get wrong numbers as you sum things up:
Since Mr. and Mrs. Doe moved from CA to NY, when we join in this information, our relational DB will make it look as if their 3 transactions are 6 and the $85 we earned is $145, because it’s double counting transactions. This may be good news for the sales team, but the auditors will give you a bit of a hard time.
As much as you will try to avoid this problem, complex models often require this kind of snow flaking for all sorts of good reasons. In this example, imagine if you are dealing with many millions of customers, with addresses as multipliers for each customer, I doubt you will be looking to add an “address” dimension in your sales fact. Slightly normalizing your model and adding the kind of relationship illustrated in the second model is perfectly reasonable.
Different tools have different solutions to this problem; BusinessObjects Universe meta-data layer provides you with the tools to address this problem in the form of aliases and contexts.
First, create your universe and setup your joins. You can try running the universe Check Integrity tool, but it will not detect any problem. The structure looks correct from a simplistic perspective, and as long as you are not trying to aggregate, you will not see problems. But we know better…
To solve the problem, we can force business objects to issue two separate queries when users request to get data from all three tables, and maintain the correct aggregation level for the sale measures. Here’s how:
First, we create an alias table to the sales table. We will point the sales object to the alias and leave the rest of the table attributes as mapped to the base representation of it.
Next, create a join from the client table to the sales alias. Finally, create two contexts, one for the join path that includes the address table and the sales table, and the other for the client and sales alias only.
That’s it! You are now ready to handle a variety of query scenarios without needing to worry about your universe users accusing you of “cooking the books”… Here’s what will happen:
When your users select the client name and address objects, webi will issue a single query based on the join path/context specified as CLIENT_SALES. When they add the address zip code, webi will issue a SECOND SEPARATE QUERY to get that data and will stitch the result sets together in its micro-cube.