Data is surprising

The only sure thing about data is that it will hold surprises. Data comes from systems that ultimately collect information entered by people, and because people act in unpredictable ways, the data they generate follows suit.

This affects every stage of your BI project. As I’ve written in a prior post, you can make faster progress toward your goal by working on your data/ETL and reporting/visual tracks in parallel. On both tracks, though, you need to be mindful of the challenges that stem from the unpredictable nature of the data and information you are working with.

For example, on your data and ETL track, you will face many challenges, starting with ambiguous business rules, unclear hierarchies, muddy relationships between entities, and gaps between what the data “should look like” and what it actually is. The budgets that you were supposed to tie to projects over time are actually available only for the entire project length, not broken down by period. The physicians who are supposed to work in no more than three offices sometimes work in five or seven, and products that cannot be discounted by more than 20% are sold for half their list price. The reality of business is the reality of life: it’s messy, confusing and chaotic, so surprises are more the norm than the exception. So what do you do as an ETL developer? Do you write code to handle every conceivable possibility that may ever occur? If you have a few years to work on your project, you might be able to do that. But if you are on a realistic timeline, you will make assumptions, solve the problems in front of you to the best of your current knowledge, and move on.
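One way to "make assumptions and move on" without losing track of them is to write each assumption down as a check that flags the rows that break it instead of stopping the load. The sketch below is only an illustration: the field names (`sale_price`, `list_price`), the rule names and the `RuleReport` class are all hypothetical, and the two limits are simply the ones from the examples above.

```python
from dataclasses import dataclass, field

# Assumed limits, taken from the examples in the text.
MAX_OFFICES_PER_PHYSICIAN = 3
MAX_DISCOUNT = 0.20


@dataclass
class RuleReport:
    """Collects the records that break an assumed business rule."""
    violations: dict = field(default_factory=dict)

    def add(self, rule, record):
        self.violations.setdefault(rule, []).append(record)


def check_sales(rows, report):
    """Flag sales discounted beyond the assumed maximum; every row is kept."""
    for row in rows:
        # Assumes list_price is non-zero; a real load would guard against that too.
        discount = 1 - row["sale_price"] / row["list_price"]
        if discount > MAX_DISCOUNT:
            report.add("discount_over_20_percent", row)
    return rows


def check_physicians(assignments, report):
    """Flag physicians assigned to more offices than the assumed maximum."""
    offices = {}
    for physician, office in assignments:
        offices.setdefault(physician, set()).add(office)
    for physician, assigned in offices.items():
        if len(assigned) > MAX_OFFICES_PER_PHYSICIAN:
            report.add("too_many_offices", physician)
    return offices
```

The point of the design is that a broken rule becomes a line in a report you can take back to the business, not a failed load at two in the morning.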

On your reporting and visual presentation side, things are not any easier. The bar chart that was going to be the most appealing part of the report suddenly makes no sense because, of the 8 data points you were going to plot, 7 are in the 50-100 range and one is at 23000, rendering your scale meaningless. Or maybe the item you were going to group your report by no longer makes sense because, instead of 7 or 8 members in the group, you learn that there are actually 178.
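Both of these surprises can be caught with a quick look at the data before you commit to a chart. Here is a minimal sketch, assuming positive values; the function names and the thresholds (`ratio_limit`, `max_groups`) are arbitrary choices of mine for illustration, not established rules.

```python
from statistics import median


def scale_is_distorted(values, ratio_limit=10.0):
    """Return True when the largest value dwarfs the typical (median) value.

    A rough heuristic for positive data only; ratio_limit is an assumption.
    """
    mid = median(values)
    if mid <= 0:
        return False  # the ratio is not meaningful here
    return max(values) / mid > ratio_limit


def too_many_groups(labels, max_groups=12):
    """Return True when a grouping has more distinct members than a chart can show."""
    return len(set(labels)) > max_groups
```

With seven values between 50 and 100 and one at 23000, `scale_is_distorted` returns `True`, and a grouping column with 178 distinct members trips `too_many_groups`. Either result is a cue to rethink the visual (a different scale, a separate callout for the outlier, a top-N grouping) before anyone has been promised that chart.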

Any software project carries an inherent level of risk because what is being built is new and, in essence, “imaginary”. Software is always abstract and hard to “spec”. Creating specifications for BI projects is particularly difficult because of the added unpredictability of the data itself.

What can you do about all this unpredictability? You can shriek in panic and run for the hills, or you can stay calm and be prepared to adjust your plan as you progress. Rather than trying to specify each and every aspect of your BI project up front, leave yourself some wiggle room to accommodate new findings. Make sure your logical model is robust and extensible, address error handling in your ETL design, and be prepared to deal with errors quickly as they happen. Design your user interfaces in principle, to illustrate and visualize the data that will become available as the project progresses; don’t create the expectation from the get-go that each and every chart or graph you plan will actually materialize. Explain to your stakeholders that new and additional ways to illustrate the point and deliver information will emerge along the way, and that choosing the best representation supported by the data is what will make the project successful.
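To make "address error handling in your ETL design" a little more concrete, here is one common pattern, sketched under assumptions: rows that fail to parse are set aside with a reason and the rest of the load carries on. The record layout (`project_id`, `period_start`, `amount`) and both function names are hypothetical.

```python
from datetime import date


def parse_budget_row(raw):
    """Convert one raw record into typed fields; raises on bad or missing input."""
    return {
        "project_id": int(raw["project_id"]),
        "period_start": date.fromisoformat(raw["period_start"]),
        "amount": float(raw["amount"]),
    }


def load_with_quarantine(raw_rows):
    """Parse every row, setting aside the ones that fail with the reason why."""
    loaded, rejected = [], []
    for raw in raw_rows:
        try:
            loaded.append(parse_budget_row(raw))
        except (KeyError, ValueError, TypeError) as exc:
            reason = f"{type(exc).__name__}: {exc}"
            rejected.append({"row": raw, "reason": reason})
    return loaded, rejected
```

The rejected list is what lets you address problems quickly as they happen: it tells you which rows surprised you and why, without holding up the rows that did not.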

This entry was posted in BI At Large.
