Don’t let the software migration cycle cripple your BI project


Software developers have been taught for decades that they must follow the Development/Integration/Staging/Production practice. This practice was instituted to minimize the number of “bugs” in the software and allow for rigorous testing of code changes without “interfering with production”. It was quickly adopted by IT departments and is now the practical standard followed by organizations large and small. While BI straddles the line between technologists and business folks, it is typically controlled by the organization’s IT/IS/Development department (it requires software and hardware, after all), and as such, the development of BI content is approached in the same fashion as any traditional software project: we develop in a dev system, we integrate in a shared environment with limited datasets, we test and stage in an environment close to production, and we finally deploy to production. There are of course variations on this theme, but most of you are very familiar with this rant. Unfortunately, this approach has a crippling effect on BI content development, which typically needs to be very rapid, often does not need “accounting accuracy”, and is very difficult and expensive to replicate across multiple systems.

First, there’s speed. Burning business questions cannot wait for months, or even weeks, while reports or queries are migrated through various systems designed to work as a chain of error-catching nets. Some margin of error is allowed (and even assumed) when it comes to many business questions (more on this in the next point). To make things even worse for IT departments struggling to keep up with the ever-increasing pace of demand, most BI vendors provide self-service and ad-hoc capabilities that allow savvy business users to design their own reporting content. This only increases the frustration of business folks, who wait weeks and months for changes or new content after experiencing ad-hoc content built in minutes or hours.

Second, there’s the typical need for directional accuracy as opposed to penny accounting. For many computer scientists (and accountants), accuracy is an absolute term. In business, however, things are usually not as clear cut, and some information is better than none. Obsessing over each data anomaly and ensuring there are no mistakes whatsoever is not only impossible with the large amounts of data generated by complex systems (and complex humans); it can also stall the process, as “every penny” is reconciled and focus is placed on fractional amounts of information, while the millions or billions of correct results are ignored, left to wait for the next “release date” when all bugs can be fixed.

Finally, with large systems and large amounts of data, it is virtually impossible (or at least very, very expensive) to reproduce the entire production data in any other environment. There is simply too much of it to have copies. This fact, combined with the need for speed and the ability to “let go” of the penny-accounting approach to the data, further supports “skipping” the migration cycle for BI content development.

Moreover, other than in very rare cases, BI content is based on reading data only. Even interactive BI content (such as dashboards or interactive reporting) only reads data. Most BI vendors make it almost impossible to change data while using their tools, so the risk of causing any kind of “damage” to systems by writing changes to the database is not a concern either.
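To illustrate how little write risk is involved, BI access can also be locked down to read-only at the connection level. The sketch below is a minimal example using Python’s built-in sqlite3 module; the warehouse file and sales table are hypothetical, and most databases offer the same idea through read-only users or connection flags.

```python
import sqlite3

# Open the (hypothetical) warehouse file in read-only mode via a URI.
# Any attempt to write through this connection raises an error, so
# ad-hoc BI queries simply cannot change the underlying data.
conn = sqlite3.connect("file:warehouse.db?mode=ro", uri=True)

try:
    # Typical BI workload: a read-only aggregation.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    print(rows)

    # A write attempt fails with "attempt to write a readonly database".
    conn.execute("DELETE FROM sales")
except sqlite3.OperationalError as err:
    print("Write blocked:", err)
finally:
    conn.close()
```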

There are cases where BI content development can benefit from following the typical software migration practice, but in most cases it suffers. And if internal technology departments want to keep up with internal demand, they need to adopt a more rapid approach to BI content development.
