Don’t let the software migration cycle cripple your BI project


Software developers have been taught for decades to follow the Development/Integration/Staging/Production practice. It was instituted to minimize the number of “bugs” in software and allow rigorous testing of code changes without “interfering with production”, and it was quickly adopted by IT departments; it is now the practical standard in organizations large and small. While BI straddles the line between technologists and business folks, it is typically controlled by the organization’s IT/IS/Development department (it requires software and hardware, after all), and so BI content is developed in the same fashion as any traditional software project: we develop in a dev system, we integrate in a shared environment with limited datasets, we test and stage in an environment close to prod, and we finally deploy to production. There are of course variations on this theme, but most of you are very familiar with this rant. Unfortunately, this approach has a crippling effect on BI content development, which typically needs to be very rapid, often does not need “accounting accuracy”, and is very difficult and expensive to replicate across multiple systems.

First, there’s speed. Burning business questions cannot wait for months, or even weeks, while reports or queries are migrated through a series of systems designed to work as a chain of error-catching nets. Some margin of error is allowed (and even assumed) for many business questions (more on this in the next point). To make things even worse for IT departments struggling to keep up with the ever-increasing demand, most BI vendors provide self-service and ad-hoc capabilities that allow savvy business users to design their own reporting content. This only increases the frustration of business folks who wait weeks or months for changes or new content after experiencing the development of ad-hoc content in minutes or hours.

Second, there’s the typical need for directional accuracy as opposed to pennies accounting. For many computer scientists (and accountants), accuracy is an absolute term. In business, however, things are usually not as clear cut, and some information is better than none. Obsessing over each data anomaly and assuring there are no mistakes whatsoever is not only impossible with the large amounts of data generated by complex systems (and complex humans); it can stall the whole process, as “every penny” is reconciled and focus is placed on fractional amounts of information, while the millions or billions of correct results are ignored, left to wait for the next “release date” when all bugs can be fixed.

Finally, with large systems and large amounts of data, it is virtually impossible (or at least very expensive) to reproduce the entire production dataset in any other environment. There is simply too much of it to copy. This fact, combined with the need for speed and the ability to “let go” of the penny-accounting approach to the data, further supports “skipping” the migration cycle for BI content development.

Moreover, other than in very rare cases, BI content only reads data. Even interactive BI content (such as dashboards or interactive reporting) is read-only. Most BI vendors make it almost impossible to change data while using their tools, so the risk of “damaging” systems by writing changes to the database is not a concern either.

There are cases where BI content development can benefit from following the typical software migration practice, but in most cases it suffers. If technology departments want to keep up with internal demand, they need to adopt a more rapid approach to BI content development.

Posted in BI At Large | Tagged | Leave a comment

Embedding Tableau Dashboards

I delivered a webcast about embedding Tableau dashboards in external-facing apps, and embedded BI in general. You can view the webcast recording at the following address, or check out the slides that accompanied the webcast here.

Posted in BI At Large, Data visualization, Tableau | Tagged , , | Leave a comment

Embedded Tableau Oscars Dashboard

[oscarsdash]

A few months ago I published my Oscars dashboard (http://bihappyblog.com/2014/03/10/oscars-dashboard/). Recently, I decided to produce a new version of it, leveraging Tableau and extending it with additional features made possible by some HTML5 integration. Embedding Tableau in an external web application framework is a great way to combine Tableau’s terrific data exploration features, like drilling, grouping and filtering, with an intuitive, simple-to-use interface suitable for a user portal or an executive audience. This example leverages my Oscars database file and allows exploration of Oscar-nominated actors, actresses and directors, as well as a free-form exploration option for perusing more of the data set. Click the image above to interact with the full version. Enjoy.

Posted in Data visualization, HTML5, Tableau | Tagged | Leave a comment

Tableau connected Medicare enrollment dashboard

[tableau_medicare_dashboard]

Our Medicare enrollment database continues to grow and now contains over 9M enrollment records from across the country. I began collecting this information almost two years ago with my colleague Josh Tapley, and we used it to produce our Medicare Advantage dashboards using the SAP Dashboards (Xcelsius) tool, as well as our HTML5 reporting solution. Aside from being an interesting dataset, relevant to medical insurance professionals and anyone else interested in Medicare and healthcare, this platform gives us a medium to demonstrate many technical advantages and techniques we often apply on projects. So, to add to our arsenal of Medicare Advantage dashboards, I have now added a Tableau version. This version looks and operates just like its siblings from SAP and our custom HTML solution, but uses completely different technology under the covers. To create it, we had to overcome several interesting challenges: serving up Tableau content from our secure server, which resides behind our firewall, to the internet via a secure proxy (addressing proxying, authentication and security challenges), and creating visuals that do not exist natively in the tool, such as a donut chart. The dashboard is connected to the live data and executes a query each time a state is selected. This design pattern is consistent across all three versions of the dashboard and is meant to demonstrate a completely “hands free”, no-hassle, no-maintenance mode, where data is refreshed in the database and automatically reflected in the dashboard without any intervention. Enjoy.

Posted in Data visualization, HTML5, Tableau | Tagged , | Leave a comment

Using Webi 4.1 SDK to get Report SQL


When SAP introduced BusinessObjects 4.0, clients who were invested in the Webi REBEAN SDK got worried. The Webi SDK had been mostly deprecated, and it was not clear what the direction would be for those seeking programmatic ways to interact with Webi reports. Well, in 4.1 the Webi SDK has made a big comeback, with a modern, web-friendly architecture: Webi reports can now be modified, explored and edited programmatically via RESTful calls. In a prior post, I wrote about the ability to log on to the BO enterprise platform via REST web services; building on that example, I took the 4.1 SP2 SDK for a spin. The use case I wanted to address was extracting the SQL query from a Webi report. I run into this use case often, especially on dashboard projects where database developers who have no idea about (or interest in) Webi want to get their hands on the SQL queries used to generate dashboard screens (via BI services). To accommodate this, I was looking for a way to expose the query from the Webi data provider programmatically, via a simple hyperlink in the dashboard.

In this example, the service is implemented as a .jsp file on the BO Tomcat server. As explained in the first post, we have to work around the same-origin (cross-domain) issue, and a Java proxy seemed the way to go here, though other approaches are feasible.

The actual code to retrieve the SQL query of the Webi report is below:

<%@include file="getlogin.jsp" %>
The getlogin.jsp file uses the logon code from the first post and saves the generated logon token to the session, so the token can easily be reused for subsequent operations.
<%@ page import="
org.apache.http.HttpEntity,
org.apache.http.HttpResponse,
org.apache.http.client.methods.HttpPost,
org.apache.http.client.methods.HttpGet,
org.apache.http.impl.client.DefaultHttpClient,
org.apache.http.util.EntityUtils,
org.apache.http.entity.StringEntity,
org.apache.http.client.HttpClient,
org.apache.http.protocol.BasicHttpContext,
org.apache.http.HttpStatus,
java.io.*,
java.net.*,
org.apache.http.Header,
org.apache.http.impl.client.BasicResponseHandler,
org.json.simple.*,
org.json.simple.parser.*"
%>
<%@page contentType="application/json"%>
<%@page trimDirectiveWhitespaces="true" %>
<%
String reportId = request.getParameter("id"); // allow passing a report id as a query string from the dashboard
if (reportId == null) reportId = "7749"; // fallback in case no id is passed
StringBuffer sbf = new StringBuffer();
// In this example the report has a single data provider (DP0); if there are
// multiple, you will need to explore them and find their names in order to
// query each one separately.
HttpGet httpRequest = new HttpGet("http://localhost:6405/biprws/raylight/v1/documents/" + reportId + "/dataproviders/DP0");
httpRequest.setHeader("Content-Type", "application/json");
httpRequest.setHeader("Accept", "application/json");
String logonToken = (String) session.getAttribute("logonToken");
httpRequest.setHeader("X-SAP-LogonToken", logonToken);
HttpClient httpclient = new DefaultHttpClient();
HttpResponse httpResponse = httpclient.execute(httpRequest);
if (httpResponse.getStatusLine().getStatusCode() == HttpStatus.SC_OK && httpResponse.getEntity() != null) {
    // Read the JSON response body into the buffer
    HttpEntity ent = httpResponse.getEntity();
    BufferedReader in = new BufferedReader(new InputStreamReader(ent.getContent()));
    String inputLine;
    while ((inputLine = in.readLine()) != null) sbf.append(inputLine);
    in.close();
    EntityUtils.consume(ent);
} else {
    // Surface the error status, headers and body to help troubleshooting
    out.println("error: " + httpResponse.getStatusLine().getStatusCode());
    for (Header header : httpResponse.getAllHeaders()) {
        out.println("Key : " + header.getName() + " ,Value : " + header.getValue());
    }
    HttpEntity entity = httpResponse.getEntity();
    String responseString = EntityUtils.toString(entity, "UTF-8");
    out.println(responseString);
}
%>
<%
// Parse the JSON response and print the generated SQL query
String jsonStr = sbf.toString();
Object obj = JSONValue.parse(jsonStr);
JSONObject root = (JSONObject) obj;
JSONObject dataprovider = (JSONObject) root.get("dataprovider");
out.println(dataprovider.get("query"));
%>
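If you want to consume the same response outside the JSP, the JSON shape is easy to handle from any client. Below is a minimal Python sketch; the helper names are mine, and the abridged sample payload simply mirrors the dataprovider/query structure the scriptlet above parses (a real call would of course also need the X-SAP-LogonToken header and a live server):

```python
import json

BIPRWS = "http://localhost:6405/biprws"  # same base URL as in the JSP above

def dataprovider_url(report_id, dp_id="DP0"):
    # Hypothetical helper: builds the Raylight dataprovider resource URL
    return "%s/raylight/v1/documents/%s/dataproviders/%s" % (BIPRWS, report_id, dp_id)

def extract_query(payload):
    # The response wraps the result in a "dataprovider" object whose
    # "query" member holds the generated SQL, as parsed in the JSP above
    return json.loads(payload)["dataprovider"]["query"]

# Abridged sample payload with the same shape the JSP expects
sample = '{"dataprovider": {"id": "DP0", "query": "SELECT city.city FROM city"}}'
print(dataprovider_url("7749"))
print(extract_query(sample))  # SELECT city.city FROM city
```

From here, wiring the extracted query into a dashboard hyperlink is just a matter of pointing the link at whatever URL serves this logic.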
Posted in BI At Large, BusinessObjects 4.0, Web Intelligence | Tagged , , | Leave a comment

Nulls are evil (or at the very least mischievous)

If you have been involved in hands-on database development for a while, or in any kind of BI / reporting initiative, you probably already know this. Nulls are evil (or at the very least mischievous).
A null value in a database column is different from a blank, a white space or any of a multitude of other invisible characters. It is truly void, nothingness, the abyss…
The implications of leaving null values behind can be very confusing for downstream reporting development. Null values cause problems when aggregating numeric values, they make joins fall apart, and they complicate the querying of any table in which they appear. As a case in point, I set out to demonstrate how four major databases handle (or fail to handle) null values. The answer is the same in each case.
The use case I created is simple:
I created a table with 8 records in it; 2 of the records are null values. Then I queried the database with three simple questions:
• How many values are there where the value is not ‘a’? Since one record has a value of ‘a’, I expected the answer to be 7.
• Given a wild card search on the value, provide all records. I expected to get 8 records back from a “wild card” search on all records.
• How many records are there in the table, counting the value field? I expected 8.
All databases gave the same answers:
• How many values are there where the value is not ‘a’? 5. The 2 null values were not considered.
• Given a wild card search on the value, provide all records: 6. The null records were ignored.
• How many records are there in the table, counting the value field? 6. Yes, the database simply does not count the null values, as if they did not exist. Real nothingness…
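The same three-valued-logic behavior can be reproduced in seconds with SQLite from Python (a fifth engine, not part of the screenshots in this post, but it behaves identically), along with the usual fix of accounting for nulls explicitly:

```python
import sqlite3

# In-memory database with the same 8 records used in the experiment
con = sqlite3.connect(":memory:")
con.execute("create table test_nulls (column_a varchar(255))")
con.executemany(
    "insert into test_nulls (column_a) values (?)",
    [("a",), ("b",), ("c",), (None,), (None,), ("f",), ("g",), ("h",)],
)

def q(sql):
    # Helper: run a single-value query and return the scalar result
    return con.execute(sql).fetchone()[0]

print(q("select count(*) from test_nulls where column_a <> 'a'"))   # 5, not 7
print(q("select count(*) from test_nulls where column_a like '%'")) # 6, not 8
print(q("select count(column_a) from test_nulls"))                  # 6, not 8
# The fix: handle nulls explicitly with IS NULL, or count rows with count(*)
print(q("select count(*) from test_nulls where column_a <> 'a' or column_a is null"))  # 7
print(q("select count(*) from test_nulls"))                         # 8
```

Any comparison against null (even `null <> 'a'` or `null like '%'`) evaluates to unknown rather than true, which is why the rows silently vanish unless you test for `is null` yourself.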
Below are the test results in Oracle, SAP HANA, MySQL and SQL Server. Beware of nulls…
create table test_nulls (column_a varchar(255));
insert into test_nulls (column_a) values ('a');
insert into test_nulls (column_a) values ('b');
insert into test_nulls (column_a) values ('c');
insert into test_nulls (column_a) values (null);
insert into test_nulls (column_a) values (null);
insert into test_nulls (column_a) values ('f');
insert into test_nulls (column_a) values ('g');
insert into test_nulls (column_a) values ('h');

Oracle:
select * from test_nulls;
[ora01]
select * from test_nulls where column_a <> 'a';
[ora02]
select * from test_nulls where column_a like '%';
[ora03]
select count(column_a) from test_nulls;
[ora04]

HANA:
create column table test_nulls (column_a varchar(255));
insert into test_nulls (column_a) values ('a');
insert into test_nulls (column_a) values ('b');
insert into test_nulls (column_a) values ('c');
insert into test_nulls (column_a) values (null);
insert into test_nulls (column_a) values (null);
insert into test_nulls (column_a) values ('f');
insert into test_nulls (column_a) values ('g');
insert into test_nulls (column_a) values ('h');

select * from test_nulls;
[hana01]
select * from test_nulls where column_a <> 'a';
[hana02]
select * from test_nulls where column_a like '%';
[hana03]
select count(column_a) from test_nulls;
[hana04]

MySQL:
select * from test_nulls;
[mysql01]
select * from test_nulls where column_a <> 'a';
[mysql02]
select * from test_nulls where column_a like '%';
[mysql03]
select count(column_a) from test_nulls;
[mysql04]

SQL Server:
select * from test_nulls;
[sqlserver01]
select * from test_nulls where column_a <> 'a';
[sqlserver02]
select * from test_nulls where column_a like '%';
[sqlserver03]
select count(column_a) from test_nulls;
[sqlserver04]

Posted in BI At Large, Data Warehousing, SAP HANA | Tagged , | Leave a comment

Oscars dashboard

[oscars]


I LOVE movies. I am fascinated by the craft of creating movies. It’s a field that combines technology and imagination in very unique ways, where science and art interact to create an amazing product (well, not always, but many times..). Like millions of others, I enjoy the yearly Oscars awards, which let those of us who enjoy watching the movies see the people responsible for them in a different light. So, this year, I set out to explore the Oscars from a BI perspective. I began by going to the Oscars.org web site and obtained an extract of the nominations and winners data since the first award show in 1927. Oscars.org makes this data public, but presents it in a format that is very unfriendly for analysis beyond observation on a web page. So, the first task I faced was transforming the data into a format I could load into a BI tool and use to mine the data.

Next, I leveraged some of the new features available in SAP Web Intelligence (Webi) 4.1 to create a visual, interactive dashboard that is completely HTML based (it will work on any device) and is visually appealing as well as interactive. Use the Search functionality to look for your favorite actors or any other film-related terms and see how many times they were nominated and/or won. Feel free to click the image above to open the live Oscars dashboard. If you have other interesting ideas for Oscars-related data visualization, please let me know; I would be delighted to share my Oscars data file!

Posted in BI At Large, Data visualization, HTML5, SAP Mobile BI, Web Intelligence | Tagged , | 2 Comments

Signal and Noise inspired dashboard

Reading Nate Silver’s “The Signal and the Noise: Why So Many Predictions Fail — but Some Don’t” (amazon link to book) inspired me to create this dashboard, which articulates the idea of searching for signal and meaningful insight in the noise of the data. This is the HTML5 output from an .xlf; feel free to download the .xlf used to produce it.

Posted in BI At Large, Data visualization, HTML5, SAP Mobile BI, Xcelsius | Tagged , , | Leave a comment

How to add animation in SAP Dashboard (Xcelsius) on Mobile

One of SAP Dashboards’ (Xcelsius) most attractive features is animation. The way components animate in the desktop version of a compiled dashboard, and the ability to animate various visualizations and include animated files, is key to the “sleekness” associated with so many dashboard applications. However, when compiling .xlf files for the mobile app, animation is not currently supported. This can be worked around, to a degree, by applying dynamic visibility to a series of images to create the effect of animation. The technique can help bring mobile dashboards alive, and can be used to create anything from “please wait while data is loading…” spinners to Cylon heads with a moving red eye. The example below uses the HTML5 output of the .XLF file, which you may download to see how this animation effect is achieved and how it can be used in your SPA mobile app.

Posted in BusinessObjects 4.0, Data visualization, HTML5, SAP Mobile BI, Xcelsius | 2 Comments

Book review: Software Development on the SAP HANA Platform

Recently, I was asked by Packt Publishing to review the e-book “Software Development on the SAP HANA Platform” by Mark Walker.

I found this to be a great introductory book for anyone interested in learning the basics of HANA and gaining a good, hands-on understanding of the various areas of HANA development. The book includes some good exercises and detailed, step-by-step instructions that can help any developer starting out with HANA take the first few steps in a variety of topics: modeling, security, hierarchies, data sourcing and development on the XS engine. The examples are clear and simple, and the language and descriptions are easy to follow and well articulated.

Overall, I would recommend this book for getting started and becoming familiar with basic HANA concepts. However, it is probably not a fit for experienced developers who have been using the technology for a while and are looking for more advanced, in-depth material; for much of that insight I would actually recommend the various product manuals.

Posted in SAP HANA | Tagged | Leave a comment