Using dbt macros to centralize key model parameters

Leveraging dbt to manage modular SQL for analytics is a game changer and an important pillar of the modern data stack. The ability to reuse SQL components on your cloud data warehouse, clearly understand data lineage and relationships with the visual DAG, build trusted data assets with reusable SQL tests and centralize the ELT documentation already makes dbt a delight for data engineers. But it is the integration of Jinja templates, and the ability to apply them through macros throughout your dbt project, that becomes incredibly powerful when it comes to flexibility and reuse.

Case in point: imagine you are working on an analytics project that includes analyzing time series data. Let's say your initial assumption, after speaking with the key business stakeholders, is that the time window needed for the analysis is 8 weeks. Eight weeks, you are told, is a magical amount of time, and anything that happens within an 8-week window is paramount to understanding the business problem at stake.

Armed with this knowledge, you set out to build a series of interesting and complex dbt models that all rely on this 8-week period. Some models might apply a where clause that filters to an 8-week range, while others might apply window functions that use sliding windows for more complex aggregations.

For example, you may end up with an intermediate or reporting model that looks something like this (table and column names here are purely illustrative):

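-- models/reporting/weekly_activity.sql (illustrative names)
select
    order_date,
    customer_id,
    order_amount,
    -- sliding aggregation over the next 8 rows (weeks)
    sum(order_amount) over (
        partition by customer_id
        order by order_date
        rows between current row and 8 following
    ) as rolling_8_week_amount
from {{ ref('stg_orders') }}
-- only analyze the magical 8-week window
where order_date between current_date - interval '8 weeks' and current_date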
Of course, this is just an example used to illustrate the concept. The 8-week timeframe is referenced in this model in two places: once as an interval in a between where clause, and once as the upper bound of a window frame. While we can feed a between clause in SQL from a database reference (between table_a.column_a and table_a.column_b), the window frame syntax is more restrictive and only accepts a constant.

In the real world you would have many sources and many models with many references to the magical 8-week period. Perhaps over time, you find that you end up with dozens of models referencing this key assumption…

Well, fast forward a week or two: you now have the results of the model neatly compiled into a consumable product, perhaps a beautiful Tableau dashboard or a basic Excel data dump, and you proudly show the results to the project's business stakeholder.

Your stakeholder sees the results and, after a quick consideration, informs you that the numbers look off and the time comparison period should be 9 weeks…

Sigh…

Now you have all these “8 week” references to replace in many models. Assuming you are using VS Code for your dbt project, you can use Find and Replace to search across all your models and tests, but this is an error-prone, time-consuming and frustrating process… And there's no guarantee that the time window won't need to be changed back to 8 weeks the next time the stakeholder reviews the results…

How can you centralize this key assumption so it can be used across all the model and test files in your dbt project, without needing to hard-code it everywhere?

If you create a database object, or perhaps a CTE, along these lines (a hypothetical sketch):

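-- hold the parameter in a CTE (or a one-row parameters table); names are illustrative
with model_parameters as (
    select 8 as model_weeks
)
select o.*
from {{ ref('stg_orders') }} o
cross join model_parameters p
where o.order_date between current_date - p.model_weeks * interval '1 week'
                        and current_date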
You will be able to reference it in some places and leverage it in some parts of the model, but not everywhere. For example, the “rows between current row and n following” frame clause requires n to be a static constant that cannot be supplied to the select statement from a column in a database object (at least in Postgres). What to do…? Well, that's where dbt macros come to the rescue!

dbt comes with some critically important macros, which are basically Jinja macros. Jinja is a great templating engine, and its integration with dbt is what allows macros such as ref and source to generalize SQL code and create reusable components.

And, lucky for us all, we get to define our own Jinja template macros and reuse them in our SQL models! So, solving our 8-week model assumption centralization problem is as simple as a single line of code in the dbt macros folder (something along these lines):

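-- macros/model_weeks.sql
{% macro model_weeks() %}8{% endmacro %}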
So now you can reference the model_weeks() macro in your SQL models and replace all the hard-coded references to 8 with it. The macro can be injected anywhere in your SQL files: dbt first compiles the SQL files to be run on the database, replacing any Jinja macros with the appropriate text, and then executes them in the DB. The macro can therefore be used anywhere needed in the SQL code: as a variable, a constant, part of a clause, anything!

Here is what the model above looks like with the macro references instead of the hard-coded numbers (again, names are illustrative). Note how the macro is used both inside a single-quoted string and in a window function, with no issues. The macro will simply be replaced with the number of weeks, and then the compiled .sql file will be executed!

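select
    order_date,
    customer_id,
    order_amount,
    sum(order_amount) over (
        partition by customer_id
        order by order_date
        rows between current row and {{ model_weeks() }} following
    ) as rolling_8_week_amount
from {{ ref('stg_orders') }}
where order_date between current_date - interval '{{ model_weeks() }} weeks' and current_date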
This is of course a fairly simple example; you can leverage macros in many ways to centralize logic, increase reusability and make your models more agile, accurate and adaptive to rapidly changing business needs!


AWS Lightsail windows instance snapshot automation

If you are using AWS Lightsail to host analytics-related applications, or any other applications for that matter, and you happen to be using Windows on your Lightsail instances, you may have noticed the message in your management console indicating that “Automatic snapshots are not supported for Windows instances at this time”. This is a bit inconvenient if you are running applications that should be backed up, since snapshotting is such a convenient way to ensure recoverability.

While you can take manual snapshots occasionally, it's hard to keep up with that process. After searching and not finding any solutions online, I ended up writing a PowerShell script that leverages the AWSPowerShell cmdlets and can be scheduled on your Windows Server Lightsail instance using Task Scheduler (or used in a Lambda function to run serverless). This is a simple use case with one instance, but of course the concept can be extended to support more complex scenarios and multiple instances as needed. The script executes the following operations:

  • It creates a new snapshot using the New-LSInstanceSnapshot cmdlet
  • It retrieves all the existing snapshots for the instance using the Get-LSInstanceSnapshotList cmdlet
  • It deletes older snapshots using the Remove-LSInstanceSnapshot cmdlet, based on a parameter that keeps the snapshots from the last n days

The script also includes code to connect to an Office 365 mail account and send error emails in case of a failure.

The script:

$ErrorActionPreference = "stop"
try {
  <#
The try block will only catch errors in PowerShell execution.
To check for errors in the response from the AWS service, you can parse the response info and check it.
You can also look at the CloudWatch API for more advanced options to keep an eye on the snapshot creation and removal processes.
#>

  <#These are parameters for email authentication with Office 365; if the script errors, it will send an email. The password can be stored as an encrypted string as needed#>
  $emailUsername = "some.email@somecompany.com"
  $emailTo = "some.other.email@somecompany.com"
  $emailPassword = "TypeOrGetFromYourFavoriteEncryptionMethod"
  $cred = New-Object System.Management.Automation.PSCredential ($emailUsername,(ConvertTo-SecureString $emailPassword -AsPlainText -Force))

  <#The default config of the script removes all snapshots that are older than 5 days. The NumberOfDaysToKeep variable value can be changed as needed#>
  $NumberOfDaysToKeep = 5
  <#The InstanceName identifies which instance to snapshot and is also used as the base of the snapshot name#>
  $InstanceName = "MyInstance123"

  <#First, issue the command to create a new snapshot, append a timestamp to snapshot name#>
  $CurrentTimestamp = Get-Date -Format "yyyyMMdd-HHmm"
  $InstanceSnapshotName = $InstanceName + "-" + $CurrentTimestamp
  <#Create a snapshot#>
  New-LSInstanceSnapshot -InstanceName $InstanceName -InstanceSnapshotName $InstanceSnapshotName

  <#Next, check if there are older snapshots that need to be removed, to save space.

  Calculate the actual date threshold; snapshots older than this will be removed#>
  $MinDate = ([datetime]::Today).AddDays($NumberOfDaysToKeep * -1)

  <#Get all instance snapshots and sort them descending by created date (one could also delete based on a count of snapshots rather than dates). Note that this returns the instance snapshots in the account for the current region, which is fine for this single-instance use case#>
  $CurrentSnapshots = Get-LSInstanceSnapshotList | Sort-Object -Property CreatedAt -Descending

  <#Iterate through the snapshots and remove any that are older than the threshold date#>
  foreach ($SnapshotItem in $CurrentSnapshots) {
    if ($SnapshotItem.CreatedAt -le $MinDate)
    {
      Remove-LSInstanceSnapshot -InstanceSnapshotName $SnapshotItem.Name -Force
    }
  }
}

catch
{
  $msg = $Error[0].Exception.Message
  <#Write the error to the console in case this is run interactively and also send an email..#>
  Write-Host $msg
  Send-MailMessage -From $emailUsername -To $emailTo -Subject "LS Snapshot script errored" -Port 587 -Body $msg -SmtpServer "smtp.office365.com" -UseSsl -Credential $cred
  exit 1
}

If you are looking for additional information about how to set up the AWSPowerShell tools, here are some additional steps and info:

You can find all the information you need about Lightsail PowerShell setup in this guide: https://docs.aws.amazon.com/powershell/latest/userguide/pstools-getting-set-up.html

There are several installation methods, as described in the doc; the Install-Module option was the simplest for me to use. After installing the modules and setting the appropriate execution policy, I was able to configure my IAM user profile to be used for accessing the AWS services, such as the snapshot cmdlets used in the script above.

The document referenced above describes this process in great detail, but it boils down to running a PS cmdlet called Set-AWSCredential with your IAM user access key information.
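For reference, that one-time setup boils down to something like this (the key values below are placeholders):

# Install the AWS Tools for PowerShell module (one of several installation options described in the guide)
Install-Module -Name AWSPowerShell -Scope CurrentUser

# Store the IAM user's access keys in a local credential profile that subsequent AWS cmdlets will use
Set-AWSCredential -AccessKey "AKIA-EXAMPLE-KEY" -SecretKey "EXAMPLE-SECRET-KEY" -StoreAs default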

Once that is done, you can simply execute AWS cmdlets without needing to worry about authentication on each session, it is a straightforward one-time setup.

Next, I was able to find the list and documentation for all the Lightsail cmdlets in the AWS Tools for PowerShell, which is here: https://docs.aws.amazon.com/powershell/latest/reference/items/Lightsail_cmdlets.html

Finally, I chose to implement the solution as a PowerShell script on the instance itself and have it run on a scheduled basis using the Windows Task Scheduler. I have this run daily and it works great for my simple use case. If you are looking to have this process run serverless rather than on the hosted instance, and also extend it across multiple instances, you can import your ps1 file into a Lambda function, use CloudWatch to schedule it, and interrogate logs more carefully.


Explore US Spend Data With Tableau Extract API

As first published on the Cleartelligence blog

Tableau can connect to a huge number of data sources out of the box. This includes pretty much any database, flat files, Excel, and more. However, there are some cases where companies and users are looking to connect Tableau to a data source that is not included in that long out-of-the-box connections list. For those cases, Tableau provides the ability to connect to “custom” data sources through several APIs, SDKs and approaches that can be put to use to provide a seamless Tableau user experience. To demonstrate this, my colleague Scott Briden and I set out to create an embedded Tableau dashboard that leverages a government web service that provides information on government spend via XML data.

The US government publishes spend information on its public web site https://www.usaspending.gov/Pages/Default.aspx

The site also provides web service APIs (https://www.usaspending.gov/DownloadCenter/API/Pages/api.aspx) to allow programmatic data extraction.

To leverage these web services, we chose to use the Tableau Extract API. We created a web page with some selectors that allow the user to choose which state, year and how many records to retrieve from the web service. Then we embedded a Tableau dashboard in the page that uses an extract connection hosted on our Tableau Server. The web page allows the user not only to select the variables, but also to “refresh” the extract file the dashboard uses. That is done by invoking a program that retrieves the web service XML data, converts it to a Tableau extract using the Tableau Extract SDK and then re-publishes the extract to the server. Finally, once the dashboard page is refreshed, it alerts the user to view the new data in the dashboard. Enjoy…


Tableau Dashboards as text messages

–Originally posted on cleartelligence.com blog

The addition of conditional subscriptions in Tableau Server version 10.1 opened the door for easily operationalizing alerts with Tableau dashboards. Not only is it easy to create conditional subscriptions that can automate operational, information-related tasks, but these alerts can be distributed not only via email, but also via SMS or MMS messages. This means that critical, real-time information about any data condition can be delivered in a manner that can be consumed by new (and old) kinds of devices for even better visibility and effectiveness. For my example, I first created a basic, simple dashboard that is suitable for a phone, or even a smartwatch.

I then published the dashboard to the server so it can be used with subscriptions.

Next, I created a new user account with an email address that will be resolved to my phone as a text message. In my case, I use AT&T, so the email format would be xxxxxxxxxx@mms.att.net where xxxxxxxxxx is the phone number. Verizon and other carriers have similar formats.

Next, I was able to easily subscribe the text user to a subscription that runs every 15 minutes but only sends the message if the dashboard's threshold condition is met and it has data.

That is all the configuration needed! The next thing I knew, I received the text message with the dashboard image on my phone and watch!

iPhone:

Apple watch:

 


Tableau Piano Chart

[Image: tableau_piano_chart]

Ever since waiting for my son during some of his piano classes, I have been fascinated by the relationship between music and math. There are so many mathematical concepts in play when playing, writing or even listening to music. One day I hope to create a way to convert any dataset represented in a chart into a good-sounding musical tune (there are many topics to address to make the music sound good; perhaps I will elaborate in a later post), but for now I was able to create a Tableau Piano Chart. Using standard charting techniques in Tableau and an inventive dataset, I got the keyboard look, and then set off to write a javascript implementation that plays a sound onMarksSelection…

I was able to find some great piano note sound files and used the DOM audio object to play a sound whenever a marks selection event fired on the dashboard, passing the relevant note to the function using the Tableau JS API. I tried the piano chart on my laptop, and all was well! I then tried it on my iPhone, and alas, it looked good but was silent. As it turns out, iOS Safari has a different implementation of the audio object, and much of the standard functionality supported in desktop browsers is not available. Getting the sound to play on my iPhone during a Tableau marks selection event became quite an adventure, but I finally got it to work! Matt Harrison's WebAudioAPI helped me through it. While not 100% perfect, it works well enough, and I can now enjoy a fully functional piano chart in Tableau on my computer and iPhone as well. Enjoy!


Voice Controlled Tableau Dashboard

[Image: voice_activated_tableau_dashboard]

The way we interact with computers is changing, and in the very near future we can expect our computers and electronic devices to become much better at understanding what we tell them to do, literally. In this example, I used the annyang js library (with slight mods) to integrate voice command activation into a Tableau dashboard. The annyang api leverages Google voice recognition and currently works with Google Chrome, but we can expect similar offerings and multi-platform support for this type of functionality from Apple, Microsoft and the rest of the computer industry leaders in the next few years, allowing developers to fully integrate voice activation in their software in a seamless fashion.
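Conceptually, the wiring boils down to something like this minimal sketch, assuming annyang is loaded on the page and worksheet is a worksheet handle obtained from the embedded viz through the Tableau JS API (the spoken phrase and field name below are made up):

// map a spoken phrase such as "show east" to a Tableau filter action
var commands = {
  'show :region': function (region) {
    worksheet.applyFilterAsync('Region', region, tableau.FilterUpdateType.REPLACE);
  }
};
annyang.addCommands(commands);
annyang.start();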

*As originally published on Cleartelligence blog


Mine News Websites – Tableau Live Dashboard

[Image: explore_the_news]

Two years ago I wrote about setting up a process to scrape the home pages of leading news websites and store them in HANA for text analysis. I hypothesized about the various analyses one could derive from parsing multiple news websites over time and looking at word and category patterns. Well, after two years of accumulating data, we now have a significant enough dataset to start mining and analyzing for trends. The first attempt I made at building an analytical tool on top of this dataset is a word trend explorer. It is meant to provide users with the ability to search for any word or term in the news, review how often the term has been used across time, observe the trend by time, category, word and source, and dive into the specific details by providing the word/phrase content and context for each search. There were other capabilities critical to any such tool as well: performance – it has to be fast; useful – it has to be functional and interesting; usable – it has to be fairly intuitive to use; and beautiful – it has to look good and meet modern design standards.

I also wanted to use some tools and technologies that are widely adopted and can be related to by many organizations and users, who can leverage this example as a demonstration of capabilities for their own projects.

So, this dashboard was built using Tableau. It is hosted on the Cleartelligence Tableau Server and is exposed via a proxy to the outside world. We are using trusted authentication to bypass the Tableau login screen and allow users to access the specific content straight from the World Wide Web.

To provide full scope of analysis to the entire dataset, we are leveraging a live connection in this dashboard. It would have been impossible to use an extract with so much data. As you use the tool you will discover that the performance of the live connection, which relies on an SAP HANA in-memory database, is as fast as an extract (if not faster..).

While developing the dashboard, several UI techniques were used to refine its look and feel, from leveraging images and applying various formats to more refined techniques.

Finally, displaying the word mentions in the dashboard required working with BLOB objects in the database (the news data is stored in BLOBs that are parsed in HANA into words with sentence and offset indices). This required implementing some custom functionality in the form of a Java program to obtain the relevant BLOB data from HANA, extract the relevant sentences and display them within the Tableau dashboard using a URL action that is triggered when you click on a month point in the trend bar.

Furthermore, to support a modern web site experience, we implemented a modal window popup from the Tableau dashboard that is triggered by the URL action.

I hope you enjoy this tool and use it to learn about interesting news related trends. I look forward to adding additional functionalities to this tool, and welcome suggestions and other feedback!


Audit Tableau Filters Application With the JS API

[Image: tableau_js_api_audit]
While Tableau Server provides good auditing capabilities in terms of logins and dashboard views, in some cases you need very detailed and granular auditing of certain dashboards. As with other websites, you may need to understand exactly which features of a dashboard are used and how users interact with it: which filters they use and how, what navigation paths are taken, and what selections are made by different users. If you do need to get to this level of auditing detail, you can use the Tableau javascript API, which provides programmatic means to capture all the interactions on a viz. As a proof of concept, I went ahead and created this demonstration. The proof of concept consists of two dashboards, laid out side by side on a web page. The left dashboard is being audited. It is loaded with my Oscars data as an extract and contains several filters. You are encouraged to use the filters and click around this dashboard. The dashboard on the right uses a live connection to the audit database and can be refreshed to view live data by clicking the refresh button on top. As you click on filters on the left, you will see them populated on the right…
To accomplish this, I needed to address three areas of development: a database backend audit table, a server-side service to process the audit events and save them in the table, and a frontend javascript component to utilize the Tableau JS API, capture user clicks and pass them to the backend service.
I began with the database layer and created a simple auditing schema with the information I wanted to track in a database table. The attributes I am tracking are the audited workbook's file name and URL, the dashboard name being clicked on, the username of the Tableau user invoking the action, the filter field being applied, the field type, whether the filter is being used in include or exclude mode, and the filter values being applied.

[Image: tableau_js_api_audit_table]
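A minimal version of such a table might look something like this (column names and data types here are illustrative, not the exact original schema):

create table dashboard_filter_audit (
    audit_id        bigserial primary key,   -- or an identity column, depending on the database
    event_time      timestamp default current_timestamp,
    workbook_name   varchar(255),
    workbook_url    varchar(1000),
    dashboard_name  varchar(255),
    tableau_user    varchar(255),
    filter_field    varchar(255),
    field_type      varchar(100),
    is_exclude_mode boolean,
    filter_values   varchar(4000)
);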
With a database table in place, I went on to write a RESTful web service that I could connect to from a webpage via javascript and pass info to be saved in the table.
Due to the asynchronous nature of the filtering and auditing sequences, I ended up putting some more logic in the service class so it can parse a JSON string that may contain multiple filter events, since a single filter event may contain several values being applied. This saves on the “chatter” between the dashboard and the server, even though the auditing happens in the background asynchronously and does not impact the dashboard performance.

[Image: tableau_js_api_audit_service_class]

The third leg of this audit stool was the javascript needed to collect the user events and pass them to the service. The Tableau JS API provides methods to get filters and selections, as well as events that can be listened to for both operations. This was the hook needed to develop the audit capability. To handle filters, a hidden sheet is included in the dashboard and is “probed” for the filters being applied to it from the dashboard on each filter or select event. Selection events are also captured. In developing this script, I did not use any framework such as AngularJS, or even jQuery, and kept to the basics.

[Image: tableau_js_api_audit_js]
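Conceptually, the filter-auditing portion of the script boils down to something like this sketch, assuming a categorical filter and a hypothetical audit service endpoint:

// listen for filter changes on the embedded viz and post them to the audit service
viz.addEventListener(tableau.TableauEventName.FILTER_CHANGE, function (filterEvent) {
  filterEvent.getFilterAsync().then(function (filter) {
    var payload = {
      workbook: viz.getWorkbook().getName(),
      sheet: filterEvent.getWorksheet().getName(),
      field: filter.getFieldName(),
      fieldType: filter.getFilterType(),
      isExclude: filter.getIsExcludeMode(),
      values: filter.getAppliedValues().map(function (v) { return v.formattedValue; })
    };
    // fire-and-forget POST; the endpoint path is hypothetical
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/auditservice/filterevent', true);
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.send(JSON.stringify(payload));
  });
});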

Finally, I created another dashboard connected live to the audit database to explore the audit events being captured on the first dashboard, and created a webpage to lay them out side by side. One final issue to grapple with was handling the trusted authentication SSO through the API, and once that was sorted out, the demo was complete. Enjoy!


Tableau Infographic dashboard – it’s not the tool which makes a dashboard cool

Ever since Josh Tapley published the infographic-style dashboard, I have been working to reproduce this design in various tools. I used SAP WebI and Crystal Reports to create similar versions in the past, and recently produced another version in Tableau. This is an interactive dashboard: clicking the Change Data button will refresh it with new random data. Click the image below for the interactive Tableau workbook.

This is another example of why proper design in a dashboarding project is just as important as data design (and in some cases more so). It is also an example that illustrates how Tableau's visual capabilities can be stretched from a dashboarding perspective. Enjoy!

[Image: TableauInfoGraphic]


Change the order of execution of connections in Tableau

I recently ran into a situation where I had to change the order of refresh execution in a Tableau dashboard that had multiple connections (extracts) supporting multiple worksheets/dashboards.

The original order of execution of the connections is determined by their creation order. When extracts are used, the connection that was created first gets refreshed first, then the second, and so on. Tableau actually appends a timestamp to the connection name upon creation, and uses that timestamp-appended name to execute the extracts in order when it's time to refresh them.

For example, I created a Tableau dashboard with 3 simple extracts, each one connected to a different Excel file. The connections were created in order (Conn1, then Conn2, then Conn3).

[Image: 01_original_exec_order]

Now, I would like to change the order in which these connections are refreshed…

My tableau file was saved as ConnOrderOfExec.twb. To view and modify the order in which connections will be refreshed, follow these steps:

1. Make a backup copy of the original .twb file

2. Rename the .twb file to .xml

3. Open the .xml file with a good, simple text editor (Notepad is fine; not Word, please)

4. You will find a datasource tag with a caption attribute, which is the friendly name you gave the connection, and an internal name attribute for each connection, which follows the form databasetype.timestamp

[Image: 01_tableau_file_xml]
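The relevant portion of the XML looks something like this (the captions match the example above; the class prefix and timestamp values are illustrative):

<datasource caption='Conn1' inline='true' name='excel-direct.41936.352206018518'>
  ...
</datasource>
<datasource caption='Conn2' inline='true' name='excel-direct.41936.352211219907'>
  ...
</datasource>
<datasource caption='Conn3' inline='true' name='excel-direct.41936.352215874421'>
  ...
</datasource>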

5. To change the order of connection execution on refresh, simply replace the .timestamp with a number that represents the order of your choice! Be sure to use Replace All so every occurrence of the timestamp you are changing is replaced

[Image: 03_tableau_replace_conn_timestamp]

6. Save the .xml file, close it and then rename it back to .twb

7. Open the .twb file and now the extracts will refresh in the new order you determined!

[Image: 04_tableau_conn_order_of_exec_changed]
