Wednesday, October 1, 2014

Correlation in Tableau

I will start this post out by saying I am by no means a statistics expert.  I took a lot of math courses while getting my physics degree, but statistics was never my forte.  However, I know enough to be able to translate mathematical formulas into Tableau's calculated field syntax.

Before getting into the nitty gritty of the math, I just want to point out that Tableau will automatically compute the r-squared value if you add a trend line to any scatterplot (right-click the trend line and choose 'Describe Trend Model'):

While you can copy/paste this into an annotation or some other calculation, it wont dynamically update when your data updates.  Therefore if we wanted to compute the correlation so we can use it in other places in Tableau we must compute it in Tableau's calculated fields.

First let's start with the formula for correlation ('r'):

Now let's map those to Tableau calculated field functions:
N = size()
Sigma XY = window_sum(sum(X)*sum(Y))
Sigma X = window_sum(sum(X))
Sigma Y = window_sum(sum(Y))
Sigma X^2 = window_sum(sum(X)^2)

Sigma Y^2 = window_sum(sum(Y)^2)

Putting it all together results in a pretty gnarly calc, but it's do-able:

Once you've written your calculation using whatever your X and Y fields represent then drag your new field onto the Detail shelf to include it in the view.

Note:  since this uses table calculation functions, we must tell Tableau what to compute along. After adding the field to your view, right-click it and choose Compute Using->[dimension that makes each point unique in your scatter].  You could probably also leave it on the default 'Table (Across)' but in case that doesn't work, select the specific dimension that is making all the points in your scatter:

Now, where to display it?  While it wont let you insert this field into Area or Point annotations, you can insert it into the Title or into a Mark annotation.  If you use the Title, just edit the title of the view and use the Insert button in the upper right to stick the Correlation field in there:

If you use the mark annotation, just pick any of the marks in your scatter and choose 'Annotate->Mark'.  You can then delete all the other fields in there and just show the correlation.

Lastly, format it to remove the 'line' and 'line end' so it appears as its floating (like an Area annotation):

Final product:


-The Don Data

1 comment:

  1. Nice work. Do you know if there's a possibility to annotate Trend lines in Tableau ?