Making an interactive NBA shot chart with Vega

Here’s the full interactive demo; the following is just an image!

James Harden Shots in the Free Throw area

Motivation

Inspired by @savvas_tj’s post on creating NBA shot charts in Python, as well as Kirk Goldsberry’s articles on Grantland, I wanted to prototype an interactive shot chart using vega.

I used vega because I hadn’t really used it much before, and its grammar-of-graphics model for producing visualizations is super cool and powerful. Also, a recent release of vega added a system for declarative interaction design, which looked like it would make adding interactions easier.

The Data

savvas_tj does a great job detailing the data format, so I’ll just explain what I did with vega to work with the data.

Specifically, I added data for distance from hoop, points scored, the bins to be used in the histograms, brushed field goal percentage, and brushed points per attempt.

The first transformation adds new columns to the original data (POINTS, MADE_POINTS, hoopdistance, and the bin fields):

{
  "name": "table",
  "transform": [
    {"type": "formula", "field": "POINTS", "expr": "parseInt(datum.SHOT_TYPE)"},
    {"type": "formula", "field": "MADE_POINTS", "expr": "datum.POINTS * datum.SHOT_MADE_FLAG"},
    {"type": "formula", "field": "hoopdistance", "expr": "sqrt(pow(datum.LOC_X, 2) + pow(datum.LOC_Y, 2))/10"},
    {"type": "bin", "field": "hoopdistance", "min": 0, "max": 90, "step": 1, "output": { "bin": "bin_hoopdistance" }},
    {"type": "bin", "field": "LOC_X", "min": -260, "max": 260, "step": 5, "output": { "bin": "bin_LOC_X" }},
    {"type": "bin", "field": "LOC_Y", "min": -50,  "max": 470, "step": 5, "output": { "bin": "bin_LOC_Y" }}
  ]
}
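As a rough sketch of what these transforms compute for a single shot, here is the same arithmetic in plain JavaScript (not vega). The sample row is made up, the "3PT Field Goal" label is an assumption about how SHOT_TYPE is encoded, and binStart is a hypothetical helper that assumes a bin is identified by the start of its step-wide interval.

```javascript
// Hypothetical sample row; field names follow the spec above, values are made up.
const shot = { SHOT_TYPE: "3PT Field Goal", SHOT_MADE_FLAG: 1, LOC_X: -150, LOC_Y: 200 };

// Assumed bin rule: snap a value down to the start of its step-wide bin.
function binStart(value, min, step) {
  return min + Math.floor((value - min) / step) * step;
}

// Mirrors the three formula transforms and the three bin transforms.
function deriveColumns(d) {
  const POINTS = parseInt(d.SHOT_TYPE, 10);       // leading digit of the label
  const MADE_POINTS = POINTS * d.SHOT_MADE_FLAG;  // 0 when the shot missed
  const hoopdistance = Math.sqrt(d.LOC_X ** 2 + d.LOC_Y ** 2) / 10;
  return {
    ...d,
    POINTS,
    MADE_POINTS,
    hoopdistance,
    bin_hoopdistance: binStart(hoopdistance, 0, 1),
    bin_LOC_X: binStart(d.LOC_X, -260, 5),
    bin_LOC_Y: binStart(d.LOC_Y, -50, 5)
  };
}

const derived = deriveColumns(shot);
console.log(derived);
```

For this sample shot the distance works out to 25, so it lands in distance bin 25 and is worth 3 made points.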

Then I created a new data source that aggregates the table for each histogram, for example:

{
  "name": "distance",
  "source": "table",
  "transform": [
    {
      "type": "aggregate",
      "groupby" : { "field": "bin_hoopdistance" },
      "summarize": {"hoopdistance": ["count"]}
    }
  ]
}
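Conceptually, this aggregate is a group-by with a count per bin. A minimal plain-JavaScript sketch of that idea follows; countBy and the sample rows are hypothetical, and the output column name simply copies the count_hoopdistance naming that the spec above produces.

```javascript
// Count rows per distinct value of groupField, returning one row per group.
function countBy(rows, groupField, outputName) {
  const counts = new Map();
  for (const row of rows) {
    const key = row[groupField];
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts]
    .sort((a, b) => a[0] - b[0]) // order the bins numerically
    .map(([bin, count]) => ({ [groupField]: bin, [outputName]: count }));
}

// Made-up rows that already carry a distance bin.
const binnedShots = [
  { bin_hoopdistance: 1 },
  { bin_hoopdistance: 0 },
  { bin_hoopdistance: 25 },
  { bin_hoopdistance: 1 }
];

const histogram = countBy(binnedShots, "bin_hoopdistance", "count_hoopdistance");
console.log(histogram);
```

Each resulting row is one bar of the histogram: the bin it represents and how many shots fell into it.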

Lastly, I calculated the field goal percentage and points per attempt:

{
  "name": "percentages",
  "source": "table",
  "transform": [
    {
      "type": "filter",
      "test":  `${ShotChartInteractionFilters.distance} && ${ShotChartInteractionFilters.LOC_X} && ${ShotChartInteractionFilters.LOC_Y} && ${ShotChartInteractionFilters.brush}`
    },
    {
      "type": "aggregate",
      "summarize": {"*": ["count"], "MADE_POINTS": ["sum"], "SHOT_MADE_FLAG": ["sum"]}
    },
    {
      "type": "formula",
      "field": "FGP",
      "expr": "datum.sum_SHOT_MADE_FLAG / datum.count"
    },
    {
      "type": "formula",
      "field": "PPA",
      "expr": "datum.sum_MADE_POINTS / datum.count"
    }
  ]
}

Adding these to the data declaration portion of the vega spec provides our visual components with data that looks like:

// Added columns to the original table
+-----------------------------------------------------------------+
| POINTS | MADE_POINTS | bin_hoopdistance | bin_LOC_X | bin_LOC_Y |
+--------+-------------+------------------+-----------+-----------+
| 2      | 0           | 10               | 60        | 80        |
| 3      | 3           | 25               | 15        | 20        |
| 2      | 2           | 20               | 16        | 12        |
| ...                                                             |
+-----------------------------------------------------------------+


// New table: 'distance' from aggregating the original table
+---------------------------------------+
| bin_hoopdistance | count_hoopdistance |
+------------------+--------------------+
| 0                | 30                 |
| 1                | 60                 |
| 2                | 80                 |
| ...                                   |
+---------------------------------------+

// New table: 'percentages' from filtering and aggregating the original table
+-------------------------------------------------------------+
| count | sum_MADE_POINTS | sum_SHOT_MADE_FLAG | FGP  | PPA   |
+-------+-----------------+--------------------+------+-------+
| 23455 | 30202           | 12000              | .511 | 1.287 |
+-------------------------------------------------------------+
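To make the 'percentages' row concrete, here is a small plain-JavaScript sketch of the same aggregate and formulas. The summarize function and the four sample shots are hypothetical; only the field names come from the spec.

```javascript
// Collapse (already filtered) shots into a single summary row.
function summarize(rows) {
  const count = rows.length;
  const sum_MADE_POINTS = rows.reduce((sum, r) => sum + r.MADE_POINTS, 0);
  const sum_SHOT_MADE_FLAG = rows.reduce((sum, r) => sum + r.SHOT_MADE_FLAG, 0);
  return {
    count,
    sum_MADE_POINTS,
    sum_SHOT_MADE_FLAG,
    FGP: sum_SHOT_MADE_FLAG / count, // field goal percentage
    PPA: sum_MADE_POINTS / count     // points per attempt
  };
}

// Made-up shots: a made 2, a missed 2, a made 3, a missed 3.
const sampleShots = [
  { SHOT_MADE_FLAG: 1, MADE_POINTS: 2 },
  { SHOT_MADE_FLAG: 0, MADE_POINTS: 0 },
  { SHOT_MADE_FLAG: 1, MADE_POINTS: 3 },
  { SHOT_MADE_FLAG: 0, MADE_POINTS: 0 }
];

const summary = summarize(sampleShots);
console.log(summary);
```

In this sketch an empty selection divides by zero and yields NaN for both ratios, so a real display would probably want to handle that case separately.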

The Visual Components

The visual components for the shot chart break down into the following sections:

The shot scatterplot is made by declaring that each shot should be drawn as a circle positioned by the LOC_X and LOC_Y table fields. In vega, that looks like:

{
  "type": "symbol",
  "from": {
    "data": "table"
  },
  "key": "shot_id",
  "properties": {
    "update": {
      "shape": { "value": "circle" },
      "x": {"scale": "x", "field": "LOC_X"},
      "y": {"scale": "y", "field": "LOC_Y"},
      "fillOpacity" : { "value": 0.5 }
    }
  }
}

Each component has statements similar to the above. You declare the data and a mark type that determines how the data is drawn. Visual properties like position, size, color, and shape are then either driven by the data or set manually.

Adding Interactions

In vega, interactions are specified as signals that listen to a stream of events such as mousedown and mouseup. They are defined with vega’s Event Stream Selector syntax, which is similar to DOM selectors.

From these events we extract information like the mouse position and then transform it into the data’s values. For example, to add the brushing and linking interaction to the histogram component, we first define signals like:

[
    {
      "name": "distStart",
      "init": -1,
      "streams": [{
        "type": "@distGroup:mousedown",
        "expr": "eventX(scope)",
        "scale": {"scope": "scope", "name": "x", "invert": true}
      }]
    },
    {
      "name": "distEnd",
      "init": -1,
      "streams": [{
        "type": "@distGroup:mousedown, [@distGroup:mousedown, window:mouseup] > window:mousemove",
        "expr": "clamp(eventX(scope), 0, scope.width)",
        "scale": {"scope": "scope", "name": "x", "invert": true}
      }]
    },
    {"name": "minDist", "expr": "max(min(distStart, distEnd), 0)"},
    {"name": "maxDist", "expr": "min(max(distStart, distEnd), 50)"}
]

This defines four signals to be used as data. The signal called distStart listens to a stream of mousedown events on the distGroup visual component. From each event it calculates the horizontal position with the expression eventX(scope). This horizontal position is then mapped to a data value by inverting the scale that maps data values to horizontal positions. The distEnd signal does the same for the mousedown and for every mousemove between that mousedown and the following mouseup, and minDist and maxDist order and clamp the two endpoints.

Now we can use distStart in formulas, because its value has been lifted from the visual space into the data space. For example, the signals from above are used in a filter like:
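Inverting a scale is easier to see with a toy example. The sketch below is plain JavaScript, not vega’s scale implementation; linearScale is a hypothetical helper, and the [0, 50] domain and 400-pixel width are assumptions chosen only for illustration.

```javascript
// A toy linear scale: maps a data domain onto a pixel range and back.
function linearScale(domain, range) {
  const [d0, d1] = domain;
  const [r0, r1] = range;
  const scale = value => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
  // The inverse runs the same interpolation in the other direction.
  scale.invert = pixel => d0 + ((pixel - r0) / (r1 - r0)) * (d1 - d0);
  return scale;
}

// Assumed: distances 0..50 drawn across a 400px wide histogram.
const xScale = linearScale([0, 50], [0, 400]);

const pixelForDistance = xScale(25);          // data space -> visual space
const distanceForPixel = xScale.invert(100);  // visual space -> data space
console.log(pixelForDistance, distanceForPixel);
```

A mousedown 100 pixels from the left edge of this hypothetical histogram would therefore set the signal to a distance of 12.5, which is the kind of value the signal needs in order to be compared against hoopdistance.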

{
  "type": "filter",
  "test": "datum.hoopdistance >= minDist && datum.hoopdistance <= maxDist"
}

This keeps only the shots whose distance from the hoop is between minDist and maxDist.

The shot scatterplot signals are similar, except that they track the event position in both the horizontal and vertical directions.
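Putting the minDist and maxDist expressions together with this filter, the logic can be sketched in plain JavaScript as follows. The brushRange and filterByDistance functions and the sample shots are hypothetical; the min, max, and 0 to 50 clamping mirror the signal expressions above.

```javascript
// Order the two brush endpoints and clamp them to [0, 50],
// as the minDist and maxDist signal expressions do.
function brushRange(distStart, distEnd) {
  return {
    minDist: Math.max(Math.min(distStart, distEnd), 0),
    maxDist: Math.min(Math.max(distStart, distEnd), 50)
  };
}

// Same predicate as the filter transform's test expression.
function filterByDistance(rows, { minDist, maxDist }) {
  return rows.filter(d => d.hoopdistance >= minDist && d.hoopdistance <= maxDist);
}

// Made-up shots at various distances from the hoop.
const shotsByDistance = [
  { shot_id: 1, hoopdistance: 3 },
  { shot_id: 2, hoopdistance: 15 },
  { shot_id: 3, hoopdistance: 24 },
  { shot_id: 4, hoopdistance: 40 }
];

// Dragging right to left (start 30, end 12) still gives an ordered range.
const range = brushRange(30, 12);
const brushed = filterByDistance(shotsByDistance, range);
console.log(range, brushed.map(d => d.shot_id));
```

One thing the sketch makes visible: with both signals at their initial value of -1, the range is 0 to -1 and this test alone matches nothing, so the no-brush state presumably has to be handled elsewhere in the filter expression.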

Things to add?

Thoughts

Working with vega was pretty fun, and I think it made building this prototype quicker than it would have been otherwise.

Last thoughts on using vega: