(Read Part 1 of our SportVU series here, where we discussed the technology of SportVU, the data it produces, and potential ways the new information will affect the NBA.)
The data revolution in basketball, prompted largely by SportVU, is a perfect case study for the larger data movement under way in many other industries. SportVU gives us a new lens through which to analyze the game of basketball. In this post we will attempt to show that, with proper statistical analysis of SportVU data and subject-matter knowledge, powerful trends can be found and real changes can be made to a game that has been around for years.
The data available to the public is a small subset of the location data that SportVU collects, but we wanted to see what predictive power or significant information, if any, the available data might provide. A few examples of interesting questions we asked before starting our analysis: Does SportVU data indicate that some players should be paid more than others? Are some players favored more by SportVU statistics than by the traditional statistics? We could go on, but given the current relevance of the NBA playoffs, we decided to focus on this:
What differentiates a playoff team from a non-playoff -- or, in NBA lingo, a "lottery" -- team?
The approach taken here analyzes data from ESPN.com and NBA.com/stats with the goal of determining which statistics, besides the win and loss columns, truly differentiate teams that make the playoffs from the ones that don't.
We pulled data for the top 500 players in terms of games played during the 2013-2014 regular season. The traditional data, as we'll call it in this article, consists of the familiar statistics we talk about when discussing basketball (points per game, rebounds per game, field goal %). The SportVU data is both newer and more specific. Field goal % is broken down into situations: drives to the basket, pull up shots, catch and shoot. Entirely new statistics are also available, such as average speed while on the court and secondary assists per game (a secondary assist is credited when a player makes the pass before the pass that leads to a shot). All told, our analysis included 25 traditional and 61 SportVU variables: a plethora of data from which we found some interesting trends.
Back to our original question: based on the 2013-2014 SportVU data, what might differentiate a playoff team from a non-playoff team? Using statistical tests, we were able to isolate both traditional and SportVU statistics that were significantly different between the players on playoff teams and those on lottery teams. All told, we had 217 players in the playoff set and 185 players in the lottery set (any player who played for multiple teams was assigned to the team that had him for the most games that season).
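For readers who want to reproduce the grouping step, here is a minimal Python sketch of the assignment rule described above (most games played decides the team). The function names, the record layout, and the sample records are our own hypothetical illustrations and not the code or data used for the analysis, which was done in R.

```python
from collections import defaultdict


def assign_primary_team(stints):
    """Map each player to the team he played the most games for.

    `stints` is an iterable of (player, team, games_played) tuples,
    one per player-team pairing in the season (a hypothetical layout).
    """
    games_by_player = defaultdict(dict)
    for player, team, games in stints:
        previous = games_by_player[player].get(team, 0)
        games_by_player[player][team] = previous + games
    # max() keeps the first team seen when two teams tie on games played.
    # The article does not say how ties were broken; this is an assumption.
    return {
        player: max(teams, key=teams.get)
        for player, teams in games_by_player.items()
    }


def split_by_group(primary_team, playoff_teams):
    """Split players into (playoff, lottery) lists by their primary team."""
    playoff = sorted(p for p, t in primary_team.items() if t in playoff_teams)
    lottery = sorted(p for p, t in primary_team.items() if t not in playoff_teams)
    return playoff, lottery


# Hypothetical records, for illustration only (not the article's data).
example_stints = [
    ("Player A", "Team X", 30),
    ("Player A", "Team Y", 45),
    ("Player B", "Team X", 82),
]
example_primary = assign_primary_team(example_stints)
example_playoff, example_lottery = split_by_group(example_primary, {"Team Y"})
```

In this toy example Player A lands in the playoff set because 45 of his 75 games came with the playoff team, while Player B stays in the lottery set.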
We first attempted to gain an understanding of our data through visual analysis. Using the statistical program R, we drew two box plots (more on box plots here). The first plot, below, shows two of the most talked about variables, points per game and rebounds per game, graphed for both the lottery and playoff teams.
In the case of these two variables, the distribution of values looks largely similar between the lottery and playoff teams. But whether there is a statistically significant difference -- whether we can state with confidence that one of the recipes for NBA success is a team that has more players with higher rebounds per game -- we cannot tell just by looking at the graphs. We must use statistical tests to find the difference we are looking for. To choose which test to use, we first ran a preliminary test, the Anderson-Darling (AD) test, to diagnose the underlying distribution of our data. Testing against the normal distribution, we found that none of our variables could plausibly be normal. Most conventional statistical tests (e.g., the t-test) assume normally distributed data, and though we could use these tests, a more robust approach is the Mann-Whitney-Wilcoxon (MWW) test, which does not assume normality. The underlying mathematics of the two tests differs, but the output is the same: a p-value from which we judge the significance of the difference between the lottery and playoff teams.
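To make the two-step procedure concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the code behind our results (we worked in R): the function names are ours, the sample values are made up, the AD critical value of 0.752 is the commonly tabulated 5% threshold for the small-sample-adjusted statistic when the mean and standard deviation are estimated from the data, and the MWW p-value uses a normal approximation with a tie correction and no continuity correction, so it can differ slightly from what a statistics package reports.

```python
import math
from statistics import NormalDist, mean, stdev

# Commonly tabulated 5% critical value for the adjusted A^2 statistic
# when mean and standard deviation are estimated (an assumption here).
AD_CRITICAL_5PCT = 0.752


def anderson_darling_normal(sample):
    """Return the small-sample-adjusted Anderson-Darling statistic for normality."""
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    std_normal = NormalDist()
    eps = 1e-12  # keep CDF values away from 0 and 1 so the logs stay finite
    cdf = [min(max(std_normal.cdf((x - mu) / sigma), eps), 1 - eps) for x in xs]
    total = sum(
        (2 * i + 1) * (math.log(cdf[i]) + math.log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    a_squared = -n - total / n
    return a_squared * (1 + 0.75 / n + 2.25 / n ** 2)


def mann_whitney_u(a, b):
    """Return (U statistic for sample a, two-sided p-value).

    Uses average ranks for ties and a normal approximation to U.
    """
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        tie_size = j - i
        tie_term += tie_size ** 3 - tie_size
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Made-up per-player values, for illustration only (not the article's data).
playoff_values = [12.1, 8.4, 15.0, 6.3, 9.9, 20.2, 7.5, 11.8]
lottery_values = [10.7, 7.9, 13.2, 5.1, 9.0, 18.6, 6.8, 14.4]

looks_non_normal = anderson_darling_normal(playoff_values + lottery_values) > AD_CRITICAL_5PCT
u_stat, p_value = mann_whitney_u(playoff_values, lottery_values)
significant = p_value < 0.05
```

The rank-based test only compares the ordering of the values across the two groups, which is why it does not need the normality assumption that the AD step rejected for our variables.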
We ran the MWW test for points per game and rebounds per game, and found no significant difference between lottery and playoff teams. What does this mean? One caveat is that we aren't using all players, just the top 500 minute-getters. Even so, the result tells us that for the 2013-2014 season, the differentiating factor wasn't having many players who could individually score or rebound. At first glance this doesn't seem to make sense: the team that scores the most wins. But we have to consider the multitude of factors that go into winning a basketball game; for instance, a team with players who score a lot of points, but do so inefficiently, may not succeed against a team that can play good defense against them and score more efficiently. At the end of the day, basketball is a team game, and the complexities of the game go beyond a single statistic. From here, we ran the MWW test on all the traditional statistics. Across the remaining 23 variables, we found that Field Goal % and Assist/Turnover Ratio are the only two that we can say with confidence differed between lottery teams and playoff teams.
We can try to spin a narrative from our results: the team that can shoot well and share the ball well without turning it over will end up making the playoffs. It is not quite that simple, but we do have empirical evidence that for the 2013-2014 NBA season, teams with those characteristics did seem to succeed more often than teams without them. Still, the narrative is weak. There are many unknowns behind those statistics (e.g., the situational aspects of a shot: a shot next to the basket is easier to make on average than one from the three point line). Let's take a look at the SportVU data, which as we've said uses player tracking both to capture new statistics and to break down the traditional ones into different game situations. Below is a boxplot matrix of seven of the 61 SportVU variables:
Again, our eyes can't exactly tell whether a significant difference exists. Further, while some of these graphs don't seem to show much of a difference from one group to another, consider that a few percentage points on a field goal % statistic amount to thousands of shots made and missed over the course of the season. We must use the statistical tests to give us the final verdict on the difference between a lottery team and a playoff team.
We applied the AD test and the MWW test to the SportVU data, and found that the following variables are significantly higher for playoff teams than for lottery teams:
- Field Goal % on Drives
- Catch and Shoot Field Goal %
- Catch and Shoot 3-point Field Goal %
- Catch and Shoot Effective Field Goal %
- Close Field Goal %
- Secondary Assists Per Game
- Points Created by Assist per 48 Minutes
Before we talk about these results, we should make an important point. If this were a comprehensive analysis, we would now have to adjust our results to account for two important things:
1) Because we have used 0.05 as our level of significance, every test we run on a variable that truly does not differ between the two groups still has a 5% chance of flagging it, such as Close Field Goal %, as significant. Across dozens of tests, a few such false positives are to be expected.
2) The correlation among some of the variables in our analysis (e.g., Assists per game and assists per 48 minutes) could distort the true picture of the difference between playoff and lottery teams.
These two points must be kept in mind as we begin to interpret our results below, and while they could throw off some of our conclusions, they are by no means deal-breakers to our analysis.
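To give a sense of the size of the first issue, the sketch below works through the arithmetic for 61 tests at the 0.05 level and shows one standard adjustment, the Holm-Bonferroni step-down procedure. We did not apply this adjustment in our analysis; the function names are ours, the p-values are made up for illustration, and the family-wise figure assumes the tests are independent, which the second point above tells us is not quite true for our correlated variables.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one false positive if every null hypothesis is true.

    Assumes the tests are independent, which is only an approximation
    when the variables are correlated.
    """
    return 1 - (1 - alpha) ** num_tests


def expected_false_positives(alpha, num_tests):
    """Expected number of false positives if every null hypothesis is true."""
    return alpha * num_tests


def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected after adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    rejected = [False] * m
    for step, idx in enumerate(order):
        # The smallest p-value faces alpha/m, the next alpha/(m-1), and so on.
        if p_values[idx] <= alpha / (m - step):
            rejected[idx] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected


# 61 SportVU variables tested at the 0.05 level, as in the article.
fwer_61 = familywise_error_rate(0.05, 61)
false_positives_61 = expected_false_positives(0.05, 61)

# Made-up p-values, for illustration only (not the article's results).
example_p_values = [0.01, 0.04, 0.03, 0.005]
example_rejections = holm_bonferroni(example_p_values)
```

Under the independence assumption, 61 tests at the 0.05 level would produce about three false positives on average if no variable truly differed, which is why a list of seven significant variables should be read with some caution.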
If nothing else, the SportVU data gives us a richer narrative with which to explain the differences between a playoff and a lottery team. Clearly, a team that can shoot is going to perform better, which is as far as we could get with the traditional statistics, but it's the places and situations that matter here. A team with players who can catch and shoot quickly and who shoot well close to the basket and on drives is going to succeed. This is where SportVU adds value. The traditional stats don't tell us what types of shots make a team better, but SportVU does. Notice that pull up shots do not appear on the list, nor does shooting volume (Field Goals Made or Attempted). Additionally, while we saw defensive rebounding show up in our traditional analysis, SportVU is able to break the rebounding statistic down into contested rebounds (meaning another player is fighting for the ball) and uncontested rebounds, and the results show no significant difference in the rebounding abilities of a playoff vs. a lottery team.
Take a second to step into the shoes of an analyst for an NBA team. If I were to walk up to the head coach with our traditional statistics analysis and say "Coach, look! All you have to do is shoot better!" he'd dismiss me with a, well, let's just hope it would be a wave of the hand. With the SportVU data and proper analysis I'm able to get into the weeds of the game and give a more robust analysis: "Coach, given the significant difference in the shooting percentages for drives, catch and shoot situations, and close-to-the-basket shots, we may want to look to acquire players who can make those shots with efficiency. Further, the data shows that performance on pull up shots was not significantly different, so we should stay away from acquiring those types of players in the offseason, and mold our strategy around avoiding those shots if possible." Is that conversation a little ridiculous? Yes. But can SportVU data change the way the game is being played? Of course it can.
Maybe it is that simple after all: the teams that make the playoffs year in and year out are the ones that can shoot efficiently. Obviously, more factors influence the difference between the two types of teams, but this analysis does give us an idea of what happened in the past year, and possibly provides a blueprint for teams going forward. The value of SportVU, as of any new type of data collection, lies in its ability to provide us with more information, leading to more questions and hypotheses. SportVU will never be able to give a coach the ability to draw up the perfect strategy, or a GM the perfect team, just as the newest big data collection system will not lead to a perfect understanding of a company's customers. However, these technologies were not created for that purpose. In an environment where a simple leg cramp (see James, LeBron) can turn the tide of a game, that one extra piece of information can make all the difference.