Qualifying the Unquantifiable: Clutch Performance

INTRODUCTION

“There is virtually no evidence that any player or group of players possesses an ability to outperform his established level of ability in clutch situations, however defined.”

Joe Sheehan, Baseball Prospectus

Define clutch performance and find evidence it exists? Challenge accepted.

The “burden of proof fallacy” is a logical fallacy; it occurs when someone tries to evade their burden of proof by denying it, pretending to have fulfilled it, or shifting it to someone else. I suspect that is what is going on in Mr. Sheehan’s quote… and largely what is going on in a lot of baseball conversations these days.

Teams, fans, and journalists are overloaded with data: plain old statistics, measurable ball speeds and angles, biometric output, sleep metrics, etc. It is easy to become over-reliant on the data that is available and fall into the trap of thinking that it captures all variables. What we cannot measure in baseball, we end up calling “intangibles”.

Since we are unable to quantify them (for now), my goal is to qualify the existence and impact of these intangibles, starting with clutch performance.

METHODOLOGY

First, let’s set the parameters for clutch performance. There are many ways to do so, I’m sure: high-leverage at-bats versus low-leverage at-bats, a star pitcher facing off against a star hitter, a well-timed defensive gem.

I’m interested in a large dataset, repeatable over several years of independent performances, with a clear dividing line between high-pressure and low-pressure situations.

Instead of getting too focused on specific situations, let’s first think big picture. Ask any professional player for their highest goal or motivation – I assume most will say they want to win championships. So, simplified a bit, the ultimate goal of a player is to perform in a way that helps his team win. Furthermore, the player would like to help his team win as many games as possible in order to qualify for the postseason. Then the player would like to continue performing in a way that helps his team win with the hopes of winning enough games to achieve a World Series title.

For the purpose of this study, my proposed measurement of clutch is Win Probability Added (WPA) in a regular-season game versus WPA in a postseason game. This lets us review datasets spanning a player’s career, the evidence includes any year-to-year variation, and a postseason game is undeniably higher pressure than a regular-season game. Win Probability Added seems like a reasonable statistic to look at since it can be measured play to play, applies the situational context of an individual play or game, and theoretically measures how well a player was able to achieve his ultimate goal (helping his team win). I plan to keep my study limited to pitchers only; I imagine their game-to-game WPA flux is a bit more consistent and reliable.

My plan is to first look at a pitcher’s game log WPA in order to establish his level of ability. I then hope to prove that his playoff performance, also on a game-by-game WPA basis, outperforms his established level of ability to a statistically significant degree. To do so, I will perform a two-sample t-test with the null hypothesis being that the two datasets have equal means and the one-sided alternative hypothesis being that the pitcher’s mean WPA/game in the postseason dataset is larger than that of his season dataset. If I am able to reject the null hypothesis and prove, to a statistically significant degree, that the postseason dataset has a larger mean than the season dataset, I will have done my job. In other words, I will have proven that a player is capable of outperforming his established level of ability, measured in WPA/game, during the postseason.
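The comparison described above can be sketched in a few lines of Python. This is a minimal sketch rather than the exact calculation behind the results: it assumes the unpooled (unequal-variance) form of the two-sample t statistic and the conservative degrees of freedom of the smaller sample size minus one. The function name and the WPA values are hypothetical, for illustration only.

```python
import math
from statistics import mean, stdev


def one_sided_t_statistic(season_wpa, postseason_wpa):
    """Return (t, df) for the alternative: postseason mean > season mean.

    Assumes the unpooled two-sample t statistic and the conservative
    df = min(n_season, n_postseason) - 1.
    """
    n_s, n_p = len(season_wpa), len(postseason_wpa)
    # Standard error of the difference in means, without pooling variances
    se = math.sqrt(stdev(season_wpa) ** 2 / n_s + stdev(postseason_wpa) ** 2 / n_p)
    t = (mean(postseason_wpa) - mean(season_wpa)) / se
    df = min(n_s, n_p) - 1
    return t, df


# Hypothetical per-game WPA values, NOT real game logs
season = [0.10, -0.05, 0.20, 0.00, 0.15, -0.10, 0.05, 0.10]
postseason = [0.30, 0.25, 0.10, 0.35]

t, df = one_sided_t_statistic(season, postseason)
print(f"t = {t:.3f}, df = {df}")
```

The null hypothesis is rejected when t exceeds the one-sided critical value t* for the chosen α and df, which can be looked up in a standard t-table.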

RESULTS

The first pitcher I looked at was Madison Bumgarner, whose heroics in the Giants’ 2010, 2012, and 2014 World Series runs cemented his status as a postseason legend. Since Bumgarner’s postseason appearances span from 2010 to 2016, I focused only on his season stats from 2010 to 2017 in order to bookend his postseason sample and ensure the two samples were as representative of each other as possible. When plotted on a per-game basis, Bumgarner’s WPA statistics appeared approximately normally distributed on visual inspection, which supports the use of the subsequent t-test. As a further check, the distribution also fit closely with the empirical (a.k.a. 68/95/99.7) rule.
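A rough numerical version of the empirical-rule check might look like the sketch below. It computes the share of games falling within one, two, and three standard deviations of the mean, to be compared against roughly 68%, 95%, and 99.7%. The function name is hypothetical and the data are simulated draws, not Bumgarner’s game logs; this is also only a quick sanity check, not a formal normality test.

```python
import random
from statistics import mean, stdev


def empirical_rule_shares(values):
    """Share of values within 1, 2, and 3 sample standard deviations of the mean."""
    m, s = mean(values), stdev(values)
    return {
        k: sum(abs(v - m) <= k * s for v in values) / len(values)
        for k in (1, 2, 3)
    }


# Simulated per-game WPA for illustration only (assumed normal, not real data)
rng = random.Random(0)
simulated_wpa = [rng.gauss(0.06, 0.24) for _ in range(234)]

for k, share in empirical_rule_shares(simulated_wpa).items():
    print(f"within {k} SD: {share:.1%}")
```

Shares far from 68/95/99.7 would suggest skew or heavy tails, in which case a small-sample t-test deserves more caution.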

After compiling Bumgarner’s game log WPA, I calculated/determined the following parameters to be used for the one-sided t-test (where μ: population mean or average, n: sample size, x: sample mean or average, s: sample standard deviation, df: degrees of freedom, α: significance level, t*: critical value):

Null Hypothesis (H_0): μ_season = μ_postseason

Alternative Hypothesis: μ_postseason > μ_season

n_season = 234 games, x_season = 0.060 WPA/game, s_season = 0.240 WPA/game

n_postseason = 16 games, x_postseason = 0.171 WPA/game, s_postseason = 0.288 WPA/game

df = 15, α = 0.10, t* = 1.341

Using these parameters, the test statistic (t) is determined to be 1.515.
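For readers who want to check the arithmetic, here is one way to arrive at a value of this size. It assumes the unpooled form of the two-sample t statistic, which is consistent with the conservative df = 15 (the smaller sample size minus one). Plugging in the rounded figures listed above gives about 1.51; the small gap from 1.515 is presumably due to rounding in the published means and standard deviations.

```latex
t = \frac{\bar{x}_{postseason} - \bar{x}_{season}}
         {\sqrt{\dfrac{s_{postseason}^2}{n_{postseason}} + \dfrac{s_{season}^2}{n_{season}}}}
  = \frac{0.171 - 0.060}
         {\sqrt{\dfrac{0.288^2}{16} + \dfrac{0.240^2}{234}}}
  \approx \frac{0.111}{0.0737}
  \approx 1.51
```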

Because t > t*, we are able to reject the Null Hypothesis. In other words, at the α = 0.10 significance level (90% confidence), Bumgarner’s average WPA/game in the postseason is significantly higher than his WPA/game in the regular season. Quite simply, Bumgarner showed the ability to “outperform his established level of ability” in the postseason, when it mattered most.

I wanted to ensure that Bumgarner was not the only player that accomplished my goal, so I looked at Curt Schilling next; another pitcher well-known for his postseason prowess. After compiling and analyzing Schilling’s statistics, his t-test also ended with rejecting the Null Hypothesis. In fact, Schilling’s dataset was so compelling that his average postseason output was higher than his season output to a 95% level of confidence.

Mostly satisfied with the results of my study, I wanted to see if any players would fit the opposite Alternative Hypothesis: μ_postseason < μ_season. In other words, was there a player that had an average postseason output that was lower than his season output? I turned to Clayton Kershaw who has, unfortunately, endured some apparent and well-documented struggles in the postseason. After performing the t-test on Kershaw’s statistics, I was able to reject the Null Hypothesis with 99% confidence! Wow.

CONCLUSIONS

From my perspective, I have accomplished my goal. If given the freedom to define clutch situations as postseason versus regular season, using WPA as my tool of measurement – I have no doubt that my analysis successfully proves the existence of clutch performance.

I am not quite sure that this is really a novel concept. Plenty of broadcasts for any sport will mention the “it factor”, a player’s moxie, or an ability to stand tall in big moments. But baseball journalism and sabermetric analysis have surely left this idea of “clutch” behind, operating under the assumption that it does not exist just because we haven’t figured out a way to appropriately measure it. I hope that my analysis here brings it back to the table. Keep in mind – I did not set out to determine the magnitude of a player’s clutch ability or even figure out how many players demonstrate clutch ability – I simply found evidence “that any player or group of players possesses an ability to outperform his established level of ability in clutch situations, however defined”.

After all, I don’t think of “clutch” as some sort of magical power that some players have or don’t have. I am not a doctor, but I can imagine there is a fairly simple explanation for why one human being is able to manage stress and perform in high-pressure environments better than another. It is why some people find success as surgeons or soldiers.

You can continue denying the existence of clutch performance all you want, but I am taking Bumgarner in my playoff rotation over just about anyone.