Most(ly) Valuable Players: Reranking the Hall of Fame and other flaws in MLB’s award system.

Not much has changed regarding how players are selected for annual or career awards in Major League Baseball. Maybe you think that’s a good thing. I find it incomprehensible, given the statistical value of advanced analytics over traditional metrics. Why we continue to laud irrelevant categories like a player’s batting average or a pitcher’s walks-plus-hits-per-inning-pitched (WHIP) when they have only a distant relationship to runs scored is beyond me.

For the record, batting average only has a 27.8% positive relationship with runs scored, and WHIP a 39.4% relationship. I’ve said it before, but it bears repeating: they are barometers of how well the engine is performing and nothing more. Sure, MLB and the elite voters get awards right more often than not. That’s still not good enough for me. As a stat aficionado, I find pass-fail terribly imprecise. I need concrete data.

I’m getting ahead of myself. This column is a narrative of my in-depth statistical review of every player in the Hall of Fame and every winner of each major annual award since its inception. I examined MVPs, winners of the Outstanding Pitcher Award (I henceforth refuse to refer to it by the player it was named for), Relievers of the Year, and Rookies of the Year. Lesser awards are not included because, well, I’m only one man, and this ain’t my day job. My data-driven analysis is intended to determine how well great baseball minds have recognized and rewarded achievements on the field. Everything that follows is based on actual evidence that I’ve attached for your review.

SKIP AHEAD IF YOU ARE FAMILIAR WITH MY ALGORITHMS AND EVALUATION TOOLS…

If you’ve read through my other work, you know I’ve developed my own advanced analytics system to evaluate the game properly. If you haven’t read those pieces, shame on you. I kid, of course. They are available here. In a nutshell, I was tired of metrics that looked good on paper but had no meaningful relationship with runs scored. I created the most accurate runs estimation algorithm, well, ever. It’s the equivalent of 4K picture quality to your grandparents’ old black-and-white tube TV. I took what I learned from that project to devise batting and pitching evaluation tools that correlate with runs scored and permit actual comparisons across the batter-pitcher barrier.

Some of the features of my tools look similar to others you may be familiar with. That’s deliberate, to keep the learning curve to a minimum. Effectiveness (EFT) scores are built on a 0 to 1 model (similar to on-base percentage) because that’s the industry standard. In an unweighted setting, an EFT score of 0.5000 would be perfectly average. Effectiveness-plus (EFT+) scores are the weighted variety. They represent a percentage above or below the league average for the given years tested. A player with an EFT+ of 120 is 20% above league average. Finally, run value (RVAL) is the all-encompassing volume stat. It takes a player’s EFT+ and multiplies it by the number of plate appearances. A score of 0 means a particular player produced neither more nor fewer runs than would be expected of the league-average player over the timeframe tested. For pitchers, RVAL measures run prevention against the league average. Here again, these metrics allow for interchangeable comparisons between batters and pitchers.
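For readers who think better in code, here is a minimal sketch of how an EFT+ score reads. The first function only restates the rule given above (an EFT+ of 120 is 20% above league average). The second is purely an assumption for illustration: it treats the index as 100 times the ratio of a player's EFT to the league's EFT, which is a common way to build a "plus" stat but is not confirmed here as the actual EFT+ calculation. Both function names are hypothetical.

```python
def pct_above_average(eft_plus: float) -> float:
    """Percent above (positive) or below (negative) league average.

    Restates the rule from the text: EFT+ of 120 -> 20% above average.
    """
    return round(eft_plus - 100.0, 1)


def eft_plus_from_ratio(player_eft: float, league_eft: float) -> float:
    """ASSUMPTION, not the column's confirmed formula:
    index a raw EFT score against the league figure, with 100 as average.
    """
    if league_eft <= 0:
        raise ValueError("league_eft must be positive")
    return 100.0 * player_eft / league_eft


# The example from the text, plus a below-average case.
for score in (120.0, 100.0, 95.0):
    print(f"EFT+ {score:.1f} -> {pct_above_average(score):+.1f}% vs. league average")

# Hypothetical raw scores, only to show the assumed indexing.
print(eft_plus_from_ratio(player_eft=0.6, league_eft=0.5))
```

The point of an index like this is that 100 always means league average for the years tested, so scores from different eras can sit side by side.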

The extraordinary element about these tools is how strongly they correlate with runs scored (for batters, we see a positive 73.8% relationship, while run prevention creates a negative 73.8% relationship for pitchers). As a sidebar, weighted-on-base-average (wOBA), that new and improved end-all measure of offensive prowess, only merits a 33.7% relationship with runs scored since 1876. If we narrow that timeframe to the modern era (essentially 1901 to the present), wOBA moves further away from explaining runs scored with a dwindling 21.1% relationship. No other statistical tool, traditional or sabermetric, comes nearly as close as EFT and its derivatives in measuring performance against runs scored. They are three-and-a-half times more precise than wOBA, almost twice as precise as WHIP, and nearly 6% more precise than on-base-plus-slugging at evaluating total runs scored.
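If you want to check the "times more precise" comparisons yourself, the sketch below reads them as simple ratios of the relationship percentages quoted in this column. That reading is an assumption about how the multiples were derived; the dictionary keys and function name are mine. No figure for on-base-plus-slugging is quoted, so it is left out rather than guessed.

```python
# Relationship-to-runs-scored percentages quoted in the column.
EFT_RELATIONSHIP = 73.8
OTHER_STATS = {
    "batting average": 27.8,
    "WHIP": 39.4,
    "wOBA (since 1876)": 33.7,
    "wOBA (1901 to present)": 21.1,
}


def precision_multiple(other_pct: float, eft_pct: float = EFT_RELATIONSHIP) -> float:
    """ASSUMPTION: 'N times more precise' means the ratio of the two percentages."""
    return eft_pct / other_pct


multiples = {name: precision_multiple(pct) for name, pct in OTHER_STATS.items()}

for name, value in multiples.items():
    print(f"EFT vs. {name}: {value:.2f}x")
```

Under that reading, the modern-era wOBA ratio lands right around three and a half, and the WHIP ratio a little under two, which matches the claims above.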

Under the hood, the level of detail and precision becomes more noticeable. Total-bases-per-plate-appearance (TB/PA) and total-base-run percentage (TBR%) have a relationship greater than 99% to a batter or pitcher’s EFT score. That means a single stat line provides an incredible barometer of how well a particular player is doing.

Don’t even get me started on the mysticism associated with win-shares. Perhaps someday we will have sufficient data to correctly measure each player’s individual contribution to a “win.” For the time being, you’re better off asking a psychic: there is no observable way to individually categorize win-shares because they cannot be sufficiently linked to runs scored. Doing so requires assigning theoretical run-share values from whichever runs estimation algorithm you’re using (save yourself the trouble and use pOR2) to arrive at equally theoretical win-share values that simply don’t add up to the sum of the parts because of noise in any data set. If Ross Geller from Friends were here, I suspect he’d say, “that’s not a thing” when asked about win-shares.

YOU’VE SKIPPED FAR ENOUGH AHEAD. READ ON AND LEARN!

Okay. I’m done with my preamble. If you skipped ahead to this point, you missed an impassioned monologue about the veracity of my evaluation tools. Not to worry, you’ve arrived just in time for the main event. Sit back, grab some popcorn and enjoy. A baker’s dozen pithy, declarative statements are incoming in three, two, one…

1. The most valuable players in MLB history are going to score well on any objective tool. The top ten Hall of Famers in run value (RVAL) shouldn’t surprise you, except possibly for the order: Babe Ruth, Hank Aaron, Ted Williams, Ty Cobb, Stan Musial, Lou Gehrig, Willie Mays, Nolan Ryan, Jimmie Foxx, and Tris Speaker. Well, maybe one of those names surprises you.
2. Babe Ruth is the clear choice for the most valuable baseball player of all time. He is 225 runs ahead of second-place Hank Aaron (1070 to 845). The only player who could have even argued to keep it close is Ted Williams. Even then, you have to speculate on the almost five seasons The Splendid Splinter missed while serving our country.
3. The most effective player of all time was never allowed to play in the American League or National League because of the color of his skin. No, I’m not talking about Satchel Paige. I’m referring to Josh Gibson. The Negro Leagues legend has a higher career EFT+ score than even Babe Ruth (132.9 to 129.7).
4. Nolan Ryan is the most valuable pitcher of all time. The gap between him and second place (Greg Maddux) is almost as large as Ruth’s margin over Aaron. It’s not surprising when you look at the data, though. The Texas hurler leads all HOFers with the fewest total-bases-per-plate-appearance and ranks third in total-base-run percentage. Add that to 27 years of throwing the hard stuff, and you beat the competition like they were Robin Ventura.
5. It’s good to be a Yankee. Whether you need that extra push to get into the HOF or to edge out a fellow player who outperformed you for MVP, if you’re wearing pinstripes, you get the benefit of the doubt. Don’t believe me? Just look at the number of Yankees on the attached lists who got the nod despite being league average or worse.
6. It’s never been clearer how overrated defense truly is. Ozzie Smith was 250 runs below average for his career as a batter. Even if we could take the sabermetric fielding stats at face value (we can’t; see my previous articles), the Wizard would still be in the red for run value. By my math, he needed to be around 400 runs above average on defense to have a HOF case with hitting like that. Newsflash: There aren’t enough opportunities in a season for a player to average more than 20 defensive runs above average for 19 years.
7. Twenty percent of the Hall of Fame is made up of players who were average or worse. Twenty-four players were actually below league average, and 39 were less than two percent above average. Fifteen of them were favored on more than 80% of ballots; four were favored on more than 90%. The clear victor here is Derek Jeter. He was 1.6% above average yet garnered 99.7% of the HOF vote. If you’re confused, see rule #5 above.
8. Thirty-two pitchers in the HOF had a better RVAL than Cy Young, and 58 had higher EFT+ scores. In fact, the only thing old Cy did well was keep his walks and hit-by-pitches down. Tell me again why he has an award named after him…
9. If Clayton Kershaw retired today, he’d be seventh all-time among pitchers in RVAL. Five of the guys ahead of him are in the HOF. The other guy ahead of him won seven Outstanding Pitcher Awards but may have used PEDs to do so.
10. Salvador Perez’s 2020 season would rank 75th in EFT+ among MVP seasons, had he been rightly recognized for his remarkable accomplishment. Catchers have won 18 MVPs. Salvy’s 2020 season would have ranked 6th all-time for catchers in historically-weighted effectiveness. He finished that season 19% above average.
11. Voters were equally bad when selecting players for yearly awards. Since 1911, seven MVPs had average or worse seasons (less than two percent above average); since 1956, eleven Outstanding Pitcher Awards were handed out for mediocrity; three relievers were undeserving of a league award that’s only been around for 45 years; and since 1947, more than thirty rookies were celebrated for underperforming. Taken as a whole, baseball still has work to do in evaluating award-worthy seasons.
12. 2003 Eric Gagne was peak-performance pitching. Not only was he a whopping 18% above average, but he also prevented 36 more runs than average…as a reliever. Gagne’s 2003 season was more valuable than 80 Outstanding Pitcher Award seasons. But I’m supposed to believe the elite voters know what they’re doing.
13. Finally, great pitching is much, much more difficult to achieve than great hitting. The average HOF batter was 9% above average for his career. The average HOF pitcher was 2% above average. The same is true at the yearly level. The average MVP batter season was 17% above average. The average pitcher season that won an MVP or Outstanding Pitcher Award was 5% above average. It really puts Gagne’s 2003 season into perspective.
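Two of the statements above rest on back-of-the-envelope arithmetic, so here is a short sketch that reproduces it using only the figures quoted in this column: the Ruth and Aaron run values, and the roughly 400 defensive runs Ozzie Smith would have needed over 19 years. The variable names are mine, not part of any attached data set.

```python
# Figures quoted in the column.
RUTH_RVAL = 1070
AARON_RVAL = 845

# Ozzie Smith: defensive runs above average needed for a HOF case ("around 400"),
# spread over a 19-year career.
NEEDED_DEFENSIVE_RUNS = 400
CAREER_SEASONS = 19

ruth_margin = RUTH_RVAL - AARON_RVAL
runs_needed_per_season = NEEDED_DEFENSIVE_RUNS / CAREER_SEASONS

print(f"Ruth's margin over Aaron: {ruth_margin} runs")
print(f"Defensive runs above average needed per season: {runs_needed_per_season:.1f}")
```

The second figure is why the defense argument lands where it does: the required pace works out to a shade over 20 runs above average every season for 19 straight years.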

So, there you have it. A thorough analysis of MLB award-worthy seasons and career achievements further demonstrates the need to overhaul how we evaluate baseball players. I wish I could say that baseball is getting closer to getting it right. Unfortunately, the data doesn’t support this conclusion. Even recent award winners have less meritorious performances than their peers. It’s time for baseball to get better at using the tools (software-driven regression analysis) we have at our disposal to understand the game. That won’t happen until readers like you demand more.

EFT+ and RVAL by MLB Awards