Early Season Returns Make Sabermetrics Look Crazy

I’ve dedicated a few past articles to explaining basic sabermetric concepts like run expectancy, linear weights, FIP and BABIP. Halfway through the second week of the season, there are still some very strange stats out there, and they will regress to more reasonable levels over the next few weeks.

Stats like weighted on-base average (wOBA), weighted runs created and weighted runs created plus (wRC and wRC+), and weighted runs above average (wRAA) are all based on linear weights, which assign a run value to each offensive outcome (single, double, walk and so on).
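
To make the linear-weights idea concrete, here is a minimal Python sketch of the commonly published wOBA formula: run-weighted outcomes divided by AB + BB - IBB + SF + HBP. The weights are hypothetical round numbers of roughly the right size, assumed for illustration. Real wOBA weights are recalculated every season, so treat this as a sketch rather than a drop-in calculator.

```python
# Hypothetical, approximate linear weights (run value of each outcome).
# Real wOBA weights are recalculated every season; these are for illustration only.
WOBA_WEIGHTS = {
    "ubb": 0.69,     # unintentional walk
    "hbp": 0.72,     # hit by pitch
    "single": 0.88,
    "double": 1.26,
    "triple": 1.59,
    "hr": 2.07,
}


def woba(ab, bb, ibb, hbp, sf, singles, doubles, triples, hr, weights=WOBA_WEIGHTS):
    """Weighted on-base average: run-weighted outcomes over the relevant plate appearances."""
    ubb = bb - ibb  # intentional walks are left out of both numerator and denominator
    numerator = (
        weights["ubb"] * ubb
        + weights["hbp"] * hbp
        + weights["single"] * singles
        + weights["double"] * doubles
        + weights["triple"] * triples
        + weights["hr"] * hr
    )
    denominator = ab + bb - ibb + sf + hbp
    return numerator / denominator


# A hypothetical 4-for-10 week: two singles, a double and a homer, plus one walk.
print(round(woba(ab=10, bb=1, ibb=0, hbp=0, sf=0, singles=2, doubles=1, triples=0, hr=1), 3))
```

One homer and a few hits in eleven trips already push this invented line above .500, which is exactly the small-sample effect at work.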

wRC+ and wRAA are particularly interesting measures because they compare a player’s hitting to league-average production: a league-average hitter has a wRC+ of 100 and a wRAA of 0. So thanks to the hot starts of Adrian Gonzalez, Adam Jones, and Miguel Cabrera, their wRC+ marks are incredibly inflated. They will come down with time, but as they currently stand, Gonzalez is at 402, Jones at 278 and Cabrera at 250. That also produces some wOBA numbers that look almost impossible to believe: .743, .579 and .538, respectively. These guys might all be very good hitters, but it’s a safe bet that these stats will drop to a more reasonable level in the near future.
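
Here is a rough Python sketch of how those two benchmarks fall out of wOBA. wRAA uses the usual (wOBA - league wOBA) / wOBA scale x PA form. The wRC+ function is deliberately simplified: it ignores park factors and league splits. All three league constants are hypothetical values assumed for illustration.

```python
# Hypothetical league constants, assumed for illustration (real values change by season).
LEAGUE_WOBA = 0.315      # league-average wOBA
WOBA_SCALE = 1.25        # converts wOBA points into runs per plate appearance
LEAGUE_R_PER_PA = 0.11   # league runs per plate appearance


def wraa(player_woba, pa, lg_woba=LEAGUE_WOBA, scale=WOBA_SCALE):
    """Weighted runs above average: 0 means exactly league average."""
    return (player_woba - lg_woba) / scale * pa


def simple_wrc_plus(player_woba, lg_woba=LEAGUE_WOBA, scale=WOBA_SCALE, lg_r_pa=LEAGUE_R_PER_PA):
    """Simplified wRC+ (no park or league adjustments): 100 means exactly league average."""
    runs_above_avg_per_pa = (player_woba - lg_woba) / scale
    return 100 * (runs_above_avg_per_pa + lg_r_pa) / lg_r_pa


print(wraa(LEAGUE_WOBA, pa=600), simple_wrc_plus(LEAGUE_WOBA))  # a league-average hitter
print(round(simple_wrc_plus(0.579)))  # a .579 wOBA, like Jones's early mark
```

The .579 input lands near 290, in the same neighborhood as Jones’s 278 but not on it, because the constants are made up and park effects are skipped.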


Another feature of this small-sample-size effect is that the pool of outcomes isn’t big enough yet to stabilize around a player’s true performance level, no matter how good or bad he is. This creates some incredibly funny-looking BABIP numbers. Don’t worry, you aren’t reading them wrong. They just look crazy.
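
BABIP uses the standard formula (H - HR) / (AB - K - HR + SF). A minimal Python sketch with an invented stat line shows how few balls in play it takes to produce one of these odd-looking numbers:

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: non-homer hits over balls put in play."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play == 0:
        raise ValueError("no balls in play yet")
    return (h - hr) / balls_in_play


# Hypothetical early-season line: 3 hits (no homers) in 30 at-bats with 6 strikeouts.
print(f"{babip(h=3, hr=0, ab=30, k=6, sf=0):.3f}")  # → 0.125
```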

BABIP is very dependent on luck, or random variance, which can saddle some very good players with unfairly low BABIPs. Among the bottom 10 in BABIP, all under .130, are names like Jonathan Lucroy, Ryan Zimmerman, Alex Gordon, Albert Pujols and Russell Martin. In these cases, regression to the mean works in the opposite direction from the hitters with the insane wOBA and wRC+ numbers: Pujols and company are far below any reasonable BABIP and should climb back up to much more normal levels.
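
One simple way to put numbers on regression to the mean is to blend a player’s observed rate with a league-average prior, as if he had some extra "phantom" trials at the league rate. This is a generic shrinkage technique, not the method behind any particular projection system. The .300 league BABIP and the 400-ball ballast below are assumptions chosen for illustration.

```python
def regress_to_mean(observed_successes, trials, prior_mean, ballast):
    """Blend an observed rate with a prior mean by adding `ballast` trials at the prior rate.

    With few trials the prior dominates; as trials pile up, the player's own results take over.
    """
    return (observed_successes + prior_mean * ballast) / (trials + ballast)


LEAGUE_BABIP = 0.300   # assumed league-average BABIP
BALLAST = 400          # hypothetical number of phantom balls in play at the league rate

# A hypothetical hitter with 3 hits on 24 balls in play (a .125 BABIP).
estimate = regress_to_mean(3, 24, LEAGUE_BABIP, BALLAST)
print(f"{estimate:.3f}")  # → 0.290
```

The ballast is the key judgment call: a bigger ballast trusts an early-season sample less.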

Two more key sabermetric concepts go beyond the BB/K ratio: a player’s individual walk rate (BB%) and strikeout rate (K%). A perfect example of someone overperforming his career levels is the Mets’ Curtis Granderson. Through 32 plate appearances, he has a 31.3 BB% and a 9.4 K%, nearly the reverse of his career marks of 10.5 BB% and 22.9 K%.
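
Both rates are simply events divided by plate appearances. In the Python sketch below, the counts are back-calculated from Granderson’s quoted rates, an inference rather than official data:

```python
def pct(events, pa):
    """Share of plate appearances ending in a given event, as a percentage."""
    return 100 * events / pa


# Counts inferred from the rates quoted above (not official data):
# 10 walks and 3 strikeouts in 32 plate appearances.
pa, walks, strikeouts = 32, 10, 3
bb_pct = pct(walks, pa)
k_pct = pct(strikeouts, pa)
bb_per_k = walks / strikeouts
print(f"BB% {bb_pct:.2f}, K% {k_pct:.2f}, BB/K {bb_per_k:.2f}")
```

A BB/K of about 3.33 on its own wouldn’t tell you whether a hitter walks a lot, strikes out rarely, or both. The separate rates do.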

Then there’s the case of Evan Gattis, a fairly good but strikeout-prone hitter. His triple slash line (batting average, on-base percentage and slugging percentage, which together show how much a hitter is producing) is .071/.103/.071, with a 3.4 BB% and a 44.8 K%. Check again in mid-May and he’ll have much more reasonable-looking stats.
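
Here is a Python sketch of the slash-line arithmetic using the standard AVG, OBP and SLG definitions. The counts are one set that reproduces Gattis’s quoted line and rates. They are inferred for illustration, not taken from official data.

```python
def slash_line(ab, h, doubles, triples, hr, bb, hbp, sf):
    """Return (AVG, OBP, SLG) from counting stats."""
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    total_bases = h + doubles + 2 * triples + 3 * hr  # each hit counts 1; extra-base hits add more
    slg = total_bases / ab
    return avg, obp, slg


def fmt(x):
    """Format a rate the baseball way, e.g. 0.071 -> '.071'."""
    return f"{x:.3f}".lstrip("0")


# Inferred counts: 2 singles, 1 walk and 13 strikeouts in 29 PA (28 at-bats).
pa, bb, k = 29, 1, 13
avg, obp, slg = slash_line(ab=28, h=2, doubles=0, triples=0, hr=0, bb=bb, hbp=0, sf=0)
print("/".join(fmt(x) for x in (avg, obp, slg)))  # → .071/.103/.071
print(f"BB% {100 * bb / pa:.1f}, K% {100 * k / pa:.1f}")  # → BB% 3.4, K% 44.8
```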

On the pitching side, it’s much the same story: small sample sizes create some very weird-looking results. Take Lance Lynn of the St. Louis Cardinals. Before his most recent start, he had a fairly incredible stat line:

40.9 K%, 4.6 BB%, 0% HR/FB, 0.50 WHIP and 0.97 FIP, all with a 50% strand rate (LOB%)


Those are impressive numbers for one start. But they also show that no matter how good a pitcher’s core numbers are (strikeout, walk and home run rates), other factors still interfere: strand rate, BABIP against and just plain luck.
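
FIP is built entirely from those core numbers. Here is a minimal Python sketch of the commonly published formula, FIP = (13 x HR + 3 x (BB + HBP) - 2 x K) / IP + constant, along with WHIP. The 3.10 constant is an assumption (the real one is recalibrated every season so league FIP matches league ERA), and the stat line is invented, not Lynn’s.

```python
FIP_CONSTANT = 3.10  # assumed; the real constant is recalculated each season


def innings(outs):
    """Convert outs recorded to innings (6.1 in box-score notation is 19 outs, about 6.333 IP)."""
    return outs / 3


def fip(hr, bb, hbp, k, outs, constant=FIP_CONSTANT):
    """Fielding independent pitching: uses only homers, walks, HBP and strikeouts."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / innings(outs) + constant


def whip(bb, h, outs):
    """Walks plus hits per inning pitched."""
    return (bb + h) / innings(outs)


# Invented start: 6 IP, 2 hits, 1 walk, 9 strikeouts, no homers.
print(f"FIP {fip(hr=0, bb=1, hbp=0, k=9, outs=18):.2f}, WHIP {whip(bb=1, h=2, outs=18):.2f}")  # → FIP 0.60, WHIP 0.50
```

Nothing in `fip` looks at runs allowed or at hits on balls in play. That is why strand rate and BABIP luck can push a pitcher’s ERA far from his FIP.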

The bottom three pitchers in strand rate so far in 2015 are Clay Buchholz, Masahiro Tanaka and Kyle Lohse. All three have strand rates under 50% and ERAs over 7. Tanaka may be injured and Buchholz is coming off a bad year, but it’s fair to say that none of them will keep either stat that far from the median for much longer.
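
For reference, here is a Python sketch of strand rate using the formula FanGraphs publishes, LOB% = (H + BB + HBP - R) / (H + BB + HBP - 1.4 x HR). The stat line is invented for illustration.

```python
def lob_pct(h, bb, hbp, r, hr):
    """Strand rate: roughly the share of baserunners allowed who did not score.

    The 1.4 * hr term comes straight from the published formula; it adjusts for
    home runs, since a batter who homers can never be left on base.
    """
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)


# Invented line: 10 hits, 4 walks, 1 homer and 8 runs allowed.
print(f"{lob_pct(h=10, bb=4, hbp=0, r=8, hr=1):.1%}")  # → 47.6%
```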