Sabermetrics 101: Understanding Context and Value

Much like math homework or a foreign language, sabermetrics can look like a daunting prospect to a lot of people.  It’s full of acronyms, long complicated words, deep concepts, and even a little bit of statistical theory.  But learning sabermetrics and advanced analytics doesn’t have to be that hard. Ultimately, it’s little more than looking at the game in a different way.

Most baseball fans still rely on statistics and concepts that were developed more than one hundred years ago.  The famous sportswriter Henry Chadwick invented the ERA stat in the 1800s as a way to measure a starting pitcher’s performance. By the beginning of the twentieth century, it was an accepted way to do so.


The individual pitching win is also a relic of the nineteenth century.  In baseball’s early days, pitchers would routinely throw complete games and would pitch on one or two days of rest. In those days, it was logical to pin down a winning and a losing pitcher for each game because their performances were so pivotal to a team’s success.

Today, when starters often go six innings and a cavalcade of relief pitchers takes the mound in the late innings, it is much harder to say that one pitcher was fully responsible for his team winning or losing on any given night.  It is also hard to praise a pitcher who plays for a good offensive club and thus doesn’t have to work as hard for his wins. It’s equally difficult to blame a pitcher on an offensively challenged squad for his tough losses.  The easy win and the tough loss are perfect examples of why the win stat is almost entirely useless.

At the plate, the value of traditional “back of the baseball card” statistics should also be called into question.  Sabermetricians most notably question batting average and the much-maligned RBI.  The problem inherent in both of these stats is context, or the lack thereof. Batting average counts a seeing-eye single and a grand slam as the same, and RBI depends so heavily on the opportunities a hitter’s teammates give him that it has almost no value as an analytical tool at all.

These are the concepts we will deal with here: the idea of context and how it influenced the creation of many of the advanced metrics that are so popular today.

 

Run Expectancy, Context and Value – Improving on RBI

A key concept to understand is that of Run Expectancy, commonly called RE24 (run expectancy per the 24 base-out states).

RE24 is rooted in the idea that baseball is a static game that can be analyzed like a snapshot in time. The 24 in RE24 refers to the three different out states (zero, one or two) multiplied by the eight possible baserunner arrangements (none on, a man on first, men on first and second, and so on).
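If it helps to see where the 24 comes from, here is a minimal Python sketch that simply counts the combinations. The names (OUTS, BASE_ARRANGEMENTS, BASE_OUT_STATES) are made up for this illustration and are not part of any official definition.

```python
from itertools import product

# The three out states a plate appearance can start in.
OUTS = (0, 1, 2)

# Each of the three bases is either empty (False) or occupied (True),
# in the order (first, second, third). That gives 2 * 2 * 2 arrangements.
BASE_ARRANGEMENTS = list(product((False, True), repeat=3))

# Every pairing of a runner arrangement with an out count is one base-out state.
BASE_OUT_STATES = [(bases, outs) for bases in BASE_ARRANGEMENTS for outs in OUTS]

print(len(BASE_ARRANGEMENTS))  # 8
print(len(BASE_OUT_STATES))    # 24
```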

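What RE24 measures is the value of each event in its context.  It does so with a chart like the one below, called a Run Expectancy Matrix, which is weighted to the current run scoring environment. Each entry is the average number of runs a team can expect to score from that base-out state through the end of the inning.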

Runners   0 Outs   1 Out   2 Outs
Empty     0.461    0.243   0.095
1 _ _     0.831    0.489   0.214
_ 2 _     1.068    0.644   0.305
1 2 _     1.373    0.908   0.343
_ _ 3     1.426    0.865   0.413
1 _ 3     1.798    1.140   0.471
_ 2 3     1.920    1.352   0.570
1 2 3     2.282    1.520   0.736

 

So take, for example, a hitter who doubles with a man on third and one out and drives the run in. The starting state (man on third, one out) is worth 0.865 runs, and the resulting state (man on second, one out) is worth 0.644. To find the RE24 of the event, you subtract the starting run expectancy from the resulting one and add the number of runs scored.

0.644 – 0.865 + 1 = 0.779

By driving in that run, the batter added 0.779 expected runs: the one run that scored, less the 0.221 drop in run expectancy that comes from trading a man on third for a man on second.
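The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the matrix values are copied from the chart above, while the names RUN_EXPECTANCY, run_expectancy and re24 are invented for this example. It works through a double with a man on third and one out (0.865 in the chart) that leaves a man on second with one out (0.644), plus a leadoff strikeout for contrast.

```python
# Run expectancy matrix copied from the chart above.
# Keys are the runner arrangements as written there; each tuple holds
# the expected runs with 0, 1 and 2 outs, in that order.
RUN_EXPECTANCY = {
    "Empty": (0.461, 0.243, 0.095),
    "1 _ _": (0.831, 0.489, 0.214),
    "_ 2 _": (1.068, 0.644, 0.305),
    "1 2 _": (1.373, 0.908, 0.343),
    "_ _ 3": (1.426, 0.865, 0.413),
    "1 _ 3": (1.798, 1.140, 0.471),
    "_ 2 3": (1.920, 1.352, 0.570),
    "1 2 3": (2.282, 1.520, 0.736),
}


def run_expectancy(runners, outs):
    """Expected runs for the rest of the inning; after three outs it is zero."""
    if outs >= 3:
        return 0.0
    return RUN_EXPECTANCY[runners][outs]


def re24(start, end, runs_scored):
    """RE24 of one event: end-state RE minus start-state RE, plus runs scored."""
    return run_expectancy(*end) - run_expectancy(*start) + runs_scored


# RBI double with a man on third and one out, leaving a man on second.
rbi_double = re24(start=("_ _ 3", 1), end=("_ 2 _", 1), runs_scored=1)

# Strikeout to lead off an inning: bases stay empty, one more out.
leadoff_out = re24(start=("Empty", 0), end=("Empty", 1), runs_scored=0)

print(round(rbi_double, 3))   # 0.779
print(round(leadoff_out, 3))  # -0.218
```

Summing these per-event values over every plate appearance in a season is what produces a hitter's RE24 total.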

RE24 improves on RBI because it accounts for the situation the batter actually faced. The run expectancy added is a definite value of how much each batting event during a game helped or hurt a team’s chance of scoring, and therefore of winning.

 

Linear Weights – the Inherent Weakness of Batting Average as a Stat

Another foundational sabermetric concept is that of Linear Weights.  Linear Weights were developed by John Thorn and Pete Palmer in their groundbreaking 1984 book The Hidden Game of Baseball.  In it, they came up with a stat that more accurately measures a player’s contribution regardless of context.

The idea behind linear weights is simple: you wouldn’t walk up to your friend and ask, “How many coins do you have in your pocket?” You would want to know the exact value of said coins.

A penny, nickel, dime, and quarter obviously all have different values, just like singles, doubles, triples, and home runs.  That is the weakness of batting average: it counts singles and home runs as if they were exactly the same.

 

Tom Tango expanded on Thorn’s and Palmer’s work and married it to a chart similar to the RE24 matrix above. He came up with these results for the average run value of each event:

Event   Average
HR       1.409
3B       1.063
2B       0.764
1B       0.474
RBOE     0.546
HBP      0.385
NIBB     0.33
IBB      0.102
Bunt    -0.001
Out     -0.299
SB       0.195
CS      -0.456
PO      -0.255

 

Linear Weights are among the most important concepts to understand because so many of the stats that are used today (wRAA, wOBA, wRC+, UZR and UBR) are based on linear weights.

It is very important to remember that these are averages and not absolute numbers. What each of these numbers represents is the average number of runs produced, in a given run environment, by each event.

The inherent value of linear weights, and thus of the stats built on them, is that they can empirically measure a player’s contributions.
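To make the contrast with batting average concrete, here is a rough Python sketch. The weights are copied from the table above, but everything else is an assumption made for illustration: the two stat lines are invented, the function names are hypothetical, and at-bats are simplified to hits plus outs. Both hitters go 150 for 500, so batting average cannot tell them apart.

```python
# Average run values copied from the table above.
LINEAR_WEIGHTS = {
    "HR": 1.409, "3B": 1.063, "2B": 0.764, "1B": 0.474,
    "RBOE": 0.546, "HBP": 0.385, "NIBB": 0.33, "IBB": 0.102,
    "Bunt": -0.001, "Out": -0.299, "SB": 0.195, "CS": -0.456, "PO": -0.255,
}


def batting_average(line):
    """Hits divided by at-bats (simplified here to hits plus outs)."""
    hits = line["1B"] + line["2B"] + line["3B"] + line["HR"]
    return hits / (hits + line["Out"])


def linear_weights_runs(line):
    """Sum of (event count * average run value) over every event in the line."""
    return sum(LINEAR_WEIGHTS[event] * count for event, count in line.items())


# Two made-up stat lines, each 150 hits in 500 at-bats.
singles_hitter = {"1B": 140, "2B": 10, "3B": 0, "HR": 0, "Out": 350}
power_hitter = {"1B": 80, "2B": 35, "3B": 5, "HR": 30, "Out": 350}

for name, line in (("singles hitter", singles_hitter), ("power hitter", power_hitter)):
    print(name, round(batting_average(line), 3), round(linear_weights_runs(line), 1))
```

Both invented hitters bat .300, yet their linear weights totals come out about 38 runs apart, which is the coins-in-the-pocket point in numbers.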

 

What to Take Away

Run expectancy and linear weights address two big problems with the most popular traditional stats: the lack of context inherent in RBI and the inadequacy of batting average as a stat. RE24 uses a handy chart to summarize how much more or less likely a team is to score, and therefore to win, after each event. Linear weights can be used to measure what each type of event is worth, on average, toward helping a team score runs.