Steamer (RoS) (?)

benembry · April 13, 2021, 10:58pm

Is there a page that explains some of the projections on the FanGraphs site? I’m specifically interested in learning more about Steamer (RoS).

(I assume RoS stands for Rest of Season. If so, how often does this update? Does it take into account what has happened so far this season?)

If anyone can explain it in this post, that would also work lol

walt526 · April 14, 2021, 1:09am

I would assume that it’s based on some sort of Bayesian analysis, but I don’t actually know for sure.

It’s a little complicated and technical–and I’m not a Bayesian, so I’m probably going to do a crappy job explaining it–but I’ll try to summarize what I think they’re doing.

First, there are three elements in a Bayesian analysis: the prior distribution (original projections), the sampling distribution (new data), and the posterior distribution (rest-of-season [ROS] projections).

For the sake of simplicity, let’s say that you’re just interested in projecting batting average (a simple binomial has some convenient properties when working within a Bayesian framework). Suppose our prior is .300 but we observe half a season of .250 (75/300). What should the ROS projection be, given an actual .250 BA over 300 AB versus an expected .300 based on 3+ years’ worth of data?

Without getting too far into the weeds, we’d probably model the prior as a Beta distribution, which has two shape parameters: alpha and beta. One of the convenient features of a Beta distribution is that you can interpret alpha as a number of hits, beta as a number of outs, and alpha+beta as a number of at bats, so its mean is just the BA formula, alpha/(alpha+beta). We can then use that same formula to update the projection.
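To make the hits/at-bats reading of alpha and beta concrete, here is a minimal Python sketch. The helper names `beta_from_projection` and `beta_mean` are made up for illustration, and treating a .300 projection as "worth" 1000 at bats is purely an assumption, not anything Steamer publishes.

```python
def beta_from_projection(projected_ba: float, equivalent_ab: float) -> tuple[float, float]:
    """Turn a projected BA and a prior 'worth' in at bats into Beta shape parameters.

    alpha plays the role of hits and beta the role of outs, so alpha + beta
    is the number of at bats the prior is treated as being worth.
    """
    alpha = projected_ba * equivalent_ab
    beta = equivalent_ab - alpha
    return alpha, beta


def beta_mean(alpha: float, beta: float) -> float:
    """Mean of a Beta(alpha, beta) distribution: the BA formula, hits / at bats."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Hypothetical prior: a .300 projection treated as being worth 1000 at bats.
    alpha, beta = beta_from_projection(0.300, 1000)
    print(f"alpha={alpha:.0f}, beta={beta:.0f}, mean={beta_mean(alpha, beta):.3f}")
```

Under that assumption the prior works out to 300 "hits" and 700 "outs", with a mean of .300.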

For the sake of simplicity, assume the prior is X~Beta(300,700), i.e., worth 300 hits in 1000 AB, and assume we then observe 75 hits in 300 AB. Then:

ROS=(300+75)/(1000+300)=375/1300=0.288 (i.e., this is our posterior estimate)

That is, based on having observed 75 hits over 300 at bats (i.e., .250, our sampling distribution) in a half season’s worth of data, we would make a ROS projection of .288 (our posterior distribution) given those actual data and the prior that he’s a .300 BA hitter (our prior distribution).

Note that I’ve skipped over a few crucial steps to just posit that the prior is X~Beta(300,700). It’s not as simple as saying a projected .300 hitter implies X~Beta(300,700). But it keeps the math simple for the purposes of a relatively nontechnical explanation. And you could weaken the prior by assuming/estimating lower values for alpha and beta. If you weaken the strength of the prior, then the posterior will be lower than .288 and closer to a .250 BA. FWIW, one of the reasons why I’m always a little skeptical of Bayesian analysis is that the choice of assumptions about the prior is always a little arbitrary.

Hope that makes some sense. And apologies to any Bayesians who might read this who will be offended by the gross simplifications–it’s a combination of my own limited understanding as well as trying to minimize the math!

EDIT: Corrected a mistake in my math. The 75 hits are already part of the 300 observed AB, so the denominator is just 1000+300 and the ROS is .288, not .273.
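For anyone who wants to play with the numbers, here is a rough Python sketch of this kind of update. It treats the prior as a .300 BA worth some number of at bats (1000 AB means 300 hits and 700 outs), adds the observed 75-for-300, and then repeats with weaker priors. The function name `ros_batting_average` and the list of prior strengths are purely illustrative assumptions; this is a textbook Beta-binomial update, not Steamer's actual method.

```python
def ros_batting_average(prior_ba, prior_ab, hits, at_bats):
    """Posterior mean BA from a Beta prior and binomial data.

    The prior is Beta(alpha, beta) with alpha = prior_ba * prior_ab (hits)
    and beta = prior_ab - alpha (outs). Observed hits are added to alpha
    and observed outs to beta, so the posterior mean is
    (prior hits + hits) / (prior AB + AB).
    """
    prior_hits = prior_ba * prior_ab
    alpha = prior_hits + hits
    beta = (prior_ab - prior_hits) + (at_bats - hits)
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Observed half season from the example: 75 hits in 300 AB (.250).
    hits, at_bats = 75, 300
    # Assumed prior strengths, in equivalent at bats (illustrative only).
    for prior_ab in (1000, 500, 200, 50):
        ros = ros_batting_average(0.300, prior_ab, hits, at_bats)
        print(f"prior worth {prior_ab:>4} AB -> ROS {ros:.3f}")
```

With the prior worth 1000 AB the result is .288; cut it to 200 AB and it drops to .270, i.e., closer to the observed .250.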

valis2374 · April 14, 2021, 1:59am

Of course there’s a whole book on this: “Introduction to Empirical Bayes: Examples from Baseball Statistics” by David Robinson.

@benembry – Steamer is closed source, so you won’t find much explicitly detailing how it works; however, this article offers a good overview of projection systems in general and has some specific details on Steamer from its creator: 2021 Projections – How the Experts are Handling the 2020 Season | RotoGraphs Fantasy Baseball. (Side-note: I swear Ariel Cohen did an episode of Beat the Shift where he basically talked through the contents of that article… and now I cannot find it. Alas.)

Also, unexpectedly, MLB.com has an entry in their glossary on Steamer which suggests it updates daily on FanGraphs.

benembry · April 14, 2021, 3:21am

Perfect. Thanks