My first real model, maybe
When I started this, I had a lot of ideas in my head. I thought I’d improve my knowledge of baseball stats, and of statistics in general. I thought I’d sharpen existing skills like writing, database querying, and scripting. And I really, really wanted to develop some ability in dataviz, since I’ve always loved numbers in graphical form. I think I’ve accomplished most of it.
Except for statistical knowledge.
Well. The last month or so has been dedicated to that. I’ve read about so many terms, ranging from the familiar (r-squared, correlation, p-value) to the completely unknown (t-statistic, f-statistic) and everything in between. In an effort to understand how these work, I tried to build a model that could predict a starting pitcher’s fWAR. What a wild, challenging, exciting task this was.
Before I started, I wanted to complete two tasks: gather the data I wanted to test with and pick a statistics package within Python. I ultimately decided on pitchers from 1960–2021 and statsmodels as the package to figure out how good/bad/indifferent my model was.
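For anyone curious what the setup looks like in code, here’s a minimal sketch; the file name and columns are stand-ins for my actual data, not the real thing:

```python
import pandas as pd

# One row per pitcher-season, 1960-2021: that season's fWAR plus the
# trailing stats that feed the model. (Hypothetical file and columns.)
seasons = pd.read_csv("pitcher_seasons.csv")
print(seasons[["name", "season", "fwar"]].head())
```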
There was a lot of testing with this, many different thoughts and attempts, and I now feel… ok (?) about my results.
Some data notes:
- I also removed outliers from this data. I wanted to focus on the more common results, since a player posting a very high fWAR in a season is quite rare.
- When performing the predictions, I adjusted the year 2020, since only 60 of 162 games were played. If I were running this long-term I wouldn’t have, but since I wanted to see how accurate I could get, I factored that in.
- I assumed that both > 10 fWAR and < -2 fWAR were pretty much impossible, and that > 6.5 fWAR was unlikely in general.
- I used 1-year/3-year/5-year trends as well as closest comps, and weighted each one. (A rough sketch of these prep steps follows this list.)
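To make those notes concrete, here’s a rough sketch of the prep, continuing the DataFrame from above. The column names and the weights are placeholders, not my actual values:

```python
# Prorate 2020's 60-game season to a 162-game pace (a crude linear
# scaling, but fine for a one-off accuracy check).
is_2020 = seasons["season"] == 2020
seasons.loc[is_2020, "fwar"] *= 162 / 60

# Drop the outliers I treated as effectively impossible.
seasons = seasons[seasons["fwar"].between(-2, 10)]

# Blend the 1/3/5-year trends and a closest-comp estimate into one
# prediction. These weights are purely illustrative.
seasons["predicted"] = (0.40 * seasons["fwar_1yr"]
                        + 0.30 * seasons["fwar_3yr"]
                        + 0.20 * seasons["fwar_5yr"]
                        + 0.10 * seasons["comp_fwar"])

# Pull anything past the "unlikely in general" line back toward the caps.
seasons["predicted"] = seasons["predicted"].clip(-2, 6.5)
```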
First, let me display the results of the OLS summary and some brief thoughts on it.
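Roughly, the summary comes from regressing the actual fWAR values on the blended predictions with statsmodels (again a sketch, not my exact setup):

```python
import statsmodels.formula.api as smf

# How well do the blended predictions line up with what actually happened?
model = smf.ols("fwar ~ predicted", data=seasons).fit()
print(model.summary())
```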
So here’s where I start to get overwhelmed. Fortunately, there’s a lot of good writing out there that helps explain most of this. And, from what I can gather, my numbers aren’t terrible.
Let me start with the r-squared/adjusted r-squared. This is the one term I had at least some working knowledge of, and my knowledge says that .417 isn’t really that good. It’s often said you want to see .75 or higher for a strong fit (r-squared measures how much of the variance the model explains, not statistical significance). But I have also read that a low r-squared doesn’t necessarily mean it’s a bad model. I’ll get into that more in a little bit.
My p-value = 0.000. That’s good! It means I can reject the null hypothesis and take the relationship seriously.
A lot of this, though, I’m still working through, trying to fully understand what I’m reading.
Back to my r-squared comment. I’ve read in a few places that, depending on the data you’re looking at, a low number can still mean your model is good. Looking into that, I found that seaborn’s residplot() and regplot() are good ways to display the error range and the correlation. So that’s what I did:
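Roughly, with the same placeholder columns as before:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Residuals of actual fWAR against the predictions: how big were the misses?
sns.residplot(x="predicted", y="fwar", data=seasons)
plt.show()

# Predicted vs. actual with a fitted line: did I trend the right way?
sns.regplot(x="predicted", y="fwar", data=seasons)
plt.show()
```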
Looking at the residplot(), it seemed to tell me that while I missed a lot, the vast, vast majority of predictions were within +/- 2.0 WAR. There were certainly some larger misses, and I figured that would be the case. Sometimes a player got injured; sometimes they were just… awful. It was going to happen.
The regplot() tells me that even when I missed, I trended in the correct direction. Again, there were some large misses, which I expected.
Now I wanted to break down my misses. I took the absolute value of the difference between my prediction and the actual result for each player to determine where most of my data would fall.
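A sketch of that computation, reusing the placeholder columns from above, with my actual results following:

```python
# Absolute miss per prediction, then the share landing in each band.
miss = (seasons["predicted"] - seasons["fwar"]).abs()
for band in (0.5, 1.0, 1.5, 2.0):
    print(f"+/- {band} fWAR: {(miss <= band).mean():.2%}")
```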
+/- 0.5 fWAR: 33.66%
+/- 1.0 fWAR: 58.22%
+/- 1.5 fWAR: 74.57%
+/- 2.0 fWAR: 84.45%
This really surprised me. I was not expecting nearly 60% of my predictions to land within 1.0 fWAR, let alone nearly 75% within 1.5, and that made me feel very good. I was more than happy with that.
Realizing this, I wanted to take a look at which seasons were the worst. I broke the misses up into three categories: greater than +1.5 fWAR, less than -1.5 fWAR, and greater than 1.5 fWAR in absolute terms. My results:
Top 10 seasons with > 1.5 fWAR
2020 0.051101
2005 0.029075
2016 0.029075
2013 0.027313
2010 0.024670
1998 0.023789
2012 0.023789
1991 0.022907
2018 0.022907
2001 0.022907

Top 10 seasons with < -1.5 fWAR
1981 0.052727
2004 0.034545
1994 0.032727
1999 0.030909
1979 0.030909
2002 0.030909
2019 0.029091
2017 0.029091
2008 0.029091
2014 0.027273

Top 10 seasons > 1.5 fWAR.abs()
2020 0.034421
2005 0.025519
2004 0.024926
2016 0.024926
2012 0.024332
2010 0.024332
2019 0.023145
1998 0.023145
2008 0.022552
2014 0.022552
All of these numbers are proportions. So looking at the “Top 10 seasons > 1.5 fWAR.abs()” list, 3.4421% of all the players who were off by more than 1.5 fWAR came from 2020. And that’s not surprising to me at all, really, since 2020 was such a strange year. But even excluding it, my results didn’t change too much.
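For reference, a sketch of how that breakdown falls out of the same miss series:

```python
# Of all the misses bigger than 1.5 fWAR in absolute terms, what share
# came from each season? The signed tables filter on (predicted - fwar)
# being > 1.5 or < -1.5 instead of on the absolute value.
big_misses = seasons.loc[miss > 1.5, "season"]
print(big_misses.value_counts(normalize=True).head(10))
```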
My next step was testing. And to test it, I needed to test against the best: ZiPS. For time’s sake, I only compared against a few players.
| Name | Season | WAR | My_WAR | ZiPS | My_diff | ZiPS_diff |
| --- | --- | --- | --- | --- | --- | --- |
| Justin Verlander | 2015 | 3.1 | 1.6 | 3.8 | 1.5 | -0.7 |
| Justin Verlander | 2016 | 5.4 | 3.2 | 2.8 | 2.2 | 2.6 |
| Justin Verlander | 2017 | 4.1 | 4.9 | 3.5 | -0.8 | 0.6 |
| Justin Verlander | 2018 | 6.6 | 5.2 | 3.7 | 1.4 | 2.9 |
| Justin Verlander | 2019 | 6.4 | 5.5 | 5.3 | 0.9 | 1.1 |
| Max Scherzer | 2013 | 5.9 | 6.1 | 3.5 | -0.2 | 2.4 |
| Max Scherzer | 2014 | 5.6 | 5.6 | 4.8 | 0.0 | 0.8 |
| Max Scherzer | 2015 | 6.5 | 6.1 | 4.5 | 0.4 | 2.0 |
| Max Scherzer | 2016 | 5.6 | 5.8 | 6.2 | -0.2 | -0.6 |
| Max Scherzer | 2017 | 6.4 | 5.4 | 5.5 | 1.0 | 0.9 |
| Max Scherzer | 2019 | 6.5 | 7.0 | 5.5 | -0.5 | 1.0 |
| Max Scherzer | 2021 | 5.4 | 5.9 | 4.9 | -0.5 | 0.5 |
| Eduardo Rodriguez | 2016 | 1.2 | 2.4 | 2.4 | -1.2 | -1.2 |
| Eduardo Rodriguez | 2017 | 2.0 | 0.5 | 2.2 | 1.5 | -0.2 |
| Eduardo Rodriguez | 2018 | 2.1 | 3.1 | 2.3 | -1.0 | -0.2 |
| Eduardo Rodriguez | 2019 | 3.7 | 2.3 | 2.3 | 1.4 | 1.4 |
| Eduardo Rodriguez | 2021 | 3.8 | 5.2 | 2.6 | -1.4 | 1.2 |
| Gerrit Cole | 2014 | 2.1 | 2.0 | 2.2 | 0.1 | -0.1 |
| Gerrit Cole | 2015 | 5.1 | 1.8 | 2.8 | 3.3 | 2.3 |
| Gerrit Cole | 2016 | 2.5 | 6.0 | 4.2 | -3.5 | -1.7 |
| Gerrit Cole | 2017 | 3.4 | 1.5 | 3.1 | 1.9 | 0.3 |
| Gerrit Cole | 2018 | 5.9 | 6.0 | 3.4 | -0.1 | 2.5 |
| Gerrit Cole | 2021 | 5.3 | 5.4 | 4.9 | -0.1 | 0.4 |
23 different seasons. Counting up the winners (the smaller absolute miss wins)…
Me: 10
ZiPS: 9
Ties: 4
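For what it’s worth, the tally is just a row-by-row comparison of absolute misses; a sketch, assuming the table above lives in a hypothetical comps DataFrame:

```python
# Tally who came closer each season: the smaller absolute miss wins.
# Rounding guards against floating-point noise in the subtractions.
my_miss = comps["My_diff"].abs().round(1)
zips_miss = comps["ZiPS_diff"].abs().round(1)
print("Me:  ", (my_miss < zips_miss).sum())
print("ZiPS:", (zips_miss < my_miss).sum())
print("Ties:", (my_miss == zips_miss).sum())
```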
Doing this has given me SO much more respect for people like Dan Szymborski, who created ZiPS, and their ability to make projections. Seeing the ups and downs I’ve experienced in my little bit of work, and knowing that they post theirs yearly for the public to rip apart… kudos to them!
My next goal is to build something for hitters and relievers. Then, eventually, I’ll move on to minor leaguers and adjust them for major league projections. I have my “ideas” for starters next year that I can’t wait to test against their actual totals.
Thanks for reading! And, please, send me any links you might have to help me with the statistics side of things! I really enjoy reading different articles and the ways they explain each piece.