fWAR calcuations
It’s been a bit since I’ve posted anything on here for a few different reasons. But lately, I’ve been playing around with Fangraph’s WAR calculation (or fWAR for short). Trying to get the ins and outs understood and to have the ability to understand the metric itself a little better. Fangraphs does an amazing write up to explain how they get their WAR as well as a step-by-step layout for both pitchers and position players here: https://library.fangraphs.com/misc/war/
While doing this, I’ve been focusing on positional players minus catchers primarily. The reason for this is because I’m very intrigued by the Fielding Runs portion of their equation. It is the most simple part — it’s the UZR metric. That’s it. It’s short for “Ultimate Zone Rating” and, again, they do an amazing write up: https://blogs.fangraphs.com/the-fangraphs-uzr-primer/
Now I don’t personally use UZR all too often anymore. As has been shown here, I prefer OAA myself. The only problem with this is WAR is runs based and OAA is outs based. Fortunately for us (really, it’s me) the kind folks at baseballsavant have already converted OAA into a runs based metric called Runs Prevented. But even before that I was a big fan of Defensive Runs Saved, or DRS, that could also be found on Fangraphs, which is explained at https://library.fangraphs.com/defense/drs/. (the people at Fangraphs really should be editors of all articles, I swear).
So that’s where my journey started. What would happen if I replaced UZR with DRS? Or how about Runs Prevented? What would the leaderboards look like then?
For this particular project, I wanted to step away from Python and graphs initially and work purely on my MySQL skills. I knew the tables I needed to create. All of the batting and base running stats could go into one table, my Fangraphs defensive stats in a second, OAA/Runs Prevented data in a third, and then the league stats in a final one. It was very simple, straight forward, and I’m not going to full reinvent the wheel here. All the data I needed/wanted was easily available to me that I could put into spreadsheets and import into my database.
And the process truly was that. I had a few minor hiccups along the way, but nothing that held me up for any time. Things like realizing I needed a 5th table for the positional adjustments, trying to run too quickly and not separating the leagues out and instead keeping them as the whole MLB, also not removing pitchers from hitting stats. Simple issues with simple fixes.
Except for one.
When I got the OAA/Runs prevented data, I noticed that a full name was not included. That’s not a big deal since I exported it all to a CSV to import later on, I could create a “full_name” column and do something simple like:
=A2&” “&B2
Easy enough, drag that down and I have full names. But when I tried to include the OAA data, it wouldn’t connect up. I fiddled and fiddled and fiddled. Nothing. After about 30 minutes of staring, blankly, at my screen I found the issue. There was an addtional space before the first name. So when combined instead of having “Carlos Correa” I had “ Carlos Correa” to query… thanks to Google, I could this quick, easy fix to my problem:
UPDATE table SET column1 = TRIM(column1);
Wasn’t expecting an issue like that… oh well, easy fix!
On the flip side. There were also a couple related issues that I came across that I was prepared for and didn’t even think about — Fangraphs (three letter abbreviation) and Baseball savant (team nicknames) refer to a players teams differently. Also, how would I be able to reference which league’s stats I could use? Luckily in a previous project, I had a table with that data already filled in! Easy, peasy.
Once I got my tables created, I walked through each piece of the equation and used specific players to compare my results to what I should have. I could have cheated and downloaded the data and just imported that for each player in the tables, but how would that help me understand?! It didn’t take me too long to have all my ducks in a row and the data I wanted.
Once I felt good about my data, I started my WAR calculator. I used Carlos Correa as my example for this. Ready for this? I wasn’t..
FanGraphs My attempt2016 5.2 5.22017 5.1 5.12018 1.6 1.62019 3.2 3.22020 0.9 0.8
Man. It worked! I was honestly super excited by this. I thought for sure somewhere along the way, I’d have issues. Other than maybe a slight rounding error, which really isn’t a big deal in the grand scheme of it.
The next process is just as simple — copy the WAR portion of my SELECT statement and remove the UZR portion and insert the Runs Prevented. Do it again, this time putting in DRS.
After I was done, I decided to select the top 10 gainers and losers for each of the stats. Here were my results:
A couple of things stood out to me on here. 2018 Nick Ahmed defense… if you prefer DRS or Runs Prevented over UZR as I do, he’s become an All-Star. Also, Carlos Correa no longer has a down year in 2018. DRS *really* liked Jonathan Schoop in 2017.
On the other side, Alcides Escobar in 2018 and DRS.. yikes. A Tigers favorite for some time, Candelario has been helped out a lot with UZR vs DRS.
Thanks for reading! I normally at this point would be posting my code to create such data, but it’s so long currently that I’m not so sure this is the right spot for it!