More OAA stats

Jun 23, 2021

Since I wrote my last piece, I have kept playing with the data. I wanted to be able to compare players to one another on both total OAA and OAA per play. I think total OAA is a great statistic, but what if a player has a lower total only because they haven’t had as many reps? Does that still make player A better than player B?

And, if I am being honest, I really wanted to see if my favorite Tiger, and fan enemy of the last few months, Niko Goodrum, was truly as good at SS as I had thought and hoped. Or was this just me wearing rose-colored glasses, wanting him to be here?

This one was interesting for a few different reasons, and at least one issue needed to be fixed, perhaps in my previous script as well.

Up first: generating the list of players I want to compare. I used a very simple counter and threw it in a while loop so I could specify how many players I wanted to compare at any given time. In that while loop, I could enter the players’ names. The question was how I would then pull the stats. Looking at my previous script, I was able to see how to do that. Here’s the link:

https://baseballsavant.mlb.com/visuals/oaa-data?type=Fielder&playerId=592348&startYear=2015&endYear=2021

So the piece I need to update is the “playerId=” section. It should be easy enough in Python to use a variable in its place. But how do I get that ID easily? I know that MLB.com has a page dedicated to players, so I started looking there first. And, as luck would have it, there is a link that has a player’s first name, last name, full name, and a name/ID slug that I can use:

https://statsapi.mlb.com/api/v1/sports/1/players?fields=people,fullName,firstName,lastName,nameSlug

Which, after running it through Python and doing some magic, can return something like this:

     fullName      firstName lastName  nameSlug             playerID
0    Cory Abbott   Cory      Abbott    cory-abbott-676265   676265
1    Albert Abreu  Albert    Abreu     albert-abreu-656061  656061
...  ...           ...       ...                   ...      ...
1237 Brett de Geus Brett     de Geus   brett-de-geus-676969 676969
1238 Jacob deGrom  Jacob     deGrom    jacob-degrom-594798  594798

[1239 rows x 5 columns]

Perfect! Now I can look up a name and get a playerID! I played around with the above link a little more and realized that I could get a lot more than just current MLB players: I could also pull any of the minor leagues, some international leagues, and even collegiate players. This would help me out significantly later on, as I’ll explain.
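As a rough sketch of that lookup, assuming the player list has already been flattened into a DataFrame with the fullName and nameSlug columns shown above, the ID is simply the run of digits at the end of the slug. The helper names add_player_id and lookup_player_id are made up for this example and are not part of my script.

```python
import pandas as pd

# Rows copied from the sample output above; a real run would load them from the API.
players = pd.DataFrame({
    "fullName": ["Cory Abbott", "Albert Abreu", "Jacob deGrom"],
    "nameSlug": ["cory-abbott-676265", "albert-abreu-656061", "jacob-degrom-594798"],
})

def add_player_id(df):
    """Return a copy with a playerID column taken from the digits ending nameSlug."""
    out = df.copy()
    out["playerID"] = out["nameSlug"].str.extract(r"-(\d+)$", expand=False)
    return out

def lookup_player_id(df, full_name):
    """Return the playerID string for the first exact fullName match."""
    matches = df.loc[df["fullName"] == full_name, "playerID"]
    return matches.iloc[0]

players = add_player_id(players)
print(lookup_player_id(players, "Jacob deGrom"))  # 594798
```

Anchoring the pattern to the end of the slug is a design choice for the sketch: it keeps working even if a name itself happens to contain digits.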

When I started testing my script, I realized that the data I was pulling didn’t match what I was expecting to see. Using Niko as the test case, I saw he had +10 OAA at SS, but my results were showing +16 OAA. This had me scratching my head for a little bit, but then I realized what I was doing wrong… I wasn’t filtering my data down to only OAA at SS. D’oh. Applied my filter and re-ran: +10 = +10 now. I’m in business.
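Here is a small sketch of both pieces: filling the player ID into the OAA link, and filtering the plays down to shortstop. The target_id and outs_above_average column names and the value 6 for shortstop come from my script; the helper names oaa_url and shortstop_only and the four plays in the DataFrame are invented purely to show the effect of the filter.

```python
import pandas as pd

OAA_URL = ("https://baseballsavant.mlb.com/visuals/oaa-data"
           "?type=Fielder&playerId={player_id}&startYear={start}&endYear={end}")

def oaa_url(player_id, start=2015, end=2021):
    """Build the per-player OAA link by filling in the playerId parameter."""
    return OAA_URL.format(player_id=player_id, start=start, end=end)

def shortstop_only(plays):
    """Keep only the plays credited at shortstop (target_id 6 in the script's data)."""
    return plays[plays["target_id"] == 6]

# Made-up plays, only to show why an unfiltered total comes out too high.
plays = pd.DataFrame({
    "target_id": [6, 6, 4, 8],
    "outs_above_average": [0.5, -0.25, 0.75, 0.25],
})

print(oaa_url("592348"))
print(plays["outs_above_average"].sum())                  # every position: 1.25
print(shortstop_only(plays)["outs_above_average"].sum())  # shortstop only: 0.25
```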

For my list, I primarily wanted to compare Niko to those ahead of him. Since he was ranked 19th, I was able to get this list of names to run against:

Nick Ahmed
Andrelton Simmons
Francisco Lindor
Addison Russell
Javier Baez
Brandon Crawford
Freddy Galvis
Trevor Story
Jose Iglesias
Orlando Arcia
Adalberto Mondesi
Carlos Correa
JT Riddle
Jose Peraza
Adeiny Hechavarria
Wilmer Difo
Trea Turner
Jordy Mercer
Niko Goodrum

Things went very smoothly until I got to Addison Russell, then an error. Maybe I mistyped the name. Ran it again, same result. So that led me to go to the JSON page with the players’ names and search for him. Hmm, he doesn’t exist there. Fortunately, I quickly remembered that I had access to other leagues’ lists of players as well. So I checked out the AAA JSON, and there he is! This one wouldn’t be too bad: just create a variable for each league and then combine them into one. Easy peasy. I got that running and figured I had better test it again. Ran it on Niko again… failed. What in the world? This one took me a bit longer to figure out. Come to find out, he’s in there multiple times, all with the exact same data, and my script doesn’t really like that. I don’t like that. I only need one entry. Some quick research and I found this nice, efficient option for pandas: drop_duplicates(). Let’s try again and… success!
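The duplicate problem is easy to reproduce on a tiny example. In this sketch the rows reuse the names and slugs from the sample output earlier, but which league list each row sits in, and the repeated row, are invented for illustration. Calling .item() on a lookup only works when exactly one row matches, which is why the repeated entry broke the script until drop_duplicates() was applied.

```python
import pandas as pd

# Stand-in league lists: the first one contains the same player twice.
mlb = pd.DataFrame({
    "fullName": ["Cory Abbott", "Cory Abbott", "Jacob deGrom"],
    "nameSlug": ["cory-abbott-676265", "cory-abbott-676265", "jacob-degrom-594798"],
})
aaa = pd.DataFrame({
    "fullName": ["Albert Abreu"],
    "nameSlug": ["albert-abreu-656061"],
})

# Stack the lists, then keep one copy of each identical row.
combined = pd.concat([mlb, aaa], ignore_index=True)
deduped = combined.drop_duplicates().reset_index(drop=True)

matches = deduped[deduped["fullName"] == "Cory Abbott"]
print(len(combined), len(deduped))   # 4 3
print(matches["nameSlug"].item())    # works because exactly one row is left
```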

Time to get my big list going once more. The first three work; I try Addison Russell again and it works. I get almost done, reach Adeiny Hechavarria, and it fails. Welp. Same error as with Addison Russell. Time to search.

And search I did. This time it was at least easy: I could check my data locally rather than digging through a bunch of separate JSON pages. He doesn’t exist. In any of them. Come to find out, he’s over in Japan playing this year. Well then.
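One way to avoid stopping on a missing name is a lookup that returns None instead of raising an error, so the misses can be collected and checked afterwards. This is only a sketch of the idea, not what my script does; find_player_id is a hypothetical helper, and the two listed players are taken from the sample output earlier.

```python
import pandas as pd

players = pd.DataFrame({
    "fullName": ["Cory Abbott", "Jacob deGrom"],
    "playerID": ["676265", "594798"],
})

def find_player_id(df, full_name):
    """Return the playerID for full_name, or None when the name is not listed."""
    matches = df.loc[df["fullName"] == full_name, "playerID"].drop_duplicates()
    if matches.empty:
        return None
    return matches.iloc[0]

requested = ["Jacob deGrom", "Adeiny Hechavarria"]
found = {name: find_player_id(players, name) for name in requested}
missing = [name for name, pid in found.items() if pid is None]
print(missing)  # ['Adeiny Hechavarria']
```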

At this point I really wanted to get some data, so that sort of troubleshooting can wait until later. I altered my list to remove Hechavarria and got the following:

Nick Ahmed
Andrelton Simmons
Francisco Lindor
Addison Russell
Javier Baez
Brandon Crawford
Freddy Galvis
Trevor Story
Jose Iglesias
Orlando Arcia
Adalberto Mondesi
Carlos Correa
JT Riddle
Jose Peraza
Wilmer Difo
Trea Turner
Jordy Mercer
Niko Goodrum

And I get through that list with no errors. Score! While doing this, I created two different DataFrames, one for average OAA per play and one for total OAA, each sorted so the highest value is at the top. From there, I concatenated the two so I could have both lists side by side. And here are the results!

player_name       num_plays avg_oaa    player_name         oaa_sum
Nick Ahmed        2020.0    0.04596    Nick Ahmed          92.83724
Addison Russell   1381.0    0.03988    Andrelton Simmons   77.68226
Andrelton Simmons 2236.0    0.03474    Francisco Lindor    74.18007
Francisco Lindor  2541.0    0.02919    Addison Russell     55.07214
Wilmer Difo        374.0    0.02813    Javier Baez         37.34850
Javier Baez       1470.0    0.02541    Brandon Crawford    29.98972
Niko Goodrum       469.0    0.02227    Freddy Galvis       28.54004
JT Riddle          574.0    0.02148    Trevor Story        27.16912
Adalberto Mondesi  977.0    0.01524    Jose Iglesias       20.10135
Jose Peraza        958.0    0.01239    Orlando Arcia       19.32260
Freddy Galvis     2529.0    0.01129    Carlos Correa       14.99469
Trevor Story      2423.0    0.01121    Adalberto Mondesi   14.89005
Brandon Crawford  2714.0    0.01105    JT Riddle           12.32999
Orlando Arcia     1885.0    0.01025    Jose Peraza         11.87025
Jose Iglesias     2240.0    0.00897    Wilmer Difo         10.51963
Carlos Correa     1914.0    0.00783    Niko Goodrum        10.44344
Jordy Mercer      1732.0    0.00583    Jordy Mercer        10.10170
Trea Turner       1842.0    0.00525    Trea Turner          9.66452
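The side-by-side layout above can be sketched on a handful of rows. The five players and their numbers are copied from the table; everything else (the variable names by_avg, by_total, and side_by_side) is just for this example.

```python
import pandas as pd

# Five rows copied from the results table above.
stats = pd.DataFrame({
    "player_name": ["Nick Ahmed", "Addison Russell", "Andrelton Simmons",
                    "Wilmer Difo", "Javier Baez"],
    "num_plays": [2020, 1381, 2236, 374, 1470],
    "avg_oaa": [0.04596, 0.03988, 0.03474, 0.02813, 0.02541],
    "oaa_sum": [92.83724, 55.07214, 77.68226, 10.51963, 37.34850],
})

# Rank once by per-play average and once by total, highest first.
by_avg = (stats[["player_name", "num_plays", "avg_oaa"]]
          .sort_values("avg_oaa", ascending=False)
          .reset_index(drop=True))
by_total = (stats[["player_name", "oaa_sum"]]
            .sort_values("oaa_sum", ascending=False)
            .reset_index(drop=True))

# axis=1 places the two rankings next to each other, matched on row position.
side_by_side = pd.concat([by_avg, by_total], axis=1)
print(side_by_side)
```

The reset_index(drop=True) calls matter here: concat with axis=1 lines rows up by index, so without them each player would be paired with their own row again and the two rankings would no longer be independent.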

That’s pretty cool. And it’s interesting as well. I’m honestly loving what I see from Nick Ahmed. The guy is a monster all the way around. Also, three of the bottom five by total make pretty significant jumps when ranked by average per play instead, and for that matter, every player with fewer than 1,000 plays credited to them moved up. I think I might work on a graph here as well to display these data points, maybe even attempt my first 3D graph.
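To put a number on those jumps, here is a sketch that ranks a few players both ways and takes the difference. The five rows are copied from the table, but note that the ranks are computed within this five-player subset only, so they will not match the positions in the full list of 18.

```python
import pandas as pd

# Five rows copied from the results table above.
stats = pd.DataFrame({
    "player_name": ["Nick Ahmed", "Freddy Galvis", "Wilmer Difo",
                    "Niko Goodrum", "Trea Turner"],
    "num_plays": [2020, 2529, 374, 469, 1842],
    "avg_oaa": [0.04596, 0.01129, 0.02813, 0.02227, 0.00525],
    "oaa_sum": [92.83724, 28.54004, 10.51963, 10.44344, 9.66452],
})

# Rank 1 is best in both columns.
stats["rank_total"] = stats["oaa_sum"].rank(ascending=False).astype(int)
stats["rank_per_play"] = stats["avg_oaa"].rank(ascending=False).astype(int)

# Positive means the player ranks higher per play than by total.
stats["jump"] = stats["rank_total"] - stats["rank_per_play"]
print(stats[["player_name", "num_plays", "rank_total", "rank_per_play", "jump"]])
```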

Now comes the next piece of research on this topic: when is a sample size too small for the per-play number to matter? Does it matter? You can reasonably say a batter can get lucky over 100 PA, but can you say the same of a defender over 100 plays?

Until then, thank you for reading once more. And here’s some more code!

import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

pd.options.display.max_rows = 9999
pd.options.display.max_columns = 9999

# Player-list endpoints: MLB plus the other leagues, so players outside the majors can be found too
mlb = "https://statsapi.mlb.com/api/v1/sports/1/players?fields=people,fullName,firstName,lastName,nameSlug"
aaa = "https://statsapi.mlb.com/api/v1/sports/11/players?fields=people,fullName,firstName,lastName,nameSlug"
aa = "https://statsapi.mlb.com/api/v1/sports/12/players?fields=people,fullName,firstName,lastName,nameSlug"
high_a = "https://statsapi.mlb.com/api/v1/sports/13/players?fields=people,fullName,firstName,lastName,nameSlug"
low_a = "https://statsapi.mlb.com/api/v1/sports/14/players?fields=people,fullName,firstName,lastName,nameSlug"
rookie = "https://statsapi.mlb.com/api/v1/sports/16/players?fields=people,fullName,firstName,lastName,nameSlug"
independent_league = "https://statsapi.mlb.com/api/v1/sports/23/players?fields=people,fullName,firstName,lastName,nameSlug"
international = "https://statsapi.mlb.com/api/v1/sports/51/players?fields=people,fullName,firstName,lastName,nameSlug"

# Read every league's list and stack them into one DataFrame
league_urls = [mlb, aaa, aa, high_a, low_a, rookie, independent_league, international]
url_data = pd.concat([pd.read_json(url) for url in league_urls], ignore_index=True)

# Each entry in 'people' is a dict; flatten it into columns, then drop repeated players
normalized_data = pd.json_normalize(url_data['people'].tolist())
normalized_data = normalized_data.drop_duplicates().reset_index(drop=True)

# The player ID is what is left of nameSlug once the lowercase letters and hyphens are removed
normalized_data['playerID'] = normalized_data['nameSlug'].apply(lambda x: re.sub(r'[a-z-]', '', str(x)))

# Ask which players to compare
num_players = int(input("Please enter how many players you want to compare: "))
players = []
counter = 1
while counter <= num_players:
    player_name = input("Please choose a player: ")
    # .item() raises an error if the name is missing or matches more than one row
    player_id = normalized_data[normalized_data['fullName'] == player_name]['playerID'].item()
    players.append({'player_id': player_id, 'player_name': player_name})
    counter += 1

# Pull each player's OAA data and keep only the plays at shortstop (target_id 6)
avg_rows = []
total_rows = []
for player in players:
    data = pd.read_json('https://baseballsavant.mlb.com/visuals/oaa-data?type=Fielder&playerId=' + player['player_id'] + '&startYear=2015&endYear=2021')
    data = data[data['target_id'] == 6]

    total_oaa = data['outs_above_average'].sum().round(5)
    total_mean = data['outs_above_average'].mean().round(5)
    total_count = len(data)

    avg_rows.append({'player_name': player['player_name'], 'num_plays': total_count, 'avg_oaa': total_mean})
    total_rows.append({'player_name': player['player_name'], 'oaa_sum': total_oaa})

# Sort each ranking with the highest value on top, then set the two side by side
rankings = pd.DataFrame(avg_rows, columns=['player_name', 'num_plays', 'avg_oaa'])
rankings = rankings.sort_values('avg_oaa', ascending=False).reset_index(drop=True)
total_rankings = pd.DataFrame(total_rows, columns=['player_name', 'oaa_sum'])
total_rankings = total_rankings.sort_values('oaa_sum', ascending=False).reset_index(drop=True)
overall_rankings = pd.concat([rankings, total_rankings], axis=1)

print(" ")
print(overall_rankings)


Written by jerrymckennan
