Never leaving the OAA train
This is the third post in a row about OAA (Outs Above Average). I can't really help myself; it's fascinating to me.
Anyhow, I ended my previous post half-joking that I might try to make a 3D graph with the data I had found. Well… it wasn't much of a joke, and I have no regrets. Check it out:
Here’s a GIF version for your liking, too:
That wasn’t the only change I made, though. There were some smaller changes (like creating an array with the player names instead of typing them in every. Single. Time.) and one bigger change.
Though it didn’t require much additional work, I was curious how sample size would affect the data. As I was looking at it, a number of players had fewer than 1,000 plays attempted, some fewer than 500. Similar to hitting and pitching, is there a hot streak or some luck involved that could make that data unusable? To check, I put in a little IF statement: if a player had at least 500 plays they were responsible for, take the average over the first 500 plays; otherwise, take the career average. Then I subtracted the first-500-play average from the career average to see how much of a difference that made. Here’s that data:
player_name_3        oaa_500   avg_oaa_500   avg_diff
Nick Ahmed              26.0         0.051     -0.005
Addison Russell         19.0         0.038      0.002
Andrelton Simmons       16.0         0.033      0.002
Francisco Lindor        14.0         0.028      0.001
Wilmer Difo             11.0         0.028      0.000
JT Riddle               13.0         0.026     -0.001
Orlando Arcia           12.0         0.023     -0.001
Niko Goodrum            10.0         0.022     -0.001
Jose Iglesias            8.0         0.016     -0.001
Jose Peraza              7.0         0.014     -0.002
Brandon Crawford         6.0         0.013     -0.002
Freddy Galvis            6.0         0.012     -0.001
Trevor Story             4.0         0.008      0.003
Trea Turner              4.0         0.008      0.002
Jordy Mercer             3.0         0.007      0.002
Miguel Rojas             3.0         0.007      0.001
Adalberto Mondesi        3.0         0.006      0.000
Javier Baez              3.0         0.005      0.001
Ketel Marte              0.0         0.000      0.006
Carlos Correa          -16.0        -0.032      0.036
So there are definitely some changes among the players, though none are dramatic except for Carlos Correa. He looked like a rock sitting at SS for the first part of his career. The fact that he’s now in the top 20 of total OAA for SS since 2016 is a miracle based on this sample.
The important part for me here is that the data itself is usable. Taking Correa out of the picture, the average difference is 0.0003 OAA per play. So I think OAA/play is something that could be followed, and it might even further change the perspective on how good a fielder has been.
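As a quick check on that average, here is a minimal sketch that recomputes it from the avg_diff column of the table. The values are copied by hand from the table; the variable names are mine and are not part of the original script.

```python
# avg_diff values copied from the table (career average minus first-500 average)
avg_diff = {
    "Nick Ahmed": -0.005, "Addison Russell": 0.002, "Andrelton Simmons": 0.002,
    "Francisco Lindor": 0.001, "Wilmer Difo": 0.000, "JT Riddle": -0.001,
    "Orlando Arcia": -0.001, "Niko Goodrum": -0.001, "Jose Iglesias": -0.001,
    "Jose Peraza": -0.002, "Brandon Crawford": -0.002, "Freddy Galvis": -0.001,
    "Trevor Story": 0.003, "Trea Turner": 0.002, "Jordy Mercer": 0.002,
    "Miguel Rojas": 0.001, "Adalberto Mondesi": 0.000, "Javier Baez": 0.001,
    "Ketel Marte": 0.006, "Carlos Correa": 0.036,
}

# Average difference with Correa excluded (19 players)
without_correa = [v for name, v in avg_diff.items() if name != "Carlos Correa"]
mean_without = sum(without_correa) / len(without_correa)

# Average difference across all 20 players, for comparison
mean_all = sum(avg_diff.values()) / len(avg_diff)

print(round(mean_without, 4))  # 0.0003
print(round(mean_all, 4))      # 0.0021
```

Correa alone moves the average from about 0.0003 to about 0.0021, which is why he is set aside above.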
This post is going to be a shorter one, if for no other reason than that it’s the third time I’ve dedicated a post to this topic. Here’s the code I was using:
import pandas as pd
import numpy as np
import re
import matplotlib.pyplot as plt
import seaborn as sns

pd.options.display.max_rows = 9999
pd.options.display.max_columns = 9999

counter = 1
player_data = pd.DataFrame({'playerID' : []})
player_array = pd.DataFrame({'player_id' : [], 'player_name' : []})
rankings = pd.DataFrame({'player_name_2' : [], 'num_plays' : [], 'avg_oaa' : []})
total_rankings = pd.DataFrame({'player_name_1' : [], 'oaa_sum' : []})
graphing_data = pd.DataFrame({'player':[], 'num_plays':[], 'avg_oaa':[], 'oaa_sum':[]})
early_career = pd.DataFrame({'player_name_3':[], 'oaa_500':[], 'avg_oaa_500':[]})
player_names_list = pd.DataFrame({'player': [
    "Nick Ahmed", "Andrelton Simmons", "Francisco Lindor", "Addison Russell",
    "Javier Baez", "Brandon Crawford", "Freddy Galvis", "Trevor Story",
    "Jose Iglesias", "Orlando Arcia", "Adalberto Mondesi", "Carlos Correa",
    "JT Riddle", "Jose Peraza", "Wilmer Difo", "Jordy Mercer",
    "Niko Goodrum", "Trea Turner", "Miguel Rojas", "Ketel Marte"]})

mlb = "https://statsapi.mlb.com/api/v1/sports/1/players?fields=people,fullName,firstName,lastName,nameSlug"
aaa = "https://statsapi.mlb.com/api/v1/sports/11/players?fields=people,fullName,firstName,lastName,nameSlug"
aa = "https://statsapi.mlb.com/api/v1/sports/12/players?fields=people,fullName,firstName,lastName,nameSlug"
high_a = "https://statsapi.mlb.com/api/v1/sports/13/players?fields=people,fullName,firstName,lastName,nameSlug"
low_a = "https://statsapi.mlb.com/api/v1/sports/14/players?fields=people,fullName,firstName,lastName,nameSlug"
rookie = "https://statsapi.mlb.com/api/v1/sports/16/players?fields=people,fullName,firstName,lastName,nameSlug"
independent_league = "https://statsapi.mlb.com/api/v1/sports/23/players?fields=people,fullName,firstName,lastName,nameSlug"
international = "https://statsapi.mlb.com/api/v1/sports/51/players?fields=people,fullName,firstName,lastName,nameSlug"

url_data = pd.read_json(mlb)
aaa_data = pd.read_json(aaa)
aa_data = pd.read_json(aa)
high_a_data = pd.read_json(high_a)
low_a_data = pd.read_json(low_a)
rookie_data = pd.read_json(rookie)
independent_league_data = pd.read_json(independent_league)
international_data = pd.read_json(international)

# Stack every level into one frame (DataFrame.append was removed in pandas 2.0, so use pd.concat)
url_data = pd.concat(
    [url_data, aaa_data, aa_data, high_a_data, low_a_data,
     rookie_data, independent_league_data, international_data],
    ignore_index=True)

normalized_data = pd.json_normalize(url_data.people)
normalized_data = normalized_data.drop_duplicates().reset_index(drop=True)

# Strip lowercase letters and hyphens from nameSlug, leaving the numeric player ID
normalized_data['playerID'] = normalized_data['nameSlug'].apply(lambda x: re.sub(r'[a-z-]', '', str(x)))

# num_players = input("Please enter how many players you want to compare: ")

# while counter <= int(num_players):
#     player_name = input("Please choose a player: ")
#     player_data = normalized_data[normalized_data['fullName'] == player_name]['playerID'].item()
#     test1 = {'player_id': player_data, 'player_name': player_name}
#     player_array = player_array.append(test1, ignore_index=True)
#     counter += 1

# Look up each player's ID by full name, then build the frame in one go
player_rows = []
for name in player_names_list['player']:
    player_data = normalized_data[normalized_data['fullName'] == name]['playerID'].item()
    player_rows.append({'player_id': player_data, 'player_name': name})
player_array = pd.DataFrame(player_rows)

for x, y in zip(player_array['player_id'], player_array['player_name']):
    data = pd.read_json('https://baseballsavant.mlb.com/visuals/oaa-data?type=Fielder&playerId=' + x + '&startYear=2015&endYear=2021')
    # Keep only rows with target_id 6 (taken here to be plays at shortstop)
    data = data[data['target_id'] == 6]
    total_oaa = data['outs_above_average'].sum().round(0)
    total_mean = data['outs_above_average'].mean().round(3)
    total_count = len(data)

    # With 500+ plays, summarize the earliest 500; otherwise fall back to the career numbers
    if total_count > 499:
        early_data = data.sort_values(by=['year', 'api_game_date_month_mm'])
        early_data = early_data.head(500)
        early_total_oaa = early_data['outs_above_average'].sum().round(0)
        early_total_mean = early_data['outs_above_average'].mean().round(3)
    else:
        early_total_oaa = total_oaa
        early_total_mean = total_mean

    player_total = {'player_name_1': y, 'oaa_sum': total_oaa}
    player_avg = {'player_name_2': y, 'num_plays': total_count, 'avg_oaa': total_mean}
    first_500_data = {'player_name_3': y, 'oaa_500': early_total_oaa, 'avg_oaa_500': early_total_mean}
    graphing = {'player': y, 'num_plays': total_count, 'avg_oaa': total_mean, 'oaa_sum': total_oaa, 'oaa_500': early_total_oaa, 'avg_oaa_500': early_total_mean}

    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    graphing_data = pd.concat([graphing_data, pd.DataFrame([graphing])], ignore_index=True)
    rankings = pd.concat([rankings, pd.DataFrame([player_avg])], ignore_index=True).sort_values('avg_oaa', ascending=False).reset_index(drop=True)
    total_rankings = pd.concat([total_rankings, pd.DataFrame([player_total])], ignore_index=True).sort_values('oaa_sum', ascending=False).reset_index(drop=True)
    early_career = pd.concat([early_career, pd.DataFrame([first_500_data])], ignore_index=True).sort_values('avg_oaa_500', ascending=False).reset_index(drop=True)

    # Note: axis=1 pairs rows by position, and each frame is sorted by a different column,
    # so avg_oaa and avg_oaa_500 on the same row are not guaranteed to be the same player
    overall_rankings = pd.concat([rankings, total_rankings, early_career], axis=1)
    overall_rankings['avg_diff'] = overall_rankings['avg_oaa'] - overall_rankings['avg_oaa_500']

    print("There have been " + str(total_count) + " outs responsible for " + y + " that are averaging " + str(total_mean) + " and totals " + str(total_oaa))
    print(" ")

print(overall_rankings)

fig = plt.figure(figsize=(15, 15))
ax = fig.add_subplot(111, projection='3d')

# Label each point with the player's name, total OAA, and average OAA per play
for name, avg, total, plays in zip(graphing_data['player'], graphing_data['avg_oaa'], graphing_data['oaa_sum'], graphing_data['num_plays']):
    label = '%s (%d OAA, %s Avg OAA)' % (name, total, "{:.3f}".format(avg))
    ax.text(avg, total, plays, label, fontsize=10)

x = graphing_data['avg_oaa']
y = graphing_data['oaa_sum']
z = graphing_data['num_plays']

ax.set_xlabel("Average OAA by play")
ax.set_xlim(x.min() - 0.02, x.max() + 0.02)
ax.set_ylabel("Total OAA")
ax.set_ylim(y.min() - 10, y.max() + 10)
ax.set_zlabel("Number of plays")
ax.set_zlim(0, z.max() + 500)

ax.scatter(x, y, z, marker="o")

fig.tight_layout()
plt.show()
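
One thing I'd flag about the listing: the three ranking frames are each sorted by a different column and then joined side by side with pd.concat(axis=1), which pairs rows by position rather than by player. A minimal sketch of an alternative is below. It computes the early and career averages for one player at a time and keeps them on the same row, so the difference is always taken per player. The helper name early_vs_career, the n_early parameter, and the tiny made-up play logs are my own assumptions for illustration; only the column names come from the script.

```python
import pandas as pd


def early_vs_career(plays: pd.DataFrame, n_early: int = 500) -> dict:
    """Summarize one player's play log (hypothetical helper, not in the original script)."""
    career_avg = plays["outs_above_average"].mean()
    if len(plays) >= n_early:
        # Earliest plays first, same sort keys as the original listing
        ordered = plays.sort_values(["year", "api_game_date_month_mm"])
        early_avg = ordered.head(n_early)["outs_above_average"].mean()
    else:
        # Too few plays: fall back to the career average
        early_avg = career_avg
    return {
        "num_plays": len(plays),
        "avg_oaa": round(career_avg, 3),
        "avg_oaa_early": round(early_avg, 3),
    }


# Made-up play logs for two fictional players (illustration only, not real data)
logs = {
    "Player A": pd.DataFrame({
        "year": [2019, 2019, 2018, 2018],
        "api_game_date_month_mm": [5, 4, 9, 8],
        "outs_above_average": [0.5, 0.5, -0.5, -0.5],
    }),
    "Player B": pd.DataFrame({
        "year": [2020],
        "api_game_date_month_mm": [8],
        "outs_above_average": [0.25],
    }),
}

# One dict per player, collected in a list, then a single DataFrame at the end
rows = [{"player": name, **early_vs_career(df, n_early=2)} for name, df in logs.items()]
summary = pd.DataFrame(rows)

# Both averages sit on the same row, so the difference is per player by construction
summary["avg_diff"] = summary["avg_oaa"] - summary["avg_oaa_early"]
summary = summary.sort_values("avg_oaa_early", ascending=False).reset_index(drop=True)
print(summary)
```

Building the frame once from a list of dicts also avoids growing a DataFrame inside the loop, and sorting happens only after avg_diff exists, so reordering can't mix players up.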