OAA stats
I’ve been digging into Outs Above Average (OAA) a lot lately, and since I’ve only been posting tasks that were already completed, I thought now might be the time to look at something I’m currently working on. I think OAA is a great way to look into the defensive side of a player. I’ve been working on a few different tasks using it, some successful, some not. That’s part of the fun with it though, right? Something I wanted to do, that just doesn’t quite work, is show a player’s OAA progression by play. I couldn’t find anything for a specific date, only month and year. Which isn’t all bad. It does let me display the data a little bit differently. Like below:
This chart shows the plays that a player was responsible for, by month and by year. Sadly, like I mentioned, I couldn’t get a proper progression out of it. But it is cool to see how many above and below average plays a player makes. The player above is the human highlight reel, Jose Iglesias. I am actually quite surprised by the number of plays that he missed, as well as how few were determined to be well above average. So to go along with this, I broke the data down into quarter-point bands above and below 0, which ended up looking like this:
There have been 2226 outs responsible for that are averaging 0.008461630405152805
There have been 1676 outs above average that are averaging 0.0974433485683466
There have been 550 outs below average that are averaging -0.262689932579416
These are plays that are hard to make:
There have been 3 outs above average greater than .75 that are averaging 0.8316568617345551
There have been 0 outs below average that are between -.25 and 0 that are averaging nan
These are plays made half the time:
There have been 10 outs above average that are between .50 and .75 that are averaging 0.6026051792992155
There have been 88 outs above average that are between .25 and .50 that are averaging 0.35483298183156314
There have been 28 outs below average that are between -.25 and -.50 that are averaging -0.36956332313683476
There have been 55 outs below average that are between -.50 and -.75 that are averaging -0.6433520961412279
These are plays that are pretty easily made:
There have been 1574 outs above average that are between 0 and .25 that are averaging 0.07834741510934647
There have been 105 outs below average less than -.75 that are averaging -0.8593402075653331
Pairing the two together, I feel, gives a good read on the data that is available. You can see the grand total, how many easy, medium, and hard plays were made or missed.
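For anyone who wants to play with the banding idea, here is a rough sketch of how the same quarter-point bands could be built in one pass with `pd.cut` rather than a separate filter per band. The play values are made up purely for illustration, and the bin edges (closed on the left, with the top edge nudged past 1) are my own assumption, not necessarily how the numbers above were produced.

```python
import pandas as pd

# Hypothetical per-play OAA values, for illustration only. Real values
# would come from the 'outs_above_average' column of the Savant data.
plays = pd.DataFrame(
    {"oaa": [0.05, 0.10, 0.30, 0.62, 0.80, -0.10, -0.40, -0.60, -0.90, 0.20]}
)

# Quarter-point band edges. With right=False each band is [low, high),
# so a value of exactly 0.25 falls in the 0.25-0.50 band. The top edge
# sits just above 1 so that a play worth exactly 1.0 is still counted.
edges = [-1.0, -0.75, -0.50, -0.25, 0.0, 0.25, 0.50, 0.75, 1.0001]
plays["band"] = pd.cut(plays["oaa"], bins=edges, right=False)

# Count and average OAA per band, keeping empty bands in the table.
summary = plays.groupby("band", observed=False)["oaa"].agg(["count", "mean"])
print(summary)
```

Because the bands share their edges, no value can slip between two of them, which is one thing to watch for when each band is filtered by hand.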
I’m not 100% sure what the consensus is on this, but I would say well above average and highlight reel plays should be the ones that are made 50% of the time or less. For a guy who is a walking highlight reel, 13 such outs out of 2,226 plays seems very low. And that’s not to take anything away from him: at +19 OAA since 2016, he’s certainly well above average at SS. But let’s compare him to a player I’ve been a fan of for a bit and who, I feel, is an underrated defender:
There have been 1052 outs responsible for that are averaging 0.01664252352907377
There have been 821 outs above average that are averaging 0.1055795428958943
There have been 231 outs below average that are averaging -0.2994496535278945
These are plays that are hard to make:
There have been 2 outs above average greater than .75 that are averaging 0.9546588472231576
There have been 0 outs below average that are between -.25 and 0 that are averaging nan
These are plays made half the time:
There have been 17 outs above average that are between .50 and .75 that are averaging 0.5861988184135658
There have been 51 outs above average that are between .25 and .50 that are averaging 0.35698783648433513
There have been 16 outs below average that are between -.25 and -.50 that are averaging -0.3766275021302521
There have been 24 outs below average that are between -.50 and -.75 that are averaging -0.6436886379687329
These are plays that are pretty easily made:
There have been 750 outs above average that are between 0 and .25 that are averaging 0.07513365376318425
There have been 52 outs below average less than -.75 that are averaging -0.871992471224503
So this particular player has some very similar numbers to Iglesias, but he also doesn’t have nearly as many dips. Overall he’s worth about +0.008 additional outs on a per-play average (0.0166 versus 0.0085). The biggest knock on him: ~50% fewer plays made. Regardless, this isn’t a player who should be slept on at all. His +11 OAA would be good for 14th, only 5 places behind Iglesias, had he qualified. But again, that’s in ~50% fewer plays. If he kept this going for another 1,174 attempts, it would put him at +23 OAA.
Oh, this player is Niko Goodrum. I might have a soft spot for him and it’s been hard for many to get behind him given his defensive ineptitude lately. But those are some really, really good numbers in his career at SS!
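The +23 figure is just Goodrum's per-play rate stretched out to Iglesias's workload. A quick sketch of that arithmetic, using only the numbers quoted above (the helper name `project_oaa` is mine, and a straight-line projection like this assumes he keeps fielding at exactly the same rate):

```python
def project_oaa(oaa, plays, target_plays):
    """Scale an OAA total to a different number of plays at the same rate."""
    return oaa / plays * target_plays

# +11 OAA over 1,052 plays, projected to Iglesias's 2,226 plays
# (1,052 + 1,174 additional attempts).
projected = project_oaa(11, 1052, 1052 + 1174)
print(round(projected, 1))
```

That lands a little above 23, which is where the rounded +23 comes from.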
One thing I wish I could do is connect this to the specific plays that have taken place. Being able to do that would be more beneficial from a true progression standpoint. As it is though, it’s really cool to see each play still be displayed.
As I continue to work on this, and as always, here’s my code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# To get OAA data use a link like this:
# https://baseballsavant.mlb.com/visuals/oaa-data?type=Fielder&playerId=592348&startYear=2021&endYear=2021
# To get play data to try to link OAA to, use a link like this:
# https://baseballsavant.mlb.com/player-services/gamelogs?playerId=592348&playerType=6&viewType=statcastGameLogsFielding&season=2021&_=1622053374998

pd.options.display.max_rows = 999
pd.options.display.max_columns = 999

# playerId=592348 -- Niko Goodrum
# playerId=578428 -- Jose Iglesias

data = pd.read_json('https://baseballsavant.mlb.com/visuals/oaa-data?type=Fielder&playerId=592348&startYear=2015&endYear=2021')

unknown_var = pd.DataFrame(columns=['launch_speed','oaa'])
unknown_var['launch_speed'] = data['h_launch_speed']
unknown_var['oaa'] = data['outs_above_average']
data = data.sort_values(by=['year','api_game_date_month_mm'])

data2 = pd.DataFrame(columns=["player_name","counter","game_month","year","oaa","oaa_cumsum","runs_prevented","runs_prev_cumsum","speed","position"])
data2['player_name'] = data['entity_name']
data2['oaa'] = data['outs_above_average']
data2['runs_prevented'] = data['fielding_runs_prevented']
data2['game_month'] = data['api_game_date_month_mm']
data2['year'] = data['year']
data2['speed'] = data['h_launch_speed']
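data2['position'] = data['target_id']

# Split plays into above-average (oaa) and below-average (oba) results
oaa = data2[data2['oaa'] > 0]
oba = data2[data2['oaa'] < 0]

oaa_0 = oaa[oaa['oaa'].between(0,.249)]
# Filter the below-average frame here; filtering oaa for negative values always comes back empty
oba_0 = oba[oba['oaa'].between(-.249,0)]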
oaa_25 = oaa[oaa['oaa'].between(.25,.499)]
oba_25 = oba[oba['oaa'].between(-.499,-.25)]
oaa_50 = oaa[oaa['oaa'].between(.50,.749)]
oba_50 = oba[oba['oaa'].between(-.749,-.50)]
oaa_75 = oaa[oaa['oaa'].between(.75,1)]
oba_75 = oba[oba['oaa'].between(-1,-.75)]

total_mean = data2['oaa'].mean()
total_count = len(data2)
oaa_mean = oaa['oaa'].mean()
oaa_count = len(oaa)
oaa_0_mean = oaa_0['oaa'].mean()
oaa_0_count = len(oaa_0)
oaa_25_mean = oaa_25['oaa'].mean()
oaa_25_count = len(oaa_25)
oaa_50_mean = oaa_50['oaa'].mean()
oaa_50_count = len(oaa_50)
oaa_75_mean = oaa_75['oaa'].mean()
oaa_75_count = len(oaa_75)
oba_mean = oba['oaa'].mean()
oba_count = len(oba)
oba_0_mean = oba_0['oaa'].mean()
oba_0_count = len(oba_0)
oba_25_mean = oba_25['oaa'].mean()
oba_25_count = len(oba_25)
oba_50_mean = oba_50['oaa'].mean()
oba_50_count = len(oba_50)
oba_75_mean = oba_75['oaa'].mean()
oba_75_count = len(oba_75)

print("There have been "+str(total_count)+" outs responsible for that are averaging "+str(total_mean))
print("There have been "+str(oaa_count)+" outs above average that are averaging "+str(oaa_mean))
print("There have been "+str(oba_count)+" outs below average that are averaging "+str(oba_mean))
print("These are plays that are hard to make: ")
print("There have been "+str(oaa_75_count)+" outs above average greater than .75 that are averaging "+str(oaa_75_mean))
print("There have been "+str(oba_0_count)+" outs below average that are between -.25 and 0 that are averaging "+str(oba_0_mean))
print("These are plays made half the time: ")
print("There have been "+str(oaa_50_count)+" outs above average that are between .50 and .75 that are averaging "+str(oaa_50_mean))
print("There have been "+str(oaa_25_count)+" outs above average that are between .25 and .50 that are averaging "+str(oaa_25_mean))
print("There have been "+str(oba_25_count)+" outs below average that are between -.25 and -.50 that are averaging "+str(oba_25_mean))
print("There have been "+str(oba_50_count)+" outs below average that are between -.50 and -.75 that are averaging "+str(oba_50_mean))
print("These are plays that are pretty easily made: ")
print("There have been "+str(oaa_0_count)+" outs above average that are between 0 and .25 that are averaging "+str(oaa_0_mean))
print("There have been "+str(oba_75_count)+" outs below average less than -.75 that are averaging "+str(oba_75_mean))

march_data = data2[data2['game_month'] == 3]
april_data = data2[data2['game_month'] == 4]
may_data = data2[data2['game_month'] == 5]
june_data = data2[data2['game_month'] == 6]
july_data = data2[data2['game_month'] == 7]
august_data = data2[data2['game_month'] == 8]
sept_data = data2[data2['game_month'] == 9]
october_data = data2[data2['game_month'] == 10]

march_data = march_data.loc[::-1].reset_index(drop = True)
march_data['oaa_cumsum'] = march_data.groupby('year')['oaa'].cumsum()
march_data['oaa_cumsum'] = march_data['oaa_cumsum'].round(0).astype(int)
march_data['runs_prev_cumsum'] = march_data.groupby('year')['runs_prevented'].cumsum()
march_data['runs_prev_cumsum'] = march_data['runs_prev_cumsum'].round(0).astype(int)
# Count March plays from march_data itself (this line previously grouped april_data)
march_data['counter'] = march_data.groupby('year').cumcount()+1

april_data = april_data.loc[::-1].reset_index(drop = True)
april_data['oaa_cumsum'] = april_data.groupby('year')['oaa'].cumsum()
april_data['oaa_cumsum'] = april_data['oaa_cumsum'].round(0).astype(int)
april_data['runs_prev_cumsum'] = april_data.groupby('year')['runs_prevented'].cumsum()
april_data['runs_prev_cumsum'] = april_data['runs_prev_cumsum'].round(0).astype(int)
april_data['counter'] = april_data.groupby('year').cumcount()+1

may_data = may_data.loc[::-1].reset_index(drop = True)
may_data['oaa_cumsum'] = may_data.groupby('year')['oaa'].cumsum()
may_data['oaa_cumsum'] = may_data['oaa_cumsum'].round(0).astype(int)
may_data['runs_prev_cumsum'] = may_data.groupby('year')['runs_prevented'].cumsum()
may_data['runs_prev_cumsum'] = may_data['runs_prev_cumsum'].round(0).astype(int)
may_data['counter'] = may_data.groupby('year').cumcount()+1

june_data = june_data.loc[::-1].reset_index(drop = True)
june_data['oaa_cumsum'] = june_data.groupby('year')['oaa'].cumsum()
june_data['oaa_cumsum'] = june_data['oaa_cumsum'].round(0).astype(int)
june_data['runs_prev_cumsum'] = june_data.groupby('year')['runs_prevented'].cumsum()
june_data['runs_prev_cumsum'] = june_data['runs_prev_cumsum'].round(0).astype(int)
june_data['counter'] = june_data.groupby('year').cumcount()+1

july_data = july_data.loc[::-1].reset_index(drop = True)
july_data['oaa_cumsum'] = july_data.groupby('year')['oaa'].cumsum()
july_data['oaa_cumsum'] = july_data['oaa_cumsum'].round(0).astype(int)
july_data['runs_prev_cumsum'] = july_data.groupby('year')['runs_prevented'].cumsum()
july_data['runs_prev_cumsum'] = july_data['runs_prev_cumsum'].round(0).astype(int)
july_data['counter'] = july_data.groupby('year').cumcount()+1

august_data = august_data.loc[::-1].reset_index(drop = True)
august_data['oaa_cumsum'] = august_data.groupby('year')['oaa'].cumsum()
august_data['oaa_cumsum'] = august_data['oaa_cumsum'].round(0).astype(int)
august_data['runs_prev_cumsum'] = august_data.groupby('year')['runs_prevented'].cumsum()
august_data['runs_prev_cumsum'] = august_data['runs_prev_cumsum'].round(0).astype(int)
august_data['counter'] = august_data.groupby('year').cumcount()+1

sept_data = sept_data.loc[::-1].reset_index(drop = True)
sept_data['oaa_cumsum'] = sept_data.groupby('year')['oaa'].cumsum()
sept_data['oaa_cumsum'] = sept_data['oaa_cumsum'].round(0).astype(int)
sept_data['runs_prev_cumsum'] = sept_data.groupby('year')['runs_prevented'].cumsum()
sept_data['runs_prev_cumsum'] = sept_data['runs_prev_cumsum'].round(0).astype(int)
sept_data['counter'] = sept_data.groupby('year').cumcount()+1

october_data = october_data.loc[::-1].reset_index(drop = True)
october_data['oaa_cumsum'] = october_data.groupby('year')['oaa'].cumsum()
october_data['oaa_cumsum'] = october_data['oaa_cumsum'].round(0).astype(int)
october_data['runs_prev_cumsum'] = october_data.groupby('year')['runs_prevented'].cumsum()
october_data['runs_prev_cumsum'] = october_data['runs_prev_cumsum'].round(0).astype(int)
october_data['counter'] = october_data.groupby('year').cumcount()+1

fig, axes = plt.subplots(4,2,sharex=True,figsize=((30,14)))

sns.lineplot(ax=axes[0,0], data=march_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[0,0], data=march_data, x="counter", y="oaa", hue="year", legend=False).set_title('OAA by play responsible March');
axes[0,0].set_xlabel('Play');
axes[0,0].set_ylabel('OAA');
axes[0,0].set_yticks(np.arange(-1.20,1.20,.20));
axes[0,0].set_xticks(np.arange(0,210,10));

sns.lineplot(ax=axes[0,1], data=april_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[0,1], data=april_data, x='counter', y='oaa', hue="year", legend=False).set_title('OAA by play responsible April');
axes[0,1].set_xlabel('Play');
axes[0,1].set_ylabel('OAA');
axes[0,1].set_yticks(np.arange(-1.20,1.20,.20));

sns.lineplot(ax=axes[1,0], data=may_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[1,0], data=may_data, x="counter", y="oaa", hue="year", legend=False).set_title('OAA by play responsible May');
axes[1,0].set_xlabel('Play');
axes[1,0].set_ylabel('OAA');
axes[1,0].set_yticks(np.arange(-1.20,1.20,.20));

sns.lineplot(ax=axes[1,1], data=june_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[1,1], data=june_data, x='counter', y='oaa', hue="year", legend=False).set_title('OAA by play responsible June');
axes[1,1].set_xlabel('Play');
axes[1,1].set_ylabel('OAA');
axes[1,1].set_yticks(np.arange(-1.20,1.20,.20));

sns.lineplot(ax=axes[2,0], data=july_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[2,0], data=july_data, x="counter", y="oaa", hue="year", legend=False).set_title('OAA by play responsible July');
axes[2,0].set_xlabel('Play');
axes[2,0].set_ylabel('OAA');
axes[2,0].set_yticks(np.arange(-1.20,1.20,.20));

sns.lineplot(ax=axes[2,1], data=august_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[2,1], data=august_data, x='counter', y='oaa', hue="year", legend=False).set_title('OAA by play responsible August');
axes[2,1].set_xlabel('Play');
axes[2,1].set_ylabel('OAA');
axes[2,1].set_yticks(np.arange(-1.20,1.20,.20));

sns.lineplot(ax=axes[3,0], data=sept_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[3,0], data=sept_data, x="counter", y="oaa", hue="year", legend=False).set_title('OAA by play responsible September');
axes[3,0].set_xlabel('Play');
axes[3,0].set_ylabel('OAA');
axes[3,0].set_yticks(np.arange(-1.20,1.20,.20));

sns.lineplot(ax=axes[3,1], data=october_data, x="counter", y="oaa", hue="year");
sns.scatterplot(ax=axes[3,1], data=october_data, x='counter', y='oaa', hue="year", legend=False).set_title('OAA by play responsible October');
axes[3,1].set_xlabel('Play');
axes[3,1].set_ylabel('OAA');
axes[3,1].set_yticks(np.arange(-1.20,1.20,.20));

axes[0,0].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[0,1].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[1,0].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[1,1].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[2,0].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[2,1].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[3,0].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
axes[3,1].axhline(y=0, color="red", linestyle='dashed', alpha=0.5);
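Since this is still a work in progress, one cleanup I could see is folding the eight near-identical month blocks into a single helper. The sketch below is only that, a sketch: `add_running_totals` is a hypothetical name, the sample rows are invented, and it leaves out the row reversal and the integer rounding that the code above applies.

```python
import pandas as pd

def add_running_totals(month_df):
    """Return a copy with a per-year play counter and running OAA total."""
    out = month_df.reset_index(drop=True).copy()
    # Number the plays 1, 2, 3, ... within each year
    out["counter"] = out.groupby("year").cumcount() + 1
    # Running sum of OAA within each year
    out["oaa_cumsum"] = out.groupby("year")["oaa"].cumsum()
    return out

# Invented rows standing in for data2 (same column names as above)
sample = pd.DataFrame({
    "year": [2020, 2020, 2021, 2021, 2021],
    "game_month": [4, 4, 4, 5, 4],
    "oaa": [0.10, -0.40, 0.05, 0.30, 0.60],
})

# One processed frame per month, keyed by month number
by_month = {
    month: add_running_totals(rows)
    for month, rows in sample.groupby("game_month")
}

april = by_month[4]
print(april)
```

Each frame in `by_month` could then go through the same plotting calls in a loop over the axes, instead of one copy of the block per month.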