Hit heatmaps
I’m coming back to something slightly older that I was working on.
What drove me down this road was MLB talking about banning the shift. A prominent reporter (I can’t think of who it was now…) commented that left-handed hitters would be much, much happier, since they are shifted on so much more than right-handed hitters. So I went to my number one go-to site, baseballsavant.com, and gathered all the data I could. I wanted to see 1) whether left-handed hitters were actually shifted on more often and, if so, how much more, and 2) whether left-handed hitters really pull the ball that much more than right-handed hitters.
This claim didn’t seem like a new story, so I figured data from the shortened 2020 season alone would not be enough. I gathered the same data going back to 2017. That is still a fairly small dataset, but it should be enough to get an idea of what the answer looks like.
This is where I hit my first issue. There was so much data, even for the individual years 2017 through 2019, that a full export wouldn’t go through. So I separated left-handed and right-handed hitters for each year and exported each piece on its own. Since every file had the same columns, I could then append them back together. A few additional steps, but nothing too drastic or terrible.
Once I had my data, I needed to figure out just how I was going to plot it. I knew I wanted a heatmap to show how frequently a ball landed in a specific area. But… how was I going to do that? I knew that somewhere in there was useful data to figure that out. A bit of research and I came across this lovely page:
Yes! I had a feeling that hc_x and hc_y were what I needed. I mean, “hc” must have been short for “hit chart,” right? But the numbers looked wonky to me, so I thought I was way off. Lucky for me, someone else had figured that out. Time to grind this out!
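For reference, here is a minimal sketch of the spray-angle conversion described in the notes I found (the R version is quoted in the code comments at the end of this post). The function name spray_angle and the constant names are labels made up for illustration; the offsets 125.42 and 198.27 and the 0.75 scaling are taken from that formula as-is. One swap to be aware of: this uses atan2 rather than atan(x/y), so a ball with hc_y equal to 198.27 does not divide by zero. For anything hit in front of the plate the two give the same answer.

```python
import math

# Offsets from the formula quoted in the code notes; the names are illustrative.
HOME_X = 125.42
HOME_Y = 198.27


def spray_angle(hc_x, hc_y):
    """Estimate the horizontal spray angle in degrees from hc_x / hc_y."""
    x = hc_x - HOME_X   # positive toward right field
    y = HOME_Y - hc_y   # positive toward center field
    return math.degrees(math.atan2(x, y)) * 0.75


print(spray_angle(125.42, 98.27))   # straight up the middle
print(spray_angle(225.42, 98.27))   # toward right field (positive angle)
print(spray_angle(25.42, 98.27))    # toward left field (negative angle)
```

Negative angles point toward left field and positive angles toward right field, which matches the note that -45 degrees is the left field line and 45 degrees is the right field line.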
So I knew for sure I wanted to plot the total difference between LHH and RHH. That would be fairly easy: create a variable for each one based on their stance. I was also somewhat curious about year-to-year data. If I plotted each individual year, would I see a difference in the heatmaps as far as where balls landed? I mean, if LHH actually pulled the ball a lot more than RHH, then I should see a lot more red on the right side of the field than the left. So that’s what I did first. I got some heatmaps there as well as a scatterplot for each, just for fun.
To my eyes, that all looks pretty symmetrical. There is a slight bit more red on the left side, but overall I think it looks pretty much the same. Not bad at all, really. So now let’s see it by type of hitter for 2017–2020:
Some graphing notes: with the scatterplot, I wanted to show where the opposite field, straight away, and pull markers would be, and to reinforce that by changing the color of the dots. I thought that visually it would make more sense and be easier to read. The bottom dot where all four lines connect represents home plate.
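The pull / straight away / opposite field split can be sketched as two lines from home plate, one through (-55, 200) and one through (55, 200), which are the marker points used in the code at the end of this post. This is a simplified, pure-Python version of the idea rather than the exact script; the function name classify_hit and the string labels are made up for illustration.

```python
# Marker points from the plotting code: home plate and the two inner lines.
HOME = (0, 0)
LEFT_MARK = (-55, 200)
RIGHT_MARK = (55, 200)

LF_SLOPE = (LEFT_MARK[1] - HOME[1]) / (LEFT_MARK[0] - HOME[0])    # negative slope
RF_SLOPE = (RIGHT_MARK[1] - HOME[1]) / (RIGHT_MARK[0] - HOME[0])  # positive slope


def classify_hit(x, y, stand):
    """Label a ball in play as 'pull', 'straight' or 'push' for stand 'L' or 'R'."""
    right_of_lf_line = y > LF_SLOPE * x
    left_of_rf_line = y > RF_SLOPE * x
    if right_of_lf_line and left_of_rf_line:
        return "straight"        # inside the middle wedge
    if not right_of_lf_line and not left_of_rf_line:
        return "unclassified"    # behind the plate, fails both tests
    side = "right" if right_of_lf_line else "left"
    pull_side = "right" if stand == "L" else "left"
    return "pull" if side == pull_side else "push"


print(classify_hit(100, 50, "L"))   # right field, lefty
print(classify_hit(100, 50, "R"))   # right field, righty
print(classify_hit(0, 100, "L"))    # up the middle
```

The same ball to right field counts as a pull for a lefty and as opposite field ("push") for a righty, which is why the stance has to be part of the classification.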
This data certainly makes it seem like it should be pretty even overall. I wouldn’t say that LHH pull the ball that much more than RHH. I think they do overall, but by the slightest of margins. So now it was time to see how often they actually did. For that, I used the print function to print out some of the numbers:
Lefty total pitches: 160000 vs. Righty total pitches: 160000
Balls in play by lefties: 26657 vs. balls in play by righties: 27567
Lefty pull percent: 42.27% vs. Righty pull percent: 37.9%
Lefty straight percent: 32.91% vs. Righty straight percent: 35.0%
Lefty push percent: 24.82% vs. Righty push percent: 27.1%
Lefty outs on pull percent: 59.27% vs. Righty outs on pull percent: 51.56%
Lefty IF shift total: 74092 vs. Righty IF shift total: 29625
Lefty IF shift percent: 46.31% vs. Righty IF shift percent: 18.52%
Lefty OF shift total: 11799 vs. Righty OF shift total: 11512
Lefty OF shift percent: 7.37% vs. Righty OF shift percent: 7.2%
There it is. I was able to get 160,000 pitches faced by each. RHH put the ball in play slightly more, and LHH pulled the ball more. Neither by a crazy amount, though.
The shift, and the infield shift in particular, happened a wild amount more for LHH, just like had been said. Of the 160,000 pitches seen, the infield was shifted nearly half the time for LHH compared to a bit under 20% for RHH. That’s a difference of roughly 44,500 pitches. How could that make any sense at all? LHH pulled the ball 4.37 percentage points more often, yet they were shifted on 2.5x as often?
Something else I calculated because I was curious was how often an out was made when the ball was pulled. LHH made an out on pulled balls at a rate about 8 percentage points higher than RHH.
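Since rates and ratios are easy to mix up here, a quick check of the arithmetic using the figures from the printed output above: the pull and out gaps are differences between two percentages, while the 2.5x figure is a ratio of shift counts. The variable names are just for illustration.

```python
# Figures copied from the printed output above.
lefty = {"pull_pct": 42.27, "pull_out_pct": 59.27, "if_shifts": 74092}
righty = {"pull_pct": 37.90, "pull_out_pct": 51.56, "if_shifts": 29625}

# Gaps between two percentages are measured in percentage points.
pull_gap = round(lefty["pull_pct"] - righty["pull_pct"], 2)
out_gap = round(lefty["pull_out_pct"] - righty["pull_out_pct"], 2)

# The shift comparison is a ratio of raw counts (both sides saw 160,000 pitches).
shift_ratio = round(lefty["if_shifts"] / righty["if_shifts"], 2)

print(f"Pull gap: {pull_gap} percentage points")
print(f"Out-on-pull gap: {out_gap} percentage points")
print(f"Infield shift ratio: {shift_ratio}x")
```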
So to recap: the LHH pull rate was ~4 percentage points higher than that of RHH, yet their out rate on pulled balls was ~8 percentage points higher because they were shifted on 2.5x as often. I mean. I don’t know. That doesn’t make sense to me. Let’s go from percentages to whole numbers:
                                   LHH       RHH
Balls in play                    26657     27567
Pull percentage                 0.4227    0.3790
Number of times ball pulled      11268     10448
Out percent on pulled ball      0.5927    0.5156
Outs on pull                      6678      5387
Going by these estimates, LHH made roughly 1,290 more outs on pulled balls than RHH while pulling the ball only about 820 more times. I’d say that’s a good reason for shifting. And here’s the same for the number of shifts:
                      LHH shift rates    RHH shift rates
Pitches seen                   160000             160000
Percent of shift               0.4631             0.1852
Number of shifts                74096              29632
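The whole-number tables above come from multiplying totals by rates. Here is a small sketch of that arithmetic using the figures from the printed output; the helper name estimate_count and the dictionary layout are hypothetical. Because the percentages are rounded to two decimals, the shift counts land a few off the exact totals printed earlier (74,096 here vs. 74,092 counted for LHH).

```python
def estimate_count(total, rate):
    """Turn a rate (as a fraction) back into an approximate whole-number count."""
    return round(total * rate)


balls_in_play = {"LHH": 26657, "RHH": 27567}
pull_rate = {"LHH": 0.4227, "RHH": 0.3790}
pull_out_rate = {"LHH": 0.5927, "RHH": 0.5156}
pitches_seen = {"LHH": 160000, "RHH": 160000}
shift_rate = {"LHH": 0.4631, "RHH": 0.1852}

table = {}
for hand in ("LHH", "RHH"):
    # Keep the pulled-ball estimate unrounded so rounding error does not compound.
    pulled = balls_in_play[hand] * pull_rate[hand]
    outs_on_pull = estimate_count(pulled, pull_out_rate[hand])
    shifts = estimate_count(pitches_seen[hand], shift_rate[hand])
    table[hand] = (round(pulled), outs_on_pull, shifts)

for hand, (pulled, outs_on_pull, shifts) in table.items():
    print(hand, "pulled:", pulled, "outs on pull:", outs_on_pull, "shifts:", shifts)
```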
This is the part that bothers me. I think the shift 100% has its place in baseball. I think it’s a useful tactic that forces hitters to do something they normally wouldn’t and gets them out of their comfort zone. But I think it’s asinine that there is such a gap in how often the two types of hitters are shifted on.
I would like to think there is a simple fix for this: some way to limit how many times a defense can shift each plate appearance so that it evens out.
Time for my code!
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import warnings
warnings.filterwarnings("ignore")
import matplotlib as mpl

# Notes from the website, https://visualbaseballinfo.blogspot.com/2017/06/spray-angle-horizontal-angle-for.html
#
# Got the following information from here: https://tht.fangraphs.com/research-notebook-new-format-for-statcast-data-export-at-baseball-savant/
#
# Is this perfect? No, but it is pretty good in the absence of official sensor-based spray angle data.
# Note that -45 degrees is the left field line and 45 degrees is the right field line.
# (Note also that this calculation was originally produced by Jeff and Darrell Zimmerman.)

# This code is using R.
# spray_angle <- with(df, round(
# (atan(
# (hc_x-125.42)/(198.27-hc_y)
# )*180/pi*.75)
# ,1)
# )

data_2020_righties = pd.read_csv('/Users/user/savant_data_2020_righties.csv')
data_2019_righties = pd.read_csv('/Users/user/savant_data_2019_righties.csv')
data_2018_righties = pd.read_csv('/Users/user/savant_data_2018_righties.csv')
data_2017_righties = pd.read_csv('/Users/user/savant_data_2017_righties.csv')
data_2020_lefties = pd.read_csv('/Users/user/savant_data_2020_lefties.csv')
data_2019_lefties = pd.read_csv('/Users/user/savant_data_2019_lefties.csv')
data_2018_lefties = pd.read_csv('/Users/user/savant_data_2018_lefties.csv')
data_2017_lefties = pd.read_csv('/Users/user/savant_data_2017_lefties.csv')

# Combine both stances for each season.
# (DataFrame.append was removed in pandas 2.0; pd.concat does the same job.)
data_2020 = pd.concat([data_2020_righties, data_2020_lefties], ignore_index=True)
data_2019 = pd.concat([data_2019_righties, data_2019_lefties], ignore_index=True)
data_2018 = pd.concat([data_2018_righties, data_2018_lefties], ignore_index=True)
data_2017 = pd.concat([data_2017_righties, data_2017_lefties], ignore_index=True)

# Re-center the hit coordinates so home plate sits near (0, 0) and the field opens upward,
# then estimate the spray angle in degrees: atan(x / y) * 180 / pi * 0.75.
# Note: y uses 206.27 here, while the R snippet in the notes uses 198.27.
for season in (data_2020, data_2019, data_2018, data_2017):
    season['x_coords'] = season['hc_x'] - 125.42
    season['y_coords'] = 206.27 - season['hc_y']
    season['maths'] = np.arctan(season['x_coords'] / season['y_coords']) * 180 / np.pi * .75

# Stack all four seasons, then split by batter stance
data_total = pd.concat([data_2020, data_2019, data_2018, data_2017], ignore_index=True)
data_total_lefties = data_total[data_total['stand'] == 'L'].copy()
data_total_righties = data_total[data_total['stand'] == 'R'].copy()

# Count balls put in play for each stance
lefties_count = len(data_total_lefties[data_total_lefties['description'] == 'hit_into_play'])
righties_count = len(data_total_righties[data_total_righties['description'] == 'hit_into_play'])
count_diff = righties_count - lefties_count

# Marker points: a is home plate; b/c and d/e are the ends of the lines drawn on the charts
a = np.array([0,0])
b = np.array([-55,200])
c = np.array([55,200])
d = np.array([-200,200])
e = np.array([200,200])

# Slopes of the lines from home plate (a) through each marker point
rf_slope = (c[1]-a[1])/(c[0]-a[0])
lf_slope = (b[1]-a[1])/(b[0]-a[0])
rf_slope2 = (d[1]-a[1])/(d[0]-a[0])
lf_slope2 = (e[1]-a[1])/(e[0]-a[0])

# Lefties: a ball to the right of the left-field line is flagged 9, and a ball to the left
# of the right-field line is flagged 7. Between the lines both apply (9 + 7 = 16, straight);
# 9 alone is right field (pull) and 7 alone is left field (push).
data_total_lefties['data_pull'] = data_total_lefties['y_coords'] > lf_slope * (data_total_lefties['x_coords']-a[0])
data_total_lefties['data_push'] = data_total_lefties['y_coords'] > rf_slope * (data_total_lefties['x_coords']-a[0])
data_total_lefties.loc[data_total_lefties['data_pull'] == True, 'data_pull_num'] = 9
data_total_lefties.loc[data_total_lefties['data_push'] == True, 'data_push_num'] = 7
data_total_lefties['data_pull_num'] = data_total_lefties['data_pull_num'].fillna(0)
data_total_lefties['data_push_num'] = data_total_lefties['data_push_num'].fillna(0)
data_total_lefties['data_straight_num'] = data_total_lefties['data_pull_num'] + data_total_lefties['data_push_num']

# Righties: same tests with the flags mirrored, so 7 alone is left field (pull)
data_total_righties['data_pull'] = data_total_righties['y_coords'] > rf_slope * (data_total_righties['x_coords']-a[0])
data_total_righties['data_push'] = data_total_righties['y_coords'] > lf_slope * (data_total_righties['x_coords']-a[0])
data_total_righties.loc[data_total_righties['data_pull'] == True, 'data_pull_num'] = 7
data_total_righties.loc[data_total_righties['data_push'] == True, 'data_push_num'] = 9
data_total_righties['data_pull_num'] = data_total_righties['data_pull_num'].fillna(0)
data_total_righties['data_push_num'] = data_total_righties['data_push_num'].fillna(0)
data_total_righties['data_straight_num'] = data_total_righties['data_pull_num'] + data_total_righties['data_push_num']

# Pull / push / straight counts and percentages for lefties
lefties_pull_count = len(data_total_lefties[data_total_lefties['data_straight_num'] == 9])
lefties_push_count = len(data_total_lefties[data_total_lefties['data_straight_num'] == 7])
lefties_straight_count = len(data_total_lefties[data_total_lefties['data_straight_num'] == 16])
lefties_pull_perc = round((lefties_pull_count/(lefties_pull_count+lefties_push_count+lefties_straight_count)*100),2)
lefties_push_perc = round((lefties_push_count/(lefties_pull_count+lefties_push_count+lefties_straight_count)*100),2)
lefties_straight_perc = round((lefties_straight_count/(lefties_pull_count+lefties_push_count+lefties_straight_count)*100),2)

# Pull / push / straight counts and percentages for righties
righties_pull_count = len(data_total_righties[data_total_righties['data_straight_num'] == 7])
righties_push_count = len(data_total_righties[data_total_righties['data_straight_num'] == 9])
righties_straight_count = len(data_total_righties[data_total_righties['data_straight_num'] == 16])
righties_pull_perc = round((righties_pull_count/(righties_pull_count+righties_push_count+righties_straight_count)*100),2)
righties_push_perc = round((righties_push_count/(righties_pull_count+righties_push_count+righties_straight_count)*100),2)
righties_straight_perc = round((righties_straight_count/(righties_pull_count+righties_push_count+righties_straight_count)*100),2)

# How often a pulled ball turned into a field out
lefties_pull_data_out = data_total_lefties[data_total_lefties['data_straight_num'] == 9]
rights_pull_data_out = data_total_righties[data_total_righties['data_straight_num'] == 7]
lefties_pull_outs = len(lefties_pull_data_out[lefties_pull_data_out['events'] == 'field_out'])
righties_pull_outs = len(rights_pull_data_out[rights_pull_data_out['events'] == 'field_out'])
lefties_pull_out_perc = round((lefties_pull_outs/lefties_pull_count)*100,2)
righties_pull_out_perc = round((righties_pull_outs/righties_pull_count)*100,2)

# Total pitches seen and infield shift counts
lefties_total_pitches = len(data_total_lefties)
righties_total_pitches = len(data_total_righties)
lefties_shift_1 = len(data_total_lefties[data_total_lefties['if_fielding_alignment'] == "Strategic"])
lefties_shift_2 = len(data_total_lefties[data_total_lefties['if_fielding_alignment'] == "Infield shift"])
lefties_shift_tot = (lefties_shift_1+lefties_shift_2)
righties_shift_1 = len(data_total_righties[data_total_righties['if_fielding_alignment'] == "Strategic"])
righties_shift_2 = len(data_total_righties[data_total_righties['if_fielding_alignment'] == "Infield shift"])
righties_shift_tot = (righties_shift_1+righties_shift_2)

lefties_of_shift_1 = len(data_total_lefties[data_total_lefties['of_fielding_alignment'] == "Strategic"])
lefties_of_shift_2 = len(data_total_lefties[data_total_lefties['of_fielding_alignment'] == "4th outfielder"])
lefties_of_shift_tot = (lefties_of_shift_1+lefties_of_shift_2)
righties_of_shift_1 = len(data_total_righties[data_total_righties['of_fielding_alignment'] == "Strategic"])
righties_of_shift_2 = len(data_total_righties[data_total_righties['of_fielding_alignment'] == "4th outfielder"])
righties_of_shift_tot = (righties_of_shift_1+righties_of_shift_2)

# Shift rates as a percentage of all pitches seen
lefties_shift_perc = round((lefties_shift_tot/lefties_total_pitches)*100,2)
righties_shift_perc = round((righties_shift_tot/righties_total_pitches)*100,2)
lefties_of_shift_perc = round((lefties_of_shift_tot/lefties_total_pitches)*100,2)
righties_of_shift_perc = round((righties_of_shift_tot/righties_total_pitches)*100,2)

# Subsets by game situation (not used in the output below). pandas normally reads
# outs_when_up, balls and strikes as numbers, so compare against integers, not strings.
lefties_outs_shift_0 = data_total_lefties[data_total_lefties['outs_when_up'] == 0]
lefties_outs_shift_1 = data_total_lefties[data_total_lefties['outs_when_up'] == 1]
lefties_outs_shift_2 = data_total_lefties[data_total_lefties['outs_when_up'] == 2]
righties_outs_shift_0 = data_total_righties[data_total_righties['outs_when_up'] == 0]
righties_outs_shift_1 = data_total_righties[data_total_righties['outs_when_up'] == 1]
righties_outs_shift_2 = data_total_righties[data_total_righties['outs_when_up'] == 2]

lefties_balls_shift_0 = data_total_lefties[data_total_lefties['balls'] == 0]
lefties_balls_shift_1 = data_total_lefties[data_total_lefties['balls'] == 1]
lefties_balls_shift_2 = data_total_lefties[data_total_lefties['balls'] == 2]
lefties_balls_shift_3 = data_total_lefties[data_total_lefties['balls'] == 3]
righties_balls_shift_0 = data_total_righties[data_total_righties['balls'] == 0]
righties_balls_shift_1 = data_total_righties[data_total_righties['balls'] == 1]
righties_balls_shift_2 = data_total_righties[data_total_righties['balls'] == 2]
righties_balls_shift_3 = data_total_righties[data_total_righties['balls'] == 3]

lefties_strikes_shift_0 = data_total_lefties[data_total_lefties['strikes'] == 0]
lefties_strikes_shift_1 = data_total_lefties[data_total_lefties['strikes'] == 1]
lefties_strikes_shift_2 = data_total_lefties[data_total_lefties['strikes'] == 2]
righties_strikes_shift_0 = data_total_righties[data_total_righties['strikes'] == 0]
righties_strikes_shift_1 = data_total_righties[data_total_righties['strikes'] == 1]
righties_strikes_shift_2 = data_total_righties[data_total_righties['strikes'] == 2]

# Summary output
print("Lefty total pitches: "+str(lefties_total_pitches)+" vs. Righty total pitches: "+str(righties_total_pitches))
print("Balls in play by lefties: "+str(lefties_count)+" vs. balls in play by righties: "+str(righties_count))
print("Lefty pull percent: "+str(lefties_pull_perc)+"% vs. Righty pull percent: "+str(righties_pull_perc)+"%")
print("Lefty straight percent: "+str(lefties_straight_perc)+"% vs. Righty straight percent: "+str(righties_straight_perc)+"%")
print("Lefty push percent: "+str(lefties_push_perc)+"% vs. Righty push percent: "+str(righties_push_perc)+"%")
print("Lefty outs on pull percent: "+str(lefties_pull_out_perc)+"% vs. Righty outs on pull percent: "+str(righties_pull_out_perc)+"%")
print("Lefty IF shift total: "+str(lefties_shift_tot)+" vs. Righty IF shift total: "+str(righties_shift_tot))
print("Lefty IF shift percent: "+str(lefties_shift_perc)+"% vs. Righty IF shift percent: "+str(righties_shift_perc)+"%")
print("Lefty OF shift total: "+str(lefties_of_shift_tot)+" vs. Righty OF shift total: "+str(righties_of_shift_tot))
print("Lefty OF shift percent: "+str(lefties_of_shift_perc)+"% vs. Righty OF shift percent: "+str(righties_of_shift_perc)+"%")

# Heatmaps of where balls landed, one panel per season
with sns.axes_style("white"):
    fig, axes = plt.subplots(2,2,figsize=((15,15)), sharey=True);
    sns.kdeplot(ax=axes[0,0], data=data_2020, x='x_coords', y='y_coords', fill=True, cmap="seismic").set_title("2020 stats");
    sns.kdeplot(ax=axes[0,1], data=data_2019, x='x_coords', y='y_coords', fill=True, cmap="seismic").set_title("2019 stats");
    sns.kdeplot(ax=axes[1,0], data=data_2018, x='x_coords', y='y_coords', fill=True, cmap="seismic").set_title("2018 stats");
    sns.kdeplot(ax=axes[1,1], data=data_2017, x='x_coords', y='y_coords', fill=True, cmap="seismic").set_title("2017 stats");
    # Strip spines, ticks and axis labels from every panel
    for ax in axes.flat:
        sns.despine(ax=ax, top=True, bottom=True, left=True, right=True)
        ax.set_yticks([])
        ax.set_xticks([])
        ax.set_ylabel('')
        ax.set_xlabel('')
    plt.savefig('/Users/user/season_heatmap.jpg')

# Scatterplots (spray charts) for the same four seasons
fig2, axes2 = plt.subplots(2,2,figsize=((15,15)));
sns.scatterplot(ax=axes2[0,0], data=data_2020, x='x_coords', y='y_coords').set_title("2020 stats");
sns.scatterplot(ax=axes2[0,1], data=data_2019, x='x_coords', y='y_coords').set_title("2019 stats");
sns.scatterplot(ax=axes2[1,0], data=data_2018, x='x_coords', y='y_coords').set_title("2018 stats");
sns.scatterplot(ax=axes2[1,1], data=data_2017, x='x_coords', y='y_coords').set_title("2017 stats");
sns.despine(ax=axes2[0,0], top=True, bottom=True, left=True, right=True)
axes2[0,0].plot([a[0],b[0]],[a[1],b[1]], marker="o", color="k")
axes2[0,0].plot([a[0],c[0]],[a[1],c[1]], marker="o", color="k")
axes2[0,0].set_yticks([])
axes2[0,0].set_xticks([])
axes2[0,0].set_ylabel('')
axes2[0,0].set_xlabel('')
sns.despine(ax=axes2[0,1], top=True, bottom=True, left=True, right=True)
axes2[0,1].plot([a[0],b[0]],[a[1],b[1]], marker="o", color="k")
axes2[0,1].plot([a[0],c[0]],[a[1],c[1]], marker="o", color="k")
axes2[0,1].set_yticks([])
axes2[0,1].set_xticks([])
axes2[0,1].set_ylabel('')
axes2[0,1].set_xlabel('')
sns.despine(ax=axes2[1,0], top=True, bottom=True, left=True, right=True)
axes2[1,0].plot([a[0],b[0]],[a[1],b[1]], marker="o", color="k")
axes2[1,0].plot([a[0],c[0]],[a[1],c[1]], marker="o", color="k")
axes2[1,0].set_yticks([])
axes2[1,0].set_xticks([])
axes2[1,0].set_ylabel('')
axes2[1,0].set_xlabel('')
sns.despine(ax=axes2[1,1], top=True, bottom=True, left=True, right=True)
axes2[1,1].plot([a[0],b[0]],[a[1],b[1]], marker="o", color="k")
axes2[1,1].plot([a[0],c[0]],[a[1],c[1]], marker="o", color="k")
axes2[1,1].set_yticks([])
axes2[1,1].set_xticks([])
axes2[1,1].set_ylabel('')
axes2[1,1].set_xlabel('')
plt.savefig('/Users/user/spray_chart.jpg')
fig3, axes3 = plt.subplots(2,2,figsize=((15,15)), sharey=True);
sns.kdeplot(ax=axes3[0,0], data=data_total_lefties, x='x_coords', y='y_coords', fill=True, cmap="seismic").set_title("Lefties");
sns.kdeplot(ax=axes3[0,1], data=data_total_righties, x='x_coords', y='y_coords', fill=True, cmap="seismic").set_title("Righties");
sns.scatterplot(ax=axes3[1,0], data=data_total_lefties, x='x_coords', y='y_coords', hue='data_straight_num', hue_norm=(7,10), legend=False);
sns.scatterplot(ax=axes3[1,1], data=data_total_righties, x='x_coords', y='y_coords', hue='data_straight_num', hue_norm=(7,10), legend=False);
sns.despine(ax=axes3[0,0], top=True, bottom=True, left=True, right=True)
axes3[0,0].set_yticks([])
axes3[0,0].set_xticks([])
axes3[0,0].set_ylabel('')
axes3[0,0].set_xlabel('')
sns.despine(ax=axes3[0,1], top=True, bottom=True, left=True, right=True)
axes3[0,1].set_yticks([])
axes3[0,1].set_xticks([])
axes3[0,1].set_ylabel('')
axes3[0,1].set_xlabel('')
sns.despine(ax=axes3[1,0], top=True, bottom=True, left=True, right=True)
axes3[1,0].plot([a[0],b[0]],[a[1],b[1]], marker="o", color="k")
axes3[1,0].plot([a[0],c[0]],[a[1],c[1]], marker="o", color="k")
axes3[1,0].plot([a[0],d[0]],[a[1],d[1]], marker="o", color="k")
axes3[1,0].plot([a[0],e[0]],[a[1],e[1]], marker="o", color="k")
axes3[1,0].set_yticks([])
axes3[1,0].set_xticks([])
axes3[1,0].set_ylabel('')
axes3[1,0].set_xlabel('')
sns.despine(ax=axes3[1,1], top=True, bottom=True, left=True, right=True)
axes3[1,1].plot([a[0],b[0]],[a[1],b[1]], marker="o", color="k")
axes3[1,1].plot([a[0],c[0]],[a[1],c[1]], marker="o", color="k")
axes3[1,1].plot([a[0],d[0]],[a[1],d[1]], marker="o", color="k")
axes3[1,1].plot([a[0],e[0]],[a[1],e[1]], marker="o", color="k")
axes3[1,1].set_yticks([])
axes3[1,1].set_xticks([])
axes3[1,1].set_ylabel('')
axes3[1,1].set_xlabel('')
plt.savefig('/Users/user/heatmap.jpg')