CSW% and catcher framing

jerrymckennan
4 min read · Jun 29, 2021


I think this is going to be a quick, easygoing post. I don’t have a fancy graph for it; this was more a curiosity than something I wanted to research in depth.

Basically, I wanted to know how much of a correlation there was between teams’ CSW% and their “Runs Extra Strikes” numbers on baseballsavant.com.

If you’re unsure what these stats are, here’s a quick overview of each one.

CSW% is the percentage of a pitcher’s pitches that are either called strikes (CS) or whiffs (W): the two are added together and divided by the total pitches thrown (%). Very straightforward. There is a great write-up on the PitcherList website (https://www.pitcherlist.com/csw-rate-an-intro-to-an-important-new-metric/) from one of the people who came up with this stat.
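As a quick illustration of that definition, here is a minimal sketch. The function name `csw_rate` and the pitch counts are made up for the example; they don't come from PitcherList or any real pitcher.

```python
def csw_rate(called_strikes: int, whiffs: int, total_pitches: int) -> float:
    """Return CSW% as a percentage: (called strikes + whiffs) / total pitches."""
    if total_pitches <= 0:
        raise ValueError("total_pitches must be positive")
    return 100.0 * (called_strikes + whiffs) / total_pitches


# Made-up counts: 17 called strikes and 13 whiffs on 100 pitches.
example = csw_rate(17, 13, 100)
print(example)  # 30.0
```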

Runs Extra Strikes converts called strikes into runs saved or given up. Baseballsavant has a little more explanation of it above its catcher framing leaderboard (https://baseballsavant.mlb.com/catcher_framing?year=2021&team=&min=q&sort=4,1).

Now. Back to the correlation.

To keep this somewhat quick, I used only this year’s stats thus far. I knew I wanted to determine R squared between the two (small) datasets. My thinking was that the higher the CSW%, the higher the Runs Extra Strikes should be.

A quick tutorial showed me that you can determine that quickly with NumPy. I’m semi-familiar with NumPy, though admittedly I’m far, far better with pandas. But regardless, this is all I would need to do:

var1 = np.corrcoef(data1, data2)
var2 = var1[0,1]
r_sq = var2**2
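To see those three lines run end to end, here is a small self-contained sketch. The two arrays are made-up values for five hypothetical teams, not real 2021 data; `np.corrcoef` returns a 2x2 correlation matrix, and the off-diagonal entry is Pearson's r.

```python
import numpy as np

# Made-up values for five hypothetical teams (not real data).
data1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
data2 = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

corr_matrix = np.corrcoef(data1, data2)  # 2x2 matrix; diagonal entries are 1
r = corr_matrix[0, 1]                    # Pearson correlation between the two
r_sq = r ** 2                            # R squared

print(round(r_sq, 3))
```

One thing to keep in mind: squaring r throws away its sign, so R squared alone won't tell you whether the relationship is positive or negative.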

I did not at all expect it to be that simple. I was a bit intimidated by this initially. My thoughts were all over the place when I first dove in, but I knew I wanted to get it done. Seeing how simple it would be let me breathe a nice sigh of relief.

So I moved on to my data gathering. First, I went to FanGraphs and exported a report that listed the Team, CSW%, and number of pitches thrown. I then went to baseballsavant’s site and used the above link to export all of the data that was available. When I viewed the CSV file, I realized a very important piece of information was missing… there were no team associations. I spent the next 10–15 minutes going through each player and adding the team that he played for. I also needed to make sure I used the same team format that FanGraphs uses. Fortunately, I thought about that ahead of time instead of ending up with mismatched data.
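The team tagging described above was done by hand in the CSV, but the same idea could be sketched in pandas with a lookup table. Everything here is hypothetical: the player IDs, the team abbreviations, and the assumption that `fielder_2` is the column holding the catcher's ID in the export.

```python
import pandas as pd

# Hypothetical stand-in for the Savant export: catcher IDs, but no team column.
framing = pd.DataFrame({
    "fielder_2": [101, 102, 103],
    "runs_extra_strikes": [3, -1, 2],
})

# Hand-built lookup (hypothetical IDs and abbreviations): catcher ID -> team,
# written in the same abbreviation format as the FanGraphs report.
player_team = {101: "AAA", 102: "BBB", 103: "AAA"}

framing["team"] = framing["fielder_2"].map(player_team)

# Catch any catcher who was left without a team before moving on.
missing = framing.loc[framing["team"].isna(), "fielder_2"].tolist()
if missing:
    raise ValueError(f"catchers without a team: {missing}")

print(framing)
```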

From there I was able to drop a lot of the catcher framing data that was not needed. I didn’t want the individual players and I didn’t want the individual zones where they were best. All I wanted was team, how many pitches were called, how many of them were strikes, and Runs Extra Strikes. From there, I grouped the data by team name and summed the rest of the columns. That would tell me what the teams had done.
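Here is a toy version of that roll-up, with made-up catchers and numbers. It assumes the column names `n_called_pitches`, `strike_rate`, and `runs_extra_strikes`, and that `strike_rate` is a percentage. Note that the strike rate itself can't simply be summed across catchers, so the sketch converts it to a strike count first and drops the rate before grouping.

```python
import pandas as pd

# Three made-up catchers on two hypothetical teams.
catchers = pd.DataFrame({
    "team": ["AAA", "AAA", "BBB"],
    "n_called_pitches": [1000, 500, 1200],
    "strike_rate": [50.0, 46.0, 48.0],   # percent of called pitches that were strikes
    "runs_extra_strikes": [2, -1, 1],
})

# Turn the rate into a count so it can be added up across catchers.
catchers["strikes"] = (
    catchers["n_called_pitches"] * catchers["strike_rate"] / 100
).round(0)

team_totals = (
    catchers.drop(columns=["strike_rate"])  # a summed rate would be meaningless
    .groupby("team")
    .sum()
    .reset_index()
)
print(team_totals)
```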

Since I had already modified the table on FanGraphs, all I needed to do with that data was drop the “%” from the CSW% numbers and convert them from strings to floats. Very simple, very easygoing.
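That cleanup might look something like this on a toy table (the teams and values are made up). It uses the vectorized `.str.rstrip` accessor, which is one way to do it among several.

```python
import pandas as pd

# Made-up rows in the shape of the FanGraphs export: CSW% arrives as text.
csw = pd.DataFrame({"Team": ["AAA", "BBB"], "CSW%": ["28.4%", "30.1%"]})

# Strip the trailing percent sign, then convert the strings to floats.
csw["CSW%"] = csw["CSW%"].str.rstrip("%").astype(float)

print(csw.dtypes)
```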

Once I had my two datasets, catcher framing and CSW%, I merged them using the team abbreviations as the link between the two. I did need to add a little bit of data to the merged dataset, though. I wanted it to include the Strike Rate, like what was available for each individual player, and I also wanted the percent of all pitches thrown by a staff that were called strikes. Time to start my correlation run!
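A sketch of the merge and the two derived columns, on made-up team totals. The `how="outer", indicator=True` check is an addition for illustration, not part of the original workflow: it flags any team whose abbreviation doesn't match between the two sources, which is the kind of mismatch the formatting step earlier was meant to avoid.

```python
import pandas as pd

# Made-up team totals in the shape of the two cleaned datasets.
framing = pd.DataFrame({
    "team": ["AAA", "BBB"],
    "n_called_pitches": [1500, 1200],
    "strikes": [730.0, 576.0],
    "runs_extra_strikes": [1, 1],
})
csw = pd.DataFrame({
    "Team": ["AAA", "BBB", "CCC"],   # "CCC" has no match on purpose
    "CSW%": [28.4, 30.1, 27.0],
    "Pitches": [3000, 2400, 2500],
})

# An outer merge with an indicator column shows which rows failed to match.
combined = framing.merge(csw, left_on="team", right_on="Team",
                         how="outer", indicator=True)
unmatched = combined[combined["_merge"] != "both"]
print(unmatched[["team", "Team", "_merge"]])

# Keep only the matched teams and add the two derived rates.
matched = combined[combined["_merge"] == "both"].copy()
matched["perc_called_strikes"] = matched["strikes"] / matched["n_called_pitches"]
matched["all_pitches_strikes"] = matched["strikes"] / matched["Pitches"]
print(matched[["team", "perc_called_strikes", "all_pitches_strikes"]])
```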

I was thinking about this and decided that I would see what the correlation was between CSW% and each of these four:

Runs Extra Strikes
Percent of called pitches that were called strikes
Percent of all pitches that were called strikes
Number of pitches that were called balls or strikes
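The four comparisons listed above all follow the same pattern, so they could be run in one loop. This is only a sketch on a made-up five-team table; the helper name `r_squared` is my own, and the column names are assumed to match the merged dataset.

```python
import numpy as np
import pandas as pd


def r_squared(x, y) -> float:
    """Square of the Pearson correlation between two equal-length series."""
    return float(np.corrcoef(x, y)[0, 1] ** 2)


# Made-up numbers for five hypothetical teams (not real 2021 data).
combined = pd.DataFrame({
    "CSW%": [27.5, 28.0, 29.1, 30.2, 28.8],
    "runs_extra_strikes": [3, -2, 1, 0, -1],
    "perc_called_strikes": [0.480, 0.470, 0.490, 0.485, 0.475],
    "all_pitches_strikes": [0.165, 0.170, 0.160, 0.168, 0.172],
    "n_called_pitches": [5200, 5100, 5350, 5000, 5250],
})

targets = [
    "runs_extra_strikes",
    "perc_called_strikes",
    "all_pitches_strikes",
    "n_called_pitches",
]

# One R squared per target column, each measured against CSW%.
results = {col: r_squared(combined[col], combined["CSW%"]) for col in targets}

for col, value in results.items():
    print(f"{col}: {value:.4f}")
```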

Here are the results (R squared for each, against CSW%):

Runs Extra Strikes: 0.010222976350566416
Percent of called strikes (only called pitches): 0.04172549576642242
Percent of called strikes (all pitches): 0.00037006838717856
Number of called pitches: 0.02688041166391131

So there we have it. Little to no correlation between CSW% and the catcher framing stats from baseballsavant thus far for the 2021 season. Perhaps I will need to expand my dataset to include multiple years and see if that changes at all. But thus far it seems like having a great framing catcher will not increase the odds of your pitching staff having a higher CSW%.

Last up… my code:

import pandas as pd
import numpy as np

# Load the Baseball Savant catcher framing export (team column added by hand)
# and the FanGraphs team CSW% export.
catcher_framing = pd.read_csv('/User/catcher-framing.csv')
pitcher_csw = pd.read_csv('/User/starter_CSW.csv')

# Recover each catcher's called-strike count from strike_rate (a percentage).
catcher_framing['strikes'] = catcher_framing['n_called_pitches'] * catcher_framing['strike_rate'] / 100
catcher_framing['strikes'] = catcher_framing['strikes'].round(0)

# Drop the per-player and per-zone columns, then total everything by team.
catcher_framing = catcher_framing.drop(columns=['last_name', ' first_name', 'fielder_2', 'year', 'strike_rate','strike_rate_11','strike_rate_12','strike_rate_13','strike_rate_14','strike_rate_16','strike_rate_17','strike_rate_18','strike_rate_19'])
catcher_framing = catcher_framing.groupby(['team']).sum()
catcher_framing = catcher_framing.sort_values('runs_extra_strikes', ascending=False).reset_index()

# Strip the trailing "%" from CSW% and convert the strings to floats.
pitcher_csw['CSW%'] = pitcher_csw['CSW%'].map(lambda x: x.rstrip('%'))
pitcher_csw['CSW%'] = pitcher_csw['CSW%'].astype(float)

# Join the two datasets on team abbreviation and add the team-level rates.
combined_data = catcher_framing.merge(pitcher_csw, left_on='team', right_on='Team')
combined_data['perc_called_strikes'] = combined_data['strikes']/combined_data['n_called_pitches']
combined_data['all_pitches_strikes'] = combined_data['strikes']/combined_data['Pitches']

# R squared of each framing stat against CSW%.
# Note: .astype(int) truncates CSW% to a whole number before correlating, which
# throws away the decimal part. The results above were computed this way;
# remove .astype(int) to correlate against the full-precision values.
corr_res = np.corrcoef(combined_data['runs_extra_strikes'],combined_data['CSW%'].astype(int))
corr_xy_res = corr_res[0,1]
r_sq_res = corr_xy_res**2
corr_ps = np.corrcoef(combined_data['perc_called_strikes'],combined_data['CSW%'].astype(int))
corr_xy_ps = corr_ps[0,1]
r_sq_ps = corr_xy_ps**2
corr_cp = np.corrcoef(combined_data['n_called_pitches'],combined_data['CSW%'].astype(int))
corr_xy_cp = corr_cp[0,1]
r_sq_cp = corr_xy_cp**2
corr_cpt = np.corrcoef(combined_data['all_pitches_strikes'],combined_data['CSW%'].astype(int))
corr_xy_cpt = corr_cpt[0,1]
r_sq_cpt = corr_xy_cpt**2

print("Runs Extra Strikes: "+str(r_sq_res))
print("Percent of called strikes (only called pitches): "+str(r_sq_ps))
print("Percent of called strikes (all pitches): "+str(r_sq_cpt))
print("Number of called pitches: "+str(r_sq_cp))
