Some pitching thoughts
This was a subject I wanted to create graphs for with a specific question in mind. Most of my graphs leading up to it were just to get visuals of data, not so much for analysis. I think it was the first time I had done a real deep dive into some data to get an answer instead of creating a graph to get pictures of the numbers.
When the Tigers signed Derek Holland, I was wondering if there was an age where a pitcher was no longer viable as a starter and should be used purely as a reliever. At 34, he wasn’t a spry up-and-comer who was taking the baseball world by storm, but he also wasn’t at an age where he should be thinking about retirement. His previous 4 years hadn’t been good (he totaled 0.1 fWAR in 431 IP) and he was being used as a sort of hybrid pitcher: he made 69 starts in 128 appearances. It made me wonder what his role in the organization would be, especially with the likes of Mize and Skubal being used in a limited fashion.
It seemed like the news that would come up about aging starters was typically not good. Obviously there are exceptions (looking at you, Randy Johnson), but as a general rule of thumb, is there a specific age you should start avoiding? To take it a step further, is there an age where a pitcher might be much better off as a reliever to further their career? If I am running an organization and I have a young rotation that would be on an innings and pitch limit, would a 6-man rotation be best, or would acquiring an opener-esque pitcher to eat some of those innings be better?
One of the first things I wanted to figure out was a more difficult one: how do you know what type of starter you have? What WAR range corresponds to what type of pitcher in MLB? A very nice, long conversation on Twitter had me thinking this could be a good breakdown for any given season:
AAA: less than 1.25 WAR
Number 5: 1.25–2 WAR
Number 4: 2–2.75 WAR
Number 3: 2.75–3.5 WAR
Number 2: 3.5–4.25 WAR
Number 1: 4.25–4.99 WAR
Ace: 5+ WAR
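As a minimal sketch of how those ranges could be turned into labels, the snippet below uses pandas' `cut` with left-closed bins. The helper name `rotation_spot` and the sample values are made up for illustration. Treating each shared boundary as belonging to the higher tier is an assumption, since the ranges above share their endpoints.

```python
import numpy as np
import pandas as pd

# Tier edges taken from the list above. Each bin is closed on the left,
# so a 2.0 WAR season counts as a Number 4 rather than a Number 5 (assumption).
EDGES = [-np.inf, 1.25, 2.0, 2.75, 3.5, 4.25, 5.0, np.inf]
LABELS = ["AAA", "5", "4", "3", "2", "1", "Ace"]


def rotation_spot(war: pd.Series) -> pd.Series:
    """Map season WAR values to rotation-spot labels (hypothetical helper)."""
    return pd.cut(war, bins=EDGES, labels=LABELS, right=False)


# Made-up WAR values, not real pitcher seasons.
sample = pd.Series([0.4, 1.3, 2.0, 3.1, 4.0, 4.6, 6.2])
spots = rotation_spot(sample)
print(spots.tolist())
```

Keeping the edges in one list means each boundary is stated once, so a season can only ever land in one tier.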
Seemed like a really good start at the very least! After getting that down, I wanted to display this data by the age of the pitcher. Let’s see how it looks:
Some data notes: these are pitchers who pitched between 2000 and 2019. I wanted to make sure that I had a minimum of 100 IP for the starters as I felt that would separate them enough from relievers.
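A filter along those lines might look like the sketch below. The `Season` and `IP` column names match the ones used in my code, but the helper name `qualified_starters`, its defaults, and the toy rows are only for illustration.

```python
import pandas as pd


def qualified_starters(df, first_season=2000, last_season=2019, min_ip=100):
    """Keep seasons inside the window with at least min_ip innings (hypothetical helper)."""
    in_window = df["Season"].between(first_season, last_season)
    enough_innings = df["IP"] >= min_ip
    return df[in_window & enough_innings].copy()


# Made-up rows, not real pitcher seasons.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Season": [1999, 2005, 2019, 2020],
    "IP": [180.0, 99.2, 100.0, 150.0],
})
kept = qualified_starters(toy)
print(kept["Name"].tolist())
```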
Really nice breakdown, and I think it breaks down about how you might expect. There are more seasons of Ace-level starters from ages 26–31, but then again there are a lot more seasons in total for this age group.
So now I wanted to break this down and compare it to relievers: how do starters compare to relievers at the same age? To get this, I broke it down a couple of ways. I wanted to see fWAR and fWAR per inning pitched to compare these numbers between the starters and relievers. I also looked at velocity and innings pitched to see the breakdown over time for each type of player and whether the eventual decline is greater for one or the other. Here’s what I was able to come up with:
More data notes: Since I wanted to compare fastball velocity, I removed anyone who did not register an average fastball velocity on FanGraphs, where I gathered the data, from my dataset (this ended up being 2 pitchers in total). I also wanted to make sure that major outliers were removed. Since they don’t follow the norms, I didn’t want to risk skewing the data (again, I’m looking at you, Randy). So I used the pandas quantile function to set a ceiling and floor on fWAR for each group, 1.5 times the interquartile range beyond the first and third quartiles. In all, this gave me 2,968 relievers and 2,615 qualified starters for comparing.
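The outlier step can be sketched as a small function. This is a simplified illustration of the quartile-based ceiling and floor. The helper name `drop_iqr_outliers` and the toy numbers are made up, and `k=1.5` is the multiplier used in my code.

```python
import pandas as pd


def drop_iqr_outliers(df, column, k=1.5):
    """Keep rows whose value lies strictly inside the IQR fences (hypothetical helper)."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    floor = q1 - k * iqr
    ceiling = q3 + k * iqr
    return df[(df[column] > floor) & (df[column] < ceiling)]


# Made-up WAR values with one obvious outlier.
toy = pd.DataFrame({"WAR": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]})
trimmed = drop_iqr_outliers(toy, "WAR")
print(trimmed["WAR"].tolist())
```

With these toy values the quartiles are 3.0 and 7.0, so the fences sit at -3.0 and 13.0 and only the 30.0 season is dropped.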
Some aesthetics notes: I did have the starters as all blue initially. However, I thought it would be more fun to color them by rotation spot instead when viewing them all at once. I also wanted to include the averages in there for each. That way you can see how far they deviate from there.
A couple of things stood out to me with this data. First is how close the fWAR/IP averages by age stay until age 38. After that, the averages vary quite a bit, as does which role comes out ahead. Relievers, as volatile as they are, stay right with the overall average, deviating very little. Their trend line rides right along the average line, whereas starters begin to decrease at age 32 only to rise back up again. And it’s very interesting how close those numbers get between ages 32–37.
I’m assuming it’s because there is much less data at those ages, but it was interesting nonetheless. I also found the average IP for starters ages 40–42 really interesting.
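For reference, per-age averages like the ones discussed above can be computed with a `groupby`. This is only a sketch on made-up rows. My own code loops over ages instead, and the column names here simply mirror the ones in it.

```python
import pandas as pd

# Made-up rows, not real pitcher seasons.
toy = pd.DataFrame({
    "Age": [27, 27, 34, 34],
    "WAR": [3.0, 1.0, 2.0, 0.5],
    "IP": [200.0, 100.0, 160.0, 125.0],
})

# WAR per inning for each season, then the mean of every column within each age.
toy["WAR_over_IP"] = toy["WAR"] / toy["IP"]
by_age = toy.groupby("Age")[["WAR", "IP", "WAR_over_IP"]].mean()
print(by_age)
```

Note that this averages the per-season rates, which is generally not the same number as average WAR divided by average IP for that age.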
So now I want to compare starters and relievers in each of these datasets. I took the averages by age for starters and subtracted the averages by age for relievers:
So there we have it. Not including any monetary analysis, only from the data points, it would seem that starters would have an advantage here, even if it’s a slight one.
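The subtraction behind those difference charts can be sketched like this, with made-up per-age averages. Indexing both tables by age is an assumption of the sketch, chosen so that ages line up even when one group has no pitchers at a given age.

```python
import pandas as pd

# Made-up per-age averages, not real results.
starters = pd.DataFrame({"Age": [30, 31, 32], "WAR": [3.0, 2.4, 1.8], "IP": [200.0, 160.0, 150.0]})
relievers = pd.DataFrame({"Age": [30, 31, 33], "WAR": [0.5, 0.6, 0.3], "IP": [50.0, 60.0, 50.0]})

# Index by age so the subtraction aligns on age rather than on row position.
st = starters.set_index("Age")
rel = relievers.set_index("Age")

diff = pd.DataFrame({
    "WAR": st["WAR"] - rel["WAR"],
    # Difference of the two per-inning rates. This is not the same thing
    # as (WAR difference) / (IP difference).
    "WAR_over_IP": st["WAR"] / st["IP"] - rel["WAR"] / rel["IP"],
})
print(diff)
```

Ages that appear in only one table come out as NaN, so they drop out of a regression plot instead of being compared against the wrong age.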
And as always, here’s my code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import os.path
from os import path

# 2010-2019:
# 1585 qualified relievers w/ average FB velocity (20-43)
# 1320 starters w/ IP >= 100 (20-47)

# 2002-2019:
# 2968 qualified relievers w/ average FB velocity (20-43)
# 2615 starters w/ IP >= 100 (20-47)

st_data = pd.read_csv('/Users/user/Documents/JSON/starters_totals.csv')
re_data = pd.read_csv('/Users/user/Documents/JSON/relievers_totals.csv')
st_data = st_data[st_data['IP'] >= 150]
re_data = re_data[re_data['Season'] < 2020]
st_data['WAR_over_IP'] = st_data['WAR']/st_data['IP']
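re_data['WAR_over_IP'] = re_data['WAR']/re_data['IP']

# Separate copy that keeps every starter season for the rotation-spot chart
st_data_war = st_data.copy()

# Label each starter season with a rotation spot based on WAR
st_data.loc[st_data.WAR > 4.99, 'pitcher_num'] = "Ace"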
st_data.loc[st_data.WAR.between(4.25,4.99), 'pitcher_num'] = "1"
st_data.loc[st_data.WAR.between(3.5,4.25), 'pitcher_num'] = "2"
st_data.loc[st_data.WAR.between(2.75,3.5), 'pitcher_num'] = "3"
st_data.loc[st_data.WAR.between(2,2.75), 'pitcher_num'] = "4"
st_data.loc[st_data.WAR.between(1.25,2), 'pitcher_num'] = "5"
st_data.loc[st_data.WAR < 1.25, 'pitcher_num'] = "AAA"

st_data_war.loc[st_data_war.WAR > 4.99, 'pitcher_num'] = "Ace"
st_data_war.loc[st_data_war.WAR.between(4.25,4.99), 'pitcher_num'] = "1"
st_data_war.loc[st_data_war.WAR.between(3.5,4.25), 'pitcher_num'] = "2"
st_data_war.loc[st_data_war.WAR.between(2.75,3.5), 'pitcher_num'] = "3"
st_data_war.loc[st_data_war.WAR.between(2,2.75), 'pitcher_num'] = "4"
st_data_war.loc[st_data_war.WAR.between(1.25,2), 'pitcher_num'] = "5"
st_data_war.loc[st_data_war.WAR < 1.25, 'pitcher_num'] = "AAA"

# Drop WAR outliers: keep seasons within 1.5 * IQR of the first and third quartiles
st_data_q1_war = st_data['WAR'].quantile(.25)
st_data_q3_war = st_data['WAR'].quantile(.75)
st_data_IQR = st_data_q3_war - st_data_q1_war
st_low_outlier = st_data_q1_war - (1.5 * st_data_IQR)
st_high_outlier = st_data_q3_war + (1.5 * st_data_IQR)
st_data = st_data[st_data['WAR'] < st_high_outlier]
st_data = st_data[st_data['WAR'] > st_low_outlier]

re_data_q1_war = re_data['WAR'].quantile(.25)
re_data_q3_war = re_data['WAR'].quantile(.75)
re_data_IQR = re_data_q3_war - re_data_q1_war
re_low_outlier = re_data_q1_war - (1.5 * re_data_IQR)
re_high_outlier = re_data_q3_war + (1.5 * re_data_IQR)
re_data = re_data[re_data['WAR'] < re_high_outlier]
re_data = re_data[re_data['WAR'] > re_low_outlier]

# Velocity charts only use seasons with a recorded average fastball velocity
st_data_velo = st_data[st_data['FBv'] > 0]
re_data_velo = re_data[re_data['FBv'] > 0]

# Per-age averages start at age 19
st_counter = 19
re_counter = 19

st_age_mean = pd.DataFrame(columns=['ERA','FIP','xFIP','WAR','IP','Age','FBv', 'WAR_over_IP'])
re_age_mean = pd.DataFrame(columns=['ERA','FIP','xFIP','WAR','IP','Age','FBv', 'WAR_over_IP'])
st_age_mean_velo = pd.DataFrame(columns=['ERA','FIP','xFIP','WAR','IP','Age','FBv', 'WAR_over_IP'])
re_age_mean_velo = pd.DataFrame(columns=['ERA','FIP','xFIP','WAR','IP','Age','FBv', 'WAR_over_IP'])
war_diff = pd.DataFrame(columns=['WAR', 'Age'])
velo_diff = pd.DataFrame(columns=['FBv', 'Age'])
war_IP_diff = pd.DataFrame(columns=['IP', 'Age'])

def mean_by_age(data, first_age=19):
    # Average every numeric column for each age from first_age through the oldest
    # age in the data. Ages with no seasons become all-NaN rows, so row i always
    # corresponds to age first_age + i and the starter/reliever frames line up.
    # (DataFrame.append was removed in pandas 2.0, so the rows are collected in a list.)
    ages = range(first_age, int(data['Age'].max()) + 1)
    rows = [data[data['Age'] == age].mean(numeric_only=True) for age in ages]
    return pd.DataFrame(rows).reset_index(drop=True)

st_age_mean = mean_by_age(st_data)
re_age_mean = mean_by_age(re_data)
st_age_mean_velo = mean_by_age(st_data_velo)
re_age_mean_velo = mean_by_age(re_data_velo)
war_diff['WAR'] = st_age_mean['WAR'] - re_age_mean['WAR']
war_diff['Age'] = st_age_mean['Age']
velo_diff['FBv'] = st_age_mean['FBv'] - re_age_mean['FBv']
velo_diff['Age'] = st_age_mean['Age']
war_IP_diff['WAR'] = st_age_mean['WAR'] - re_age_mean['WAR']
war_IP_diff['IP'] = st_age_mean['IP'] - re_age_mean['IP']
war_IP_diff['WAR/IP'] = war_IP_diff['WAR']/war_IP_diff['IP']
war_IP_diff['Age'] = st_age_mean['Age']
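# Drop identifier columns if they made it into the averaged frames
st_age_mean.drop(columns=['Name','Season','Team'], inplace=True, errors='ignore')
re_age_mean.drop(columns=['Name','Season','Team'], inplace=True, errors='ignore')

# Figure 1: fastball velocity by age
fig, axes = plt.subplots(1, 3, sharey=True, figsize=((30,7)), constrained_layout=True)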
sns.stripplot(ax=axes[0], data=st_data_velo, x="Age", y="FBv", size=4, hue="pitcher_num", hue_order=['Ace','1','2','3','4','5','AAA'], palette="coolwarm_r", linewidth=0).set_title("FB velocity range by age for starters");
axes[0].set_ylabel("FB velocity");
axes[0].legend(title="Rotation spot");
sns.stripplot(ax=axes[1], data=re_data_velo, x="Age", y="FBv", size=4, color="purple", linewidth=0).set_title("FB velocity range by age for relievers");
axes[1].set_ylabel(": ");
sns.regplot(ax=axes[2], data=st_age_mean_velo, x="Age", y="FBv", color="blue").set_title("Starter vs Reliever average FB velocity by age");
sns.regplot(ax=axes[2], data=re_age_mean_velo, x="Age", y="FBv", color="purple");
axes[2].set_ylabel(": ");
axes[2].set_xticks(np.arange(19,49,1));
axes[0].axhline(y=st_data_velo['FBv'].mean(), color="blue", linestyle='dashed');
axes[1].axhline(y=re_data_velo['FBv'].mean(), color="purple", linestyle='dashed');
axes[2].axhline(y=st_data_velo['FBv'].mean(), color="blue", linestyle='dashed');
axes[2].axhline(y=re_data_velo['FBv'].mean(), color="purple", linestyle='dashed');
axes[2].legend(['Starter', 'Reliever', str(round(st_data_velo['FBv'].mean(),1)), str(round(re_data_velo['FBv'].mean(),1))]);

# Figure 2: fWAR by age
fig2, axes2 = plt.subplots(1, 3, sharey=True, figsize=((30,7)), constrained_layout=True)
sns.stripplot(ax=axes2[0], data=st_data, x="Age", y="WAR", size=4, hue="pitcher_num", hue_order=['Ace','1','2','3','4','5','AAA'], palette="coolwarm_r", linewidth=0).set_title("fWAR by age for starters");
axes2[0].set_ylabel("fWAR");
axes2[0].legend(title="Rotation spot");
sns.stripplot(ax=axes2[1], data=re_data, x="Age", y="WAR", size=4, color="purple", linewidth=0).set_title("fWAR by age for relievers");
axes2[1].set_ylabel(": ");
sns.regplot(ax=axes2[2], data=st_age_mean, x="Age", y="WAR", color="blue").set_title("Starter vs Reliever average fWAR by age");
sns.regplot(ax=axes2[2], data=re_age_mean, x="Age", y="WAR", color="purple");
axes2[2].set_ylabel(": ");
axes2[2].set_xticks(np.arange(19,49,1));
axes2[0].axhline(y=st_data['WAR'].mean(), color="blue", linestyle='dashed');
axes2[1].axhline(y=re_data['WAR'].mean(), color="purple", linestyle='dashed');
axes2[2].axhline(y=st_data['WAR'].mean(), color="blue", linestyle='dashed');
axes2[2].axhline(y=re_data['WAR'].mean(), color="purple", linestyle='dashed');
axes2[2].legend(['Starter', 'Reliever', str(round(st_data['WAR'].mean(),1)), str(round(re_data['WAR'].mean(),1))]);

# Figure 3: fWAR per inning pitched by age
fig3, axes3 = plt.subplots(1, 3, sharey=True, figsize=((30,7)), constrained_layout=True)
sns.stripplot(ax=axes3[0], data=st_data, x="Age", y="WAR_over_IP", size=4, hue="pitcher_num", hue_order=['Ace','1','2','3','4','5','AAA'], palette="coolwarm_r", linewidth=0).set_title("fWAR/IP by age for starters");
axes3[0].set_ylabel("fWAR/IP");
axes3[0].legend(title="Rotation spot");
sns.stripplot(ax=axes3[1], data=re_data, x="Age", y="WAR_over_IP", size=4, color="purple", linewidth=0).set_title("fWAR/IP by age for relievers");
axes3[1].set_ylabel(": ");
sns.regplot(ax=axes3[2], data=st_age_mean, x="Age", y="WAR_over_IP", color="blue").set_title("Starter vs Reliever average fWAR/IP by age ");
sns.regplot(ax=axes3[2], data=re_age_mean, x="Age", y="WAR_over_IP", color="purple");
axes3[2].set_ylabel(": ");
axes3[2].set_xticks(np.arange(19,49,1));
axes3[0].axhline(y=st_data['WAR_over_IP'].mean(), color="blue", linestyle='dashed');
axes3[1].axhline(y=re_data['WAR_over_IP'].mean(), color="purple", linestyle='dashed');
axes3[2].axhline(y=st_data['WAR_over_IP'].mean(), color="blue", linestyle='dashed');
axes3[2].axhline(y=re_data['WAR_over_IP'].mean(), color="purple", linestyle='dashed');
axes3[2].legend(['Starter', 'Reliever', str(round(st_data['WAR_over_IP'].mean(),4)), str(round(re_data['WAR_over_IP'].mean(),4))]);

# Figure 4: innings pitched by age
fig4, axes4 = plt.subplots(1, 3, sharey=True, figsize=((30,7)), constrained_layout=True)
sns.stripplot(ax=axes4[0], data=st_data, x="Age", y="IP", size=4, hue="pitcher_num", hue_order=['Ace','1','2','3','4','5','AAA'], palette="coolwarm_r", linewidth=0).set_title("IP by age for starters");
axes4[0].set_ylabel("IP");
axes4[0].legend(title="Rotation spot");
sns.stripplot(ax=axes4[1], data=re_data, x="Age", y="IP", size=4, color="purple", linewidth=0).set_title("IP by age for relievers");
axes4[1].set_ylabel(": ");
sns.regplot(ax=axes4[2], data=st_age_mean, x="Age", y="IP", color="blue").set_title("Starter vs Reliever average IP by age");
sns.regplot(ax=axes4[2], data=re_age_mean, x="Age", y="IP", color="purple");
axes4[2].set_ylabel(": ");
axes4[2].set_xticks(np.arange(19,49,1));
axes4[0].axhline(y=st_data['IP'].mean(), color="blue", linestyle='dashed');
axes4[1].axhline(y=re_data['IP'].mean(), color="purple", linestyle='dashed');
axes4[2].axhline(y=st_data['IP'].mean(), color="blue", linestyle='dashed');
axes4[2].axhline(y=re_data['IP'].mean(), color="purple", linestyle='dashed');
axes4[2].legend(['Starter', 'Reliever', str(round(st_data['IP'].mean(),1)), str(round(re_data['IP'].mean(),1))]);

# Figure 5: starter averages minus reliever averages by age
fig5, axes5 = plt.subplots(1, 3, figsize=((30,7)), constrained_layout=True)
sns.regplot(ax=axes5[0], data=war_diff, x="Age", y="WAR", color="orange").set_title("Average starter WAR minus Average reliever WAR by age");
sns.regplot(ax=axes5[1], data=velo_diff, x="Age", y="FBv", color="orange").set_title("Average starter FB velocity minus Average reliever FB velocity by age");
sns.regplot(ax=axes5[2], data=war_IP_diff, x="Age", y="WAR/IP", color="orange").set_title("Average starter WAR/IP minus Average reliever WAR/IP by age");
axes5[0].set_ylabel("WAR");
axes5[1].set_ylabel("Fastball Velocity");
axes5[2].set_ylabel("WAR/IP");
axes5[0].set_xticks(np.arange(19,45,1));
axes5[1].set_xticks(np.arange(19,45,1));
axes5[2].set_xticks(np.arange(19,45,1));

# Figure 6: rotation spots by age for all starters
fig6, axes6 = plt.subplots(1, 1, figsize=((30,7)), constrained_layout=True)
sns.swarmplot(ax=axes6, data=st_data_war, x="Age", y="WAR", size=3, hue="pitcher_num", hue_order=['Ace','1','2','3','4','5','AAA'], palette="coolwarm_r", linewidth=0).set_title("fWAR by age for starters");
axes6.legend(title="Rotation spot")
axes6.axhline(y=5, color="#dc5e4b", linestyle='dashed');
axes6.axhline(y=4.26, color="#f39879", linestyle='dashed');
axes6.axhline(y=3.51, color="#f4c3ab", linestyle='dashed');
axes6.axhline(y=2.76, color="#dddcdb", linestyle='dashed');
axes6.axhline(y=2.01, color="#b8cff8", linestyle='dashed');
axes6.axhline(y=1.26, color="#8daffd", linestyle='dashed');

plt.show()
Edit: forgot the code!