What do I know?: contracts

jerrymckennan
6 min read · Oct 29, 2021


My previous five posts have been a blast to make. Getting to practice my newfound skills with Python and data visualization has been satisfying and worthwhile. For this next and final part, I want to better explain my thought process.

First, I’ll start with my final thoughts on each player’s “contract”:

Player         Years   Salary ($M)   AAV ($M)
Marcus Semien      6         121.2       20.2
Corey Seager      10         352.8       35.3
Carlos Correa      9         370.1       41.1
Javier Baez        7         111.0       15.9
Trevor Story       8         138.0       17.3
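AAV here is simply total salary divided by years. Below is a minimal Python sketch of that arithmetic, using the figures from the table (in millions).

It uses half-up rounding from the decimal module. That choice matters because Python's built-in round() rounds exact halves to even, so it would turn Story's 17.25 into 17.2 instead of 17.3.

```python
from decimal import Decimal, ROUND_HALF_UP

# Total salary ($M, as strings to keep them exact) and years, copied from the table above
contracts = {
    "Marcus Semien": ("121.2", 6),
    "Corey Seager": ("352.8", 10),
    "Carlos Correa": ("370.1", 9),
    "Javier Baez": ("111.0", 7),
    "Trevor Story": ("138.0", 8),
}


def aav(total_millions: str, years: int) -> Decimal:
    """Average annual value: total salary / years, rounded half-up to one decimal."""
    per_year = Decimal(total_millions) / years
    return per_year.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


for player, (total, years) in contracts.items():
    print(f"{player:<15} {aav(total, years)}")
```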

I should clarify something from my previous posts: these aren’t the contracts I expect each player to be offered. Rather, they come from historical comparisons. I used those to estimate how many more years a comparable player typically plays and what those years are worth. Using Semien as an example: he might have six years left, worth ~15 fWAR. But a team might very well think most of that value comes in the first three years, and offer him a three-year deal.

What I’m now curious about is how these estimates compare to WAR values projected with an aging curve. To build one, I first gathered every qualified MLB season. That should let me see at what ages a player tends to experience a change in WAR, and roughly how much. In the table below, WAR is the average for each age and diff is the change from the previous age.
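An aging curve like this comes down to two steps: average WAR at each age, then take the year-over-year difference. Here is a minimal pandas sketch on a tiny made-up dataset; the players and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical qualified seasons: one row per player-season
seasons = pd.DataFrame({
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Age": [27, 28, 29, 27, 28, 29],
    "WAR": [3.0, 3.4, 2.8, 2.0, 2.2, 1.6],
})


def aging_curve(df: pd.DataFrame) -> pd.DataFrame:
    """Mean WAR at each age, plus the change from the previous age ("diff")."""
    curve = df.groupby("Age", as_index=False)["WAR"].mean()
    curve["WAR"] = curve["WAR"].round(1)
    # First age has no previous value, so its diff is filled with 0
    curve["diff"] = curve["WAR"].diff().fillna(0).round(2)
    return curve


print(aging_curve(seasons).to_string(index=False))
```

Rounding WAR before differencing keeps the diff column consistent with the rounded WAR values shown in the tables.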

Age    WAR   diff
15     0.1    0.0
16    -0.4   -0.5
17    -0.2    0.2
18     0.7    0.9
19     0.8    0.1
20     2.2    1.4
21     2.4    0.2
22     2.6    0.2
23     2.7    0.1
24     2.8    0.1
25     2.8    0.0
26     2.9    0.1
27     2.9    0.0
28     2.9    0.0
29     2.9    0.0
30     2.7   -0.2
31     2.8    0.1
32     2.8    0.0
33     2.7   -0.1
34     2.6   -0.1
35     2.7    0.1
36     2.6   -0.1
37     2.5   -0.1
38     2.3   -0.2
39     2.1   -0.2
40     2.0   -0.1
41     1.1   -0.9
42     2.0    0.9
43     2.0    0.0
44     2.1    0.1
45     1.0   -1.1
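Because the diffs telescope, adding them up over a span gives the net change in average WAR. Here is a quick check from age 29 to 36, using values copied from the table above.

```python
# Average WAR by age for all qualified seasons, copied from the table above
war_by_age = {29: 2.9, 30: 2.7, 31: 2.8, 32: 2.8, 33: 2.7, 34: 2.6, 35: 2.7, 36: 2.6}

# Year-over-year change, like the "diff" column
diffs = {age: round(war_by_age[age] - war_by_age[age - 1], 1) for age in range(30, 37)}

# The diffs telescope: their sum equals WAR(36) - WAR(29)
total_change = round(sum(diffs.values()), 1)
pct_change = 100 * total_change / war_by_age[29]
print(f"Change from 29 to 36: {total_change} WAR ({pct_change:.0f}%)")
```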

According to this, a player shouldn’t experience much of a drop over their career, which really doesn’t make sense. So I also looked at players who only played SS:

Age    WAR   diff
16    -0.5    0.0
18     0.7    1.2
19    -0.8   -1.5
20     3.4    4.2
21     2.0   -1.4
22     2.4    0.4
23     2.3   -0.1
24     2.4    0.1
25     2.6    0.2
26     2.6    0.0
27     2.4   -0.2
28     2.6    0.2
29     2.8    0.2
30     2.3   -0.5
31     2.3    0.0
32     2.8    0.5
33     2.5   -0.3
34     2.4   -0.1
35     2.7    0.3
36     2.6   -0.1
37     2.1   -0.5
38     2.5    0.4
39     2.9    0.4
40     2.5   -0.4
41     4.6    2.1
42     5.2    0.6

Same sort of results, which doesn’t help much at all. I can’t really say that any of these players will be the same at age 36 as they were at 26; there is always some regression. So I wanted to view this data a little differently. I wondered how much pooling every era together was skewing the results. To find out, I split the data into decades and plotted them in a seaborn relplot, one panel per decade. First up, all players:

It seemed like I might be onto something. The later decades showed a much steeper drop in average WAR with age than the earlier ones. And then SS:

This one still didn’t show much of a drop overall. The obvious answer, which I should have known all along: SS is a premium position, and teams aren’t going to let just anyone play there. A shortstop has to perform, so there shouldn’t really be much of a drop. Don’t get me wrong, you should still expect younger players to outperform older ones! But if a player isn’t good at SS, he’ll find himself at a new position.
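The decade split behind these plots only needs integer floor division on the season year. Here is a minimal pandas sketch on made-up rows; the seasons and WAR values are hypothetical. The per-decade mean WAR at each age is what each line panel of the relplot shows, since seaborn's line plots average y at each x by default.

```python
import pandas as pd

# Hypothetical player-seasons
df = pd.DataFrame({
    "Season": [1975, 1978, 1985, 1988, 2005, 2008],
    "Age": [26, 34, 26, 34, 26, 34],
    "WAR": [3.0, 2.8, 3.1, 2.5, 3.2, 2.2],
})

# Floor each season to its decade, e.g. 1978 -> 1970 and 2005 -> 2000
df["decade"] = df["Season"] // 10 * 10

# Mean WAR at each age within each decade
per_decade = df.groupby(["decade", "Age"], as_index=False)["WAR"].mean()
print(per_decade.to_string(index=False))
```

Passing a frame like this to sns.relplot(data=df, x="Age", y="WAR", col="decade", kind="line") draws one panel per decade, as in the script at the end of the post.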

Looking at the first relplot, I noticed that the decline with age starts to show up in the 1980s. My next step was to pull all data from the 1980s to the present and repeat the steps above. The results for all players:

Age    WAR   diff
19     3.4    0.0
20     4.4    1.0
21     2.7   -1.7
22     2.8    0.1
23     2.9    0.1
24     3.0    0.1
25     3.0    0.0
26     3.0    0.0
27     3.0    0.0
28     3.0    0.0
29     3.0    0.0
30     2.8   -0.2
31     2.9    0.1
32     2.7   -0.2
33     2.5   -0.2
34     2.5    0.0
35     2.4   -0.1
36     2.4    0.0
37     2.4    0.0
38     1.9   -0.5
39     1.7   -0.2
40     1.4   -0.3
41     0.0   -1.4
42     1.5    1.5

And SS only:

Age    WAR   diff
19     0.1    0.0
20     3.2    3.1
21     2.0   -1.2
22     2.8    0.8
23     2.3   -0.5
24     2.3    0.0
25     2.6    0.3
26     2.8    0.2
27     2.3   -0.5
28     2.7    0.4
29     2.3   -0.4
30     2.1   -0.2
31     2.3    0.2
32     2.9    0.6
33     2.1   -0.8
34     2.0   -0.1
35     2.5    0.5
36     1.5   -1.0
37     2.0    0.5
38     2.0    0.0
39     2.5    0.5
40     1.1   -1.4

This makes a lot more sense to me, and now I want to apply this trend to the players and see what happens. I took each player’s 2021 fWAR total and lined it up with his age that season. Then I applied the year-over-year growth or loss from the most recent all-players values above (1980 onward) for each season of the contract. Here’s what that returned:

Player         Years   Salary ($M)   AAV ($M)
Marcus Semien      6         305.6       50.9
Corey Seager      10         268.8       26.9
Carlos Correa      9         400.0       44.4
Javier Baez        7         184.0       26.3
Trevor Story       8         201.6       25.2

Quite a bit different, with some big changes, mainly for Semien (up about 150%) and Seager (down about 24%). The big reason for Seager’s drop is, as you might have guessed, that injuries kept him from playing a full season. So, just for fun, I prorated each player’s 2021 fWAR to a 150-game pace to see what the results would be then:

Player         Years   Salary ($M)   AAV ($M)
Marcus Semien      6         282.1       47.0
Corey Seager      10         440.2       44.0
Carlos Correa      9         405.6       45.1
Javier Baez        7         201.5       28.8
Trevor Story       8         214.2       26.8

If Corey Seager can stay healthy…
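The projection step can be sketched in Python under a few stated assumptions:

- The first contract year is the player's next age season.
- Each year's WAR is the previous year's plus the post-1980 all-players diff for that age (ages 22 to 40, from the table earlier in the post).
- Dollars scale linearly with WAR at a flat price. The $8M-per-WAR rate and the example player are hypothetical, not values from the post.

The prorate helper scales a season's WAR to a 150-game pace.

```python
# Year-over-year WAR change by age: post-1980, all players (ages 22-40 from the table earlier)
DIFF_BY_AGE = {
    22: 0.1, 23: 0.1, 24: 0.1, 25: 0.0, 26: 0.0, 27: 0.0, 28: 0.0,
    29: 0.0, 30: -0.2, 31: 0.1, 32: -0.2, 33: -0.2, 34: 0.0, 35: -0.1,
    36: 0.0, 37: 0.0, 38: -0.5, 39: -0.2, 40: -0.3,
}

DOLLARS_PER_WAR = 8.0  # hypothetical price in $M per WAR; not stated in the post


def prorate(war: float, games: int, target_games: int = 150) -> float:
    """Scale a season's WAR to a target_games pace."""
    return war * target_games / games


def project_contract(start_war: float, start_age: int, years: int,
                     dollars_per_war: float = DOLLARS_PER_WAR):
    """Return (total WAR, total $M, AAV $M) for a deal starting at age start_age + 1.

    Raises KeyError if the contract runs past the ages in DIFF_BY_AGE.
    """
    war = start_war
    total_war = 0.0
    for age in range(start_age + 1, start_age + years + 1):
        war += DIFF_BY_AGE[age]  # apply the aging-curve change for this age
        total_war += war
    total = total_war * dollars_per_war
    return total_war, total, total / years


# Hypothetical player: 3.0 fWAR in 120 games in his age-28 season, five-year deal
for label, start in [("as played", 3.0), ("150-game pace", prorate(3.0, 120))]:
    total_war, total, aav = project_contract(start, start_age=28, years=5)
    print(f"{label}: {total_war:.1f} WAR, ${total:.1f}M total, ${aav:.1f}M AAV")
```

Keeping the diffs in a dictionary means a contract that runs past the last age in the table fails loudly instead of silently assuming no further decline.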

I think this will be the end of my little series, at least for now. I can’t wait to see what contracts these players actually end up with and which of these models comes closest.

Thanks again for reading! And here’s my code!

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def load_seasons(path):
    # Read qualified seasons, drop 2020 and 2021, and tag each row with its decade
    df = pd.read_csv(path)
    df = df[~df['Season'].isin([2020, 2021])].copy()
    df['decade'] = (df['Season'] // 10 * 10).astype(int)
    df = df.drop(columns=['Season', 'playerid'])
    df = df.dropna()
    return df.sort_values(['Name', 'Age'])


def aging_curve(df):
    # Mean WAR at each age, plus the change from the previous age ("diff");
    # numeric_only=True skips the Name column, which pandas 2.x refuses to average
    curve = df.groupby('Age').mean(numeric_only=True).reset_index()
    curve['WAR'] = curve['WAR'].round(1)
    curve['diff'] = curve['WAR'].diff().fillna(0).round(2)
    return curve[['Age', 'WAR', 'diff']]


data = load_seasons('/dir/file.csv')     # all qualified seasons
data_SS = load_seasons('/dir/file.csv')  # presumably a separate shortstop-only export
data_1980 = data[data['decade'] >= 1980]
data_1980_SS = data_SS[data_SS['decade'] >= 1980]

data2 = aging_curve(data)
data2_SS = aging_curve(data_SS)
data3 = aging_curve(data_1980)
data3_SS = aging_curve(data_1980_SS)
print(data3.to_string(index=False))
print(data3_SS.to_string(index=False))

# One panel per decade; both placeholder paths are identical, so the second
# savefig overwrites the first unless each plot gets its own file name
sns.relplot(data=data, x="Age", y="WAR", col="decade", hue="decade", kind="line", legend=False, col_wrap=3)
plt.savefig('/dir/file.jpg')
sns.relplot(data=data_SS, x="Age", y="WAR", col="decade", hue="decade", kind="line", legend=False, col_wrap=3)
plt.savefig('/dir/file.jpg')
plt.show()
