A little database querying

jerrymckennan
3 min readJun 9, 2021

--

Once I decided to create a database and make graphs from that, I realized that I could do fun stuff outside of data visualization as well. Since I don’t have a ton of drive space (for now), I couldn’t get all the data as I would have liked. But I was able to get some players. This was really for more fun and to refine some of my querying skills.

For most of my queries that I have saved, I thought they were cool fun stats to play with. Like this one:

SELECT
player_name,
YEAR(game_date),
SUM(ball_in_play) as balls_in_play,
SUM(distance)/5280 as distance_miles,
SUM(distance)/SUM(ball_in_play) as avg_distance_ft
FROM player
GROUP BY YEAR(game_date), player_name;

I liked this one. It’s not really something that’s super useful, but it shows the average distance for balls in play as well as distance in miles. It groups it together by year and then again by player. Gives some results like this:

Like I said. It’s probably not super useful, but it is pretty nifty! However, this next set I did find to be useful:

CREATE TEMPORARY TABLE test SELECT
py.player_name,
YEAR(py.game_date) as year,
py.distance,
py.ball_in_play AS bip,
pr.ev,
pr.launch_angle,
pr.hits_1b_per_hip,
py.is_1b,
pr.hits_2b_per_hip,
py.is_2b,
pr.hits_3b_per_hip,
py.is_3b,
pr.hits_hr_per_hip,
py.is_hr
FROM player AS py
JOIN prob AS pr
ON pr.year = YEAR(py.game_date)
WHERE pr.ev = py.ev AND pr.launch_angle = py.launch_angle
GROUP BY YEAR(py.game_date), py.player_name, py.game_date, py.ab_num, event
ORDER BY py.game_date;
SELECT
player_name,
year,
SUM(distance)/COUNT(distance) as avg_distance,
SUM(ev)/COUNT(ev) as avg_ev,
SUM(launch_angle)/COUNT(launch_angle) as avg_la,
SUM(is_1b) as singles,
ROUND(SUM(hits_1b_per_hip)/100,0) as exp_1b,
(1-(SUM(is_1b)/(SUM(hits_1b_per_hip)/100)))*100*(-1) as ‘1b_%_diff’,
SUM(is_2b) as doubles,
ROUND(SUM(hits_2b_per_hip)/100,0) as exp_2b,
(1-(SUM(is_2b)/(SUM(hits_2b_per_hip)/100)))*100*(-1) as ‘2b_%_diff’,
SUM(is_3b) as triples,
ROUND(SUM(hits_3b_per_hip)/100,0) as exp_3b,
(1-(SUM(is_3b)/(SUM(hits_3b_per_hip)/100)))*100*(-1) as ‘3b_%_diff’,
SUM(is_hr) as home_runs,
ROUND(SUM(hits_hr_per_hip)/100,0) as exp_hr,
(1-(SUM(is_hr)/(SUM(hits_hr_per_hip)/100)))*100*(-1) as ‘hr_%_diff’,
SUM(is_1b)+SUM(is_2b)+SUM(is_3b)+SUM(is_hr) as total_hits,
ROUND((SUM(hits_1b_per_hip)/100)+(SUM(hits_2b_per_hip)/100)+(SUM(hits_3b_per_hip)/100)+(SUM(hits_hr_per_hip)/100),0) as exp_hits,
(1-((SUM(is_1b)+SUM(is_2b)+SUM(is_3b)+SUM(is_hr))/((SUM(hits_1b_per_hip)/100)+(SUM(hits_2b_per_hip)/100)+(SUM(hits_3b_per_hip)/100)+(SUM(hits_hr_per_hip)/100))))*100*(-1) as ‘hits_%_diff’,
SUM(bip)
FROM test
WHERE ev > 0 AND player_name LIKE '%Cabrera%'
GROUP BY player_name, year;

Now for some background before I display the data. Again using baseball savant as my reference, you should be able to gather expected stats based on the exit velocity and launch angle of every ball that goes in play. Like I showed with my previous post and the wOBA involved, each one also have a batting average and chance of a single, double, triple, and home run. I wondered how it would look if you added up each percent of possibility for each ball in play for the season and then compared it to what actually happened. These were the results:

Very interesting! So based on this Miguel Cabrera has hit more single than he should and fewer extra base hits than he should except for in 2018. Which fits the narrative for him. He’s an older, slower player who plays in a ballpark not known for being a home run hitters fantasy. It also fits by the fact that he should be in a severe decline, but he is still clobbering the ball. Using these stats, you should be able to calculate most of the rest of the expected stats (usually listed as the stat with an “x” before it e.g. xBA for expected batting average).

What’s also interesting about that is you can get the necessary expected stats and could get a new, updated xfWAR. Making the assumptions that the defensive stats wouldn’t change and also stolen bases wouldn’t change, while also knowing this would not have affected walks/intentional walks/hit by pitch, you can update wOBA and singles for FanGraphs WAR calculation.

I have many other queries that I have saved as well, but these were some of queries that I found fun and interesting that I wanted to share.

--

--

jerrymckennan
jerrymckennan

Written by jerrymckennan

Learning and writing about the journey

No responses yet