My First Graph

jerrymckennan
5 min readJun 7, 2021

The journey I’ve taken for create my graphs has been an incredibly satisfying one. I can honestly say I’ve seen my skills grow and, really, what’s more anyone could ask for?

My first graph that I made was a histplot using matplotlib, which I then also converted over using seaborn. I’m not sure why, but heatmaps and the like are most eye appealing to me. I think it has to do with the different colors showing the how the different values fall in place, the frequency of the particular values. It felt like it took me such a long time to get a successful graph. In actuality, I think it went pretty quickly given my exact 0 time spent on doing it before hand.

The graph itself was based on the Hit Probability chart available at baseball savant (https://baseballsavant.mlb.com/statcast_hit_probability) and would display a graph that was based on either hits or wOBA, which you would need to specify at the beginning of the script. I knew where to find the data I wanted, I wasn’t sure exactly how I wanted to go about it. Ultimately I decided to expand my project to include a little bit of database work and web scraping, both of which are tools that I could certainly use elsewhere as well. Given the data I was looking for was already organized, creating the table and importing the data was pretty basic once I got the scraping portion down.

After an unknown amount of time, the data was saved in a .CSV file. Score! I could have left just the CSV file and not created the database, which I do some times for other amounts of data. But as I mentioned before, these are skills I can use elsewhere as well so why not work on them as well? I went ahead and used MySQL for my practicing purposes. I was somewhat familiar with it and it was easy to set up.

Next step was getting the database and table created, then my data imported. Now I needed my query. Just about as much spend on figuring out how to get my data was spent on coming up with my database queries. Eventually I realized I needed to go through and get all of the data separately. From there, I was able to get the my “hue” for my graphs from a regular query. Now came the funnest part: plotting! And, well, check it out!

Not bad, right? At least for a first timer. This really took a lot of time playing with the different bins and figuring out what would look best. And that might have been the coolest part. I could see all of the data and in different forms. But, ultimately, this is what I liked the best. For the record — this chart is by wOBA, not hits.

And as much as I liked it, it needed something else for me to be satisfied with it. And that’s when I ended up with this:

I decided to put over the histplot a scatterplot for a specific player and their ball in play. I thought this would show some sort of luck, or lack thereof. Color scheme could have changed some… but honestly, I’m happy with how it turned out. The legend helps with knowing what type of hit had been made, what the expected wOBA would have been for each hit.

I think that with the end of each one of my posts, I shall include my code. That way if someone wants to try to replicate it or if there is some criticism with the code:

import pandas as pdimport pymysql.cursorsimport getpassimport seaborn as snsimport numpy as npimport matplotlib.pyplot as pltimport os.pathimport mysql_checker# A few variables needed to continue onconnection = mysql_checker.mysql_Server_check()name = input("Specify the last name of the player: ")year = input("Select a year: ")# This is the section where the queries are ran to get the needed data. It runs the queries and then keeps the data in a Pandas DataFrametry:with connection.cursor() as cursor:sql_mode = "SET SESSION sql_mode=''"cursor.execute(sql_mode)ev_table = "CREATE TEMPORARY TABLE ev SELECT DISTINCT ev FROM prob WHERE year = "+year+";"cursor.execute(ev_table)la_table = "CREATE TEMPORARY TABLE launch_angle SELECT DISTINCT launch_angle FROM prob WHERE year = "+year+";"cursor.execute(la_table)data_query = "SELECT ev, launch_angle, n_hip, n_hits, woba FROM prob WHERE year = "+year+" AND ev > 0 ORDER BY year, ev, launch_angle;"ev_query = "SELECT DISTINCT COUNT(ev) FROM ev;"la_query = "SELECT DISTINCT COUNT(launch_angle) FROM launch_angle;"data = pd.read_sql(data_query, connection)ev_data = pd.read_sql(ev_query, connection)la_data = pd.read_sql(la_query, connection)player_query = "SELECT ev, launch_angle, hit, event FROM player WHERE at_bat > 0 AND player_name LIKE '%"+name+"%' AND YEAR(game_date) = "+year+" AND ball_in_play > 0 AND ev > 0;"player_data = pd.read_sql(player_query, connection)player_data.loc[player_data['hit'] == 0, 'hit'] = 'out'player_data.loc[player_data['hit'] == 1, 'hit'] = 'hit'player_data.loc[player_data['event'] == "field_out", 'event'] = "out"player_data.loc[player_data['event'] == "force_out", 'event'] = "out"player_data.loc[player_data['event'] == "field_error", 'event'] = "error"player_data.loc[player_data['event'] == "sac_fly", 'event'] = "out"player_data.loc[player_data['event'] == "fielders_choice", 'event'] = "out"player_data.loc[player_data['event'] == "grounded_into_double_play", 'event'] = "out"player_data.loc[player_data['event'] == "double_play", 'event'] = "out"player_data.loc[player_data['event'] == "sac_bunt", 'event'] = "out"player_data.loc[player_data['event'] == "fielders_choice_out", 'event'] = "out"player_data.loc[player_data['event'] == "sac_fly_double_play", 'event'] = "out"player_data.loc[player_data['event'] == "home_run", 'event'] = "home run"# The next 6 is to also increase the size of the dot based on type of hit. It's not used right now, but it is set up to play with.player_data.loc[player_data['event'] == "single", 'size'] = (1*.25)player_data.loc[player_data['event'] == "double", 'size'] = (2*.25)player_data.loc[player_data['event'] == "triple", 'size'] = (3*.25)player_data.loc[player_data['event'] == "home run", 'size'] = (4*.25)player_data.loc[player_data['event'] == "out", 'size'] = (1*.25)player_data.loc[player_data['event'] == "error", 'size'] = (1*.25)finally:connection.close()# Using this to make sure the correct number of bins are displayed each timeev_int = int(ev_data.iloc[0])la_int = int(la_data.iloc[0])# This creates a proper legend that includes colors for out, error, and each type of hitsns.histplot(data, x="ev", y="launch_angle", hue="woba", bins=(ev_int,la_int), weights="woba", palette="Reds", legend='brief')sns.scatterplot(data=player_data, x="ev", y="launch_angle", hue="event", hue_order=['out', 'error', 'single', 'double', 'triple', 'home run'], palette="flare", legend='full')plt.legend(bbox_to_anchor=(1.01, 1), borderaxespad=0, loc='upper left')plt.xlabel('EV')plt.ylabel('Launch Angle')plt.title("EV and LA histogram with "+name+" ball in play overlay for "+year)plt.show()

Thank you for reading! Can’t wait to show what I have next!

--

--