Pole-Vault and Big Data

When it comes to data, a lot of sports are ahead of track and field. When you hear "statistics and data," what sport do you think of? Baseball, football, and basketball come to my mind, and I think that holds true for most people. So why is it that track and field, a sport that relies so heavily on numbers, gets so little coverage or data analysis?

I can name a lot of reasons, but one rules over all the others: money. Why pour extra money into a resource that doesn't generate much revenue in the first place? I've always had a couple of questions, or at least been curious about what trends there might be in the pole-vault world. Even cooler: what if we could answer them using data? (You know, that thing all decisions are based on nowadays…)

So I did that. I grabbed a dataset and was able to find some pretty cool things.

But first, a shout-out

Before I get into the data analysis and pole-vault stuff, I have to give credit to those who helped get the data in the first place. Without them, I wouldn't have been able to do any of this.

Evan Keating – Big shout-out to Evan, one of my co-workers, who is responsible for getting me the data to start. He's helped with everything from SQL, Python, and web scraping to crypto 😀 This guy is a pro.

Gathering the data

To understand what is possible with the dashboards we created, you have to understand the data we have to work with. Here's the problem: there isn't an easily accessible data source for this information, AND different levels of the sport are managed by different leagues, organizations, and other entities. This made the start of this process particularly difficult.

There is one source that rules over all others, and in this case that's for the better: the NCAA. At the collegiate level, all marks are recorded and presented in a uniform format at https://www.tfrrs.org/. This became the source for our dataset.

A taste of the vault

To be clearer about what we were able to collect, here's the breakdown:

  1. The Top 500 marks of both men and women for
    • Indoor seasons from 2010-2020
    • Outdoor seasons from 2010-2020
    • Division I, II, & III

OK, so what does this mean? If your best mark in one of the listed seasons was high enough to reach the Top 500 leaderboard, it was recorded. Because only each athlete's best mark per season makes the list, the data does not track every athlete's progression year over year.
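To make the "best mark per season" rule concrete, here is a minimal sketch of how a Top N leaderboard could be built from raw marks. It assumes each mark is a hypothetical `(athlete, season, mark in meters)` tuple; the athlete names, the `leaderboard` helper, and the numbers are made up for illustration and are not how TFRRS actually builds its lists.

```python
# Hypothetical raw marks: (athlete, season, mark in meters). Not real TFRRS data.
marks = [
    ("Athlete A", "2019 Outdoor", 5.20),
    ("Athlete A", "2019 Outdoor", 5.35),
    ("Athlete B", "2019 Outdoor", 5.10),
    ("Athlete C", "2019 Outdoor", 4.95),
    ("Athlete B", "2019 Outdoor", 5.25),
]


def leaderboard(marks, season, top_n=500):
    """Keep each athlete's best mark for one season, then return the top_n, highest first."""
    best = {}
    for athlete, s, mark in marks:
        if s != season:
            continue
        # Only an athlete's single best jump counts toward the leaderboard.
        if mark > best.get(athlete, float("-inf")):
            best[athlete] = mark
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


board = leaderboard(marks, "2019 Outdoor", top_n=2)
print(board)  # [('Athlete A', 5.35), ('Athlete B', 5.25)]
```

Because every athlete collapses to one row per season, earlier and weaker jumps from the same season never appear, which is why the dataset can't show a full progression.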

Additionally, for 2010-2012 the athlete names attached to recorded marks show up as NULL. This inconsistency skewed some of our initial results, and we are still working through those challenges. Looking at the raw data, these records come from a very early set of information (2010-2012 or so), most likely the early days of entering and recording data. Unfortunately, there isn't much that can be done about that.
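One way to keep an eye on the NULL problem is to count the affected rows per year before deciding what to drop. The sketch below assumes hypothetical `(year, athlete, mark in meters)` rows and treats both a real `None` and the literal string `"NULL"` as missing; the exact form the NULLs take in the export is an assumption.

```python
from collections import Counter

# Hypothetical rows: (year, athlete, mark in meters). Missing athletes may show up
# as None or as the literal string "NULL" depending on how the export was loaded.
rows = [
    (2010, "NULL", 4.80),
    (2011, None, 4.70),
    (2011, "Athlete D", 4.90),
    (2013, "Athlete E", 5.00),
]


def is_missing(athlete):
    """True when the athlete field is absent or the placeholder string NULL."""
    return athlete is None or athlete.strip().upper() == "NULL"


# How many marks per year have no athlete attached?
missing_by_year = Counter(year for year, athlete, _ in rows if is_missing(athlete))

# Rows that can safely feed athlete-level views such as a jump search.
clean_rows = [row for row in rows if not is_missing(row[1])]

print(dict(missing_by_year))  # {2010: 1, 2011: 1}
print(len(clean_rows))        # 2
```

Dropping these rows only makes sense for athlete-level views; for height-only statistics the mark itself may still be usable even without a name.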


About our dataset

The dataset used for each of the visualizations you see here and on my Tableau dashboard comes from https://www.tfrrs.org/. I have to give credit to my roommate and co-worker, Evan Keating, who gathered the data from the website's top 500 leaderboards for 2010-2020. We collected the top 500 marks in each division, along with the top mark for each vaulter.

Each record also tells us at which meet the jump occurred and on what date. This gives us some valuable insight into the questions we are trying to answer.
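As an example of what the meet and date fields make possible, here is a small sketch that counts which meets produced the most leaderboard marks and in which month those marks happened. The record layout `(athlete, meet, ISO date, mark in meters)` and the meet names are hypothetical; the real export may format dates differently.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (athlete, meet, ISO date, mark in meters).
records = [
    ("Athlete A", "Meet X", "2019-05-11", 5.35),
    ("Athlete B", "Meet X", "2019-05-11", 5.25),
    ("Athlete C", "Meet Y", "2019-04-20", 4.95),
]

# Which meets produced the most leaderboard marks?
marks_per_meet = Counter(meet for _, meet, _, _ in records)

# In which month of the season did the leaderboard marks happen?
marks_per_month = Counter(date.fromisoformat(d).month for _, _, d, _ in records)

print(marks_per_meet.most_common(1))  # [('Meet X', 2)]
print(dict(marks_per_month))          # {5: 2, 4: 1}
```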

What do we ask?

Influence

Aside from those who made this possible, I also want to acknowledge the coaches, mentors, friends, and others I reached out to at the start of this endeavor. I asked people across the pole-vault community similar questions to see what needed to be answered and what would be helpful for others to look at.

This dataset can tell us a lot, from average heights to best marks at meets. However, it's the inferences we draw from it that tell us a little more about the sport and start some great discussions.

Questions each dashboard is trying to answer:

  1. What is the average height for each of the three divisions? (Top 500)
  2. What is the average height across each division for each gender?
  3. What is the average of the top 10 heights for each year, by gender?
  4. Have the heights gotten any higher or lower over time?
  5. Which teams have the highest averages for heights cleared in the top 500?
  6. Where are the highest averages from across the US?
  7. Where are the lowest averages?
  8. Is there any correlation between school, conference, or other factors and vault heights?
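Questions 1 through 4 mostly come down to grouping marks and averaging them. Here is a minimal sketch of that idea outside of Tableau, assuming hypothetical `(year, division, gender, mark in meters)` rows; the helper names and the numbers are illustrative only, not values from our dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical leaderboard rows: (year, division, gender, mark in meters).
rows = [
    (2018, "I", "M", 5.40), (2018, "I", "M", 5.20),
    (2018, "III", "M", 4.80), (2018, "I", "W", 4.30),
    (2019, "I", "M", 5.50), (2019, "I", "W", 4.40),
]


def average_by(rows, key):
    """Group rows by key(row) and return the mean mark for each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[3])
    return {k: round(mean(v), 3) for k, v in groups.items()}


def top_n_average_by_year(rows, gender, n=10):
    """Mean of the n highest marks in each year for one gender."""
    by_year = defaultdict(list)
    for year, _, g, mark in rows:
        if g == gender:
            by_year[year].append(mark)
    return {y: round(mean(sorted(m, reverse=True)[:n]), 3)
            for y, m in sorted(by_year.items())}


print(average_by(rows, key=lambda r: r[1]))          # question 1: by division
print(average_by(rows, key=lambda r: (r[1], r[2])))  # question 2: by division and gender
print(top_n_average_by_year(rows, "M", n=2))         # question 3; compare years for question 4
```

Comparing the per-year values from `top_n_average_by_year` side by side is a rough first answer to question 4.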

Tableau Dashboards

Check out the Tableau dashboard here! It's completely free, interactive, and ready to share with your friends!

  1. Jump Search – Search for your favorite athlete or friend and quickly compare their best jumps year over year.
  2. Heights Average – The top portion of this dashboard shows the average jump height by division. Unsurprisingly, the highest averages all belong to Division I athletes.
  3. Divisions Distance of 500 – This worksheet is mainly for fun and shows the total running height jumped each year, broken down by division and then gender.
  4. Average Heights Across Divisions – This final dashboard shows the average height for each gender and division. Use the filters on the right to adjust your results.
    The other part of this dashboard uses Tableau's forecasting feature to preview what the average height would be for 2020 and 2021 based on the data we loaded. Largely because of the coronavirus, the 2020 value is purely theoretical as of this data export.
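For anyone curious how a forecast like this works in principle, here is a deliberately simple sketch: an ordinary least-squares trend line fitted to yearly averages and extrapolated forward. This is a swapped-in technique for illustration, not what Tableau does (Tableau's built-in forecasting uses exponential smoothing models), and the yearly averages below are hypothetical, not values from the dashboard.

```python
def linear_forecast(years, values, future_years):
    """Fit value = slope * year + intercept by ordinary least squares and extrapolate."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {year: round(slope * year + intercept, 3) for year in future_years}


# Hypothetical yearly average heights in meters, NOT values from the dashboard.
years = [2016, 2017, 2018, 2019]
averages = [4.50, 4.52, 4.54, 4.56]
print(linear_forecast(years, averages, [2020, 2021]))  # {2020: 4.58, 2021: 4.6}
```

A straight line is easy to reason about but ignores seasonality and sudden breaks, which is one reason a missing 2020 outdoor season makes any forecast for that year theoretical.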

What happened in 2018 in Division III pole-vault?

Cool Stat

Now if you've made it this far, you're probably a vaulter, and you're waiting for something cool. Well, after looking at the jump data, the combined height cleared by all the pole-vaulters with a Top 500 mark since 2010 comes to 36.6 miles. That's the equivalent of a round trip from 242nd Street to South Ferry station in New York City.
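The arithmetic behind a stat like this (and the "Divisions Distance of 500" worksheet) is just a sum of heights followed by a unit conversion. Here is a small sketch using the exact definition of the international mile (1609.344 m); the list of 10,000 marks at 5.0 m is a made-up input, not our data.

```python
METERS_PER_MILE = 1609.344  # exact, by definition of the international mile


def total_miles(marks_m):
    """Sum vault heights given in meters and convert the total to miles."""
    return sum(marks_m) / METERS_PER_MILE


# Hypothetical input: 10,000 marks of 5.0 m each.
print(round(total_miles([5.0] * 10_000), 2))  # 31.07

# Working backwards, 36.6 miles is roughly this many meters of bar height.
print(round(36.6 * METERS_PER_MILE))  # 58902
```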


What’s next?

I will continue to update this project and the Tableau visualization over the next couple of weeks, so stay tuned. My second endeavor, a full-fledged pole-vault notebook app, has been in the works. It will be completely free, and I hope it comes out by the start of the indoor season.


Let’s build something together.

Check out what you can find with #data in the #polevault! #tableau
