Metrics, Measure, and Sports Science: Moving the Needle

To measure, or not to measure, that is NOT the question.  Measurement is great.  It enables objective feedback, provides transparency, and articulates results. This article is not meant to bash metrics or technology, as both are important parts of our everyday programming and decision making.  However, one needs to think critically about what to measure, the quality of the measure, and the unintended consequences of the measure. 

Whenever the need for technology and “objective” measurement comes up, I typically hear quotes such as:

“You can’t manage, what you can’t measure.” – Peter Drucker

“Without data you’re just another person with an opinion.” – W. Edwards Deming

“The unexamined life is not worth living.” – Socrates

All great quotes.  My response to these three gentlemen:

Mr. Drucker

What gets measured in performance doesn’t always get managed. In fact, many times the measurement:

  • Doesn’t matter

  • Is noise

  • Is collected and never applied

  • Is not related to the task at hand

  • Is not contextualized

  • Is “show” without the “tell”

Mr. Deming

Opinion does matter!  Specifically, expert opinion. Data is just a number hamburger.  The patty, or meat, is the number, and the buns surrounding the patty represent subjective opinion:  the assumptions about what should be measured, and the analysis and narrative built around the number.  Data also represents what happened (past tense), not what happens, or what may happen in the future.  Treating the past as a forecast is akin to the gambler’s fallacy. As Gerd Gigerenzer states: “In an uncertain world, a complex strategy can fail exactly because it explains too much in hindsight.”  Heuristics in a complex world such as sports science may be more important than metrics.

Socrates:

The unexamined life is not worth living, but neither may be the overexamined life.  Sleep tracking, HRV, calories counted, muscle oxygen saturation, body comp, TRIMP scores, jump measures, blood draws, strength tests, and the list goes on.  At what point is the line of demarcation between healthy measure and paranoia?  Picking the pepper out of fly shit may lead to unintended consequences.  Human beings are not robots.

 

Below are five questions critical thinkers should work through prior to the use of metrics:

1.) Animate vs. Inanimate:  As Karl Popper states, “clouds or clocks.”  One is complex (no two are the same); the other is complicated. Animate beings can respond to the measures being conducted.  No one lives in a test tube.  There are thousands of confounding variables at play in the landscape of high performance.

Advice:  Standardize procedures and use technology with excellent reliability (budget permitting).



2.) Utility of Significance:  Ease vs. Significance?  I strongly recommend creating a theoretical framework (2) for each sport regarding the physical qualities needed to play at a high level. As Jerry Muller states: “What gets measured is what’s most easily measured, & since the outcome is more difficult to measure than the inputs, it’s the inputs that get attention…the tendency here (w/so many performance metrics) is to glean the low hanging fruit & then expect a bountiful harvest.”

Advice:  Start from the scoreboard and work backwards.  Many of these attributes are difficult to measure.



3.) How Important are the measures? Apart from outliers, does the marginal cost of continuing to measure outweigh the benefits?

Advice:  Choose your measures carefully.  Take a rifle-like approach rather than a shotgun approach.



4.) What are the costs?  Every specialist has constraints.  What are your time constraints?  For every hour you spend creating data, you will spend an hour away from what’s being measured.

Advice: Prioritize based on your current constraints.



5.) Not all problems are solvable (even w/more metrics):  As Jerry Muller states: “Recognizing the limits of the possible is the beginning of wisdom. Not all problems are soluble.  It is not true that everything can be improved by measurement, or that everything that can be measured can be improved.”

Advice:  Stay humble.

 

What to measure

The pendulum in sport science and strength and conditioning has swung toward specialization in tools and measurement dashboards.  Having said that, in my opinion, this should not come at the expense of first principles such as a thorough understanding of biomechanics, physiological adaptation, planning strategies, nutrition, and communication.  In addition, the technical and tactical elements of the sport are extremely important.  Communicating to players and coaches without context may ruin buy-in regardless of how accurate and important the measurement is.

For our hockey playing population, we measure:

  • Internal Load:  TRIMP, TRIMP/min

  • External Load:  Player Load

  • Peak Force:  IMTP

  • Power:  CMJ (force plate)

  • Body Comp:  BF%

I have found that the communication of these metrics may be more important than the metrics themselves.  As Gerd Gigerenzer states: “A physician who takes away anxiety from his patient is a good doctor.”  Well, a coach who creates confidence in his/her players is a good professional. Feelings > figures and numbers.  Communication is critical.

 

Quality of Measure

Validity is the degree to which an instrument truly measures the construct.  There are several sources of validity.

  • Content Validity:  Focuses on whether the content of the instrument corresponds with the construct to be measured.  This is based on judgment (face validity), and no statistical testing is involved.  Content validation starts the process!

  • Criterion Validity:  Applicable in situations in which there is a gold standard for the construct to be measured.  It refers to how well the scores of the measurement instrument agree with the scores of the gold standard (the second step in the process).

  • Construct Validity:  Applicable in situations in which there is no gold standard.  It refers to how well the instrument provides the expected scores, based on existing knowledge about the construct.


All things being equal, choose a gold standard measurement tool, or an instrument that shows excellent agreement when compared to a gold standard.


Apart from having a firm understanding of the sport at hand, reliability is an important consideration in metric selection. Reliability is the degree to which the measure is free from measurement error.  At the time of this writing, there are currently 59 metrics for the Hawkin Dynamics force plate system, 8 key metrics for FirstBeat, and 30+ hockey metrics in the Catapult GPS system. When choosing metrics from pre-existing literature (1), look for CV (coefficient of variation) scores. The CV is the variability relative to the mean on repeated tests.  A smaller CV is better (<10% is best).


CV = (SD / Mean) x 100%
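As a rough illustration of the CV formula, here is a minimal Python sketch. The function name and the jump heights are hypothetical and are not taken from any vendor software. It also assumes the sample standard deviation (n - 1 in the denominator), which is one common choice for a handful of repeated trials.

```python
from statistics import mean, stdev


def coefficient_of_variation(trials):
    """Return CV (%) = SD / mean x 100 for repeated trials of one test.

    Assumption: SD is the sample standard deviation (n - 1 denominator).
    """
    if len(trials) < 2:
        raise ValueError("need at least two trials to estimate variability")
    avg = mean(trials)
    if avg == 0:
        raise ValueError("mean is zero, so CV is undefined")
    return stdev(trials) / avg * 100


# Hypothetical jump heights (cm) from three repeated CMJ trials
jump_heights_cm = [40.0, 42.0, 41.0]
cv = coefficient_of_variation(jump_heights_cm)
print(f"CV = {cv:.1f}%")
```

A metric whose CV sits well under 10% on repeated tests is one you can track with more confidence that a change reflects the athlete rather than the measurement.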


The intraclass correlation coefficient (ICC) is another consideration when viewing pre-existing literature.  This measure of relative reliability across repeated measures examines how well individuals maintain their rank within the sample over time.  Values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.9 are indicative of poor, moderate, good, and excellent reliability respectively (3).
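The thresholds above can be turned into a small lookup when scanning a paper's reliability table. This is only a sketch: the function name is hypothetical, and how the exact boundary values (0.5, 0.75, 0.9) are assigned is an assumption, since the bands as stated do not say which side they fall on.

```python
def classify_icc(icc):
    """Map an ICC value to the qualitative bands quoted from reference (3).

    Assumption: boundary values go to the band that starts or spans them
    (0.5 -> moderate, 0.75 -> good, 0.9 -> good).
    """
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1.0")
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical ICC values as they might appear in a reliability table
for value in (0.42, 0.68, 0.85, 0.96):
    print(value, classify_icc(value))
```

Reading the ICC band alongside the CV gives two views of the same metric: how well it ranks athletes, and how noisy it is in absolute terms.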


Communicating Measure

Tell a story. Create a narrative.  What are the trends, what are the outliers?  As coaches, we are Texas sharpshooters.  We draw the bullseye after the shots have been fired.  We rarely perform controlled, lab-based experiments.  Rather, we perform observational research. We look for patterns and weave narratives.  Here’s an example narrative I like to use for TRIMP.


Player:  Why the hell is any of this important for me?  I’m a hockey player, not a marathon runner. Why am I wearing a HR monitor? What’s TRIMP?

Performance Coach:  That’s a great question.  We want to get an idea of how hard you work during practice.  The best analogy I can give you for TRIMP is:  If you’re a car, how much gas would you use driving from Columbus to Cleveland?  Every car is different.  We use this number to track your personal workload (gas mileage).  If an injury occurs, we can create objective markers to get you back on the ice as safely as possible.


Player: What’s a good TRIMP number?  Will this information be used against me? 

Performance Coach:  Another great question!  Your TRIMP number is personal.  We take team averages (typically between 80 and 110), but your unique gas mileage is different.  Every car is different.  These numbers are yours, and are used for return to play (RTP), not performance measures.


Create your narrative.  Speak coach!  Speak sport!  Speak athlete! 


Moving the Needle

I recently viewed a post on Twitter from a former NBA coach and now color analyst that stated:

 

“90’s NBA teams had just a trainer and a SC, they practiced more often and harder, and played more back to backs.  Teams now have huge medical & “performance” staffs and value rest over practice.  Yet injuries and games missed are way up.  Something isn’t working.” -Stan Van Gundy

 

Naturally, I pondered the same question in the sport of ice hockey.  Are injuries up, down, or relatively stable?  In addition, sport science in the NHL is still relatively young.  According to an article published in The Athletic in 2017, 15 teams in the NHL have someone with a job title related to sports science. That is roughly half the league.  It can be assumed that this number is growing. So, are MGL (man games lost to injury) decreasing, increasing, or staying the same?  I looked at these numbers via https://nhlinjuryviz.blogspot.com

 

NHL Injury History Chart (Click MGL boxplot)

I further broke down the box plot into a standard table below.

                      Table 1: NHL MGL by Season/Avg/Team

 Notes:

  • *Assumption:  Almost half the league had a dedicated Sports Scientist by the 2015-2016 season

  • *The 2012-2013 season was cut short due to lockout

  • *The 2019-2020 season was cut short due to COVID

  • *The 2020-2021 season was cut short due to COVID

  • In 2000-2001, the league had 30 teams.  In 2017-2018, **Vegas was added to the league and in 2021-2022, ***Seattle was added, making it a 32-team league.

By the Numbers

Pre-2015: Avg MGL/Season: 7276.3 (Median MGL/Season: 7163), Avg MGL/Team: 243 (Median MGL/Team: 238.8)

Since 2015: Avg MGL/Season: 7551.1 (Median MGL/Season: 6830), Avg MGL/Team: 248.2 (Median MGL/Team: 220.3)

Comments:

  • It appears that 2020-2022 have the highest NHL MGL/season and Avg MGL/team in the last 22 years.  Based on the trend numbers in 2022-2023, these numbers may increase.

  • The 21-year MGL/season average is 7,903 (note that the NHL had 30 teams in 2000; Vegas (2017) and Seattle (2021) bring the total to 32 NHL teams).

  • Based on 32 teams, the 21-year Avg MGL/Team is 247 games
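Because the league grew from 30 to 32 teams over this period, the per-team figure depends on which team count goes in the denominator. The sketch below encodes the expansion dates given in the notes above. The function names are hypothetical, and the lookup only covers seasons from 2000-2001 onward.

```python
def nhl_team_count(season_start_year):
    """Number of NHL teams for a season, keyed by the year the season began.

    Based on the notes above: 30 teams in 2000-2001, Vegas added in
    2017-2018, and Seattle added in 2021-2022.
    """
    if season_start_year < 2000:
        raise ValueError("this sketch only covers 2000-2001 onward")
    if season_start_year >= 2021:
        return 32
    if season_start_year >= 2017:
        return 31
    return 30


def mgl_per_team(total_mgl, season_start_year):
    """Man games lost per team for one season's league-wide total."""
    return total_mgl / nhl_team_count(season_start_year)


# The same league-wide total looks different under each team count
for year in (2016, 2017, 2021):
    print(year, nhl_team_count(year), round(mgl_per_team(7903, year), 1))
```

Dividing 7,903 by 32 teams gives roughly 247, while the same total spread over 30 teams would be noticeably higher per team, so it is worth stating the denominator whenever seasons are compared.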

 

I don’t believe we can definitively answer this question, as it depends on how one views the numbers.  Using averages, it would seem that injuries have increased since the sports science position was introduced in professional hockey, but this is likely due to outliers, specifically the 2020-2021 and 2021-2022 seasons.  Outliers limit the usefulness of averages.  When interpreting the numbers via the median, it appears as though both MGL/Season and MGL/Team have decreased (delta 333 MGL/Season, 18.5 MGL/Team).  Time will ultimately tell in the coming years how much of an impact the role will have in professional hockey.  The game has also changed: players are bigger, faster, and stronger, which increases kinetic energy. This has a tangible effect on impact forces and the need for improved protective equipment.
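To make the mean-versus-median comparison concrete, the sketch below recomputes the changes from the summary figures reported under By the Numbers. The dictionary layout and the names are assumptions made for illustration. Only the four reported summary values per period are used, not the underlying season-by-season data.

```python
# Summary figures as reported under "By the Numbers"
summary = {
    "pre_2015": {
        "mean_mgl_season": 7276.3,
        "median_mgl_season": 7163,
        "mean_mgl_team": 243.0,
        "median_mgl_team": 238.8,
    },
    "since_2015": {
        "mean_mgl_season": 7551.1,
        "median_mgl_season": 6830,
        "mean_mgl_team": 248.2,
        "median_mgl_team": 220.3,
    },
}


def change_since_2015(metric):
    """Since-2015 value minus pre-2015 value (positive means an increase)."""
    delta = summary["since_2015"][metric] - summary["pre_2015"][metric]
    return round(delta, 1)


for metric in summary["pre_2015"]:
    delta = change_since_2015(metric)
    direction = "up" if delta > 0 else "down"
    print(f"{metric}: {direction} {abs(delta)}")
```

The means move up while the medians move down, which is the pattern you would expect when a few extreme seasons pull the average.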

Injuries are very challenging.  They live in the world of uncertainty. There may be four drivers of injury (poor biomechanics, poor lifestyle, poor programming, or poor medical intervention), most of which are difficult for any coach to control.  So, how should the position be evaluated? How do we define success? What are the roles and responsibilities of the sports performance coach?  Wins and losses?  The ability to be liked by the players?  MGL to injury?  All are important questions for team staff to consider.

Metrics and measurement are great.  They provide objective information, but they are not without unintended consequences.  Science starts and ends with problems.  Too many times we have solutions that are desperately seeking problems.  As Aaron Haspel states: “Those who believe that what you cannot quantify does not exist also believe that what you can quantify, does.”  “Ideal conditions” are not found in the real world.  Use numbers, use gut feeling, use your pragmatic experience.  ALL are important in the coaching profession. All may be important for moving the needle.

 

REFERENCES:

1.         Bishop C, Turner A, Jordan M, Harry J, Loturco I, Lake J, and Comfort P. A framework to guide practitioners for selecting metrics during the countermovement and drop jump tests. Strength and Conditioning Journal 44: 95-103, 2022.

2.         Impellizzeri FM and Marcora SM. Test validation in sport physiology: lessons learned from clinimetrics. Int J Sports Physiol Perform 4: 269-277, 2009.

3.         Koo TK and Li MY. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J Chiropr Med 15: 155-163, 2016.

 
