Regression Towards the Mean

“Wow, athlete A had a poor year.  It must have been a sophomore slump. He needs to approach things differently in the offseason.” 


“Wow, athlete A had an amazing year.  He must have had a new performance coach. He needs to keep doing what he’s doing.” 

 

“The aggregate of injuries on team X this year is unacceptable.  Over 500 man-games lost (MGL).  We need to do a deep dive and hire a consulting company.”

 

“We fixed it.  Our injuries are down to fewer than 280 man-games lost.  These guys know what they’re talking about.”

 

We hear these narratives quite commonly from pundits in high performance.  Press, coaches, and front office staff are constantly seeking answers to complex questions.  After all, their job is to win, and you can’t win without production from top players and healthy teams.  It would be great to pinpoint the exact cause of each of these outcomes, but sport and injury are extremely complex, unpredictable, and random.

 

“History is not what happened, but what survives the shipwrecks of judgment and chance.”  -Maria Popova

 

If we alter the quote above and insert the word “injury”, “production” or “scoreboard outcome” in place of “History”, we can better understand their complexities.  Luck, chance, skill, timing, and more all affect the outcome.  Thousands of confounders are at play.

 

Perhaps a better way to look at these snapshots (i.e., injury and production) is through what is known in statistics as regression towards the mean.

 

“Regression toward the mean simply says that, following an extreme random event, the next random event is likely to be less extreme. In no sense does the future event "compensate for" or "even out" the previous event.” -Wikipedia
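A small simulation can make the quoted definition concrete. The sketch below assumes season totals are independent draws from a normal distribution with a hypothetical mean of 300 and standard deviation of 80; these are illustrative numbers, not real team data. It picks out the most extreme 10% of simulated seasons and checks what the following season looked like on average.

```python
import random
import statistics


def simulate_follow_ups(n_seasons=10_000, mean=300.0, sd=80.0, seed=0):
    """Compare extreme simulated seasons with the season that follows each one.

    All parameters are hypothetical; seasons are assumed to be independent.
    """
    rng = random.Random(seed)
    seasons = [rng.gauss(mean, sd) for _ in range(n_seasons)]

    # Call a season "extreme" if it sits in the top 10% of all simulated seasons.
    cutoff = sorted(seasons)[int(0.9 * n_seasons)]
    pairs = [
        (seasons[i], seasons[i + 1])
        for i in range(n_seasons - 1)
        if seasons[i] >= cutoff
    ]

    extreme_avg = statistics.mean(p[0] for p in pairs)
    next_avg = statistics.mean(p[1] for p in pairs)
    overall_avg = statistics.mean(seasons)
    return extreme_avg, next_avg, overall_avg


extreme_avg, next_avg, overall_avg = simulate_follow_ups()
print(f"Average extreme season:      {extreme_avg:.0f}")
print(f"Average season right after:  {next_avg:.0f}")
print(f"Average of all seasons:      {overall_avg:.0f}")
```

Under these assumptions, the season after an extreme one lands near the overall average rather than staying extreme, and it does not dip below average to "make up" for the bad year.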

 

Using Regression Towards the Mean for Injury Probability

Last year Team X was hit hard by injuries, with a total of 550 MGL.  Let’s take a look at 16 years of Team X’s longitudinal data, which add up to 4,784 MGL.  Using regression towards the mean, my best prediction for next season is approximately 299 MGL (4784/16).  Obviously, this may not turn out to be the case, but when viewing the aggregate, I believe that number is the best estimate.  Yes, injuries are complex and may be multifactorial, but these numbers can be compared with league-wide team numbers to better evaluate what’s “fair”, “good” or “unacceptable” when making difficult decisions.  One season may be noise, a drastic outlier, chance, or simply a fluke.

Using Regression Towards the Mean for Production

An elite center had a breakout season in 2017-2018, scoring 43 goals in the NHL regular season.  Is this sustainable? 50 next year? The year after? He has played a total of 8.25 seasons, with 140 goals in all.  Using regression towards the mean, my best guess would be that his goal production will be around 17 goals (140/8.25 ≈ 17) for the upcoming hockey season.
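Both estimates in this post are the same calculation: a long-run total divided by the number of seasons. The sketch below reproduces that arithmetic using only the figures quoted in the text. The helper name `long_run_average` is made up for illustration, and treating the plain average as next season's forecast is the simplifying assumption made in this post, not the only way to model regression towards the mean.

```python
def long_run_average(total, seasons):
    """Average per season over the whole record (hypothetical helper)."""
    if seasons <= 0:
        raise ValueError("seasons must be positive")
    return total / seasons


# Team X: 4784 MGL over 16 seasons, 550 MGL last season (figures from the text).
team_baseline = long_run_average(4784, 16)

# Center: 140 goals over 8.25 seasons, 43 goals in 2017-2018 (figures from the text).
player_baseline = long_run_average(140, 8.25)

print(f"Team X baseline: {team_baseline:.0f} MGL (last season: 550)")
# prints "Team X baseline: 299 MGL (last season: 550)"
print(f"Center baseline: {player_baseline:.1f} goals (breakout season: 43)")
# prints "Center baseline: 17.0 goals (breakout season: 43)"
```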

Sports, scoring production, and injury are independent, random events.  Extreme numbers may simply be fluctuations in a sea of noise while setting sail on the ships of chance and judgment.  Use regression towards the mean when dealing with random, complex issues to avoid the gambler’s fallacy.

 

“The Gambler’s Fallacy occurs when an individual erroneously believes that a certain random event is less likely or more likely to happen based on the outcome of a previous event or series of events.”

 

 
