Pythagorean Expectation in Football Analytics ⚽⚽


Sports analytics applies statistical analysis and data-driven techniques to understand and improve performance in sports. Teams use data to gain insights, make informed decisions, and build a competitive edge in areas ranging from player evaluation to game strategy.

In this article, I look at the problem of predicting the percentage of games a team wins in Indian football, specifically the Indian Super League (ISL).

Pythagorean expectation estimates a team's winning percentage from the number of goals it has scored and conceded. It helps evaluate how effective a team's performance has been and is a useful tool for assessing its overall quality.

The formula for Pythagorean expectation in football is as follows:

Winning Percentage = (Goals For)^2 / ((Goals For)^2 + (Goals Against)^2)

In this formula, "Goals For" is the total number of goals a team scored during a specified period, and "Goals Against" is the total number of goals it conceded over the same period. Squaring both values and dividing the squared goals for by the sum of the squared goals for and squared goals against gives an estimated winning percentage for the team.
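As a minimal sketch in Python, the formula above can be written as a small function. The name `pythagorean_expectation` and the `exponent` parameter are illustrative assumptions, not taken from the linked notebook; the default exponent of 2 matches the formula in this article.

```python
def pythagorean_expectation(goals_for, goals_against, exponent=2.0):
    """Estimate a team's winning percentage from goals scored and conceded.

    Hypothetical helper: exponent=2 reproduces the formula in the article.
    """
    gf = goals_for ** exponent
    ga = goals_against ** exponent
    if gf + ga == 0:
        # No goals scored or conceded: the ratio is undefined.
        raise ValueError("goals_for and goals_against cannot both be zero")
    return gf / (gf + ga)


# Illustrative numbers only: 30 goals scored, 20 conceded.
expected = pythagorean_expectation(30, 20)
print(round(expected, 3))  # 900 / 1300 -> 0.692
```

A team that scores exactly as many goals as it concedes gets an expectation of 0.5, whatever the totals are.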

The Pythagorean expectation is based on the principle that a team's scoring differential (goals for minus goals against) is a strong indicator of its success. Teams that consistently outscore their opponents tend to have higher winning percentages. By using this formula, we can quantify the relationship between goals scored and goals conceded, allowing us to make predictions about a team's performance.

The formula assumes that goals scored and conceded follow a certain distribution pattern and that the observed goals for and against accurately reflect a team's true underlying strength. Football, however, is a complex sport shaped by tactics, player skill, and luck, so actual results can deviate from the expected ones.

Despite these limitations, Pythagorean expectation remains a valuable tool for assessing a team's performance and comparing it with its actual results. It provides a framework for understanding what contributes to a team's success, and it can support predictions about a team's future prospects.

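Having an expected win percentage lets you forecast possible winning or losing streaks: a team that is currently winning less often than its goals suggest (underperforming) may be due for better results, while a team winning more often than expected (overperforming) may be due for worse ones.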
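One way to make this comparison concrete is to subtract the expected win percentage from the actual one. The sketch below is only an illustration: the function name `performance_gap` and the input numbers are hypothetical, not ISL data.

```python
def performance_gap(wins, games, goals_for, goals_against):
    """Return actual win percentage minus Pythagorean expectation.

    Hypothetical helper: a negative gap means the team has won less often
    than its goals suggest, a positive gap means it has won more often.
    """
    actual = wins / games
    expected = goals_for**2 / (goals_for**2 + goals_against**2)
    return actual - expected


# Made-up example: 4 wins in 10 games, 15 goals scored, 10 conceded.
gap = performance_gap(wins=4, games=10, goals_for=15, goals_against=10)

if gap > 0:
    label = "overperforming"
elif gap < 0:
    label = "underperforming"
else:
    label = "in line with expectation"

print(label)
```

Here the expectation is 225 / 325, roughly 0.69, against an actual win percentage of 0.40, so this made-up team would be flagged as underperforming.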

Let’s say you have the data for all of the team’s games in the first half of the season. How can you use that data to predict the winning percentage of the second half of the season?

One approach would be to get the win percentage from the first half and expect the same win percentage for the second half.
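The sketch below puts this baseline next to the Pythagorean expectation and scores both against second-half results using mean absolute error. Everything in it is an assumption made for illustration: the team names, the numbers, and the helper names are hypothetical placeholders, not real ISL data and not the code from the linked notebook.

```python
def win_pct(wins, games):
    """Share of games won."""
    return wins / games


def pythagorean(goals_for, goals_against):
    """Pythagorean expectation with exponent 2, as in the formula above."""
    return goals_for**2 / (goals_for**2 + goals_against**2)


def mean_abs_error(predicted, actual):
    """Average absolute difference between forecasts and outcomes."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical first-half records (placeholders, not real results).
first_half = [
    {"team": "Team A", "wins": 6, "games": 10, "gf": 18, "ga": 9},
    {"team": "Team B", "wins": 3, "games": 10, "gf": 12, "ga": 12},
    {"team": "Team C", "wins": 2, "games": 10, "gf": 8, "ga": 16},
]
# Hypothetical second-half win percentages for the same teams.
second_half_win_pct = [0.7, 0.4, 0.2]

# Forecast 1: assume the first-half win percentage simply repeats.
naive = [win_pct(r["wins"], r["games"]) for r in first_half]
# Forecast 2: use the first-half Pythagorean expectation instead.
pyth = [pythagorean(r["gf"], r["ga"]) for r in first_half]

print("naive MAE:", mean_abs_error(naive, second_half_win_pct))
print("pythagorean MAE:", mean_abs_error(pyth, second_half_win_pct))
```

With real data, whichever forecast gives the lower error across teams is the better predictor of second-half results for that season.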

Link to the code: Football_Analytics/ISL_pythagorean_expectation.ipynb at master · AntoTomAbraham/Football_Analytics (github.com)