Thursday, February 08, 2024

Probabilistic Predictive Model Applied To Rugby Six Nations (2024)

WCMI Predictive Model For Rugby Six Nations

Dirichlet-Multinomial Model

Diniz et al. (2017) outline a simple Bayesian Dirichlet-Multinomial mixture model for predicting match outcomes in the 2014 Brazilian Championship A Series.

Here, we will use a very simple Python script to illustrate the Multinomial-Dirichlet model described in Section 2.4 of the paper:

# [Diniz et al. (2017) "Comparing probabilistic predictive models applied to football", pp. 5-6.](https://arxiv.org/pdf/1705.04356.pdf)

from scipy.stats import dirichlet_multinomial
from tabulate import tabulate

# Define the count vectors for Gremio (home) and Atletico-PR (away)
# (wins, draws, losses) records from the 2014 Brazilian Championship example in Diniz et al. (2017).
h = [6, 2, 1]
a = [2, 3, 4]

n = sum(h)

# Uniform Dirichlet(1, 1, 1) prior + observed counts = posterior Dirichlet parameters,
# which also parameterise the Dirichlet-Multinomial predictive distributions
h_params = [x + 1 for x in h]
a_params = [x + 1 for x in a]

# Create Dirichlet-Multinomial distributions for Gremio (home) and Atletico-PR (away)
dm_home = dirichlet_multinomial(h_params, n)
dm_away = dirichlet_multinomial(a_params, n)

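# Predictive probabilities: equal-weight (1/2, 1/2) mixture of the two posteriors.
# mean() / n equals alpha / sum(alpha); the away side is read in reverse order,
# because an away loss (index 2) is a home win.
p_win = 0.5 * dm_home.mean()[0] / n + 0.5 * dm_away.mean()[2] / n
p_draw = 0.5 * dm_home.mean()[1] / n + 0.5 * dm_away.mean()[1] / n
p_loss = 0.5 * dm_home.mean()[2] / n + 0.5 * dm_away.mean()[0] / n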

print()
print('Soccer Brazil, (2014)')
print()
print('September 10, 2014, 07:30 PM')
print('Gremio vs Atletico-PR')
print()

# Create table with probabilities and implied odds
table = [["Gremio Win", f"{(p_win * 100):.2f}%", f"{(1 / p_win):.2f}"],
        ["Draw", f"{(p_draw * 100):.2f}%", f"{(1 / p_draw):.2f}"],
        ["Atletico-PR Win", f"{(p_loss * 100):.2f}%", f"{(1 / p_loss):.2f}"]]

print(tabulate(table, headers=["Outcome", "Probability", "Implied Odds"], tablefmt="pretty", colalign=("left", "right", "right")))
print()

"""
Expected Output:  
+-----------------+-------------+--------------+
| Outcome         | Probability | Implied Odds |
+-----------------+-------------+--------------+
| Gremio Win      |      50.00% |         2.00 |
| Draw            |      29.17% |         3.43 |
| Atletico-PR Win |      20.83% |         4.80 |
+-----------------+-------------+--------------+
Note that the probabilities sum to 1.00!
"""

# Snippet generated using Bing-CoPilot.
# English is the new lingua franca for prompting LLM code generation!

and also to price up the upcoming rugby Six-Nations contest between Scotland and France at Murrayfield:

# [Diniz et al. (2017) "Comparing probabilistic predictive models applied to football", pp. 5-6.](https://arxiv.org/pdf/1705.04356.pdf)

from scipy.stats import dirichlet_multinomial
from tabulate import tabulate

# Define the count vectors for Scotland (home) and France (away)
# Results from 2013-2024 Six-Nations matches.
h = [14, 0, 14]
a = [11, 1, 16]

n = sum(h)

# Uniform Dirichlet(1, 1, 1) prior + observed counts = posterior Dirichlet parameters,
# which also parameterise the Dirichlet-Multinomial predictive distributions
h_params = [x + 1 for x in h]
a_params = [x + 1 for x in a]

# Create Dirichlet-Multinomial distributions for Scotland (home) and France (away)
dm_home = dirichlet_multinomial(h_params, n)
dm_away = dirichlet_multinomial(a_params, n)

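# Predictive probabilities: equal-weight (1/2, 1/2) mixture of the two posteriors.
# mean() / n equals alpha / sum(alpha); the away side is read in reverse order,
# because an away loss (index 2) is a home win.
p_win = 0.5 * dm_home.mean()[0] / n + 0.5 * dm_away.mean()[2] / n
p_draw = 0.5 * dm_home.mean()[1] / n + 0.5 * dm_away.mean()[1] / n
p_loss = 0.5 * dm_home.mean()[2] / n + 0.5 * dm_away.mean()[0] / n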

print()
print('Rugby Six-Nations, (2024)')
print()
print('February 10, 2024, 03:15 PM')
print('Scotland vs France')
print()

# Create table with probabilities and implied odds
table = [["Scotland Win", f"{(p_win * 100):.2f}%", f"{(1 / p_win):.2f}"],
        ["Draw", f"{(p_draw * 100):.2f}%", f"{(1 / p_draw):.2f}"],
        ["France Win", f"{(p_loss * 100):.2f}%", f"{(1 / p_loss):.2f}"]]

print(tabulate(table, headers=["Outcome", "Probability", "Implied Odds"], tablefmt="pretty", colalign=("left", "right", "right")))
print()

"""
Expected Output:  
+--------------+-------------+--------------+
| Outcome      | Probability | Implied Odds |
+--------------+-------------+--------------+
| Scotland Win |      51.61% |         1.94 |
| Draw         |       4.84% |        20.67 |
| France Win   |      43.55% |         2.30 |
+--------------+-------------+--------------+
Note that the probabilities sum to 1.00!
"""
# Snippet generated using Bing-CoPilot.
# English is the new lingua franca for prompting LLM code generation!
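Both scripts divide `dm_home.mean()` and `dm_away.mean()` by `n`. The mean of a Dirichlet-Multinomial with parameters α and n trials is n·α_i/α_t, so dividing by n leaves α_i/α_t whatever n is; that is also why reusing `n = sum(h)` for the away distribution is harmless. The sketch below checks this without SciPy, by enumerating every (W, D, L) split of n games with exact fractions. It assumes integer α (so that Γ(k) = (k - 1)!), and the helper names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def gamma_int(k):
    """Gamma(k) = (k - 1)! for a positive integer k."""
    return factorial(k - 1)

def dirmult_pmf(x, alpha):
    """Dirichlet-Multinomial pmf for integer alpha, as an exact Fraction."""
    n, a_t = sum(x), sum(alpha)
    p = Fraction(factorial(n) * gamma_int(a_t), gamma_int(n + a_t))
    for x_k, a_k in zip(x, alpha):
        p *= Fraction(gamma_int(x_k + a_k), factorial(x_k) * gamma_int(a_k))
    return p

def mean_over_n(alpha, n):
    """E[X] / n, computed by summing over every (W, D, L) split of n games."""
    mean = [Fraction(0)] * 3
    for w in range(n + 1):
        for d in range(n + 1 - w):
            x = (w, d, n - w - d)
            p = dirmult_pmf(x, alpha)
            for k in range(3):
                mean[k] += p * x[k]
    return [m / n for m in mean]

alpha = (15, 1, 15)  # Scotland's home parameters from the script
for n in (1, 5, 28):
    # The list is [Fraction(15, 31), Fraction(1, 31), Fraction(15, 31)] for every n
    print(n, mean_over_n(alpha, n))
```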

The model is a Bayesian approach to predicting sports outcomes. It uses each team's past performance in its home or away games to estimate the probabilities of a win, draw, or loss in a future match. The Dirichlet distribution, a generalisation of the Beta distribution to more than two outcomes (here: win, draw, and loss), serves as both the conjugate prior and the posterior distribution.
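A minimal sketch of the conjugate update in plain Python with exact fractions (the helpers `dirichlet_posterior` and `posterior_mean` are hypothetical, not from the post or any library): the posterior parameters are the prior parameters plus the observed counts, and the posterior mean α_i/α_t is the predictive probability of outcome i in the next match.

```python
from fractions import Fraction

# Hypothetical helpers for illustration; not from the post or any library.
def dirichlet_posterior(prior, counts):
    """Conjugate update: Dirichlet(prior) + multinomial counts -> Dirichlet(prior + counts)."""
    return [a + c for a, c in zip(prior, counts)]

def posterior_mean(params):
    """Mean of Dirichlet(params), alpha_i / alpha_t, as exact fractions."""
    total = sum(params)
    return [Fraction(a, total) for a in params]

prior = [1, 1, 1]            # uniform prior over (win, draw, loss)
scotland_home = [14, 0, 14]  # Scotland's home record as (W, D, L)

post = dirichlet_posterior(prior, scotland_home)
print(post)                  # [15, 1, 15]
print(posterior_mean(post))  # [Fraction(15, 31), Fraction(1, 31), Fraction(15, 31)]
```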

We use Scotland's home record and France's away record to make our prediction, assuming equal weights (1/2 each) for the two Dirichlet posterior distributions in the mixture: one for Scotland's home games and one for France's away games.

Six-Nations (2013-2024):

Scotland (Home record): 14 wins, 14 losses, 0 draws.

France (Away record): 11 wins, 16 losses, 1 draw.
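The records above are listed as wins, losses, draws, while the scripts' count vectors use the order (wins, draws, losses). A small sketch (the helper `to_vector` is hypothetical) builds the vectors by outcome name so the two orders cannot be mixed up, and shows that France's away record read in reverse gives it from Scotland's point of view, since an away loss is a home win.

```python
# Order used by the scripts and the formula: index 0 = win, 1 = draw, 2 = loss
OUTCOMES = ("win", "draw", "loss")

def to_vector(record):
    """Turn a {'win': .., 'draw': .., 'loss': ..} record into a (W, D, L) count list."""
    return [record[k] for k in OUTCOMES]

scotland_home = to_vector({"win": 14, "loss": 14, "draw": 0})
france_away = to_vector({"win": 11, "loss": 16, "draw": 1})

# France's away losses are Scotland home wins, so read France's vector back to front
france_away_from_scotland_view = france_away[::-1]

print(scotland_home)                   # [14, 0, 14]
print(france_away)                     # [11, 1, 16]
print(france_away_from_scotland_view)  # [16, 1, 11]
```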

Now, we can use the following formula to calculate the predictive probabilities:

$$P(X_{n+1} = i \mid h, a) = \frac{1}{2} \left( \frac{h_i + \alpha_i}{h_t + \alpha_t} \right) + \frac{1}{2} \left( \frac{a_{3-i+1} + \alpha_{3-i+1}}{a_t + \alpha_t} \right)$$

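where $i$ is the outcome from the home team's point of view (1 for a win, 2 for a draw, 3 for a loss); $h_i$ are the home team's home counts and $a_i$ the away team's away counts of wins, draws, and losses; $h_t$ and $a_t$ are the corresponding totals (28 matches each here); $\alpha_i$ are the parameters of the prior Dirichlet distribution (all 1s in our case); and $\alpha_t = \alpha_1 + \alpha_2 + \alpha_3 = 3$. The index $3 - i + 1$ reverses the away team's record, because an away loss is a home win.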
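The formula can be evaluated term by term with exact fractions. This is a sketch: `predictive` is a hypothetical helper, and it uses 0-based indices, so the formula's $3 - i + 1$ becomes `2 - i`.

```python
from fractions import Fraction

def predictive(h, a, alpha=(1, 1, 1)):
    """Equal-weight mixture P(X_{n+1} = i | h, a) for (home win, draw, home loss).

    h: home team's home record (W, D, L); a: away team's away record (W, D, L).
    Hypothetical helper written for illustration.
    """
    h_t, a_t, alpha_t = sum(h), sum(a), sum(alpha)
    probs = []
    for i in range(3):  # 0-based i; the formula's i runs from 1 to 3
        j = 2 - i       # 0-based form of the formula's index 3 - i + 1
        home_term = Fraction(h[i] + alpha[i], h_t + alpha_t)
        away_term = Fraction(a[j] + alpha[j], a_t + alpha_t)
        probs.append(Fraction(1, 2) * home_term + Fraction(1, 2) * away_term)
    return probs

scotland_france = predictive([14, 0, 14], [11, 1, 16])
print(scotland_france)  # [Fraction(16, 31), Fraction(3, 62), Fraction(27, 62)]
print([round(float(p), 4) for p in scotland_france])  # [0.5161, 0.0484, 0.4355]
```

Calling `predictive([6, 2, 1], [2, 3, 4])` gives 1/2, 7/24 and 5/24, the Gremio vs Atletico-PR figures from the first script.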

Given the uniform Dirichlet prior $D(1, 1, 1)$, we update it with our historical data to obtain the posterior distributions: $D(15, 1, 15)$ for Scotland at home and $D(12, 2, 17)$ for France away. Each posterior parameter is the prior parameter plus the observed number of wins, draws, or losses.
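To treat the posteriors as distributions rather than just their means, a Monte Carlo sketch can draw (win, draw, loss) probability vectors from $D(15, 1, 15)$ and $D(12, 2, 17)$ and average the equal-weight mixture. It uses the standard construction of a Dirichlet draw as normalised Gamma variates, via the standard library's `random.gammavariate`; the function names, seed, and number of draws are arbitrary choices for illustration. The average should land close to the exact value 16/31, the 51.61% in the table.

```python
import random

def sample_dirichlet(alpha, rng):
    """One draw from Dirichlet(alpha): normalised independent Gamma(alpha_k, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]

def mc_scotland_win(n_draws=50_000, seed=2024):
    """Monte Carlo estimate of P(Scotland win) under the equal-weight mixture."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        home = sample_dirichlet([15, 1, 15], rng)  # Scotland at home, (W, D, L)
        away = sample_dirichlet([12, 2, 17], rng)  # France away, (W, D, L)
        # A France away loss (index 2) is a Scotland win
        total += 0.5 * home[0] + 0.5 * away[2]
    return total / n_draws

print(round(mc_scotland_win(), 3))  # close to 16/31 = 0.516...
```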

The Multinomial-Dirichlet model suggests that the most likely outcome is a win for Scotland, with a probability of approximately 52% (51.61%), against 43.55% for a France win and 4.84% for a draw.

It's important to note that these predictions are solely based on historical performance data and do not consider other potentially influential factors such as player injuries, current team form, weather conditions, or tactical changes.

As ever, the scripts have little or no "errr0r" handling and are only starting points for your own explorations.

Enjoy!