### Dirichlet-Multinomial Model

Diniz et al. (2017) outline a simple Bayesian Dirichlet-Multinomial mixture model for predicting match outcomes in the 2014 Brazilian Championship A Series.

Here, we will use a very simple Python script to illustrate the `Multinomial-Dirichlet 1` model in section 2.4:

```
# [Diniz et al. (2017) "Comparing probabilistic predictive models applied to football", pp. 5-6.](https://arxiv.org/pdf/1705.04356.pdf)
from scipy.stats import dirichlet_multinomial
from tabulate import tabulate
# Define the count vectors for Gremio (home) and Atletico-PR (away)
# Win/draw/loss counts from the 2014 Brazilian Championship A Series.
h = [6, 2, 1]
a = [2, 3, 4]
n = sum(h)
# Add 1 to each count: the uniform D(1, 1, 1) prior plus the observed counts gives the posterior parameters
h_params = [x + 1 for x in h]
a_params = [x + 1 for x in a]
# Create Dirichlet-Multinomial distributions for Gremio (home) and Atletico-PR (away)
# (scipy.stats.dirichlet_multinomial requires SciPy 1.11 or later)
dm_home = dirichlet_multinomial(h_params, n)
dm_away = dirichlet_multinomial(a_params, n)
# Calculate predictive probabilities: mean() returns n * alpha / sum(alpha), so dividing
# by n leaves alpha / sum(alpha). A home win pairs with an away loss (index 2).
p_win = 0.5 * dm_home.mean()[0] / n + 0.5 * dm_away.mean()[2] / n
p_draw = 0.5 * dm_home.mean()[1] / n + 0.5 * dm_away.mean()[1] / n
p_loss = 0.5 * dm_home.mean()[2] / n + 0.5 * dm_away.mean()[0] / n
print()
print('Soccer Brazil, (2014)')
print()
print('September 10, 2014, 07:30 PM')
print('Gremio vs Atletico-PR')
print()
# Create table with probabilities and implied odds
table = [["Gremio Win", f"{(p_win * 100):.2f}%", f"{(1 / p_win):.2f}"],
["Draw", f"{(p_draw * 100):.2f}%", f"{(1 / p_draw):.2f}"],
["Atletico-PR Win", f"{(p_loss * 100):.2f}%", f"{(1 / p_loss):.2f}"]]
print(tabulate(table, headers=["Outcome", "Probability", "Implied Odds"], tablefmt="pretty", colalign=("left", "right", "right")))
print()
"""
Expected Output:
+-----------------+-------------+--------------+
| Outcome | Probability | Implied Odds |
+-----------------+-------------+--------------+
| Gremio Win | 50.00% | 2.00 |
| Draw | 29.17% | 3.43 |
| Atletico-PR Win | 20.83% | 4.80 |
+-----------------+-------------+--------------+
Note that the probabilities sum to 1.00!
"""
# Snippet generated using Bing-CoPilot.
# English is the new lingua franca for prompting LLM code generation!
```

and also to price up the upcoming rugby `Six-Nations` contest between *Scotland* and *France* at *Murrayfield*:

```
# [Diniz et al. (2017) "Comparing probabilistic predictive models applied to football", pp. 5-6.](https://arxiv.org/pdf/1705.04356.pdf)
from scipy.stats import dirichlet_multinomial
from tabulate import tabulate
# Define the count vectors for Scotland (home) and France (away)
# Results from 2013-2024 Six-Nations matches.
h = [14, 0, 14]
a = [11, 1, 16]
n = sum(h)
# Add 1 to each count: the uniform D(1, 1, 1) prior plus the observed counts gives the posterior parameters
h_params = [x + 1 for x in h]
a_params = [x + 1 for x in a]
# Create Dirichlet-Multinomial distributions for Scotland (home) and France (away)
# (scipy.stats.dirichlet_multinomial requires SciPy 1.11 or later)
dm_home = dirichlet_multinomial(h_params, n)
dm_away = dirichlet_multinomial(a_params, n)
# Calculate predictive probabilities: mean() returns n * alpha / sum(alpha), so dividing
# by n leaves alpha / sum(alpha). A home win pairs with an away loss (index 2).
p_win = 0.5 * dm_home.mean()[0] / n + 0.5 * dm_away.mean()[2] / n
p_draw = 0.5 * dm_home.mean()[1] / n + 0.5 * dm_away.mean()[1] / n
p_loss = 0.5 * dm_home.mean()[2] / n + 0.5 * dm_away.mean()[0] / n
print()
print('Rugby Six-Nations, (2024)')
print()
print('February 10, 2024, 03:15 PM')
print('Scotland vs France')
print()
# Create table with probabilities and implied odds
table = [["Scotland Win", f"{(p_win * 100):.2f}%", f"{(1 / p_win):.2f}"],
["Draw", f"{(p_draw * 100):.2f}%", f"{(1 / p_draw):.2f}"],
["France Win", f"{(p_loss * 100):.2f}%", f"{(1 / p_loss):.2f}"]]
print(tabulate(table, headers=["Outcome", "Probability", "Implied Odds"], tablefmt="pretty", colalign=("left", "right", "right")))
print()
"""
Expected Output:
+--------------+-------------+--------------+
| Outcome | Probability | Implied Odds |
+--------------+-------------+--------------+
| Scotland Win | 51.61% | 1.94 |
| Draw | 4.84% | 20.67 |
| France Win | 43.55% | 2.30 |
+--------------+-------------+--------------+
Note that the probabilities sum to 1.00!
"""
# Snippet generated using Bing-CoPilot.
# English is the new lingua franca for prompting LLM code generation!
```

The model is a Bayesian approach to predicting sports outcomes. It uses
the past performance of each team in its home or away games to estimate
the probability of winning, drawing or losing future matches. The
Dirichlet distribution, a generalisation of the Beta distribution to
more than two outcomes (here wins, draws and losses), serves as both the
conjugate prior and the posterior distribution.
We use Scotland's historical home record and France's historical away record to create our prediction, giving equal weight to the two Dirichlet posterior distributions in the mixture: one for Scotland's home games and one for France's away games.
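
As a sanity check on the mixture idea, the sketch below (assuming NumPy; the function name `simulate_mixture`, the seed and the number of simulations are illustrative choices, not taken from Diniz et al.) picks Scotland's home posterior or France's away posterior with probability 1/2, draws win/draw/loss probabilities from it, and then draws a single match result. Over many simulations the result frequencies should settle close to the predictive probabilities reported by the scripts.

```python
import numpy as np

# Six-Nations 2013-2024 records from the text, as (wins, draws, losses).
scotland_home = np.array([14, 0, 14])
france_away = np.array([11, 1, 16])
prior = np.ones(3)  # uniform Dirichlet prior D(1, 1, 1)


def simulate_mixture(h, a, n_sims=200_000, seed=2024):
    """Monte Carlo estimate of P(home win, draw, away win) under an
    equal-weight mixture of the home and away Dirichlet posteriors."""
    rng = np.random.default_rng(seed)
    # Pick the home or the away posterior with probability 1/2 each.
    use_home = rng.random(n_sims) < 0.5
    # Outcome probabilities drawn from each posterior.
    theta_home = rng.dirichlet(h + prior, size=n_sims)
    # Reverse the away record so index 0 is an away loss, i.e. a home win.
    theta_away = rng.dirichlet(a + prior, size=n_sims)[:, ::-1]
    theta = np.where(use_home[:, None], theta_home, theta_away)
    # Draw one match result per row by inverting the cumulative probabilities.
    u = rng.random(n_sims)[:, None]
    outcomes = (u > np.cumsum(theta, axis=1)[:, :2]).sum(axis=1)
    return np.bincount(outcomes, minlength=3) / n_sims


freqs = simulate_mixture(scotland_home, france_away)
for name, f in zip(["Scotland Win", "Draw", "France Win"], freqs):
    print(f"{name}: {f:.2%}")
```

With 200,000 simulations the frequencies typically land within a few tenths of a percentage point of 51.61%, 4.84% and 43.55%.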

Six-Nations (2013-2024):

Scotland (Home record): 14 wins, 0 draws, 14 losses.

France (Away record): 11 wins, 1 draw, 16 losses.

Now, we can use the following formula to calculate the predictive probabilities:

$P(X_{n+1} = i|h, a) = \frac{1}{2} \left( \frac{h_i + \alpha_i}{h_t + \alpha_t} \right) + \frac{1}{2} \left( \frac{a_{3-i+1} + \alpha_{3-i+1}}{a_t + \alpha_t} \right)$

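where $i$ indexes the outcome from the home team's perspective (1 for a home win, 2 for a draw, 3 for a home loss), $h_i$ and $a_i$ are the historical win/draw/loss counts of the home team at home and the away team away, $h_t$ and $a_t$ are their totals, and $\alpha_i$ are the parameters of the prior Dirichlet distribution (all equal to 1 in our case), with $\alpha_t = \alpha_1 + \alpha_2 + \alpha_3$. The index $3-i+1$ reverses the away team's record, so that a home win ($i = 1$) draws on the away team's losses ($a_3$).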
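
The formula can also be evaluated directly, without SciPy. Here is a minimal sketch using exact fractions; the function name `predictive` is hypothetical, and the default `alpha=(1, 1, 1)` encodes the uniform prior used in this section.

```python
from fractions import Fraction


def predictive(h, a, alpha=(1, 1, 1)):
    """Exact P(X_{n+1} = i | h, a) for i = 1 (home win), 2 (draw), 3 (away win):
    an equal-weight mix of the home and the (reversed) away posterior predictives."""
    h_t, a_t, alpha_t = sum(h), sum(a), sum(alpha)
    probs = []
    for k in range(3):  # k = i - 1 (0-based)
        j = 2 - k       # 0-based form of the away index 3 - i + 1
        home_term = Fraction(h[k] + alpha[k], h_t + alpha_t)
        away_term = Fraction(a[j] + alpha[j], a_t + alpha_t)
        probs.append(Fraction(1, 2) * (home_term + away_term))
    return probs


print(predictive([14, 0, 14], [11, 1, 16]))  # Scotland (home) vs France (away)
print(predictive([6, 2, 1], [2, 3, 4]))      # Gremio (home) vs Atletico-PR (away)
```

For Scotland vs France this gives 16/31, 3/62 and 27/62, exactly the 51.61%, 4.84% and 43.55% in the SciPy table; for Gremio vs Atletico-PR it gives 1/2, 7/24 and 5/24.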

Since the Dirichlet prior is uniform, $D(1, 1, 1)$, we update it with our historical data to obtain the posterior distributions. For Scotland at home the posterior is $D(15, 1, 15)$, and for France away it is $D(12, 2, 17)$: each is simply the prior parameters plus the observed wins, draws and losses.
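
To look at the posteriors as distributions rather than point estimates, the following sketch (assuming NumPy and `scipy.stats.dirichlet`) performs the conjugate update and reports each posterior's mean and standard deviation. The posterior means are the $\frac{h_i + \alpha_i}{h_t + \alpha_t}$ terms of the formula.

```python
import numpy as np
from scipy.stats import dirichlet

prior = np.array([1.0, 1.0, 1.0])      # uniform prior D(1, 1, 1)
scotland_home = np.array([14, 0, 14])  # wins, draws, losses
france_away = np.array([11, 1, 16])

# Conjugate update: posterior parameters = prior parameters + observed counts.
post_scotland = prior + scotland_home  # D(15, 1, 15)
post_france = prior + france_away      # D(12, 2, 17)

for name, alpha in [("Scotland (home)", post_scotland), ("France (away)", post_france)]:
    post = dirichlet(alpha)
    mean = post.mean()        # posterior mean of the (win, draw, loss) probabilities
    sd = np.sqrt(post.var())  # posterior standard deviation of each probability
    print(f"{name}: alpha={alpha.astype(int).tolist()}, "
          f"mean={np.round(mean, 3).tolist()}, sd={np.round(sd, 3).tolist()}")
```

The standard deviation of Scotland's home-win probability comes out at about 0.09, a reminder that 28 home matches pin the underlying probabilities down only loosely.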

The Multinomial-Dirichlet model suggests that the most likely outcome is a win for Scotland, with a probability of approximately 52% (51.61%), against 43.55% for a France win and 4.84% for a draw.
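
The "Implied Odds" column in the scripts is simply $1/p$. A small sketch of that conversion, using the exact probabilities for this fixture and picking out the most likely outcome (the dictionary layout is an illustrative choice):

```python
from fractions import Fraction

# Exact predictive probabilities for Scotland vs France from the formula above.
probs = {
    "Scotland Win": Fraction(16, 31),
    "Draw": Fraction(3, 62),
    "France Win": Fraction(27, 62),
}

# Fair (margin-free) decimal odds are the reciprocals of the probabilities.
odds = {outcome: 1 / p for outcome, p in probs.items()}
most_likely = max(probs, key=probs.get)

for outcome, p in probs.items():
    print(f"{outcome:<13} {float(p):7.2%} {float(odds[outcome]):6.2f}")
print("Most likely:", most_likely)
```

Because these are fair odds with no bookmaker margin, the reciprocals of the odds sum to exactly 1; a bookmaker's prices for the same three outcomes would sum to more than 1.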

It's important to note that these predictions are solely based on historical performance data and do not consider other potentially influential factors such as player injuries, current team form, weather conditions, or tactical changes.

As ever, the scripts have little or no "errr0r" handling and are only starting points for your own explorations.
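
As one possible starting point, here is a minimal input check that could run before either script. The helper `validate_record` is hypothetical; it only checks that each record has three non-negative integer counts.

```python
import numbers


def validate_record(counts, name="record"):
    """Hypothetical helper: check a (wins, draws, losses) count vector and
    return it as a list, raising ValueError on malformed input."""
    counts = list(counts)
    if len(counts) != 3:
        raise ValueError(f"{name}: expected 3 counts (wins, draws, losses), got {len(counts)}")
    for c in counts:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(c, bool) or not isinstance(c, numbers.Integral) or c < 0:
            raise ValueError(f"{name}: counts must be non-negative integers, got {c!r}")
    return counts


h = validate_record([14, 0, 14], "Scotland (home)")
a = validate_record([11, 1, 16], "France (away)")
```

Accepting `numbers.Integral` rather than only `int` lets NumPy integer counts through as well.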

Enjoy!