SportsTrader: Kullback-Leibler Divergence


Monday, February 22, 2021

Trader Probabilities Derivation And K-L Divergence (Part 3)

Returning one last time to the trader win probabilities derivation (TWPD) question posed in Trader Probabilities Derivation And K-L Divergence (Part 1) and Trader Probabilities Derivation And K-L Divergence (Part 2):

How do we create a coherent set of trader win probabilities that does not stray too far from the implied win market probabilities while taking into account our own limited and possibly vague insights?

On this occasion, we will focus on soccer and add a late-breaking injury report from social media to our own 'gut instinct' about the likely outcome.

As before, we are using our 'no-frills' Python script to access APMonitor-GEKKO:

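import json
import os
import shutil

from gekko import GEKKO

# Read the APM model file ('apm') from this script's directory.
d_path = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(d_path, 'apm')) as f:
    model = f.read()

# Solve the raw APM model locally, without the remote server.
m = GEKKO(remote=False)
m.Raw(model)
m.solve(disp=False)

print('Objective: ', m.options.OBJFCNVAL)

# Copy the solver's results to 'json' in this script's directory, then print them.
results_path = os.path.join(m.path, 'Results.json')
shutil.copy(results_path, os.path.join(d_path, 'json'))

with open(results_path) as f:
    results = json.load(f)
print(results)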

with the input file, apm:

Model 20210223_Follis_LeagueMatch

  Variables

    ! --- Public market information ---

    ! Trader win probabilities.
    ! Initial values set to market price implied probabilities.
    ! Lower-bound set to 17% - half the natural-odds implied 
    ! probability (1/3).
    ! Upper-bound set to 100%.
    home = 1.00/4.20, >=0.17, <=1.00
    away = 1.00/3.60, >=0.17, <=1.00
    draw = 1.00/1.98, >=0.17, <=1.00

    kld

  End Variables

  Equations
  
    ! Total trader win probabilities must sum to one.
    home + &
    away + &
    draw = 1.00
    

    ! --- Private trader opinions ---

    ! Gut instinct.
    draw >= 0.50
    draw <= 0.60

    ! Late Twitter away-team injury report.
    home >= 0.25
    home <= 0.30


    ! --- Combination of public and private information ---

    ! Minimize K-LD.
    kld=(1.00/4.20)*log(((1.00/4.20)/home)) + &
        (1.00/3.60)*log(((1.00/3.60)/away)) + &
        (1.00/1.98)*log(((1.00/1.98)/draw))

  End Equations

End Model
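The initial values (and the weights in the `kld` equation) are raw reciprocals of the decimal odds. A short sketch (plain Python; `implied_probabilities` is a hypothetical helper of our own) shows that they sum to slightly more than one, the market's overround, so strictly speaking they do not form a probability distribution. Dividing by the book total is one common normalization, shown here for comparison only; the model itself uses the raw values.

```python
def implied_probabilities(decimal_odds):
    """Return raw implied probabilities (1/odds), the book total, and normalized copies."""
    raw = {k: 1.0 / o for k, o in decimal_odds.items()}
    book = sum(raw.values())  # a total above 1.0 is the market's overround
    normalized = {k: p / book for k, p in raw.items()}
    return raw, book, normalized


# Decimal odds from the soccer model above.
odds = {"home": 4.20, "away": 3.60, "draw": 1.98}
raw, book, normalized = implied_probabilities(odds)

print(f"book total: {book:.4f}")
for k in odds:
    print(f"{k}: raw {raw[k]:.4f}, normalized {normalized[k]:.4f}")
```

A book total of about 1.021 means the raw prices overstate the combined chances by roughly 2%.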

and the generated output file, json:

{
  "home" : [   2.7046072411E-01],
  "away" : [   2.0172750176E-01],
  "draw" : [   5.2781177413E-01],

  "kld" : [   3.6252143687E-02]
}

Once again, as a sanity check, we can test the derived set of trader win probabilities against our constraints:

* min(home, away, draw) >= 17% [0.20173];
* 0.25 <= home <= 0.30 [0.27046];
* 0.50 <= draw <= 0.60 [0.52781].
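The same checks can be scripted. The minimal plain-Python sketch below (the helpers `kld` and `feasible`, and the 1e-6 tolerance, are our own assumptions, not part of GEKKO) evaluates the K-LD exactly as the model's `kld` equation does, confirms that the reported probabilities satisfy the constraints, and then runs a brute-force search over a 0.001 grid of the feasible region. The grid search is a simple stand-in for a solver, not the GEKKO method.

```python
import math

# Market-implied probabilities: reciprocals of the decimal odds used in the model.
P = {"home": 1.00 / 4.20, "away": 1.00 / 3.60, "draw": 1.00 / 1.98}

# Trader win probabilities as reported in the generated JSON output.
reported = {"home": 0.27046072411, "away": 0.20172750176, "draw": 0.52781177413}


def kld(p, q):
    """Sum of p * ln(p / q), mirroring the model's kld equation (natural log)."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


def feasible(q, tol=1e-6):
    """Check the model's bounds and constraints, allowing a small tolerance."""
    return (
        abs(sum(q.values()) - 1.0) <= tol
        and all(0.17 - tol <= v <= 1.00 + tol for v in q.values())
        and 0.25 - tol <= q["home"] <= 0.30 + tol
        and 0.50 - tol <= q["draw"] <= 0.60 + tol
    )


# Brute-force grid search: home and draw on a 0.001 grid, away takes the remainder.
best_q, best_d = None, math.inf
for h in range(250, 301):        # home from 0.250 to 0.300
    for d in range(500, 601):    # draw from 0.500 to 0.600
        q = {"home": h / 1000, "draw": d / 1000}
        q["away"] = 1.0 - q["home"] - q["draw"]
        if feasible(q):
            div = kld(P, q)
            if div < best_d:
                best_q, best_d = q, div

print("reported:", feasible(reported), round(kld(P, reported), 6))
print("grid best:", best_q, round(best_d, 6))
```

On this grid, the corner point home = 0.25, draw = 0.50, away = 0.25 satisfies every constraint and has a lower K-LD (about 0.0227) than the reported solution (about 0.0363). This suggests that, as listed, the model computes `kld` without declaring it as the objective; in Part 2 the corresponding variable is named `obj`, and adding a `minimize kld` line to the Equations section may be needed for the solver to actually minimize it.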

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to derive a coherent and valid set of trader win probabilities (TWPD). Historically, professional traders (e.g. Bill Benter) have 'melded' their own odds-line with the win-market prices to gain a more informed opinion. We have revised that process: we baseline on the win-market prices, add a few constraints where we believe we have additional information (e.g. late-breaking injury or weather reports from social media), and rely on our automatic method to generate the updated trader win probabilities and, indirectly, our own odds-line:

Outcome  Private odds  Public odds
Home:    3.70          4.20
Away:    4.96          3.60
Draw:    1.89          1.98
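The private odds-line is the reciprocal of each trader win probability. A short sketch (plain Python; the dictionary names are our own) computes it from the JSON output and also estimates the expected edge of backing each outcome at the public price, using the same (odds × probability) − 1 form as the edge constraints in the models.

```python
# Trader win probabilities from the generated JSON output, and public decimal odds.
trader = {"home": 0.27046072411, "away": 0.20172750176, "draw": 0.52781177413}
public = {"home": 4.20, "away": 3.60, "draw": 1.98}

# Private odds-line: the fair decimal odds implied by each trader probability.
private = {k: 1.0 / p for k, p in trader.items()}

# Expected edge per unit staked when backing at the public price.
edge = {k: public[k] * trader[k] - 1.0 for k in trader}

for k in trader:
    print(f"{k}: private {private[k]:.2f}  public {public[k]:.2f}  edge {edge[k]:+.3f}")
```

Under these probabilities, home and draw show a positive expected edge at the public price, while away does not.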

For completeness, here is an updated horse-racing model template file:

Model 20210223_14-30_RacingPark

  Variables

    ! --- Public market information ---

    ! Trader win probabilities.
    ! Initial values set to market price implied 
    ! probabilities.
    ! Lower-bound set to 6% - half the natural-odds 
    ! implied probability (1/8).
    alpha   = 1/25.00, >=0.06, <=1.00
    bravo   = 1/34.00, >=0.06, <=1.00
    charlie = 1/13.50, >=0.06, <=1.00
    delta   =  1/2.78, >=0.06, <=1.00
    echo    =  1/3.60, >=0.06, <=1.00
    foxtrot = 1/17.00, >=0.06, <=1.00
    golf    =  1/8.80, >=0.06, <=1.00
    hotel   = 1/17.50, >=0.06, <=1.00

    kld

  End Variables

  Equations
  
    ! Total trader win probabilities must sum to one.
    alpha + &
    bravo + &
    charlie + &
    delta + &
    echo + &
    foxtrot + &
    golf + &
    hotel = 1.00
    

    ! --- Private trader opinions ---

    ! Half the field have a better chance.
    (alpha + echo + foxtrot + hotel) > (bravo + charlie + delta + golf)

    ! Two horses have between 30% and 40% combined win probability.
    echo + foxtrot >= 0.30
    echo + foxtrot <= 0.40
    
    ! Between 3% and 7% edge on one horse.
    (17.00 * foxtrot) - 1.00 >= 0.03
    (17.00 * foxtrot) - 1.00 <= 0.07


    ! --- Combination of public and private information ---

    ! Minimize K-LD.
    kld=(1/25.00)*log(((1/25.00)/alpha)) + &
        (1/34.00)*log(((1/34.00)/bravo)) + &
        (1/13.50)*log(((1/13.50)/charlie)) + &
        (1/2.78)*log(((1/2.78)/delta)) + &
        (1/3.60)*log(((1/3.60)/echo)) + &
        (1/17.00)*log(((1/17.00)/foxtrot)) + &
        (1/8.80)*log(((1/8.80)/golf)) + &
        (1/17.50)*log(((1/17.50)/hotel))

    ! Declare kld as the objective so that the solver minimizes it.
    minimize kld

  End Equations

End Model
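Because an edge constraint of the form (odds × probability) − 1 is linear in the probability, it is equivalent to a simple bound on that probability. A hypothetical helper, `edge_to_probability_bounds`, makes the conversion explicit: for foxtrot at 17.00 with a 3% to 7% edge it gives roughly 6.06% to 6.29%, and for the third horse in Part 2 (5.500, at least a 5% edge) it gives 1.05 / 5.5 ≈ 0.190909, which is the value the solver returned for x3.

```python
def edge_to_probability_bounds(decimal_odds, min_edge, max_edge=None):
    """Convert min_edge <= odds * p - 1 <= max_edge into bounds on p."""
    lower = (1.0 + min_edge) / decimal_odds
    upper = None if max_edge is None else (1.0 + max_edge) / decimal_odds
    return lower, upper


# Horse-racing template: between 3% and 7% edge on foxtrot at 17.00.
fox_lo, fox_hi = edge_to_probability_bounds(17.00, 0.03, 0.07)
print(f"foxtrot: {fox_lo:.6f} <= p <= {fox_hi:.6f}")

# Part 2: at least 5% edge on the third horse at 5.500 (no upper bound).
x3_lo, _ = edge_to_probability_bounds(5.500, 0.05)
print(f"x3 >= {x3_lo:.6f}")
```

The foxtrot range lies just above the template's 6% lower bound, so for that horse the edge constraint is the tighter of the two.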

Tuesday, February 16, 2021

Trader Probabilities Derivation And K-L Divergence (Part 2)

Returning to the trader win probabilities derivation (TWPD) question we asked last time in Trader Probabilities Derivation And K-L Divergence (Part 1):

How do we derive a coherent and valid distribution of trader win probabilities that deviates as little as possible from the implied win market probabilities distribution while taking into account our own limited and possibly vague insights?

To illustrate our approach, let us assume (once again) that we have identified a horse-race with five runners that meets our WCMI trading threshold.

As previously observed, we never know the true win probabilities of individual horses. However, we almost always have some opinions to work with. For example:

* 70% chance that winner will come from one of three horses - Alpha, Bravo, or Charlie;
* 5% edge on implied win market probability for Charlie, and
* All horses have at least 7% chance of winning.

Given the implied win market probability distribution $P$ and the trader win probability distribution $Q$ on the countable set $X = \{x_1, x_2, \dots\}$ of horses in a specific race, with $P_i = P(x_i)$ and $Q_i = Q(x_i)$, the Kullback-Leibler Divergence (K-LD) is defined as $$ D_{KL}(P \| Q) = \sum_{i} P_i \log\left(\frac{P_i}{Q_i}\right) $$ and is the quantity we wish to minimize. In so doing, we ensure that, of all the distributions satisfying our constraints, the derived distribution $Q$ is the one closest to the original distribution $P$ in the K-L sense. (Strictly speaking, the K-LD is a divergence rather than a distance metric, since it is not symmetric in $P$ and $Q$.)
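As a minimal sketch of the definition (plain Python; `kl_divergence` is our own helper name), the sum can be evaluated with the natural logarithm, as the APM model's `log` does. Using the raw reciprocal-odds probabilities for $P$ and the solver's reported x1 to x5 for $Q$, it comes to about 0.12714, in line with the `obj` value in the JSON output later in this post. Terms with $P_i = 0$ are skipped, following the usual convention that $0 \log 0 = 0$.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i P_i * ln(P_i / Q_i), using the natural log like the APM model."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# P: implied win market probabilities (reciprocals of the decimal odds).
odds = [2.625, 3.250, 5.500, 6.000, 21.000]
P = [1.0 / o for o in odds]

# Q: trader win probabilities x1..x5 as reported by the solver.
Q = [0.28162476473, 0.22746615613, 0.19090908912, 0.22999992603, 0.070000063981]

print(f"D_KL(P || Q) = {kl_divergence(P, Q):.6f}")
```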

For those readers who would like an alternative to the Excel solution, we strongly recommend the excellent APMonitor-GEKKO optimization suite. Using the following 'no-frills' Python script and APM model file, we can derive the same set of trader win probabilities by minimizing the K-LD between the implied win market probability distribution and the trader distribution while meeting our additional constraints.

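import json
import os
import shutil

from gekko import GEKKO

# Read the APM model file from this script's directory.
dir_path = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(dir_path, 'APM-Gekko_TWPD_Input.apm')) as f:
    model = f.read()

# Solve the raw APM model locally, without the remote server.
m = GEKKO(remote=False)
m.Raw(model)
m.solve(disp=False)

print('Objective: ', m.options.OBJFCNVAL)

# Save the solver's results next to this script, then print them.
results_path = os.path.join(m.path, 'Results.json')
shutil.copy(results_path, os.path.join(dir_path, 'APM-Gekko_TWPD_Output.json'))

with open(results_path) as f:
    results = json.load(f)
print(results)

with the input file, APM-Gekko_TWPD_Input.apm: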

Model APM-Gekko TWPD Input
  Variables
    ! Trader win probabilities.
    ! Initial values set to win market implied probabilities.
    ! Lower-bound set to 7%.
    ! Upper-bound set to 100%.
    x1 = 1/2.625, >=0.07, <=1.00
    x2 = 1/3.250, >=0.07, <=1.00
    x3 = 1/5.500, >=0.07, <=1.00
    x4 = 1/6.000, >=0.07, <=1.00
    x5 = 1/21.000, >=0.07, <=1.00
    obj
  End Variables
  Equations
    ! Total trader win probabilities must sum to one.
    x1 + x2 + x3 + x4 + x5 = 1.00
    ! First three horses have at most 70% combined win probability.
    x1 + x2 + x3 <= 0.70
    ! Assume 5% edge on third horse.
    (5.500 * x3) - 1.00 >= 0.05
    ! Minimize K-LD.
    obj=(1/2.625)*log(((1/2.625)/x1)) + &
        (1/3.250)*log(((1/3.250)/x2)) + &
        (1/5.500)*log(((1/5.500)/x3)) + &
        (1/6.000)*log(((1/6.000)/x4)) + &
        (1/21.000)*log(((1/21.000)/x5))
  End Equations
End Model

and the output file, APM-Gekko_TWPD_Output.json, should be as follows:

{
  "time" : [0.00],
  "apmonitorgekkoequus-kldhaighexample.x1" : [   2.8162476473E-01],
  "apmonitorgekkoequus-kldhaighexample.x2" : [   2.2746615613E-01],
  "apmonitorgekkoequus-kldhaighexample.x3" : [   1.9090908912E-01],
  "apmonitorgekkoequus-kldhaighexample.x4" : [   2.2999992603E-01],
  "apmonitorgekkoequus-kldhaighexample.x5" : [   7.0000063981E-02],
  "apmonitorgekkoequus-kldhaighexample.obj" : [   1.2714140699E-01],
  "apmonitorgekkoequus-kldhaighexample.slk_2" : [ 0.00              ],
  "apmonitorgekkoequus-kldhaighexample.slk_3" : [ 0.00              ]
}

which gives exactly the same results as Excel Solver!

As a sanity check, we can (once again) test the derived set of trader win probabilities against our constraints:

* (28% + 23% + 19%) <= 70%;
* ((5.500 * 19%) - 1.000) >= 0.050 (rounding up);
* MIN(28%, 23%, 19%, 23%, 7%) >= 7%.
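The 'rounding up' is a solver-tolerance effect: with the unrounded output, 5.500 × x3 − 1 comes to about 0.04999999 and x1 + x2 + x3 to about 0.70000001, each within roughly 1e-8 of its limit. A small sketch (the 1e-6 tolerance is our own assumption; the solver's internal tolerances may differ) checks the constraints with a tolerance instead of exact comparisons.

```python
# Trader win probabilities x1..x5 from the GEKKO output.
x = [0.28162476473, 0.22746615613, 0.19090908912, 0.22999992603, 0.070000063981]
TOL = 1e-6  # assumed tolerance for comparing solver output with constraint limits

checks = {
    "sum to one": abs(sum(x) - 1.00) <= TOL,
    "x1 + x2 + x3 <= 0.70": x[0] + x[1] + x[2] <= 0.70 + TOL,
    "5% edge on x3": 5.500 * x[2] - 1.00 >= 0.05 - TOL,
    "all >= 7%": min(x) >= 0.07 - TOL,
}
for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'FAILED'}")

# Exact comparisons miss by about 1e-8, which is why rounding is needed.
print(5.500 * x[2] - 1.00 >= 0.05, x[0] + x[1] + x[2] <= 0.70)
```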

Notes:

1. Round all probabilities to zero decimal places (i.e. whole percentages). Any additional precision is irrelevant.
2. Python script and APM files are a minimum set. You will have to install additional modules (e.g. GEKKO), as required.
3. The initial run of the Python script may be quite slow (e.g. 60 seconds). Subsequent runs should take approximately five to seven seconds.

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to derive a coherent and valid set of trader win probabilities (TWPD)!

Monday, January 25, 2021

Trader Probabilities Derivation And K-L Divergence (Part 1)

For any sports event, we need to come up with a set of probabilities against which to compare the current market prices. We do this in order to identify whether or not there is at least one positive-expectation bet in that event. Obviously, the market itself reflects in its prices (and implied probabilities) the combined insights of all those who wish to trade on that specific event. Naturally, we have our own ideas as to the likely contenders, but these views may be quite vague.

So, how do we create a coherent set of probabilities that does not stray too far from the implied market probabilities while taking into account our own insights?

To illustrate our approach, let us assume that we have identified a horse-race with five runners that meets our WCMI trading threshold.

In reality, we never know the true win probabilities of individual horses. However, we almost always have some opinions to work with. For example:
* 70% chance that winner will come from one of three horses - Alpha, Bravo, or Charlie;
* 5% edge on implied market probability for Charlie, and
* All horses have at least 7% chance of winning.

First, anchoring our initial range [$H8:$H12] (set as starting values) to the set of implied market probabilities, we can use Excel Solver and Kullback-Leibler Divergence calculations to search for the set of coherent win probabilities that minimizes the divergence from the set of implied market probabilities while simultaneously satisfying our constraints.

As a sanity check, we can test the solved set of probabilities against our constraints:
* (28% + 23% + 19%) <= 70%;
* ((5.500 * 19%) - 1.000) >= 0.050 (rounding up);
* MIN(28%, 23%, 19%, 23%, 7%) >= 7%.

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to produce a coherent and valid set of win probabilities!

Note that the market odds, trader probabilities, and constraints are not necessarily realistic!

Sunday, August 16, 2015

Doubling Rate Entropy And Kullback-Leibler Divergence

Cover and Thomas (2006) show that, in a horse race, a handicapper who bets in proportion to their estimated win probabilities has an expected wealth growth-rate equal to that of an investor who wins every race, minus the entropy of the race (a measure of its uncertainty), minus the Kullback-Leibler divergence between the distribution of true win probabilities and the handicapper's estimated distribution. Intuitively, this makes sense. To reduce the race uncertainty, we should focus on open betting markets with as few runners as possible. To reduce the divergence, we should meld our betting-line estimates with those of the crowd (Benter, 2004). In terms of a simplistic equation:

Long-Term Profit = Clairvoyance – Race Uncertainty – Market Divergence.
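The equation above is a plain-language reading of the doubling-rate decomposition in Cover and Thomas for a bettor who spreads all wealth across the runners in fractions $b_i$ at decimal odds $o_i$, when the true win probabilities are $p_i$: $$ W(b, p) = \sum_i p_i \log o_i - H(p) - D(p \| b) $$ The first term is the growth rate of an investor who always backs the winner (clairvoyance), $H(p)$ is the entropy of the race (race uncertainty), and $D(p \| b)$ is the K-L divergence between the true and estimated distributions (market divergence). The short sketch below checks the identity numerically; the three-runner probabilities and odds are made up purely for illustration.

```python
import math

# Hypothetical three-runner race: true probabilities p, our bet fractions b, decimal odds o.
p = [0.50, 0.30, 0.20]
b = [0.45, 0.35, 0.20]
o = [2.10, 3.40, 5.50]

# Doubling rate (bits per race) when staking fraction b_i of wealth on runner i.
W = sum(pi * math.log2(bi * oi) for pi, bi, oi in zip(p, b, o))

clairvoyance = sum(pi * math.log2(oi) for pi, oi in zip(p, o))            # always back the winner
race_uncertainty = -sum(pi * math.log2(pi) for pi in p)                   # entropy H(p)
market_divergence = sum(pi * math.log2(pi / bi) for pi, bi in zip(p, b))  # D(p || b)

print(f"W = {W:.6f}")
print(f"clairvoyance - uncertainty - divergence = "
      f"{clairvoyance - race_uncertainty - market_divergence:.6f}")
```

Since $H(p) \le \log_2 n$ for $n$ runners and $D(p \| b) \ge 0$, fewer runners and estimates closer to the true probabilities both leave more of the clairvoyance term intact.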

My experience is that most handicappers focus too much on trying to become clairvoyant and not enough on selecting open races and factoring in the “wisdom of crowds”.