Monday, February 22, 2021

Trader Probabilities Derivation And K-L Divergence (Part 3)

Returning one last time to the trader win probabilities derivation (TWPD) question posed in Trader Probabilities Derivation And K-L Divergence (Part 1) and Trader Probabilities Derivation And K-L Divergence (Part 2):

How do we create a coherent set of trader win probabilities that does not stray too far from the implied win market probabilities while taking into account our own limited and possibly vague insights?

On this occasion, we will focus on soccer and add a late-breaking injury report from social media to our own 'gut instinct' about the likely outcome.

As before, we are using our 'no-frills' Python script to access APMonitor-GEKKO:

import json
import os
import shutil

from gekko import GEKKO

# Read the APM model file that sits next to this script.
d_path = os.path.dirname(os.path.realpath(__file__))
with open(os.path.join(d_path, 'apm')) as file:
    model = file.read()

m = GEKKO(remote=False)

m.Raw(model)

m.solve(disp=False)

print('Objective: ', m.options.OBJFCNVAL)

shutil.copy(m.path+'/Results.json', d_path + '/json')

with open(m.path+'/Results.json') as f:
    results = json.load(f)
print(results)

with the input - apm:

Model 20210223_Follis_LeagueMatch

  Variables

    ! --- Public market information ---

    ! Trader win probabilities.
    ! Initial values set to market price implied probabilities.
    ! Lower-bound set to 17% - half the natural-odds implied 
    ! probability (1/3).
    ! Upper-bound set to 100%.
    home = 1.00/4.20, >=0.17, <=1.00
    away = 1.00/3.60, >=0.17, <=1.00
    draw = 1.00/1.98, >=0.17, <=1.00

    kld

  End Variables

  Equations
  
    ! Total trader win probabilities must sum to one.
    home + &
    away + &
    draw = 1.00
    

    ! --- Private trader opinions ---

    ! Gut instinct.
    draw >= 0.50
    draw <= 0.60

    ! Late Twitter away-team injury report.
    home >= 0.25
    home <= 0.30


    ! --- Combination of public and private information ---

    ! Minimize K-LD.
    kld=(1.00/4.20)*log(((1.00/4.20)/home)) + &
        (1.00/3.60)*log(((1.00/3.60)/away)) + &
        (1.00/1.98)*log(((1.00/1.98)/draw))

  End Equations

End Model

and the generated output - json:

{
  "home" : [   2.7046072411E-01],
  "away" : [   2.0172750176E-01],
  "draw" : [   5.2781177413E-01],

  "kld" : [   3.6252143687E-02]
}

Once again, as a sanity check, we can test the derived set of trader win probabilities against our constraints:

min(home, away, draw) >= 17% [0.20173];
0.25 <= home <= 0.30 [0.27046];
0.50 <= draw <= 0.60 [0.52781].
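The same check can be sketched in plain Python. This minimal version recomputes the K-LD exactly as the model writes it (natural log, market probabilities taken as 1/odds without normalization) and tests each constraint. As a reference point it also evaluates the corner allocation home = 0.25, away = 0.25, draw = 0.50, which satisfies every constraint too. The helper names and the tolerance are illustrative assumptions, not part of the original script.

```python
import math

# Market decimal odds and solver output, copied from the model and JSON above.
market_odds = {"home": 4.20, "away": 3.60, "draw": 1.98}
solved = {"home": 0.27046072411, "away": 0.20172750176, "draw": 0.52781177413}

# Reference point (an assumption for comparison only): a corner of the
# feasible region that also satisfies every constraint in the model.
corner = {"home": 0.25, "away": 0.25, "draw": 0.50}


def kld(odds, q):
    """K-LD as written in the model: sum of p * ln(p / q) with p = 1 / odds."""
    return sum((1.0 / o) * math.log((1.0 / o) / q[k]) for k, o in odds.items())


def feasible(q, tol=1e-6):
    """Check the sum-to-one, lower-bound, gut-instinct and injury constraints."""
    return (
        abs(sum(q.values()) - 1.0) <= tol
        and min(q.values()) >= 0.17 - tol
        and 0.25 - tol <= q["home"] <= 0.30 + tol
        and 0.50 - tol <= q["draw"] <= 0.60 + tol
    )


for name, q in (("solved", solved), ("corner", corner)):
    print(name, "feasible:", feasible(q), "kld:", round(kld(market_odds, q), 6))
```

If the corner point scores a lower K-LD than the solver output, the solve has most likely returned a feasible point rather than the minimizer. In that case, adding an explicit `minimize kld` statement to the Equations section and re-running would be the first thing to try.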

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to derive a coherent and valid set of trader win probabilities (TWPD). Historically, professional traders (e.g. Bill Benter) have 'melded' their own odds-line with the win-market prices to gain a more informed opinion. We have revised that process by baselining on the win-market prices and adding a few constraints where we believe we have additional information (e.g. late-breaking injury or weather reports from social media), while relying on our automatic method to generate the updated trader win probabilities and, indirectly, our own odds-line:

Outcome  Private   Public
Home:    3.70      4.20
Away:    4.96      3.60
Draw:    1.89      1.98
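The private odds-line is simply the reciprocal of each trader win probability. A minimal sketch of that conversion follows; the edge column is an addition for illustration, computed as (market odds * trader probability) - 1.

```python
# Trader win probabilities from the solver output above.
trader_probs = {"home": 0.27046072411, "away": 0.20172750176, "draw": 0.52781177413}
market_odds = {"home": 4.20, "away": 3.60, "draw": 1.98}

# Fair decimal odds are the reciprocal of the win probability.
private_odds = {k: 1.0 / p for k, p in trader_probs.items()}

for outcome, odds in private_odds.items():
    # Edge at the market price: (market odds * trader probability) - 1.
    edge = market_odds[outcome] * trader_probs[outcome] - 1.0
    print(f"{outcome}: private {odds:.2f}, public {market_odds[outcome]:.2f}, edge {edge:+.3f}")
```

A positive edge marks an outcome where the market price is longer than our private price.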

For completeness, here is an updated horse-racing model template file:

Model 20210223_14-30_RacingPark

  Variables

    ! --- Public market information ---

    ! Trader win probabilities.
    ! Initial values set to market price implied 
    ! probabilities.
    ! Lower-bound set to 6% - half the natural-odds 
    ! implied probability (1/8).
    alpha   = 1/25.00, >=0.06, <=1.00
    bravo   = 1/34.00, >=0.06, <=1.00
    charlie = 1/13.50, >=0.06, <=1.00
    delta   =  1/2.78, >=0.06, <=1.00
    echo    =  1/3.60, >=0.06, <=1.00
    foxtrot = 1/17.00, >=0.06, <=1.00
    golf    =  1/8.80, >=0.06, <=1.00
    hotel   = 1/17.50, >=0.06, <=1.00

    kld

  End Variables

  Equations
  
    ! Total trader win probabilities must sum to one.
    alpha + &
    bravo + &
    charlie + &
    delta + &
    echo + &
    foxtrot + &
    golf + &
    hotel = 1.00
    

    ! --- Private trader opinions ---

    ! Half the field have a better chance.
    (alpha + echo + foxtrot + hotel) > (bravo + charlie + delta + golf)

    ! Two horses have between 30% and 40% combined win probability.
    echo + foxtrot >= 0.30
    echo + foxtrot <= 0.40
    
    ! Between 3% and 7% edge on one horse.
    (17.00 * foxtrot) - 1.00 >= 0.03
    (17.00 * foxtrot) - 1.00 <= 0.07


    ! --- Combination of public and private information ---

    ! Minimize K-LD.
    kld=(1/25.00)*log(((1/25.00)/alpha)) + &
        (1/34.00)*log(((1/34.00)/bravo)) + &
        (1/13.50)*log(((1/13.50)/charlie)) + &
        (1/2.78)*log(((1/2.78)/delta)) + &
        (1/3.60)*log(((1/3.60)/echo)) + &
        (1/17.00)*log(((1/17.00)/foxtrot)) + &
        (1/8.80)*log(((1/8.80)/golf)) + &
        (1/17.50)*log(((1/17.50)/hotel))

    ! Explicit objective statement for the K-LD variable defined above.
    minimize kld

  End Equations

End Model

Tuesday, February 16, 2021

Trader Probabilities Derivation And K-L Divergence (Part 2)

Returning to the trader win probabilities derivation (TWPD) question we asked last time in Trader Probabilities Derivation And K-L Divergence (Part 1):

How do we derive a coherent and valid distribution of trader win probabilities that deviates as little as possible from the implied win market probabilities distribution while taking into account our own limited and possibly vague insights?

To illustrate our approach, let us assume (once again) that we have identified a horse-race with five runners that meets our WCMI trading threshold.

As previously observed, we never know the true win probabilities of individual horses. However, we almost always have some opinions to work with. For example:

70% chance that winner will come from one of three horses - Alpha, Bravo, or Charlie;
5% edge on implied win market probability for Charlie, and
All horses have at least 7% chance of winning.

Given the implied win market probability distribution $P$ and the trader win probability distribution $Q$ on the countable set $X = \{x_1, x_2, \ldots\}$ of horses in a specific race, with $P_i = P(x_i)$ and $Q_i = Q(x_i)$, the Kullback-Leibler Divergence (K-LD) is defined as $$ D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log\left(\frac{P(x)}{Q(x)}\right) $$ and is the metric we wish to minimize. In so doing, we ensure that the derived distribution $Q$ is as close as possible to the original distribution $P$ while still satisfying our constraints.
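As a reference point, consider the case where the only constraint is that $Q$ sums to one. A short Lagrangian argument, sketched here with natural logs and a multiplier $\lambda$ (notation introduced purely for illustration), suggests that the minimizer is the normalized market distribution:

$$ \mathcal{L}(Q, \lambda) = \sum_i P_i \log\left(\frac{P_i}{Q_i}\right) + \lambda \left(\sum_i Q_i - 1\right), \qquad \frac{\partial \mathcal{L}}{\partial Q_i} = -\frac{P_i}{Q_i} + \lambda = 0 \;\Rightarrow\; Q_i = \frac{P_i}{\sum_j P_j} $$

Any trader constraint that this normalized distribution violates then pulls $Q$ away from it, at the cost of a larger K-LD.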

For those readers who would like an alternative to the Excel solution, we can strongly recommend the excellent APMonitor-GEKKO optimization suite. Using the following 'no-frills' Python script and APM model file, we can derive the same set of trader win probabilities by minimizing the K-LD from the implied win market probabilities distribution while meeting our additional constraints.

import json
import os
import shutil

from gekko import GEKKO

# Read the APM model file from the current working directory.
with open('APM-Gekko_TWPD_Input.apm') as file:
    model = file.read()

m = GEKKO(remote=False)

m.Raw(model)

m.solve(disp=False)

print('Objective: ', m.options.OBJFCNVAL)

dir_path = os.path.dirname(os.path.realpath(__file__))
shutil.copy(m.path+'/Results.json', dir_path + '/APM-Gekko_TWPD_Output.json')

with open(m.path+'/Results.json') as f:
    results = json.load(f)
print(results)

Model APM-Gekko TWPD Input
  Variables
    ! Trader win probabilities.
    ! Initial values set to win market implied probabilities.
    ! Lower-bound set to 7%.
    ! Upper-bound set to 100%.
    x1 = 1/2.625, >=0.07, <=1.00
    x2 = 1/3.250, >=0.07, <=1.00
    x3 = 1/5.500, >=0.07, <=1.00
    x4 = 1/6.000, >=0.07, <=1.00
    x5 = 1/21.000, >=0.07, <=1.00
    obj
  End Variables
  Equations
    ! Total trader win probabilities must sum to one.
    x1 + x2 + x3 + x4 + x5 = 1.00
    ! First three horses have approximately 70% combined win probability.
    x1 + x2 + x3 <= 0.70
    ! Assume 5% edge on third horse.
    (5.500 * x3) - 1.00 >= 0.05
    ! Minimize K-LD.
    obj=(1/2.625)*log(((1/2.625)/x1)) + &
        (1/3.250)*log(((1/3.250)/x2)) + &
        (1/5.500)*log(((1/5.500)/x3)) + &
        (1/6.000)*log(((1/6.000)/x4)) + &
        (1/21.000)*log(((1/21.000)/x5))
  End Equations
End Model

and the output - APM-Gekko_TWPD_Output.json - should be, as follows:

{
  "time" : [0.00],
  "apmonitorgekkoequus-kldhaighexample.x1" : [   2.8162476473E-01],
  "apmonitorgekkoequus-kldhaighexample.x2" : [   2.2746615613E-01],
  "apmonitorgekkoequus-kldhaighexample.x3" : [   1.9090908912E-01],
  "apmonitorgekkoequus-kldhaighexample.x4" : [   2.2999992603E-01],
  "apmonitorgekkoequus-kldhaighexample.x5" : [   7.0000063981E-02],
  "apmonitorgekkoequus-kldhaighexample.obj" : [   1.2714140699E-01],
  "apmonitorgekkoequus-kldhaighexample.slk_2" : [ 0.00              ],
  "apmonitorgekkoequus-kldhaighexample.slk_3" : [ 0.00              ]
}

which gives us exactly the same results as with Excel Solver!

As a sanity check, we can (once again) test the derived set of trader win probabilities against our constraints:

(28% + 23% + 19%) <= 70%;
((5.500 * 19%) - 1.000) >= 0.050 (rounding up);
MIN(28%, 23%, 19%, 23%, 7%) >= 7%.
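For readers who prefer to test the unrounded figures, the checks above can be sketched in a few lines of Python. The tolerance is an assumption, chosen to allow for solver precision, and the variable names are illustrative.

```python
import math

# Market decimal odds and solver output, copied from the model and JSON above.
market_odds = [2.625, 3.250, 5.500, 6.000, 21.000]
q = [0.28162476473, 0.22746615613, 0.19090908912, 0.22999992603, 0.070000063981]

p = [1.0 / o for o in market_odds]  # implied win market probabilities
tol = 1e-6  # solver tolerance assumed for this check

checks = {
    "sums to one": abs(sum(q) - 1.0) <= tol,
    "first three <= 70%": q[0] + q[1] + q[2] <= 0.70 + tol,
    "5% edge on third horse": market_odds[2] * q[2] - 1.0 >= 0.05 - tol,
    "all at least 7%": min(q) >= 0.07 - tol,
}

# Recompute the objective exactly as the model defines it.
objective = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

for label, ok in checks.items():
    print(f"{label}: {ok}")
print("objective:", round(objective, 6))
```

The recomputed objective should agree with the `obj` value in the JSON output to within solver precision.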

Notes:

1. Round all probabilities to zero decimal places (whole percentages). Any additional precision is irrelevant.
2. Python script and APM files are a minimum set. You will have to install additional modules (e.g. GEKKO), as required.
3. Initial run of Python script may be quite slow (e.g. 60 secs.). Subsequent runs should be approximately five to seven seconds.

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to derive a coherent and valid set of trader win probabilities (TWPD)!

Monday, January 25, 2021

Trader Probabilities Derivation And K-L Divergence (Part 1)

For any sports event, we need to come up with a set of probabilities against which to match the current market prices. We do this in order to identify whether or not we have at least one positive expectation in that event. Obviously, the market itself reflects in its prices (and implied probabilities) the combined insights of all those who wish to trade on that specific event. Naturally, we have our own ideas as to the likely contenders but these views may be quite vague.

So, how do we create a coherent set of probabilities that does not stray too far from the implied market probabilities while taking into account our own insights?

To illustrate our approach, let us assume that we have identified a horse-race with five runners that meets our WCMI trading threshold.

In reality, we never know the true win probabilities of individual horses. However, we almost always have some opinions to work with. For example:
* 70% chance that winner will come from one of three horses - Alpha, Bravo, or Charlie;
* 5% edge on implied market probability for Charlie, and
* All horses have at least 7% chance of winning.
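An opinion expressed as an edge translates directly into a lower bound on win probability. A minimal sketch follows, assuming edge is defined as (decimal odds * probability) - 1 and taking Charlie's market price of 5.50 from the model file in Part 2; the function name is hypothetical.

```python
def min_probability_for_edge(decimal_odds, edge):
    """Smallest win probability p satisfying (decimal_odds * p) - 1 >= edge."""
    return (1.0 + edge) / decimal_odds


charlie_odds = 5.50  # market price assumed from the Part 2 model file
lower_bound = min_probability_for_edge(charlie_odds, 0.05)
print(f"Charlie needs at least {lower_bound:.4f} win probability for a 5% edge")
```

The other two opinions need no conversion: they are already stated as probabilities.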

First, anchoring our initial range [$H8:$H12] (set as starting values) to the set of implied market probabilities, we can use Excel Solver and Kullback-Leibler Divergence calculations to search for the set of coherent win probabilities that minimizes the distance from the set of implied market probabilities while simultaneously satisfying our constraints.

As a sanity check, we can test the solved set of probabilities against our constraints:
* (28% + 23% + 19%) <= 70%;
* ((5.500 * 19%) - 1.000) >= 0.050 (rounding up);
* MIN(28%, 23%, 19%, 23%, 7%) >= 7%.

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to produce a coherent and valid set of win probabilities!

Note that the market odds, trader probabilities, and constraints are not necessarily realistic!

Wednesday, December 23, 2020

Total Less Than Sum Of Parts

Total Less Than Sum Of Parts - Simultaneous Events

For Nx(AvB) ('multiple events, single selection') scenarios, such as simultaneous Sunday NFL games, we cannot just stake them as N separate events, as this could tie up a large portion of our bankroll.

Let us assume that we have lucked out and three 'home-dogs' are offered at unbelievable odds, as follows:

Treating them as simultaneous events gives us a total stake of 43% (approx.) of bankroll at Full-Kelly, but treating them as three independent events would require a total outlay of 80% (approx.).

Note that all stakes were calculated using the excellent SBR Kelly Calculator. Always keep in mind the specific advice given in Kelly's Multiple Personality Disorder, which outlines the differing incarnations of Kelly Staking depending on the context! Win percentages and odds in the above example are not necessarily realistic for these types of events. That said, last weekend, Betfair offered moneyline odds of 15.00 against the New York Jets winning away to the Los Angeles Rams for amounts any 'Weekend-Warrior' would have happily staked - assuming they estimated the Jets had better than a 7% (approx.) chance of winning!

In sum, if you are trading simultaneous events then the total stake should be less than the sum of the individual single stakes!
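The effect can be illustrated with a small sketch. It uses hypothetical inputs (two independent bets, each a 50% chance at decimal odds of 3.00), not the figures from the example above, and a coarse grid search that maximizes expected log growth rather than the SBR calculator's own method.

```python
import math
from itertools import product

# Hypothetical inputs (not the figures from the post): two independent bets,
# each with a 50% win probability at decimal odds of 3.00.
bets = [(0.50, 3.00), (0.50, 3.00)]


def single_kelly(p, decimal_odds):
    """Full-Kelly fraction for one bet: edge divided by net odds."""
    b = decimal_odds - 1.0
    return (p * decimal_odds - 1.0) / b


def expected_log_growth(stakes):
    """Expected log of bankroll after all bets settle simultaneously."""
    total = 0.0
    for wins in product((True, False), repeat=len(bets)):
        prob, wealth = 1.0, 1.0
        for (p, odds), f, won in zip(bets, stakes, wins):
            prob *= p if won else 1.0 - p
            wealth += f * (odds - 1.0) if won else -f
        total += prob * math.log(wealth)
    return total


# Coarse grid search in 1% steps; stakes are capped so wealth stays positive.
grid = [i / 100 for i in range(0, 50)]
best = max(product(grid, repeat=len(bets)), key=expected_log_growth)

separate_total = sum(single_kelly(p, o) for p, o in bets)
print("simultaneous stakes:", best, "total:", round(sum(best), 2))
print("separate Kelly total:", round(separate_total, 2))
```

The grid search is deliberately crude; a proper optimizer would be needed for more than a handful of events.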

Thursday, November 26, 2020

Some AvB Events Are AvK Events In Disguise

Many AvB contests (MLB, NBA, and NFL moneyline markets) are exactly what they appear to be - simple win-lose events. But, other AvB contests (soccer win markets) are actually AvK contests in disguise - there are three valid outcomes (win, lose, and draw). This turns a 'single event, single selection' contest into a possible 'single event, multiple selections' one - Kelly's Multiple Personality Disorder.

Treating this soccer match as a single event with three exclusive outcomes leads to a combined investment of 5.93% of bankroll on both the draw and away-win outcomes.

Alternatively, treating the draw and away-win outcomes as separate selections leads to an investment of 3.75% on the draw outcome and 1.00% on the away-win outcome, for a total of 4.75%.

Given your assumed edge relative to the market, this amounts to 'leaving money on the table', in Kelly terms!

All calculations can be replicated using the excellent SBR Calculator. Win percentages are not necessarily realistic for this specific event.

Thursday, October 29, 2020

Shannon-Fano Crowd Handicapping

Let us indulge ourselves in a thought experiment on how we might handicap the Crowd!

We have access to a betting-line (Betfair) for a graded-stakes race - with a low WCMI - as well as some simple, publicly-available data. Can we reverse-engineer what the Crowd is most likely factoring into its calculation?

Granted, this is no Schrödinger's Cat, but it might nevertheless enlighten us as to whether or not the betting-line is vulnerable.

As outlined in Handicapping Twenty Questions Benford's Law And Shannon Entropy, taking our lead from Shannon-Fano Coding, we should iteratively divide the entrants into two groups of approximately equal total win probability (i.e. 50%) and use Pairwise Comparison to eliminate the non-contenders using at most four questions.

When the sub-divisions produced by the splits are approximately equal (50%) in terms of implied probability, then the one bit of information (question) used to distinguish them is maximally efficient. So, using the implied probability (I/P) of the betting-line odds (B/X) as our starting point, we can make an initial split into two groups:

Alpha, Bravo; and
Charlie, Delta, Echo, Foxtrot, Golf, Hotel.

Keeping our interpretation as simple as possible, it looks like the initial division is based on speed ratings. Then, in deciding between Alpha and Bravo, trainer rating appears to clinch it.

The next sub-division is:

Charlie, Delta; and
Echo, Foxtrot, Golf, Hotel.

Here, form ratings are the most likely rationale for the split with trainer rating again deciding the rank order within the group.

Next, we split the four remaining horses:

Echo, Foxtrot; and
Golf, Hotel.

Weight (proxy for fillies allowance) is the deciding factor here, but it is not possible to easily account for the final rank orders within these two groups. We have reached the limits of our simplistic approach. Obviously, the betting-line accounts for more factors than our naive approach uses. But, just because our model is wrong does not mean it is not useful!

In summary, speed ratings appear to be the primary driving factor for the betting-line with trainer rating as the qualifier. Given the likely high correlation between speed and form ratings, the fact that both are used suggests an element of double-counting by the Crowd and, consequently, may indicate a vulnerable betting-line.
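The iterative splitting described above can be sketched in a few lines of Python. The odds below are hypothetical, chosen only to illustrate the mechanics, and the function names are our own. The split rule is the standard Shannon-Fano one: sort by descending implied probability and cut where the two halves are closest in total.

```python
def shannon_fano_split(runners):
    """Split runners (sorted by descending probability) into two groups
    whose total implied probabilities are as close to equal as possible."""
    ordered = sorted(runners, key=lambda r: r[1], reverse=True)
    total = sum(p for _, p in ordered)
    best_index, best_gap, running = 1, float("inf"), 0.0
    for i in range(1, len(ordered)):
        running += ordered[i - 1][1]
        gap = abs(running - (total - running))
        if gap < best_gap:
            best_index, best_gap = i, gap
    return ordered[:best_index], ordered[best_index:]


def split_tree(runners, depth=0):
    """Recursively split until single runners remain, printing each group."""
    names = ", ".join(name for name, _ in runners)
    print("  " * depth + names)
    if len(runners) > 1:
        left, right = shannon_fano_split(runners)
        split_tree(left, depth + 1)
        split_tree(right, depth + 1)


# Hypothetical decimal odds for an eight-runner field (not the post's data).
odds = {"Alpha": 3.5, "Bravo": 4.5, "Charlie": 8.0, "Delta": 9.0,
        "Echo": 15.0, "Foxtrot": 17.0, "Golf": 26.0, "Hotel": 34.0}
field = [(name, 1.0 / o) for name, o in odds.items()]
split_tree(field)
```

Each level of indentation in the printed tree corresponds to one yes/no question about the field.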

Is Schrödinger's cat alive, dead or both?

Wednesday, September 30, 2020

Variable Weights - Entropy Method

In horse-racing, using fundamental handicapping, we try to derive predictor variables from mining past-performance data. As ever, our starting-point is to ask Bill Benter's fundamental question of handicapping:

What additional variables (if any) explain a significant proportion of the variance in results to date that is not already accounted for by the public odds (Wisdom of Crowds)?

Assuming that we have already identified a number of such variables that appear to influence the outcome of races, how do we weight those variables? Do we weight them separately for different codes (Flat, Jumps), different types (Maiden, Handicap), or different distances (Sprints, Routes) of races?

Obviously, we could use some form of Regression Analysis to derive the necessary weights but, perhaps, a simpler option presents itself! In the Multiple Criteria Decision Analysis process TOPSIS, the Entropy Weight method is used to objectively derive criteria (variables) weights based on the dispersion of scores across the alternatives being analysed. Translating into a handicapping scenario, the underlying assumption of this method is that the greater the difference in scores for contestants across multiple criteria, the greater the difference in predicted outcome for some future event! In other words, we are operationalizing the belief that it is the differences between horses on some key variables and not their similarities (or the differences between race codes, types, distances, and so on) that best determines the winner. Also, all races generate their own unique set of weights and there can be a mixture of positive (1,3,4,5,6) and negative (2,7) weights.

This method has some limitations (particularly relating to scores of zero and entropy values close to one). A number of solutions have been recommended to resolve these issues and the following approach shows promise - New Entropy Weight-Based TOPSIS for Evaluation of Multi-objective Job-Shop Scheduling Solutions.
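A minimal sketch of the basic Entropy Weight calculation follows, using hypothetical scores. It treats 0 * ln(0) as 0, which is one common workaround for zero scores and is assumed here rather than taken from the paper cited above. This basic form yields non-negative weights only, so the sign of any cost-type variable would need separate handling.

```python
import math


def entropy_weights(scores):
    """Entropy Weight method for a matrix of non-negative scores
    (rows = horses, columns = variables). Returns one weight per column."""
    n_rows, n_cols = len(scores), len(scores[0])
    k = 1.0 / math.log(n_rows)
    divergences = []
    for j in range(n_cols):
        column = [row[j] for row in scores]
        total = sum(column)
        # Share of each horse in the column; 0 * ln(0) is treated as 0
        # (one common workaround for zero scores, assumed here).
        shares = [x / total for x in column]
        entropy = -k * sum(s * math.log(s) for s in shares if s > 0)
        divergences.append(1.0 - entropy)
    norm = sum(divergences)
    return [d / norm for d in divergences]


# Hypothetical scores for four horses on three variables.
scores = [
    [80, 10, 5],
    [82, 10, 1],
    [79, 10, 9],
    [81, 10, 3],
]
weights = entropy_weights(scores)
print([round(w, 4) for w in weights])
```

The second variable, on which every horse scores the same, receives essentially zero weight, while the most dispersed variable receives the largest weight.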

Friday, July 24, 2020

WASP Trainers

The single most important influence on a horse's performance is the trainer. Current stable form can help explain some of the strange race results we observe daily. To that end, WASP (Winners Above Starting Price) can help us stay current with how in-form trainers are on a rolling weekly, fortnightly, or monthly basis.

Using a historical database of past performances, calculate wins per starting-price using the Juvenile Finish Position Ratings algorithm.
For each yard:
- For each race in past performances during period
  - Calculate number of actual wins using Juvenile Finish Position Ratings algorithm.
  - Calculate number of expected wins using wins per starting-price.
- Sum over all races and calculate both actual and expected win percentages.
- Subtract expected win percentage from actual win percentage to calculate WASP.

Note: Numbers for illustrative purposes only.

The calculated difference tells us (if positive) that the stable is ahead of the market in terms of percentage of opponents beaten or conversely (if negative) that the stable is behind the market.

Thus, WASP informs us of both current stable form and stable value in a single number.
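The bookkeeping can be sketched as follows. This is a simplification under stated assumptions: it counts plain wins and uses the starting-price implied probability (1 / decimal SP) as the expected win rate, standing in for the Juvenile Finish Position Ratings algorithm and the historical wins-per-starting-price table, neither of which is reproduced here. All records are hypothetical.

```python
from collections import defaultdict

# Hypothetical past-performance records: (yard, decimal starting price, won).
runs = [
    ("Yard A", 3.0, True), ("Yard A", 5.0, False), ("Yard A", 11.0, True),
    ("Yard A", 2.0, False), ("Yard B", 4.0, False), ("Yard B", 6.0, False),
    ("Yard B", 2.5, True), ("Yard B", 9.0, False),
]


def wasp_by_yard(records):
    """Actual win % minus expected win % for each yard over the period."""
    actual = defaultdict(float)
    expected = defaultdict(float)
    runners = defaultdict(int)
    for yard, sp, won in records:
        actual[yard] += 1.0 if won else 0.0
        # Stand-in for 'wins per starting-price': the SP implied probability.
        expected[yard] += 1.0 / sp
        runners[yard] += 1
    return {
        yard: 100.0 * (actual[yard] - expected[yard]) / runners[yard]
        for yard in runners
    }


for yard, wasp in wasp_by_yard(runs).items():
    print(f"{yard}: WASP {wasp:+.1f}%")
```

Restricting `runs` to a rolling window (week, fortnight, or month) gives the corresponding view of current stable form.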

Thursday, June 25, 2020

Longshot Stakes: Probability Or Edge

Notwithstanding the specific advice outlined in Kelly's Multiple Personality Disorder and Kelly And Mutually-Exclusive Outcomes relating to AvK events, consider an idealized horse-racing scenario where you have identified two selections: High Expectations at 2/1 with a 40% win probability and In With A Chance at 20/1 with a 10% chance of winning. Assume further that you are planning to bet ¤50 (Bankroll: ¤500) on High Expectations. How much should you bet on In With A Chance?

Win Probability Stakes

Selection	S/P	Win%	Edge	Stake	Profit
High Expectations	2/1	40%	0.20	¤50.00	¤100.00
In With A Chance	20/1	10%	1.10	¤12.50	¤250.00

Edge Stakes

Selection	S/P	Win%	Edge	Stake	Profit
High Expectations	2/1	40%	0.20	¤50.00	¤100.00
In With A Chance	20/1	10%	1.10	¤27.50	¤550.00

If your answer is ¤12.50, then your handicapping is driven by win probability as High Expectations (40%) is four times more likely to win than In With A Chance (10%). Alternatively, if your answer is ¤27.50, then your handicapping is driven by edge as In With A Chance (1.10) has 5.5 times more edge than High Expectations (0.20).

The Kelly Criterion advises that you choose the stake so that the amount you win is proportional to your edge. Most punters choose stakes based on win probability and, as a result, they are not exploiting their advantage and are 'leaving money on the table'!
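The two tables can be reproduced with a short sketch. It assumes the single-bet full-Kelly formula (stake = bankroll * edge / net odds), which matches the ¤50.00 and ¤27.50 figures above; the function names are illustrative.

```python
bankroll = 500.0

# (name, fractional odds to one, win probability) from the tables above.
selections = [("High Expectations", 2.0, 0.40), ("In With A Chance", 20.0, 0.10)]


def edge(net_odds, p):
    """Expected profit per unit staked: p * (net_odds + 1) - 1."""
    return p * (net_odds + 1.0) - 1.0


def kelly_stake(net_odds, p, bank):
    """Full-Kelly stake for a single bet: bank * edge / net odds."""
    return bank * edge(net_odds, p) / net_odds


base_name, base_odds, base_p = selections[0]
base_stake = kelly_stake(base_odds, base_p, bankroll)

for name, net_odds, p in selections:
    # Stake scaled by win probability relative to the first selection.
    by_probability = base_stake * p / base_p
    # Stake from the Kelly formula, which makes profit proportional to edge.
    by_edge = kelly_stake(net_odds, p, bankroll)
    print(f"{name}: probability stake {by_probability:.2f}, "
          f"edge stake {by_edge:.2f}, edge profit {by_edge * net_odds:.2f}")
```

With the edge stakes, the profit on each selection is the bankroll multiplied by its edge, which is why the two profits stand in the same 5.5-to-1 ratio as the edges.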