Tuesday, February 16, 2021

Trader Probabilities Derivation And K-L Divergence (Part 2)

Returning to the trader win probabilities derivation (TWPD) question we asked last time in Trader Probabilities Derivation And K-L Divergence (Part 1):

How do we derive a coherent and valid distribution of trader win probabilities that deviates as little as possible from the implied win market probabilities distribution while taking into account our own limited and possibly vague insights?

To illustrate our approach, let us assume (once again) that we have identified a horse-race with five runners that meets our WCMI trading threshold.

As previously observed, we never know the true win probabilities of individual horses. However, we almost always have some opinions to work with. For example:

  • 70% chance that winner will come from one of three horses - Alpha, Bravo, or Charlie;
  • 5% edge on implied win market probability for Charlie, and
  • All horses have at least 7% chance of winning.

Given the implied win market probability distribution \(P\) and the trader win probability distribution \(Q\) on the countable set \(X = \{x_1, x_2, \ldots\}\) of horses in a specific race, with \(P_i = P(x_i)\) and \(Q_i = Q(x_i)\), the Kullback-Leibler Divergence (K-LD) is defined as

$$ D_{KL}(P \| Q) = \sum_{x \in X} P(x) \log\left(\frac{P(x)}{Q(x)}\right) $$

and is the metric we wish to minimize. In so doing, we ensure that the derived distribution \(Q\) will be as close as possible to the original distribution \(P\), subject to the constraints our opinions impose.
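To make the definition concrete, here is a minimal Python sketch of the K-LD sum. The function name `kl_divergence` and the uniform candidate distribution are illustrative assumptions, not part of the original workflow. The decimal odds are those used in the model file in this post, and the implied probabilities are left un-normalized, as they are there.

```python
import math


def kl_divergence(p, q):
    """Return sum_i p_i * log(p_i / q_i), using natural logarithms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Decimal odds of the five runners, as used in the model file in this post.
odds = [2.625, 3.250, 5.500, 6.000, 21.000]

# Implied win market probabilities (not normalized, so they sum to more than 1).
p = [1.0 / o for o in odds]

# Hypothetical candidate distribution: every runner equally likely.
q_uniform = [0.2] * 5

print("Sum of implied probabilities:", round(sum(p), 4))
print("K-LD of P from itself:", kl_divergence(p, p))
print("K-LD of P from uniform Q:", kl_divergence(p, q_uniform))
```

The divergence of a distribution from itself is zero, and any candidate \(Q\) that departs from \(P\) changes the value, which is why the optimizer uses this sum as its objective.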

For those readers who would like an alternative to the Excel solution, we can strongly recommend the excellent APMonitor-GEKKO optimization suite. Using the following 'no-frills' Python script and APM model file, we can derive the same set of trader win probabilities by minimizing the K-LD from the implied win market probability distribution while meeting our additional constraints.

import json
import os
import shutil

from gekko import GEKKO

# Read the APM model file (listed below) as raw text.
file = open('APM-Gekko_TWPD_Input.apm')
model = file.read()
file.close()

# Solve locally rather than on a remote server.
m = GEKKO(remote=False)
m.Raw(model)
m.solve(disp=False)
print('Objective: ', m.options.OBJFCNVAL)

# Copy the solver's results file next to this script.
dir_path = os.path.dirname(os.path.realpath(__file__))
shutil.copy(m.path+'/Results.json', dir_path + '/APM-Gekko_TWPD_Output.json')

# Load and print the results.
with open(m.path+'/Results.json') as f:
    results = json.load(f)
print(results)

Model APM-Gekko TWPD Input
  Variables
    ! Trader win probabilities.
    ! Initial values set to win market implied probabilities.
    ! Lower-bound set to 7%.
    ! Upper-bound set to 100%.
    x1 = 1/2.625, >=0.07, <=1.00
    x2 = 1/3.250, >=0.07, <=1.00
    x3 = 1/5.500, >=0.07, <=1.00
    x4 = 1/6.000, >=0.07, <=1.00
    x5 = 1/21.000, >=0.07, <=1.00
    obj
  End Variables
  Equations
    ! Total trader win probabilities must sum to one.
    x1 + x2 + x3 + x4 + x5 = 1.00
    ! First three horses have approximately 70% combined win probability.
    x1 + x2 + x3 <= 0.70
    ! Assume 5% edge on third horse.
    (5.500 * x3) - 1.00 >= 0.05
    ! Minimize K-LD.
    obj=(1/2.625)*log(((1/2.625)/x1)) + &
        (1/3.250)*log(((1/3.250)/x2)) + &
        (1/5.500)*log(((1/5.500)/x3)) + &
        (1/6.000)*log(((1/6.000)/x4)) + &
        (1/21.000)*log(((1/21.000)/x5))
  End Equations
End Model

The output file, APM-Gekko_TWPD_Output.json, should read as follows:

{
  "time" : [0.00],
  "apmonitorgekkoequus-kldhaighexample.x1" : [ 2.8162476473E-01],
  "apmonitorgekkoequus-kldhaighexample.x2" : [ 2.2746615613E-01],
  "apmonitorgekkoequus-kldhaighexample.x3" : [ 1.9090908912E-01],
  "apmonitorgekkoequus-kldhaighexample.x4" : [ 2.2999992603E-01],
  "apmonitorgekkoequus-kldhaighexample.x5" : [ 7.0000063981E-02],
  "apmonitorgekkoequus-kldhaighexample.obj" : [ 1.2714140699E-01],
  "apmonitorgekkoequus-kldhaighexample.slk_2" : [ 0.00 ],
  "apmonitorgekkoequus-kldhaighexample.slk_3" : [ 0.00 ]
}

which gives us exactly the same results as with Excel Solver!
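For readers who want to see where these numbers come from, the sketch below rebuilds them by hand. It is not a general solver: it assumes that both inequality constraints hold with equality (as the zero slack values in the output suggest) and that the fifth runner sits on its 7% lower bound. Under those assumptions, the remaining probability inside the three-horse group is split between x1 and x2 in proportion to their implied probabilities, which is the split that minimizes the K-LD when their sum is fixed.

```python
# Decimal odds and implied (un-normalized) probabilities from the model file.
odds = [2.625, 3.250, 5.500, 6.000, 21.000]
p = [1.0 / o for o in odds]

# Assumed active constraints (an assumption based on the zero slacks above).
x3 = 1.05 / odds[2]      # 5% edge on the third horse holds with equality
x5 = 0.07                # fifth horse sits on the 7% lower bound
group_total = 0.70       # x1 + x2 + x3 equals 70% exactly
x4 = 1.0 - group_total - x5   # whatever is left once everything sums to one

# Split the rest of the group between x1 and x2 in proportion to p1 and p2.
rest = group_total - x3
x1 = rest * p[0] / (p[0] + p[1])
x2 = rest * p[1] / (p[0] + p[1])

for name, value in zip(["x1", "x2", "x3", "x4", "x5"], [x1, x2, x3, x4, x5]):
    print(f"{name} = {value:.6f}")
```

The printed values agree with the JSON output above to roughly six decimal places, which is a useful independent check on the optimizer.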

As a sanity check, we can (once again) test the derived set of trader win probabilities against our constraints:

  • (28% + 23% + 19%) <= 70%;
  • ((5.500 * 19%) - 1.000) >= 0.050 (rounding up; the unrounded 19.09% gives 5.0%);
  • MIN(28%, 23%, 19%, 23%, 7%) >= 7%.
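The same checks can be run on the unrounded solver output. The sketch below is a minimal example; the tolerance `TOL` is an assumed allowance for solver round-off, since the reported values satisfy the constraints only to about eight decimal places.

```python
# Solver output copied from APM-Gekko_TWPD_Output.json in this post.
x = [0.28162476473, 0.22746615613, 0.19090908912, 0.22999992603, 0.070000063981]

TOL = 1e-6  # assumed tolerance for solver round-off

checks = {
    "probabilities sum to one": abs(sum(x) - 1.0) <= TOL,
    "first three horses <= 70%": x[0] + x[1] + x[2] <= 0.70 + TOL,
    "5% edge on third horse": 5.500 * x[2] - 1.0 >= 0.05 - TOL,
    "every horse >= 7%": min(x) >= 0.07 - TOL,
}

for label, ok in checks.items():
    print(f"{label}: {'ok' if ok else 'FAILED'}")
```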

Notes:

1. Round all probabilities to zero decimal places (whole percentages). Any additional precision is irrelevant.
2. The Python script and APM file are a minimum set. You will have to install additional modules (e.g. GEKKO), as required.
3. The initial run of the Python script may be quite slow (e.g. 60 seconds). Subsequent runs should take approximately five to seven seconds.

In sum, we have expertly combined the wisdom of the crowds with our own limited insights to derive a coherent and valid set of trader win probabilities (TWPD)!