Thursday, May 31, 2018

CsvPredictor: Turns Historical Record Into Mini Prediction System

CsvPredictor turns a historical record in CSV format into a mini prediction system. The program is completely agnostic with respect to the domain knowledge captured in the file (e.g. weather conditions, successful movies, past performances). 
Running CsvPredictor.exe with a valid CSV file starts a question-and-answer (QnA) session driven by the salience of the features (columns), effectively turning a standard flat file into a data-mining classification tree.
For example, deciding whether or not to play ball given the current weather conditions:
    C:\CsvPredictor>CsvPredictor.exe PlayBall.csv
    CsvPredictor v2.41
    Input File: "PlayBall.csv" (14 records and 4 features)
    Top Features (Salience)
    Outlook   0.46176
    Humidity  0.36618
    Wind      0.11693
    Q. Is Outlook  =  ["Overcast"; "Rainy"; "Sunny"]?  Sunny
    Q. Is Humidity =  ["High"; "Normal"]?  Normal
    A. Predict: PlayBall = True
    C:\CsvPredictor>
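The post does not say how salience is computed. A common way to rank columns for a classification tree is information gain, the splitting criterion used by ID3. The minimal Python sketch below assumes that measure, so its scores will not necessarily match the numbers CsvPredictor prints. The function names and the six-record dataset are hypothetical, made up for illustration.

```python
import math
from collections import Counter

# Assumption: salience is measured as information gain (as in ID3);
# CsvPredictor's own measure is not documented and may differ.

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Drop in target entropy after splitting the rows on one feature."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def rank_features(rows, target):
    """(feature, gain) pairs, most salient first."""
    features = [f for f in rows[0] if f != target]
    gains = {f: information_gain(rows, f, target) for f in features}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical six-record history (not the real PlayBall.csv).
rows = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "PlayBall": "False"},
    {"Outlook": "Sunny",    "Wind": "Strong", "PlayBall": "False"},
    {"Outlook": "Overcast", "Wind": "Weak",   "PlayBall": "True"},
    {"Outlook": "Overcast", "Wind": "Strong", "PlayBall": "True"},
    {"Outlook": "Rainy",    "Wind": "Weak",   "PlayBall": "True"},
    {"Outlook": "Rainy",    "Wind": "Strong", "PlayBall": "False"},
]

print("Top Features (Salience)")
for feature, gain in rank_features(rows, "PlayBall"):
    print(f"{feature:<10}{gain:.5f}")
```

Here Outlook separates the outcomes almost perfectly, so it ranks well above Wind, mirroring how the most telling column ends up as the first question.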
or checking the likelihood of a new movie being a blockbuster!
    C:\CsvPredictor>CsvPredictor.exe Movies.csv
    CsvPredictor v2.41
    Input File: "Movies.csv" (2690 records and 5 features)
    Top Features (Salience)
    Budget              0.34871
    Genre               0.26719
    Production Country  0.24084
    Runtime             0.11430
    Q. Is Budget =  ["<=15000000.00"; "<=44263333.33"; "<=380000000.00"]? 
                    <=15000000.00
    Q. Is Genre =  ["Action"; "Adventure"; "Animation"; "Comedy"; "Crime"; 
                    "Documentary"; "Drama"; "Family"; "Fantasy"; "Foreign"; 
                    "History"; "Horror"; "Music"; "Mystery"; "Romance"; 
                    "Science Fiction"; "Thriller"; "War"; "Western"]?  
                    Action
    Q. Is Production Country =  ["Australia"; "Canada"; "Hong Kong"; 
                                 "Ireland"; "United Kingdom";
                                 "United States of America"]?  
                                 United States of America
    Q. Is Runtime =  ["<=99.47"; "<=115.00"; "<=248.00"]?  <=115.00
    Q. Is Release Month =  ["<=5.00"; "<=9.00"; "<=12.00"]?  <=12.00
    A. Predict: Success = True
    C:\CsvPredictor>
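Numeric columns such as Budget and Runtime appear to be discretized into three ranges, each labelled by its upper bound, with the last bound looking like the column maximum. How CsvPredictor chooses the cut points is not stated. A minimal sketch, assuming equal-frequency (tertile) cut points computed with Python's `statistics.quantiles`; the helper names and runtime values are hypothetical, and the real program's bounds (e.g. 44263333.33) may come from a different rule.

```python
import statistics

def make_bounds(values, n_bins=3):
    """Upper bounds of n_bins equal-frequency bins; the last is the maximum."""
    # Assumption: tertile cut points; CsvPredictor's own rule is not documented.
    cuts = statistics.quantiles(values, n=n_bins, method="inclusive")
    return cuts + [max(values)]

def bin_label(value, bounds):
    """Label a value with the first upper bound it does not exceed."""
    for bound in bounds:
        if value <= bound:
            return f"<={bound:.2f}"
    return f">{bounds[-1]:.2f}"  # only reached for values above the data maximum

# Hypothetical runtimes in minutes.
runtimes = [85, 90, 95, 100, 105, 110, 120, 130, 150]
bounds = make_bounds(runtimes)
print([f"<={b:.2f}" for b in bounds])  # ['<=98.33', '<=113.33', '<=150.00']
print(bin_label(104, bounds))          # <=113.33
```

Labelling each value with the first bound it does not exceed produces answer options in the same "<=x" style as the session above, and turns a numeric column into a categorical one that a classification tree can split on.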
or checking the past performances of a late closer in an upcoming sprint race to see if current racing conditions are favorable!
    C:\CsvPredictor>CsvPredictor.exe LateCloser_PPs.csv
    CsvPredictor v2.41
    Input File: "LateCloser_PPs.csv" (50 records and 7 features)
    Top Features (Salience)
    PostPosition  0.29615
    RaceDistance  0.26223
    FieldSize     0.13567
    PrizeMoney    0.12169
    Q. Is PostPosition =  ["<=6.00"; "<=10.00"; "<=22.00"]?  <=6.00
    Q. Is RaceDistance =  ["5f"; "6f"]?  5f
    Q. Is PrizeMoney   =  ["<=45412.80"; "<=166658.00"; "<=821262.00"]?  <=45412.80
    Q. Is DaysSinceRun =  ["<=24.00"; "<=41.20"; "<=218.00"]?  <=218.00
    A. Predict: InTheMoney = True
    C:\CsvPredictor>
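In the session above, FieldSize outranks PrizeMoney in the Top Features list, yet the program asks about PrizeMoney and never about FieldSize. That is consistent with the next question being chosen by re-scoring the features on only the records that match the answers given so far, as a greedy ID3-style tree does. A minimal sketch of such a loop, assuming information gain as the salience measure and a majority vote for the final prediction; `ask_and_predict`, the scripted answers and the six-record dataset are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, feature, target):
    """Drop in target entropy after splitting the rows on one feature."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[feature] for r in rows}:
        subset = [r[target] for r in rows if r[feature] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def ask_and_predict(rows, target, answer_fn):
    """Assumed ID3-style session: ask about the most salient remaining feature
    (re-scored on the matching rows), filter, and stop once the rows agree."""
    features = [f for f in rows[0] if f != target]
    while features and len({r[target] for r in rows}) > 1:
        best = max(features, key=lambda f: information_gain(rows, f, target))
        features.remove(best)
        options = sorted({r[best] for r in rows})
        answer = answer_fn(best, options)
        matching = [r for r in rows if r[best] == answer]
        if not matching:  # unseen answer: keep the current rows
            break
        rows = matching
    # Majority vote over whatever rows remain.
    return Counter(r[target] for r in rows).most_common(1)[0][0]

# Hypothetical six-record history and scripted answers.
rows = [
    {"Outlook": "Sunny",    "Wind": "Weak",   "PlayBall": "False"},
    {"Outlook": "Sunny",    "Wind": "Strong", "PlayBall": "False"},
    {"Outlook": "Overcast", "Wind": "Weak",   "PlayBall": "True"},
    {"Outlook": "Overcast", "Wind": "Strong", "PlayBall": "True"},
    {"Outlook": "Rainy",    "Wind": "Weak",   "PlayBall": "True"},
    {"Outlook": "Rainy",    "Wind": "Strong", "PlayBall": "False"},
]
answers = {"Outlook": "Rainy", "Wind": "Weak"}

def scripted(feature, options):
    print(f"Q. Is {feature} = {options}?  {answers[feature]}")
    return answers[feature]

print("A. Predict: PlayBall =", ask_and_predict(rows, "PlayBall", scripted))
```

Because the loop stops as soon as the remaining records agree on the outcome, a decisive early answer (say, Outlook = Sunny here) ends the session after a single question, much like the two-question PlayBall session.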
or checking the past performances of previous runners of the heritage handicap sprint race on Epsom Derby Day to identify possible longshots.
    C:\CsvPredictor>CsvPredictor.exe Epsom_Dash_Results.csv
    CsvPredictor v2.41
    Input File: "Epsom_Dash_Results.csv" (260 records and 9 features)
    Top Features (Salience)
    SP      0.39906
    Weight  0.17504
    DSR     0.11722
    RPR     0.11484
    Q. Is SP      =  ["<=13.00"; "<=21.00"; "<=67.00"]?   <=67.00
    Q. Is Weight  =  ["<=120.00"; "<=127.00"; "<=140.00"]?  <=127.00
    Q. Is DSR     =  ["<=14.00"; "<=24.00"; "<=289.00"]?  <=289.00
    Q. Is Losses  =  ["<=3.00"; "<=5.00"; "<=12.00"]?  <=5.00
    A. Predict: ITM = True
    C:\CsvPredictor>
Note: this program is intended only to provide an easy entry point to data analytics for handicappers and is in no way intended to replace the advice and expertise of professional data analysts and statisticians!