Predictive Horse Race Handicapping Using Neural Networks: Yet Another Progress Report

1
Predictive Horse Race Handicapping Using Neural
Networks: Yet Another Progress Report
  • Andrew Schurr
  • Advisor: Ralph Morelli

2
Overall Goal
  • Use a neural network, trained with past
    performance data, to predict future horse races.

3
Picking Horses
  • The outcome of a horse race is determined by many
    factors
  • If we know what factors are important, we can
    predict which horse is favored to win
  • This raw information is available as past
    performance data
  • Factors:
    • Previous race times
    • Post position
    • Jockey win/loss record
    • Previous stakes
    • Type of runner
    • Breeding
    • etc.

4
with Neural Networks
  • By using a computerized neural network, we can
    sift through a large amount of computerized
    past-performance data, and identify which factors
    influence a winning horse
  • Neural network:
    • Interconnected nodes that fire when stimulated
    • Can learn through training, gradually adjusting
      the level at which they fire
    • Good at figuring out how different variables
      influence each other
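
A minimal sketch of "learning through training", assuming a single sigmoid node trained by gradient descent on a toy problem (logical OR). This is not the network used in the project; the function names, learning rate, and epoch count are illustrative assumptions.

```python
import math
import random

def sigmoid(x):
    """Smooth 'firing' function: output rises from 0 to 1 as stimulation grows."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, bias, inputs):
    """Output of one node for a given input vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_neuron(samples, epochs=5000, rate=0.5, seed=0):
    """Gradually adjust weights and bias to reduce squared error (assumed settings)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in samples[0][0]]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            out = predict(weights, bias, inputs)
            # Error signal scaled by the slope of the sigmoid at this output
            delta = (target - out) * out * (1.0 - out)
            weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
            bias += rate * delta
    return weights, bias

# Toy data: the node should fire when either input is on
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias = train_neuron(OR_DATA)
```

After training, `predict(weights, bias, (0, 0))` sits below 0.5 and the other three cases sit above it. A handicapping network applies the same idea with many nodes and many inputs.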

5
Topology
  • Inputs: previous race times, post position, jockey
    win/loss record, previous stakes, type of runner,
    breeding, etc.
  • Output: relative fitness of the horse

6
Current Progress
- Completed -
- Pending -
  • Find a flexible Java library for building neural
    networks
  • Build prototype that mimics small-scale
    handicapping problem
  • Locate a large set of digital past performance
    data
  • Add small set of actual data to prototype
  • Clean and format full set of horse data, add to
    prototype
  • Use genetic algorithms to determine optimal
    neural network configuration
    • Too time-consuming to be effective! Select blocks
      of data based on accepted handicapping knowledge,
      and combine the results
  • Build user interface and file loading
    capabilities on top of the fully-trained network

7
Initial Trial with Limited Real Data
  • Input data
    • 12 horses per race
    • Positions and fractional lengths for each horse's
      two previous races
    • Speed rating for each horse
  • Output data
    • Finish position and lengths for each horse
  • Training
    • 21 training races
    • 4 test races
    • 20,000 training cycles

8
Initial Trial Performance
  • Training
    • Training error: 17-18
    • Represents numerical error, not error in picking
      winners
  • Prediction
    • 0 correct winners picked
    • 42% of all horses (1st, 2nd, 3rd place) correctly
      predicted to show
    • but none in their proper positions

9
Target Accuracy
  • Odds of randomly selecting a winning horse in a
    12-horse race: about 8%
  • But each trial may have as many as 12 horses or
    as few as 5, padded out with null horses that
    should always lose.
  • Randomly selecting a winner from 5 horses: 20%
  • A reasonable random accuracy should reflect the
    variation in race size.
    • About 10% to 13%
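
The baseline arithmetic on this slide can be checked with a short sketch. It assumes every field size from 5 to 12 is equally common, which the slides do not state; a real baseline would weight each size by how often it occurs in the data.

```python
def random_win_rate(field_size):
    """Chance of picking the winner by guessing uniformly among real horses."""
    return 1.0 / field_size

def average_random_win_rate(field_sizes):
    """Mean random hit rate over a collection of field sizes."""
    sizes = list(field_sizes)
    return sum(random_win_rate(n) for n in sizes) / len(sizes)

# Assumption: field sizes 5..12 occur equally often
baseline = average_random_win_rate(range(5, 13))

print(f"12-horse race: {random_win_rate(12):.1%}")  # 8.3%
print(f"5-horse race:  {random_win_rate(5):.1%}")   # 20.0%
print(f"Mixed fields:  {baseline:.1%}")             # 12.7%
```

Under that assumption the mixed-field baseline lands inside the 10% to 13% range quoted above.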

10
Full Data Set Trial
  • Input data
    • 12 horses per race
    • All available data in the Race, Entry, and Horse
      Past Performance data files
    • 2,749 total inputs: 228 variables per horse, plus
      13 race-specific variables
  • Output data
    • Finish position and lengths for each horse
  • Training
    • 400 training races
    • 50 test races
    • 2,000 training cycles (about 15 hours)
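
The input count follows from the per-horse and per-race figures: 12 × 228 + 13 = 2,749. The sketch below checks that and shows one way a short field could be padded to 12 slots. Encoding a null horse as all zeros is an assumption; the slides say only that null horses "should always lose".

```python
HORSES_PER_RACE = 12
VARIABLES_PER_HORSE = 228
RACE_VARIABLES = 13

def total_inputs(horses, per_horse, per_race):
    """Size of the flat input vector for one race."""
    return horses * per_horse + per_race

def pad_field(entries, horses=HORSES_PER_RACE, per_horse=VARIABLES_PER_HORSE):
    """Pad a short field with 'null horses' (assumed here to be all zeros)."""
    missing = horses - len(entries)
    return [list(e) for e in entries] + [[0.0] * per_horse for _ in range(missing)]

def flatten_race(entries, race_vars):
    """Concatenate padded horse rows and race-level variables into one vector."""
    vector = [value for row in pad_field(entries) for value in row]
    return vector + list(race_vars)

# A 5-horse race with placeholder values still yields 2,749 inputs
five_horses = [[1.0] * VARIABLES_PER_HORSE for _ in range(5)]
vector = flatten_race(five_horses, [0.0] * RACE_VARIABLES)
```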

11
Full Data Set Trial Performance
  • Training
    • Training error: 1,700
    • Unable to find any kind of solution approaching a
      real answer
  • Prediction
    • Completely random
    • Useless!

12
Pared-down Data Set Trial
  • Input data
    • 12 horses per race
    • 2 separate neural networks
    • Network 1: horse past-performance times, lengths,
      finish positions, odds, and speed ratings for
      the past 3 races
    • Network 2: horse win/loss record on various
      surfaces, jockey win/loss, trainer win/loss,
      various other pieces of data
  • Output data
    • Finish position and lengths for each horse,
      compared between the two networks' results
  • Training
    • 400 training races
    • 50 test races
    • 200 training cycles (about 1 hour)

13
Pared-down Data Set Performance
  • Training
    • Training error: Net 1, 21; Net 2, 18
    • Again, relates to mathematical error, not winners
      picked
  • Prediction
    • Net 1
      • Winners picked: 11/50 (22%)
      • 1st horse win/place/show: 29/50 (58%)
      • Clear winners predicted: 7/18 (39%)
    • Net 2
      • Winners picked: 11/50 (22%)
      • 1st horse win/place/show: 27/50 (54%)
      • Clear winners predicted: 6/19 (32%)
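
A sketch of how tallies like "winners picked" and "1st horse win/place/show" could be computed from network output. It assumes the lowest predicted finish value marks the network's top pick, and that win/place/show means the top pick actually finished in the first three. The data below is made up for illustration and is not from the project.

```python
def top_pick(predicted_finish):
    """Index of the horse with the lowest predicted finish value."""
    return min(range(len(predicted_finish)), key=lambda i: predicted_finish[i])

def score_races(races):
    """Count top picks that won and top picks that finished in the top three.

    races: list of (predicted_finish, actual_place) pairs, one per race,
    where actual_place[i] is horse i's real finishing place (1 = winner).
    """
    wins = 0
    in_money = 0
    for predicted, actual in races:
        place = actual[top_pick(predicted)]
        if place == 1:
            wins += 1
        if place <= 3:
            in_money += 1
    return wins, in_money, len(races)

# Hypothetical four-horse races
toy_races = [
    ([0.4, 1.9, 2.7, 3.1], [1, 2, 3, 4]),  # top pick (horse 0) won
    ([2.2, 0.8, 1.5, 3.0], [1, 3, 2, 4]),  # top pick (horse 1) ran 3rd
    ([1.1, 2.0, 0.3, 2.9], [2, 1, 4, 3]),  # top pick (horse 2) ran 4th
]
wins, in_money, total = score_races(toy_races)
```

On the toy data this gives 1 winner and 2 in-the-money finishes out of 3 races.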

14
What's a clear winner?
  • The network predicts a numerical estimate of the
    place in which the horse will finish, not an
    actual place number
  • So, a horse could be predicted to finish in
    place 0.45765, not 1st place
  • Take three horses:
    • Horse 1: predicted finish 0.2155 (1st place)
    • Horse 2: predicted finish 1.6020 (2nd place)
    • Horse 3: predicted finish 2.1985 (3rd place)
  • There is a large gap between first and second
    place, and a much smaller gap between second and
    third. So, we interpret this to mean that the
    network believes Horse 1 will win by a large
    margin, followed by Horses 2 and 3 finishing very
    close to each other.
  • Hence, Horse 1 is a clear winner
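
One way to turn the gap comparison into a rule is sketched below. The slides do not give a numeric criterion, so the `ratio` threshold (lead gap at least twice the gap between second and third) is a hypothetical choice.

```python
def is_clear_winner(predicted_finish, ratio=2.0):
    """True when the gap from 1st to 2nd dwarfs the gap from 2nd to 3rd.

    ratio is an assumed threshold, not a value taken from the presentation.
    """
    ordered = sorted(predicted_finish)
    if len(ordered) < 3:
        return False
    lead_gap = ordered[1] - ordered[0]
    chase_gap = ordered[2] - ordered[1]
    return lead_gap > 0 and lead_gap >= ratio * chase_gap

# The three horses from the slide: gaps of 1.3865 and 0.5965
slide_example = [0.2155, 1.6020, 2.1985]
# A made-up race in which the top two are nearly tied
close_race = [0.9, 1.0, 2.5]
```

With the default threshold, `is_clear_winner(slide_example)` is true and `is_clear_winner(close_race)` is false.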

15
Combined Performance
  • Results taken together
    • Both networks agreed on correct winner:
      4/13 (31%)
    • Both networks agreed on correct winner, and both
      predict it to be a clear winner: 2/4 (50%)
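
A sketch of the agreement tally, assuming the denominator counts races where both networks chose the same horse and the numerator counts how many of those picks won. The horse numbers below are invented for illustration.

```python
def combined_record(picks_net1, picks_net2, winners):
    """Return (correct, agreed): races where both nets agree, and how many were right."""
    agreed = [
        (p1, winner)
        for p1, p2, winner in zip(picks_net1, picks_net2, winners)
        if p1 == p2
    ]
    correct = sum(1 for pick, winner in agreed if pick == winner)
    return correct, len(agreed)

# Hypothetical picks for five races
picks_net1 = [3, 1, 7, 2, 5]
picks_net2 = [3, 4, 7, 2, 6]
winners = [3, 1, 2, 2, 5]
correct, agreed = combined_record(picks_net1, picks_net2, winners)
```

Here the networks agree on three races and are right on two of them. The clear-winner figure would apply the same tally after keeping only races both networks flag as clear.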

16
Analysis
  • 50% is pretty good, right?
  • But wait: the combined networks produced only 4
    such predictions.
  • 4 trials is far too small a sample size to base
    any kind of long-term projection on.
  • What if the random sequence of correct
    predictions for a larger sample went like this:
    • Right, Wrong, Wrong, Right, Wrong, Wrong, Wrong,
      Wrong, Right, Wrong, Wrong, Wrong
  • If we looked at only the first four, our
    prediction rate is 50%, but if we look at the
    larger sample, it falls to 25%
  • Conclusion: It's encouraging, but we need more
    sample trials!
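
The sample-size point can be made concrete with the sequence from this slide. The Wilson score interval below is an addition for illustration, not something the presentation uses; it shows roughly how wide the uncertainty on 2 hits in 4 trials is.

```python
import math

def hit_rate(results):
    """Fraction of correct predictions in a sequence of True/False outcomes."""
    return sum(results) / len(results)

def wilson_interval(hits, trials, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = hits / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half

# Right, Wrong, Wrong, Right, Wrong, Wrong, Wrong, Wrong, Right, Wrong, Wrong, Wrong
outcomes = [True, False, False, True, False, False,
            False, False, True, False, False, False]

first_four = hit_rate(outcomes[:4])   # 0.5
all_twelve = hit_rate(outcomes)       # 0.25
low, high = wilson_interval(2, 4)
```

For 2 hits in 4 trials the interval runs from roughly 15% to 85%, so the 50% figure is compatible with anything from near-random to very strong performance.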

17
To Do
  • Incorporate more of the data, possibly in more
    separate networks
    • Over half of the original data was culled between
      the second and third trials. Some of it could
      still be valuable.
  • Run networks on more learning epochs
    • Currently at 200; see if 2,000 or 20,000 training
      epochs increases accuracy
  • Find a way to test the 50% combined figure on a
    larger sample set.
    • Buy more data
    • Cycle the data, so that different trials are in
      training and test sets
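
"Cycling the data" resembles k-fold cross-validation. The sketch below assumes the 450 races from the earlier trials (400 training, 50 test) and rotates which 50 are held out; the function name and fold assignment are illustrative, not the project's code.

```python
def rotate_folds(race_ids, n_folds):
    """Yield (train, test) splits so each race is in the test set exactly once."""
    folds = [race_ids[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [race for i, fold in enumerate(folds) if i != held_out for race in fold]
        yield train, test

# 450 races in 9 folds gives the same 400/50 split on every rotation
race_ids = list(range(450))
splits = list(rotate_folds(race_ids, 9))
```

Training one network per rotation and pooling the test results would give 450 test races instead of 50, without buying new data.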

18
Questions?

19
Sources
  • Neural Networks for Fun and Profit, Bret Halford,
    http://csel.cs.colorado.edu/cs3202/papers/Bret_Halford.html
  • An Introduction to Neural Networks, Andrew Blais
    and David Mertz, IBM developerWorks,
    http://www-106.ibm.com/developerworks/library/l-neural/
  • Expert Prediction, Symbolic Learning, and Neural
    Networks: An Experiment on Greyhound Racing, H.
    Chen, P. Buntin, L. She, S. Sutjahjo, C. Sommer,
    D. Neely, http://ai.bpa.arizona.edu/papers/dog93/dog93.html
  • Using Machine Learning to Predict the Results of
    Sporting Matches, Michael Baulch,
    http://innovexpo.itee.uq.edu.au/2001/projects/s348234/thesis.pdf
  • Albrecht, http://www.teco.uni-karlsruhe.de/albrecht/neuro/html/
  • Handicappers Daily, ITS Inc., http://www.itsdata.com/
  • Betting Thoroughbreds, Steven Davidowitz, First
    Plume Printing, 1997