Optimal Decision Making in Football (MSE 339 Project) - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Optimal Decision Making in Football
MSE 339 Project
Rick Johnston, Brad Null, Mark Peters
2
Presentation Overview
  • Project Objectives
  • Football Primer
  • Literature Review
  • Problem Formulation
  • Approximate Approaches
  • Conclusions

3
Project Objectives
  • Use dynamic programming techniques to answer two
    primary questions about decision-making in
    football.
  • What is the optimal policy for deciding whether
    to run an offensive play, punt, or kick a field
    goal in each situation that can arise over the
    course of a football game?
  • If you implemented such a policy, how much of a
    performance improvement would you realize when
    competing against an opponent playing a standard
    strategy?

4
Football Primer
  • Key rules
  • 2 Teams, 60-minute game (2 halves), highest score
    wins
  • Basic scoring plays: Touchdown (7 points), Field
    Goal (3 points)
  • Field is 100 yards long
  • Advancing the ball: 4 plays (downs) to gain 10
    yards
  • if successful, the down resets to 1st down
  • if unsuccessful, the other team gains possession
    of the ball
  • Teams have the option of punting the ball to the
    other team (typically reserved for 4th down),
    which gives the other team possession but in a
    worse position on the field
  • Teams can attempt to kick a field goal at any
    point
  • Common Strategies
  • Coaches typically rely on common rules of thumb
    to make these decisions
  • Motivating Situation
  • 4th down and 2 yards to go from the opponent's
    35-yard line
  • Chance of successfully kicking a field goal is 40%
  • Chance of gaining the 2 yards is 60%
  • Expected punt distance would be 20 yards
  • Which is the right decision? And when? (See the
    sketch below.)
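A hypothetical back-of-the-envelope comparison of the three options in this situation: the 40% field-goal and 60% conversion figures come from this slide, but the V_* continuation values below are made-up placeholders standing in for the win-probability rewards that the dynamic program would actually supply.

```python
# Sketch of the 4th-and-2 decision at the opponent's 35-yard line.
# Success probabilities are from the slide; the V_* continuation values are
# invented placeholders for the true state rewards (win probabilities).

p_fg, p_convert = 0.40, 0.60

V_after_made_fg    = 0.62   # up 3, opponent gets the ball
V_after_missed_fg  = 0.45   # opponent takes over near midfield
V_after_conversion = 0.60   # 1st and 10 around the opponent's 33
V_after_failed_run = 0.46   # turnover on downs at the opponent's ~35
V_after_punt       = 0.50   # opponent takes over deep in its own territory

ev_kick = p_fg * V_after_made_fg + (1 - p_fg) * V_after_missed_fg
ev_run  = p_convert * V_after_conversion + (1 - p_convert) * V_after_failed_run
ev_punt = V_after_punt

best = max([("kick", ev_kick), ("run", ev_run), ("punt", ev_punt)],
           key=lambda pair: pair[1])
print(f"kick={ev_kick:.3f}  run={ev_run:.3f}  punt={ev_punt:.3f}  best={best[0]}")
```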

5
Brief Literature Review
  • Sackrowitz (2000)
  • Refining the Point(s)-After-Touchdown Decision
  • Backwards induction (based on the number of
    possessions remaining) to find optimal policy
  • No quantitative assessment of the difference
    between optimal strategy and the decisions
    actually implemented by NFL coaches
  • Romer (2003)
  • It's Fourth Down and What Does the Bellman
    Equation Say?
  • Uses play-by-play data for 3 years of NFL play to
    solve a simplified version of the problem to
    determine what to do on fourth down
  • Key assumption is that the decision is made in
    the first quarter
  • Results are that NFL coaches should generally go
    for the first down more frequently
  • Others
  • Carter and Machol (1978)
  • Bertsekas and Tsitsiklis (1996)
  • Carroll, Palmer and Thorn (1998)

6
Problem Formulation
  • Model setup
  • Model one half of a game
  • Approximately 500,000 states, one for each
    combination of:
  • Score differential
  • Team in possession of ball
  • Ball position on field
  • Down
  • Distance to go for first down
  • Time remaining
  • The half was modeled as consisting of 60 time
    periods (equivalent to 60 plays)
  • Reward value created for each state
  • represents the probability that team 1 will win
    the game
  • Transition probabilities
  • We estimated all probabilities required for the
    model
  • Solution approach
  • Backwards induction to find the optimal decision
    at each state (see the sketch below)
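A minimal sketch of one way such a state could be encoded, assuming the fields listed above; the `terminal_value` function is a placeholder standing in for the win-probability reward attached to end-of-half states, not the authors' actual estimate.

```python
from dataclasses import dataclass
import math

@dataclass(frozen=True)
class State:
    score_diff: int     # team 1 score minus team 2 score
    possession: int     # 1 if team 1 has the ball, 2 otherwise
    ball_position: int  # yards from the opponent's goal line (1-99)
    down: int           # 1-4
    distance: int       # yards to go for a first down
    time_left: int      # remaining time periods (plays), 0-60

def terminal_value(score_diff: int) -> float:
    """Placeholder reward for a state with time_left == 0.

    The model's actual reward is the probability that team 1 wins the game;
    a logistic curve in the halftime score differential is used here purely
    as an illustrative stand-in.
    """
    return 1.0 / (1.0 + math.exp(-0.35 * score_diff))
```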

7
Solution Technique
The diagram below illustrates how the decisions
of the two teams are determined at each state.
[Diagram: Team 1 (Optimal Policy), time t to time t-1: from state x, each of the
three actions (Run, Punt, Kick) leads to states x_r,t-1 whose rewards are already
known, and the reward of state x is chosen as the maximum of the expectations of
the three actions. Team 2 (Heuristic Policy), time t to time t-1: the heuristic
policy instructs the team which action to take given that it is in state x, and
the reward of state x is chosen as the expectation of that given action over the
states x_r,t-1 whose rewards are known.]
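A minimal sketch of this backward-induction step. The helpers `transition_probs(state, action)` (a distribution over time t-1 states) and `heuristic_action(state)` are hypothetical stand-ins for the estimated transition model and the rule-of-thumb policy; `V` holds the already-computed rewards of the time t-1 states.

```python
ACTIONS = ("run", "punt", "kick")

def expected_value(state, action, V, transition_probs):
    """Expectation of the known time t-1 rewards over one action's outcomes."""
    return sum(p * V[next_state]
               for next_state, p in transition_probs(state, action))

def backup_team1(state, V, transition_probs):
    """Team 1 (optimal policy): reward is the maximum over the three actions."""
    values = {a: expected_value(state, a, V, transition_probs) for a in ACTIONS}
    best = max(values, key=values.get)
    return best, values[best]

def backup_team2(state, V, transition_probs, heuristic_action):
    """Team 2 (heuristic policy): reward is the expectation of the prescribed action."""
    a = heuristic_action(state)
    return a, expected_value(state, a, V, transition_probs)
```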
8
Optimal vs. Heuristic
Under our assumptions, the use of the optimal
strategy increases the probability of winning by
about 6.5% across the range of likely starting
field positions.
[Chart: Probability of Winning vs. Starting Field Position]
Note: Starting position (time left 60, score
even, 1st down, 10 yards to go)
9
Optimal vs. Heuristic
The longer a team has to implement the optimal
policy, the larger the increase it can expect in
its probability of winning.
[Chart: Probability of Winning vs. Time Left]
Note: Starting position (1st down, 10 yards to
go, 50 yard line, score even)
10
Comparison of Play Selection
Overall, the optimal policy calls for the team to
be more aggressive than the heuristic, kicking
substantially fewer field goals.
4th Down Play Selection   Optimal Policy   Heuristic Policy
Runs                          325,113           258,327
Punts                         386,457           354,945
Field Goals                    73,500           171,798
Total                         785,070           785,070
11
Results
4th Down Decisions
Let's compare the range of fourth-down decisions
for a typical situation in the game. In this
instance, the score is tied and 50 time periods
remain.
[Charts: 4th-down play selection (key: Run, Kick, Punt) under the Optimal Policy
and the Heuristic Policy, plotted by Distance to First Down and Distance to TD]
12
Near Goal Results
On fourth down with 50 time periods left and a
tied score, the optimal strategy is more
aggressive when close to the goal line.
13
Model Limitations
Simplifying Restrictions
  • Limited outcomes of running plays
  • All plays set to a duration of 20 seconds
  • Excluding kickoffs
  • Assume extra points

Possible Enhancements
  • Which play to run
  • offensive
  • defensive
  • special teams
  • Probabilities conditional on specific teams or
    players
  • Real-time applications

The model could be made significantly more
powerful by expanding the state space, but beyond
a certain size backwards induction becomes
extremely cumbersome. This motivates exploring
approximate DP approaches.
14
Approximate DP Approach
Estimate state reward values by finding a linear
combination of basis functions to calculate Q
values.
  • Estimating reward values
  • State sampling
  • For each time period, sample 1,000 states
    according to a series of distributions that
    should represent the most commonly reached states
    at certain points in an actual game
  • Outcome sampling
  • For each feasible action in each state, sample
    one possible outcome of that action and set the
    Q value corresponding to the action equal to the
    sample's Q value
  • The state's Q value is set to the maximum Q value
    returned (see the sketch below)
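A sketch of this sampling scheme. The helpers `sample_states`, `feasible_actions`, `sample_outcome`, and `q_value` are hypothetical stand-ins for the project's sampling distributions, action sets, transition simulator, and successor-value estimates.

```python
def estimate_q_samples(t, n_samples, sample_states, feasible_actions,
                       sample_outcome, q_value):
    """Estimate Q values for one time period by state and outcome sampling.

    sample_states(t, n)  -> n states drawn from a distribution meant to
                            resemble commonly reached states at time t
    feasible_actions(s)  -> the actions available in state s
    sample_outcome(s, a) -> one sampled successor state of taking a in s
    q_value(s_next)      -> the (already estimated) value of that successor

    Returns a list of (state, estimated Q value) pairs.
    """
    samples = []
    for s in sample_states(t, n_samples):
        # One sampled outcome per feasible action gives one Q value per action.
        action_qs = [q_value(sample_outcome(s, a)) for a in feasible_actions(s)]
        # The state's Q value is the maximum over those sampled action values.
        samples.append((s, max(action_qs)))
    return samples
```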

15
Approximate DP Approach
  • Estimating reward values (continued)
  • Fitting basis functions
  • Given our sample of 1,000 states with Q values,
    we fit linear coefficients to our basis functions
    by solving the least squares problem (see the
    sketch after this list)
  • The basis functions that we employed were:
  • Team in Possession of Ball
  • Position of ball
  • Point differential
  • Score indicators
  • Winning by more than 7
  • Winning by less than 7
  • Score tied
  • Losing by less than 7
  • Down indicators
  • 3rd down for us
  • 3rd down for them
  • 4th down for us
  • 4th down for them
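A minimal sketch of that least-squares fit, assuming a state object exposing `possession`, `ball_position`, `score_diff`, and `down` from team 1's point of view; the feature encodings below simply mirror the list above and may differ from the project's exact choices.

```python
import numpy as np

def features(s):
    """Evaluate the basis functions listed above for one state (a sketch)."""
    return np.array([
        1.0 if s.possession == 1 else 0.0,            # team in possession of ball
        float(s.ball_position),                       # position of ball
        float(s.score_diff),                          # point differential
        1.0 if s.score_diff > 7 else 0.0,             # winning by more than 7
        1.0 if 0 < s.score_diff < 7 else 0.0,         # winning by less than 7
        1.0 if s.score_diff == 0 else 0.0,            # score tied
        1.0 if -7 < s.score_diff < 0 else 0.0,        # losing by less than 7
        1.0 if s.down == 3 and s.possession == 1 else 0.0,  # 3rd down for us
        1.0 if s.down == 3 and s.possession == 2 else 0.0,  # 3rd down for them
        1.0 if s.down == 4 and s.possession == 1 else 0.0,  # 4th down for us
        1.0 if s.down == 4 and s.possession == 2 else 0.0,  # 4th down for them
    ])

def fit_coefficients(sampled_states, sampled_q_values):
    """Least-squares fit of coefficients w so that features(s) . w ~ Q(s)."""
    X = np.vstack([features(s) for s in sampled_states])
    y = np.asarray(sampled_q_values, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w
```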

16
Basis Functions
[Chart: Coefficient values for different times to the game's end]
17
ADP vs. Exact Solution
Comparing the approximate policy to the exact
solution.
  • Determining approximate policy
  • Using the basis functions, we can calculate Q
    values for all states
  • Iterate through all states and determine the
    optimal action at each state based on the
    relevant Q values of the states that we could
    transition to (see the sketch below)
  • Comparison to heuristic policy
  • Employ backwards induction to solve for the exact
    reward values for all states given that team 1 is
    playing the approximate policy and team 2 is
    playing the heuristic policy
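A sketch of how the approximate policy could be read off from the fitted coefficients. `transition_probs` and `features` are the hypothetical helpers from the earlier sketches, and `coeffs[t - 1]` is assumed to hold the coefficient vector fitted for time period t-1.

```python
ACTIONS = ("run", "punt", "kick")

def approximate_action(state, t, coeffs, transition_probs, features):
    """Choose the action whose expected successor value, measured with the
    basis functions fitted for time t-1, is largest (a sketch)."""
    def q(action):
        return sum(p * float(features(s_next) @ coeffs[t - 1])
                   for s_next, p in transition_probs(state, action))
    return max(ACTIONS, key=q)
```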

18
ADP v. Exact Results
The approximate dynamic program captures 55% of
the improvement over the heuristic that the
optimal policy achieves.
[Chart: Improvement in Winning Probability Over Heuristic vs. Starting Field Position]
Note: Starting position (time left 60, score
even, 1st down, 10 yards to go)
19
Comparison of Play Selection
The approximate policy calls for more runs than
the heuristic but also punts more frequently.
4th Down Play Selection   Optimal Policy   Approximated Policy   Heuristic Policy
Runs                          325,113              280,540           258,327
Punts                         386,457              374,981           354,945
Field Goals                    73,500              129,585           171,798
Total                         785,070              785,070           785,070
20
Comparison of Performance
  • The approximate dynamic program runs about 15x
    faster
  • Potential applications
  • Similar simple models may have some real-time
    applications
  • More complex models could become significantly
    more manageable

[Chart: Minutes to Complete]
21
Conclusions
  • Optimal Policy
  • The implementation of the optimal policy resulted
    in an average increase in winning percentage of
    6.5% in the initial states that we considered
    representative
  • The algorithm was able to run on a PC in 32
    minutes (incorporating some restrictions on the
    state space to achieve this performance)
  • Approximate Policy
  • The implementation of the approximate policy
    resulted in an average increase in winning
    percentage of 3.5% in initial representative
    states
  • The algorithm ran in 2.3 minutes
  • Next Steps
  • Get transition probabilities from real data
  • Incorporate more decisions
  • Improve the heuristic and basis functions