Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking

1
Hidden Markov Model Multiarm Bandits: A
Methodology for Beam Scheduling in Multitarget
Tracking
Authors: Vikram Krishnamurthy, Robin Evans
Presented by: Shihao Ji, Duke University Machine
Learning Group, June 10, 2005
2
Outline
  • Motivation
  • Overview
  • Multiarmed Bandits
  • HMM Multiarmed Bandits
  • Experimental Results

3
Motivation
  • An electronically scanned array (ESA) has only
    one steerable beam.
  • The coordinates of each target evolve according
    to a finite-state Markov chain.
  • Question: which single target should the tracker
    choose to observe at each time instant in order
    to optimize a specified cost function?

4
Overview: How It Works
5
Multiarmed Bandits
  • The Model
  • One has N parallel projects, indexed
    $i = 1, 2, \ldots, N$, and at each instant of discrete
    time one can work on only a single project. Let the
    state of project i at time k be denoted $x_i(k)$. If one
    works on project i at time k, one pays an
    immediate expected cost of $c_i(x_i(k))$. The
    state changes to $x_i(k+1)$ by a Markov transition
    rule (which may depend upon i, but not upon k),
    while the states of the projects one has not
    touched remain unchanged: $x_j(k+1) = x_j(k)$ for
    $j \neq i$. The problem is how to allocate one's effort
    over projects sequentially in time so as to
    minimize the expected total discounted cost.

6
Gittins Index
  • The simplest nontrivial problem of its kind, and a
    classic.
  • It had no essential solution until the work of
    Gittins and his co-workers.
  • They proved that to each project i one can
    attach an index $\nu_i(x_i(k))$, a function of the
    current state $x_i(k)$ of that project alone, such that
    the optimal action at time k is to work on the project
    whose current index is smallest. The index is
    calculated by solving the problem of allocating
    one's effort optimally between project i and a
    standard project that yields a constant cost.
  • Gittins' result thus reduces the case of general
    N to the case N = 2.
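As a rough illustration of how such an index can be computed for a fully observed finite-state project, the sketch below assumes the restart-in-state ("return to state") characterization with costs: for each state, solve a two-action discounted problem (continue, or restart from that state) by value iteration and scale the resulting value by (1 - beta). The function name and the arguments `P`, `c`, and `beta` are illustrative assumptions, not notation from the slides.

```python
import numpy as np

def gittins_index_costs(P, c, beta, tol=1e-10, max_iter=100_000):
    """Index of every state of one project (cost-minimizing convention).

    P    : (n, n) transition matrix, P[i, j] = Pr(next = j | current = i)
    c    : (n,) expected cost of working on the project in each state
    beta : discount factor in (0, 1)
    Assumption: index(s) = (1 - beta) * V_s(s), where V_s is the value of
    the problem in which one may always restart from state s.
    """
    P = np.asarray(P, dtype=float)
    c = np.asarray(c, dtype=float)
    n = len(c)
    index = np.empty(n)
    for s in range(n):
        V = np.zeros(n)
        for _ in range(max_iter):
            cont = c + beta * (P @ V)      # cost-to-go if we continue
            restart = cont[s]              # cost-to-go if we restart in s
            V_new = np.minimum(cont, restart)
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
        index[s] = (1.0 - beta) * V[s]
    return index
```

With this convention, a project whose cost is the same constant in every state has exactly that constant as its index, which matches the "standard project" comparison described above.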


7
HMM Multiarmed Bandits
  • The standard multiarmed bandit problem
    involves fully observed finite-state Markov
    chains and is simply an MDP with a rich structure.
  • In multitarget tracking, measurement
    noise at the sensor means the states are only
    partially observable. Thus, the multitarget
    tracking problem needs to be formulated as a
    multiarmed bandit involving HMMs (with an HMM
    filter to estimate the information state).
  • It can be solved by brute force as a POMDP, but
    that involves a Markov chain of much higher
    (enormous) dimension.
  • The bandit assumption decouples the problem.

8
Bandit Assumption
  • The information state of the currently observed
    target is updated by the HMM filter.
  • For the other P-1 unobserved targets, the
    information state $x_k^{(q)}$ of target q is kept
    frozen: $x_{k+1}^{(q)} = x_k^{(q)}$ if target q is not
    observed.
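The two rules above can be sketched in Python as follows. This is a minimal sketch assuming the standard normalized HMM filter recursion; the names `hmm_filter_update`, `bandit_step`, `A`, and `b_y` are hypothetical and chosen here for illustration only.

```python
import numpy as np

def hmm_filter_update(belief, A, b_y):
    """One step of the standard HMM filter.

    belief : (n,) current information state (posterior over the n states)
    A      : (n, n) transition matrix, A[i, j] = Pr(next = j | current = i)
    b_y    : (n,) likelihood of the new observation in each state
    """
    unnormalized = b_y * (A.T @ belief)   # predict, then weight by likelihood
    return unnormalized / unnormalized.sum()

def bandit_step(beliefs, observed, A_list, b_y):
    """Apply the bandit assumption for one time step.

    Only the observed target's information state is updated;
    all other targets' information states are copied unchanged (frozen).
    """
    new_beliefs = [b.copy() for b in beliefs]
    new_beliefs[observed] = hmm_filter_update(
        beliefs[observed], A_list[observed], b_y
    )
    return new_beliefs
```

Note that the frozen targets skip even the prediction step, which is what distinguishes the bandit approximation from ordinary filtering with missing observations.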

9
Why Is It Valid?
  • Slow dynamics: slowly moving targets have a
    bandit structure.


  • where
  • Decoupling Approximation
  • Without the bandit assumption, the optimal
    solution is intractable. The bandit model is perhaps
    the only reasonable approximation that leads to a
    computationally tractable solution.
  • Reinitialization: a compromise.
  • Reinitialize the HMM multiarmed bandit at
    regular intervals with updated estimates from all
    targets.

10
Some details
  • Finite State Markov Assumption
  • The state of the pth target is its quantized
    distance from the base station, and this distance
    evolves according to a finite-state Markov chain.

  • Cost structure
  • The cost typically depends on the distance of the
    pth target from the base station, i.e., a target
    that gets close to the base station poses a greater
    threat and is given higher priority by the tracking
    algorithm.
  • Objective function

11
Optimal Solution
  • Under the bandit assumption, the optimal solution
    has an indexable (decoupling) rule; that is, the
    optimization can be decoupled into P
    independent optimization problems.
  • For each target p, there is a function (the Gittins
    index) of that target's information state. It is
    computed by POMDP algorithms; see the following
    slides.
  • The optimal scheduling policy at time k is to
    steer the beam toward the target with the
    smallest Gittins index.
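The index rule itself is a one-line decision once the per-target index functions are available. The sketch below assumes each target comes with a precomputed callable that maps its information state to an index; `select_target` and `index_fns` are hypothetical names introduced for illustration.

```python
from typing import Callable, Sequence
import numpy as np

def select_target(
    beliefs: Sequence[np.ndarray],
    index_fns: Sequence[Callable[[np.ndarray], float]],
) -> int:
    """Return the target whose current index is smallest.

    beliefs   : one information state (probability vector) per target
    index_fns : one index function per target, assumed to be computed
                offline; they are inputs here, not derived in this sketch
    """
    indices = [fn(b) for fn, b in zip(index_fns, beliefs)]
    return int(np.argmin(indices))
```

Because each index depends only on its own target's information state, the per-step cost of scheduling grows linearly with the number of targets P.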

12
Gittins Index
  • For an arbitrary multiarmed bandit problem, the
    Gittins index can be calculated by solving an
    associated infinite-horizon discounted control
    problem called the "return to state" problem.
  • For target p, given the information state
    $x_k^{(p)}$ at time k, there are two actions:
  • 1) Continue, which incurs a cost
    $c(x_k^{(p)})$, after which the information state
    evolves according to the HMM filter.
  • 2) Restart, which moves to a fixed
    information state $\bar{x}^{(p)}$, incurs a cost
    $c(\bar{x}^{(p)})$, and then evolves according to
    the HMM filter.

13
  • The Gittins index of the information state of
    target p is given in terms of a value function
  • that satisfies the Bellman equation of the
    return-to-state problem.
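As a hedged sketch of what this looks like in the standard return-to-state formulation (the notation is assumed here rather than taken from the slide, and the target superscript (p) is dropped): let $T(x, y)$ be the HMM filter update of information state $x$ after observation $y$, $\sigma(x, y)$ the probability of observing $y$ next given $x$, $\beta$ the discount factor, and $\bar{x}$ the restart state.

```latex
\begin{align*}
V(x, \bar{x}) &= \min\Big\{\, c(x) + \beta \sum_{y} \sigma(x, y)\, V\big(T(x, y), \bar{x}\big),\;
                 c(\bar{x}) + \beta \sum_{y} \sigma(\bar{x}, y)\, V\big(T(\bar{x}, y), \bar{x}\big) \Big\}, \\
\gamma(\bar{x}) &= (1 - \beta)\, V(\bar{x}, \bar{x}).
\end{align*}
```

The first term inside the minimum corresponds to the Continue action and the second to the Restart action; the index of $\bar{x}$ is the value of starting in the restart state itself, scaled by $(1 - \beta)$.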

14
POMDP solver
  • After defining new parameters (see eq. 15), the
    problem can be solved by any standard POMDP solver,
    such as Sondik's algorithm, the witness algorithm,
    incremental pruning, or suboptimal (approximate)
    algorithms.

15
Experimental Results