Title: Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking
1. Hidden Markov Model Multiarm Bandits: A Methodology for Beam Scheduling in Multitarget Tracking
Authors: Vikram Krishnamurthy, Robin Evans
Presented by: Shihao Ji, Duke University Machine Learning Group, June 10, 2005
2. Outline
- Motivation
- Overview
- Multiarmed Bandits
- HMM Multiarmed Bandits
- Experimental Results
3. Motivation
- An electronically scanned array (ESA) has only one steerable beam.
- The coordinates of each target evolve according to a finite-state Markov chain.
- Question: which single target should the tracker choose to observe at each time instant in order to optimize a specified cost function?
4. Overview: How does it work?
5. Multiarmed Bandits
- The Model
- One has $N$ parallel projects, indexed $i = 1, 2, \ldots, N$, and at each instant of discrete time can work on only a single project. Let the state of project $i$ at time $k$ be denoted $x_k^{(i)}$. If one works on project $i$ at time $k$, then one pays an immediate expected cost of $c(x_k^{(i)}, i)$. The state changes to $x_{k+1}^{(i)}$ by a Markov transition rule (which may depend upon $i$, but not upon $k$), while the states of the projects one has not touched remain unchanged: $x_{k+1}^{(j)} = x_k^{(j)}$ for $j \neq i$. The problem is how to allocate one's effort over projects sequentially in time so as to minimize the expected total discounted cost.
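A minimal simulation sketch of this setup (all names and parameters are illustrative, not from the paper): $N$ projects, each a finite-state Markov chain; working on one project per step accrues its discounted cost and advances only its chain.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative bandit: N projects, each an S-state Markov chain with
# its own transition matrix and per-state cost.
N, S, beta = 3, 4, 0.9
transitions = [rng.dirichlet(np.ones(S), size=S) for _ in range(N)]
costs = [rng.uniform(0.0, 1.0, size=S) for _ in range(N)]
states = [rng.integers(S) for _ in range(N)]

def work_on(i, k):
    """Work on project i at time k: pay its discounted cost and
    advance only its chain; untouched projects stay frozen."""
    cost = beta**k * costs[i][states[i]]
    states[i] = rng.choice(S, p=transitions[i][states[i]])
    return cost

# A (suboptimal) random policy, for illustration only.
total = sum(work_on(rng.integers(N), k) for k in range(100))
print("discounted cost of a random policy:", total)
```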
6. Gittins Index
- The simplest non-trivial problem of this kind, now a classic.
- It had no essential solution until the work of Gittins and his co-workers.
- They proved that to each project $i$ one can attach an index $\gamma^{(i)}(x_k^{(i)})$ such that the optimal action at time $k$ is to work on the project for which the current index is smallest. The index is calculated by solving the problem of allocating one's effort optimally between project $i$ and a standard project which yields a constant cost.
- Gittins' result thus reduces the case of general $N$ to the case $N = 2$.
7. HMM Multiarmed Bandits
- The standard multiarmed bandit problem involves a fully observed finite-state Markov chain and is simply an MDP with a rich structure.
- In multitarget tracking, due to measurement noise at the sensor, the states are only partially observable. Thus, the multitarget tracking problem must be formulated as a multiarmed bandit involving HMMs (with the HMM filter used to estimate the information state).
- It could be solved by brute force as a POMDP, but that involves a Markov chain of much higher (enormous) dimension.
- The bandit assumption decouples the problem.
8. Bandit Assumption
- The information state of the currently observed target, say target $p$, is updated by the HMM filter:
$$\pi_{k+1}^{(p)} = \frac{B^{(p)}(y_{k+1}^{(p)})\, A^{(p)\prime}\, \pi_k^{(p)}}{\mathbf{1}^{\prime}\, B^{(p)}(y_{k+1}^{(p)})\, A^{(p)\prime}\, \pi_k^{(p)}}$$
- For the other $P - 1$ unobserved targets, the information states are kept frozen:
$$\pi_{k+1}^{(q)} = \pi_k^{(q)} \quad \text{if target } q \text{ is not observed.}$$
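A sketch of this update in Python (a minimal illustration, assuming each target p has transition matrix A[p] and observation matrix B[p] with B[p][i, y] = P(y | state e_i); the names are mine, not the paper's):

```python
import numpy as np

def update_info_states(info_states, A, B, observed, y):
    """Bandit-assumption update: run the HMM filter only on the
    observed target and freeze every other information state."""
    new_states = list(info_states)
    pi = info_states[observed]
    unnormalized = B[observed][:, y] * (A[observed].T @ pi)
    new_states[observed] = unnormalized / unnormalized.sum()
    # pi_{k+1}^{(q)} = pi_k^{(q)} for every unobserved target q
    return new_states
```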
9. Why is it Valid?
- Slow dynamics: slowly moving targets have a bandit structure. Their transition matrices are nearly the identity,
$$A^{(p)} = I + \epsilon Q^{(p)},$$
where $Q^{(p)}$ is a generator matrix (nonnegative off-diagonal entries, rows summing to zero) and $\epsilon > 0$ is small.
- Decoupling approximation: without the bandit assumption, the optimal solution is intractable. The bandit model is perhaps the only reasonable approximation that leads to a computationally tractable solution.
- Reinitialization: a compromise. Reinitialize the HMM multiarmed bandit at regular intervals with updated estimates from all targets.
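A slow transition matrix of this form can be constructed as follows (the particular generator Q below is made up for illustration):

```python
import numpy as np

def slow_transition_matrix(Q, eps):
    """A = I + eps * Q, a valid stochastic matrix provided Q has
    nonnegative off-diagonal entries, rows summing to zero, and eps
    is small enough that the diagonal of A stays nonnegative."""
    A = np.eye(Q.shape[0]) + eps * Q
    assert np.all(A >= 0) and np.allclose(A.sum(axis=1), 1.0)
    return A

Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 1.0, -1.0]])
A = slow_transition_matrix(Q, eps=0.05)  # nearly the identity: a slow target
```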
10. Some Details
- Finite-state Markov assumption:
$$x_k^{(p)} \in \{e_1, \ldots, e_{N_p}\}$$
denotes the quantized distance of the $p$th target from the base station, and the target distance evolves according to a finite-state Markov chain.
- Cost structure:
The instantaneous cost $c(e_i, p)$ typically depends on the distance of the $p$th target to the base station; i.e., a target close to the base station poses a greater threat and is given higher priority by the tracking algorithm.
- Objective function: choose the scheduling policy $\mu$ to minimize the expected total discounted cost
$$J_\mu = \mathbb{E}\left\{ \sum_{k=0}^{\infty} \beta^k\, c\big(x_k^{(u_k)}, u_k\big) \right\},$$
where $u_k \in \{1, \ldots, P\}$ is the target observed at time $k$ and $0 \le \beta < 1$ is the discount factor.
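As a concrete (made-up) example of such a cost structure, the per-state cost can decrease with the quantized distance, and the expected immediate cost under information state $\pi$ is $c^{\prime}\pi$:

```python
import numpy as np

# Hypothetical threat-based cost: distance bin 0 is closest to the
# base station, so it carries the largest cost (highest priority).
def distance_cost_vector(num_bins):
    return np.linspace(1.0, 0.1, num_bins)

def expected_cost(cost_vec, pi):
    """c' pi: expected immediate cost of a target with info state pi."""
    return float(cost_vec @ pi)

c = distance_cost_vector(5)
pi = np.array([0.5, 0.2, 0.1, 0.1, 0.1])   # mass near the base station
print(expected_cost(c, pi))                 # relatively high cost
```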
11. Optimal Solution
- Under the bandit assumption, the optimal solution has an indexable (decoupling) rule; that is, the optimization can be decoupled into $P$ independent optimization problems.
- For each target $p$, there is a function (the Gittins index) $\gamma^{(p)}(\pi_k^{(p)})$, computed by POMDP algorithms; see the next slide.
- The optimal scheduling policy at time $k$ is to steer the beam toward the target with the smallest Gittins index:
$$u_k = \arg\min_{p \in \{1, \ldots, P\}} \gamma^{(p)}\big(\pi_k^{(p)}\big).$$
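Once the per-target index functions are available, the scheduler itself is a one-line argmin (a sketch; gittins here stands for hypothetical precomputed index functions):

```python
import numpy as np

def schedule_beam(info_states, gittins):
    """Steer the beam to the target with the smallest Gittins index.
    gittins[p] maps the information state of target p to its index."""
    values = [gittins[p](pi) for p, pi in enumerate(info_states)]
    return int(np.argmin(values))
```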
12. Gittins Index
- For an arbitrary multiarmed bandit problem, the Gittins index can be calculated by solving an associated infinite-horizon discounted control problem called the "return to state."
- For target $p$, given information state $\pi_k^{(p)}$ at time $k$, there are two actions:
- 1) Continue, which incurs a cost $c(\pi_k^{(p)}, p)$, and the information state evolves according to the HMM filter.
- 2) Restart, which moves the information state to a fixed $\pi^{0}$, incurs a cost $c(\pi^{0}, p)$, and then evolves according to the HMM filter.
13. Gittins Index (cont.)
- The Gittins index $\gamma^{(p)}(\pi)$ of the information state $\pi$ of target $p$ is given by
$$\gamma^{(p)}(\pi) = V^{(p)}(\pi, \pi),$$
where $V^{(p)}(\pi, \pi^{0})$ satisfies the Bellman equation
$$V^{(p)}(\pi, \pi^{0}) = \min\Big\{ c^{(p)\prime}\pi + \beta \sum_{y} V^{(p)}\big(T^{(p)}(\pi, y), \pi^{0}\big)\, \sigma^{(p)}(\pi, y),\ \ c^{(p)\prime}\pi^{0} + \beta \sum_{y} V^{(p)}\big(T^{(p)}(\pi^{0}, y), \pi^{0}\big)\, \sigma^{(p)}(\pi^{0}, y) \Big\},$$
with $T^{(p)}(\pi, y)$ the HMM filter update and $\sigma^{(p)}(\pi, y)$ the probability of observing $y$ under $\pi$.
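A minimal value-iteration sketch of this return-to-state computation for a single two-state target, discretizing the scalar information state on a grid (all model parameters below are illustrative; the paper instead solves the equivalent POMDP exactly, as on the next slide):

```python
import numpy as np

# Illustrative two-state target: pi is summarized by p = P(x = e_1).
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])       # transition matrix
B = np.array([[0.8, 0.2],
              [0.3, 0.7]])       # B[i, y] = P(y | x = e_i)
c = np.array([1.0, 0.2])         # per-state cost
beta = 0.9

def filter_step(pi, y):
    """Return T(pi, y) and sigma(pi, y) for the HMM filter."""
    un = B[:, y] * (A.T @ pi)
    return un / un.sum(), un.sum()

def gittins_index(p0, n_grid=101, n_iter=200):
    """Value iteration for V(., pi0) on a grid; gamma(pi0) = V(pi0, pi0)."""
    grid = np.linspace(0.0, 1.0, n_grid)
    V = np.zeros(n_grid)
    for _ in range(n_iter):
        def branch(p):  # one-step cost-to-go if we continue from p
            pi = np.array([p, 1.0 - p])
            total = c @ pi
            for y in (0, 1):
                T, sigma = filter_step(pi, y)
                total += beta * sigma * np.interp(T[0], grid, V)
            return total
        restart = branch(p0)  # the restart branch is the same at every pi
        V = np.array([min(branch(p), restart) for p in grid])
    return float(np.interp(p0, grid, V))

print(gittins_index(0.5))
```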
14. POMDP Solver
- Defining new parameters (see eq. (15) of the paper), the return-to-state problem becomes a standard POMDP.
- It can then be solved by any standard POMDP solver, such as Sondik's algorithm, the Witness algorithm, or Incremental Pruning, or by suboptimal (approximate) algorithms.
15. Experimental Results