Title: Stochastic Learning Automata-Based Dynamic Algorithms for Single Source Shortest Path Problems
Slide 1: Stochastic Learning Automata-Based Dynamic Algorithms for Single Source Shortest Path Problems
- S. Misra
- B. John Oommen
- Professor and Fellow of the IEEE
- Carleton University
- Ottawa, Ontario, Canada
- (Nominated for the Best Paper Award)
- (Associated Thesis Proposal: AAAI Doctoral Award)
Slide 2: Outline
- Introduction
- Previous Dynamic Algorithms
- Ramalingam and Reps' Algorithm
- Frigioni et al.'s Algorithm
- Principles of Learning Automata (LA)
- Solution Model
- Proposed Algorithm
- Simulation and Experiments
- Conclusion
Slide 3: Dynamic Single Source Shortest Path Problem (DSSSP)
- Maintaining shortest paths in a graph (with a single source), where the edge-weights constantly change, and where edges are constantly inserted/deleted.
- Edge-insertion is equivalent to weight-decrease, and edge-deletion is equivalent to weight-increase.
- Semi-dynamic problem: either insertion (weight-decrease) or deletion (weight-increase).
Slide 4: Dynamic Single Source Shortest Path Problem (DSSSP) (Contd.)
- Fully-dynamic problem: both insertion (weight-decrease) and deletion (weight-increase).
- The problem is representative of many practical situations in daily life.
- What to do if the edge-weights keep changing? At every time instant, random edge-weights.
- How to get the SP for the "average" graph? This work presents the first known solution.
Slide 5: Shortest Path Tree - Costs are Random and Changing
[Figure: example shortest path tree; edge costs are determined on-the-fly, e.g. edge BE takes successive random costs 1.79, 1.68, 2.01, ...]
Slide 6: The Static Algorithms
- Dijkstra's or Bellman-Ford's solutions are unacceptably inefficient in dynamic environments.
- Such static algorithms involve:
  - Recomputing the shortest path tree from scratch,
  - Done each time a topological change occurs.
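To make the inefficiency concrete, here is a minimal sketch (mine, not from the paper) of the static approach: every single weight change forces a full Dijkstra recomputation, even though most distances are unaffected.

```python
import heapq

def dijkstra(graph, source):
    """Textbook Dijkstra: shortest distances from source in a graph given
    as {u: {v: weight}} with non-negative weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                            # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

graph = {"s": {"a": 2.0, "b": 5.0}, "a": {"b": 1.0}, "b": {}}
dist = dijkstra(graph, "s")     # dist["b"] == 3.0, via a
graph["a"]["b"] = 4.0           # a single weight change...
dist = dijkstra(graph, "s")     # ...forces a full recomputation from scratch
```

The dynamic algorithms below exist precisely to avoid this recompute-everything step.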
Slide 7: Previous Dynamic Algorithms
- Spira and Pan (1975): Very early work, proven theoretically to be inefficient.
- McQuillan et al. (1980): Very early work, not proven at all.
- Ramalingam and Reps (1996): Recent work, fully-dynamic solution.
- Franciosa et al. (1997): Recent work, semi-dynamic solution.
- Frigioni et al. (2000): Recent work, fully-dynamic solution.
Slide 8: Ramalingam and Reps' Algorithm
- Edge-Insertion:
  - Maintains a priority queue containing vertices, with priorities equal to their distance from the end-point of the inserted edge.
  - When a vertex having minimum priority is extracted from the queue, all its outgoing edges are processed.
- Edge-Deletion:
  - Phase I: Determines the vertices/edges affected by the deletion.
  - Phase II: Determines the new output value for all the affected vertices and updates the shortest path tree.
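The insertion case can be sketched as follows. This is a simplified illustration of the priority-queue propagation described above, not the authors' exact implementation; for brevity the queue is keyed by the new distance from the source rather than from the inserted edge's end-point.

```python
import heapq

def rr_insert_edge(graph, dist, parent, u, v, w):
    """Handle insertion of edge (u, v) with weight w (equivalently, a
    weight decrease): if v gets closer to the source, propagate the
    improvement outward with a priority queue."""
    graph.setdefault(u, {})[v] = w
    if dist.get(u, float("inf")) + w >= dist.get(v, float("inf")):
        return                                  # nothing improves
    dist[v] = dist[u] + w
    parent[v] = u
    heap = [(dist[v], v)]
    while heap:
        d, x = heapq.heappop(heap)
        if d > dist.get(x, float("inf")):
            continue                            # stale queue entry
        for y, wxy in graph.get(x, {}).items(): # process outgoing edges of x
            if d + wxy < dist.get(y, float("inf")):
                dist[y] = d + wxy
                parent[y] = x
                heapq.heappush(heap, (dist[y], y))

graph = {"s": {"a": 5.0}, "a": {}}
dist = {"s": 0.0, "a": 5.0}
parent = {"s": None, "a": "s"}
rr_insert_edge(graph, dist, parent, "s", "b", 1.0)  # new vertex b at distance 1.0
rr_insert_edge(graph, dist, parent, "b", "a", 1.0)  # a improves: 5.0 -> 2.0 via b
```

Only vertices whose distance actually improves ever enter the queue, which is what makes the update cheaper than a full recomputation.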
Slide 9: Frigioni et al.'s Algorithm
- Weight-Decrease:
  - Based on "output updates".
  - Number of output updates = the number of vertices that change their distance from the source on a unit change in the graph.
  - If decreasing a weight changes the distance of the terminating end-point of the inserted edge, a global priority queue is used to compute the new distances from the source.
  - Unlike the Ramalingam/Reps algorithm, on dequeuing a vertex, not all the edges leaving it are scanned.
Slide 10: Frigioni et al.'s Algorithm (Contd.)
- Weight-Increase:
  - Based on the following node-coloring scheme:
    - A node q is marked white if it changes neither its distance from s nor its parent in the tree rooted at s.
    - A node q is marked red if it increases its distance from s.
    - A node q is marked pink if it preserves its distance from s, but replaces its old parent in the tree rooted at s.
Slide 11: Frigioni et al.'s Algorithm (Contd.)
- Weight-Increase: three main phases:
  - Update the local data-structures at the end-points of the affected edge, and check whether any distances change.
  - Color the vertices repeatedly by extracting vertices with minimum priority.
  - Compute the new distances for the red vertices.
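The coloring decision for a single affected node can be sketched roughly as below. This is my own simplification: the real algorithm's priority queues and local data-structures are omitted, and only the pink/red distinction for a node whose tree parent was hit by the weight increase is shown.

```python
def color_node(q, dist, in_edges, red):
    """Rough sketch of the coloring rule for a node q whose shortest-path
    tree parent was affected by a weight increase.  dist holds the
    pre-update distances from the source s; in_edges lists (x, w) pairs
    for edges x -> q; red is the set of nodes already colored red."""
    for x, w in in_edges:
        # a non-red in-neighbour that still supports q's old distance
        if x not in red and dist[x] + w == dist[q]:
            return "pink", x      # distance preserved, x becomes the new parent
    return "red", None            # q's distance from s must increase

# q keeps its distance 2.0 through the alternative parent "a":
dist = {"s": 0.0, "a": 1.0, "q": 2.0}
result = color_node("q", dist, [("a", 1.0), ("s", 5.0)], red=set())  # ("pink", "a")
```

A pink node is cheap to handle (just swap its parent); only red nodes require new distance computations in the third phase.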
Slide 12: Learning Automata
- Previously used to model biological learning systems.
- Can be used to find the optimal action.
- How is learning accomplished?
Slide 13: Learning Automata - The Feedback Loop
- Random Environment (RE)
- Learning Automaton (LA)
- Set of actions {α1, ..., αr}
- Reward/Penalty
- Action Probability Vector
- Action Probability Updating Scheme: the L_RI scheme.
Action: α ∈ {α1, ..., αr}; Feedback: β ∈ {0, 1}
Slide 14: Variable Structure LA
- Defined in terms of action probability updating schemes.
- The action probability vector is [p1(t), ..., pr(t)]^T, where pi(t) is the probability of choosing αi at time t.
- Implemented using a random number generator.
- Flexibility: different actions may be chosen at two consecutive time instants.
- The action probabilities can be updated in various ways.
Slide 15: Categories of VSSA
- Classification based on the type of the probability space:
  - Continuous
  - Discrete
- Classification based on the learning paradigm:
  - Reward-Penalty schemes
  - Reward-Inaction schemes
  - Inaction-Penalty schemes
Slide 16: Categories of VSSA
- Ergodic scheme:
  - Limiting distribution independent of the initial distribution.
  - Used if the Environment is non-stationary.
  - The LA won't get locked into any of the given actions.
Slide 17: Categories of VSSA
- Absorbing scheme:
  - Limiting distribution dependent on the initial distribution.
  - Used if the Environment is stationary.
  - The LA finally gets absorbed into its final action.
  - Example: the Linear Reward-Inaction (L_RI) scheme.
Slide 18: The L_RI Scheme

  p1(n)   p2(n)   p3(n)   p4(n)
  0.4     0.3     0.1     0.2

If α2 is chosen and rewarded, p2 is increased, and p1, p3, p4 are decreased linearly (here with λ = 0.1):

  p1(n+1)   p2(n+1)                        p3(n+1)   p4(n+1)
  0.36      1 - 0.36 - 0.09 - 0.18 = 0.37  0.09      0.18

If α2 is the best action, the limiting probabilities are:

  p1(∞)   p2(∞)   p3(∞)   p4(∞)
  0       1       0       0
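The update above can be sketched in a few lines (a minimal illustration, assuming λ = 0.1 as in the worked numbers):

```python
def lri_update(p, i, rewarded, lam=0.1):
    """Linear Reward-Inaction (L_RI) update: on reward, every other
    component shrinks by the factor (1 - lam) and the chosen action i
    absorbs the freed probability mass; on penalty, nothing changes."""
    if not rewarded:
        return list(p)                        # "inaction" on penalty
    q = [(1.0 - lam) * pj for pj in p]        # shrink all components linearly
    q[i] = 1.0 - sum(q[j] for j in range(len(q)) if j != i)
    return q

p = [0.4, 0.3, 0.1, 0.2]
q = lri_update(p, 1, rewarded=True)   # alpha_2 chosen and rewarded
# q is approximately [0.36, 0.37, 0.09, 0.18], matching the table above
```

Because a penalty leaves the vector untouched, repeated rewards of the best action drive its probability toward 1, which is the absorbing behaviour described on the previous slide.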
Slide 19: Our Solution
- Current state-of-the-art: no solution to the DSSSP when the edge-weights are dynamically and stochastically changing.
Slide 20: Our Solution
- Our solution:
  - Uses the theory of Learning Automata (LA).
  - Extends the current models by encapsulating the problem within the field of LA.
  - Finds a shortest path in realistically occurring stochastic environments.
  - Finds a shortest path for the "average" underlying graph, dictated by an Oracle (also called the Environment).
  - Finds the statistical shortest path tree that will be stable regardless of the continuously changing weights.
Slide 21: Our Solution Model - The Automata
- Station an LA at every node in the graph.
- At every instant, the LA chooses a suitable edge from all the outgoing edges at that node, by interacting with the Environment.
- The LA requests the Environment for the current random weight of the edge it chooses.
- The system computes the current shortest path using RR/FMN.
- The LA determines whether the choice it made should be rewarded/penalized.
Slide 22: Our Solution Model - The Environment
- Consists of the overall dynamically changing graph.
- Multiple edge-weights change stochastically and continuously.
- Changes are based on a distribution that is:
  - Unknown to the LA,
  - Known to the Environment.
- The Environment supplies a Reward/Penalty signal to the LA.
Slide 23: Our Solution Model - Reward/Penalty
- The updated shortest path tree is computed:
  - Based on the action the LA chooses, and
  - The edge-weight the Environment provides.
- The LA compares the cost with the current "average" shortest paths.
- The LA:
  - Infers whether the choice should be rewarded/penalized.
  - Updates the action probabilities using the L_RI scheme.
Slide 24: Shortest Path Tree - Updated Action Probability Vectors
[Figure: shortest path tree annotated with the updated action probability vectors; example edge cost 4.9]
Slide 25: LASPA - The Proposed Algorithm
- INPUT:
  - G(V,E): a dynamically changing graph with simultaneous multiple stochastic edge updates occurring.
  - iters: total number of iterations.
  - λ: learning parameter.
- OUTPUT:
  - A converged graph that has all the shortest path information.
  - Values of all action probability vectors.
- ASSUMPTION:
  - The algorithm maintains an action probability vector, P = [p1(n), p2(n), ..., pr(n)], for each node of the graph.
Slide 26: LASPA - The Proposed Algorithm
- LASPA ALGORITHM:
  1. Obtain a snapshot of the directed graph with each edge having a random weight. This edge-weight is based on the random call for an edge.
  2. Run Dijkstra's Algorithm to determine the shortest path edges on the graph's snapshot obtained in the first step. Based on this, update the action probability vector of each node: shortest path edges have an increased probability.
  3. Randomly choose a node from the current graph. For that node, choose an edge based on the action probability vector. Request the edge-weight of this edge and recalculate the shortest path using either the RR or FMN algorithm.
  4. Update the action probability vectors for all the nodes using the Reward-Inaction philosophy.
  5. Repeat Steps 3-4 until the algorithm has converged.
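Steps 3-4 can be sketched as the loop below. This is a rough illustration under strong simplifications: `sample_weight` and `recompute_sp` are hypothetical stand-ins for the Environment's random-weight oracle and the RR/FMN dynamic update, and only the chosen node's vector is updated per iteration.

```python
import random

def laspa(nodes, sample_weight, recompute_sp, iters=1000, lam=0.05):
    """Rough sketch of the LASPA main loop (Steps 3-4).
    nodes: {u: [v1, v2, ...]} lists each node's outgoing neighbours.
    sample_weight(u, v): stand-in for the Environment's weight oracle.
    recompute_sp(): stand-in for the RR/FMN update; returns the set of
        edges on the current shortest path tree."""
    # one action probability vector per node with outgoing edges
    prob = {u: [1.0 / len(out)] * len(out) for u, out in nodes.items() if out}
    for _ in range(iters):
        u = random.choice(list(prob))                    # Step 3: pick a node
        p = prob[u]
        i = random.choices(range(len(p)), weights=p)[0]  # pick an edge by p
        v = nodes[u][i]
        sample_weight(u, v)                # ask the Environment for a weight
        sp_edges = recompute_sp()          # recalculate the shortest paths
        if (u, v) in sp_edges:             # Step 4: reward-inaction update
            q = [(1.0 - lam) * pj for pj in p]
            q[i] = 1.0 - sum(q[j] for j in range(len(q)) if j != i)
            prob[u] = q                    # penalty case: p is left unchanged
    return prob
```

With a stationary Environment, each node's vector is absorbed toward the edge that most often lies on the shortest path tree, which is the converged output described on the previous slide.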
Slide 27: Simulations - The Experiments
- Experiment Set 1: Comparison of the performance of LASPA with FMN and RR for a fixed graph structure.
- Experiment Set 2: Comparison of the performance results with variation in graph structures.
- Experiment Set 3: Sensitivity of the performance of LASPA to the variation of certain parameters, while keeping others constant.
Slide 28: Simulations - The Performance Metrics
- Average number of scanned edges per update operation.
- Average number of processed nodes per update operation.
- Average time required per update operation.
Slide 29: Simulations - Experiment Set 1
- Metric shown: average processed nodes.
- Graph with 50 nodes, 20% sparsity.
- Edge-weights with means between 1.0 and 5.0, and variances between 0.5 and 1.5.
- Mixed sequences of 500 update operations.
Slide 30: Simulations - Experiment Set 1 (Contd.)
- Metric shown: average time per update.
- Graph with 50 nodes, 20% sparsity.
- Edge-weights with means between 1.0 and 5.0, and variances between 0.5 and 1.5.
- Mixed sequences of 500 update operations.
Slide 31: Simulations - Experiment Set 2
- Variation in graph sparsity.
- Graphs with 100 nodes, varying sparsity.
- Edge-weights with means between 1.0 and 5.0, and variances between 0.5 and 0.9; λ = 0.9.
- Mixed sequences of 500 update operations.
- LASPA, on average, performs better than RR/FMN.
- Variation in the number of nodes shows similar results.
Table: Average/Min/Max Time Per Update versus Sparsity
Slide 32: Simulations - Experiment Set 3
- Sensitivity of results to the variation in the learning parameter λ.
- Graph with 50 nodes, 20% sparsity.
- Edge-weights with means between 1.0 and 5.0, and variances between 0.5 and 1.5.
- Mixed sequences of 500 update operations.
- λ is varied.
- Other metrics show similar results.
Slide 33: Conclusions
- Novelty of our work:
  - First reported LA solution to the DSSSP.
  - A superior solution to the previous ones.
  - Existing algorithms can't operate successfully in realistically occurring, continuously changing stochastic environments.
  - A breakthrough solution that could have commercial value.
- Practical usefulness of our algorithm:
  - Telecommunications networking
  - Transportation
  - Military
- Future work:
  - Evaluation on very large topologies.
  - Evaluation on real networks.