Efficient Optimistic Parallel Simulations Using Reverse Computation - PowerPoint PPT Presentation


Description:

Title: Performance in the Presence of External Workloads Author: Hal Last modified by: chrisc Created Date: 5/28/1995 4:26:58 PM Document presentation format – PowerPoint PPT presentation

Slides: 28
Provided by: Hal1203
Learn more at: http://www.cs.rpi.edu

Transcript and Presenter's Notes



1
Efficient Optimistic Parallel Simulations Using
Reverse Computation
  • Chris Carothers
  • Department of Computer Science
  • Rensselaer Polytechnic Institute
  • Kalyan Perumalla
  • and
  • Richard M. Fujimoto
  • College of Computing
  • Georgia Institute of Technology

2
Why Parallel/Distributed Simulation?
  • Goal: speed up discrete-event simulation programs
    using multiple processors
  • Enabling technology for
  • intractable simulation models → tractable
  • off-line decision aids → on-line aids
    for time-critical situation analysis
  • DPAT: a distributed
    simulation success story
  • simulation model of the National Airspace
  • developed at MITRE using Georgia Tech Time Warp
    (GTW)
  • simulates 50,000 flights in < 1 minute, which
    used to take 1.5 hours
  • web-based user interface
  • to be used in the FAA Command Center for on-line
    "what if" planning
  • Parallel/distributed simulation has the potential
    to improve how "what if" planning strategies are
    evaluated

3
How to Synchronize Distributed Simulations?
  • Parallel time-stepped simulation: lock-step
    execution, with a barrier between time steps
  • Parallel discrete-event simulation: must allow
    for sparse, irregular event computations
  • Problem: events arriving in the past
  • Solution: Time Warp
[Figure: virtual-time timelines for PE 1, PE 2, and PE 3 showing processed events and a straggler event]
4
Time Warp...
  • Local control mechanism: error detection and
    rollback
    (1) undo state Δs, (2) cancel sent events
  • Global control mechanism: compute Global Virtual
    Time (GVT)
    collect versions of state/events and perform
    I/O operations that are < GVT
[Figure: virtual-time timelines for LP 1, LP 2, and LP 3 with the GVT line, showing unprocessed, processed, straggler, and committed events]
5
Challenge: Efficient Implementation?
  • Advantages
  • automatically finds available parallelism
  • makes development easier
  • outperforms conservative schemes by a factor of N
  • Disadvantages
  • Large memory requirements to support rollback
    operation
  • State-saving incurs high overheads for fine-grain
    event computations
  • Time Warp is out of the performance envelope for
    many applications

Our Solution: Reverse Computation
6
Outline...
  • Reverse Computation
  • Example: ATM Multiplexor
  • Beneficial Application Properties
  • Rules for Automation
  • Reversible Random Number Generator
  • Experimental Results
  • Conclusions
  • Future Work

7
Our Solution: Reverse Computation...
  • Use Reverse Computation (RC)
  • automatically generate reverse code from model
    source
  • undo by executing reverse code
  • Delivers better performance
  • negligible overhead for forward computation
  • significantly lower memory utilization

8
Example: ATM Multiplexor
[Figure: N inputs feeding a multiplexor with a buffer of size B]
Original code, on cell arrival:
    if( qlen < B ) { qlen++; delays[qlen]++; }
    else lost++;
9
Gains...
  • State size reduction
  • from B+2 words to 1 word
  • e.g., B = 100 gives a ~100x reduction!
  • Negligible overhead in forward computation
  • state-saving overhead removed from forward computation
  • undo work moved to the rollback phase
  • Result
  • significant increase in speed
  • significant decrease in memory
  • How?...

10
Beneficial Application Properties
  • 1. Majority of operations are constructive
  • e.g., ++, --, etc.
  • 2. Size of control state < size of data state
  • e.g., size of b1 < size of qlen, sent, lost, etc.
  • 3. Perfectly reversible high-level operations
  • gleaned from irreversible smaller operations
  • e.g., random number generation

11
Rules for Automation...
[Table: generation rules and upper bounds on bit requirements for various statement types]
12
Destructive Assignment...
  • Destructive assignment (DA)
  • examples: x = y, x %= y
  • requires all modified bytes to be saved
  • Caveat
  • reversing technique for DAs can degenerate to
    traditional incremental state saving
  • Good news
  • certain collections of DAs are perfectly
    reversible!
  • queueing network models contain collections of
    easily/perfectly reversible DAs
  • queue handling (swap, shift, tree insert/delete,
    ...)
  • statistics collection (increment, decrement, ...)
  • random number generation (reversible RNGs)

13
Reversing an RNG?
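    /* Combined generator of four multiplicative LCGs. Cg[i][g] holds the
       current seed of component i for generator g (declared elsewhere).
       Each step computes a*s mod m with Schrage's method,
       a*(s mod q) - r*(s/q) with m = a*q + r, so it fits in a 32-bit long. */
    double RNGGenVal(Generator g)
    {
        long k, s;
        double u;
        u = 0.0;

        s = Cg[0][g];
        k = s / 46693;
        s = 45991 * (s - k * 46693) - k * 25884;
        if (s < 0) s = s + 2147483647;
        Cg[0][g] = s;
        u = u + 4.65661287524579692e-10 * s;

        s = Cg[1][g];
        k = s / 10339;
        s = 207707 * (s - k * 10339) - k * 870;
        if (s < 0) s = s + 2147483543;
        Cg[1][g] = s;
        u = u - 4.65661310075985993e-10 * s;
        if (u < 0) u = u + 1.0;

        s = Cg[2][g];
        k = s / 15499;
        s = 138556 * (s - k * 15499) - k * 3979;
        if (s < 0) s = s + 2147483423;
        Cg[2][g] = s;
        u = u + 4.65661336096842131e-10 * s;
        if (u > 1.0) u = u - 1.0;

        s = Cg[3][g];
        k = s / 43218;
        s = 49689 * (s - k * 43218) - k * 24121;
        if (s < 0) s = s + 2147483323;
        Cg[3][g] = s;
        u = u - 4.65661357780891134e-10 * s;
        if (u < 0) u = u + 1.0;

        return (u);
    }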
Observation: k = s / 46693 is a destructive
assignment. Result: RC degrades to classic
state saving. Can we do better?
14
RNGs A Higher Level View
The previous RNG is based on the following
recurrence:
$$x_{i,n} = a_i x_{i,n-1} \bmod m_i$$
where $x_{i,n}$ is one of the four seed values in the $n$th set, $m_i$ is
one of four large primes less than $2^{31}$, and $a_i$
is a primitive root of $m_i$. Now, the above
recurrence is in fact reversible: because $m_i$ is prime, the inverse of $a_i$
modulo $m_i$ is defined (by Fermat's little theorem) as
$$b_i = a_i^{m_i-2} \bmod m_i$$
Using $b_i$, we can generate the reverse recurrence as
follows:
$$x_{i,n-1} = b_i x_{i,n} \bmod m_i$$
15
Reverse Code Efficiency...
  • Future RNGs may result in even greater savings.
  • Consider the MT19937 Generator...
  • Has a period of 2^19937 - 1
  • Uses 2496 bytes for a single generator
  • Property...
  • Non-reversibility of individual steps does NOT imply
    that the computation as a whole is not
    reversible.
  • Can we automatically find this higher-level
    reversibility?
  • Other Reversible Structures Include...
  • Circular shift operation
  • Insertion and deletion operations on trees (e.g.,
    priority queues).

Reverse computation is well-suited for queuing
network models!
16
Performance Study
17
Why the large increase in parallel performance?
[Figure: parallel performance in million events/second]
18
Cache Performance...
Faults on 12 PEs (SS = state saving, RC = reverse computation):

  Faults       TLB           P-cache          S-cache
  SS 12pe      43,966,018    1,283,032,615    162,449,694
  RC 12pe      11,595,326      590,555,715     94,771,426

19
Related Work...
  • Reverse computation used in
  • low power processors, debugging, garbage
    collection, database recovery, reliability, etc.
  • All previous work either
  • prohibits irreversible constructs, or
  • uses a copy-on-write implementation for every
    modification (corresponding to incremental state
    saving)
  • Many operate at the coarse virtual-page level

20
Contributions
  • We identify that
  • RC makes Time Warp usable for fine-grain models!
  • disproved the previous belief that fine-grain models
    can't be optimistically simulated efficiently
  • less memory consumption, more speed, without
    extra user effort
  • RC generalizes state saving
  • e.g., incremental state saving, copy state saving
  • For certain data types, RC is more memory
    efficient than SS
  • e.g., priority queues

21
Future Work
  • Develop state minimization algorithms, by
  • State compression: bit size for reversibility <
    bit size of data variables
  • State reuse: same state bits for different
    statements
  • based on liveness, analogous to register
    allocation
  • Complete RC automation algorithm design, avoiding
    the straightforward incremental state-saving
    approach
  • Lossy integer and floating point arithmetic
  • Jump statements
  • Recursive functions

22
Geronimo! System Architecture
[Figure: a high-performance simulation application running on Geronimo, spanning a multiprocessor and a distributed compute server of rack-mounted CPUs (not in demonstration)]
Geronimo features: (1) risky or speculative
processing of object computations, (2) reverse
computation to support undo operations, (3)
Active Code, in a combined heterogeneous
shared-memory and message-passing environment...
23
Geronimo! Risky Processing...
  • Execution Framework
  • Objects
  • schedule Threads / Tasks
  • at some virtual time
  • Applications
  • discrete-event simulations
  • scientific computing applications

CAVEAT: good performance relies on (cost of
recovery × probability of failure) being less than
the cost of being safe!
[Figure: virtual-time timelines of processed, unprocessed, and straggler threads]
24
Geronimo! Efficient Undo
  • Traditional approach: State Saving
  • save byte-copies of modified items
  • high overhead for fine-granularity computations
  • memory utilization is large
  • need alternative for large-scale, fine-grain
    simulations
  • Our approach: Reverse Computation
  • automatically generate reverse code from model
    source
  • utilize reverse code to do rollback
  • negligible overhead for forward computation
  • significantly lower memory utilization
  • joint with Kalyan Perumalla and Richard Fujimoto

Observation: reverse computation treats code
as state. This results in a code-state
duality. Can we generalize this notion?...
25
Geronimo! Active Code
  • Key idea: allow object methods/code to be
    dynamically changed during run-time.
  • objects can schedule in the future a new method
    or re-define old methods of other objects and
    themselves.
  • objects can erase/delete methods on themselves or
    other objects.
  • new methods can contain Active Code which can
    re-specialize itself or other objects.
  • work in a heterogeneous environment.
  • How is this useful?
  • increase performance by allowing the program to
    consistently execute the common case fast.
  • adaptive, perturbation-free, monitoring of
    distributed systems.
  • potential for increasing a language's
    expressive power.
  • Our approach?
  • Java? No, need higher performance; maybe used in
    the future...
  • Special compiler? No, can't keep up with changes
    to microprocessors.

26
Geronimo! Active Code Implementation
  • Runtime infrastructure
  • modifies source code tree
  • start a rebuild of the executable on another
    existing machine
  • uses a system's naïve compiler
  • Re-exec system call
  • reloads only the new text or code segment of new
    executable
  • fix-up old stack to reflect new code changes
  • fix-up pointers to functions
  • will run in user-space for portability across
    platforms
  • Language preprocessor
  • instruments code to support stack and function
    pointer fix-up
  • instruments code to support stack reconstruction
    and re-start process

27
Research Issues
  • Software architecture for the heterogeneous,
    shared-memory, message passing environment.
  • Development of distributed algorithms that are
    fully optimized for this combination
    environment.
  • What language to use for development: C, C++, or
    both?
  • Geronimo! API.
  • Active Code Language and Systems Support.
  • Mapping relevant application types to this
    framework

Homework Problem: Can you find specific
applications/problems where we can apply
Geronimo!?