Energy-Aware Deterministic Fault Tolerance in Distributed Real-Time Embedded Systems

1
Energy-Aware Deterministic Fault Tolerance in
Distributed Real-Time Embedded Systems
  • Ying Zhang†
  • Robert P. Dick‡
  • Krishnendu Chakrabarty†
  • †Department of Electrical and Computer Engineering
  • Duke University
  • ‡Department of Electrical and Computer Engineering
  • Northwestern University

2
Motivation
  • Complex embedded systems
  • Boeing 777: 1280 embedded processors, 4 million
    lines of software
  • Fault tolerance and power management are
    addressed separately
  • Can fault tolerance and power management be
    combined in an integrated fashion?

3
Motivation
  • Goal: low-power, fault-tolerant real-time
    systems
  • Responsive to task deadlines
  • Fault-tolerant
  • Energy-efficient
  • Challenges: tradeoffs
  • Low energy vs. real-time responsiveness
  • Fault tolerance vs. real-time responsiveness
  • Low energy vs. fault tolerance

[Figure: the three competing goals: real-time responsiveness, fault tolerance, energy efficiency]
4
Outline
  • Motivation
  • Background
  • Constant speed case
  • Incorporating DVS
  • Conclusions and future work

5
Checkpointing
2. Background
  • Checkpointing: intermediate states of a task are
    saved periodically
  • Rollback recovery: computation is resumed from
    the most recent checkpoint rather than from the
    beginning
  • Saves re-execution time → desirable for real-time
    systems
  • Transient faults: caused by cosmic rays,
    high-energy particles, crosstalk, etc.
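
The saving from rollback recovery can be illustrated with a small sketch. It assumes a single fault that is detected immediately, checkpoints taken after every `interval` units of completed work, and fixed costs `c_w` (write a checkpoint) and `c_r` (restore one); the function names and this cost model are illustrative assumptions, not taken from the slides.

```python
import math

def finish_time_with_restart(work, fault_at):
    """No checkpointing: a fault after `fault_at` units of work
    discards everything, so the whole task is executed again."""
    return fault_at + work

def finish_time_with_rollback(work, fault_at, interval, c_w, c_r):
    """Checkpointing: only the work done since the most recent
    checkpoint is lost and has to be re-executed."""
    saved = math.floor(fault_at / interval) * interval  # work covered by the last checkpoint
    lost = fault_at - saved                             # work that must be redone
    n_checkpoints = math.ceil(work / interval) - 1      # checkpoints taken before completion
    return work + n_checkpoints * c_w + lost + c_r

# Example: 10 units of work, fault after 7 units, checkpoint every 3 units.
print(finish_time_with_restart(10, 7))
print(finish_time_with_rollback(10, 7, interval=3, c_w=0.5, c_r=0.5))
```

With these hypothetical numbers, rollback re-executes one unit of work instead of seven, at the price of the checkpoint writes and one restore.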

[Figure: task timeline from release time to deadline, showing normal execution with two checkpoints]
6
Dynamic Voltage Scaling (I)
  • DVS: dynamic voltage scaling
  • Run-time variation of processor supply voltage
  • $P \propto C_{out} V_{dd}^2 f$
  • $f \propto (V_{dd} - V_t)^2 / V_{dd}$
  • Cubic reductions in power, quadratic reductions
    in energy
  • Lower $V_{dd}$ → lower power consumption but longer
    execution times
  • May cause tasks to miss their deadlines in
    real-time systems
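
The two proportionalities above can be evaluated numerically to see the tradeoff. The sketch below works in relative units only; the threshold voltage of 0.4 V is a made-up value for illustration, and the three supply voltages are the ones listed later in the simulation setup.

```python
def relative_frequency(vdd, vt):
    """f is proportional to (Vdd - Vt)^2 / Vdd."""
    return (vdd - vt) ** 2 / vdd

def relative_power(vdd, vt, c_out=1.0):
    """P is proportional to C_out * Vdd^2 * f."""
    return c_out * vdd ** 2 * relative_frequency(vdd, vt)

def relative_energy_per_cycle(vdd, c_out=1.0):
    """Energy per cycle is P / f, so it is proportional to C_out * Vdd^2."""
    return c_out * vdd ** 2

VT = 0.4  # hypothetical threshold voltage, for illustration only
for vdd in (1.4, 1.5, 1.6):
    print(vdd,
          round(relative_frequency(vdd, VT), 3),
          round(relative_power(vdd, VT), 3),
          round(relative_energy_per_cycle(vdd), 3))
```

Lowering the supply voltage reduces both power and energy per cycle, but the frequency drops as well, which is why execution times grow.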

7
Dynamic Voltage Scaling (II)
  • DVS for fault tolerance
  • Speed up → increase slack → easier to provide
    fault tolerance
  • Tradeoffs
  • Speed ↑ → Energy ↑, Execution time ↓,
    Fault-tolerance capability ↑
  • Speed ↓ → Energy ↓, Execution time ↑,
    Fault-tolerance capability ↓

[Figure: voltage levels V1 and V2 versus task execution time, with regions of high and low fault tolerance]
8
Practical Issues in Checkpointing
  • Stable storage in embedded systems
  • Storage types: SRAM, DRAM, ROM, and flash memory
  • DRAM is appropriate for checkpoint saving:
    access speed and capacity [1]
  • Checkpoint overhead
  • Checkpoint size: on the order of KBytes in many
    embedded systems [1]
  • Typical time to save a checkpoint to DRAM: on the
    order of μs
  • Energy cost: dependent on memory access power
  • Faults during checkpointing and recovery
  • Must be taken into account due to high fault
    arrival rate
  • Always roll back and restore state, wherever a
    fault occurs

[1] C.-Y. Lin et al., "A checkpointing tool for Palm operating system," Proc. DSN, pp. 71-76, 2001.
9
System and Fault Model
System model
  • Communication based on message passing
  • Each PE has its own processor and DRAM
  • Checkpoints saved in DRAM
  • Processors can be constant-speed or
    variable-speed (DVS-capable)
  • Fixed communication speed
  • System implemented using a CAD synthesis tool

Fault model
  • Target: transient faults (single-event upsets,
    crosstalk glitches, etc.)
  • Permanent faults are handled through manufacturing
    and testing techniques
  • Errors due to transient faults are detected through
    appropriate schemes
  • Fault arrival: k-fault-tolerance
  • Value of k is determined by fault arrival rate
    and task application time

10
Problem formulation
3. Constant speed case
  • Program modeled by a DAG $G = (V, E)$
  • Node $v_i = (a_i, d_i, t_i)$
  • $a_i$: arrival time
  • $d_i$: deadline
  • $t_i$: computation time
  • $c_{ij}$: communication cost between $v_i$ and $v_j$

Problem:
  • Given: $G = (V, E)$ and a system-level synthesis tool
  • Determine: a checkpointing and voltage scaling scheme (in the presence of faults)
  • Such that: (1) all jobs complete on time and (2) energy savings are achieved
11
Checkpointing Consistency
Consistent state: t1 < tA < tf
Inconsistent state: tA < t1 < tf
Consistent state: if the checkpointed state of a
processor reflects a message receipt, then the
checkpointed state of the corresponding sender
processor should indicate that the message has
been sent out.
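
The consistency condition above can be written as a small check. This is a minimal sketch under assumed data structures: `checkpoint_time` maps each processor to the time of its checkpoint, and each message is a hypothetical tuple `(sender, sent_at, receiver, received_at)`.

```python
def is_consistent(checkpoint_time, messages):
    """Return True if no checkpoint records a message receipt
    whose send is missing from the sender's checkpoint."""
    for sender, sent_at, receiver, received_at in messages:
        receipt_recorded = received_at <= checkpoint_time[receiver]
        send_recorded = sent_at <= checkpoint_time[sender]
        if receipt_recorded and not send_recorded:
            return False  # receiver remembers a message the sender never sent
    return True

msgs = [("P0", 5, "P1", 7)]
print(is_consistent({"P0": 6, "P1": 8}, msgs))  # send and receipt both recorded
print(is_consistent({"P0": 4, "P1": 8}, msgs))  # receipt recorded, send not recorded
```

The second call is the inconsistent case: after a rollback, P1 would hold a message that P0 has not yet sent.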
12
Synchronized Checkpointing: Ensuring Consistency
  • Globally synchronized signal used for triggering
    local checkpointing: clock or coarse-grained
    signal
  • All processors take checkpoints according to a
    global time
  • No costly coordination messages
  • Predictable recovery cost (bounded by
    checkpointing period T)

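[Figure: timelines of processors P0, P1, P2 with synchronized checkpoints at t0 and t0 + T (period T) and a fault at tf]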
13
Feasibility Analysis under Constant Speed
  • Assumption
  • At most k faults occur in the system (including
    all PEs) before the program deadline (largest
    job deadline)
  • Time to store (retrieve) a checkpoint: $c_w$ ($c_r$)
  • Synchronized checkpointing with interval $T$
  • Key observation
  • Maximum time penalty for one fault $\le T + c_w + c_r$
  • Approach
  • Perform a topological sort of G
  • Calculate the worst-case finish time (wcft) for each
    job $v_i$
  • Examine deadline constraints: feasible iff
    $\mathrm{wcft}(v_i) \le d_i, \ \forall v_i \in V$
  • Complexity: $O(|V| + |E|)$
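
A simplified, pessimistic sketch of this approach follows. It does not reproduce the three-case analysis developed on the next slides; instead it bounds each job's worst-case finish time by its fault-free finish time plus k times a per-fault penalty, taken here as T + c_w + c_r. The data layout, the function names, and the way checkpoint overhead is counted (one write per full interval of execution) are assumptions made for illustration.

```python
import math
from graphlib import TopologicalSorter  # Python 3.9+

def worst_case_finish_times(jobs, edges, k, T, c_w, c_r):
    """jobs:  {name: (arrival, deadline, exec_time)}
    edges: {(u, v): communication cost from u to v}
    Returns an upper bound on each job's finish time with at most k faults."""
    preds = {v: [] for v in jobs}
    for (u, v) in edges:
        preds[v].append(u)

    penalty = T + c_w + c_r  # assumed worst-case delay caused by one fault
    fault_free = {}
    for v in TopologicalSorter(preds).static_order():
        arrival, _, exec_time = jobs[v]
        # A job starts once it has arrived and all predecessor data is in.
        ready = max([arrival] + [fault_free[u] + edges[(u, v)] for u in preds[v]])
        overhead = math.floor(exec_time / T) * c_w  # checkpoint writes (assumed model)
        fault_free[v] = ready + exec_time + overhead

    # At most k faults in total, each adding at most `penalty` of delay.
    return {v: f + k * penalty for v, f in fault_free.items()}

def is_feasible(jobs, wcft):
    """Feasible iff every job's bound meets its deadline."""
    return all(wcft[v] <= jobs[v][1] for v in jobs)

jobs = {"v1": (0, 20, 5), "v2": (2, 40, 6), "v3": (3, 60, 4)}
edges = {("v1", "v2"): 2, ("v2", "v3"): 3}
wcft = worst_case_finish_times(jobs, edges, k=1, T=4, c_w=0.5, c_r=0.5)
print(wcft, is_feasible(jobs, wcft))
```

Because the bound charges all k faults to every job, it can reject schedules that a tighter per-case analysis would accept; it only illustrates the topological-order traversal and the deadline test.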

14
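Illustrative Example: Program Composed of 3 Jobs
[Figure: jobs v1, v2, v3 on PE1, PE2, PE3, each with arrival time a_i, deadline d_i, and computation time t_i; communication costs c12 (v1 to v2) and c23 (v2 to v3)]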
15
Illustrative Example: Key Points
  • Incorporate additional timing costs: checkpointing
    and rollback recovery
  • Examine the effect of fault occurrences for all
    predecessor jobs
  • Compare each predecessor's WCFT with the arrival time
    of the current job
  • Divide the problem into three cases

16
Illustrative Example: Case 1
[Figure: a2 compared with wcft1(n1) + c12; axis labels n1 and k]
17
Illustrative Example: Case 2
[Figure: a2 compared with wcft1(n1) + c12; axis labels n1 and k]
18
Illustrative Example: Case 3
[Figure: a2 compared with wcft1(n1) + c12; axis labels n1 and k]
19
Theoretical Basis
20
Simulation Results
  • E3S benchmarks
  • Embedded system synthesis benchmarks suite
  • Automotive systems, telecommunications and
    consumer electronics

21
4. Incorporating DVS
  • System model
  • Message-passing system composed of identical PEs
  • HW/SW co-synthesis tool: CORDS
  • Each processor has l variable speeds f1, f2, …,
    fl
  • Two-step method
  • Step 1: pre-processing using CORDS (under
    fault-free conditions and at the highest processor speed)
  • Allocation of PEs and communication links
  • Assignment of jobs and communications to PEs/links
  • Valid task scheduling
  • Step 2: determine the checkpointing interval and
    speed scaling in the presence of faults
  • All jobs meet deadlines
  • Energy saving without violating deadlines

22
Incorporating DVS (II)
a1? c01 2 a2? c02 5 a3? c03 3
Key point: speed is scaled down without delaying
any successor jobs
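
The key point above suggests a simple selection rule: pick the slowest available speed at which the job still finishes before its successors need its output. The sketch below is an illustrative assumption rather than the presenters' algorithm; `cycles`, `latest_finish`, and `reserve` (time set aside for checkpointing and fault recovery) are hypothetical parameters.

```python
def pick_speed(cycles, start, latest_finish, speeds, reserve=0.0):
    """Return the lowest speed (cycles per time unit) that completes
    `cycles` of work within [start, latest_finish - reserve],
    or None if even the fastest speed is too slow."""
    window = latest_finish - start - reserve
    for f in sorted(speeds):          # try the slowest speed first
        if cycles / f <= window:
            return f
    return None

# 100 cycles of work, successors need the result by t = 12.
print(pick_speed(100, start=0, latest_finish=12, speeds=[10, 20, 50], reserve=2))
print(pick_speed(100, start=0, latest_finish=12, speeds=[10, 20, 50], reserve=3))
```

Increasing the recovery reserve shrinks the usable window, so the same job may need a higher speed; this is the energy versus fault-tolerance tradeoff in miniature.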
23
Simulation Setup
  • AMD K6 processor
  • Operates at 1.4 V, 1.5 V, and 1.6 V
  • Four schemes considered
  • without checkpointing and DVS (S1)
  • with checkpointing but without DVS (S2)
  • without checkpointing and with DVS (S3)
  • with checkpointing and DVS (S4)

24
Simulation Results
  • Checkpointing enhances fault tolerance capability
  • DVS achieves energy savings
  • 13.3% energy reduction
  • Checkpointing is cost-effective
  • Negligible energy cost (< 1%)

25
5. Conclusions
  • Goal: distributed embedded systems with
  • Real-time responsiveness
  • Fault tolerance
  • Low energy consumption
  • Contributions: a unified approach for
    fault tolerance and energy saving
  • Fault tolerance achieved through synchronized
    checkpointing scheme
  • Energy saving achieved through DVS without
    violating deadlines

26
Future Work
  • More general real-time task model
  • More efficient energy saving
  • Improvement of checkpointing algorithm
  • Further integration with synthesis tools