Power Management in Real-time Systems (Presentation Transcript)
1
Power management in Real-time systems
Collaborators: Daniel Mosse, Bruce Childers
PhD students: Hakan Aydin, Dakai Zhu, Cosmin Rusu, Nevine AbouGhazaleh, Ruibin Xu
2
Power Management
  • Why?
  • Battery-operated: laptops, PDAs, and cell phones
  • Heating: complex servers (multiprocessors)
  • Power aware: maintain QoS, reduce energy
  • How?
  • Power off unused parts: LCD, disk for laptops
  • Gracefully reduce the performance
  • CPU dynamic power: Pd = Cef × Vdd² × f (a short
    sketch follows)
  • Cef: switched capacitance
  • Vdd: supply voltage
  • f: processor frequency → linearly related to Vdd

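A minimal numeric sketch of the dynamic-power model above; Cef and the voltage/frequency operating points are illustrative assumptions, not values from the slides:

    def dynamic_power(c_ef: float, vdd: float, f: float) -> float:
        """Dynamic power Pd = Cef * Vdd^2 * f (watts for F, V, Hz)."""
        return c_ef * vdd ** 2 * f

    # Since f is roughly linear in Vdd, Pd scales roughly with f^3:
    # halving frequency (and voltage) cuts dynamic power by about 8x.
    print(dynamic_power(1e-9, 1.2, 1e9))    # full speed: 1.44 W
    print(dynamic_power(1e-9, 0.6, 0.5e9))  # half speed: 0.18 W

Energy per unit of work still drops quadratically, since the slower run takes twice as long.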
3
Power Aware Scheduling
  • Static Power Management (SPM)
  • Static slack: uniformly slow down all tasks (see the
    sketch below)
  • Gets more interesting for multiprocessors

[Figure: tasks T1 and T2 scheduled at fmax before deadline D, leaving static slack and idle time.]
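A minimal sketch of uniform static slowdown, assuming execution time scales as 1/f; the WCETs and deadline are illustrative:

    def uniform_slowdown(wcets, deadline, f_max=1.0):
        """Single frequency (fraction of f_max) that absorbs all static
        slack so the last task finishes exactly at the deadline."""
        total = sum(wcets)                  # time needed at f_max
        assert total <= deadline, "not schedulable even at f_max"
        return f_max * total / deadline

    # T1 and T2 need 5 units at f_max but have 10 units until D:
    print(uniform_slowdown([2.0, 3.0], deadline=10.0))  # -> 0.5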
4
Dynamic Power Management (DPM)
  • Dynamic slack: the average execution time is often a
    small fraction (e.g., 10%) of the worst case
  • Utilize slack to slow down future tasks
    (Proportional, Greedy, Aggressive, ...); see the
    sketch below

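A minimal sketch of the greedy variant, where the next ready task receives all slack left by early completions; the numbers are illustrative:

    def greedy_speed(next_wcet, budget, slack, f_max=1.0):
        """Frequency for the next task after extending its time budget
        by all of the reclaimed slack (greedy reclamation)."""
        return f_max * next_wcet / (budget + slack)

    # A task budgeted 4 time units inherits 2 units of dynamic slack:
    print(greedy_speed(next_wcet=4.0, budget=4.0, slack=2.0))  # ~0.667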
5
Stochastic power management
[Figure: slack distribution; β1 is the fraction of slack given to the first task.]
6
Computing βi in Reverse Order

[Figure: tasks T1 through T4; the βi values are computed starting from the last task and moving backward.]
7
Dynamic Speed adjustment techniques for
non-linear code
[Figure: control-flow graph with branch probabilities p1, p2, p3 and power-management points (PMPs); the min, average, and max remaining paths diverge at a PMP.]
  • Remaining WCET is based on the longest path
  • Remaining average-case execution time is based on
    the branching probabilities (from trace
    information), as sketched below.

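A minimal sketch of the two estimates available at a PMP; the branch probabilities and path lengths are invented for illustration:

    # (probability, remaining path length) for each branch after the PMP
    branches = [(0.7, 3.0), (0.2, 8.0), (0.1, 12.0)]

    remaining_wcet = max(length for _, length in branches)     # 12.0
    remaining_avg = sum(p * length for p, length in branches)  # 4.9
    print(remaining_wcet, remaining_avg)

Speed can then be set from the average-case estimate while keeping enough headroom to meet the deadline under the worst-case estimate.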
8
Who should manage?
  • Compiler (knows the future better): static analysis
  • OS (knows the past better): run-time information
9
Maximizing system utility
(as opposed to minimizing energy consumption)
  • Energy constraints
  • Time constraints (deadlines or rates)
  • System utility (reward): increased reward with
    increased execution
  • Determine the appropriate versions to execute
  • Determine the most rewarding subset of tasks to
    execute
10
Many problem formulations
  • Continuous frequencies, continuous reward
    functions
  • Discrete operating frequencies, no reward for
    partial execution
  • Version programming: an alternative to the IRIS
    (IC) QoS model

Optimal solutions and heuristics.
EXAMPLE: For homogeneous power functions, maximum
reward is achieved when power is allocated equally to all
tasks.
Heuristic loop (sketched below): add a task; if a
constraint is violated, repair the schedule.
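A minimal sketch of that heuristic loop, with an invented task model of (reward, energy, time) triples; here "repair" simply skips the offending task:

    def build_schedule(tasks, energy_budget, deadline):
        """Greedily add tasks in decreasing reward order; back out any
        addition that violates the energy or time constraint."""
        chosen, e_used, t_used = [], 0.0, 0.0
        for reward, energy, time in sorted(tasks, reverse=True):
            if e_used + energy <= energy_budget and t_used + time <= deadline:
                chosen.append((reward, energy, time))
                e_used += energy
                t_used += time
            # else: a constraint is violated -> repair by skipping the task
        return chosen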
11
Rechargeable systems
(additional constraints on energy and power)
[Figure: available power over time is split between immediate consumption and battery recharge; stored energy is consumed later, keeping the system schedulable.]
Example
  • Solar panel (needs light)
  • Tasks are continuously executed
  • Keep the battery level above a threshold at all
    times (see the sketch below)
  • Frame-based system
  • Three dynamic policies (greedy, speculative, and
    proportional)

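A minimal sketch of the threshold check behind these policies, assuming constant recharge and drain rates over a frame; all quantities are illustrative:

    def battery_stays_above(level, threshold, recharge, drain, slots):
        """Simulate one frame slot by slot; a candidate speed (which
        fixes the drain) is acceptable only if this returns True."""
        for _ in range(slots):
            level += recharge - drain
            if level < threshold:
                return False
        return True

    print(battery_stays_above(10.0, 2.0, recharge=1.0, drain=1.5, slots=10))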
12
Multiprocessing systems
13
Scheduling Policy
  • Distributed
  • Partition tasks to processors
  • Each processor applies PM individually
  • Global
  • Global queue and global management
  • Shared memory

14
Dynamic Power Management
  • Greedy
  • Any available slack is given to the next ready task
  • Feasible for single-processor systems
  • Fails for multiprocessor systems

15
Streaming Applications
  • Streaming applications are prevalent
  • Audio, video, real-time tasks, cognitive
    applications
  • Executing on
  • Servers, embedded systems
  • Multiprocessors and processor clusters
  • Chip multiprocessors: TRIPS, RAW, etc.
  • Constraints
  • Interarrival time (T)
  • End-to-end delay (D)
  • Two possible strategies
  • Master-slave
  • Pipelining

[Figure: a stream of jobs arriving every T time units, each completing within the end-to-end delay D.]
16
Master-slave Strategy
  • Single streaming application
  • The optimal number, n, of active PEs strikes a
    balance between static and dynamic power (see the
    sketch below)
  • Given n, the speed on each PE is chosen to
    minimize energy consumption
  • Multiple streaming applications
  • Determine the optimal number of active PEs
  • Given the number of active PEs,
  • First assign streams to groups of PEs (e.g.,
    balance load using the minimum-span algorithm)
  • Adjust the speed on each PE to minimize energy

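A minimal sketch of the static/dynamic balance for the single-application case; the cubic power model and all constants are assumptions:

    def energy(n, work, period, p_static=0.1, c_ef=1.0):
        f = work / (n * period)               # speed needed on each PE
        dynamic = n * c_ef * f ** 3 * period  # f ~ Vdd => power ~ f^3
        static = n * p_static * period        # each active PE leaks
        return dynamic + static

    # More PEs lower per-PE speed but add static power; sweep to find n:
    best_n = min(range(1, 17), key=lambda n: energy(n, work=2.0, period=1.0))
    print(best_n)  # -> 5 with these constants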
17
Pipeline Strategy
  • (1) Linear pipeline (# of stages ≤ # of PEs)

18
Pipeline Strategy
  • (2) Linear pipeline (# of stages > # of PEs)
  • Solution 1 (optimal):
  • Discretize the time and use dynamic programming
    (sketched below)

Solution 2: use some heuristics
(3) Nonlinear pipeline
  • # of stages ≤ # of PEs
  • Formulate an optimization problem with multiple
    sets of constraints, each corresponding to a
    linear pipeline
  • Problem: the number of constraints can be
    exponential
  • Solution: add additional variables denoting the
    finishing time for each stage
  • # of stages > # of PEs
  • Apply partitioning/mapping first and then do
    power management

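A minimal sketch of Solution 1 for the linear pipeline: discretize the delay budget into slots and split them among stages by dynamic programming. The per-stage energy model is an assumption:

    def stage_energy(work, t):
        # E ~ work * f^2 with f = work / t
        return work ** 3 / t ** 2 if t > 0 else float("inf")

    def min_energy(works, slots):
        """works[i]: cycles of stage i; slots: discretized delay budget."""
        best = {0: 0.0}  # slots used so far -> minimum energy
        for w in works:
            nxt = {}
            for used, e in best.items():
                for t in range(1, slots - used + 1):
                    cand = e + stage_energy(w, t)
                    if cand < nxt.get(used + t, float("inf")):
                        nxt[used + t] = cand
            best = nxt
        return min(best.values())

    print(min_energy([2.0, 4.0], slots=6))  # -> 6.0 (2 + 4 slots)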
19
Scheduling into a 2-D processor array (CMP)

[Figure: a task graph with nodes A through J being mapped onto a 2-D processor array.]

  • Step 1: topological-sort-based morphing
  • Step 2: a dynamic-programming approach to find
    the optimal # of stages and the optimal # of
    processors for each stage

20
Tradeoff: Energy vs. Dependability
21
Time slack (unused processor capacity) can be used to:
  • Reduce speed → power management
  • Add redundancy (space or time) → fault tolerance
  • Do more work → increased productivity
Also relevant: the effect of DVS on reliability.
22
Exploring time redundancy
The slack is used to 1) add checkpoints, 2)
reserve recovery time, and 3) reduce the processing
speed.
For a given slack and checkpoint overhead, we can
find the number and placement of checkpoints that
minimize energy consumption while guaranteeing
recovery and timeliness (see the sketch below).

[Figure: energy consumption as a function of the number of checkpoints.]
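A minimal sketch of that trade-off: each checkpoint adds overhead but shrinks the recovery segment that must be reserved, leaving more slack for slowdown. The quadratic energy model and all constants are assumptions:

    def energy_with_checkpoints(n, work=10.0, slack=6.0, overhead=0.4):
        # time left for execution after n checkpoints and reserving
        # enough slack to re-execute one inter-checkpoint segment
        budget = work + slack - n * overhead - work / n
        if budget <= 0:
            return float("inf")
        f = work / budget          # speed needed to fit the budget
        return work * f ** 2      # E ~ work * f^2

    best = min(range(1, 20), key=energy_with_checkpoints)
    print(best)  # -> 5 with these constants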
23
TMR vs. Duplex

[Figure: boundaries between TMR and Duplex operating regions in the plane of checkpoint overhead (r) and static/dynamic power ratio (p), shown for loads 0.5, 0.6, and 0.7; TMR is more energy efficient on one side of each boundary and Duplex on the other.]

Identified energy-efficient operating regions for
TMR/Duplex.
24
Effect of DVS on SEU rate
  • Lower voltages → higher fault rate
  • Lower speed → less slack for recovery

The reliability requirement, the fault model, and the
available slack together determine the acceptable
level of DVS.
25
Near-memory Caching for Reduced Energy
Consumption
26
Near-CPU vs. Near-memory caches
  • Caching to mask memory delays
  • Where? Near the CPU or near main memory?
  • Which is more power- and performance-efficient?
  • Thesis: need to balance the allocation of the two
    for better delay and energy.

[Figure: a CPU with a near-CPU cache and main memory with a near-memory cache.]

27
Near-memory caching: Cached DRAM (CDRAM)
  • On-memory SRAM cache
  • Accessing the fast SRAM cache → improves performance
  • High internal bandwidth → can use large block sizes
  • Improves performance but consumes more energy

Same config. as in Hsu et al., 2003
28
Power-aware CDRAM
  • Power management in near-memory caches
  • Use distributed near-memory caches
  • Choose an adequate cache configuration
  • to reduce miss rate and energy per access
  • Power management in the DRAM core
  • Use a moderately sized SRAM cache
  • Turn the DRAM core to a low-power state
  • Use immediate shutdown
  • Near-memory versus DRAM energy
  • tradeoff: cache block size

29
Wireless Networks
Collaborator: Daniel Mosse
PhD student: Sameh Gobrial
30
Saving Power
  • Power is proportional to the square of the
    distance
  • The closer the nodes, the less power is needed
    (see the sketch below)
  • Power-aware Routing (PARO) identifies new nodes
    between other nodes and re-routes packets to
    save energy
  • Nodes decide to reduce/increase their transmit
    power

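A minimal sketch of why relaying helps under a distance-squared power model; node positions are invented:

    import math

    def tx_power(a, b, k=1.0):
        """Transmit power ~ k * distance^2 (free-space-like model)."""
        return k * math.dist(a, b) ** 2

    A, B, C = (0, 0), (5, 0), (10, 0)
    print(tx_power(C, A))                   # direct: 100.0
    print(tx_power(C, B) + tx_power(B, A))  # via relay B: 50.0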
31
Asymmetry in Transmit Power
  • Instead of C sending directly to A, it can go
    through B
  • Saves transmit power, but may cause some
    problems.

32
Problems due to one-way links.
  • The collision avoidance (RTS/CTS) scheme is impaired
  • Even across bidirectional links!
  • Unreliable transmissions through one-way links
  • May need multi-hop ACKs at the data link layer
  • Link outages can be discovered only at downstream
    nodes

[Figure: nodes A, B, and C, with a one-way link from C toward A.]
33
Problems for Routing Protocols
  • Route discovery mechanism.
  • Cannot reply using inverse path of route request.
  • Need to identify unidirectional links. (AODV)
  • Route Maintenance.
  • Need explicit neighbor discovery mechanism.
  • Connectivity of the network.
  • Gets worse (partitions!) if only bidirectional
    links are used.

34
Wireless bandwidth and Power savings
  • In addition to transmit power, what else can we
    do to save energy?
  • Power has a direct relation with signal-to-noise
    ratio (SNR)
  • The higher the power, the stronger the signal
    relative to noise, the fewer the errors, and the
    more data a node can transmit
  • Increasing the power allows for higher bandwidth
    (see the sketch below)
  • Turn transceivers off when not in use; this
    creates problems when a node needs to relay
    messages for other nodes.

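A minimal sketch of the power/throughput relation, using the Shannon capacity formula as a stand-in for the argument above; the bandwidth and SNR values are illustrative:

    import math

    def capacity(bandwidth_hz, snr_linear):
        """Shannon capacity in bits/s: C = B * log2(1 + SNR)."""
        return bandwidth_hz * math.log2(1 + snr_linear)

    # Raising transmit power raises SNR (noise held fixed) and thus rate:
    print(capacity(1e6, 10))  # ~3.46e6 b/s
    print(capacity(1e6, 20))  # ~4.39e6 b/s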
35
Using Optical Interconnections in Supercomputers
Collaborator: Alex Jones
PhD students: Ding Zhu, Dan Li (now doing AI), Shuyi Shao
36
Motivation for using Optical Circuit Switching
(OCS) in Supercomputers
  • Many HPCS applications have only a small degree
    (6-10) of high-bandwidth communication among
    processes/threads
  • The rest of a thread's/process's communication
    traffic consists of low-bandwidth communication
    exceptions
  • Many HPCS applications have persistent
    communication patterns
  • Fixed over the program's entire run time, or slowly
    changing
  • But there are bad applications, or phases in
    applications, which are chaotic
  • GUPS..!
  • Optics is good for high bandwidth, bad for fast
    switching. Electronics is the other way around
    and is good for processing (collectives)
  • Need two networks to complement each other

37
The OCS Network Fabric
Two networks complement each other:
  • The OCS: circuit-switched all-optical fat trees made
    of 512x512 MEMS-based optical switches (multiple
    fat-tree networks), connecting PERCS D-blocks
  • An intelligent network (1/10 or less of the
    bandwidth), including collective communication,
    plus a storage/IO network

[Figure: multiple optical fat-tree planes and the electronic networks linking PERCS D-blocks.]
38
Communication Pattern: AMR CTH, node 48

[Figure: communication pattern of node 48; communication patterns change in phases lasting tens of seconds, including a 250-second phase.]
39
UMT2K: Fixed, Irregular Communication Pattern

[Figure: communication matrix and percentage of traffic by bandwidth, on a log scale (100, 10, 1).]

The maximum communication degree from each node to any
other node is about 10. The pattern is irregular
but fixed.
40
Handling HPCS Application Communication in OCS
  • Compile-time statically analyzable → compiled
    communication
  • Run-time predictable (high temporal locality) →
    run-time predictor
  • Unpredictable (low temporal locality) → use
    multiple hops through the OCS, or use the
    intelligent network

NOTE: No changes to the application's code. OCS
setup is done by the compiler, run-time auto prediction,
and multi-hop routing.
41
Paradigm of compiled communication

[Figure: tool flow. The compiler takes an MPI application, recognizes its communication patterns, and produces optimized MPI code enhanced with network-configuration instructions; MPI trace code runs on HPC systems to produce traces; the run-time predictor, network configurations, and performance statistics close the loop with the HPC systems (simulator).]
42
Compilation Framework
  • Targets MPI applications
  • Compiler
  • Recognize and represent communication patterns
  • Communication compiling
  • Enhance applications with network configuration
    instructions
  • Automate trace generation

43
Communication Pattern
  • Communication classification
  • Static
  • Persistent
  • Dynamic
  • Executions of parallel applications show
    phases: communication phases

44
The Communication Predictor
  • Initially, setup the OCS for random traffic
  • Keep track of connections utilization
  • A migration policy to create circuits in OCS
  • A simple threshold policy
  • An intelligent pattern predictor
  • An evacuation policy to remove circuits from OCS
  • LRU replacement
  • A compiler inserted directive

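A minimal sketch of such a predictor, combining the threshold migration policy with LRU evacuation; the class structure and constants are assumptions, not the implemented system:

    from collections import OrderedDict

    class CircuitPredictor:
        def __init__(self, ports=4, threshold=100):
            self.ports, self.threshold = ports, threshold
            self.traffic = {}              # destination -> bytes observed
            self.circuits = OrderedDict()  # destinations with a circuit (LRU)

        def on_message(self, dest, nbytes):
            if dest in self.circuits:
                self.circuits.move_to_end(dest)   # refresh LRU position
                return "optical circuit"
            self.traffic[dest] = self.traffic.get(dest, 0) + nbytes
            if self.traffic[dest] >= self.threshold:   # migration policy
                if len(self.circuits) >= self.ports:
                    self.circuits.popitem(last=False)  # evacuate LRU circuit
                self.circuits[dest] = True
            return "electrical network"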
45
Dealing with Unpredictable Communications
Set up the OCS planes so that any D-block can
reach any other D-block with at most two hops
through the network.
Example: route from node 2 to node 4 (the second node
in the second group).
46
Scheduling in Buffer Limited Networks
Collaborator: Taieb Znati
PhD student: Mahmoud Elhaddad
47
Packet-switched Network with Fixed-Size Buffers
  • Packet routers connected via time-slotted,
    buffer-limited links
  • Packet duration is one slot
  • You cannot freely size packet buffers to prevent
    loss
  • All-optical packet routers
  • On-chip (line-driver chip) SRAM buffers
  • Connections
  • Ingress-egress traffic aggregates
  • Fixed bandwidth demand
  • Each connection has a fixed path
  • Loss rate of a connection
  • Loss rate is the fraction of lost packets
  • The goal is to guarantee a loss rate
  • The loss guarantee depends on the path of the
    connection

48
Link scheduling algorithms
  • Packet service discipline
  • FCFS, LIFO, Fixed Priority, Nearest To Go
  • Drop policy
  • Drop tail, drop front, random drop, Furthest To
    Go
  • Must be work conserving
  • Drop excess packets only when the buffer overflows
  • Serve a packet in every slot as long as the buffer
    is not empty
  • Must use only local information
  • No hints or coordination between routers

49
Link scheduling in buffer-limited networks
  • Problem
  • Minimize the guaranteed loss rate for every
    connection
  • Key question: Is there a class of algorithms that
    leads to better loss bounds as a function of
    utilization and path length?

FCFS scheduling with drop tail
Proposed: Rolling Priority scheduling
50
Link scheduling in buffer-limited networks
  • Findings
  • A local fairness property is necessary to
    minimize the guaranteed loss rate for every path
    length and utilization constraint
  • FCFS/RD (Random Drop) is locally fair
  • A locally fair algorithm, Rolling Priority,
    improves the loss guarantees compared to FCFS/RD
    and is simple to implement
  • Rolling Priority is optimal
  • FCFS/RD is near-optimal at light load

51
Rolling Priority
  • Time is divided into epochs of fixed duration nT
  • Connection initialization:
  • The ingress chooses a phase at random from the
    duration of an epoch
  • At t = offset, the ingress sends an init packet
    along the path of the connection
  • Init packets are rare and never dropped
  • At every link, a new epoch starts periodically
  • At each time slot, every link gives higher
    priority to the connection whose current epoch
    started earlier (sketched below)

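A minimal sketch of the priority rule, with simplified epoch bookkeeping; connection names and times are invented:

    def pick_next(queued, epoch_start):
        """Serve the queued connection whose current epoch started
        earliest (ties broken by connection id)."""
        return min(queued, key=lambda c: (epoch_start[c], c))

    epoch_start = {"c1": 17, "c2": 5, "c3": 11}
    print(pick_next(["c1", "c2", "c3"], epoch_start))  # -> "c2"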
52
Roaming Honeypots for Mitigating
Denial-of-Service Attacks
Collaborators: Daniel Mosse, Taieb Znati
PhD student: Sherif Khattab
53
Denial-of-Service (DoS) Attack
  • DoS attacks aim at disrupting legitimate
    utilization of network/server resources.

54
[Figure: clients sending requests to servers, with some requests dropped under attack load.]
55
Packet Filtering
Not scalable (grows with the number of users)
56
Packet Filtering
More scalable (grows with the number of attackers)
57
Roaming Honeypots: Basic Idea

[Figure: attack traffic (A1) arriving at a pool of servers, one of which is currently serving as a honeypot.]
58
Roaming Honeypots: Basic Idea (continued)

[Figure: the honeypot role roams among the servers; attack traffic (A1) that hits a honeypot is detected and filtered.]
59
Effect of Attack Load
With roaming honeypots, the service exhibits a
stable average response time even in the presence
of attacks with increasing intensity.