Integrated Management of Power Aware Computing - PowerPoint PPT Presentation

About This Presentation
Title:

Integrated Management of Power Aware Computing

Description:

Integrated Management of Power Aware Computing & Communication Technologies Review Meeting Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi, UC Irvine – PowerPoint PPT presentation

Transcript and Presenter's Notes

1
Integrated Management of Power Aware Computing &
Communication Technologies
  • Review Meeting
  • Nader Bagherzadeh, Pai H. Chou, Fadi Kurdahi, UC
    Irvine
  • Jean-Luc Gaudiot, USC; Nazeeh Aranki, Benny
    Toomarian, JPL
  • DARPA Contract F33615-00-1-1719
  • June 13, 2001
  • JPL -- Pasadena, CA

2
Agenda
  • Administrative
  • Review of milestones, schedule
  • Technical presentation
  • Progress
  • Applications (UAV/DAATR, Rover, Deep Impact,
    distributed sensors)
  • Scheduling (system-level pipelining)
  • Advanced microarchitecture power modeling (SMT)
  • Architecture (mode selection with overhead)
  • Integration (Copper, JPL, COTS data sheet)
  • Lessons learned
  • Challenges, issues
  • Next accomplishments
  • Questions & action items review

3
Quad Chart
Behavior
Innovations
high-level simulation
  • Component-based power-aware design
  • Exploit off-the-shelf components & protocols
  • Best price/performance, reliable, cheap to
    replace
  • CAD tool for global power policy optimization
  • Optimal partitioning, scheduling, configuration
  • Manage entire system, including mechanical &
    thermal
  • Power-aware reconfigurable architectures
  • Reusable platform for many missions
  • Bus segmentation, voltage / frequency scaling

functional partitioning & scheduling
Architecture
mapping
system integration & synthesis
static configuration
dynamic power management
Year 1
Year 2
Impact
Kickoff
2Q 00
2Q 01
2Q 02
  • Static hybrid optimizations
  • partitioning / allocation
  • scheduling
  • bus segmentation
  • voltage scaling
  • COTS component library
  • FireWire and I2C bus models
  • Static composition authoring
  • Architecture definition
  • High-level simulation
  • Benchmark Identification
  • Dynamic optimizations
  • task migration
  • processor shutdown
  • bus segmentation
  • frequency scaling
  • Parameterizable components library
  • Generalized bus models
  • Dynamic reconfiguration authoring
  • Architecture reconfiguration
  • Low-level simulation
  • System benchmarking
  • Enhanced mission success
  • More tasks for the same power
  • Dramatic reduction in mission completion time
  • Cost saving over a variety of missions
  • Reusable platform design techniques
  • Fast turnaround time by configuration, not
    redesign
  • Confidence in complex design points
  • Provably correct functional/power constraints
  • Retargetable optimization to eliminate overdesign
  • Power protocol for massive scale

4
Program Overview
  • Power-aware system-level design
  • Amdahl's law applies to power as well as
    performance
  • Enhance mission success (time, task)
  • Rapid customization for different missions
  • Design tool
  • Exploration & evaluation
  • Optimization & specialization
  • Technique integration
  • System architecture
  • Statically configurable
  • Dynamically adaptive
  • Use COTS parts & protocols

5
Personnel & teaming plans
  • UC Irvine - Design tools
  • Nader Bagherzadeh - PI
  • Pai Chou - Co-PI
  • Fadi Kurdahi
  • Jinfeng Liu
  • Dexin Li
  • Duan Tran
  • USC - Component power optimization
  • Jean-Luc Gaudiot - faculty participant
  • Seong-Won Lee - student
  • JPL - Applications & benchmarking
  • Nazeeh Aranki
  • Nikzad Benny Toomarian

- students
6
Milestones & Schedule
  • Static hybrid optimizations
  • partitioning / allocation
  • scheduling
  • bus segmentation
  • voltage scaling
  • COTS component library
  • FireWire and I2C bus models
  • Static composition authoring
  • Architecture definition
  • High-level simulation
  • Benchmark Identification
  • Dynamic optimizations
  • task migration
  • processor shutdown
  • bus segmentation
  • frequency scaling
  • Parameterizable components library
  • Generalized bus models
  • Dynamic reconfiguration authoring
  • Architecture reconfiguration
  • Low-level simulation
  • System benchmarking

7
Review of Progress
  • May'00 Kickoff meeting (Scottsdale, AZ)
  • Sept'00 Review meeting (UCI)
  • Scheduling formulation, UI mockup, System level
    configuration
  • Examples: Pathfinder, X-2000 (manual solution)
  • Nov'00 PI meeting (Annapolis, MD)
  • Tools: scheduler UI v.1 (Java)
  • Examples: Pathfinder, X-2000 (automated)
  • Apr'01 PI meeting (San Diego, CA)
  • Tools: scheduler UI v.2 - v.3 (Jython)
  • Examples: Pathfinder, initial UAV (pipelined)
  • June'01 Review meeting

we are here!
8
New for this Review (June '01)
  • Tools
  • Scheduler UI v.4 (pipelined, buffer matching)
  • Mode selector v.1 (mode change overhead,
    constraint based)
  • SMT model
  • Examples
  • Pathfinder, µAMPS sensors (mode selection)
  • UAV, Wavelet (dataflow) (pipelined, detailed
    estimate)
  • Deep Impact (command driven) (planning)
  • Integration
  • Input from Copper: timing/power
    estimation (PowerPC simulation model)
  • Output to Copper: power profile &
    budget (Copper Compiler)
  • Within IMPACCT: initial Scheduler & Mode
    Selector integration

9
Overview of Design Flow
  • Input
  • Tasks, constraints, component library
  • Estimation (measurement or simulation via
    COPPER)
  • Refinement Loop
  • Scheduling (pipeline/transform)
  • Mode Selection (either before or after
    scheduling)
  • System level simulation (planned integration)
  • Output to COPPER
  • Interchange Format
  • Power Profile, Schedule, Selected modes
  • Code Generation
  • Microarchitecture Simulation

10
Design Flow
[Figure: design-flow diagram. IMPACCT side: task allocation and component selection; task model with timing/power constraints; scheduler; mode selector; mode model; component library; high-level simulator. COPPER side: compiler; power simulator; low-level simulator; executable. Exchanged between the two: power profile and C program; power and timing estimation.]
11
Power Aware Scheduling
  • Execution model
  • Multiple processors, multiple power consumers
  • Multiple domains: digital, thermal, mechanical
  • Constraint driven
  • Min / Max power
  • Min / Max timing constraints
  • Handles problems in different domains
  • Time Driven
  • System level pipelining -- in time and in space
  • Parallelism extraction
  • Experimental results
  • Coarse to fine grained parallelism tradeoffs

12
Prototype of GUI scheduling tool
  • Power-aware Gantt chart
  • Time view
  • Timing of all tasks on parallel resources
  • Power consumption of each task
  • Power view
  • System-level power profile
  • Min/max power constraint, energy cost
  • Interactive scheduling
  • Automated schedulers: timing, power, loop
  • Manual intervention: drag & drop
  • Demo available

13
Power-Aware Scheduling
  • New constraint-based application model: paper at
    CODES'01
  • Min/Max timing constraints
  • Precedence, subsumes dataflow, general timing,
    shared resource
  • Dependency across iteration boundaries: loop
    pipelining
  • Execution delay of tasks: enables
    frequency/voltage scaling
  • Power constraints
  • Max power: total power budget
  • Min power: controls power jitter or forces
    utilization of free source
  • System-level, multi-scenario scheduling: paper at
    DAC'01
  • 25% faster while saving 31% energy cost
  • Exploits "free" power (solar, nuclear min-output)
  • System-level loop pipelining: working papers
  • Borrow time and power across iteration boundaries
  • Aggressive design space exploration by new
    constraint classification
  • Achieves 49% speedup and 24% energy reduction
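
To illustrate the min/max power constraints listed on this slide, the Python sketch below builds a system-level power profile from a schedule and checks it against both bounds. It is a minimal sketch and not the IMPACCT scheduler: the Task fields, the unit time step, the function names, and the example wattages are all assumptions made for illustration. Energy cost is taken here as the energy drawn above the minimum (free) power level.

```python
from typing import List, NamedTuple, Tuple


class Task(NamedTuple):
    name: str
    start: int      # start time, in unit time steps (assumed granularity)
    duration: int   # execution delay, in unit time steps
    power: float    # power drawn while the task runs, in watts


def power_profile(tasks: List[Task]) -> List[float]:
    """Sum the power of all tasks that are running in each time step."""
    horizon = max(t.start + t.duration for t in tasks)
    profile = [0.0] * horizon
    for t in tasks:
        for step in range(t.start, t.start + t.duration):
            profile[step] += t.power
    return profile


def check_schedule(
    tasks: List[Task], p_min: float, p_max: float
) -> Tuple[List[int], List[int], float]:
    """Return (steps above p_max, steps below p_min, energy cost).

    Energy cost counts only the energy above p_min, i.e. the part that
    cannot be covered by the free source (an assumption of this sketch).
    """
    profile = power_profile(tasks)
    over = [i for i, p in enumerate(profile) if p > p_max]
    under = [i for i, p in enumerate(profile) if p < p_min]
    cost = sum(max(0.0, p - p_min) for p in profile)
    return over, under, cost


# Hypothetical schedule; names and numbers are illustrative only.
EXAMPLE = [
    Task("drive", start=0, duration=4, power=6.0),
    Task("heat", start=2, duration=3, power=4.0),
    Task("comm", start=5, duration=2, power=3.0),
]

if __name__ == "__main__":
    print(power_profile(EXAMPLE))
    print(check_schedule(EXAMPLE, p_min=4.0, p_max=9.0))
```

Steps reported as "over" violate the power budget; steps reported as "under" leave free power unused, which is what the min-power constraint is meant to expose.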

14
Scheduling case study: Mars Pathfinder
  • System specification
  • 6 wheel motors
  • 4 steering motors
  • System health check
  • Hazard detection
  • Power supply
  • Battery (non-rechargeable)
  • Solar panel
  • Power consumption
  • Digital
  • Computation, imaging, communication, control
  • Mechanical
  • Driving, steering
  • Thermal
  • Motors must be heated in low-temperature
    environment

15
Scheduling case study: Mars Pathfinder
  • Input
  • Time-constrained tasks
  • Min/Max Power constraints
  • Rationale: control jitter, ensure utilization of
    free power
  • Core algorithm
  • Static analysis of slack properties
  • Solves time constraints by branch & bound
  • Solves power constraints by local movements
    within slacks
  • Target architecture
  • X-2000 like configurable space platform
  • Symmetric multiprocessors, multi-domain power
    consumers, solar/batt
  • Results
  • Ability to track power availability
  • Finishes task faster while incurring less energy
    cost

16
More aggressive scheduling: System-level
pipelining
  • Borrow tasks across iterations
  • Alleviates "hot spots" by spreading to another
    iteration
  • Smooth out utilization by borrowing across
    iterations
  • Core techniques
  • Formulation: separate pseudo dependency from
    true dependency
  • Static analysis and task transformation
  • Augmented scheduler for new dependency
  • Results -- on Mars Pathfinder example
  • Additional energy savings with speedup
  • Smoother power profile

17
Scheduling case study: UAV DAATR
  • Example of a very different nature!
  • Algorithm, rather than "system" example
  • Target architecture
  • C code -- unspecified; assume sequential
    execution, no parallelism
  • MatLab -- unmapped
  • Algorithm
  • Sequential, given in MatLab or C
  • Potential parallelism in space, not in time
  • Constraints & dependencies
  • Dataflow: partial ordering
  • Timing: latency; no pairwise Min/Max timing
  • Power: budget for different resolutions

18
Scheduling case study: UAV example (cont'd)
  • Challenge: Parallelism Extraction
  • Essential to enable scheduling
  • Difficult to automate: need manual code rewrite
  • Different pipeline stages must be relatively
    similar in length
  • Rewritten code
  • Inserted checkpoints for power estimation
  • Error prone: buffer mapping between iterations
  • Found a dozen bugs in benchmark C code
  • Missing summation in standard deviation
    calculation
  • Frame buffer off by one line
  • Dangling pointers not exposed until pipelined

19
ATR application: what we are given
[Figure: block diagram with labels 1 Frame, Target Detection, m Detections, 3 filters, FFT (x6), Filter/IFFT (x6), and a "Bugs" callout.]
20
Bug report
  • Misread input data file
  • OK, no effect to the algorithm
  • Miscalculate mean, std for image
  • OK, these values not used (currently)
  • Wrong filter data for SUN/PowerPC
  • OK for us, since we operate on different
    platforms
  • Bad for SUN/PowerPC users, wrong results
  • Misplaced FFT module
  • The algorithm is wrong
  • However, these problems are not captured in the
    output image files

21
What it should look like
1 Frame
Target Detection
m Detections
3 filters
k distances
22
What it really should look like
1 Frame
Target Detection
m Detections
3 filters
k distances
23
Problems
  • Limited parallelism
  • Serial data flow with tight dependency
  • Parallelism available (diff. detections, filters,
    etc) but limited
  • Limited ability to extract parallelism
  • Limited by serial execution model (C
    implementation)
  • No available parallel platforms
  • Limited scalability
  • Cannot guarantee response time for big images (N²
    complexity)
  • Cannot apply optimization for small images (each
    block is too small)
  • Limited system-level knowledge
  • High-level knowledge lost in a particular
    implementation

24
Our vision: 2-dimensional partitioning
Output: target detection w/ distance for N
simultaneous frames
25
System-level blocks
Input: N simultaneous frames
N Frames (N target detection)
Target Detection
M Targets (M FFTs)
FFT
M Targets (3M IFFTs)
Filter/IFFT
K Distances (2K IFFTs)
Compute Distance
Output: target detection w/ distance for N
simultaneous frames
26
Our vision
27
System-level pipelining
Input: N simultaneous frames
Target Detection
FFT
Filter/IFFT
Compute Distance
Output: target detection w/ distance for N
simultaneous frames
28
What does it buy us?
  • Parallelism
  • All modules run in PARALLEL
  • Each module processes N (M, K) INDEPENDENT
    instances, that could all be processed in
    parallel
  • NO DATA DEPENDENCY between modules
  • Throughput
  • Throughput multiplied by processing units
  • Process N frames at a reduced response time
  • Better utilization of resources

29
What does it buy us? (cont'd)
  • Flexibility
  • Insert / remove modules at any time
  • Adjust N, (M or K) at any time
  • Make each module parallel / serial at any time
  • More knobs to tune parallelism / response time /
    throughput / power
  • Driven by run-time constraints
  • Scalability
  • Reduced response time on big images (small N
    and/or deeper pipe)
  • Better utilization/throughput on small images
  • More compiler support
  • Simple control / data flow each module is just a
    simple loop, which is essentially parallel
  • Need an automatic partitioning tool to take
    horizontal cuts

30
What does it buy us: how power-aware is it?
  • Subsystems shut-down
  • Turn on / off any time based on power budget
  • Split / merge (migrate) modules on demand
  • Power-aware scheduling
  • Each task can be scheduled at any time during one
    pipe stage, since they are totally independent
  • More scheduling opportunity with an entire system
  • Dynamic voltage/frequency scaling
  • The amount of computation N, (M or K) is known
    ahead of time
  • Scaling factor C / N (very simple!)
  • Less variance of code behavior => strong
    guarantee to meet deadline, more accurate power
    estimates
  • Run-time code versioning
  • Select right code based on N, (M or K)

31
Experimental implementation: pipelining
transformation
  • Goal
  • To make everything completely independent
  • Methodology
  • Dataflow graph extraction (vertical)
  • Initial partitioning (currently manual with some
    aids from COPPER)
  • Horizontal clustering
  • Horizontal cut (final partitioning)
  • Techniques
  • Buffer assignment: each module gets its own
    buffer
  • Buffer renaming: read/write on different buffer
  • Circular buffer: each module gets a window of
    fixed buffer size
  • Our approach: the combination

32
Buffer rotation
Circular buffer B
B
Pipe stages a, b, c, d
33
Background - acyclic dataflow
  • Single circular buffer
  • One serial data flow path
  • All data flows are of same type, same size
  • Multiple buffers
  • Multiple data flow paths
  • Different type, size

[Figure: pipe stages a, b, c, d drawn for the single-buffer and multiple-buffer cases.]
34
A more complete picture
Circular buffer A, B
Pipe stages a, b, c, d
Head pointer
1. Buffer ready (raw data, e.g. ATR images)
2. Buffer live
3. Life-time spent in pipeline
4. Buffer dead
35
How does it work?
  • Raw data is dumped into the buffer from the data
    sources
  • A head pointer keeps incrementing
  • Buffer is ready, but not live (active in
    pipeline) yet
  • Example: ATR image data coming from sensors
  • Buffer becomes live in pipeline
  • Raw data are consumed and/or forwarded
  • New data are produced/consumed
  • When a buffer is no longer needed by any pipeline
    stages, it is dead and recycled
  • Is everything really independent?
  • Yes!
  • At each snapshot, each module is operating on
    different data
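
The buffer rotation described above can be sketched in Python as follows. This is an illustrative model only, not the project's pipelined C code: the stage names come from the slides, while the slot-indexing rule (the frame entering stage k at step t is frame t - k, stored in slot frame mod number-of-slots) and all function names are assumptions.

```python
from typing import Dict

STAGES = ["a", "b", "c", "d"]  # pipe stages named on the slides


def snapshot(step: int, num_slots: int) -> Dict[str, int]:
    """Map each active stage to the circular-buffer slot it uses at `step`.

    The head pointer advances by one frame per step; stage k works on the
    frame that entered the pipeline k steps ago.
    """
    active = {}
    for depth, stage in enumerate(STAGES):
        frame = step - depth
        if frame >= 0:  # stage has no data yet while the pipeline fills
            active[stage] = frame % num_slots
    return active


def stages_independent(num_steps: int, num_slots: int) -> bool:
    """True if no two stages ever touch the same slot in the same step."""
    for step in range(num_steps):
        slots = list(snapshot(step, num_slots).values())
        if len(slots) != len(set(slots)):
            return False
    return True


if __name__ == "__main__":
    for step in range(6):
        print(step, snapshot(step, num_slots=4))
    print(stages_independent(20, num_slots=4))
    print(stages_independent(20, num_slots=3))
```

Under these assumptions the stages stay independent as long as the buffer has at least as many slots as there are pipe stages; with fewer slots a live buffer would be overwritten before it is dead.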

36
What are we trading off?
Speed: computation intensity, parallelism, throughput, power
Time: response time, delay
Workload: amount of computation, energy
37
3-D Design space navigation
Workload N frames
Time
Speed
38
Design flow
C Source code
IMPACCT pipeline code versioning
DFG
Pipelined C Source code
COPPER power simulator
Task-level constraints
Power-aware schedule
IMPACCT scheduler and mode selection
System-level constraints
39
Scheduling case study: Wavelet compression (JPL)
  • Algorithm in C
  • Wavelet decomposition
  • Compression "knob" to choose lossy factor or
    lossless
  • Example category
  • Dataflow, similar to DAATR
  • Finer grained, better structure
  • IMPACCT improvements
  • Transformation to enable pipelining
  • Exploit lossy factor in trade space

40
Wavelet Algorithm
  • Wavelet Decomposition
  • Quantization
  • Entropy coding

41
Wavelet Algorithm structure
For all image blocks
Initialization (check params, allocate memory)
block init., set params, read image block
decomp() (lossless FWT)
(remove overlap)
Bit_plane_decomp (set decomp param)
(1st level entropy coding)
Output result to file
(bit_plane encoding)
  • Sequential execution blocks
  • No data dependency between image blocks
42
Wavelet experiments
  • Experiments being conducted
  • Checkpoints marked up manually
  • Initial power estimation obtained
  • Code being manually rewritten / restructured for
    pipelining
  • Appears better structured than UAV example
  • Trade space
  • High performance to low power
  • Pipelining in space and in time, similar to UAV
    example
  • Lossy compression parameter

43
Ongoing scheduling case study: Deep Impact
  • "Planning" level example
  • Coarse grained, system level
  • Hardware architecture
  • COTS PowerPC 750 babybed, emulating a Rad-Hard
    PPC at 4x => models the X-2000 architecture using
    DS1 software
  • COTS PowerPC 603e board, emulating I/O devices in
    real time
  • Software architecture
  • vxWorks, static priority driven, preemptive
  • JPL's own software architecture -- command based
  • 1/8-second time steps, 1-second control loops
  • Task set
  • 60 tasks to schedule, 255 priority levels

44
NASA Deep Impact project
  • Platform
  • X-2000 configurable architecture
  • to be using RAD 6000 (Rad-Hard PowerPC 750
    @133MHz)
  • Testbed (JPL Autonomy Lab)
  • PPC 750 single-board computer -- runs flight
    software
  • Prototype @233MHz, real flight @133MHz
  • COTS board, L1 only, no L2 cache
  • PowerPC 603e -- emulates the I/O devices
  • connected via compact PCI
  • DS1: Deep Space One (legacy flight software)
  • Software architecture
  • 8 Hz ticks, command based
  • running on top of vxWorks
  • Perfmon: performance monitoring utility in DS1
  • 11 test activities
  • 60 tasks

45
Deep Impact example (cont'd)
  • Available form: real-time traces
  • Collected using Babybed
  • 90 seconds of trace, time-stamped tasks, L1
    cache
  • Input needed
  • Algorithm (not available)
  • Timing / power constraints (easy)
  • Functional constraints
  • Sequence of events
  • Combinations of illegal modes
  • Challenges
  • Modeling two layers of software architecture
    (RTOS + command)

46
Design Flow
[Figure: design-flow diagram (IMPACCT and COPPER), repeated from slide 10.]
47
SMT Power Simulator
  • Simulator Features
  • Compatible with SimpleScalar 3.0b
  • Execute PISA and EV6 binaries
  • Portability: runs on most kinds of computers
  • Handling Simultaneous Multithreading
  • Run up to 8 threads simultaneously
  • Similar to UW SMT model
  • Power Aware Features
  • Same analytic power model as WATTCH
  • Clock Gating
  • Parameterized Models
  • 42 functional unit classifications (WATTCH has
    12)
  • 10 dynamic activity factors (WATTCH has 4)

48
Examples of Module Classification
  • Functional Units include
  • Arithmetic units: ALU, FPU, etc.
  • Control units: instr. decoder, etc.
  • Memory units: caches, CAM, etc.
  • Buses: result bus
  • Cache Access
  • Cache Hit
  • Read Tag & Data
  • Cache Miss
  • Read Tag
  • Update Tag & Data
  • Read Data
  • Arithmetic Operation: 4 groups
  • Int ALU: +, -, bit operations
  • Int MULT: ×, ÷
  • FP ALU: +, -
  • FP MULT: ×, ÷

49
SMT Power Simulator
  • Project Status
  • Performance simulator: done
  • Power simulator: implementation is done
  • Power parameter verification: ongoing
  • Verification Methodology
  • Analytic model
  • Proven models from WATTCH
  • Comparison with COTS processors
  • Motorola PowerPC 7450
  • Intel mobile Pentium III
  • Alpha 21264

50
Example of Verification with COTS Processors
PowerPC 7450 Power Consumption
  • Typical/Maximum Power Consumption
  • Typical -> Average power consumption of
    applications
  • Maximum -> Peak power consumption of applications
  • Benchmark simulations are needed to verify
  • Modules in operation
  • Deep Sleep: nothing -> Static power dissipation
  • Sleep: PLL working -> Static + PLL power
    dissipation
  • Nap: bus snooping -> Static + PLL + I/O power
    dissipation
  • Doze: no instruction fetch -> no information

51
Example of Simulation Result
  • Processor Configuration
  • 4 issue superscalar
  • Target programs: 4 simple test programs
  • Maximum power consumption
  • 87.37 W at 4 IPC (instructions per cycle), maximum
    throughput
  • Clock gating
  • CC1: Max power for running units and zero for
    idle units
  • CC2: Input dependent power for running units and
    zero for idle units
  • CC3: Input dependent power for running units and
    static power for idle units
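
The three clock-gating styles can be compared with a small Python sketch. It is not the simulator's code: "input dependent power" is modeled here as maximum power times an activity factor between 0 and 1, and the static power of an idle unit as a fixed fraction of its maximum power. Both modeling choices, the function names, and the example numbers are assumptions made for illustration.

```python
from typing import List, Tuple

# (active this cycle?, max power in watts, activity factor in [0, 1])
Unit = Tuple[bool, float, float]


def unit_power(style: str, active: bool, max_power: float,
               activity: float, idle_fraction: float = 0.1) -> float:
    """Per-cycle power of one functional unit under a clock-gating style."""
    if style == "CC1":
        # Max power when running, zero when idle.
        return max_power if active else 0.0
    if style == "CC2":
        # Input-dependent power when running, zero when idle.
        return max_power * activity if active else 0.0
    if style == "CC3":
        # Input-dependent power when running, static power when idle.
        # idle_fraction is a hypothetical parameter of this sketch.
        return max_power * activity if active else max_power * idle_fraction
    raise ValueError(f"unknown clock-gating style: {style}")


def total_power(style: str, units: List[Unit],
                idle_fraction: float = 0.1) -> float:
    """Sum unit powers for one cycle."""
    return sum(unit_power(style, a, p, f, idle_fraction) for a, p, f in units)


if __name__ == "__main__":
    units = [(True, 8.0, 0.5), (False, 4.0, 0.0)]  # illustrative values
    for style in ("CC1", "CC2", "CC3"):
        print(style, total_power(style, units, idle_fraction=0.25))
```

With these illustrative units, CC1 gives the highest estimate, CC2 the lowest, and CC3 falls between them because idle units still dissipate static power.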

52
SMT Simulation Methodology
  • Input
  • C Program
  • Executable Binaries
  • PISA
  • EV6
  • Processor Parameters
  • Architectural Parameters
  • Output
  • Static Power Consumption
  • Program independent
  • Dynamic Power Consumption
  • Program dependent
  • Power profile: moving avg.

[Figure: simulation flow. Inputs: target C program, processor parameters, power parameters. Tools: host compiler, cross compiler, power simulator. Outputs: static power, dynamic power, dynamic profile.]
53
SMT Power Simulator Tool Usage
  • Host Portability
  • Any host computer that can run SimpleScalar
  • Execution command
  • sim-smt options target.list
  • List file content
  • executable program arguments
  • Processor parameters
  • -config configuration.file
  • Simulation results redirection
  • -redir:sim simulator.result
  • -redir:prog target.program.result

54
Mode Selection
  • Determine which component runs in which mode,
    and when
  • Mode selection is non-trivial
  • Scheduler would be overwhelmed if it also had to
    determine component modes at the same time!
  • Exploration space of all mode combinations is
    tremendous
  • Greedy solution may fail mission
    timing-constraints or power constraints
  • Mode selection is worthwhile
  • Exploration spaces exist to improve power
    reduction and power-awareness
  • Energy saving (5-15%), cost saving (10-40%)
  • Ease the task planning and give a more realistic
    picture

55
Methodology and Design Flow
  • The whole picture - the integration of
  • Power-aware scheduler
  • Mode selector
  • Power estimation/profiling tools
  • Static view

[Figure: Scheduler, Mode Selector, and Power Estimator, connected by: initial schedule, modified schedule, power/timing numbers and power profile, power/timing budget.]
56
System Modeling
  • Component power model
  • Power modes with overhead
  • System timing model
  • Constraint graph
  • Mode dependency modeling
  • Mode dependency graph
  • External parameters
  • Environment temperature
  • Surrounding terrain

57
Component Power Model
  • Power mode
  • Each mode is defined by power and timing
    attributes
  • Constant, Profile, external (environmental)
    parameters
  • May be hierarchical -- e.g. PowerPC 7450
  • active: cache on (cache settings), cache
    off, voltage scaling, clock scaling
  • doze: clock scaling
  • nap
  • deep sleep
  • Overhead on mode changes
  • Power overhead, timing overhead
  • e.g. preheating a motor, voltage scaling, PLL
  • Environmental parameters
  • e.g. temperature, terrain (roughness of ground
    for a motor)
  • Affect power and timing overhead

58
Component Model Examples
  • Driving motor
  • Power is function of Temperature
  • Mode change time is also a function of Temperature T
  • Microprocessor (PowerPC 603e)

Power 2.2W Time (1.875T10)(T<0) 10(T0)
off
on
Power 0.1225T 1.0
0W
Power 0.5W Time 3
Full power
4.0W
DPM
3.2W
10 cycles -
10 cycles -
100us 255 bus clocks 10 cycles
10 cycles -
100us 255 bus clocks 10 cycles
10 cycles -
Doze
Sleep
Nap
40mW
1.0W
70mW
t1 3 cycles
3 cycles
t1 3 cycles
59
FireWire Bus Power Model
  • Cable Power
  • P_c = µ · L · C_f (µ: constant, L: cable length,
    C_f: data transfer rate)
  • Driver Power (Pd)
  • Fast lookup table
  • Protocol simulator (in progress)
  • Event-driven system-level simulator
  • Generated event traces for high level power
    estimation
  • Bus Power
  • P_bus = P_c + P_d

60
Design Flow
[Figure: design-flow diagram (IMPACCT and COPPER), repeated from slide 10.]
61
Timing Constraint graph
  • Min/max timing constraints
  • between pairs of events
  • Vertices
  • Represent events
  • A task has a Start and an End event, e.g. A.s =
    start event of task A, B.e = end event of
    task B
  • Directed edges
  • Weights on edges
  • Nonnegative weight: min constraint
  • Negative weight: -(max constraint)

Example, weight 10 between A.s and B.e: the end event of B should be no
earlier than 10 time units after the start event of A.
Example, weight -10 between A.s and B.s: the start event of B should be no
later than 10 time units after the start event of A.
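
A timing constraint graph of this kind can be checked for consistency with a longest-path relaxation, sketched below in Python. This is a generic technique and not necessarily the IMPACCT implementation. The sketch assumes the common convention that an edge (u, v, w) means time[v] >= time[u] + w, so that a max constraint becomes a backward edge with negative weight; the function names and the task durations in the example are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, int]  # (u, v, w): time[v] >= time[u] + w


def min_constraint(u: str, v: str, d: int) -> Edge:
    """v occurs no earlier than d time units after u."""
    return (u, v, d)


def max_constraint(u: str, v: str, d: int) -> Edge:
    """v occurs no later than d time units after u.

    Encoded as a backward edge of weight -d (assumed convention).
    """
    return (v, u, -d)


def earliest_times(events: List[str],
                   edges: List[Edge]) -> Optional[Dict[str, int]]:
    """Earliest event times satisfying all edges, or None if infeasible.

    Bellman-Ford style longest-path relaxation; every event may start
    at time 0. A positive cycle means the constraints conflict.
    """
    time = {e: 0 for e in events}
    for _ in range(len(events)):
        changed = False
        for u, v, w in edges:
            if time[u] + w > time[v]:
                time[v] = time[u] + w
                changed = True
        if not changed:
            return time
    return None  # still changing after |V| passes: positive cycle


EVENTS = ["A.s", "A.e", "B.s", "B.e"]
EDGES = [
    min_constraint("A.s", "A.e", 4),   # hypothetical: A runs at least 4
    min_constraint("B.s", "B.e", 3),   # hypothetical: B runs at least 3
    max_constraint("B.s", "B.e", 3),   # ...and at most 3
    min_constraint("A.s", "B.e", 10),  # from the slide example
    max_constraint("A.s", "B.s", 10),  # from the slide example
]

if __name__ == "__main__":
    print(earliest_times(EVENTS, EDGES))
    conflict = EDGES + [min_constraint("A.s", "B.s", 12)]
    print(earliest_times(EVENTS, conflict))
```

The second call adds a min constraint of 12 that contradicts the max constraint of 10 between the same events, so the relaxation never settles and the function reports infeasibility.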
62
System Timing Modeling Example
Haz: hazard detector; Str: steering motor; Drv: driving motor;
Cam: camera; Ppc: processor; Sci: scientific device;
Rf: radio frequency modem
  • Micro Rover example
  • Multiple resources
  • Timing constraints between tasks

[Figure: timing constraint graph over the events sci.s, rf.e, ppc1.s, ppc1.e, ppc2.s, str.s, haz.e, drv.s, drv.e, cam.s, with edge weights 1, 5, -5, -10, -20, -30.]
63
Mode Dependency Modeling
  • Functional modes
  • examples: ATR -- short range, middle range
  • behavior choice as dictated by functional
    requirements (i.e., not controllable by power
    management)
  • Component modes
  • examples: processor full-on, sleep, doze,
    voltage/clock scaling
  • operational setting of component (i.e., open to
    mode selection for meeting power/timing
    constraints)
  • Dependencies
  • Among functional modes (of different activities)
  • Among component modes
  • Between functional and component modes
  • e.g., ATR in short-range mode, processor running
    at a high clock rate

64
Mode dependency graph
  • Directed acyclic graph
  • Mode Vertices
  • modes of component
  • Edges
  • mode dependency: "only if"
  • mode A chosen implies B may be chosen
  • mode B NOT chosen => NOT mode A
  • Operator vertices
  • AND, OR, MUTEX
  • (C op D) implies E may be chosen
  • not E => (C op D) must be false
  • op imposes constraint on combination of C, D

mode
A
B
C
op
op
E
D
65
Mode dependency example: Rover
  • Components
  • hazard detector, driving motor, steering motor
  • Constraints on modes
  • hazard detector and the motors should not be
    working at the same time
  • Mode combinations

[Figure: mode dependency graph: (str.on OR drv.on) MUTEX haz.on]
haz: hazard detector; str: steering motor; drv: driving motor
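
The rover constraint can be enumerated directly, as in the Python sketch below. It treats each component as simply on or off and encodes the stated rule that the hazard detector and the motors should not be working at the same time; the brute-force enumeration and all names are illustrative assumptions, not the tool's algorithm.

```python
from itertools import product
from typing import Dict, List

COMPONENTS = ("haz", "str", "drv")  # hazard detector, steering, driving


def is_legal(on: Dict[str, bool]) -> bool:
    """MUTEX(haz.on, OR(str.on, drv.on)): never both sides at once."""
    motors_on = on["str"] or on["drv"]
    return not (on["haz"] and motors_on)


def legal_combinations() -> List[Dict[str, bool]]:
    """Enumerate every on/off assignment and keep the legal ones."""
    combos = []
    for bits in product((False, True), repeat=len(COMPONENTS)):
        on = dict(zip(COMPONENTS, bits))
        if is_legal(on):
            combos.append(on)
    return combos


if __name__ == "__main__":
    for combo in legal_combinations():
        print({name: ("on" if state else "off")
               for name, state in combo.items()})
```

Of the eight on/off assignments, the three with the hazard detector on together with at least one motor are pruned, leaving five. Brute force is fine for three components, but it grows exponentially with the number of components and modes.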
66
Mode Modeling Example: µAMPS sensors
  • Components
  • processor, memory, RF, sensor
  • Constraints on modes
  • Processor is active when both radio and sensor are
    active
  • Memory is active only when processor is active
  • Microsensor architecture

S.on
A.sleep
AND
R.on
A.sleep
S.on
XOR
R.rx
A.idle
MUTEX
R.rx_tx
A.active
M.on
A.active
M.on
A: ARM; M: memory; R: radio; S: sensor
67
Mode Modeling of µAMPS sensors (cont'd)
  • Mode combinations considered
  • by MIT group: 5 combinations
  • manual grouping, ad hoc
  • Our method
  • 3 more combinations
  • systematically generated from dependency graph
  • Add constraint
  • When sensor is off, all other components should be
    off (proactive)
  • Automatically obtain same results as MIT group

Not given by MIT group
R.on
S.on
68
Mode Combination Enumeration - Using Dependency
Graph
Radio
  • Component level mode dep. graph
  • Group modes by component
  • Show mode dependency between components
  • Enumerating reachable modes
  • Topological sorting
  • Graph helps prune out infeasible mode
    combinations
  • Break cycle in comp. graph
  • Removing an edge in cycle
  • Keep track of the last dependent successor
    component

ARM
Memory
Sensor
Radio
Sensor
ARM
Memory
off
off
sleep
off
on
off
sleep
off
idle
off
on
idle
off
active
on
69
External Parameters & Constraints
  • Parameters in system model
  • Temperature, terrain
  • Used to characterize components and their
    overhead
  • System Constraints
  • Maximum Power constraint
  • Constant or power profile (function of time)
  • Minimum Power constraint
  • Constant or power profile (function of
    time)
  • Total energy constraint (work in progress)
  • Mission time (mission deadline)

Power consumption of Driving motor at different
temperatures
70
System Power Representation
  • Schedule
  • Gantt Chart
  • Time view
  • Power view
  • Mode selection
  • Gantt chart
  • Tasks marked with mode settings
  • Added non-operating tasks
  • Idle intervals
  • mode change overheads
  • Power profile view

71
Design Flow
[Figure: design-flow diagram (IMPACCT and COPPER), repeated from slide 10.]
72
Mode selection: Problem statement
  • Input
  • initial schedule (timing & power)
  • component model, system model
  • initial selection of modes
  • Objective
  • Model mode change overhead (timing, power)
  • Capture sequence of mode changes
  • Minimize energy cost by considering overhead
    tradeoffs
  • Output
  • Schedule for power & timing, with overhead
  • Augmented schedule with selected mode
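
The overhead tradeoff in the objective can be illustrated with one idle interval: a deeper mode draws less power but costs time and energy to enter and leave, so it only pays off when the interval is long enough. The Python sketch below picks the cheapest feasible mode for a given idle time. The Mode fields, the function names, and every number in MODES are hypothetical and are not taken from the component library.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class Mode(NamedTuple):
    power: float          # watts drawn while resident in the mode
    switch_time: float    # seconds to enter plus leave the mode
    switch_energy: float  # joules spent on entering plus leaving


def idle_energy(mode: Mode, idle_time: float) -> Optional[float]:
    """Energy spent over the idle interval, or None if the mode change
    overhead does not fit into the interval (timing violation)."""
    if mode.switch_time > idle_time:
        return None
    return mode.switch_energy + mode.power * (idle_time - mode.switch_time)


def best_idle_mode(modes: Dict[str, Mode],
                   idle_time: float) -> Tuple[Optional[str], Optional[float]]:
    """Pick the feasible mode with the lowest energy for this interval."""
    best_name, best_energy = None, None
    for name, mode in modes.items():
        energy = idle_energy(mode, idle_time)
        if energy is not None and (best_energy is None or energy < best_energy):
            best_name, best_energy = name, energy
    return best_name, best_energy


# Hypothetical processor modes, for illustration only.
MODES = {
    "on": Mode(power=4.0, switch_time=0.0, switch_energy=0.0),
    "doze": Mode(power=1.0, switch_time=0.5, switch_energy=1.0),
    "sleep": Mode(power=0.25, switch_time=2.0, switch_energy=6.0),
}

if __name__ == "__main__":
    print(best_idle_mode(MODES, idle_time=1.0))
    print(best_idle_mode(MODES, idle_time=10.0))
```

With these numbers, a short interval favors the shallow mode because the deep mode's overhead does not fit, while a long interval favors the deep mode despite its larger switching cost.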

73
Application Example: Rover
  • Behaviors and tasks
  • Moving around on Mars surface
  • Hazard detection, driving and steering
  • Communicating with the Lander
  • Taking pictures (IMP)
  • Performing scientific experiments (APXS, ASI/MET)
  • Components in the entire system
  • Hazard detector (HAZ)
  • Driving motor (DRV)
  • Steer motor (STR)
  • Radio frequency modem (RF)
  • Camera (CAM)
  • Microprocessor (PowerPC)
  • Microcontroller (ARM)

A schedule of the electronic subsystem of micro
rover
74
Mode selection results: Energy savings
  • Traditional approach
  • Only two modes: On, Off
  • Timing constraints ONLY
  • Power constraints may be violated
  • Considers mode change overhead
  • Our approach: with Mode Selection
  • All legal mode combinations
  • Both timing and power constraints
  • Detailed mode change overhead
  • Results
  • Energy saving: 3.7% to 11.9%
  • average saving: 8.7%

75
Results for mode selection: Cost savings
  • Cost vs. Energy saving
  • Cost defined as energy above minimum constraints
  • Savings
  • From 6.9% to 49.3%
  • average 26.5%

76
Exploring Different Working Scenarios
  • Three tasks
  • Moving around (MOV)
  • Taking picture (CAM)
  • Scientific experiment (SCI)
  • Three scenarios
  • A: MOV, CAM, SCI
  • B: CAM, MOV, SCI
  • C: CAM, SCI, MOV
  • Temperature profile is given as

77
Result III
  • Scenarios consume different amounts of energy
  • Scenario C consumes 12% more energy than scenario
    A (by mode selection)
  • Mode selection always does better
  • compared to (on, off) only
  • up to 11.7% energy saving

78
Mode selection Issues
  • Challenges
  • Explosion of state space -- grows exponentially
  • Modeling restrictions in mode change sequence
  • Solution / novelty
  • Formalism for mode dependency at component level &
    system level
  • Systematically prune search space
  • Experimental results
  • Energy and time saved
  • More accurate modeling of overhead

79
Accomplishments to date
  • Power-aware scheduling
  • Multi-processor/domain, Min / Max power and
    timing constraints
  • 3 classes of system level pipelining techniques
  • Mode selection
  • Component and system model
  • Captures power & timing overhead on mode change
  • Incorporating power models and simulators
  • SMT simulator for advanced microarchitectural
    exploration
  • FireWire, DRAM, cache, PowerPC
  • Tool prototype and integration
  • GUI for power-aware Gantt chart scheduling and mode
    selection
  • Power aware visualization tool for benchmarks
  • Interface to COPPER project

80
Lessons learned
  • Challenges
  • Not all applications fit a given model
  • Alternative design flows may be required for
    different applications
  • Manually extract parallelism and dependency in
    benchmarks
  • Capture mode dependency in components and
    applications
  • Integration of good power models for PowerPC
  • Right level of abstraction
  • Many low-level power models available, but not
    always usable
  • Need system-level power estimations
  • Details of the architecture model
  • Memory / bus power models
  • Overhead for voltage/frequency scaling

81
Fulfilled Milestones
  • Power-aware scheduling: 3 papers
  • Multi-scenario
  • System-level pipelining
  • Mode selection
  • encompass power management (voltage/freq scaling)
  • UI prototype
  • scheduling, mode selection, benchmark
    visualization
  • Initial tool integration
  • interface to COPPER
  • Processor power simulation models
  • SMT simulator

82
Upcoming Milestones
  • Dynamic optimization
  • Scheduling and planning -- using the Deep Impact
    example
  • Pipeline depth/width tuning at run-time
  • Additional static optimization
  • component selection/assignment
  • bus topology optimization
  • Simulation
  • Bus simulation models
  • SMT -- Thermal dissipation profiling, dynamic
    power/thermal management
  • Tool integration
  • Simulation models from other groups
  • IMPACCT tools and library
  • tighter integration between IMPACCT and COPPER

83
Ideas: dynamic optimization
  • More dynamic scenarios
  • Power suddenly cut off, with small power reserve
    before shutdown
  • Mission replanning, changing objectives
  • Solutions required
  • Division between static preparation and dynamic
    handling
  • Ability to decide most important actions to take
    under extreme time constraint
  • Need feedback/notification mechanism in execution
    model
  • Decentralized power management
  • Need new benchmark examples

84
Future planned evaluation
  • Deep Impact from JPL
  • Mission planning and scheduling example
  • Image compression (wavelet) algorithm
  • Architectural mapping
  • JPL Testbed
  • PPC750 board to measure actual power
  • PPC750 to simulate instrumentation in real-time
  • advanced board with real instrumentation
  • Validation through simulation
  • Scheduler output fed to COPPER for compilation
  • Simulation via COPPER and our own SMT
  • Compare estimated power with refined version

85
Applications
  • Space
  • Mars Rover (scheduling, mode selection)
  • Deep Impact (planning)
  • UAV
  • DAATR (pipelined scheduling) (mode
    selection under investigation)
  • Distributed sensors
  • MIT µAMPS sensor (mode selection)
  • Need apps requiring dynamic planning/reconfig!

86
Development plans
  • Scripting and web-based tool
  • Jython (Java + Python), TkInter for GUI prototype
  • Core scheduler
  • Modular, detachable from GUI
  • Option to run on separate server or same process
    as UI
  • CGI scripts for arch. configuration (unix/web
    based)
  • Latest version distributed thru WebCVS
  • Interface with commercial CAD backend
  • Detailed power estimation tools
  • Functional simulation with proprietary models
  • Rationale
  • Open source, runs on any platform
  • All publicly available development tools
  • Trivial to install, no compilation, encourage
    modification

87
Technology Transition -- Consystant Design
Technologies
  • Version 1 released Apr. 11
  • shown at ESC
  • runs on Linux
  • will support Solaris, Win2k
  • Extensible system
  • platform plugin for synthesis
  • targets Linux, vxWorks,
  • Simulator
  • selective focus
  • coordination centric
  • Active collaboration confirmed
  • Installation in week of June 25
  • Designated application engineer

88
http://www.ece.uci.edu/impacct/
89
Metrics
  • Source-aware energy model
  • Takes free energy into account
  • Cost for not using free energy
  • Profile-aware
  • Total energy dependent on consumers' power
    profile
  • Smoothness of power draw
  • Scenario-aware
  • Cost function tracks external factors (e.g.
    temperature, solar level)
  • Stage in mission
  • Timing/performance
  • Makespan (length of an iteration)
  • Dynamic planning cost

90
Architectural Configuration
  • Mode selection
  • Power consumption level (doze, nap, sleep, etc.)
  • Low power design techniques
  • Clock scaling, voltage scaling
  • Memory/cache configurations, bus encoding
  • Communication protocols, compression, algorithm
    transformations
  • Optimize feasible solutions for energy/timing
    costs
  • Power, real-time, and inter-resource mode
    constraints
  • Constraints between functionality modes and
    resource modes
  • Bus topology optimization
  • Static clustering and bus partitioning
  • Dynamic reclustering with shutdown

91
Application - Mars Rover
  • Mission-critical embedded system
  • Hard real-time system
  • Composed of COTS components
  • Electronic: µprocessor, µcontroller,
    memory, camera, scientific devices, ...
  • Mechanics/thermal: driving motor, steering motor,
    heaters,
  • Power sources: solar panel, battery
  • Power/energy and performance constraints
  • Stringent max power constraint
  • Flexible min power constraint
  • Limited non-rechargeable energy sources
  • Global timing requirement
  • Limited working window during sol daytime
  • Timing constraint among tasks
  • Harsh and uncertain working environment
  • Extremely low temperature - affects component
    behaviors
  • Uncertain environment: winds/obstacles/rugged
    terrain
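The max power constraint listed above can be checked against a candidate schedule with a short sketch like the one below. The task tuple format, the task names in the comments, and all numbers are hypothetical; the sketch only sums the power of overlapping tasks and reports the intervals that exceed a given budget.

```python
def power_profile(tasks):
    """Build a piecewise-constant power profile from a schedule.

    tasks: list of (start_s, duration_s, power_w) tuples (hypothetical format).
    Returns a list of (t0, t1, total_power_w) segments between schedule events.
    """
    events = sorted({t for start, dur, _ in tasks for t in (start, start + dur)})
    segments = []
    for t0, t1 in zip(events, events[1:]):
        # A task contributes to a segment if it is active for the whole segment.
        power = sum(p for start, dur, p in tasks if start <= t0 and t1 <= start + dur)
        segments.append((t0, t1, power))
    return segments

def max_power_violations(tasks, p_max_w):
    """Return the segments whose total power exceeds the max power budget."""
    return [seg for seg in power_profile(tasks) if seg[2] > p_max_w]

# Hypothetical rover schedule: heater, driving motor, camera.
schedule = [(0, 10, 5.0), (5, 10, 8.0), (12, 3, 4.0)]
print(power_profile(schedule))
print(max_power_violations(schedule, 12.5))
```

A min power constraint could be checked the same way by flagging segments that fall below a lower bound instead of above an upper one.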

92
Example Platform - X2000
  • COTS component modeling
  • Processors (PowerPC 603e, 750)
  • Memory organization (cache, memory)
  • System interconnects (FireWire bus
    driver/controller)
  • Scientific equipment
  • Sensors/actuators
  • Mechanics/Thermals (driving/steering
    motors/heaters)
  • System-level architecture modeling
  • Tree topology for FireWire bus architecture
  • Component clustering for bus segmentation

93
Testing Methodologies
  • A
  • "Activity" for given duration (5 s, 10 s, 15 s)
  • repeated 6 times
  • record both I-cache and D-cache misses (recorded in
    separate runs)
  • B
  • Recording 90 seconds worth of an Activity till
    its completion
  • 1 minute gap between runs
  • also I-cache and D-cache misses
  • C -- what is measurement C?

94
User Input
  • Attributes
  • tasks, resources, timing constraints,
  • power budgets
  • Unique features
  • power as constraint
  • scheduling, system-level mission planning,
    power-aware loop pipelining,
  • timing constraint classification.
  • subsumes deadline, dataflow
  • Language
  • mix of graphical and custom constraint language

95
Methodology and Work Flow
  • Exploration techniques
  • Backtracking
  • Cutting exploration space with multi-dimensional
    constraints
  • Two steps in design exploration
  • Find feasible mode selection for operating tasks
  • Timing constraints
  • Constraint graph
  • Resource slacks
  • Mission deadline
  • Dependency between tasks
  • Dependency graph
  • Find feasible mode selections for idle intervals
  • System power/energy constraints: min, max, or
    power profile
  • Mode change overhead, both time and power
    overheads
  • Speedup techniques
  • Sorting component modes with power numbers
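The exploration techniques above (backtracking, cutting the space with more than one constraint, and sorting modes by power) can be sketched in simplified form. The assumptions here are loud ones: tasks run strictly one after another, each mode is a hypothetical `(name, power_w, duration_s)` tuple, and the only constraints are a mission deadline and a max power budget. The presentation's actual formulation, with constraint graphs and idle-interval modes, is richer than this.

```python
def select_modes(tasks, deadline_s, p_max_w):
    """Pick one mode per task to minimize energy under time and power limits.

    tasks: list of lists of (mode_name, power_w, duration_s) (hypothetical format).
    Returns (list_of_mode_names, energy_j), or (None, inf) if nothing is feasible.
    """
    # Speedup: try low-power modes first so a good bound is found early.
    ordered = [sorted(modes, key=lambda m: m[1]) for modes in tasks]
    best = {"energy": float("inf"), "choice": None}

    def search(i, elapsed, energy, chosen):
        # Prune on either dimension: deadline missed, or no better than best so far.
        if elapsed > deadline_s or energy >= best["energy"]:
            return
        if i == len(ordered):
            best["energy"], best["choice"] = energy, list(chosen)
            return
        for name, power, duration in ordered[i]:
            if power > p_max_w:
                continue  # this mode alone violates the max power budget
            chosen.append(name)
            search(i + 1, elapsed + duration, energy + power * duration, chosen)
            chosen.pop()  # backtrack

    search(0, 0.0, 0.0, [])
    return best["choice"], best["energy"]

# Hypothetical two-task example with a fast and a slow mode per task.
tasks = [
    [("fast", 10.0, 2.0), ("slow", 4.0, 4.0)],
    [("fast", 8.0, 3.0), ("slow", 3.0, 6.0)],
]
print(select_modes(tasks, 8.0, 12.0))
```

In this example the all-slow choice uses the least energy but misses the deadline, so the search settles on a mixed assignment; tightening the power budget changes which mixed assignment survives.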