Title: Architecture Exploration For Ambient Energy Harvesting Nonvolatile Processors
1Architecture Exploration For Ambient Energy
Harvesting Nonvolatile Processors
2Introduction
- A future powered by technology that harvests ambient energy sources
- Battery-free systems
- Ambient energy sources
- Solar energy
- Wi-Fi and radio frequency (RF) energy
- Motion energy: piezoelectric devices
- E.g., a wirelessly powered smart contact lens
3Application Categories
- Applications vary in complexity, throughput constraints, and computational demands.
- Based on their demand for nonvolatility, they are categorized into three categories:
- Category 1, signal detection and sensing: detection and relaying. E.g., UV radiation, blood pressure, blood sugar level, temperature
- Category 2, signal detection and analysis: computation is carried out to analyze the signal for diagnosis. E.g., wearable EEG/ECG
- Category 3, signal prediction: predicts future patterns. E.g., wearable systems that warn against seizures
- Ambient energy sources are unreliable. Category 1 is easier to implement
- Categories 2 and 3 require QoS (work must be completed within a fixed time)
4Energy Harvesting System Structure
- Energy harvesting and management: determines the entire power used for signal sensing, processing, and transmission
- Digital signal processor: more about it later
- I/O interface and analog RF front end: digital interfaces, antennas, etc.
5Processor Design Volatile Vs Nonvolatile
- Volatile processor with periodic checkpointing: forced rollback to the previously checkpointed state
- NV processor enables more complex state-dependent signal processing that tolerates power source insufficiency and unreliability
- NV processor consumes more power for reads and writes
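The contrast between the two designs can be illustrated with a toy model. This is only a minimal sketch: the per-cycle power trace, the checkpoint interval, and the one-instruction-per-cycle rate are all hypothetical, and backup, restore, and checkpoint costs are ignored.

```python
def volatile_progress(power_trace, checkpoint_interval):
    """Instructions retained by a volatile core with periodic checkpointing.

    On each power failure, work done since the last checkpoint is lost.
    """
    committed = 0   # progress saved at the last checkpoint
    in_flight = 0   # progress made since the last checkpoint
    for powered in power_trace:
        if powered:
            in_flight += 1
            if in_flight == checkpoint_interval:
                committed += in_flight
                in_flight = 0
        else:
            in_flight = 0  # forced rollback to the checkpointed state
    return committed


def nonvolatile_progress(power_trace):
    """Instructions retained by an idealized NV core (state survives outages)."""
    return sum(1 for powered in power_trace if powered)


# Hypothetical trace: 6 powered cycles followed by a 2-cycle outage, 5 times.
trace = ([True] * 6 + [False] * 2) * 5
volatile = volatile_progress(trace, checkpoint_interval=4)
nonvolatile = nonvolatile_progress(trace)
print(volatile, nonvolatile)
```

In this toy trace the volatile core keeps only the checkpointed part of each powered burst, while the idealized NV core keeps every executed instruction. A fuller model would also charge the NV core for its more expensive reads and writes.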
6Architectural Exploration
- Parameters to be analyzed
- Number of pipeline stages
- Data to be backed up
- Frequency of backup
- Assumptions
- MIPS ISA
- Clock frequency: 8 kHz, limited by the strength of the Wi-Fi signal used
- Instruction memory (ROM) and ICache (SRAM, NVM)
- Data memory (nonvolatile) and DCache (NV write-back)
7Non-pipelined Configuration (NP)
- The entire state of the processor can be characterized by a single instruction state
- Program Counter (PC): identifies the instruction being executed and needs to be stored
- Register File (RegFile): a volatile RegFile is energy efficient because it is used frequently, with a large number of reads and writes
- Tradeoff between the energy consumed in backing up and recovering data and the overall performance
- Which data to save? When to save? 3 policies:
- Backup Every Cycle (BEC)
- On Demand All Backup (ODAB)
- On Demand Selective Backup (ODSB)
8NV Backup Every Cycle (BEC)
- Employs an NVM RegFile in spite of the significant energy penalty; otherwise both a volatile and a nonvolatile copy would need to be updated every cycle
- PC and a few registers in RegFile are written every cycle
- Instructions like StoreWord and Jump do not require a RegFile write
9NV On Demand All Backup (ODAB)
- All RegFile entries are backed up in the event of a reduced power state
- If input power < preset threshold, the power warning signal is activated
- Control unit backs up PC and resets the atomic flag
- Upon power restore, energy is accumulated in the capacitor
10NV On Demand Selective Backup (ODSB)
- Synchronous power warning signal ensures that the instruction at the current PC finishes executing and writing back. PC + 4 is stored to avoid re-execution
- A change flag identifies whether a register has been written
- Control unit doesn't generate addresses for unchanged data
- Reduces backup time and energy penalty
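The difference between backing up every register and backing up only flagged registers can be sketched as follows. This is a toy model rather than the actual control unit: the class and method names are hypothetical, the register count of 32 is an assumption, and storing PC + 4 in both policies is assumed for simplicity.

```python
class NVRegFileBackup:
    """Toy model of on-demand register backup with per-register change flags."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs          # volatile register file
        self.changed = [False] * num_regs   # change flag per register
        self.nv_regs = [0] * num_regs       # nonvolatile backup copy
        self.pc = 0
        self.nv_pc = 0

    def write(self, index, value):
        """Register write; sets the change flag for that register."""
        self.regs[index] = value
        self.changed[index] = True

    def backup_all(self):
        """ODAB-style backup: copy every register. Returns NV writes performed."""
        self.nv_regs = list(self.regs)
        self.changed = [False] * len(self.regs)
        self.nv_pc = self.pc + 4            # next instruction (assumed here too)
        return len(self.regs) + 1

    def backup_selective(self):
        """ODSB-style backup: copy only flagged registers. Returns NV writes."""
        writes = 1                          # the PC is always stored
        for i, flag in enumerate(self.changed):
            if flag:
                self.nv_regs[i] = self.regs[i]
                self.changed[i] = False
                writes += 1
        self.nv_pc = self.pc + 4            # resume after the finished instruction
        return writes


rf = NVRegFileBackup()
rf.pc = 100
rf.write(3, 7)
rf.write(5, 9)
selective_writes = rf.backup_selective()    # PC plus two changed registers
all_writes = rf.backup_all()                # PC plus all 32 registers
print(selective_writes, all_writes)
```

With only two registers modified since the last backup, the selective policy performs far fewer nonvolatile writes, which is where the reduced backup time and energy come from.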
11Simulation Results And Comparison
- Total area is similar, as the NVM cache and backup blocks are much bigger than the logic
- BEC has the lowest peak frequency due to frequent backups
- Recovery time: time from activation of the Energy OK signal to the time all backup operations are complete
- ODSB backup time < ODAB backup time
12Simulation Results And Comparison
- ODSB is more energy efficient with a stable source like solar
- ODSB can reduce the backup energy penalty by 69% with 0.002% area overhead
- BEC doesn't need time to accumulate energy in the capacitor, so it is viable when power failure is extremely frequent (less than 1 in 10 cycles)
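One way to reason about when per-cycle backup can pay off is a simple break-even model. The sketch below is an assumption-laden illustration, not the evaluation used in the study: it considers energy only, the function names are hypothetical, and the energy values are arbitrary units chosen for the example.

```python
def breakeven_failure_interval(backup_energy_per_failure, extra_energy_per_cycle):
    """Cycles between failures at which both policies cost the same energy."""
    return backup_energy_per_failure / extra_energy_per_cycle


def cheaper_policy(cycles_between_failures, backup_energy_per_failure,
                   extra_energy_per_cycle):
    """Compare energy spent on backup over one failure-free interval."""
    bec_cost = cycles_between_failures * extra_energy_per_cycle
    on_demand_cost = backup_energy_per_failure
    return "BEC" if bec_cost < on_demand_cost else "on-demand"


# Arbitrary units: one on-demand backup costs 24, BEC adds 2 per cycle.
breakeven = breakeven_failure_interval(24.0, 2.0)
frequent = cheaper_policy(5, 24.0, 2.0)      # failures every 5 cycles
rare = cheaper_policy(1000, 24.0, 2.0)       # failures every 1000 cycles
print(breakeven, frequent, rare)
```

The model leaves out the time an on-demand scheme spends accumulating energy in the capacitor, which is the advantage of BEC noted above.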
13N-stage-pipeline
- Increased circuit complexity and activity factor result in a higher power threshold compared to the non-pipelined processor
- 5 Stage Pipeline (5SP) under study
- Two backup schemes proposed
- Shifted PC and Volatile Flip-flops (SPC/VFF)
- Nonvolatile Flip-flops Solution (NVFF)
14Shifted PC Volatile FF (SPC/VFF)
- Pipelined data flow with bypass and forward, complex control flow to handle hazards
- A shifter buffer stores the PC value in each pipeline stage
- When power goes down, the PC in the write-back stage will be finished; the unfinished PC to be backed up will be in the data memory stage
- A shifter is used instead of rolling back the PC, since a different PC needs to be backed up for jumps and branches
- An extra 4 clock cycles are needed to re-execute the last 4 instructions lost from the latter pipeline stages after recovery
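The shifted-PC idea can be sketched with a small buffer that moves each PC along with its instruction. This is an illustrative model only: the class name, the stage labels, and the example instruction stream (a jump from address 8 to address 100, no delay slots) are assumptions.

```python
from collections import deque

STAGES = ["IF", "ID", "EX", "MEM", "WB"]


class ShiftedPC:
    """Keeps the PC of the instruction currently in each pipeline stage."""

    def __init__(self):
        # Index 0 is the IF stage, the last index is the WB stage.
        self.buffer = deque([None] * len(STAGES), maxlen=len(STAGES))

    def clock(self, fetched_pc):
        """Advance one cycle: the new PC enters IF, older PCs shift one stage."""
        self.buffer.appendleft(fetched_pc)  # the PC leaving WB is dropped

    def pc_in(self, stage):
        return self.buffer[STAGES.index(stage)]

    def pc_to_back_up(self):
        """PC of the oldest unfinished instruction (MEM stage) at power-down."""
        return self.pc_in("MEM")


spc = ShiftedPC()
for pc in [0, 4, 8, 100, 104]:      # the instruction at 8 jumps to 100
    spc.clock(pc)

backup_pc = spc.pc_to_back_up()     # instruction in MEM has not finished
naive_pc = spc.pc_in("IF") - 3 * 4  # rolling back by subtraction
print(backup_pc, naive_pc)
```

Because of the jump, subtracting a fixed offset from the fetch PC gives the wrong address, whereas the shifted buffer holds the correct one. Resuming from the backed-up PC re-executes the four instructions that were in the IF, ID, EX, and MEM stages.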
15Nonvolatile FF Solution (NVFF)
- This solution uses NVM flip-flops
- SPC/VFF requires 11% less time and 57% less energy than NVFF
16Out-of-order Processor (OoO)
- More complex than NP and 5SP
- System state is broadly distributed across structures such as PC, ROB, RegFile, Map Table, Issue Queue, Load Store Queue, BHT and BTB
- Larger power requirement, hence fewer periods where the input power exceeds the min threshold
- Which structures need to be backed up?
17Resource Selection Strategies
- The resource selection strategies proposed are
- Minimum State Resource backup solution (MinR)
- Low-latency Backup solution (LLB)
- Middle-level Backup solution (MLB)
- Min-state-lost Backup solution (MPL)
- Integrated Flexible Atomic Backup Solution (IFA)
18Resource Selection Strategies
- Minimum State Resource backup solution (MinR)
- Backs up the min number of bits required to preserve functionality
- Depends on the branch misprediction mechanism to minimize the number of valid/relevant state bits prior to backup
- ROB and PC: backs up the first uncommitted PC at the head of the ROB
- ARegFile is backed up as it is small
- Map Table: pseudo-misprediction is used to restore the Map Table
- PRegFile, Ready Table, Free List, BHT, BTB can be recovered
19Resource Selection Strategies
- Low-latency Backup solution (LLB): aims to minimize the number of bits to store if backup begins immediately
- LLB backs up the entire ROB, IQ, ARegFile, Map Table and PRegFile
- Middle-level Backup solution (MLB): backs up the Ready Table and Free List as well
- Min-state-lost Backup solution (MPL): all structures, including BHT and BTB, are backed up
- Integrated Flexible Atomic Backup Solution (IFA): even if the power is below threshold, it could allow an optional state (BHT) to be stored, subject to an optimistic attempt
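The strategies can be viewed as nested sets of structures to back up. The sketch below encodes that view; the structure sizes in bits are hypothetical placeholders, including the PC in every set is an assumption, and IFA is omitted because its optional state depends on the power available at backup time.

```python
# Hypothetical structure sizes in bits, for illustration only.
STRUCTURE_BITS = {
    "PC": 32, "ARegFile": 1024, "ROB": 2048, "IQ": 1024, "MapTable": 256,
    "PRegFile": 2048, "ReadyTable": 64, "FreeList": 256,
    "BHT": 4096, "BTB": 8192,
}

MINR = {"PC", "ARegFile"}                 # the rest is recovered or rebuilt
LLB = MINR | {"ROB", "IQ", "MapTable", "PRegFile"}
MLB = LLB | {"ReadyTable", "FreeList"}
MPL = MLB | {"BHT", "BTB"}                # every structure

STRATEGIES = {"MinR": MINR, "LLB": LLB, "MLB": MLB, "MPL": MPL}


def backup_bits(strategy):
    """Total bits written to nonvolatile storage for a strategy."""
    return sum(STRUCTURE_BITS[name] for name in STRATEGIES[strategy])


ranking = sorted(STRATEGIES, key=backup_bits)
for name in ranking:
    print(name, backup_bits(name))
```

Since each set contains the previous one, the ordering by backup size does not depend on the placeholder sizes. What the sizes do affect is how much extra backup time and energy each step costs in exchange for faster recovery.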
20OoO Strategies Comparison
- In MinR, the pseudo-misprediction operation for the Map Table requires extra backup clock cycles. While recovering, extra clock cycles are needed to restore PRegFile, Ready Table and Free List
21OoO Strategies Comparison
- LLB: ROB and PRegFile are large, which increases backup time and energy. Recovery energy is smaller as instructions in the ROB are backed up (no re-execution)
- MPL incurs the largest backup and recovery penalties, but backing up all structures incurs min latency to return to peak performance after a power failure
- OoO needs a higher threshold, but periods of sufficient power are common enough to allow superior performance to pay for lost clock cycles
22Simulation Results
- The configurations are compared with a baseline non-pipelined volatile processor without checkpointing or data backup
- The volatile processor's progress returns to zero when power drops below the threshold
- Nonvolatile NP and 5SP have a higher power threshold
- OoO runs for only a small fraction of the time, but its performance can be up to 4x faster than NP and 5SP
23Validation
- The non-pipelined On Demand strategy was explored using an actual fabricated processor (THU1010N)
- It has an Intel 8051 CISC-like architecture
- The saved state includes the state machine that captures the current instruction
- PC and RegFiles are FeRAM-based FFs. The FFs have an additional backup FeCap
- The NV processor based system is interfaced to a solar panel and UV sensor
24Operation
- Upon power failure detection, NV control logic backs up DFFs to FeCaps
- When power resumes, data is restored from FeCaps to DFFs
- An internal RC oscillator is used. An external oscillator becomes unstable with low power
- Simulator calibration
- Several kernels executed both on the platform and the simulator
- Intermittent power supply modeled by a 1 kHz square waveform
- Processor frequency: 3 MHz
- Each kernel is executed 1000 times to obtain completion time
- Stable power case: no mismatch. Unstable power case: mismatch < 5%
- Simulator averages energy consumed by instruction to estimate remaining energy
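The calibration setup above implies some simple arithmetic, sketched here under stated assumptions: the 50% duty cycle of the square wave, the function names, and the energy values (arbitrary units) are hypothetical, while the 3 MHz and 1 kHz figures come from the setup.

```python
def cycles_per_on_phase(cpu_hz, supply_hz, duty_cycle=0.5):
    """Processor cycles available during one powered phase of the square wave."""
    return int(cpu_hz / supply_hz * duty_cycle)


def remaining_energy(stored_energy, instructions_executed,
                     avg_energy_per_instruction):
    """Estimate remaining energy using an average per-instruction cost."""
    used = instructions_executed * avg_energy_per_instruction
    return max(0.0, stored_energy - used)


on_cycles = cycles_per_on_phase(3_000_000, 1_000)   # 3 MHz core, 1 kHz supply
left = remaining_energy(100.0, 30, 2.0)             # arbitrary energy units
print(on_cycles, left)
```

Using one averaged cost per instruction keeps the estimate cheap to compute, at the price of ignoring differences between instruction types.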
25Dependence On Input Power
- Input signal characteristics play a major role in determining the optimal design.
- Performance of backup schemes compared with home and office Wi-Fi sources for harvesting
- In the home, the NP ODSB architecture is the best performing; in the office, OoO MPL is most desirable
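Why the preferred architecture can flip with the input source can be illustrated with a crude model that multiplies the fraction of time above the power threshold by a relative speed. Everything below is hypothetical: the traces, the thresholds, and the function names are invented for the example, the 4x speed reflects only the "up to 4x" figure, and backup and recovery overheads are ignored.

```python
def on_fraction(power_trace, threshold):
    """Fraction of samples in which input power meets the threshold."""
    return sum(1 for p in power_trace if p >= threshold) / len(power_trace)


def relative_progress(power_trace, threshold, speed):
    """Forward progress per unit time, relative to a speed-1 core always on."""
    return on_fraction(power_trace, threshold) * speed


weak_source = [1, 1, 2, 1, 1, 2, 1, 1, 2, 5]      # rarely exceeds 4
strong_source = [5, 5, 1, 5, 5, 5, 1, 5, 5, 5]    # usually exceeds 4

NP = {"threshold": 1, "speed": 1.0}    # low threshold, low speed
OOO = {"threshold": 4, "speed": 4.0}   # high threshold, high speed

np_weak = relative_progress(weak_source, **NP)
ooo_weak = relative_progress(weak_source, **OOO)
np_strong = relative_progress(strong_source, **NP)
ooo_strong = relative_progress(strong_source, **OOO)
print(np_weak, ooo_weak, np_strong, ooo_strong)
```

In this toy setting the low-threshold core wins on the weak source and the high-threshold core wins on the strong one, which mirrors the qualitative home versus office observation.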
26Dependence On Nature Of Input Source
- Input energy sources differ in magnitude
- For each case, the best performing backup policy is adopted
- For the same input power source, the actual execution time for NP and 5SP is almost the same
- Higher power threshold in OoO results in longer Off time
27Meeting QoS Requirements
- Some applications (like ECG) require periodic outputs within fixed time periods (QoS constraints)
- Ambient energy is unreliable
- Piezo and solar can provide almost 100% QoS
- QoS can be improved by
- Shrinking size and using FinFETs
- Power reduction techniques: dark silicon aware architecture, clock gating, DVFS, DATS, Tunnel FET, low power sub-threshold circuits
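One simple way to quantify such a QoS constraint is the fraction of periodic outputs delivered before their deadline. The sketch below assumes that definition; the function name and the completion times are hypothetical.

```python
def qos_fraction(completion_times, deadline):
    """Fraction of periodic outputs completed within the deadline."""
    met = sum(1 for t in completion_times if t <= deadline)
    return met / len(completion_times)


# Hypothetical completion times (ms) for five periodic outputs, 10 ms deadline.
times_ms = [8, 9, 12, 7, 10]
qos = qos_fraction(times_ms, deadline=10)
print(qos)
```

Under this metric, a source that rarely drops below the processor's power threshold keeps completion times short and pushes the fraction toward 1.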
28Conclusion
- Explored various factors of a battery-less system powered by ambient energy
- Intermittent energy source: different nonvolatile processor configurations, techniques to conserve state while maximizing forward progress
- Examined tradeoffs between performance and energy for different architectures
- Compared and validated simulation results with a nonvolatile solar energy harvesting processor platform
- The video of HPCA 2015 Best Paper Competition Demo
29References
- KaiSheng Ma, Yang Zheng, Shuangchen Li, Karthik Swaminathan, Xueqing Li, Yongpan Liu, Jack Sampson, Yuan Xie, Vijaykrishnan Narayanan. "Architecture Exploration for Ambient Energy Harvesting Nonvolatile Processors". The International Symposium on High-Performance Computer Architecture (HPCA-21).
- A. Parks, A. Sample, Z. Yi, and J. Smith. A wireless sensing platform utilizing ambient RF energy. In IEEE Radio and Wireless Symposium (RWS), 2013.
- S. Kannan, A. Gavrilovska, K. Schwan, and D. Milojicic. Optimizing checkpoints using NVM as virtual memory. In IPDPS, 2013.
- X. Dong, C. Xu, Y. Xie, and N. Jouppi. NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 31(7):994-1007, 2012.
30Questions?
31Thank You