Title: HybDTM: A Coordinated Hardware-Software Approach for Dynamic Thermal Management
1HybDTM: A Coordinated Hardware-Software Approach
for Dynamic Thermal Management
- Amit Kumar, Li-Shiuan Peh, Niraj K. Jha
- Princeton University
- Li Shang
- Queen's University
- July 26, 2006
2Outline
- Introduction
- Dynamic Thermal Management
- Hybrid approach
- Overview
- Thermal modeling
- Run-time characterization
- Proactive/reactive DTM
- Evaluation
- Conclusion
3Dynamic Thermal Management
- Thermal challenges in modern microprocessors
- Increasing chip complexity and associated power budgets
- High temperatures affect
- Reliability
- Performance
- Cost
- Worst-case solutions no longer cost-effective
- Move towards run-time techniques
- DTM evaluation metrics
- Effectiveness: guarantee thermal safety
- Efficiency: low performance impact
4Past work on DTM
- Hardware-based approaches
- Clock gating, DVS, fetch toggling, etc.
- Effective in reducing processor power consumption and hence temperature
- Ignore application-specific thermal behavior
- Software-based approaches
- Using the operating system
- Global system-level knowledge
- May not guarantee thermal safety when used in
isolation
5HybDTM: best of both worlds
- Synergistically leverage the advantages of both hardware and software techniques
- Proactive use of software DTM mixed with reactive use of hardware DTM
- Lower performance impact while guaranteeing thermal safety
- Fast and accurate thermal model to predict run-time application thermal behavior is the key
6Thermal characterization
- Software thermal analysis tools
- HotSpot [Skadron, ISCA '03], ISAC [Yang, ICCAD '06]
- Involves complex RC models which incur large overhead on every invocation
- Applicable to design-time thermal analysis
- Hardware sensors
- Cannot predict the future
(Adapted from K. Skadron et al., "Temperature-aware microarchitecture," ISCA '03)
- Our modeling technique uses both approaches
- Run-time software-based model
- Performance counters
- Modern processors support performance counters for debugging and system tuning
- Pentium 4: 18 hardware counters to monitor events in different microarchitecture units
- Regression-based thermal model
- Hardware sensors to improve accuracy
- Counters map directly to temperature: low overhead
7HybDTM system architecture
- System-level approach
- Considering different components like processor, memory, etc. together
- Software layer
- Operating system: natural choice
- Scheduling can control activity on different system components
- Ex: computation/communication
- Hardware layer
- Temperature values from on-die sensors
- Leverage underlying power management techniques like clock gating, etc.
- Pentium 4 used as a testbed in this study
8
9Regression Thermal Analysis
- To characterize the thermal behavior of
- Individual applications
- Entire system
- Applications have varying processor usage patterns
- Different microarchitecture units impact overall chip temperature differently
- Appropriate temperature weights for each hardware event
- $T_{overall} = w_1 (u_1 / t_{total}) + w_2 (u_2 / t_{total}) + \dots + w_{const}$
- Regression space
- Contribution of each hardware event in isolation
- Different possible combinations of hardware events (to capture thermal correlation)
- Exhaustive set of microbenchmarks tailored towards each hardware event to cover the entire regression space
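The weights in the regression model above can be obtained with an ordinary least-squares fit over microbenchmark measurements. The following is a minimal sketch of that idea, not the authors' implementation: the function names (`solve_linear`, `fit_weights`, `predict`) are hypothetical, the two-event data set is synthetic (generated from weights 20, 5 and a constant of 40), and plain least squares is an assumption, since the slides do not name the regression method.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_weights(rates, temps):
    """Least-squares fit of temp ~ sum_i w_i * rate_i + w_const.

    rates: one row per microbenchmark run, holding u_i / t_total per event.
    temps: measured chip temperature for each run.
    Returns [w_1, ..., w_n, w_const].
    """
    rows = [list(r) + [1.0] for r in rates]  # trailing 1.0 carries w_const
    k = len(rows[0])
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, temps)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(weights, rate_row):
    """Evaluate the fitted model for one set of event rates."""
    return sum(w * u for w, u in zip(weights, rate_row)) + weights[-1]


# Synthetic runs: each event in isolation, then two combinations.
RATES = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.25]]
TEMPS = [60.0, 45.0, 65.0, 51.25]
weights = fit_weights(RATES, TEMPS)
```

Running events both in isolation and in combination mirrors the regression space described on the slide; with real counters the rows would come from the microbenchmark set rather than from made-up numbers.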
10Microbenchmark set
- Units targeted
- Floating point unit
- Instruction fetch logic
- Instruction decoder
- Branch predictor unit
- TLBs
- L1/L2 cache
- Bus control logic
- Rename logic
- Trace cache
- Allocation logic
- Microcode ROM
- Memory order buffer
- Retirement logic
- Instruction queue
- Instruction scheduler
(Adapted from G. Hinton et al., "The microarchitecture of the Pentium 4 processor," Intel Tech. Journal, Feb. '01)
11Processor thermal model
- Validated against on-die thermal sensor
12HybDTM: Hybrid Dynamic Thermal Management
13HybDTM major constituents
- Run-time characterization
- Based on the usage of different processor microarchitecture units
- Application thermal behavior
- Overall chip temperature
- Run-time management
- Proactive software-directed DTM
- Thermal-aware priority management
- Thermal-aware timeslice management
- Reactive hardware-directed DTM
- Fine-grained clock-gating
14HybDTM flow
15Run-time thermal characterization
- Characterize the thermal behavior of
- Each individual process
- Local history ($t_{lp}$)
- Global history ($t_{gp}$)
- $T_{process} = w_{lp} t_{lp} + w_{gp} t_{gp}$ ($w_{lp} > w_{gp}$)
- Overall chip temperature
- Local component: $T_l = \sum_i T_i (t_i / t_{total})$
- Global component: average energy stored
- $T_{overall} = w_l T_l + w_g T_g$ ($w_l > w_g$)
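Both the per-process estimate and the overall chip estimate on this slide are weighted blends of a local and a global term, with the local weight the larger of the two. A minimal sketch of that blend follows; the function name `weighted_history` and the default weights 0.7 and 0.3 are illustrative assumptions, as the slides give no numeric weights.

```python
def weighted_history(local, global_, w_local=0.7, w_global=0.3):
    """Blend a local and a global thermal term.

    The slide only requires w_local > w_global; the default values
    here are placeholders, not figures from the presentation.
    """
    if not w_local > w_global:
        raise ValueError("local weight must exceed global weight")
    return w_local * local + w_global * global_


# Example: recent behavior of 60 and long-run behavior of 50.
estimate = weighted_history(60.0, 50.0)
```

Weighting the local term more heavily lets the estimate follow a process's most recent behavior while the global term damps short spikes.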
16Software-directed DTM
- Used proactively to balance thermal profile
- Process scheduling
- For performance
- Throughput
- Response time
- Fairness
- For thermal-efficiency
- Process thermal characteristics
- Each process assigned a priority and a timeslice
- Thermal-aware scheduling
- Interleave processes to avoid local hotspots
- Identify processes as hot or cold
- Based on a software DTM trigger ($T_{sw}$)
- $T_{process} > T_{sw} \Rightarrow$ hot process
- Priority adjustment: lower chance of getting scheduled
- Timeslice adjustment: decrease total CPU run time
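The scheduling policy above can be sketched as a classification step followed by priority and timeslice adjustments. This is a simplified illustration under stated assumptions, not the scheduler used in the study: the `Process` type, the step sizes, the minimum timeslice, and the alternating hot/cold order are all hypothetical choices, and larger priority values are assumed to mean a higher chance of being scheduled.

```python
from dataclasses import dataclass

T_SW = 60.0  # software DTM trigger in deg C (value from the evaluation slides)


@dataclass
class Process:
    name: str
    t_process: float   # estimated thermal contribution of the process
    priority: int      # assumption: larger value = scheduled more often
    timeslice_ms: int


def is_hot(proc, t_sw=T_SW):
    """A process is hot when its estimate exceeds the software trigger."""
    return proc.t_process > t_sw


def apply_software_dtm(procs, t_sw=T_SW, priority_step=1,
                       timeslice_scale=0.5, min_timeslice_ms=10):
    """Lower the priority and shrink the timeslice of every hot process."""
    for p in procs:
        if is_hot(p, t_sw):
            p.priority = max(0, p.priority - priority_step)
            p.timeslice_ms = max(min_timeslice_ms,
                                 int(p.timeslice_ms * timeslice_scale))
    return procs


def interleave(procs, t_sw=T_SW):
    """Alternate hot and cold processes to avoid back-to-back hot runs."""
    hot = [p for p in procs if is_hot(p, t_sw)]
    cold = [p for p in procs if not is_hot(p, t_sw)]
    order = []
    while hot or cold:
        if hot:
            order.append(hot.pop(0))
        if cold:
            order.append(cold.pop(0))
    return order


procs = [Process("A", 63.0, 5, 100), Process("B", 64.0, 5, 100),
         Process("C", 50.0, 5, 100), Process("D", 55.0, 5, 100)]
run_order = [p.name for p in interleave(procs)]
apply_software_dtm(procs)
```

A real operating-system scheduler would fold these adjustments into its existing priority and timeslice calculation instead of rewriting the fields directly.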
17Hardware-directed DTM
- Reactive use of more aggressive hardware techniques to guarantee thermal safety
- Pentium 4 supports fine-grained clock gating
- Clock can be throttled in discrete steps ranging from 12.5% to 87.5%
- Clock throttling ratio set using a model-specific register (MSR)
- Low software overhead of engaging/disengaging clock gating (few processor clock cycles)
- Allows fine-grained control
- Catastrophic shutdown detector
- No software initiation required
- On-chip thermal diode which triggers processor shutdown when temperature becomes alarmingly high
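To make the MSR-based control concrete, the sketch below computes the value that would be written to select one of the seven discrete steps. The slides do not name the register; the code assumes the layout Intel documents for `IA32_CLOCK_MODULATION` (address 0x19A, enable in bit 4, a 3-bit duty-cycle field in bits 3:1 in units of 12.5%). The function name `clock_modulation_value` is hypothetical, and whether the slides' "throttling" percentage denotes the clock-on or the clock-off fraction is not stated, so the mapping here is an assumption. Actually writing the register requires privileged access and is not shown.

```python
IA32_CLOCK_MODULATION = 0x19A  # assumed MSR address, per Intel's documentation

ENABLE_BIT = 1 << 4            # on-demand clock modulation enable
STEP_PCT = 12.5                # granularity of the duty-cycle field


def clock_modulation_value(duty_cycle_pct):
    """Return the register value for a duty cycle, or 0 to disable.

    duty_cycle_pct must be None (disable) or a multiple of 12.5
    between 12.5 and 87.5 inclusive.
    """
    if duty_cycle_pct is None:
        return 0
    steps = duty_cycle_pct / STEP_PCT
    if steps != int(steps) or not 1 <= steps <= 7:
        raise ValueError("duty cycle must be 12.5..87.5 in 12.5 steps")
    return ENABLE_BIT | (int(steps) << 1)


lowest = clock_modulation_value(12.5)
half = clock_modulation_value(50.0)
highest = clock_modulation_value(87.5)
```

Because engaging or disengaging modulation is a single register write, the software cost stays at the few clock cycles the slide mentions.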
18HybDTM evaluation
- Objectives
- Effectiveness: guarantee thermal safety
- Efficiency: minimize performance penalty
- Comparison
- Software-based DTM (SDTM)
- Uses thermal-aware scheduling only
- To study if SDTM alone is able to remove all thermal emergencies
- Hardware-based DTM (HDTM)
- Uses clock gating only
- To study the performance impact of purely
hardware-based approaches
19Uniprocessor
- Evaluation using SPEC2000 benchmarks
- Ran memory (a memory-intensive microbenchmark) in the background to study effect of SDTM
- Max. temperature limit $T_{max}$ = 65 °C
- Software DTM trigger $T_{sw}$ = 60 °C
- Hardware DTM trigger $T_{hw}$ = 62 °C (12.5% throttling) / 63 °C (25% throttling) / 64 °C (50% throttling)
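The staged hardware triggers listed above amount to a small lookup from chip temperature to throttling level. The sketch below encodes the three thresholds from the slide; the function name `hardware_throttle` is hypothetical, and treating each threshold as inclusive (temperature at or above the trigger) is an assumption the slides do not settle.

```python
T_MAX = 65.0  # maximum temperature limit in deg C

# (trigger temperature in deg C, throttling in percent), highest first
THROTTLE_STEPS = [(64.0, 50.0), (63.0, 25.0), (62.0, 12.5)]


def hardware_throttle(temp_c):
    """Return the throttling percentage for a chip temperature.

    Returns 0.0 below the first hardware trigger, meaning that
    hardware DTM is not engaged.
    """
    for trigger, throttle in THROTTLE_STEPS:
        if temp_c >= trigger:
            return throttle
    return 0.0


levels = [hardware_throttle(t) for t in (61.9, 62.0, 63.5, 64.2)]
```

Escalating the throttling as the temperature approaches the 65 °C limit keeps the reactive mechanism cheap when the chip is only slightly above the first trigger.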
20Performance Impact
(Figure: execution time overhead and relative CPU utilization, on an axis from 0 to 100, for memory, wupwise, vpr, gzip, twolf, and applu, comparing nodtm and hybdtm)
Average execution time overhead: HybDTM 9.9% (16.3% max.), HDTM 20.4% (29.5% max.)
21SMT
Same trigger settings: $T_{max}$ = 65 °C, $T_{sw}$ = 60 °C, $T_{hw}$ = 62 °C (12.5%) / 63 °C (25%) / 64 °C (50%)
22Conclusion
- This work proposes a low-overhead system-level DTM solution
- Regression-based thermal model (using hardware performance counters) for run-time application thermal characterization
- Hybrid of hardware and software DTM schemes
- Effective and low-overhead DTM