HybDTM: A Coordinated Hardware-Software Approach for Dynamic Thermal Management

1
HybDTM: A Coordinated Hardware-Software Approach
for Dynamic Thermal Management
  • Amit Kumar, Li-Shiuan Peh, Niraj K. Jha
  • Princeton University
  • Li Shang
  • Queen's University
  • July 26, 2006

2
Outline
  • Introduction
  • Dynamic Thermal Management
  • Hybrid approach
  • Overview
  • Thermal modeling
  • Run-time characterization
  • Proactive/reactive DTM
  • Evaluation
  • Conclusion

3
Dynamic Thermal Management
  • Thermal challenges in modern microprocessors
  • Increasing chip complexity and associated power
    budgets
  • High temperatures affect
  • Reliability
  • Performance
  • Cost
  • Worst-case solutions no longer cost-effective
  • Move towards run-time techniques
  • DTM evaluation metrics
  • Effectiveness: guarantee thermal safety
  • Efficiency: low performance impact

4
Past work on DTM
  • Hardware-based approaches
  • Clock gating, DVS, fetch toggling etc.
  • Effective in reducing processor power consumption
    and hence temperature
  • Ignore application-specific thermal behavior
  • Software-based approaches
  • Using the operating system
  • Global system-level knowledge
  • May not guarantee thermal safety when used in
    isolation

5
HybDTM: best of both worlds
  • Synergistically leverage the advantages of both
    hardware and software techniques
  • Proactive use of software DTM mixed with reactive
    use of hardware DTM
  • Lower performance impact while guaranteeing
    thermal safety
  • Fast and accurate thermal model to predict
    run-time application thermal behavior is the key

6
Thermal characterization
  • Software thermal analysis tools
  • HotSpot (Skadron, ISCA '03), ISAC (Yang, ICCAD '06)
  • Involves complex RC models which incur large
    overhead on every invocation
  • Applicable to design-time thermal analysis
  • Hardware sensors
  • Cannot predict the future

(Figure adapted from K. Skadron et al., "Temperature-aware microarchitecture," ISCA '03)
  • Our modeling technique uses both approaches
  • Run-time software-based model
  • Performance counters
  • Modern processors support performance counters
    for debugging and system tuning
  • Pentium 4: 18 hardware counters to monitor events
    in different microarchitecture units
  • Regression-based thermal model
  • Hardware sensors to improve accuracy
  • Counters map to temperature directly: low overhead

7
HybDTM system architecture
  • System-level approach
  • Considering different components like processor,
    memory etc. together
  • Software layer
  • Operating system: natural choice
  • Scheduling can control activity on different
    system components
  • Ex: computation/communication
  • Hardware layer
  • Temperature values from on-die sensors
  • Leverage underlying power management techniques
    like clock gating etc.
  • Pentium 4 used as a testbed in this study

8
  • Thermal modeling

9
Regression Thermal Analysis
  • To characterize the thermal behavior of
  • Individual applications
  • Entire system
  • Applications have varying processor usage
    patterns
  • Different microarchitecture units impact overall
    chip temperature differently
  • Appropriate temperature weights for each hardware
    event
  • $T_{overall} = w_1 (u_1 / t_{total}) + w_2 (u_2 / t_{total}) + \cdots + w_{const}$
  • Regression space
  • Contribution of each hardware event in isolation
  • Different possible combinations of hardware
    events (to capture thermal correlation)
  • Exhaustive set of microbenchmarks tailored
    towards each hardware event to cover the entire
    regression space
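
The regression step above can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the authors' implementation: it assumes only two hardware events, the function names (`fit_weights`, `predict`) are hypothetical, and the sample utilizations and temperatures are made up so that the fit has a known answer.

```python
# Sketch: fit T_overall = w1*(u1/t_total) + w2*(u2/t_total) + w_const
# by ordinary least squares. All sample values below are hypothetical.

def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting.
    Assumes the system is non-singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_weights(utilizations, temperatures):
    """Return least-squares weights; the last entry is w_const."""
    rows = [list(u) + [1.0] for u in utilizations]
    k = len(rows[0])
    # Normal equations: (A^T A) w = A^T T
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * t for r, t in zip(rows, temperatures)) for i in range(k)]
    return solve_linear(ata, atb)

def predict(weights, utilization):
    """Evaluate the model for one vector of per-event utilizations."""
    return sum(w * u for w, u in zip(weights, utilization)) + weights[-1]

# Hypothetical microbenchmark runs: each event alone, then combinations.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (1.0, 1.0)]
measured = [40.0 + 12.0 * a + 6.0 * b for a, b in samples]
weights = fit_weights(samples, measured)
```

The sample set mirrors the regression space described above: each event in isolation plus combinations of events, which is what lets the fit separate the individual weights.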

10
Microbenchmark set
  • Units targeted
  • Floating point unit
  • Instruction fetch logic
  • Instruction decoder
  • Branch predictor unit
  • TLBs
  • L1/L2 cache
  • Bus control logic
  • Rename logic
  • Trace cache
  • Allocation logic
  • Microcode ROM
  • Memory order buffer
  • Retirement logic
  • Instruction queue
  • Instruction scheduler

(Figure adapted from G. Hinton et al., "The microarchitecture of the Pentium 4 processor," Intel Tech. Journal, Feb. '01)

11
Processor thermal model
  • Validated against on-die thermal sensor

12
  • HybDTM: Hybrid Dynamic Thermal Management

13
HybDTM major constituents
  • Run-time characterization
  • Based on the usage of different processor
    microarchitecture units
  • Application thermal behavior
  • Overall chip temperature
  • Run-time management
  • Proactive software-directed DTM
  • Thermal-aware priority management
  • Thermal-aware timeslice management
  • Reactive hardware-directed DTM
  • Fine-grained clock-gating

14
HybDTM flow
15
Run-time thermal characterization
  • Characterize the thermal behavior of
  • Each individual process
  • Local history ($t_{lp}$)
  • Global history ($t_{gp}$)
  • $T_{process} = w_{lp} t_{lp} + w_{gp} t_{gp}$ ($w_{lp} > w_{gp}$)
  • Overall chip temperature
  • Local component: $T_l = \sum_i T_i (t_i / t_{total})$
  • Global component $T_g$: average energy stored
  • $T_{overall} = w_l T_l + w_g T_g$ ($w_l > w_g$)
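
The weighted combinations on this slide can be written out as a short sketch. Several things here are assumptions rather than details from the slides: the weight values 0.7 and 0.3 are placeholders that only respect the stated ordering (local weight greater than global weight), the local component is read as a run-time-weighted sum over processes, and all function names are hypothetical.

```python
# Sketch of the run-time characterization formulas. Weight values are
# hypothetical; the slides only state that the local weight is larger.

def process_temperature(t_lp, t_gp, w_lp=0.7, w_gp=0.3):
    """T_process = w_lp * t_lp + w_gp * t_gp, with w_lp > w_gp."""
    if not w_lp > w_gp:
        raise ValueError("local weight must exceed global weight")
    return w_lp * t_lp + w_gp * t_gp

def local_component(process_temps, run_times):
    """T_l as a sum of per-process temperatures weighted by run-time share.
    Assumed reading of the slide's local-component line."""
    t_total = sum(run_times)
    return sum(temp * t / t_total for temp, t in zip(process_temps, run_times))

def overall_temperature(t_l, t_g, w_l=0.7, w_g=0.3):
    """T_overall = w_l * T_l + w_g * T_g, with w_l > w_g."""
    if not w_l > w_g:
        raise ValueError("local weight must exceed global weight")
    return w_l * t_l + w_g * t_g
```

Weighting the local term more heavily means recent activity dominates the estimate, while the global term keeps some memory of longer-term heating.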

16
Software-directed DTM
  • Used proactively to balance thermal profile
  • Process scheduling
  • For performance
  • Throughput
  • Response time
  • Fairness
  • For thermal-efficiency
  • Process thermal characteristics
  • Each process assigned a priority and a timeslice
  • Thermal-aware scheduling
  • Interleave processes to avoid local hotspots
  • Identify processes as hot or cold
  • Based on a software DTM trigger ($T_{sw}$)
  • $T_{process} > T_{sw} \Rightarrow$ hot process

Priority adjustment: lower chance of getting scheduled
Timeslice adjustment: decrease total CPU run time

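
The hot/cold classification and the two adjustments can be sketched as follows. This is an illustrative sketch under stated assumptions, not the scheduler modification used in the study: the `Process` record, the penalty of one priority level, and the halving of the timeslice are all hypothetical, and the trigger value of 60 °C is the one the slides give for the uniprocessor evaluation.

```python
from dataclasses import dataclass

T_SW = 60.0  # software DTM trigger in deg C (uniprocessor evaluation value)

@dataclass
class Process:
    name: str
    t_process: float   # estimated process temperature, deg C
    priority: int      # hypothetical convention: larger runs sooner
    timeslice_ms: int

def apply_software_dtm(procs, t_sw=T_SW, priority_penalty=1, timeslice_scale=0.5):
    """Penalize hot processes, then return processes in scheduling order.
    Penalty size and timeslice scale are hypothetical tuning knobs."""
    for p in procs:
        if p.t_process > t_sw:  # hot process
            p.priority -= priority_penalty
            p.timeslice_ms = max(1, int(p.timeslice_ms * timeslice_scale))
    # sorted() is stable, so equal-priority processes keep their order.
    return sorted(procs, key=lambda p: p.priority, reverse=True)

procs = [Process("hot", 63.0, 5, 100), Process("cold", 55.0, 5, 100)]
order = apply_software_dtm(procs)
```

Lowering priority makes a hot process less likely to be picked, and shortening its timeslice limits how long it heats the chip once it is picked, which matches the two adjustments named above.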
17
Hardware-directed DTM
  • Reactive use of more aggressive hardware
    techniques to guarantee thermal safety
  • Pentium 4 supports fine-grained clock gating
  • Clock can be throttled in discrete steps ranging
    from 12.5% to 87.5%
  • Clock throttling ratio set using a model-specific
    register (MSR)
  • Low software overhead of engaging/disengaging
    clock gating (few processor clock cycles)
  • Allows fine-grained control
  • Catastrophic shutdown detector
  • No software initiation required
  • On-chip thermal diode which triggers processor
    shutdown when temperature becomes alarmingly high
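
As a rough illustration of how a throttling ratio might be encoded for the MSR, the sketch below assumes the on-demand clock modulation layout documented for IA-32 processors: an enable flag in bit 4 and a 3-bit duty-cycle field in bits 3:1, in 12.5% steps. The slides do not name the register or its layout, so treat the bit positions and function names as assumptions. Writing the value requires privileged MSR access and is not shown. The slides also do not say whether their throttling percentages denote the duty cycle or its complement.

```python
# Sketch: encode a clock-modulation setting. Bit layout is an assumption
# based on Intel's documented on-demand clock modulation control.

CLOCK_MOD_ENABLE = 1 << 4   # assumed: bit 4 enables on-demand modulation
DUTY_SHIFT = 1              # assumed: bits 3:1 hold the duty-cycle step

def encode_clock_modulation(step):
    """Return the register value for a duty-cycle step.
    step 1..7 selects 12.5%..87.5%; step 0 disables modulation."""
    if not 0 <= step <= 7:
        raise ValueError("step must be in 0..7")
    if step == 0:
        return 0
    return CLOCK_MOD_ENABLE | (step << DUTY_SHIFT)

def duty_cycle_percent(step):
    """Duty cycle selected by a step, in percent."""
    return step * 12.5
```

Seven non-zero steps of 12.5% give exactly the 12.5% to 87.5% range quoted on the slide.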

18
HybDTM evaluation
  • Objectives
  • Effectiveness: guarantee thermal safety
  • Efficiency: minimize performance penalty
  • Comparison
  • Software-based DTM (SDTM)
  • Uses thermal-aware scheduling only
  • To study if SDTM alone is able to remove all
    thermal emergencies
  • Hardware-based DTM (HDTM)
  • Uses clock gating only
  • To study the performance impact of purely
    hardware-based approaches

19
Uniprocessor
  • Evaluation using SPEC2000 benchmarks
  • Ran memory (a memory-intensive microbenchmark)
    in the background to study effect of SDTM
  • Max. temperature limit: $T_{max}$ = 65 °C
  • Software DTM trigger: $T_{sw}$ = 60 °C
  • Hardware DTM trigger: $T_{hw}$ = 62 °C (12.5% throttling) /
    63 °C (25% throttling) / 64 °C (50% throttling)
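
The trigger settings above can be read as a staged policy, sketched below. This is a simplification under stated assumptions: it takes a single temperature reading, whereas the software trigger in the slides is applied per process; the comparison at the hardware thresholds (reaching versus exceeding) is not specified and is assumed here; the throttling values are taken to be percentages; and `dtm_action` is a hypothetical name.

```python
# Sketch: map a temperature reading to a DTM action using the
# uniprocessor trigger settings.

T_SW = 60.0  # software DTM trigger, deg C
# (threshold in deg C, throttling in percent), checked hottest first
HW_TRIGGERS = [(64.0, 50.0), (63.0, 25.0), (62.0, 12.5)]

def dtm_action(temp_c):
    """Return (mechanism, throttling_percent) for a temperature reading."""
    for threshold, throttle in HW_TRIGGERS:
        if temp_c >= threshold:   # assumed: trigger on reaching threshold
            return ("hardware", throttle)
    if temp_c > T_SW:             # slides: T_process > T_sw marks "hot"
        return ("software", 0.0)
    return ("none", 0.0)
```

Because the software trigger sits below every hardware threshold, thermal-aware scheduling gets a chance to act first, and clock gating engages only if the temperature keeps rising.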

20
Performance Impact
[Figure: execution time overhead and relative CPU utilization (axis 0 to 100) for memory, wupwise, vpr, gzip, twolf, and applu, comparing nodtm and hybdtm]
Average execution time overhead: HybDTM 9.9% (16.3% max.), HDTM 20.4% (29.5% max.)

21
SMT
  • Same trigger settings: $T_{max}$ = 65 °C, $T_{sw}$ = 60 °C,
    $T_{hw}$ = 62 °C (12.5%) / 63 °C (25%) / 64 °C (50%)

22
Conclusion
  • This work proposes a low overhead system-level
    DTM solution
  • Regression based thermal model (using hardware
    performance counters) for run-time application
    thermal characterization
  • Hybrid of hardware and software DTM schemes
  • Effective and low overhead DTM