1
ISCA 2004 Tutorial
  • Thermal Issues for Temperature-Aware Computer
    Systems
  • Saturday, June 19th
  • 8:00am - 5:00pm

2
Presenters
  • Kevin Skadron (skadron_at_cs.virginia.edu)
  • CS Department, University of Virginia
  • David Brooks (dbrooks_at_eecs.harvard.edu)
  • CS Department, Harvard University
  • Antonio Gonzalez (antonio_at_ac.upc.es)
  • UPC-Barcelona, and Intel Barcelona Research
    Center
  • Lev Finkelstein (lev.finkelstein_at_intel.com)
  • Intel Haifa
  • Mircea Stan (mircea_at_virginia.edu)
  • ECE Department, University of Virginia

3
Overview
  1. Motivation (Kevin)
  2. Thermal issues (Kevin): 1.5 hrs for items 1-2
  3. Power modeling (David)
  4. Thermal management (David): 1.5 hrs for items 3-4
  5. Optimal DTM (Lev): 0.5 hrs
  6. Clustering (Antonio): 1 hr
  7. Power distribution (David): 15 min
  8. What current chips do (Lev): 45 min
  9. HotSpot and sensors (Kevin): 1 hr

4
Overview
  1. Motivation (Kevin)
  2. Thermal issues (Kevin)
  3. Power modeling (David)
  4. Thermal management (David)
  5. Optimal DTM (Lev)
  6. Clustering (Antonio)
  7. Power distribution (David)
  8. What current chips do (Lev)
  9. HotSpot (Kevin)

5
Motivation
  • Power consumption: first-order design constraint
  • unconstrained power is a theoretical max
  • peak (≈ inst.) power limits power delivery
  • sustained power limits thermal design/packaging
  • max sustained power: thermal virus
  • same as thermal design power
  • average active power and idle power limit mobile
    battery life, etc.
  • Common fallacy: instantaneous power is a proxy for
    temperature
  • Power density is increasing exponentially
  • Unfortunate corollary of Moore's Law
  • thermal effects become more problematic
  • Need Power/Temperature-aware computing!

6
Power Dissipation
Source: Microprocessor Report
7
Effects of Technology Scaling on Power Dissipation
  • Feature size is scaling down
  • 30%
  • Frequency is increasing
  • 2x
  • Area increases due to microarchitecture
    improvements
  • 25% (ideal scaling: decreases by 50%)
  • Active capacitance increases
  • at least 30% (ideal scaling: decreases by 30%)
  • Vdd is not scaled down at the same rate as
    feature size
  • 0-10% (ideal scaling: 30%)
  • Ideal scaling: P ∝ CV²f → 0.7² reduction ≈ 0.5
  • Observed scaling: ≈ 2-2.5x increase
  • Power density becomes a problem!
  • Especially since the power density is non-uniform
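
As a rough check of the scaling figures on this slide, the sketch below multiplies per-generation factors under the assumption that dynamic power follows P ∝ C·V²·f. The capacitance, voltage, and observed frequency factors are read from the slide; the ideal frequency factor of 1/0.7 is an assumption (classical constant-field scaling) and is not stated on the slide.

```python
def power_scale(c_factor, v_factor, f_factor):
    """Relative change in dynamic power, assuming P is proportional to C * V^2 * f."""
    return c_factor * v_factor ** 2 * f_factor


# Ideal scaling: C and Vdd both shrink by 30%; frequency assumed to rise by 1/0.7.
ideal = power_scale(0.7, 0.7, 1 / 0.7)

# Observed scaling: C grows by 30%, Vdd shrinks by 0-10%, frequency doubles.
observed_low = power_scale(1.3, 0.9, 2.0)
observed_high = power_scale(1.3, 1.0, 2.0)

print(f"ideal: {ideal:.2f}x, observed: {observed_low:.2f}x to {observed_high:.2f}x")
```

Under these assumptions the ideal case roughly halves power per generation, while the observed factors land a little above 2x, in line with the slide's 2-2.5x estimate.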

8
Trends in Power Density
[Figure: power density on a log axis (1 to 1000) versus process generation
(1.5µm, 1µm, 0.7µm, 0.5µm, 0.35µm, 0.25µm, 0.18µm, 0.13µm, 0.1µm, 0.07µm).
Processors plotted: i386, i486, Pentium, Pentium Pro, Pentium II, Pentium III,
Pentium 4. Reference levels: hot plate, nuclear reactor, rocket nozzle, Sun's
surface.]
Source: Fred Pollack, Intel Corp., "New Microarchitecture Challenges in the
Coming Generations of CMOS Process Technologies," Micro32 conference keynote,
1999.
9
ITRS Projections
  • These are targets
  • Power-density problem is still getting worse
  • Intel papers suggest that in the 45-75W range,
    cooling costs $1/W, but then the rate of increase
    goes up: $2-3/W, probably more! (Borkar, IEEE
    Micro '99; Gunther et al., ITJ '01)

Source: ITRS 2001
10
Leakage Power
  • The fraction of leakage power is increasing
    exponentially with each generation
  • Also exponentially dependent on temperature

Increasing ratio across generations
Source: Sankaranarayanan et al., University of
Virginia
11
Power-aware figures of merit
  • Power (P), (1/W): battery time (mobile),
    packaging (high-performance)
  • Energy (PD), (MIPS/W): battery life (mobile),
    fundamental limits (kT)
  • Energy-delay (PD²), (MIPS²/W): performance and
    low power
  • Energy-delay² (PD³), (MIPS³/W): indep. of Vdd,
    emphasis on performance
  • Power-aware ≠ low power
  • Similar to old VLSI complexity (A, AD, AD²)
  • None of these are appropriate for thermal
  • This is a problem
  • Refs: R. Gonzalez et al., "Supply and threshold
    voltage scaling for low power CMOS," JSSC, Aug.
    1997
  • A. Martin et al., "Design of an Asynchronous MIPS
    R3000," ARVLSI '97
  • J. Ullman, "Computational Aspects of VLSI," CS
    Press, 1984
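
To illustrate why the slide calls energy-delay² independent of Vdd, here is a minimal sketch under a first-order model that is an assumption, not something the slides state: switching energy E = C·V² and delay D ∝ 1/V (threshold-voltage effects ignored). The function name `metrics` and the arbitrary units are hypothetical.

```python
def metrics(vdd, c=1.0):
    """First-order figures of merit in arbitrary units; assumes delay D = 1 / Vdd."""
    delay = 1.0 / vdd          # assumed: frequency scales linearly with Vdd
    energy = c * vdd ** 2      # switching energy per operation, E = C * V^2
    power = energy / delay     # P = E / D
    return {
        "P": power,
        "PD": power * delay,           # energy
        "PD2": power * delay ** 2,     # energy-delay
        "PD3": power * delay ** 3,     # energy-delay^2
    }


for vdd in (1.0, 1.2, 1.5):
    print(vdd, {name: round(value, 3) for name, value in metrics(vdd).items()})
```

In this model PD³ stays constant as Vdd changes while P, PD, and PD² all move, which is why the metric is useful for comparing designs without rewarding a simple voltage change.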

12
Cooking-aware computing
  • Some chips rated for 100°C

13
Power and temperature are BAD
  • and can be EVIL

Source: Tom's Hardware Guide,
http://www6.tomshardware.com/cpu/01q3/010917/heatvideo-01.html
14
Other Costs of High Heat Flux
  • Some chips may already be underclocked due to
    thermal constraints!
  • (especially mobile and sealed systems)

15
Temporal, Spatial Variations
[Figure: temperature variation of SPEC applu over time]
Hot spots increase cooling costs → must cool
for hot spot
16
Application Variations
  • Wide variation across applications
  • Architectural and technology trends are making it
    worse, e.g. simultaneous multithreading (SMT)
  • Leakage is an especially severe problem:
    exponentially dependent on temperature!

17
Heat vs. Temperature
  • Different time scales
  • Heat: no notion of spatial locality
  • Does architecture have a role?
  • Temperature-aware computing
  • Optimize performance subject to a temperature
    constraint

18
Overview
  1. Motivation (Kevin)
  2. Thermal issues (Kevin)
  3. Power modeling (David)
  4. Thermal management (David)
  5. Optimal DTM (Lev)
  6. Clustering (Antonio)
  7. Power distribution (David)
  8. What current chips do (Lev)
  9. HotSpot and sensors (Kevin)

19
Thermal issues
  • Temperature affects
  • Circuit performance
  • Circuit power (leakage)
  • IC reliability
  • IC and system packaging cost
  • Environment

20
Performance and leakage
  • Temperature affects
  • Transistor threshold and mobility
  • Subthreshold leakage, gate leakage
  • Ion, Ioff, Igate, delay
  • ITRS: 85°C for high-performance, 110°C for
    embedded!

[Figure panels: Ioff; Ion (NMOS)]
21
Temperature-aware circuits
  • Robustness constraint sets Ion/Ioff ratio
  • Robustness and reliability: Ion/Igate ratio
  • Idea: keep ratios constant with T; trade leakage
    for performance!

Ref: Ghoshal et al., "Refrigeration
Technologies," ISSCC 2000; Garrett et al., "T3,"
ISCAS 2001
22
Resulting performance
  • 25-30% extra performance (110°C to 0°C)

[Figure legend: regular, TAC]
23
Reliability
  • The Arrhenius Equation: MTF = A·exp(Ea/(K·T))
  • MTF: mean time to failure at T
  • A: empirical constant
  • Ea: activation energy
  • K: Boltzmann's constant
  • T: absolute temperature
  • Failure mechanisms
  • Die metalization (Corrosion, Electromigration,
    Contact spiking)
  • Oxide (charge trapping, gate oxide breakdown, hot
    electrons)
  • Device (ionic contamination, second breakdown,
    surface-charge)
  • Die attach (fracture, thermal breakdown, adhesion
    fatigue)
  • Interconnect (wirebond failure, flip-chip joint
    failure)
  • Package (cracking, whisker and dendritic growth,
    lid seal failure)
  • Most of the above increase with T (Arrhenius)
  • Notable exception: hot electrons are worse at low
    temperatures
  • More on this later
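
The Arrhenius relation MTF = A·exp(Ea/(K·T)) can be evaluated as a ratio between two temperatures, so the empirical constant A cancels. The sketch below assumes an activation energy of 0.7 eV purely for illustration; real values depend on the failure mechanism, and the function name `mtf_ratio` is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann's constant in eV/K


def mtf_ratio(t_cool_c, t_hot_c, ea_ev=0.7):
    """MTF(t_cool) / MTF(t_hot) from MTF = A * exp(Ea / (K * T)); A cancels."""
    t_cool = t_cool_c + 273.15  # Arrhenius needs absolute temperature
    t_hot = t_hot_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool - 1.0 / t_hot))


ratio = mtf_ratio(85.0, 110.0)
print(f"MTF at 85 C is about {ratio:.1f}x the MTF at 110 C (Ea = 0.7 eV assumed)")
```

The result is only as good as the assumed Ea and the single-mechanism model, so it should be read as a trend rather than a lifetime prediction.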

24
Packaging cost
  • From Cray (local power generator and
    refrigeration)

Source: Gordon Bell, "A Seymour Cray
perspective," http://www.research.microsoft.com/users/gbell/craytalk/
25
Packaging cost
  • To today
  • Grid computing: power plants co-located near
    compute farms
  • IBM S/390
  • refrigeration

Source: R. R. Schmidt, B. D. Notohardjono,
"High-end server low temperature cooling," IBM
Journal of R&D
26
IBM S/390 refrigeration
  • Complex and expensive

Source: R. R. Schmidt, B. D. Notohardjono,
"High-end server low temperature cooling," IBM
Journal of R&D
27
IBM S/390 processor packaging
  • Processor subassembly complex!
  • C4: Controlled Collapse Chip Connection
    (flip-chip)

Source: R. R. Schmidt, B. D. Notohardjono,
"High-end server low temperature cooling," IBM
Journal of R&D
28
Intel Itanium packaging
  • Complex and expensive (note heatpipe)

Source: H. Xie et al., "Packaging the Itanium
Microprocessor," Electronic Components and
Technology Conference, 2002
29
Intel Pentium 4 packaging
  • Simpler, but still

Source: Intel web site
30
Graphics Cards
  • Nvidia GeForce 5900 card

Source: Tech-Report.com
31
More Graphics Cards
32
Under/Overclocking
  • Some chips need to be underclocked
  • Especially true in constrained form factors
  • Try fitting this in a laptop or Gameboy!

Ultra model of Gigabyte's 3D Cooler Series
Source: Tom's Hardware Guide
33
Apple G5 liquid cooling
  • Don't know details
  • Lots of people in thermal engineering community
    think liquid is inevitable, especially for server
    rooms
  • But others say no
  • This introduces a whole new kind of leakage
    problem
  • Water and electronics don't mix!

34
Environment
  • Environmental Protection Agency (EPA): computers
    consume 10% of commercial electricity
  • This incl. peripherals, possibly also
    manufacturing
  • A DOE report suggested this percentage is much
    lower
  • No consensus, but it's still a lot
  • Equivalent power (with only 30% efficiency) for
    AC
  • CFCs used for refrigeration
  • Lap burn
  • Fan noise

35
Heat mechanisms
  • Conduction
  • Convection
  • Radiation
  • Phase change
  • Heat storage

36
Conduction
  • Similar to electrical conduction (e.g. metals are
    good conductors)
  • Heat flow from high energy to low energy
  • Microscopic (vibration, adjacent molecules,
    electron transport)
  • No major displacement of molecules
  • Need a material; typically in solids (fluids:
    greater distance between molecules)
  • Typical example: thermal slug, spreader,
    heatsink

Source: R. Remsburg, Ed., Thermal Design of
Electronic Equipment, CRC Press, 2001
37
Conduction
  • Not a strong function of temperature
  • But for the high temp. variations on high-perf.
    chips (30°), it matters
  • Note esp. Si vs. Al, Cu

Source: R. Remsburg, Ed., Thermal Design of
Electronic Equipment, CRC Press, 2001
38
Convection
  • Macroscopic (bulk transport, mix of hot and cold,
    energy storage)
  • Need material (typically in fluids, liquid, gas)
  • Natural vs. forced (gas or liquid)
  • Typical example: heatsink (fan), liquid cooling
  • Note that convection is profoundly affected by
    board layout

Source: R. Remsburg, Ed., Thermal Design of
Electronic Equipment, CRC Press, 2001
39
Radiation
  • Electromagnetic waves (can occur in vacuum)
  • Negligible in typical applications
  • Sometimes the only mechanism (e.g. in space)

Source: R. Remsburg, Ed., Thermal Design of
Electronic Equipment, CRC Press, 2001
40
Carnot Efficiency
  • Note that in all cases, heat transfer is
    proportional to ΔT
  • This is also one of the reasons energy
    harvesting in computers is probably not
    cost-effective
  • ΔT w.r.t. ambient is << 100°
  • For example, with a 25W processor, thermoelectric
    effect yields only 50mW
  • Solbrekken et al., ITHERM '04
  • This is also why Peltier coolers are not energy
    efficient
  • 10% eff., vs. 30% for a refrigerator
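
The Carnot bound makes the energy-harvesting point concrete: the maximum fraction of heat that can be converted to work is 1 - T_cold/T_hot, with both temperatures in kelvin. The operating point below (85 °C die, 45 °C ambient) is a hypothetical example, not taken from the slides; only the 25 W and 50 mW figures come from the slide.

```python
def carnot_efficiency(t_hot_c, t_cold_c):
    """Upper bound on heat-to-work conversion: 1 - T_cold / T_hot (in kelvin)."""
    t_hot = t_hot_c + 273.15
    t_cold = t_cold_c + 273.15
    return 1.0 - t_cold / t_hot


# Hypothetical operating point: 85 C die, 45 C ambient inside the case.
bound = carnot_efficiency(85.0, 45.0)
print(f"Carnot bound: {bound:.1%}, so at most {25.0 * bound:.2f} W from 25 W")

# The slide's reported yield: 50 mW from a 25 W processor.
reported = 0.050 / 25.0
print(f"Reported thermoelectric yield: {reported:.2%}")
```

Even the ideal bound is small when ΔT is a few tens of degrees, and practical thermoelectric devices fall well short of it.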

41
Surface-to-surface contacts
  • Not negligible, heat crowding
  • Thermal greases/epoxy (can pump-out)
  • Phase Change Films (undergo a transition from
    solid to semi-solid with the application of heat)

Source: R. Remsburg, Ed., Thermal Design of
Electronic Equipment, CRC Press, 2001
42
Phase-change
  • Thermal solutions evolution
  • Natural air cooling
  • Forced-air cooling
  • Liquid cooling
  • Phase change (e.g. heat pipe)
  • Refrigeration
  • Phase change

a. Solid changing to a liquid: fusion, or melting
b. Liquid changing to a vapor: evaporation, also boiling
c. Vapor changing to a liquid: condensation
e. Liquid changing to a solid: crystallization, or freezing
f. Solid changing to a vapor: sublimation
g. Vapor changing to a solid: deposition
43
Thermal resistance
  • Rth = ρt / A = t / (kA)
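
A minimal sketch of the conduction-resistance formula, read here as R = t / (k·A) with t the thickness, k the thermal conductivity, and A the cross-sectional area. The copper spreader dimensions and the conductivity of roughly 400 W/(m·K) are illustrative assumptions, and the function name is hypothetical.

```python
def slab_resistance(thickness_m, conductivity_w_per_mk, area_m2):
    """Conduction resistance of a uniform slab, R = t / (k * A), in K/W."""
    return thickness_m / (conductivity_w_per_mk * area_m2)


# Assumed example: 1 mm thick copper spreader, 1 cm^2 cross-section, k ~ 400 W/(m*K).
r_cu = slab_resistance(1e-3, 400.0, 1e-4)
print(f"R = {r_cu:.3f} K/W")
```

Doubling the area halves the resistance, which is one reason heat spreaders are used to widen the heat path from a small die.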

44
Thermal capacitance
  • Cth = V·Cp·ρ
  • ρ(Aluminum) = 2,710 kg/m³
  • Cp(Aluminum) = 875 J/(kg·°C)
  • V = t·A = 0.000025 m³
  • Cbulk = V·Cp·ρ = 59.28 J/°C
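
The slide's aluminum example can be reproduced directly from volume × specific heat × density. The sketch below uses only the numbers given on the slide; the function name is hypothetical.

```python
def thermal_capacitance(volume_m3, specific_heat_j_per_kg_c, density_kg_per_m3):
    """Bulk thermal capacitance, C = V * Cp * rho, in J/C."""
    return volume_m3 * specific_heat_j_per_kg_c * density_kg_per_m3


# Values from the slide: aluminum, V = 0.000025 m^3.
c_bulk = thermal_capacitance(0.000025, 875.0, 2710.0)
print(f"Cbulk = {c_bulk:.2f} J/C")
```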

45
Refrigeration
  • conventional vs. thermo-electric (TEC)
  • Can get T < T_amb (negative Rth!)
  • TEC: Peltier effect (can use for local cooling)

46
TEC electro-thermal model
47
Simplistic steady-state model
  • All thermal transfer: R = k/A
  • Power density matters!
  • Ohm's law for thermals
  • (steady-state)
  • ΔV = I·R → ΔT = P·R
  • T_hot = P·Rth + T_amb
  • Ways to reduce T_hot
  • reduce P (power-aware)
  • reduce Rth (packaging)
  • reduce T_amb (Alaska?)
  • maybe also take advantage of transients (Cth)
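
The steady-state relation T_hot = P·Rth + T_amb and the three ways of lowering T_hot can be tried out numerically. All values below (50 W, 0.8 K/W, 45 °C ambient) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def steady_state_temp(power_w, r_th_k_per_w, t_amb_c):
    """Thermal Ohm's law at steady state: T_hot = P * Rth + T_amb."""
    return power_w * r_th_k_per_w + t_amb_c


# Hypothetical baseline: 50 W chip, 0.8 K/W to ambient, 45 C ambient.
baseline = steady_state_temp(50.0, 0.8, 45.0)
less_power = steady_state_temp(40.0, 0.8, 45.0)     # reduce P (power-aware)
better_package = steady_state_temp(50.0, 0.6, 45.0)  # reduce Rth (packaging)
cooler_room = steady_state_temp(50.0, 0.8, 35.0)     # reduce T_amb

for label, temp in [
    ("baseline", baseline),
    ("reduce P", less_power),
    ("reduce Rth", better_package),
    ("reduce T_amb", cooler_room),
]:
    print(f"{label:>12}: {temp:.1f} C")
```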

48
Simplistic dynamic thermal model
  • Electrical-thermal duality
  • V ↔ temp (T)
  • I ↔ power (P)
  • R ↔ thermal resistance (Rth)
  • C ↔ thermal capacitance (Cth)
  • RC ↔ time constant
  • KCL
  • differential eq.: I = C·dV/dt + V/R
  • difference eq.: ΔV = (I/C)·Δt - (V/(RC))·Δt
  • thermal domain: ΔT = (P/C)·Δt - (T/(RC))·Δt
  • (T = T_hot - T_amb)
  • One can compute stepwise changes in
    temperature for any granularity at which one can
    get P, T, R, C
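
The stepwise update described on this slide can be sketched as a simple loop over a power trace, applying ΔT = (P/C)·Δt - (T/(RC))·Δt where T is the rise above ambient. The R, C, ambient temperature, and power trace below are hypothetical values; this is a single lumped RC node, not a full package or HotSpot-style model.

```python
def simulate(power_trace_w, r_th, c_th, dt, t_amb_c):
    """Step dT = (P/C)*dt - (T/(R*C))*dt, with T the rise above ambient."""
    rise = 0.0
    temps = []
    for power in power_trace_w:
        rise += (power / c_th) * dt - (rise / (r_th * c_th)) * dt
        temps.append(t_amb_c + rise)
    return temps


# Hypothetical: R = 0.8 K/W, C = 60 J/K (time constant 48 s),
# 50 W for 600 s, then idle for 600 s, sampled every second.
trace = [50.0] * 600 + [0.0] * 600
temps = simulate(trace, r_th=0.8, c_th=60.0, dt=1.0, t_amb_c=45.0)

steady = 45.0 + 50.0 * 0.8
print(f"after 600 s of load: {temps[599]:.1f} C (steady state {steady:.1f} C)")
print(f"after 600 s of idle: {temps[-1]:.1f} C")
```

The time step should be small relative to RC for this explicit update to stay accurate; with a coarse step, a different integration scheme would be a safer choice.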

49
Combined package model
Note: Tja is meaningless!
Steady-state: Tj = junction temperature, Tc = case
temperature, Ts = heatsink temperature, Ta =
ambient temperature
What exactly is Ta?
Guts of the component
Tjc is better but still sketchy
Source: R. Remsburg, Ed., Thermal Design of
Electronic Equipment, CRC Press, 2001
50
Reliability as f(T)
  • Reliability criteria (e.g., DTM thresholds) are
    typically based on worst-case assumptions
  • But actual behavior is often not worst case
  • So aging occurs more slowly
  • This means the DTM design is over-engineered!
  • We can exploit this, e.g. for DTM or frequency

[Figure labels: Spend, Bank]
51
EM Model
Life Consumption Rate
Apply in a lumped fashion at the granularity of
microarchitecture units, just like RAMP
(Srinivasan et al.)
52
Reliability-Aware DTM
53
Temperature limits
  • Temperature limits for circuit performance can be
    measured
  • Temperature limits for reliability are at best an
    estimate
  • 150°C is a reasonable rule of thumb for when
    immediate damage might occur
  • Chips are typically specified at lower
    temperatures, 100-125°C, for both performance and
    long-term reliability
  • Rule of thumb that every 10° halves circuit
    lifetime is false
  • Originates from a mil-spec that has been debunked

54
Thermal issues summary
  • Temperature affects performance, power, and
    reliability
  • Architecture-level: conduction only
  • Very crude approximation of convection as
    equivalent resistance
  • Convection too complicated
  • Need CFD!
  • Radiation can be ignored
  • Use compact models for package
  • Power density is key
  • Temporal, spatial variation are key
  • Hot spots drive thermal design

55
Review of Thermal Issues
  • From ITHERM '04 keynote by Ken Goodson,
    Stanford/Cooligy