Title: SIGGRAPH thermal poster
Temperature-Aware GPU Design
Jeremy W. Sheaffer, Kevin Skadron, David P. Luebke
{jws9c, skadron, luebke}_at_cs.virginia.edu
University of Virginia, Charlottesville, VA 22904
http://gfx.cs.virginia.edu
http://qsilver.cs.virginia.edu
http://lava.cs.virginia.edu
Simulator Setup and Output
- For these results, our simulator is configured to model a system:
  - built on a 180nm process at 1.8V and 300MHz,
  - using an aluminum cooling solution with no fan,
  - with a temperature sensor on each functional-unit block.
- We assume that the vendor specifies a 100C maximum safe operating temperature, and we enable dynamic thermal management (DTM) at 97C to account for sensor imprecision.
- We have implemented the following DTM techniques in Qsilver:
  - Clock Gating: the clock is stopped until the chip drops below the threshold temperature.
  - Fetch Gating: a single stage in the pipeline is slowed down. We implement this in both the vertex fetch and rasterization stages.
  - Dynamic Voltage Scaling (DVS): DVS scales the core voltage, and with it the frequency, yielding a cubic reduction in dynamic power.
  - Multiple Clock Domains (MCD): MCD also scales voltage and frequency, but at the granularity of individual functional units. Both DVS and MCD incur a synchronization time penalty when they are enabled and disabled.
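The cubic claim for DVS follows from the standard switching-power estimate P = C·V²·f. If frequency is assumed to track voltage linearly, scaling both by s scales power by s³. Below is a minimal sketch of that arithmetic together with a clock-gating trigger at the 97C threshold. It is not Qsilver's code. The names (dynamic_power, dvs_power_ratio, clock_gated) are hypothetical, and the sketch does not model fetch gating, MCD, or the DVS/MCD synchronization penalty.

```python
# Values from the simulated system described above.
NOMINAL_VDD = 1.8          # volts
NOMINAL_FREQ_HZ = 300e6    # 300 MHz
TRIGGER_C = 97.0           # DTM engages 3C below the assumed 100C limit


def dynamic_power(c_eff: float, vdd: float, freq_hz: float) -> float:
    """Switching power P = C_eff * V^2 * f (activity factor folded into c_eff)."""
    return c_eff * vdd ** 2 * freq_hz


def dvs_power_ratio(scale: float) -> float:
    """Power relative to nominal when voltage and frequency are both scaled by `scale`.

    Assumes frequency tracks voltage linearly, so the ratio equals scale**3.
    """
    nominal = dynamic_power(1.0, NOMINAL_VDD, NOMINAL_FREQ_HZ)
    scaled = dynamic_power(1.0, scale * NOMINAL_VDD, scale * NOMINAL_FREQ_HZ)
    return scaled / nominal


def clock_gated(sensor_temps_c: dict) -> bool:
    """Clock-gating trigger: stop the clock while any unit sensor reads >= TRIGGER_C."""
    return max(sensor_temps_c.values()) >= TRIGGER_C


if __name__ == "__main__":
    print(dvs_power_ratio(0.8))  # ~0.512, i.e. 0.8 cubed
    print(clock_gated({"vertex_engine": 97.4, "rasterizer": 81.0}))  # True
```

A 20% reduction in voltage and frequency therefore cuts dynamic power roughly in half. Clock gating removes dynamic power entirely, but it does so by stalling all work.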
GPU Simulation with Qsilver
- To study thermal issues in a GPU, we have developed a simulator called Qsilver that:
  - models GPU clock-cycle-by-cycle activity and power in the microarchitecture domain;
  - uses the Chromium system to intercept a stream of OpenGL calls, annotating it with aggregate information about the vertices and fragments, textures, lighting, and other relevant rendering state.
- Qsilver is useful for:
  - analyzing performance bottlenecks
  - estimating power
  - exploring new graphics architectural ideas
- We have used Qsilver to analyze a hypothetical
fixed-function console-like GPU architecture.
For these results, we augment Qsilver with an
architectural thermal model called HotSpot that
tracks temperature in each functional unit over
time.
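The poster does not give Qsilver's power equations. One common way to turn cycle-by-cycle activity into per-unit power is an activity-count model: count the cycles in which each unit does work, multiply by an assumed energy per active cycle, and add idle power. The resulting average can then be passed to a thermal model once per sampling interval. The sketch below shows that idea under those assumptions. The unit names, energy figures, idle powers, and the ActivityPowerModel class are hypothetical; only the 300 MHz clock comes from the poster.

```python
from collections import defaultdict

# Hypothetical per-unit costs; the poster does not publish Qsilver's power numbers.
ENERGY_PER_ACTIVE_CYCLE_J = {
    "vertex_engine": 2.0e-9,
    "rasterizer": 1.0e-9,
    "fragment_engine": 3.0e-9,
}
IDLE_POWER_W = {"vertex_engine": 0.1, "rasterizer": 0.05, "fragment_engine": 0.15}
CLOCK_HZ = 300e6  # the 300 MHz configuration used in the poster


class ActivityPowerModel:
    """Counts active cycles per functional unit and converts them to average power."""

    def __init__(self) -> None:
        self.active_cycles = defaultdict(int)
        self.cycles = 0

    def tick(self, active_units) -> None:
        """Advance one clock cycle, marking the units that did useful work."""
        self.cycles += 1
        for unit in active_units:
            self.active_cycles[unit] += 1

    def average_power(self) -> dict:
        """Average power (W) per unit over the cycles since the last reset."""
        if self.cycles == 0:
            return dict(IDLE_POWER_W)
        seconds = self.cycles / CLOCK_HZ
        return {
            unit: IDLE_POWER_W[unit] + self.active_cycles[unit] * energy / seconds
            for unit, energy in ENERGY_PER_ACTIVE_CYCLE_J.items()
        }

    def reset(self) -> None:
        """Start a new sampling interval (e.g. after handing power to the thermal model)."""
        self.active_cycles.clear()
        self.cycles = 0


if __name__ == "__main__":
    model = ActivityPowerModel()
    for cycle in range(1000):
        # Toy trace: vertex work on even cycles, raster + fragment work on odd cycles.
        model.tick(["vertex_engine"] if cycle % 2 == 0 else ["rasterizer", "fragment_engine"])
    print(model.average_power())
```

Averaging over an interval rather than feeding power to the thermal model every cycle is what keeps billions of simulated cycles tractable. Temperature changes far more slowly than activity does.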
Thermal Simulation Results
Architecture-Level Thermal Modeling
- Requirements: general, simple, and fast, and able to model heating at the granularity of architectural objects. The model must be able to:
  - dynamically calculate temperatures for each block in the architecture;
  - simulate billions of clock cycles in a few hours;
  - be general enough to model a variety of processor architectures;
  - support reasoning about results at the architecture level.
- Solution: derive an equivalent circuit of lumped thermal resistances and capacitances. This circuit must be derived at the granularity of the processor architecture.
- Key components:
  - Floorplanning
  - Lumped-RC circuit derivation
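In the lumped-RC analogy, temperature plays the role of voltage and power plays the role of current. Each block i has a thermal capacitance C_i, a vertical resistance R_i to ambient, and lateral resistances R_ij to its neighbors, so that C_i dT_i/dt = P_i - (T_i - T_amb)/R_i - sum_j (T_i - T_j)/R_ij. The minimal sketch below integrates that equation with forward Euler. This is a deliberately simple solver, not HotSpot's own, and every parameter value in the example is hypothetical. The time step must stay small relative to the smallest RC time constant, or forward Euler becomes unstable.

```python
def rc_step(temps, power, cap, r_amb, r_lat, t_amb, dt):
    """One forward-Euler step of a lumped thermal RC network.

    temps, power, cap, r_amb: per-block lists (C, W, J/K, K/W)
    r_lat: dict mapping (i, j) block-index pairs to lateral resistance (K/W)
    """
    n = len(temps)
    # Net heat into each block: its own power minus what leaks to ambient.
    heat = [power[i] - (temps[i] - t_amb) / r_amb[i] for i in range(n)]
    for (i, j), r in r_lat.items():
        flow = (temps[i] - temps[j]) / r  # heat flowing from block i into block j
        heat[i] -= flow
        heat[j] += flow
    return [temps[i] + dt * heat[i] / cap[i] for i in range(n)]


if __name__ == "__main__":
    # Two hypothetical blocks: a hot one (10 W) next to a cooler one (2 W).
    temps = [45.0, 45.0]
    for _ in range(20000):  # 20 s of simulated time at dt = 1 ms
        temps = rc_step(temps, [10.0, 2.0], [0.5, 0.5], [2.0, 2.0], {(0, 1): 1.0}, 45.0, 1e-3)
    print([round(t, 2) for t in temps])  # steady state ≈ [58.6, 55.4]
```

The lateral resistance pulls the two blocks toward each other. Without it, the steady-state temperatures would be 65C and 49C. This is the same effect that makes floorplan adjacency matter.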
http://chromium.sourceforge.net/
http://lava.cs.virginia.edu/HotSpot/
Floorplans
- In order to add thermal modeling to Qsilver, the simulator must first be instrumented with an architectural floorplan. From the left, these floorplans are:
  - Default: based on an nVIDIA marketing photo. We use this chip to drive an 800×600, console-like display in our simulations.
  - Separating Hot Units: based on the default floorplan. The two hottest units, framebuffer operations and the vertex engine, are separated.
  - High Resolution: also based on the default, but modified to drive a PC display at 1280×1024. The framebuffer, fragment engine, and texture cache are enlarged to maintain reasonable power densities under the higher workload.
  - Partitioned High Resolution: this novel floorplan maintains the functional-unit area of the high-resolution design, but partitions units into separate blocks per pipe and separates hot blocks from cooler ones.
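A floorplan for an architectural thermal model can be as simple as one rectangle per block. The text format below (name, width, height, left x, bottom y, in meters) is loosely modeled on HotSpot's floorplan files, but it is an assumption here, so check the HotSpot documentation for the exact syntax. The sketch computes the two quantities the floorplan variants above manipulate: power density, which enlarging a block lowers, and the distance between hot blocks, which separating them raises. All block sizes and powers in the example are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Block:
    """One floorplan rectangle; all dimensions in meters."""
    name: str
    width: float
    height: float
    left: float
    bottom: float

    @property
    def area(self) -> float:
        return self.width * self.height

    @property
    def center(self) -> tuple:
        return (self.left + self.width / 2, self.bottom + self.height / 2)


def parse_floorplan(text: str) -> dict:
    """Parse whitespace-separated 'name width height left bottom' lines; '#' starts a comment."""
    blocks = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *nums = line.split()
        blocks[name] = Block(name, *map(float, nums))
    return blocks


def power_density(block: Block, watts: float) -> float:
    """Power density in W/mm^2 (block area is stored in m^2; 1 m^2 = 1e6 mm^2)."""
    return watts / (block.area * 1e6)


def center_distance(a: Block, b: Block) -> float:
    """Euclidean distance between block centers, in meters."""
    (ax, ay), (bx, by) = a.center, b.center
    return math.hypot(ax - bx, ay - by)


if __name__ == "__main__":
    # Hypothetical two-block excerpts: the two hottest units adjacent, then moved apart.
    adjacent = parse_floorplan("""
        vertex_engine 0.002 0.003 0.000 0.000
        framebuffer   0.002 0.003 0.002 0.000
    """)
    separated = parse_floorplan("""
        vertex_engine 0.002 0.003 0.000 0.000
        framebuffer   0.002 0.003 0.006 0.000
    """)
    for label, fp in (("adjacent", adjacent), ("separated", separated)):
        d_mm = center_distance(fp["vertex_engine"], fp["framebuffer"]) * 1e3
        print(label, round(d_mm, 2), "mm apart")
    print(round(power_density(adjacent["vertex_engine"], 3.0), 3), "W/mm^2")
```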
From left to right, below: with no architectural thermal management, the default floorplan yields a very hot vertex engine; moving the hot units apart, combined with DVS, makes the chip cooler with a less pronounced spatial thermal gradient; fetch gating on the high-resolution system; and DVS on the redesigned high-resolution chip, where the effect of separating hotspots on the spatial gradient is more obvious. Combining static and dynamic techniques is a double win. Note that to better illustrate their full dynamic range, these thermal maps are not all on the same scale.
[Figure: the four floorplans, from left to right Default, Separating Hot Units, High Resolution, and Partitioned High Resolution. Labeled blocks: Vertex Engine, Rasterizer, Fragment Engine, Texture Cache, Framebuffer and Data Compression, Framebuffer control, Host Interface, 2D Video, and Unused.]