SIGGRAPH thermal poster

Temperature-Aware GPU Design
Jeremy W. Sheaffer, Kevin Skadron, David P. Luebke
{jws9c, skadron, luebke}@cs.virginia.edu
University of Virginia, Charlottesville, VA 22904
http://gfx.cs.virginia.edu
http://qsilver.cs.virginia.edu
http://lava.cs.virginia.edu
Simulator Setup and Output
  • For these results, our simulator is configured to
    model a system:
      • built on a 180 nm process at 1.8 V and 300 MHz,
      • using an aluminum cooling solution with no fan,
      • with a temperature sensor on each functional unit
        block. We assume that the vendor specifies a
        100 °C maximum safe operating temperature, and we
        enable dynamic thermal management (DTM) at 97 °C
        to account for sensor imprecision.
  • We have implemented the following DTM techniques
    in Qsilver:
      • Clock Gating: the clock is stopped until the chip
        drops below the threshold temperature.
      • Fetch Gating: a single pipeline stage is slowed
        down. We implement this in both the vertex fetch
        and rasterization stages.
      • Dynamic Voltage Scaling: DVS scales the core
        voltage, and with it the frequency, yielding a
        roughly cubic reduction in dynamic power.
      • Multiple Clock Domains: MCD also scales voltage
        and frequency, but at the granularity of
        individual functional units. Both DVS and MCD
        incur a synchronization-time penalty when they
        are enabled and disabled.
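The trigger logic and the four techniques above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not Qsilver's implementation. The function names, the single 97 °C trigger with no hysteresis, the fetch-gating `duty` parameter, and the MCD `hot_scale` factor are all hypothetical. Power follows the textbook CMOS relation P = C·V²·f, with frequency scaled in proportion to voltage; that proportional scaling is the source of the roughly cubic DVS saving.

```python
"""Sketch of threshold-triggered DTM policies (illustrative, not Qsilver's code).

Assumptions: one sensor reading per functional unit per control interval,
a single 97 C trigger with no hysteresis, and dynamic power P = C * V^2 * f
with frequency scaled in proportion to voltage.
"""

TRIGGER_C = 97.0  # DTM engages here: 3 C margin below the assumed 100 C limit


def any_unit_hot(temps_c: dict[str, float]) -> bool:
    """True if any functional-unit sensor is at or above the trigger."""
    return any(t >= TRIGGER_C for t in temps_c.values())


def clock_enabled(temps_c: dict[str, float]) -> bool:
    """Clock gating: the clock stays stopped until every unit is below the trigger."""
    return not any_unit_hot(temps_c)


def fetch_allowed(cycle: int, temps_c: dict[str, float], duty: int = 2) -> bool:
    """Fetch gating: when hot, the gated stage (e.g. vertex fetch) issues only
    on every `duty`-th cycle; `duty` is a hypothetical throttling parameter."""
    return (not any_unit_hot(temps_c)) or cycle % duty == 0


def dynamic_power(c_eff: float, volts: float, freq_hz: float) -> float:
    """Classic CMOS dynamic power, P = C_eff * V^2 * f (watts)."""
    return c_eff * volts ** 2 * freq_hz


def dvs_power_ratio(scale: float, volts: float = 1.8, freq_hz: float = 300e6) -> float:
    """DVS: scale V and f together by `scale`; power falls roughly as scale^3."""
    base = dynamic_power(1.0, volts, freq_hz)
    return dynamic_power(1.0, volts * scale, freq_hz * scale) / base


def mcd_scales(temps_c: dict[str, float], hot_scale: float = 0.8) -> dict[str, float]:
    """MCD: apply V/f scaling only to the units that are hot. The sync-time
    penalty paid on each transition is not modeled in this sketch."""
    return {u: (hot_scale if t >= TRIGGER_C else 1.0) for u, t in temps_c.items()}
```

For example, `dvs_power_ratio(0.8)` comes out at about 0.512. Cutting voltage and frequency by 20% therefore roughly halves dynamic power, at the cost of a 20% slower clock.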

GPU Simulation with Qsilver
  • To study thermal issues in a GPU, we have
    developed a simulator called Qsilver that
      • models GPU clock-cycle-by-cycle activity and
        power in the microarchitecture domain, and
      • uses the Chromium system to intercept a stream
        of OpenGL calls, annotating it with aggregate
        information about the vertices and fragments,
        textures, lighting, and other relevant rendering
        state.
  • Qsilver is useful for
      • analyzing performance bottlenecks,
      • estimating power, and
      • exploring new graphics architectural ideas.
  • We have used Qsilver to analyze a hypothetical
    fixed-function console-like GPU architecture.
    For these results, we augment Qsilver with an
    architectural thermal model called HotSpot that
    tracks temperature in each functional unit over
    time.
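Architectural power estimates of this kind are commonly built from activity counts: each unit's events over an interval are multiplied by an energy per event. The sketch below assumes that approach for illustration only. The poster does not describe Qsilver's actual power model, and the `UnitActivity` fields, unit names, and all numbers are hypothetical. Per-unit powers of this form are the inputs a per-unit thermal model such as HotSpot needs.

```python
"""Hypothetical activity-based power estimate for an architectural GPU
simulator. Unit names, per-event energies, and counts are made-up
placeholders, not Qsilver's model or measured data."""

from dataclasses import dataclass


@dataclass
class UnitActivity:
    name: str
    events: int                 # activity events counted over the interval
    energy_per_event_j: float   # assumed switching energy per event (joules)
    static_w: float = 0.0       # assumed constant leakage power (watts)


def average_power_w(units: list[UnitActivity], cycles: int, freq_hz: float) -> dict[str, float]:
    """Average power per unit over `cycles` clock cycles at `freq_hz`."""
    seconds = cycles / freq_hz
    return {u.name: u.events * u.energy_per_event_j / seconds + u.static_w for u in units}


if __name__ == "__main__":
    # One second of simulated time at the 300 MHz clock from the setup above.
    units = [
        UnitActivity("vertex_engine", events=100_000_000, energy_per_event_j=1e-9),
        UnitActivity("rasterizer", events=50_000_000, energy_per_event_j=2e-9, static_w=0.05),
    ]
    print(average_power_w(units, cycles=300_000_000, freq_hz=300e6))
```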

Thermal Simulation Results
Architecture-Level Thermal Modeling
  • Requirements: general, simple, and fast; must
    model heating at the granularity of architectural
    objects.
      • Must be able to dynamically calculate
        temperatures for each block in the architecture
      • Must be able to simulate billions of clock
        cycles in a few hours
      • Must be general enough to model a variety of
        processor architectures
      • Must let us reason about results at the
        architecture level
  • Solution: derive an equivalent circuit of lumped
    thermal resistances and capacitances. This
    circuit must be derived at the granularity of the
    processor architecture.
  • Key components:
      • Floorplanning
      • Lumped-RC circuit derivation
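The lumped-RC solution can be illustrated with a small forward-Euler integrator. This sketches the general technique and is not HotSpot's code or parameters. Each block has a heat capacity and a vertical resistance to ambient, adjacent blocks are joined by lateral resistances, and every numeric value is made up.

```python
"""Minimal lumped-RC thermal model sketch (the general idea behind HotSpot,
not HotSpot's implementation). For each block i:

    C_i * dT_i/dt = P_i - (T_i - T_amb) / Rv_i - sum_j (T_i - T_j) / R_ij

C_i: heat capacity (J/K); Rv_i: vertical resistance to ambient (K/W);
R_ij: lateral resistance between adjacent blocks (K/W). All values used
below are illustrative assumptions.
"""


def step(temps, power, cap, r_vert, r_lat, t_amb, dt):
    """Advance all block temperatures by one forward-Euler step of dt seconds.
    r_lat maps an index pair (i, j) to the lateral resistance between them."""
    heat = [power[i] - (temps[i] - t_amb) / r_vert[i] for i in range(len(temps))]
    for (i, j), r in r_lat.items():
        flow = (temps[i] - temps[j]) / r  # heat flowing from block i to block j (W)
        heat[i] -= flow
        heat[j] += flow
    return [temps[i] + dt * heat[i] / cap[i] for i in range(len(temps))]


def simulate(temps, power, cap, r_vert, r_lat, t_amb, dt, steps):
    """Run `steps` Euler steps; dt must be well below the smallest R*C."""
    for _ in range(steps):
        temps = step(temps, power, cap, r_vert, r_lat, t_amb, dt)
    return temps


if __name__ == "__main__":
    # Two hypothetical blocks: block 0 dissipates 10 W, block 1 is idle.
    temps = simulate([45.0, 45.0], power=[10.0, 0.0], cap=[0.5, 0.5],
                     r_vert=[2.0, 2.0], r_lat={(0, 1): 1.0},
                     t_amb=45.0, dt=0.01, steps=3000)
    print(temps)  # approaches [57.0, 53.0], the steady state of this circuit
```

With a single block, the temperature settles at T_amb + P·Rv (45 + 10 × 2 = 65 °C for placeholder values). Forward Euler keeps the sketch short. It is only stable when dt is well below the smallest R·C time constant, so a production simulator would likely use a more robust solver.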

http://chromium.sourceforge.net/
http://lava.cs.virginia.edu/HotSpot/
Floorplans
  • To add thermal modeling to Qsilver, the simulator
    must first be instrumented with an architectural
    floorplan. From the left, these floorplans are:
      • Default: based on an nVIDIA marketing photo. We
        use this chip to drive an 800×600, console-like
        display in our simulations.
      • Separating Hot Units: based on the default
        floorplan. The two hottest units, framebuffer
        operations and the vertex engine, are separated.
      • High Resolution: also based on the default, but
        modified to drive a PC display at 1280×1024. The
        framebuffer, fragment engine, and texture cache
        are enlarged to maintain reasonable power
        densities under the higher workload.
      • Partitioned High Resolution: this novel floorplan
        maintains the functional unit area of the high
        resolution design, but partitions units into
        separate blocks per pipe and separates hot
        blocks from cooler ones.
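Floorplan instrumentation can be sketched as a list of rectangles, one per functional unit. The `Block` type, dimensions, powers, and unit names below are hypothetical and are not the poster's floorplans. The helpers do three things. They compute shared edges, which a lumped-RC model would turn into lateral thermal resistances. They check whether hot units still touch. They compute power densities. The area helper reflects the High Resolution idea of enlarging units so that density stays reasonable as the workload grows.

```python
"""Hypothetical floorplan representation: each functional unit is an
axis-aligned rectangle (dimensions in mm). Names, sizes, and powers are
illustrative only, not the poster's actual floorplans."""

from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Block:
    name: str
    w: float  # width
    h: float  # height
    x: float  # left edge
    y: float  # bottom edge

    @property
    def area(self) -> float:
        return self.w * self.h


def shared_edge(a: Block, b: Block, eps: float = 1e-9) -> float:
    """Length of boundary shared by two non-overlapping blocks (0 if not adjacent).
    A lumped-RC model would give adjacent blocks a lateral thermal resistance."""
    if abs(a.x + a.w - b.x) < eps or abs(b.x + b.w - a.x) < eps:  # side by side
        return max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    if abs(a.y + a.h - b.y) < eps or abs(b.y + b.h - a.y) < eps:  # stacked
        return max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    return 0.0


def power_density(power_w: float, block: Block) -> float:
    """Power density in W/mm^2."""
    return power_w / block.area


def area_for_density(power_w: float, target_w_per_mm2: float) -> float:
    """Area (mm^2) a unit needs to hold a target power density."""
    return power_w / target_w_per_mm2


def adjacent_hot_pairs(blocks: list[Block], hot: set[str]) -> list[tuple[str, str]]:
    """Pairs of hot units that touch; separating hot units aims to empty this list."""
    hot_blocks = [b for b in blocks if b.name in hot]
    return [(a.name, b.name) for a, b in combinations(hot_blocks, 2)
            if shared_edge(a, b) > 0.0]


if __name__ == "__main__":
    vertex = Block("vertex_engine", 2.0, 2.0, 0.0, 0.0)
    fb_ops = Block("framebuffer_ops", 2.0, 2.0, 2.0, 0.0)
    texture = Block("texture_cache", 2.0, 2.0, 0.0, 2.0)
    print(adjacent_hot_pairs([vertex, fb_ops, texture],
                             {"vertex_engine", "framebuffer_ops"}))
    print(1280 * 1024 / (800 * 600))  # pixel-count ratio, high-res vs default
```

Moving from 800×600 to 1280×1024 raises the pixel count by about 2.73×. Suppose fragment-side power scaled roughly with pixel count; that is an assumption, not a result from the poster. In that case `area_for_density` would call for a similar increase in area to hold power density constant.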

From left to right, below: with no architectural
thermal management, the default floorplan yields a
very hot vertex engine; moving the hot units apart,
combined with DVS, makes the chip cooler with a less
pronounced spatial thermal gradient; fetch gating
on the high-resolution system; and DVS on the
redesigned high-resolution chip, where the effect
of separating hotspots on the spatial gradient is
more obvious. Combining static and dynamic
techniques is a double win. Note that, to better
illustrate their full dynamic range, these thermal
maps are not all on the same scale.
[Figure: the four floorplans, left to right: Default, Separating Hot Units, High Resolution, Partitioned High Resolution. Labeled blocks include Vertex Engine, Rasterizer, Fragment Engine, Texture Cache, Framebuffer and Data Compression, Framebuffer control, Host Interface, 2D Video, and Unused area.]