1
Adaptive Cache Compression for High-Performance Processors
Alaa R. Alameldeen and David A. Wood
Computer Sciences Department, University of Wisconsin-Madison
2
Outline
  • Introduction
  • Motivation
  • Adaptive Cache Compression
  • Evaluation Methodology
  • Reported Performance
  • Review Conclusion
  • Critiques/Suggestions

3
Introduction
  • The increasing performance gap between processors
    and memory calls for faster memory access.
  • Cache memories reduce average memory latency.
  • Cache compression improves the performance of
    cache memories.
  • Adaptive cache compression: the theme of this
    discussion.

4
Motivation
  • Cache compression can improve the effectiveness of
    cache memories (increase effective cache capacity).
  • Increasing effective cache capacity reduces the
    miss rate.
  • Performance will improve!

5
Adaptive Cache Compression: An Overview
  • Dynamically optimize cache performance.
  • Use the past to predict the future: how likely is
    compression to help, hurt, or make no difference
    to the next reference?
  • Feedback from previous compression decisions helps
    decide whether to compress the next write to the cache.

6
Adaptive Cache Compression: Implementation
  • Two-level cache hierarchy.
  • L1 caches (data, instruction) are uncompressed.
  • L2 cache is unified and optionally compressed.
  • Decompression/compression is used or skipped
    as necessary.

Pros: L1 cache performance is not affected. Cons:
compression/decompression introduces latency (see the
sketch below).
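To make the read path concrete, here is a minimal sketch in Python (all names are hypothetical; the identity decompress() stands in for the hardware decompressor, and the 5-cycle figure is the example decompression penalty used later in the talk):

from dataclasses import dataclass

@dataclass
class L2Line:
    data: bytes
    compressed: bool  # L2 lines are optionally compressed; L1 lines never are

def decompress(blob: bytes) -> bytes:
    # Stand-in for the hardware decompressor.
    return blob

def fill_l1_from_l2(line: L2Line, decompression_cycles: int = 5):
    # On an L1 miss that hits in the L2, decompression latency is paid
    # only if the line is stored compressed; uncompressed lines are
    # forwarded directly.
    if line.compressed:
        return decompress(line.data), decompression_cycles
    return line.data, 0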
7
Adaptive Cache Compression: L2 Cache Detail
  • 8-way set associative.
  • A compression information tag is stored with each
    address tag.
  • 32 segments (8 bytes each) in each set.
  • An uncompressed line occupies 8 segments
    (at most 4 uncompressed lines in each set).
  • Compressed lines are 1 to 7 segments in length.
  • Maximum number of lines in each set: 8.
  • Least recently used (LRU) lines are evicted.
  • Compaction may be used to make room for a new
    line (see the sketch below).
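The set's segment accounting can be sketched as follows; the 32-segment budget, 8-byte segments, 8-line limit, and LRU eviction come from the slide, while the helper names are hypothetical:

import math

SEGMENT_BYTES = 8          # each set holds 32 segments of 8 bytes
SEGMENTS_PER_SET = 32
MAX_LINES_PER_SET = 8      # tag-limited maximum of 8 lines per set
UNCOMPRESSED_SEGMENTS = 8  # an uncompressed 64-byte line = 8 segments

def segments_needed(compressed_bytes: int) -> int:
    # A line occupies 1 to 7 segments if compressed, 8 if not.
    return min(UNCOMPRESSED_SEGMENTS,
               max(1, math.ceil(compressed_bytes / SEGMENT_BYTES)))

def make_room(resident_sizes: list[int], new_segments: int) -> list[int]:
    # Evict LRU lines until the new line fits the segment budget.
    # resident_sizes: segment counts of resident lines, MRU first.
    # Compaction then closes the gaps left by evicted lines.
    lines = list(resident_sizes)
    while len(lines) >= MAX_LINES_PER_SET or \
          sum(lines) + new_segments > SEGMENTS_PER_SET:
        lines.pop()  # drop the least recently used line
    return lines

For example, make_room([8, 8, 8, 8], 8) must evict one uncompressed line, while make_room([4, 4, 4, 4], 8) fits the new line with room to spare.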

8
Adaptive Cache Compression: To Compress or Not to Compress?
  • While compression eliminates some L2 misses, it
    increases the latency of L2 hits (which are more
    frequent).
  • However, the penalty for L2 misses is usually large,
    and the extra latency due to decompression is usually
    small.
  • Compression helps if:

(avoided L2 misses) x (L2 miss penalty) >
(penalized L2 hits) x (decompression penalty)

Example: For a 5-cycle decompression penalty and a
400-cycle L2 miss penalty, compression wins if it
eliminates at least one L2 miss for every
400/5 = 80 penalized L2 hits.
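A worked version of this break-even test (an illustrative function, using the slide's example values as defaults):

def compression_pays_off(avoided_misses: int, penalized_hits: int,
                         miss_penalty: int = 400,
                         decompression_penalty: int = 5) -> bool:
    # Compression wins when the cycles saved on avoided L2 misses
    # outweigh the extra cycles added to penalized L2 hits.
    return (avoided_misses * miss_penalty >
            penalized_hits * decompression_penalty)

# Break-even at 400/5 = 80 penalized hits per avoided miss:
assert compression_pays_off(avoided_misses=1, penalized_hits=79)
assert not compression_pays_off(avoided_misses=1, penalized_hits=80)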
9
Adaptive Cache Compression: Classification of Cache References
  • Classification of hits:
    • Unpenalized hit (e.g. reference to address A)
    • Penalized hit (e.g. reference to address C)
    • Avoided miss (e.g. reference to address E)
  • Classification of misses:
    • Avoidable miss (e.g. reference to address G)
    • Unavoidable miss (e.g. reference to address H)
(Figure: LRU stack example illustrating the classifications; lines beyond the set's capacity are evicted.)
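Under this LRU-stack framing, the five classes reduce to a short decision procedure. The sketch below is a simplification with hypothetical parameter names; the real hardware derives the same information from the compression tags:

def classify(hit: bool, depth: int, compressed_sizes: list[int],
             stored_compressed: bool) -> str:
    # depth: 1-based LRU stack depth of the referenced address.
    # compressed_sizes: compressed segment counts (1..8) of the lines
    # at stack depths 1..depth.
    SEGMENTS_PER_SET = 32  # 32 x 8-byte segments per set
    UNCOMPRESSED = 8       # segments for an uncompressed line
    if hit:
        if depth * UNCOMPRESSED <= SEGMENTS_PER_SET:
            # Would be resident even with no compression (depth <= 4).
            return "penalized hit" if stored_compressed else "unpenalized hit"
        return "avoided miss"    # resident only because of compression
    if depth <= 8 and sum(compressed_sizes) <= SEGMENTS_PER_SET:
        return "avoidable miss"  # compressing everything would have kept it
    return "unavoidable miss"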
10
Adaptive Cache Compression: Hardware Used in Decision-Making
  • Global Compression Predictor (GCP): estimates the
    recent cost or benefit of compression.
  • On a penalized hit, the controller biases against
    compression by decrementing the counter
    (subtracted value = decompression penalty).
  • On an avoided or avoidable miss, the controller
    increments the counter by the L2 miss penalty.
  • The controller uses the GCP when allocating a
    line in the L2 cache:
    • Positive value -> compression has helped, so compress.
    • Negative value -> compression has been penalizing,
      so don't compress.
  • The size of the GCP determines sensitivity to changes.
    In this paper, a 19-bit counter is used (saturates at
    262143 or -262144); see the sketch below.
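A sketch of the GCP as a saturating counter; the update amounts and the 19-bit range follow the slide, while treating a counter of exactly zero as "compress" is an assumption:

class GlobalCompressionPredictor:
    # 19-bit saturating counter weighing recent compression cost/benefit.
    LIMIT = 2 ** 18  # saturates at +262143 / -262144

    def __init__(self) -> None:
        self.counter = 0

    def on_penalized_hit(self, decompression_penalty: int = 5) -> None:
        # Compression cost us a decompression on this hit.
        self.counter = max(-self.LIMIT, self.counter - decompression_penalty)

    def on_avoided_or_avoidable_miss(self, miss_penalty: int = 400) -> None:
        # Compression saved, or could have saved, a full L2 miss.
        self.counter = min(self.LIMIT - 1, self.counter + miss_penalty)

    def should_compress(self) -> bool:
        # Consulted when allocating a line in the L2 cache.
        return self.counter >= 0  # zero treated as "compress" (assumption)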

11
Adaptive Cache Compression: Sensitivity
  • Effectiveness depends on workload size, cache
    sizes, and latencies.
  • Sensitive to L2 cache size (effective for small
    L2 caches).
  • Sensitive to L1 cache size (observe trade-offs).
  • Adapting to benchmark phases:
    - Changes in phase behaviour may hurt the
      adaptive policy.
    - It takes time to adapt.

12
Evaluation Methodology
  • Host system: dynamically-scheduled SPARC V9
    uniprocessor
  • Target system: superscalar processor with
    out-of-order execution
  • Simulation parameters (table not reproduced)

13
Evaluation Methodology (continued)
  • Simulator: Simics full-system simulator, extended
    with a detailed processor simulator (TFSim) and a
    detailed memory system timing simulator.
  • Workloads:
    • Multi-threaded commercial workloads from the
      Wisconsin Commercial Workload Suite.
    • Eight of the SPEC CPU2000 benchmarks:
      • Integer benchmarks (bzip, gcc, mcf, twolf)
      • Floating-point benchmarks (ammp, applu, equake, swim)
  • Workloads were selected to cover a wide range of
    compressibility properties, miss rates, and working
    set sizes.

14
Evaluation Methodology (continued)
  • To understand the utility of adaptive compression,
    it was compared with two extreme policies: Never
    compress and Always compress.
  • Never strives to reduce hit latency.
  • Always strives to reduce miss rate.
  • Adaptive strives to optimize the trade-off.

15
Reported Performance (Average Cache Capacity)
Figure: Average cache capacity during benchmark
runs (4MB uncompressed)
16
Reported Performance (Cache Miss Rate)
Figure: L2 cache miss rate (normalized to the Never
miss rate)
17
Reported Performance (Runtime)
Figure: Runtime for the three compression
alternatives (normalized to Never)
18
Reported Performance (Sensitivity of Adaptive
Compression to Benchmark Phase Changes)
Figure: Top: temporal changes in Global Compression
Predictor values. Bottom: effective cache size.
19
Review Conclusion
  • Compressing all compressible cache lines only
    improves memory-intensive applications. Applications
    with a low miss rate or low compressibility suffer.
  • Optimizations achieved by the adaptive scheme:
    • Up to 26% speedup (over the uncompressed scheme) for
      memory-intensive, highly compressible benchmarks.
    • Performance degradation for other benchmarks < 0.4%.

20
Critiques/Suggestions
  • Data inconsistency: a 17% improvement in performance
    for memory-intensive commercial workloads is claimed
    on page 2, but 26% is claimed on page 11.
  • Miscalculation on page 4: the sum of the compressed
    sizes at stack depths 1 through 7 totals 29, yet the
    paper states that this miss cannot be avoided because
    the sum of compressed sizes exceeds the total number
    of segments (i.e. 35 > 32).
  • All in all, the proposed technique doesn't seem
    to enhance performance significantly with respect
    to Always.

21
Thank you!