Title: On the Importance of Optimizing the Configuration of Stream Prefetchers
1. On the Importance of Optimizing the Configuration of Stream Prefetchers
- Ilya Ganusov
- Martin Burtscher
Computer Systems Laboratory, Cornell University
2. Introduction
- Memory wall
  - Increasing gap between processor and memory speeds
  - Concentration on bandwidth at the expense of latency
- Prefetch important data
  - Do not wait until the processor requests data
  - Proactively fetch the data that is likely to be consumed in the near future
3. Stream Prefetching
- Prefetching with outcome-based prediction
  - Use the history of previous misses to guess data addresses that are likely to miss soon
- Stream prefetching
  - A special case of outcome-based prediction
  - Proposed 15 years ago
  - The only hardware prefetching scheme used in modern microprocessors
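The outcome-based idea above can be sketched in a few lines: scan the recent miss-address history for a repeating stride. This is an illustrative toy, not the talk's implementation; the function name and the `min_matches` threshold are assumptions.

```python
# Minimal sketch of outcome-based stream detection (illustrative only;
# names and thresholds are assumptions, not taken from the talk).

def detect_stream(miss_history, min_matches=2):
    """Return the stride if the most recent misses form a constant-stride
    stream, else None."""
    if len(miss_history) < min_matches + 1:
        return None
    # Deltas between consecutive miss addresses
    strides = [b - a for a, b in zip(miss_history, miss_history[1:])]
    recent = strides[-min_matches:]
    if recent[0] != 0 and all(s == recent[0] for s in recent):
        return recent[0]
    return None

# Misses at 0x100, 0x140, 0x180 suggest a 0x40-byte stride:
print(detect_stream([0x100, 0x140, 0x180]))  # -> 64
```

Once a stride is detected, addresses likely to miss soon are simply the last miss address plus successive multiples of that stride.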
4. Contributions
- Detailed sensitivity analysis of the main prefetcher parameters on SPECcpu2000 programs
  - No such study exists in the literature
  - Many research papers fail to specify prefetcher parameters in comparative studies
- Case study
  - Evaluate the performance of runahead execution on baselines with different stream prefetcher parameters
5. Outline
- Introduction
- Stream Prefetcher Operation
- Evaluation Methodology
- Experimental Results
- Conclusion
6. How Stream Prefetchers Work
[Diagram: cache miss addresses enter a global miss history buffer; a stream table of (valid, stream address, stride) entries is consulted ("Stream exists?"); on a match, the AGU combines the stream address, stride, and lookahead to form the prefetch address.]
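The datapath on this slide can be modeled with a short sketch: a global miss history, a stream table of (address, stride) entries, and an AGU that forms prefetch addr = stream addr + stride × lookahead. All class and field names here are assumptions for illustration, not the simulator's.

```python
# Hypothetical model of the slide's datapath (illustrative only).
from collections import deque

class StreamPrefetcher:
    def __init__(self, history_len=16, num_streams=8, lookahead=4):
        self.history = deque(maxlen=history_len)  # global miss history
        self.table = []                           # stream table: {addr, stride}
        self.num_streams = num_streams
        self.lookahead = lookahead                # prefetch distance

    def on_miss(self, addr):
        # "Stream exists?" -- does this miss continue a tracked stream?
        for s in self.table:
            if addr == s["addr"] + s["stride"]:
                s["addr"] = addr
                # AGU: prefetch `lookahead` strides ahead of the stream
                return s["addr"] + s["stride"] * self.lookahead
        # Otherwise train: two equal consecutive strides allocate a stream
        for prev in self.history:
            stride = addr - prev
            if stride != 0 and (prev - stride) in self.history:
                if len(self.table) >= self.num_streams:
                    self.table.pop(0)             # evict the oldest stream
                self.table.append({"addr": addr, "stride": stride})
                break
        self.history.append(addr)
        return None                               # no prefetch issued

pf = StreamPrefetcher(lookahead=4)
for a in (0x1000, 0x1040, 0x1080, 0x10C0):
    hint = pf.on_miss(a)
print(hex(hint))  # -> 0x11c0
```

The fourth miss confirms the 0x40-byte stream, so the AGU issues a prefetch four strides ahead of 0x10C0.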
7. Measured Parameters
[Diagram: the same datapath as the previous slide, annotated with the three studied parameters: the miss history length, the number of supported streams (stream table entries), and the prefetch distance (AGU lookahead).]
8. Evaluation Methodology
- Benchmarks
  - 22 SPECcpu2000 programs, highly optimized
  - All F77, C, and C++ programs
  - Multiple reference inputs per program
  - SimPoint interval of 500 million instructions
- Simulated architecture
  - SimpleScalar v4.0 cycle-accurate simulator
  - Aggressive superscalar, Alpha 21264-like core
9. Simulated System

Execution Core
  Fetch/issue/commit      4/4/4
  I-window/ROB/LSQ        64/128/64
  LdSt/Int/FP units       2/4/2
  Execution latencies     similar to Alpha 21264
  Branch predictor        16K-entry bimodal/gshare hybrid

Memory Subsystem
  Cache sizes             64KB IL1, 64KB DL1, 1MB L2
  Cache associativity     2-way L1, 4-way L2
  Cache latencies         2 cyc L1, 20 cyc L2
  Main memory latency     400 cycles
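The 400-cycle main memory latency hints at why the prefetch distance matters so much in the results that follow. As a back-of-the-envelope check (an assumption of steady-state streaming, not a formula from the talk): a prefetch issued D blocks ahead hides the latency only if D × cycles-per-block ≥ memory latency.

```python
import math

# Back-of-the-envelope timeliness check (assumption: a steady-state loop
# consuming one cache block every `cycles_per_block` cycles).
def min_prefetch_distance(mem_latency_cycles, cycles_per_block):
    return math.ceil(mem_latency_cycles / cycles_per_block)

# With the simulated 400-cycle main memory and a loop touching one new
# block every 50 cycles, the stream must run at least 8 blocks ahead:
print(min_prefetch_distance(400, 50))  # -> 8
```

A distance that is too small leaves latency exposed; one that is too large risks polluting the cache with blocks fetched long before use.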
10. Outline
- Introduction
- Motivation
- Implementation
- Experimental Results
- Conclusion
11. Miss History Length
- 7 programs are very sensitive
- A 16-entry history is enough
12. Number of Stream Table Entries
- Only 3 programs are sensitive
- More than 8 streams provide little benefit
13. L2 Cache Prefetch Distance
- 11 programs are very sensitive
- FP speedup varies from 80% to 140%
14. Case Study: Runahead Execution
- The performance of stream prefetching is highly dependent on the parameter choice
- Another proposal: runahead execution
  - Pseudo-retire long-latency loads that stall the pipeline and continue executing
  - Roll back to the checkpoint after the load comes back from memory
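The two bullets above can be sketched on a toy load trace: when a load misses with a long latency, checkpoint, pseudo-retire it, speculatively walk ahead issuing prefetches for later loads whose addresses do not depend on the missing value, then roll back and replay. The trace representation and `window` size below are invented for illustration.

```python
# Toy sketch of a runahead episode (illustrative; not the simulator).
def runahead_prefetches(loads, misses, window=8):
    """loads: list of (addr, depends_on_miss) pairs; misses: set of
    addresses that incur a long-latency miss. Returns the prefetch
    addresses one runahead episode would generate."""
    prefetches = []
    for i, (addr, _) in enumerate(loads):
        if addr in misses:
            # Checkpoint here; the miss is pseudo-retired and the core
            # keeps executing the next `window` instructions.
            for future_addr, depends in loads[i + 1 : i + 1 + window]:
                if not depends:          # address computable without the miss
                    prefetches.append(future_addr)
            break                        # roll back to the checkpoint, replay
    return prefetches

trace = [(0x100, False), (0x200, True), (0x300, False)]
print(runahead_prefetches(trace, misses={0x100}))  # -> [768]
```

The load at 0x200 depends on the missing data, so only 0x300 can be prefetched; its warmed-up cache line is what makes the replay after rollback faster.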
15. Speedup over Stream Prefetching
- SPEC fp speedup drops by more than 2x
16. Conclusion
- Key observations
  - The performance of the stream prefetcher is highly dependent on its configuration
  - Varying the prefetch distance alone almost doubles the average performance benefit
  - Choosing a non-optimal stream prefetcher as a baseline can distort results by a factor of two
- Conclusion
  - Parameter optimizations are imperative when comparing stream prefetchers to other prefetching techniques
17. On the Importance of Optimizing the Configuration of Stream Prefetchers
- Ilya Ganusov
- Martin Burtscher
Computer Systems Laboratory, Cornell University