Instruction Based Memory Distance Analysis and its Application to Optimization - PowerPoint PPT Presentation


Date added: 26 March 2020
Slides: 36
Provided by: StevenM176


Transcript and Presenter's Notes


Instruction Based Memory Distance Analysis and
its Application to Optimization
  • Changpeng Fang
  • Steve Carr
  • Soner Önder
  • Zhenlin Wang

Motivation
  • Widening gap between processor and memory speed
  • memory wall
  • Static compiler analysis has limited capability
  • regular array references only
  • index arrays
  • integer code
  • Reuse distance prediction across program inputs
  • number of distinct memory locations accessed
    between two references to the same memory location
  • applicable to more than just regular scientific codes
  • locality as a function of data size
  • predictable on whole program and per instruction
    basis for scientific codes
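To make the reuse-distance definition concrete, here is a minimal Python sketch that computes the backward reuse distance of every access in a small address trace. The function name `reuse_distances` is hypothetical, and the quadratic scan is chosen only for clarity; a practical profiler would use a balanced tree or a similar structure to count distinct locations.

```python
def reuse_distances(trace):
    """Backward reuse distance of each access: the number of distinct
    locations touched since the previous access to the same location.
    A first-time access has no reuse and yields None."""
    last_seen = {}  # location -> index of its most recent access
    result = []
    for i, loc in enumerate(trace):
        if loc in last_seen:
            # count distinct locations strictly between the two accesses
            result.append(len(set(trace[last_seen[loc] + 1 : i])))
        else:
            result.append(None)
        last_seen[loc] = i
    return result


print(reuse_distances(["a", "b", "c", "a", "b", "b"]))
# [None, None, None, 2, 2, 0]
```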

  • Memory distance
  • A dynamic, quantifiable distance, in terms of
    memory references, between two accesses to the same
    memory location.
  • reuse distance
  • access distance
  • value distance
  • Is memory distance predictable across both
    integer and floating-point codes?
  • predict miss rates
  • predict critical instructions
  • identify instructions for load speculation

Related Work
  • Reuse distance
  • Mattson, et al. 70
  • Sugumar and Abraham 94
  • Beyls and D'Hollander 02
  • Ding and Zhong 03
  • Zhong, Dropsho and Ding 03
  • Shen, Zhong and Ding 04
  • Fang, Carr, Önder and Wang 04
  • Marin and Mellor-Crummey 04
  • Load speculation
  • Moshovos and Sohi 98
  • Chrysos and Emer 98
  • Önder and Gupta 02

  • Memory distance
  • can use any granularity (e.g., cache line or address)
  • either forward or backward
  • represented as a pattern
  • Represent memory distance as a pattern
  • divide consecutive ranges into intervals
  • we use powers of 2 up to 1K and then 1K intervals
  • Data size
  • the largest reuse distance for an input set
  • characterize reuse distance as a function of the
    data size
  • Given two sets of patterns for two runs, can we
    predict a third set of patterns given its data size?
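A minimal sketch of the interval scheme described above (powers of 2 up to 1K, then 1K-wide intervals). It assumes integer distances and a boundary convention the slide does not give: interval 0 holds distance 0, and interval b holds distances in [2^(b-1), 2^b) below 1K. `interval_index` is a hypothetical name.

```python
KB = 1024


def interval_index(d: int) -> int:
    """Map a non-negative memory distance to its interval number:
    power-of-two intervals below 1K, then intervals 1K wide."""
    if d < KB:
        # interval 0 holds d == 0; interval b >= 1 holds 2**(b-1) <= d < 2**b
        return d.bit_length()
    # interval 11 is [1K, 2K), interval 12 is [2K, 3K), and so on
    return 11 + (d - KB) // KB


print([interval_index(d) for d in (0, 1, 3, 700, 1024, 3000)])
# [0, 1, 2, 10, 11, 12]
```

Doubling widths keep short distances finely resolved, while fixed 1K widths stop large distances from collapsing into a few very wide bins.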

  • Let $d_i^1$ be the distance of the ith bin in the
    first pattern and $d_i^2$ be that of the second
    pattern. Given the data sizes $s_1$ and $s_2$, we can
    fit the memory distances using
    $d_i^k = c_i + e_i \cdot f_i(s_k)$, for $k = 1, 2$
  • Given $c_i$, $e_i$, and $f_i$, we can predict the memory
    distance of another input set from its data size $s$:
    $d_i = c_i + e_i \cdot f_i(s)$
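Assuming the two-point model $d_i^k = c_i + e_i f_i(s_k)$ ($k = 1, 2$) implied by the bullets above, where $d_i^1$ and $d_i^2$ are the ith-bin distances of the two training runs, the two equations can be solved directly for the coefficients, provided $f_i(s_1) \neq f_i(s_2)$:

```latex
\begin{aligned}
d_i^1 &= c_i + e_i\, f_i(s_1), \qquad d_i^2 = c_i + e_i\, f_i(s_2) \\
e_i &= \frac{d_i^2 - d_i^1}{f_i(s_2) - f_i(s_1)}, \qquad
  c_i = d_i^1 - e_i\, f_i(s_1) \\
d_i(s) &= c_i + e_i\, f_i(s)
\end{aligned}
```

For a constant pattern ($d_i^1 = d_i^2$) the fit reduces to $d_i(s) = d_i^1$. How $f_i$ (for example, linear or square root) is chosen for each bin is not stated on this slide.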

Instruction Based Memory Distance Analysis
  • How can we represent the memory distance of an instruction?
  • For each active interval, we record 4 words:
  • min, max, mean, frequency
  • Some locality patterns cross interval boundaries
  • merge adjacent intervals, i and i+1, if
  • merging process stops when a minimum frequency is
  • needed to make reuse distance predictable
  • The set of merged intervals make up memory
    distance patterns
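The per-interval record (min, max, mean, frequency) and the act of merging two adjacent records can be sketched as follows. The merge condition and stopping rule are not fully legible in this transcript, so the sketch takes the decision as a caller-supplied predicate `should_merge`; the contiguity test in the example is a hypothetical stand-in, not the authors' criterion. All names here are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    lo: int      # min: smallest distance recorded in the interval
    hi: int      # max: largest distance recorded
    mean: float  # mean distance
    freq: int    # frequency: number of accesses in the interval


def merge(a: Interval, b: Interval) -> Interval:
    """Combine two adjacent intervals; the mean is frequency-weighted."""
    n = a.freq + b.freq
    return Interval(min(a.lo, b.lo), max(a.hi, b.hi),
                    (a.mean * a.freq + b.mean * b.freq) / n, n)


def build_intervals(distances, bin_of):
    """One record per active (non-empty) bin, in increasing bin order."""
    bins = {}
    for d in distances:
        bins.setdefault(bin_of(d), []).append(d)
    return [Interval(min(ds), max(ds), sum(ds) / len(ds), len(ds))
            for _, ds in sorted(bins.items())]


def merge_adjacent(intervals, should_merge):
    """Fold each interval into its left neighbour while should_merge(left,
    right) holds; should_merge stands in for the slide's criterion."""
    out = []
    for iv in intervals:
        if out and should_merge(out[-1], iv):
            out[-1] = merge(out[-1], iv)
        else:
            out.append(iv)
    return out


# Hypothetical criterion: merge when the two ranges are contiguous.
ivs = build_intervals([5, 6, 7, 8, 9], lambda d: d.bit_length())
print(merge_adjacent(ivs, lambda a, b: b.lo - a.hi <= 1))
# [Interval(lo=5, hi=9, mean=7.0, freq=5)]
```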

Merging Example
What do we do with patterns?
  • Verify that we can predict patterns given two
    training runs
  • coverage
  • accuracy
  • Predict miss rates for instructions
  • Predict loads that may be speculated

Prediction Coverage
  • Prediction coverage indicates the percentage of
    instructions whose memory distance can be predicted
  • the instruction appears in both training runs
  • its access pattern appears in both runs and its memory
    distance does not decrease as the data size increases
    (spatial locality)
  • same number of intervals in both runs
  • Called a regular pattern
  • For each instruction, we predict its ith pattern
  • curve fitting the ith pattern of both training runs
  • applying the fitting function to construct a new
    min, max and mean for the third run
  • Simple, fast prediction
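The per-pattern prediction step can be sketched by fitting each of min, max and mean independently through the two training points and evaluating the fit at the new data size. This assumes the model d = c + e*f(s) with a caller-chosen fitting function f (for example, linear or square root); `fit` and `predict_pattern` are hypothetical names.

```python
import math


def fit(d1, d2, s1, s2, f):
    """Solve d = c + e * f(s) through (s1, d1) and (s2, d2).
    Requires f(s1) != f(s2), i.e. distinct training data sizes."""
    e = (d2 - d1) / (f(s2) - f(s1))
    c = d1 - e * f(s1)
    return c, e


def predict_pattern(p1, p2, s1, s2, s3, f):
    """p1, p2: (min, max, mean) of the ith pattern in the two training runs,
    whose data sizes are s1 and s2. Returns the predicted (min, max, mean)
    for a run with data size s3 under the fitting function f."""
    predicted = []
    for d1, d2 in zip(p1, p2):
        if d1 == d2:
            predicted.append(d1)  # constant: unaffected by data size
        else:
            c, e = fit(d1, d2, s1, s2, f)
            predicted.append(c + e * f(s3))
    return tuple(predicted)


# A square-root pattern seen at data sizes 100 and 400, predicted for 1600.
print(predict_pattern((10, 20, 15), (20, 40, 30), 100, 400, 1600, math.sqrt))
# (40.0, 80.0, 60.0)
```

Fitting three numbers per pattern keeps the prediction cheap, which matches the "simple, fast" goal on the slide.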

Prediction Accuracy
  • An instruction's memory distance is correctly
    predicted if all of its patterns are predicted correctly
  • predicted and observed patterns fall in the same interval
  • or, for the two patterns A and B, B.min ≤
    A.max ≤ B.max
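A sketch of the accuracy test, with each pattern reduced to its (min, max) range. Checking the overlap condition with A and B in either role, requiring equal pattern counts, and passing the same-interval test in as a predicate are our assumptions; `Pattern`, `overlaps` and `instruction_correct` are hypothetical names.

```python
from typing import NamedTuple


class Pattern(NamedTuple):
    min: float  # smallest distance in the pattern
    max: float  # largest distance in the pattern


def overlaps(a: Pattern, b: Pattern) -> bool:
    """Slide condition B.min <= A.max <= B.max, also checked with the
    roles of A and B swapped (our assumption)."""
    return b.min <= a.max <= b.max or a.min <= b.max <= a.max


def instruction_correct(predicted, observed, same_interval) -> bool:
    """Correct only if there are as many predicted as observed patterns
    and every pair either falls in the same interval or overlaps."""
    if len(predicted) != len(observed):
        return False
    return all(same_interval(p, o) or overlaps(p, o)
               for p, o in zip(predicted, observed))
```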

Experimental Methodology
  • Use 11 CFP2000 and 11 CINT2000 benchmarks
  • others don't compile correctly
  • Use ATOM to collect reuse distance statistics
  • Use test and train data sets for training runs
  • Evaluation based on dynamic weighting
  • Report reuse distance prediction accuracy
  • value and access distance results are very similar

Reuse Distance Prediction
Suite     Constant patterns (%)  Linear patterns (%)  Coverage (%)  Accuracy (%)
CFP2000   85.1                   7.7                  93.0          97.6
CINT2000  81.2                   5.1                  91.6          93.8
Coverage issues
  • Reasons for no coverage
  • instruction does not appear in at least one training run
  • reuse distance of test is larger than train
  • number of patterns does not remain constant in
    both training runs

Suite Reason 1 Reason 2 Reason 3
CFP2000 4.2 0.3 2.5
CINT2000 2.2 4.4 1.8
Prediction Details
  • Other patterns
  • 183.equake has 13.6% square root patterns
  • 200.sixtrack, 186.crafty all constant (no data
    size change)
  • Low coverage
  • 189.lucas: 31% of static memory operations do
    not appear in training runs
  • 164.gzip: the test reuse distance is greater than the
    train reuse distance
  • cache-line alignment

Number of Patterns
Suite 1 2 3 4 ≥5
CFP2000 81.8 10.5 4.8 1.4 1.5
CINT2000 72.3 10.9 7.6 4.6 5.3
Miss Rate Prediction
  • Predict a miss for a reference if the backward
    reuse distance is greater than the cache size.
  • neglects conflict misses
  • Accurate miss rate prediction
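The miss-prediction rule can be sketched at cache-line granularity. In a fully associative LRU cache of C lines, an access misses exactly when at least C distinct other lines were touched since the previous use of its line, so the sketch tests `d >= cache_lines`; cold accesses count as misses. The 64-byte default line size and the name `predict_misses` are assumptions.

```python
def predict_misses(trace, cache_lines, line_size=64):
    """Return True (miss) or False (hit) for each byte address in `trace`,
    using backward reuse distance measured in cache lines. This matches a
    fully associative LRU cache exactly and ignores conflict misses."""
    lines = [addr // line_size for addr in trace]
    last_use = {}  # line -> index of its previous access
    result = []
    for i, line in enumerate(lines):
        if line not in last_use:
            result.append(True)  # cold miss: no earlier use of this line
        else:
            d = len(set(lines[last_use[line] + 1 : i]))  # backward reuse distance
            result.append(d >= cache_lines)  # >= C other lines evict it
        last_use[line] = i
    return result


trace = [0, 64, 128, 0, 64, 128]  # three distinct 64-byte lines, cycled
misses = predict_misses(trace, cache_lines=2)
print(sum(misses) / len(misses))  # 1.0: a 2-line LRU cache thrashes
```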

Miss Rate Prediction Methodology
  • Three miss-rate prediction schemes
  • TCS: test cache simulation
  • Use the actual miss rates from running the
    program on the test data as the predicted
    miss rates for the reference data
  • RRD: reference reuse distance
  • Use the actual reuse distance of the reference
    data set to predict the miss rate for the
    reference data set
  • An upper bound on the accuracy of reuse-distance-based prediction
  • PRD: predicted reuse distance
  • Use the predicted reuse distance for the
    reference data set to predict the miss rate.

Cache Configurations
Config  L1                  L2
1       32K, fully assoc.   1M, fully assoc.
2       32K, 2-way          1M, 8-way
3       32K, 2-way          1M, 4-way
4       32K, 2-way          1M, 2-way
L1 Miss Rate Prediction Accuracy (%)
Suite     PRD    RRD    TCS
CFP2000   97.5   98.4   95.1
CINT2000  94.4   96.7   93.9
L2 Miss Rate Accuracy (%)
Suite     2-way L2 (PRD RRD TCS)   Fully associative L2 (PRD RRD TCS)
CFP2000   91  93  87               97  99.9  91
CINT2000  91  95  87               94  99.9  89
Critical Instructions
  • Given reuse distance for an instruction
  • Can we determine which instructions are critical
    in terms of cache performance?
  • An instruction is critical if it is in the set of
    instructions that generate the most L2 cache misses
  • Those top miss-rate instructions whose cumulative
    total misses account for 95% of the misses in a program
  • Use the execution frequency of one training run
    to determine the relative contribution (number of
    misses) of each instruction
  • Compare the actual critical instructions with the predicted ones
  • Use cache configuration 2

Critical Instruction Prediction
Suite PRD RRD TCS pred act
CFP2000 92 98 51 1.66 1.67
CINT2000 89 98 53 0.94 0.97
Critical Instruction Patterns
Suite 1 2 3 4 ≥5
CFP2000 22.1 38.4 20.0 12.8 6.7
CINT2000 18.7 14.5 25.5 22.5 18
Miss Rate Discussion
  • PRD performs better than TCS when data size is a factor
  • TCS performs better when data size doesn't change
    much and there are conflict misses
  • PRD is much better at identifying the critical
    instructions than TCS
  • these instructions should be targets of optimization

Memory Disambiguation
  • Load speculation
  • Can a load safely be issued prior to a preceding store?
  • Use memory distance to predict the likelihood
    that a store to the same address has not finished
  • Access distance
  • The number of memory operations between a store
    to and a load from the same address
  • Correlated to instruction distance and window size
  • Use only two runs
  • If access distance is not constant, use the access
    distance of the larger of the two data sets as a lower
    bound on access distance
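Access distance as defined above can be measured from a trace of loads and stores. A minimal sketch, assuming each operation is a `(kind, address)` pair and counting only the operations strictly between the store and the load; `access_distances` is a hypothetical name.

```python
def access_distances(trace):
    """trace: sequence of (kind, addr) with kind "ld" or "st".
    Returns, for each load in order, the number of memory operations strictly
    between it and the most recent store to the same address (None if none)."""
    last_store = {}  # addr -> index of the latest store to it
    out = []
    for i, (kind, addr) in enumerate(trace):
        if kind == "st":
            last_store[addr] = i
        elif kind == "ld":
            j = last_store.get(addr)
            out.append(None if j is None else i - j - 1)
    return out


print(access_distances([("st", 1), ("ld", 2), ("st", 3), ("ld", 1)]))
# [None, 2]
```

Per the slide, when this distance differs between the two training inputs, the value from the larger input would be kept as a lower bound.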

When to Speculate
  • Definitely no
  • access distance less than threshold
  • Definitely yes
  • access distance greater than threshold
  • Threshold lies between intervals
  • compute predicted mis-speculation frequency
  • speculate if PMSF < 5%
  • When threshold does not intersect intervals
  • total of frequencies that lie below the threshold
  • Otherwise
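The decision rule can be sketched with each access-distance interval reduced to `(lo, hi, freq)`. The slide's rule for a threshold that falls inside an interval is cut off, so the partial contribution below assumes distances are spread uniformly across that interval; that assumption, the function names and the example threshold values are ours. The 5% limit is the one stated on the slide.

```python
def pmsf(intervals, threshold):
    """Predicted mis-speculation frequency: the fraction of a load's dynamic
    instances whose access distance is predicted to fall below `threshold`.
    `intervals` is a list of (lo, hi, freq) with integer distance bounds."""
    total = sum(freq for _, _, freq in intervals)
    if total == 0:
        return 0.0
    below = 0.0
    for lo, hi, freq in intervals:
        if hi < threshold:
            below += freq  # whole interval lies below the threshold
        elif lo < threshold:
            # Threshold cuts this interval. ASSUMPTION: distances are uniform
            # over lo..hi, so (threshold - lo) of the hi - lo + 1 values count.
            below += freq * (threshold - lo) / (hi - lo + 1)
    return below / total


def should_speculate(intervals, threshold, limit=0.05):
    """Speculate the load only if the predicted mis-speculation frequency
    is under `limit` (5% on the slide)."""
    return pmsf(intervals, threshold) < limit
```

All intervals above the threshold give a PMSF of 0 ("definitely yes"); all below give 1 ("definitely no").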

Value-based Prediction
  • Memory dependence only if addresses and values
  • store (a1, v1); store (a2, v2); store (a3, v3); load
    (a4, v4)
  • The load can move ahead of the third store if a1 = a2 = a3 = a4, v2 = v3 and v1 ≠ v2
  • The access distance of a load to the first store
    in a sequence of stores storing the same value is
    called the value distance
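Value distance can be computed much like access distance, except that a run of consecutive same-value stores to an address is measured from its first store. A sketch assuming `(kind, address, value)` tuples; `value_distances` is a hypothetical name.

```python
def value_distances(trace):
    """trace: sequence of (kind, addr, value) with kind "ld" or "st".
    For each load, count the memory operations between it and the first store
    of the most recent run of same-value stores to that address."""
    run_start = {}  # addr -> (index of first store in current run, value)
    out = []
    for i, (kind, addr, value) in enumerate(trace):
        if kind == "st":
            current = run_start.get(addr)
            if current is None or current[1] != value:
                run_start[addr] = (i, value)  # a new value starts a new run
        elif kind == "ld":
            current = run_start.get(addr)
            out.append(None if current is None else i - current[0] - 1)
    return out


print(value_distances([("st", 0, 1), ("st", 0, 2), ("st", 0, 2), ("ld", 0, 2)]))
# [1]
```

On this trace, shaped like the slide's example (same address, v2 = v3, v1 ≠ v2), the load's value distance is 1, while its access distance to the last store would be 0.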

Experimental Design
  • SPEC CPU2000 programs
  • SPEC CFP2000
  • 171.swim, 172.mgrid, 173.applu, 177.mesa, 183.equake, 188.ammp, 301.apsi
  • SPEC CINT2000
  • 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty,
    197.parser, 253.perlbmk, 300.twolf
  • Compile with gcc 2.7.2 -O3
  • Comparison
  • Access distance, value distance
  • Store set with 16KB table, also with values
  • Perfect disambiguation

issue width 8
fetch width 8
retire width 16
window size 128
load/store queue 128
functional units 8
fetch multiblock gshare
data cache perfect
memory ports 2
Operation Latency (cycles)
load 2
int division 8
int multiply 4
other int 1
float multiply 4
float addition 3
float division 8
other float 2
IPC and Mis-speculation
Suite Access Distance Store Set 16KB Table Perfect
CFP2000 3.21 3.37 3.71
CINT2000 2.90 3.22 3.35
Suite     Mis-speculation rate    Speculated loads
          Access   Store Set      Access   Store Set
CFP2000   2.36     0.07           57.2     62.0
CINT2000  2.33     0.08           26.9     34.7
Value-based Disambiguation
Suite Value Distance Store Set 16KB Value
CFP2000 3.34 3.55
CINT2000 3.00 3.23
Suite Mis-speculation Rate Speculated Loads
CFP2000 1.22 59.3
CINT2000 1.55 27.6
Cache Model
Suite Access Store Set 16K
CFP2000 1.55 1.61
CINT2000 1.53 1.60
Suite Value Store Set 16K
CFP2000 1.59 1.63
CINT2000 1.55 1.65
  • Over 90% of memory operations can have reuse
    distance predicted with 97% and 93% accuracy
    for floating-point and integer programs, respectively
  • We can accurately predict miss rates for
    floating-point and integer codes
  • We can identify 92% of the instructions that
    cause 95% of the L2 misses
  • Access- and value-distance-based memory
    disambiguation are competitive with best hardware
    techniques without a hardware table

Future Work
  • Develop a prefetching mechanism that uses the
    identified critical loads.
  • Develop an MLP system that uses critical loads
    and access distance.
  • Path-sensitive memory distance analysis
  • Apply memory distance to working-set based cache
  • Apply access distance to EPIC style architectures
    for memory disambiguation.