Title: Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
- Authors
  - Wei-fen Lin and Steven K. Reinhardt, University of Michigan
  - Doug Burger, University of Texas at Austin
- Presented by Pravin Dalale
OUTLINE
- Motivation
- Main idea in the paper
  - Analysis
  - Main idea
- Prefetch engine
  - Insertion policy
  - Prefetch scheduling
- Results
- Conclusion
Motivation
Memory density and capacity have grown along with CPU power and complexity, but memory speed has not kept pace.
Solutions
- Multithreading
- Multiple levels of caches
- Prefetching
Analysis (1)
- IPCReal - instructions per cycle with the real memory system
- IPCPerfectL2 - instructions per cycle with the real L1 cache but a perfect L2 cache
- IPCPerfectMem - instructions per cycle with perfect L1 and L2 caches (a perfect memory system)
Analysis (2)
- Fraction of performance lost due to imperfect L1 and L2: (IPCPerfectMem - IPCReal) / IPCPerfectMem
- Fraction of performance lost due to imperfect L2: (IPCPerfectL2 - IPCReal) / IPCPerfectL2 (worked sketch below)
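For concreteness, the sketch below plugs hypothetical IPC values into the two stall-fraction formulas above. The function name and the numbers are illustrative assumptions, not figures from the paper.

```python
# Minimal sketch: the two stall fractions defined above, computed from
# hypothetical IPC values (illustrative only, not the paper's data).

def stall_fraction(ipc_ideal: float, ipc_real: float) -> float:
    """Fraction of performance lost relative to an idealized configuration."""
    return (ipc_ideal - ipc_real) / ipc_ideal

ipc_real = 0.8         # IPCReal: real L1, L2, and DRAM
ipc_perfect_l2 = 1.6   # IPCPerfectL2: real L1, perfect L2
ipc_perfect_mem = 2.0  # IPCPerfectMem: perfect L1 and L2

print("lost to imperfect L1+L2:", stall_fraction(ipc_perfect_mem, ipc_real))  # 0.6
print("lost to imperfect L2:   ", stall_fraction(ipc_perfect_l2, ipc_real))   # 0.5
```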
Analysis (3)
- Simulated 1.6GHz, out-of-order core
- 64KB L1
- 1MB L2
- Direct Rambus memory system with four 1.6GB/s channels
- The 26 SPEC benchmarks were run on this system to obtain IPCReal, IPCPerfectL2, and IPCPerfectMem.
Analysis (4)
- The L2 stall fraction is 80% for the mcf benchmark.
- The average stall fraction caused by L2 misses is 57% across the 26 SPEC CPU2000 benchmarks.
Main idea
- The paper describes a technique to reduce L2 miss latencies.
- It introduces a prefetch engine that prefetches data into the L2 cache upon an L2 demand miss.
Prefetch Engine
[Figure: block diagram of the prefetch engine; the numbered components (1)-(3) are described on the next slide]
Prefetch Engine
1. The prefetch queue maintains a list of n region entries that are not in the L2 cache.
2. The prefetch prioritizer uses the bank state and the region age to determine which prefetch to issue next (see the sketch below).
3. The access prioritizer selects a prefetch only when there are no demand misses to service.
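As a rough illustration of components (1) and (2), the sketch below models a queue of region entries and an age- and bank-aware prioritizer. The region size, queue depth, and class names are assumptions made for this sketch, not the paper's actual hardware parameters.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

REGION_SIZE = 4 * 1024   # assumed size of the region around a demand miss (bytes)
QUEUE_DEPTH = 8          # assumed number of region entries (n)

@dataclass
class RegionEntry:
    base_addr: int   # start address of the region to prefetch
    age: int         # insertion order; older entries win ties here

class PrefetchEngine:
    """Toy model of the prefetch queue (1) and prefetch prioritizer (2)."""

    def __init__(self) -> None:
        self.queue: List[RegionEntry] = []
        self.clock = 0

    def on_l2_demand_miss(self, addr: int) -> None:
        """Enqueue the region containing a demand miss, evicting the oldest entry if full."""
        entry = RegionEntry(base_addr=addr - addr % REGION_SIZE, age=self.clock)
        self.clock += 1
        if len(self.queue) == QUEUE_DEPTH:
            self.queue.pop(0)
        self.queue.append(entry)

    def next_prefetch(self, bank_is_ready: Callable[[int], bool]) -> Optional[RegionEntry]:
        """Pick the oldest queued region whose DRAM bank is currently ready."""
        ready = [e for e in self.queue if bank_is_ready(e.base_addr)]
        if not ready:
            return None
        best = min(ready, key=lambda e: e.age)
        self.queue.remove(best)
        return best
```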
Insertion policy (1)
- The prefetched block may be loaded into L2 with one of four priorities:
  1. most-recently-used (MRU)
  2. second-most-recently-used (SMRU)
  3. second-least-recently-used (SLRU)
  4. least-recently-used (LRU)
Insertion policy (2)
- Benchmarks were divided into two classes:
  - high prefetch accuracy (above 20%)
  - low prefetch accuracy (below 20%)
- All benchmarks were tested with the four possible insertion policies.
- The LRU insertion policy gives the best results in both categories (see the sketch below).
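A minimal sketch of what LRU insertion could look like for a single cache set, modeled as a recency stack: demand fills enter at the MRU position, while prefetched blocks enter at the LRU position so that useless prefetches are evicted first. The class and its parameters are illustrative assumptions, not the paper's hardware design.

```python
class CacheSet:
    """One set of a set-associative cache, kept as a recency stack
    (index 0 = LRU, last index = MRU)."""

    def __init__(self, ways: int):
        self.ways = ways
        self.stack: list = []   # block tags, ordered LRU -> MRU

    def insert(self, tag: int, prefetched: bool) -> None:
        if len(self.stack) == self.ways:
            self.stack.pop(0)             # evict the current LRU block
        if prefetched:
            self.stack.insert(0, tag)     # LRU insertion priority for prefetches
        else:
            self.stack.append(tag)        # MRU insertion for demand fills

    def access(self, tag: int) -> bool:
        """On a hit, promote the block to MRU; return whether it hit."""
        if tag in self.stack:
            self.stack.remove(tag)
            self.stack.append(tag)
            return True
        return False
```

With this policy, a prefetched block that is never referenced is the first candidate for eviction, so inaccurate prefetches pollute the cache as little as possible.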
Prefetch Scheduling
- Simple aggressive prefetching can consume a large amount of bandwidth and cause channel contention.
- This contention can be avoided by scheduling prefetch accesses only when the Rambus channels are idle, as sketched below.
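A small sketch of that scheduling rule: demand misses always take priority, and a prefetch is issued only when the channel would otherwise sit idle. The queue and channel abstractions here are assumptions for illustration, not the paper's memory-controller interface.

```python
def schedule_next_access(demand_queue: list, prefetch_queue: list,
                         channel_idle: bool):
    """Return the next memory access to issue on this channel, or None."""
    if demand_queue:
        return demand_queue.pop(0)     # demand misses have absolute priority
    if channel_idle and prefetch_queue:
        return prefetch_queue.pop(0)   # prefetch only into otherwise-idle slots
    return None                        # leave the channel idle otherwise
```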
Results (1/3) - Overall performance improvement
The performance with prefetching is very close to that of a perfect L2.
Results (2/3) - Sensitivity of the prefetch scheme to DRAM latencies
- The base DRDRAM had a 40ns latency and an 800MHz data transfer rate.
- If the latency is increased to 50ns, the mean performance of the prefetch scheme drops by less than 1% relative to the base system.
- If the latency is reduced to 34ns, the mean performance of the prefetch scheme again changes by less than 2%.
Results (3/3) - Interaction with software prefetching
- When the prefetch scheme is coupled with software prefetching, none of the benchmarks improve significantly (at most 2%).
- Thus the proposed prefetch scheme subsumes the benefits of software prefetching.
Conclusions
- The authors proposed and evaluated a prefetch architecture integrated with the on-chip L2 cache.
- The architecture aggressively prefetches large regions of data into L2 on demand misses.
- By scheduling these prefetches only during idle cycles and inserting them into the cache with low replacement priority, a significant improvement is obtained on 10 of the 26 SPEC benchmarks.
QUESTIONS