Design and Optimization of Large Size and Low Overhead Off-Chip Caches

Transcript and Presenter's Notes

Title: Design and Optimization of Large Size and Low Overhead Off-Chip Caches


1
Design and Optimization of Large Size and Low Overhead Off-Chip Caches
  • Zhao Zhang, Zhichun Zhu, Xiaodong Zhang

Presentation by Phaneeth Junga Manjeera Jeedigunta
2
Introduction
  • Processor-memory speed gap
  • Inefficient use of CPU speed
  • Emergence of increasingly memory-intensive applications

Need for a better memory hierarchy architecture
3
Present Situation
Diagram: the L1 and L2 caches are on-chip; the L3 cache is off-chip.
4
SRAM vs. DRAM
  • SRAM
  • Strengths: high speed
  • Drawbacks: low density, high cost
  • An SRAM size of > 10 MB is not practical
  • Used in the on-chip L2 cache and the off-chip L3 cache
  • DRAM
  • Strengths: low cost, high density
  • Drawbacks: low speed
  • Used in main memory
5
Ideal L3 Cache
  • Large enough to hold the working sets of most applications
  • Fast enough to reduce access latency
  • SRAM (currently used): fast but not large enough
  • DRAM: large but slow

6
Proposed Architecture
Diagram: the proposed Cached DRAM Cache (CDC) combines a large DRAM array (the CDC-DRAM) with a small SRAM cache (the CDC cache).
7
Features of CDC
  • Structure of a sector cache
  • A small amount of logic inside the processor chip: a tag cache and the CDC controller
  • On-chip hit/miss predictor

Results: large capacity, equivalent to the DRAM size; low average latency, close to SRAM
8
CDC Design
Structure of Cache
  • The CDC-DRAM has the structure of a sector cache
  • CDC-DRAM block size = main memory page size
  • Sub-block size = L2 cache block size
  • The design exploits spatial locality without increasing memory bandwidth consumption (see the sketch below)
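As a concrete illustration (not from the slides), a minimal C sketch of one CDC-DRAM sector entry, assuming a 4 KB page divided into 64-byte sub-blocks; all names and sizes here are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE     4096                        /* CDC-DRAM block = one memory page (assumed size) */
#define SUBBLOCK_SIZE 64                          /* sub-block = L2 cache block size (assumed size)  */
#define NUM_SUBBLOCKS (PAGE_SIZE / SUBBLOCK_SIZE)

/* One sector (page-sized block) in the CDC-DRAM: a single tag covers the
 * whole page, while per-sub-block valid bits record which pieces have
 * actually been fetched, so a miss transfers only one sub-block rather
 * than the whole page.                                                    */
typedef struct {
    uint64_t tag;                      /* page-granularity address tag   */
    bool     valid[NUM_SUBBLOCKS];     /* presence bit per sub-block     */
    bool     dirty[NUM_SUBBLOCKS];     /* write-back state per sub-block */
} cdc_sector_t;
```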

9
CDC Design
On-Chip CDC Tag Cache
  • Recently accessed CDC-DRAM pages are stored in the CDC cache
  • Tags of the CDC-cache pages are stored in the on-chip CDC tag cache
  • On an L1 miss, the address tag is compared against the tag cache in parallel with the L2 access
  • Minimizes the cache miss overhead (see the sketch below)
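A rough sketch of that probe, assuming a direct-mapped on-chip tag cache; the structure and function names are hypothetical, only the idea of checking the tag cache in parallel with the L2 access comes from the slide:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE        4096    /* as in the sector sketch above  */
#define TAGCACHE_ENTRIES 1024    /* assumed tag-cache size         */

/* One entry of the hypothetical on-chip CDC tag cache. */
typedef struct {
    uint64_t page_tag;
    bool     valid;
} tagcache_entry_t;

static tagcache_entry_t tag_cache[TAGCACHE_ENTRIES];

/* Probe issued on an L1 miss, concurrently with the L2 access: returns
 * true if the page holding 'addr' is known to reside in the CDC, so a
 * subsequent L2 miss can be sent straight to the CDC.                  */
bool cdc_tagcache_probe(uint64_t addr)
{
    uint64_t page = addr / PAGE_SIZE;
    tagcache_entry_t *e = &tag_cache[page % TAGCACHE_ENTRIES];
    return e->valid && e->page_tag == page;
}
```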

10
CDC Design
Miss/Hit Predictor
  • A two-level adaptive predictor using a global history table and a global pattern table
  • Originally designed for dynamic branch prediction
  • A very simple and effective scheme (a sketch follows below)
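A minimal sketch of such a two-level predictor, assuming a global history register of recent hit/miss outcomes indexing a table of 2-bit saturating counters; the sizes are assumptions, not taken from the paper:

```c
#include <stdbool.h>
#include <stdint.h>

#define HISTORY_BITS 12
#define PATTERN_SIZE (1u << HISTORY_BITS)

static uint32_t global_history;               /* last HISTORY_BITS hit/miss outcomes */
static uint8_t  pattern_table[PATTERN_SIZE];  /* 2-bit saturating counters           */

/* Predict whether the next CDC access will hit: the recent global
 * hit/miss history selects a counter whose upper half means "hit". */
bool predict_cdc_hit(void)
{
    return pattern_table[global_history] >= 2;
}

/* Train the selected counter with the actual outcome and shift that
 * outcome into the global history, as a two-level branch predictor would. */
void update_predictor(bool was_hit)
{
    uint8_t *c = &pattern_table[global_history];
    if (was_hit && *c < 3)
        (*c)++;
    else if (!was_hit && *c > 0)
        (*c)--;
    global_history = ((global_history << 1) | (was_hit ? 1u : 0u)) & (PATTERN_SIZE - 1);
}
```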

11
Mapping Methods Considered
  • Direct Mapping
  • Each cache block is mapped onto a single location in the CDC-DRAM
  • Strengths
  • Simple; requires only one tag for each page
  • Drawbacks
  • Storage may not be used efficiently (see the sketch below)
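Under direct mapping, the CDC-DRAM slot is a pure function of the page address, roughly as in this sketch (the capacity and page size are assumed values):

```c
#include <stdint.h>

#define PAGE_SIZE 4096                               /* assumed memory page size */
#define CDC_PAGES (64u * 1024u * 1024u / PAGE_SIZE)  /* e.g. a 64 MB CDC-DRAM    */

/* Each memory page maps to exactly one CDC-DRAM slot, so a lookup needs
 * just one tag comparison, but two hot pages sharing an index will keep
 * evicting each other.                                                   */
static inline uint32_t cdc_direct_index(uint64_t addr)
{
    return (uint32_t)((addr / PAGE_SIZE) % CDC_PAGES);
}
```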

12
Mapping Methods Considered
  • Decoupled sector cache mapping
  • Data: direct mapping
  • Tags: set-associative mapping
  • Reduces the chance of page thrashing when two pages conflict in the CDC (see the sketch below)
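One way such a decoupled slot might be organized, sketched under assumed sizes (2-way tags, 64 sub-blocks per page); the field names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SUBBLOCKS 64   /* sub-blocks per page, as in the sector sketch above */
#define TAG_WAYS      2    /* assumed tag associativity                          */

/* The data array stays direct-mapped, but each data slot is shared by a
 * small set of tags, and every resident sub-block records which tag owns
 * it. Two pages that conflict on the same data slot can then coexist
 * instead of thrashing.                                                   */
typedef struct {
    uint64_t tags[TAG_WAYS];           /* page tags sharing this data slot      */
    bool     tag_valid[TAG_WAYS];
    uint8_t  owner[NUM_SUBBLOCKS];     /* owning tag (way index) per sub-block  */
    bool     present[NUM_SUBBLOCKS];   /* whether the sub-block is resident     */
} cdc_decoupled_slot_t;
```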

13
Experimental Set-Up
14
Results
  • Performance of the CDC vs. an 8 MB SRAM L3 cache under the same processor configuration
  • 11 memory-intensive programs from the SPEC CPU2000 benchmark suite were used to test the performance
  • The CDC outperforms the L3 SRAM cache for most programs, by up to 51%
  • Average performance improves by 25%

15
Hit rates
16
Accuracies of CDC Hit/Miss Predictor
The average accuracies are 95.0%, 96.4%, and 97.2% with the 32 MB, 64 MB, and 128 MB CDCs, respectively.
17
Related work
  • Commercial products: Enhanced DRAM and Virtual Channel SDRAM
  • IRAM (Intelligent RAM) combines the processor and DRAM on the same chip
  • In general, processors with on-chip DRAM target embedded applications that emphasize power and cost rather than performance
  • The approach of attaching a fast, small device to a slow device of a similar medium has also appeared in operating systems and I/O research

18
Conclusions
  • The CDC addresses two major concerns of an SRAM off-chip cache
  • Size
  • Miss overhead
  • The CDC is equivalent to DRAM in terms of capacity and close to SRAM in terms of speed
  • Very beneficial to memory-intensive applications

19
Future Work
  • Placing data with spatial locality in the CDC and data with temporal locality in the on-chip caches
  • Examining the use of aggressive prefetching techniques

20
Thank You!!!