Title: Using a Victim Buffer in an Application-Specific Memory Hierarchy
1 Using a Victim Buffer in an Application-Specific Memory Hierarchy
- Chuanjun Zhang, Frank Vahid
- Dept. of Electrical Engineering
- Dept. of Computer Science and Engineering
- University of California, Riverside
- Also with the Center for Embedded Computer Systems at UC Irvine
- This work was supported by the National Science Foundation and the Semiconductor Research Corporation
2 Low Power/Energy Techniques are Essential
- Low energy dissipation is imperative for battery-driven embedded systems
- Low-power techniques are essential to both embedded systems and high-performance processors
- High-performance processors are going to be too hot to work
[Figure: "Hot enough to cook an egg." (Skadron et al., 30th ISCA)]
3 Caches Consume Much Power
- Caches consume 50% of total processor system power
- ARM920T and MCORE (Segars 01, Lee 99)
- Caches accessed often
- Consume dynamic power
- Associativity reduces misses
- Less power off-chip, but more power per access
- Victim buffer helps (Jouppi 90)
- Add to direct-mapped cache
- Keep recently evicted lines in small buffer, check on miss
- Like higher associativity, but without extra power per access
- 10% energy savings, 4% performance improvement (Albera 99)
[Figure: processor, victim buffer, cache, and memory; caches marked >50%]
4 Victim Buffer
- With a victim buffer:
- One cycle on a cache hit
- Two cycles on a victim buffer hit
- Twenty-two cycles on a victim buffer miss
- Without a victim buffer:
- One cycle on a cache hit
- Twenty-one cycles on a cache miss
- More accesses to off-chip memory
[Figure: processor, victim buffer, and off-chip memory, with the hit and miss paths between them]
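The cycle counts above imply a break-even point for the victim buffer. The following is a minimal sketch in Python, assuming the latencies listed on the slide (1, 2, 22, and 21 cycles) and treating the hit rates as free inputs. The function names are illustrative and do not come from the original work, and the sketch counts cycles only, not energy.

```python
# Latencies in cycles, taken from the slide.
CACHE_HIT = 1
VB_HIT = 2
VB_MISS = 22      # off-chip access after first checking the victim buffer
MISS_NO_VB = 21   # off-chip access when there is no victim buffer


def avg_cycles_with_vb(cache_hit_rate, vb_hit_rate):
    """Average cycles per access with the victim buffer enabled."""
    miss_rate = 1.0 - cache_hit_rate
    miss_cost = vb_hit_rate * VB_HIT + (1.0 - vb_hit_rate) * VB_MISS
    return cache_hit_rate * CACHE_HIT + miss_rate * miss_cost


def avg_cycles_without_vb(cache_hit_rate):
    """Average cycles per access with no victim buffer."""
    return cache_hit_rate * CACHE_HIT + (1.0 - cache_hit_rate) * MISS_NO_VB


def break_even_vb_hit_rate():
    """Victim buffer hit rate v at which both designs cost the same.

    Solves v * VB_HIT + (1 - v) * VB_MISS == MISS_NO_VB for v.
    """
    return (VB_MISS - MISS_NO_VB) / (VB_MISS - VB_HIT)
```

Under these latencies the break-even victim buffer hit rate is 1/20: if fewer than 5% of cache misses hit in the victim buffer, the extra cycle on every miss costs more time than the buffer saves.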
5 Cache Architecture with a Configurable Victim Buffer
- Is a victim buffer a useful configurable cache parameter?
- Helps for some applications
- For others, not useful
- If the VB mostly misses, the extra cycle is wasteful
- Thus, want the ability to shut off the VB for a given application
- Hardware overhead
- One-bit register
- A switch
- Four-line victim buffer shown
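[Figure: cache architecture with a four-line, fully-associative victim buffer. Labeled parts: L1 cache with tag and data SRAMs; VB on/off register (reg) and a switch on Vdd, driven from the cache control circuit; victim buffer with a CAM for the 27-bit tags and an SRAM for the 16-byte cache line data; a mux (select s, inputs 1 and 0) on the path carrying the victim line and data from next level memory to the processor; control signals to the next level memory.]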
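The behavior of this architecture can be sketched as a small functional model. The Python below is a hedged illustration, not the authors' hardware or simulator: the class name, the FIFO replacement policy in the victim buffer, and the default 8 Kbyte cache size are assumptions, while the 16-byte line, the four-line buffer, the on/off bit, and the cycle counts come from the slides.

```python
from collections import OrderedDict

LINE_BYTES = 16  # cache line size shown on the slide


class DirectMappedCacheWithVB:
    """Direct-mapped cache with an optional fully-associative victim buffer."""

    def __init__(self, cache_bytes=8192, vb_lines=4, vb_enabled=True):
        self.num_lines = cache_bytes // LINE_BYTES
        self.lines = [None] * self.num_lines  # line address held in each slot
        self.vb = OrderedDict()               # evicted line addresses, oldest first
        self.vb_lines = vb_lines
        self.vb_enabled = vb_enabled          # models the one-bit on/off register
        self.cycles = 0
        self.counts = {"cache_hit": 0, "vb_hit": 0, "miss": 0}

    def access(self, addr):
        line = addr // LINE_BYTES
        slot = line % self.num_lines
        if self.lines[slot] == line:
            return self._record("cache_hit", 1)
        victim = self.lines[slot]             # line about to be evicted, if any
        self.lines[slot] = line
        if not self.vb_enabled:
            return self._record("miss", 21)
        if line in self.vb:
            del self.vb[line]                 # requested line moves back to the cache
            outcome = self._record("vb_hit", 2)
        else:
            outcome = self._record("miss", 22)
        if victim is not None:
            self.vb[victim] = True            # keep the evicted line in the buffer
            if len(self.vb) > self.vb_lines:
                self.vb.popitem(last=False)   # FIFO replacement (an assumption)
        return outcome

    def _record(self, outcome, cost):
        self.counts[outcome] += 1
        self.cycles += cost
        return outcome
```

As a usage example, alternating between two addresses 8 Kbytes apart (which map to the same slot in an 8 Kbyte direct-mapped cache) makes every access a miss with the buffer off, whereas with the buffer on, only the first two accesses go off-chip and the rest hit in the victim buffer.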
6 Hit Rate of a Victim Buffer
[Figure: two panels, Data cache and Instruction cache. Hit rate of victim buffer when added to an 8 Kbyte, 4 Kbyte, or 2 Kbyte direct-mapped cache. Benchmarks from Powerstone, MediaBench, and Spec 2000.]
7 Computing Total Memory-Related Energy
- Consider CPU stall energy and off-chip memory energy
- Excludes CPU active energy
- Thus, represents all memory-related energy
energy_mem = energy_dynamic + energy_static
energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
energy_miss = energy_offchip_access + energy_uP_stall + energy_cache_block_fill
energy_static = cycles * energy_static_per_cycle
energy_miss = k_miss_energy * energy_hit
energy_static_per_cycle = k_static * energy_total_per_cycle
(we varied the k's to account for different system implementations)
- Measured quantities (underlined on the slide)
- From SimpleScalar: cache_hits, cache_misses, cycles
- From our layout or data sheets: the others
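The energy model on this slide can be written as a short function. This is a minimal sketch in Python that uses the k-factor form of the model (energy_miss and energy_static_per_cycle expressed through k_miss_energy and k_static); the function name and the numbers in the usage note are hypothetical and are not measurements from the original work.

```python
def memory_energy(cache_hits, cache_misses, cycles,
                  energy_hit, k_miss_energy,
                  k_static, energy_total_per_cycle):
    """Total memory-related energy, following the slide's equations."""
    # A miss costs a fixed multiple of a hit.
    energy_miss = k_miss_energy * energy_hit
    # Static energy per cycle is a fixed fraction of total energy per cycle.
    energy_static_per_cycle = k_static * energy_total_per_cycle

    energy_dynamic = cache_hits * energy_hit + cache_misses * energy_miss
    energy_static = cycles * energy_static_per_cycle
    return energy_dynamic + energy_static
```

For example, with hypothetical inputs of 900 hits, 100 misses, 3000 cycles, energy_hit = 1.0, k_miss_energy = 50, k_static = 0.5, and energy_total_per_cycle = 2.0 (all in arbitrary units), the dynamic part is 900 + 100 * 50 = 5900 and the static part is 3000 * 1.0 = 3000, for a total of 8900.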
8 Performance and Energy Benefits of Victim Buffer with a Direct-Mapped Cache
A configurable victim buffer is clearly useful for avoiding the performance penalty on certain applications
9 Is a Configurable Victim Buffer Useful Even With a Configurable Cache?
- We showed that a configurable cache can reduce memory access power by half on average (Zhang/Vahid/Najjar ISCA 03, ISVLSI 03)
- Software-configurable cache
- Associativity: 1, 2, or 4 ways
- Size: 2, 4, or 8 Kbytes
- Does that configurability subsume the usefulness of a configurable victim buffer?
10 Best Configurable Cache with VB Configurations
- Optimal cache configuration when cache associativity, cache size, and victim buffer are all configurable
- I and D stand for instruction cache and data cache, respectively
- V indicates that the victim buffer is on
- nK indicates a cache size of n Kbytes
- The associativity is given by the last four characters; for benchmark vpr, I2D1 stands for a two-way instruction cache and a direct-mapped data cache
- Note that sometimes the victim buffer should be on, sometimes off
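Finding the optimal configuration amounts to a search over a small space. The sketch below, in Python, enumerates associativity, size, and victim buffer state for a single cache and picks the lowest-energy point. It assumes every combination is legal, which the slides do not state, and the names `best_configuration` and `toy_energy` are hypothetical; in practice the energy of each configuration would come from simulation rather than from a closed-form function.

```python
from itertools import product

ASSOCIATIVITIES = (1, 2, 4)   # ways
SIZES_KB = (2, 4, 8)          # cache size in Kbytes
VB_STATES = (False, True)     # victim buffer off / on


def all_configurations():
    """Every (ways, size_kb, vb_on) combination, assuming all are legal."""
    return list(product(ASSOCIATIVITIES, SIZES_KB, VB_STATES))


def best_configuration(energy_of):
    """Return the configuration for which energy_of(ways, size_kb, vb_on) is lowest."""
    return min(all_configurations(), key=lambda cfg: energy_of(*cfg))


def toy_energy(ways, size_kb, vb_on):
    """Hypothetical stand-in for a per-benchmark energy measurement."""
    return 2.0 * ways + 0.5 * size_kb - (1.5 if vb_on else 0.0)
```

Calling `best_configuration(toy_energy)` searches all 18 points. For a system with separate instruction and data caches, as on this slide, the same search would be run once per cache.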
11 Performance and Energy Benefits of Victim Buffer Added to a Configurable Cache
- An 8-line victim buffer with a configurable cache whose associativity, size, and line size are configurable (0 = optimal config. without VB)
- Still surprisingly effective
12 Conclusion
- Configurable victim buffer useful with direct-mapped cache
- As much as 60% energy and 4% performance improvements for some applications
- Can shut off to avoid performance penalty on other applications
- Configurable victim buffer also useful with configurable cache
- As much as 43% energy and 8% performance improvement for some applications
- Can shut off to avoid performance overhead on other applications
- Configurable victim buffer should be included as a software-configurable parameter to direct-mapped as well as configurable caches for embedded system architectures