Title: Improving the Efficiency of Memory Partitioning by Address Clustering
1Improving the Efficiency of Memory Partitioning
by Address Clustering
- Alberto Macii, Enrico Macii, Massimo Poncino
- Proceedings of the Design, Automation and Test in
Europe Conference and Exhibition
- Presenter: Hung Yu Chen
2Abstract
- Memory partitioning is an effective approach to
memory energy optimization in embedded systems.
Spatial locality of the memory address profile is
the key property that partitioning exploits to
determine an efficient multi-bank memory
architecture. This paper presents an approach,
called address clustering, for increasing the
locality of a given memory access profile, and thus
improving the efficiency of partitioning. Results
obtained on several embedded applications running
on an ARM7 core show average energy reductions of
25% (maximum 57%) w.r.t. a partitioned memory
architecture synthesized without resorting to
address clustering.
3Outline
- What's the problem?
- Memory Energy
- Memory Partitioning
- Address Clustering
- Experimental Result
- Conclusions
4What's the problem?
- Modern SoC platforms usually contain one or more
processors.
- The gap between processor and memory speed keeps
increasing.
- Various types of on-chip embedded memories
provide shorter latencies and wider interfaces.
- Problem: the ubiquity of embedded memories makes
them the largest contributor to the overall energy
budget of a chip.
5Memory Energy
- Model: $E_{mem} = \sum_{i=1}^{N} Cost(i)$
- N: number of accesses during the computation.
- Cost(i): cost of the i-th access, due to the memory
organization and to the cost of the physical
access given by the technology.
- Memory energy optimization:
- Reducing Cost(i): build a low-energy memory
architecture.
- Reducing N: modify the memory access pattern.
- Both.
6Memory Partitioning
- Memory partitioning technique.
7Memory Partitioning (cont.)
- Figure 1-a: the whole address space of the
application is mapped to a single SRAM memory array.
- Figure 1-b: a dynamic access profile.
- Figure 1-c: the partitioned memory.
- Note that we need to account for the power
consumed in the entire partitioned memory system.
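The effect of partitioning on the energy model can be illustrated with a minimal sketch. The per-access cost function below (square root of the bank size) and the `select_overhead` parameter are assumptions chosen only for illustration; they are not the cost model used by the authors. The point is that accesses falling in a small bank are cheaper, and that the overhead of the whole partitioned system has to be included in the total.

```python
import math

def access_cost(words):
    """Toy per-access cost that grows with the bank size (an assumption)."""
    return math.sqrt(words)

def memory_energy(profile, boundaries, select_overhead=0.0):
    """Total energy of a memory split into contiguous banks.

    profile[a]      -- number of accesses to address a
    boundaries      -- sorted cut points; banks are [0, b0), [b0, b1), ...
    select_overhead -- hypothetical extra cost per access for bank selection
    """
    cuts = [0, *boundaries, len(profile)]
    total = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        accesses = sum(profile[lo:hi])
        total += accesses * (access_cost(hi - lo) + select_overhead)
    return total

# 48 rarely used addresses followed by a hot region of 16 addresses.
profile = [1] * 48 + [100] * 16
mono = memory_energy(profile, [])                          # single bank
part = memory_energy(profile, [48], select_overhead=0.5)   # two banks
print(mono, part)
```

With this toy profile the two-bank layout is cheaper even after the per-access overhead is charged, because almost all accesses hit the small bank.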
8Address Clustering-Example
- MPEG Decoding application for ARM7 core
- Instruction stream
9Address Clustering-Example (cont.)
- Figure 2 shows:
- Total number of addresses: 31,233 (range from 0
to 124,892).
- The memory cut has 1,952 rows × 512 columns.
- Energy consumed: 170 mJ (44.4 million total reads).
- Memory partitioning:
- Three memory blocks of sizes
736×256, 696×512, 892×512.
- Energy consumed: 96 mJ (inclusive of the overhead).
- 43.5% energy reduction.
- The 696×512 block holds the majority (82%) of the
memory accesses (36 million out of 44.4 million).
10Address Clustering-Example (cont.)
- Figure 3: Clustered address profile of an MPEG
decoder.
- Two memory blocks of sizes 212×128 and 1900×512.
- Energy consumed: 42 mJ (an additional 56% of energy
saved).
- 99% of the memory accesses (43.99 million out of
44.4 million).
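The savings quoted for the MPEG example follow directly from the three energy figures (170 mJ monolithic, 96 mJ partitioned, 42 mJ partitioned with clustering). The arithmetic is spelled out here; the last line, the overall saving relative to the monolithic memory, is a derived value that does not appear on the slides.

```latex
\begin{align*}
\text{partitioning alone:} \quad & \frac{170 - 96}{170} \approx 43.5\% \\
\text{clustering on top of partitioning:} \quad & \frac{96 - 42}{96} \approx 56\% \\
\text{overall vs. monolithic (derived):} \quad & \frac{170 - 42}{170} \approx 75\%
\end{align*}
```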
11Address Clustering-Problem
- Find a relocation of a proper subset of the
address space.
- Maximize the locality of the dynamic trace.
- Minimize the energy consumption of the memory
architecture.
- Cost metrics:
- Dynamic access profile: $C = [c_0, c_1, \ldots, c_{N-1}]$
- $D(C,W) = \max_i (S_i)$, $i = 0, 1, \ldots, N-W$
- $S_i = \sum_{j=0}^{W-1} c_{i+j}$, where W is the size
of a sliding window
- $d(C,W) = D(C,W) / Tot$
- $Tot = \sum_{i=0}^{N-1} c_i$
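The locality metric can be computed in a single pass over the profile. The following is a minimal sketch: the function names `max_window_sum` and `locality` are illustrative and not taken from the paper, and the sample profile is invented.

```python
def max_window_sum(profile, window):
    """D(C, W): most accesses covered by any W consecutive addresses."""
    if not 0 < window <= len(profile):
        raise ValueError("window must be between 1 and len(profile)")
    current = sum(profile[:window])
    best = current
    for i in range(window, len(profile)):
        # Slide the window by one address: add the new one, drop the oldest.
        current += profile[i] - profile[i - window]
        best = max(best, current)
    return best

def locality(profile, window):
    """d(C, W) = D(C, W) / Tot, the fraction of accesses in the best window."""
    total = sum(profile)
    if total == 0:
        raise ValueError("profile contains no accesses")
    return max_window_sum(profile, window) / total

scattered = [5, 0, 1, 0, 5, 0, 1, 8]   # invented profile, Tot = 20
clustered = [8, 5, 5, 1, 1, 0, 0, 0]   # same counts, hot addresses adjacent
print(locality(scattered, 2), locality(clustered, 2))
```

For the same access counts, placing the most visited addresses next to each other raises d(C, 2) from 0.45 to 0.65, which is the kind of improvement address clustering aims for.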
12Address Clustering-Problem (cont.)
- Figure 4 shows the values of d(C,W) for W = 32,
64, 128, 256, 512, for the profile of Figure 2.
13Address Clustering-Exploration
- High-level pseudo-code
- Explore: find a good value of W
14Address Clustering-Clustering Algorithm
- Cluster returns a modified trace whose first M
locations contain the M most visited addresses.
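One way to read this description is as a set of pairwise swaps between the M most visited addresses and the first M locations. The sketch below follows that reading; it is an interpretation of the slide, not the authors' algorithm, and the function name `cluster`, its return values, and the sample profile are assumptions.

```python
def cluster(profile, m):
    """Return (new_profile, swaps): the first m locations of new_profile
    hold the m most visited addresses; swaps maps each moved address to
    its new location (both directions of every swap are recorded)."""
    n = len(profile)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and len(profile)")
    hot = sorted(range(n), key=lambda a: profile[a], reverse=True)[:m]
    hot_set = set(hot)
    # Hot addresses already inside the first m locations stay where they are.
    movers = [a for a in hot if a >= m]
    free_slots = [s for s in range(m) if s not in hot_set]
    swaps = {}
    for addr, slot in zip(movers, free_slots):
        swaps[addr] = slot
        swaps[slot] = addr
    new_profile = list(profile)
    for src, dst in swaps.items():
        new_profile[dst] = profile[src]
    return new_profile, swaps

profile = [1, 0, 9, 2, 7, 0, 3, 8]   # invented access counts
new_profile, swaps = cluster(profile, 3)
print(new_profile, swaps)
```

Because every relocation is a swap, at most 2M addresses change location, and the total number of accesses is unchanged.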
15Address Clustering-Encoder
- Hardware encoder
- The swap of address pairs -> 2M clustered addresses.
- f(X): a function that indicates whether X belongs
to the set of 2M addresses.
- Clustered address of X: R(X).
- 32-input combinational network.
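In software terms, the encoder behaves like a lookup that remaps only the swapped addresses and passes every other address through unchanged. The sketch below models that behaviour; `make_encoder` and the sample swap table are hypothetical, and the real encoder is a combinational hardware network rather than a dictionary.

```python
def make_encoder(swaps):
    """Build an address encoder from a table of swapped address pairs.

    swaps holds at most 2M entries: each swapped pair appears in both
    directions (a -> b and b -> a).
    """
    def encode(x):
        # f(X): is X in the remapped set?  If so, emit R(X); else pass X through.
        return swaps[x] if x in swaps else x
    return encode

# Hypothetical swap table for M = 2: addresses 7 and 4 move to locations 0 and 1.
swaps = {7: 0, 0: 7, 4: 1, 1: 4}
encode = make_encoder(swaps)
print([encode(a) for a in range(8)])
```

Since the table is built from swaps, applying the encoder twice returns the original address, so the mapping is a permutation of the address space.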
16Experimental Result
- Some benchmarks are taken from the Ptolemy
distribution; others come from the MediaBench
suite.
- Platform: ARM software development kit.
- Table 1:
- Addr: total number of distinct addresses.
- Emono: the energy of the monolithic memory that
contains all the data/instructions.
- Epartitioned: total memory energy of a
partitioned memory architecture.
- M = 256, 512, 1024: memory partitioning combined
with address clustering.
17Experimental Result (cont.)
18Experimental Result (cont.)
- Original vs. Clustering (Energy)
19Encoder Overhead Analysis
- Encoders have been synthesized with Synopsys
Design Compiler on a 0.25 µm technology by
STMicroelectronics.
- Power figures (Figure 8) are obtained with
Synopsys Power Compiler.
- The variation of the energy figures over the
various applications is relatively small:
- The complexity of the decoder is basically
independent of the set of addresses that are
clustered.
- The switching activity of the address lines is
very similar for all benchmarks.
20Encoder Overhead Analysis (cont.)
- 16K memory, which dissipates about 375 mW.
- Frequency of 150 MHz.
- Power: 7.5 mW for M = 1024.
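Assuming that the 7.5 mW figure is the encoder power for M = 1024 and that the 375 mW figure is the memory it is compared against (the slide lists both numbers without stating the relation explicitly), the relative overhead works out as:

```latex
\[
\frac{P_{\text{encoder}}}{P_{\text{memory}}} = \frac{7.5\ \text{mW}}{375\ \text{mW}} = 0.02 = 2\%
\]
```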
21Conclusions
- The energy reduction achievable by memory
partitioning can be improved significantly
by increasing the locality of the trace.
- Proposed an architectural solution, called
Address Clustering.
- Experimental results on a set of typical embedded
applications running on an ARM-based system:
- Address Clustering is able to reduce the energy
consumption of a partitioned memory architecture
by 25% on average (maximum 57%) with respect to
the partitioning driven by the original trace.