Title: Minimizing Memory Access Energy in Embedded Systems by Selective Instruction Compression
1. Minimizing Memory Access Energy in Embedded Systems by Selective Instruction Compression
- Luca Benini, Enrico Macii, Alberto Macii, Massimo Poncino
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 5, Oct. 2002
- Presenter: Chi-Hung Lin
2. Abstract
- We propose a technique for reducing the energy spent in the memory-processor interface of an embedded system during the execution of firmware code. The method is based on the idea of compressing the most commonly executed instructions so as to reduce the energy dissipated during memory access.
- Instruction decompression is performed on-the-fly by a hardware block located between processor and memory. No changes to the processor architecture are required; hence, our technique is well suited for systems employing IP cores whose internal architecture cannot be modified.
- We describe a number of decompression schemes and architectures that effectively trade off hardware complexity and static code size increase for memory energy and bandwidth reduction, as proved by the experimental data we have collected by executing several test programs on different design templates.
3. What's the problem?
- Power optimization for embedded systems has become an important topic in recent years. Hardware power minimization is therefore essential, especially when it is targeted at a very high level of abstraction.
4. Introduction
- Several techniques focusing on memory-processor interface power reduction have been proposed in the literature. They can be categorized into two broad classes:
- bus encoding techniques
- memory organization techniques
5. Bus encoding techniques
- Reduce interface power by changing the format of
the information transmitted on the
processor-memory bus. In this way, the switching
activity on the bus gets minimized, and so does
the power.
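As an illustration of the quantity these techniques target, the sketch below (an illustration by this summary, not code from the paper) counts bit transitions on a bus as the Hamming distance between consecutive words; dynamic bus power scales with this switching activity:

```python
def transitions(words, width=32):
    """Count bit flips (switching activity) between consecutive bus words."""
    total = 0
    for prev, cur in zip(words, words[1:]):
        total += bin((prev ^ cur) & ((1 << width) - 1)).count("1")
    return total

# An encoding that makes consecutive words more similar lowers the count.
raw     = [0x00000000, 0xFFFFFFFF, 0x00000000]   # 64 transitions
encoded = [0x00000000, 0x00000001, 0x00000000]   # 2 transitions
assert transitions(raw) > transitions(encoded)
```

Bus encoding schemes reduce this count by re-encoding the transmitted words; memory organization schemes reduce it by reordering what is stored so the generated address streams are naturally low-activity.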
6. Memory organization techniques
- Change the way information is stored in memory so that the address streams generated by the processor already have low transition activity. Also in this case, power savings come solely from reduced switching activity on the bus.
7. An approach by Yoshida: a memory organization technique
- Idea
- Firmware running on a given embedded processor normally uses only a small subset of the instructions supported by the processor.
- Implementation
- Replace such instructions with binary patterns of limited width (i.e., ⌈log2 N⌉ bits, where N is the number of distinct instructions appearing in the code).
- The original instructions are placed in an instruction decompression table (IDT).
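A minimal sketch of this full-dictionary scheme (hypothetical code, not from the paper): every distinct 32-bit instruction gets a ⌈log2 N⌉-bit code, and the IDT maps codes back to the originals:

```python
import math

def build_idt(program):
    """Assign each distinct instruction a compact code; the IDT inverts it."""
    distinct = sorted(set(program))
    n = len(distinct)
    width = max(1, math.ceil(math.log2(n)))  # bits per compressed instruction
    code_of = {instr: code for code, instr in enumerate(distinct)}
    idt = distinct  # idt[code] -> original 32-bit instruction
    return code_of, idt, width

program = [0xE3A00001, 0xE2800001, 0xE3A00001, 0xE12FFF1E]
code_of, idt, width = build_idt(program)
assert width == 2                                   # 3 distinct -> 2 bits
assert all(idt[code_of[i]] == i for i in program)   # decompression is lossless
```

The slide's three objections follow directly: as N grows, the IDT grows, the code width ⌈log2 N⌉ grows, and that width is rarely a multiple of eight.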
8. Hardware
9. The approach of this paper
- Idea
- Yoshida's design has three disadvantages if the number of instructions used by the program becomes large:
- the IDT can become very large
- the bit-width of the compressed instructions may become large
- if the memory is not bit-addressable, values of ⌈log2 N⌉ that are not multiples of eight cannot be handled very efficiently
- Therefore, this paper focuses only on the 256 instructions that are used more often than the others.
10. Complete profiling results
11. The approach of this paper
- Implementation
- We propose to compress only a subset of fixed cardinality (256 elements, in our specific case) of the instructions used by a program.
- Less probable instructions are left unchanged and stored as they are in memory.
- Therefore, both compressed and uncompressed instructions coexist in memory.
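The selection step can be sketched as follows (an illustration, not the paper's tooling): profile the program, keep the 256 most frequent instructions, and replace only those with 1-byte codes. How compressed and uncompressed instructions are distinguished in memory is architecture-dependent in the paper; this sketch merely tags them:

```python
from collections import Counter

def select_and_compress(trace, program, k=256):
    """Compress only the k instructions executed most often (dynamic count).

    Compressed instructions become 1-byte codes; the rest stay as full
    32-bit words.
    """
    hot = [instr for instr, _ in Counter(trace).most_common(k)]
    code_of = {instr: code for code, instr in enumerate(hot)}
    image = []
    for instr in program:
        if instr in code_of:
            image.append(("compressed", code_of[instr]))    # 1 byte
        else:
            image.append(("uncompressed", instr))           # 4 bytes
    return image, hot  # hot doubles as the (at most k-entry) IDT

image, hot = select_and_compress(trace=[1, 1, 1, 2, 2, 3],
                                 program=[1, 2, 3, 4], k=2)
assert image == [("compressed", 0), ("compressed", 1),
                 ("uncompressed", 3), ("uncompressed", 4)]
```

Because k is fixed at 256, the IDT size and the 8-bit code width are both bounded regardless of how many distinct instructions the program uses.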
12. Hardware
13. Evaluation metrics
- Energy (E): the energy required by a program to fetch all of its instructions
- Memory traffic (T): the total number of times the memory buses are used to fetch instructions
- Memory usage (U): the memory size needed to store the executable code
- Compression ratio (R):
- R = 0 indicates no compression
- R = 1 indicates that all instructions of the program have been compressed
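For a tagged program image like the one above, two of these metrics can be computed directly. This sketch assumes (our reading of the R = 0 / R = 1 endpoints, not a formula from the slides) that R is the fraction of the program's instructions that were compressed, and it ignores the IDT's own storage overhead:

```python
def metrics(image):
    """Compression ratio R and memory usage U (bytes) for an image given
    as a list of ('compressed'|'uncompressed', value) pairs."""
    n = len(image)
    compressed = sum(1 for kind, _ in image if kind == "compressed")
    r = compressed / n                          # fraction compressed
    u = compressed * 1 + (n - compressed) * 4   # 1 byte vs. 4 bytes each
    return r, u

image = [("compressed", 7)] * 3 + [("uncompressed", 0xE12FFF1E)]
r, u = metrics(image)
assert r == 0.75 and u == 7   # 3*1 + 1*4 bytes
```

E and T, by contrast, depend on the memory architecture, which is exactly what the next slides vary.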
14. Memory schemes
- We devised three architectural schemes for code compression; they differ in the way memory is organized and accessed:
- Architecture 1
- Architecture 2
- Architecture 3.1
- Architecture 3.2
15. Architecture 1
- The program memory consists of one 8-bit-wide bank.
(Figure: memory layout of the original architecture vs. Architecture 1; the legend marks the instructions that will be compressed and the branch target instructions.)
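With a single 8-bit bank, every byte fetched is one bus access, so the traffic saving of compression can be modeled directly (an illustrative model by this summary; it ignores the branch-target alignment constraints the figure hints at):

```python
def byte_fetches(image):
    """Memory traffic for a single 8-bit bank: every byte is one access.

    A compressed instruction costs one access; an uncompressed one, four.
    """
    return sum(1 if kind == "compressed" else 4 for kind, _ in image)

image = [("compressed", 5), ("uncompressed", 0xE3A00001), ("compressed", 9)]
assert byte_fetches(image) == 6   # 1 + 4 + 1 accesses
# The original architecture needs 4 accesses per instruction: 12 in total.
```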
16. Architecture 2
- Program memory consists of four 8-bit-wide banks.
(Figure: memory layout of the original architecture vs. Architecture 2; the legend marks the instructions that will be compressed and the branch target instructions.)
17. Architecture 3.1
- Program memory is organized as a single 32-bit-wide bank.
(Figure: memory layout of the original architecture vs. Architecture 3.1; the legend marks the instructions that will be compressed and the branch target instructions.)
18. Architecture 3.2
- Program memory is organized as a single 32-bit-wide bank.
(Figure: memory layout of the original architecture vs. Architecture 3.2; the legend marks the instructions that will be compressed and the branch target instructions.)
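With a 32-bit bank, one fetch can deliver up to four compressed (1-byte) instructions at once. The model below is only an illustration by this summary: it packs runs of compressed instructions four per word and lets each uncompressed instruction occupy a word of its own, whereas the paper's Architectures 3.1 and 3.2 differ precisely in how they handle such packing and alignment:

```python
def word_fetches(image):
    """Memory traffic for a single 32-bit bank under a simple packing model."""
    fetches = 0
    pending = 0  # compressed bytes accumulated in the current word
    for kind, _ in image:
        if kind == "compressed":
            if pending == 0:
                fetches += 1          # open a new packed word
            pending = (pending + 1) % 4
        else:
            fetches += 1              # full 32-bit instruction: one word
            pending = 0               # next compressed run starts a new word
    return fetches

image = [("compressed", 0)] * 4 + [("uncompressed", 0xE12FFF1E)]
assert word_fetches(image) == 2       # one packed word + one full word
```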
19. Experimental results: Energy (E)
20. Experimental results: Traffic (T)
21. Experimental results: Memory usage (U)
22. Decompression unit
23. Conclusion
- The approach is based on the selection of a dense subset of the instructions of a program. The instructions in this subset are encoded with 8-bit patterns and stored in memory in place of the original 32-bit instructions; in this way, memory bandwidth is reduced, and so is the energy required to fetch the program from memory.