Transcript and Presenter's Notes

Title: A Decompression Architecture for Low Power Embedded Systems


1
A Decompression Architecture for Low Power
Embedded Systems
Yi-hsin Tseng    Date: 11/06/2007
  • Lekatsas, H., Henkel, J., Wolf, W.
  • Proceedings of the 2000 International Conference
    on Computer Design (ICCD 2000), IEEE

2
Outline
  • Introduction & motivation
  • Code Compression Architecture
  • Decompression Engine Design
  • Experimental results
  • Conclusion & contributions of the paper
  • Our project
  • Relation to CSE520
  • Q & A

3
Introduction & motivation
4
For embedded systems
  • Embedded system architectures are more complicated
    nowadays.
  • Available memory space is smaller.
  • A reduced executable program can also indirectly
    affect the chip in terms of
  • Size
  • Weight
  • Power consumption

5
Why code compression/decompression?
  • Compress the instruction segment of the
    executable running on the embedded system
  • Reduce the memory requirements and bus
    transaction overheads
  • Compression (offline) → decompression (at runtime)

6
Related work on compressed instructions
  • A logarithmic-based compression scheme where
    32-bit instructions map to fixed but smaller
    width compressed instructions.
  • (That scheme reduces the memory area only)
  • Frequently appearing instructions are compressed
    to 8 bits.
  • (fixed length: 8 or 32 bits)

7
The compression method in this paper
  • Gives comprehensive results for the whole system,
    including
  • buses
  • memories (main memory and cache)
  • decompression unit
  • CPU

8
Code Compression Architecture
9
Architecture in this system (Post-cache)
  • The decompression engine sits between the I-cache
    and the CPU, so the I-cache stores compressed code.
  • Reasons:
  • Increase the effective cache size
  • Improve instruction bandwidth

10
Code Compression Architecture
  • Uses SAMC (Semi-adaptive Markov Compression)
    to compress instructions
  • Divides instructions into 4 groups,
    based on the SPARC architecture
  • Appends a short 3-bit code at the beginning of
    each compressed instruction

11
4 Groups of Instructions
  • Group 1
  • instructions with immediates
  • Ex: sub %i1, 2, %g3    set 5000, %g2
  • Group 2
  • branch instructions
  • Ex: be, bne, bl, bg, ...
  • Group 3
  • instructions with no immediates
  • Ex: add %o1, %o2, %g3    st %g1, [%o2]
  • Group 4
  • instructions that are left uncompressed
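A minimal C sketch of how a decoder might use these groups, assuming the
3-bit code from slide 10 selects the group. The tag values and helper names
(get_bits, decode_*) are illustrative assumptions, not the paper's actual
engine:

#include <stdint.h>

/* Hypothetical helpers: read the next n bits of the compressed stream,
 * plus per-group decoders for groups 1-3. */
extern uint32_t get_bits(unsigned n);
extern uint32_t decode_with_immediate(void);   /* group 1 */
extern uint32_t decode_branch(void);           /* group 2 */
extern uint32_t decode_no_immediate(void);     /* group 3 (byte-indexed table) */

/* Placeholder tag values; the slides only say a 3-bit code is prepended
 * to each compressed instruction. */
enum { TAG_IMMEDIATE = 0, TAG_BRANCH = 1, TAG_NO_IMMEDIATE = 2, TAG_UNCOMPRESSED = 3 };

uint32_t decompress_next_instruction(void)
{
    switch (get_bits(3)) {              /* 3-bit group selector */
    case TAG_IMMEDIATE:    return decode_with_immediate();
    case TAG_BRANCH:       return decode_branch();
    case TAG_NO_IMMEDIATE: return decode_no_immediate();
    default:               return get_bits(32);   /* group 4: left uncompressed */
    }
}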

12
Decompression Engine Design (Approach)
13
The key idea
  • Present an architecture for embedded systems that
    decompresses offline-compressed instructions
    at runtime
  • to reduce power consumption
  • with a performance improvement (in most cases)

14
Pipelined Design
15
Pipelined Design (cont)
16
Pipelined Design: group 1 (stage 1)
[Diagram: compressed instructions are input, the decoding
table is indexed, and instructions are forwarded to the
next stage]
17
Pipelined Design: group 1 (stage 2)
18
Pipelined Design: group 1 (stage 3)
19
Pipelined Design: group 1 (stage 4)
20
Pipelined Design: group 2 branch instructions (stage 1)
21
Pipelined Design: group 2 branch instructions (stage 2)
22
Pipelined Design: group 2 branch instructions (stage 3)
23
Pipelined Design: group 2 branch instructions (stage 4)
24
Pipelined Design: group 3 instructions with no
immediates (stage 1)
Instructions with no immediates may appear in pairs
→ compressed into one byte (one byte ↔ up to 64 bits).
256-entry table
8 bits used as the index
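A minimal sketch of the fast-dictionary lookup this slide implies, assuming
a 256-entry table indexed by the single compressed byte, where each entry
holds up to a pair of uncompressed 32-bit SPARC instruction words. The
structure and names are illustrative assumptions:

#include <stdint.h>

struct insn_pair {
    uint32_t first;      /* first decompressed 32-bit instruction */
    uint32_t second;     /* second instruction of the pair, or a NOP
                            placeholder if the byte encodes only one */
};

/* Built offline for the specific application (see slide 31). */
static struct insn_pair fast_dictionary[256];

/* One compressed byte indexes the table and expands to up to 64 bits. */
static inline struct insn_pair decode_no_immediate_byte(uint8_t b)
{
    return fast_dictionary[b];
}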
25
Pipelined Design: group 3 instructions with no
immediates (stage 2)
26
Pipelined Design: group 3 instructions with no
immediates (stage 3)
27
Pipelined Design: group 3 instructions with no
immediates (stage 4)
28
Pipelined Design: group 4 uncompressed instructions
29
Experimental results
30
Experimental results
  • Uses different applications:
  • an algorithm for computing 3D vectors for a
    motion picture ("i3d")
  • a complete MPEG-II encoder ("mpeg")
  • a smoothing algorithm for digital images ("smo")
  • a trick animation algorithm ("trick")
  • A simulation tool written in C for obtaining
    performance data for the decompression engine

31
Experimental results (cont.)
  • The decompression engine is application-specific.
  • For each application, a decoding table and a fast
    dictionary table are built that decompress that
    particular application only (see the sketch below).
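A rough C sketch of the offline, per-application table construction the
slide describes, assuming the fast dictionary simply keeps the most
frequently occurring instruction words of that one executable. The data
structures and selection policy are assumptions, not the paper's exact
algorithm:

#include <stdint.h>
#include <stdlib.h>

struct freq { uint32_t insn; unsigned count; };

static int by_count_desc(const void *a, const void *b)
{
    const struct freq *fa = a, *fb = b;
    return (fb->count > fa->count) - (fb->count < fa->count);
}

/* Fill dict[0..dict_size-1] with the most frequent 32-bit instruction
 * words found in the program image prog[0..n-1]. */
void build_fast_dictionary(const uint32_t *prog, size_t n,
                           uint32_t *dict, size_t dict_size)
{
    struct freq *tab = calloc(n, sizeof *tab);
    size_t distinct = 0;

    for (size_t i = 0; i < n; i++) {          /* count each distinct word */
        size_t j = 0;
        while (j < distinct && tab[j].insn != prog[i])
            j++;
        if (j == distinct)
            tab[distinct++].insn = prog[i];
        tab[j].count++;
    }
    qsort(tab, distinct, sizeof *tab, by_count_desc);
    for (size_t i = 0; i < dict_size && i < distinct; i++)
        dict[i] = tab[i].insn;                /* most frequent first */
    free(tab);
}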

32
Experimental results for energy and performance
33
Why worse performance on "smo" with a 512-byte
instruction cache?
  • "smo" does not require much memory (it executes
    in tight loops).
  • It generates very few misses at this cache size.
  • The compressed architecture therefore cannot help
    an already almost perfect hit ratio, and the
    slowdown caused by the decompression engine
    prevails.
34
Conclusion & contributions of the paper
  • This paper designed an instruction decompression
    engine as a soft IP core for low power embedded
    systems.
  • Applications run faster than on systems with no
    code compression (due to improved cache
    performance).
  • Lower power consumption (due to smaller memory
    requirements for the executable program and a
    smaller number of memory accesses)

35
Relation to CSE520
  • Improves system performance and power consumption
    by using a pipelined architecture in the system.
  • A different architecture design for lower power
    consumption on embedded systems.
  • Smaller caches perform better with the compressed
    architecture; larger caches perform better with
    the uncompressed architecture.
  • Cache hit ratio

36
Our project
  • Goal
  • How to improve the efficiency of power management
    in embedded multicore systems
  • Idea
  • Use different power modes within a given power
    budget, with a global power management policy
    (in Jun Shen's presentation)
  • Use the SAMC algorithm and this decompression
    architecture as another factor to simulate (this
    paper)
  • How?
  • SimpleScalar tool set
  • Try simple functions at first, then try the
    different power modes

37
Thank you!  Q & A
38
Backup Slides
39
Critique
  • The decompression engine will slow down the system
    if the cache generates very few misses at a given
    cache size.

40
Post-cache vs. pre-cache
Pre-cache: the instructions stored in the I-cache
are decompressed.
Post-cache: the instructions stored in the I-cache
are still compressed.
41
Problems for the post-cache architecture
  • Memory relocation
  • Compression changes the instruction locations in
    memory, so the addresses used by the program must
    be remapped (see the sketch below).
  • In the pre-cache architecture
  • Decompression is done before instructions are
    fetched into the I-cache, so the addresses in the
    I-cache need not be fixed.
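A small sketch of the relocation issue, assuming each instruction shrinks
to a variable number of bytes: the byte offset of every instruction shifts,
so targets computed for the original fixed 4-byte layout no longer point at
the right place. The size table and function names are hypothetical:

#include <stdint.h>
#include <stddef.h>

/* compressed_size[i] = number of bytes instruction i occupies after
 * compression (it would be a constant 4 in the uncompressed image).
 * new_addr[i] receives its byte offset in the compressed image. */
void compute_compressed_addresses(const uint8_t *compressed_size, size_t n,
                                  uint32_t *new_addr)
{
    uint32_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        new_addr[i] = offset;             /* where instruction i now starts */
        offset += compressed_size[i];
    }
    /* A branch whose target used to be 4*t must now be redirected to
     * new_addr[t]; the pre-cache design avoids this because the I-cache
     * already holds decompressed, fixed-size instructions. */
}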

42
SPARC Instruction Set
  • Instruction groups
  • load/store (ld, st, ...)
  • Move data from memory to a register / Move data
    from a register to memory
  • integer arithmetic (add, sub, ...)
  • Arithmetic operations on data in registers
  • bit-wise logical (and, or, xor, ...)
  • Logical operations on data in registers
  • bit-wise shift (sll, srl, ...)
  • Shift bits of data in registers
  • integer branch (be, bne, bl, bg, ...)
  • Trap (ta, te, ...)
  • control transfer (call, save, ...)
  • floating point (ldf, stf, fadds, fsubs, ...)
  • floating point branch (fbe, fbne, fbl, fbg, ...)

43
SPARC Instruction Example