1
Reducing Code Size with Run-time Decompression
  • Charles Lefurgy, Eva Piccininni,
  • and Trevor Mudge
  • Advanced Computer Architecture Laboratory
  • Electrical Engineering and Computer Science Dept.
  • The University of Michigan, Ann Arbor
  • High-Performance Computer Architecture (HPCA-6)
  • January 10-12, 2000

2
Motivation
  • Problem: embedded code size
    • Constraints: cost, area, and power
    • Fit program in on-chip memory
  • Compilers vs. hand-coded assembly
    • Portability
    • Development costs
    • Code bloat
  • Solution: code compression
    • Reduce compiled code size
    • Take advantage of instruction repetition
  • Implementation questions
    • Hardware or software?
    • Code size?
    • Execution speed?

[Figure: two embedded-system block diagrams (CPU, RAM, ROM, I/O); in the first, the ROM holds the original program; in the second, a smaller ROM holds the compressed program.]
3
Software decompression
  • Previous work
    • Decompression unit: whole program [Taunton91]
      • No memory savings
    • Decompression unit: procedures [Kirovski97, Ernst97]
      • Requires large decompression memory
      • Fragmentation of decompression memory
      • Slow
  • Our work
    • Decompression unit: 1 or 2 cache lines
    • High-performance focus
    • New profiling method

4
Dictionary compression algorithm
  • Goal: fast decompression
  • Dictionary contains each unique instruction once
  • Replace program instructions with short indices

[Figure: the original program's .text segment holds 32-bit instructions; the compressed program's .text segment holds 16-bit indices into a .dictionary segment of unique 32-bit instructions. Repeated instructions such as "lw r15,r3" compress to the same index.]
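Conceptually the compressor is a deduplicating table build. The C sketch below is illustrative only (not the authors' tool); `dict_compress` is an invented name, and the 65,536-entry cap simply follows from the 16-bit index width shown above.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_DICT 65536              /* 16-bit index => at most 2^16 entries */

/* Hypothetical compressor: replace each 32-bit instruction with a 16-bit
 * index into a dictionary of unique instruction words. Returns the number
 * of indices written, or 0 if there are too many unique instructions. */
static size_t dict_compress(const uint32_t *text, size_t n,
                            uint32_t *dict, size_t *dict_len,
                            uint16_t *indices)
{
    *dict_len = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        /* Linear search keeps the sketch short; a real tool would hash. */
        for (j = 0; j < *dict_len; j++)
            if (dict[j] == text[i])
                break;
        if (j == *dict_len) {            /* new unique instruction */
            if (*dict_len == MAX_DICT)
                return 0;                /* index width exceeded */
            dict[(*dict_len)++] = text[i];
        }
        indices[i] = (uint16_t)j;
    }
    return n;
}
```

Decompression is then a single table lookup, `dict[indices[i]]`, which is what makes this scheme fast enough to run inside an I-cache miss handler.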
5
Decompression
  • Algorithm
    1. I-cache miss invokes decompressor (exception handler)
    2. Fetch index
    3. Fetch dictionary word
    4. Place instruction in I-cache (special instruction)
  • Write directly into I-cache
  • Decompressed instructions only exist in the I-cache

[Figure: on an I-cache miss, the processor's exception handler reads a 16-bit index (e.g. 5) from the indices in memory, looks up the 32-bit instruction (e.g. "add r1,r2,r3") in the dictionary via the D-cache, and writes it directly into the I-cache.]
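A minimal sketch of the miss handler, under stated assumptions: `icache_store` is a hypothetical primitive wrapping the special store-into-I-cache instruction named on the conclusions slide, and `TEXT_BASE` is an invented base address for the compressed .text segment.

```c
#include <stdint.h>

#define LINE_WORDS 8                 /* 32 B I-cache line of 4 B instructions */
#define TEXT_BASE  0x00010000u       /* hypothetical base address of .text */

extern const uint32_t dict[];        /* .dictionary segment */
extern const uint16_t indices[];     /* .text segment: 16-bit indices */

/* Hypothetical primitive: write one decompressed instruction directly
 * into the I-cache at the given address. */
void icache_store(uint32_t addr, uint32_t insn);

/* I-cache miss exception handler: rebuild the missing cache line. */
void decompress_line(uint32_t miss_addr)
{
    uint32_t line  = miss_addr & ~31u;           /* align to the 32 B line */
    uint32_t first = (line - TEXT_BASE) / 4;     /* first instruction number */

    for (uint32_t k = 0; k < LINE_WORDS; k++)    /* steps 2-4 for each word */
        icache_store(line + 4u * k, dict[indices[first + k]]);

    /* The decompressed instructions now exist only in the I-cache;
     * main memory still holds only the 16-bit indices. */
}
```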
6
CodePack
  • Overview
    • Developed by IBM for PowerPC
    • First system with instruction-stream compression
    • Decompresses during an I-cache miss
  • Software CodePack: this work implements the decompressor in software

7
Compression ratio
  • Compression ratio = compressed size / original size (smaller is better)
  • CodePack: 55-63%
  • Dictionary: 65-82%

8
Simulation environment
  • SimpleScalar
  • Pipeline: 5-stage, in-order
  • I-cache: 16 KB, 32 B lines, 2-way
  • D-cache: 8 KB, 16 B lines, 2-way
  • Memory: 10-cycle latency, 2-cycle rate

9
Performance
  • CodePack: very high overhead
  • Reduce overhead by reducing cache misses

10
Cache miss
  • Control slowdown by optimizing I-cache miss ratio

11
Selective compression
  • Hybrid programs
    • Only compress some procedures
    • Trade size for speed
    • Avoid decompression overhead
  • Profile methods
    • Count dynamic instructions
      • Example: Thumb
      • Use when compressed code executes more instructions
      • Goal: reduce the number of executed instructions
    • Count cache misses (new; see the sketch after this list)
      • Example: CodePack
      • Use when compressed code has longer cache-miss latency
      • Goal: reduce cache-miss latency
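One way to read the cache-miss profile idea in code. This C sketch uses invented names (`proc_profile`, `select_compression`) and assumes per-procedure miss counts from a profiling run: the procedures that miss most stay native, the rest get compressed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-procedure profile record. */
struct proc_profile {
    const char *name;
    uint64_t    icache_misses;   /* from a profiling run */
    int         compress;        /* decision output */
};

static int by_misses_desc(const void *a, const void *b)
{
    const struct proc_profile *pa = a, *pb = b;
    return (pa->icache_misses < pb->icache_misses) -
           (pa->icache_misses > pb->icache_misses);
}

/* Leave the keep_native procedures with the most I-cache misses as
 * native code (they would pay the decompression penalty most often)
 * and mark everything else for compression. */
void select_compression(struct proc_profile *p, size_t n, size_t keep_native)
{
    qsort(p, n, sizeof *p, by_misses_desc);
    for (size_t i = 0; i < n; i++)
        p[i].compress = (i >= keep_native);
}
```

The size/speed trade-off is then a single knob: raising `keep_native` buys speed at the cost of code size.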
12
Cache miss profiling
  • Cache-miss profiling reduces overhead by 50%
  • Loop-oriented benchmarks benefit most
  • Approaches the performance of native code

13
CodePack vs. Dictionary
  • More compression may give better performance
  • CodePack produces smaller code than Dictionary compression
  • Even with some procedures left native, CodePack is smaller
  • CodePack is faster because its size budget allows more native code

14
Conclusions
  • High-performance software decompression is possible
  • Dictionary is faster than CodePack, but at a 5-25% cost in compression ratio
  • Hardware support
    • I-cache miss exception
    • Store-instruction instruction (writes directly into the I-cache)
  • Tune performance by reducing cache misses
    • Cache size
    • Code placement
    • Selective compression
  • Use a cache-miss profile for loop-oriented benchmarks
  • Code placement affects decompression overhead
  • Future work: unify code placement and compression

15
Web page
http://www.eecs.umich.edu/compress
16
Code placement
[Figure: memory layouts for the two schemes. Whole compression: memory holds only the compressed code, which decompresses, in the original order, into a decompress region that exists only in the L1 cache. Selective compression: memory holds a native region alongside the compressed code, and compressed procedures decompress into a separate decompress region, so instructions execute in a different order than in the original code.]
17
Hardware or software decompression?
  • Hardware
    • Fast translation
    • Potential speedup
    • Compression can be tuned for each benchmark
  • Software
    • Low cost
    • Re-targetable to new algorithms
    • Even a new algorithm for each benchmark
    • Slow

18
CodePack encoding
  • Each 32-bit instruction is split into two 16-bit words
  • Each 16-bit word is compressed separately

  Encoding for upper 16 bits             Encoding for lower 16 bits
  Tag  Index bits           Entries      Tag  Index bits           Entries
  00   xxx                  8            00   (none)               1 (encodes zero)
  01   xxxxx                32           01   xxxx                 16
  100  xxxxxx               64           100  xxxxx                32
  101  xxxxxxx              128          101  xxxxxxx              128
  110  xxxxxxxx             256          110  xxxxxxxx             256
  111  16 raw bits (escape)              111  16 raw bits (escape)

A 2- or 3-bit tag selects the encoding; the x bits are an index into a dictionary of common 16-bit values, and the 111 escape carries the 16 bits verbatim.
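The tags form a prefix code, so decoding can be a short chain of bit reads. The C sketch below is an assumption-laden illustration: `get_bits` (a hypothetical MSB-first bitstream reader) and `dict_by_tag` (one sub-dictionary per tag class) are invented names, and the real CodePack dictionary layout differs in detail.

```c
#include <stdint.h>

/* Hypothetical helpers: get_bits returns the next n bits of the
 * compressed stream, MSB first; dict_by_tag holds one sub-dictionary
 * of 16-bit values per tag class. */
uint32_t get_bits(unsigned n);
extern const uint16_t *dict_by_tag[5];

/* Decode one 16-bit half of an instruction per the prefix code above. */
uint16_t decode_half(int is_low_half)
{
    unsigned t = get_bits(2);
    if (t == 0)                              /* tag 00 */
        return is_low_half ? 0               /* lower half: literal zero */
                           : dict_by_tag[0][get_bits(3)];
    if (t == 1)                              /* tag 01 */
        return dict_by_tag[1][get_bits(is_low_half ? 4 : 5)];

    t = (t << 1) | get_bits(1);              /* 3-bit tags: 100/101/110/111 */
    switch (t) {
    case 4:  return dict_by_tag[2][get_bits(is_low_half ? 5 : 6)]; /* 100 */
    case 5:  return dict_by_tag[3][get_bits(7)];                   /* 101 */
    case 6:  return dict_by_tag[4][get_bits(8)];                   /* 110 */
    default: return (uint16_t)get_bits(16);  /* 111: escape, raw 16 bits */
    }
}
```

A full instruction is then `decode_half(0)` for the upper 16 bits concatenated with `decode_half(1)` for the lower 16 bits.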
19
CodePack decompression
[Figure: CodePack decompression flow. The L1 I-cache miss address is split into fields at bits 31:26, 25:6, and 5:0. The middle field selects an entry in the index table (in main memory), which yields the byte-aligned address of the compressed bytes (also in main memory) for one compression block of 16 instructions. Each compressed instruction consists of a hi tag and hi index plus a low tag and low index; the indices select 16-bit values from the high and low dictionaries, which are concatenated into the high and low halves of the native instruction.]
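A sketch of the address path in the figure, with invented names (`index_table`, `compressed`, `fetch_block`): the miss address selects a compression block, the index table maps the block number to the byte-aligned address of its compressed bytes, and the offset within the block selects which compressed instruction to expand.

```c
#include <stdint.h>

extern const uint32_t index_table[];   /* in main memory: block # -> byte offset */
extern const uint8_t  compressed[];    /* byte-aligned compressed bytes */

/* Address path from the figure: bits 25:6 of the miss address select a
 * compression block (16 instructions = 64 bytes of native code) and
 * bits 5:0 the offset within it. */
void fetch_block(uint32_t miss_addr,
                 const uint8_t **block_bytes, unsigned *insn_in_block)
{
    uint32_t block  = (miss_addr >> 6) & 0xFFFFFu;  /* bits 25:6 */
    uint32_t offset =  miss_addr       & 0x3Fu;     /* bits 5:0  */

    *block_bytes   = compressed + index_table[block]; /* fetch index, then bytes */
    *insn_in_block = offset / 4u;                     /* which of 16 instructions */
    /* Decoding then walks tag/index pairs (decode_half above), joining
     * high and low 16-bit halves into native instructions. */
}
```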