Automated Synthesis of Efficient Binary Decoders for Retargetable Software Toolkits

1
Automated Synthesis of Efficient Binary Decoders
for Retargetable Software Toolkits
  • Wei Qin, Sharad Malik
  • Princeton University

2
Overview
  • Increasing number of ASIPs
  • Software tool chain to exploit programmability

[Figure: machine descriptions feeding a retargetable tool chain that operates on machine code. Labels: synthesis tools, validation tools, compiler, assembler/linker, disassembler, debugger, instruction set simulator, cycle simulator, binary translator.]

3
Outline
  • Motivation
  • Background
  • Related Work
  • Problem Formulation
  • Decoder Construction
  • Experimental Results
  • Conclusion

4
Motivation
  • Software binary decoding
      • Sequential, whereas hardware decodes in parallel
      • Control-flow intensive
      • Error-prone for complex instruction sets
      • Can be a performance bottleneck
          • 2-4 times slower for instruction set simulation (ISS)
  • An efficient decoder synthesis algorithm is desirable
      • Focus on opcode decoding
      • Operand decoding is straightforward thereafter

5
Background
  • Instruction pattern
  • Common software decoding schemes
      • Pattern testing
          • (inst_word & a_mask) == a_signature → instruction a
      • Table lookup
          • inst_table[(inst_word >> shift) & mask] → get id
          • or
          • switch ((inst_word >> shift) & mask) { case id1: handle id1 ... }

----00101000--------------------  add pattern
00001111111100000000000000000000  add mask
00000010100000000000000000000000  add signature
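
The two schemes above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names (pattern_to_mask_signature, matches, table_lookup) are hypothetical, and choosing bits 27..20 as the table field is an assumption made only because those are the significant bits of the add pattern.

```python
def pattern_to_mask_signature(pattern):
    """Convert a pattern over '0', '1', '-' (MSB first) into (mask, signature)."""
    mask = 0
    signature = 0
    for ch in pattern:
        mask <<= 1
        signature <<= 1
        if ch != "-":
            mask |= 1                 # this bit is significant
            signature |= int(ch)      # value the bit must have
    return mask, signature


def matches(inst_word, mask, signature):
    """Pattern testing: compare only the significant bits."""
    return (inst_word & mask) == signature


ADD_PATTERN = "----00101000--------------------"
ADD_MASK, ADD_SIGNATURE = pattern_to_mask_signature(ADD_PATTERN)

# Table lookup (assumed field: bits 27..20, i.e. shift 20 and an 8-bit mask).
SHIFT, FIELD_MASK = 20, 0xFF
inst_table = [None] * (FIELD_MASK + 1)
inst_table[ADD_SIGNATURE >> SHIFT] = "add"


def table_lookup(inst_word):
    """Index the table with the extracted field to get an instruction id."""
    return inst_table[(inst_word >> SHIFT) & FIELD_MASK]
```

With the word 11100010100000110011000000000100 (0xE2833004) both the pattern test and the table lookup identify an add.
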
6
Background (contd)
  • Masks of the ARM instruction patterns
  • Which bits to look into first?

00001110000100000000000000000000
00001111000000000000000000000000
00001111010100000000000000000000
00001111010100000000000000010000
00001111111100000000000000000000
00001110010100000000000011110000
00001111111100000000000000010000
00001111111100000000000010010000
00001111111100000000000011110000
00001111111100001111000000000000
00001110010100000000111111110000
00001111111100000000111111110000
00001111111100001111111111110000
00001111111111110000111111111111

7
Related Work
  • Sequential decoder [Hadjiyiannis 99]
      • Linear search through the list of instruction patterns
      • Foolproof, straightforward
      • Poor performance
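
A list-search decoder of this kind can be sketched as follows. The (name, mask, signature) triples are taken from the IDEF lines on the Experimental Results slide; the function name decode_sequential, the list order, and the returned test count are illustrative assumptions, not part of the cited work.

```python
# (name, mask, signature) triples copied from the IDEF lines on the
# Experimental Results slide; the order of the list is arbitrary here.
ENTRIES = [
    ("ld1_imm_p", 0x0F500000, 0x05100000),
    ("mov_2", 0x0FF00010, 0x01A00000),
    ("branch", 0x0F000000, 0x0A000000),
]


def decode_sequential(inst_word, entries=ENTRIES):
    """Try each pattern in turn; return (name, number of pattern tests)."""
    for tests, (name, mask, signature) in enumerate(entries, start=1):
        if (inst_word & mask) == signature:
            return name, tests
    return None, len(entries)      # no pattern matched
```

The number of tests grows with the position of the matching entry in the list, which is where the poor performance comes from.
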

8
Related Work
  • Language-guided decoder generation
  • SLED in the New Jersey Machine-Code Toolkit [Ramsey 95]
  • Group fields in hierarchical tables
  • Good quality
  • Language dependent

[Figure: hierarchical tables over grouped fields. Field masks: 11110000000000000000000000000000 and 00001111110000000000000010000011. Table labels: alu, mem; ldw, ldh, ldb, stw; mode1, mode2; imm, reg.]

9
Related Work (contd)
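  • Caching decoding results in simulation [Nohl 02]
      • Exploits locality to avoid repeated decoding
      • Tolerates a slow decoder
      • Drawbacks: large cache, poor worst-case performance

[Figure: decode cache. The PC indexes the cache; on a hit the cached decoding result is reused, otherwise the instruction word (IW) is decoded.]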
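
The caching idea can be illustrated with a small sketch. It assumes a cache keyed by PC whose entries are validated against the instruction word; the class name CachedDecoder, the unbounded dictionary, and the stand-in slow_decode function are all hypothetical and are not taken from the cited work.

```python
class CachedDecoder:
    """Wrap a (possibly slow) decoder with a cache indexed by program counter."""

    def __init__(self, decode):
        self._decode = decode
        self._cache = {}       # pc -> (instruction word, decoding result)
        self.misses = 0

    def lookup(self, pc, inst_word):
        hit = self._cache.get(pc)
        if hit is not None and hit[0] == inst_word:
            return hit[1]                      # hit: reuse the decoding result
        self.misses += 1                       # miss: decode and remember
        result = self._decode(inst_word)
        self._cache[pc] = (inst_word, result)
        return result


def slow_decode(inst_word):
    """Stand-in decoder (hypothetical): recognises only the add pattern."""
    return "add" if (inst_word & 0x0FF00000) == 0x02800000 else "unknown"
```

A loop body that is executed repeatedly pays the decoding cost once per address; code that is executed only once gains nothing, which is the worst case.
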
10
Related Work (contd)
  • Decision tree based decoding [Theiling 01]
      • Decodes only bits that are significant in all patterns
      • Relatively tall tree
      • Deadlock on certain patterns

Example pattern sets:
000--- a   001--- b   01---- c   10---- d
0-1--- a   -10--- b   10---- c   (no bit is significant in all three patterns)

11
Problem Formulation
  • Definitions
      • Bit pattern p ∈ {0,1,-}^n ↔ cube
      • Bit string s ∈ {0,1}^n ↔ minterm
      • Pattern match ↔ minterm in the cube
  • Decoding entry
      • Triple of (pattern, label, probability)
  • Well-formed entry set
      • Entries with different labels do not overlap
  • Binary decoder
      • Maps bit strings to matching entries

----00101000--------------------  add pattern
11100010100000110011000000000100  add r3, r3, 4
(----00101000--------------------, add, 0.15)  decoding entry
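
The definitions above translate directly into string operations. The sketch below assumes patterns and bit strings are equal-length Python strings written most significant bit first; the function names match, overlap and is_well_formed are illustrative.

```python
from itertools import combinations


def match(pattern, bits):
    """A bit string matches when it lies inside the cube given by the pattern."""
    return all(p in ("-", b) for p, b in zip(pattern, bits))


def overlap(p, q):
    """Two cubes intersect unless some position holds 0 in one and 1 in the other."""
    return all(a == "-" or b == "-" or a == b for a, b in zip(p, q))


def is_well_formed(entries):
    """Entries are (pattern, label, probability); different labels must not overlap."""
    return not any(
        label1 != label2 and overlap(pattern1, pattern2)
        for (pattern1, label1, _), (pattern2, label2, _) in combinations(entries, 2)
    )
```

For instance, match("----00101000--------------------", "11100010100000110011000000000100") holds, so the decoder maps that bit string to the add entry.
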
12
Problem Formulation (contd)
  • Decoding tree
  • (N ∪ D, Edges)

[Figure: general decoding tree. Inner nodes apply decoding functions f1(i) to f4(i) to the instruction word i; leaves hold entries Ei, Ej, Ek.]

13
Problem Formulation (contd)
  • Decoding cost modeling
      • Execution time
          • Average decoding height
      • Memory consumption
          • Not 2^n
          • Small enough to fit in a small part of the cache
  • Problem statement
      • Input: well-formed decoding entry set and memory constraint
      • Output: decoding tree with minimum Havg

Havg = Σ p_i · D_i, where p_i is the probability of entry e_i and D_i is its decoding height

14
Decoder Construction
  • Decision function candidates
      • Pattern decoding → two children
          • (iw & mask) == signature
          • Total number: 3^n - 1
      • Table decoding → 2^m children
          • table[(iw >> shift) & bit_mask]
          • Contiguous bits
          • Total number: n(n+1)/2
      • Simple, low execution time, effective
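
The two candidate counts can be checked by enumeration for small n. This sketch assumes that a pattern function is any pattern over {0,1,-}^n other than the all-don't-care one, and that a table function is any contiguous field of 1 to n bits; the function names are illustrative.

```python
from itertools import product


def pattern_functions(n):
    """All (mask, signature) pairs over n bits, excluding the all-don't-care pattern."""
    functions = []
    for pattern in product("01-", repeat=n):
        if all(ch == "-" for ch in pattern):
            continue
        mask = int("".join("0" if ch == "-" else "1" for ch in pattern), 2)
        signature = int("".join("1" if ch == "1" else "0" for ch in pattern), 2)
        functions.append((mask, signature))
    return functions


def table_functions(n):
    """All (shift, bit_mask) pairs that select a contiguous field of 1..n bits."""
    return [(shift, (1 << width) - 1)
            for width in range(1, n + 1)
            for shift in range(n - width + 1)]
```

For a 32-bit instruction word this gives 528 table functions but roughly 1.85e15 pattern functions, which is why exhaustive search over pattern functions is impractical.
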

15
Decoder Construction (contd)
  • Decoding Tree Example

(000,l1,.25) (001,l2,.25) (01-,l3,.25) (1--,l4,.25)
Havg = 1.5

16
Decoder Construction (contd)
  • Construction of decoding tree → brute force
  • Problems
      • Too many function candidates
          • 3^n - 1 pattern functions, n(n+1)/2 table functions
          • Prune search space
      • Too deep recursion
          • Estimate costs for subtrees

foreach decoding_function_candidate
    divide entry set
    recursively construct trees for subsets
    sum weighted costs of sub-trees and the function itself
select the function with the least overall cost
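
The brute-force loop above can be made concrete for tiny inputs. The sketch below is a simplification under stated assumptions, not the algorithm from the talk: it considers table decoding functions only, uses the average decoding height as the sole cost, and replaces the memory constraint with a hypothetical cap max_width on the table index width. The names split and best_tree are illustrative, and the input is assumed to be a well-formed entry set.

```python
def split(entries, shift, width):
    """Divide entries by the value of the bit field [shift+width-1 : shift].

    A don't-care inside the field is expanded to both values, and the entry's
    probability is shared equally among the expansions.
    """
    n = len(entries[0][0])
    lo, hi = n - shift - width, n - shift       # string slice covering the field
    children = []
    for value in range(2 ** width):
        bits = format(value, "0{}b".format(width))
        child = []
        for pattern, label, prob in entries:
            field = pattern[lo:hi]
            if all(f in ("-", b) for f, b in zip(field, bits)):
                specialised = pattern[:lo] + bits + pattern[hi:]
                child.append((specialised, label, prob / 2 ** field.count("-")))
        children.append(tuple(child))
    return children


def best_tree(entries, max_width):
    """Return (probability-weighted height, tree) with the least weighted height.

    A tree is a label (leaf), None (unused table slot), or
    (shift, width, [child trees]) for a table decoding node.
    """
    entries = tuple(entries)
    labels = {label for _, label, _ in entries}
    if len(labels) <= 1:                         # nothing left to distinguish
        return 0.0, (labels.pop() if labels else None)
    n = len(entries[0][0])
    weight = sum(prob for _, _, prob in entries)
    best = None
    for width in range(1, max_width + 1):        # foreach candidate function
        for shift in range(n - width + 1):
            children = split(entries, shift, width)
            if entries in children:              # function makes no progress
                continue
            results = [best_tree(child, max_width) for child in children]
            cost = weight + sum(c for c, _ in results)
            if best is None or cost < best[0]:
                best = (cost, (shift, width, [t for _, t in results]))
    return best


EXAMPLE = [("000", "l1", .25), ("001", "l2", .25), ("01-", "l3", .25), ("1--", "l4", .25)]
two_level = best_tree(EXAMPLE, max_width=2)
one_level = best_tree(EXAMPLE, max_width=3)
```

On the four-entry example used in these slides, capping the index at two bits yields the two-level tree with Havg = 1.5, while allowing three bits yields the single eight-entry table with Havg = 1. The search is exponential, which is the motivation for pruning candidates and estimating subtree costs.
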
17
Field Growing Heuristics
  • Prune function candidates
  • Field growing heuristics to prune function space

------1-------------------------
18
Tree Cost Estimation
  • Subtree cost estimation
      • Use cost of binary decoding tree as a relative metric
  • Tree height estimate
      • Huffman tree as a lower bound for binary tree height
  • Memory consumption estimation
      • Internal tree nodes
      • Decoding tables
      • Binary tree
          • E - 1 nodes
          • 0 tables
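
The Huffman height used as the estimate can be computed without building the tree. The sketch below uses the standard Huffman merge with Python's heapq; the function name huffman_average_height is illustrative, and the result is the probability-weighted leaf depth (not normalised if the probabilities do not sum to 1).

```python
import heapq


def huffman_average_height(probabilities):
    """Weighted average leaf depth of a Huffman tree over the given probabilities."""
    heap = list(probabilities)
    if len(heap) < 2:
        return 0.0                     # a single entry needs no decoding step
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged                # each merge adds one level above its leaves
        heapq.heappush(heap, merged)
    return total
```

For four entries of probability 0.25 each the estimate is 2.0, and for probabilities 0.2, 0.2, 0.6 it is 1.4. Table decoding nodes have more than two children, so real decoding trees can be shallower than this binary-tree figure, which is why it serves as a relative metric.
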

19
Tree Cost Estimation (contd)
  • Total memory of using a decoding function
  • Memory efficiency ratio
  • Overall cost function of a decoding function

p_i: probability of sub-tree i; H_i: Huffman tree height of sub-tree i; λ: memory penalty factor

20
Decoder Construction (contd)
  • Decoding Tree Example

(000,l1,.25) (001,l2,.25) (01-,l3,.25) (1--,l4,.25)
Havg = 1.5
Memory: 2 nodes + 4 table entries = 6 units

21
Decoder Construction (contd)
  • Decoding Tree Alternative

Root: (000,l1,.25) (001,l2,.25) (01-,l3,.25) (1--,l4,.25), table indexed by (inst & 7)
  000 → (000,l1,.25)
  001 → (001,l2,.25)
  010 → (010,l3,.125)
  011 → (011,l3,.125)
  100 → (100,l4,.0625)
  101 → (101,l4,.0625)
  110 → (110,l4,.0625)
  111 → (111,l4,.0625)
Havg = 1
Memory: 1 node + 8 table entries = 9 units
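
The unit counts quoted for the two example trees follow from charging one unit per tree node and one unit per table entry. The sketch below encodes that accounting; the dictionary representation is hypothetical, and treating the second node of the two-level tree as a pattern node (which needs no table) is an assumption made to agree with "2 nodes 4 table entries".

```python
def memory_units(node):
    """One unit per inner node plus one unit per table entry; leaves are free."""
    if not isinstance(node, dict):              # leaf: just a label
        return 0
    table_entries = len(node["children"]) if node["kind"] == "table" else 0
    return 1 + table_entries + sum(memory_units(c) for c in node["children"])


# Two-level tree: a 4-entry table at the root, then a pattern test for l1 vs l2.
two_level = {
    "kind": "table",
    "children": [{"kind": "pattern", "children": ["l1", "l2"]}, "l3", "l4", "l4"],
}

# Alternative: a single 8-entry table.
one_level = {
    "kind": "table",
    "children": ["l1", "l2", "l3", "l3", "l4", "l4", "l4", "l4"],
}
```

The alternative lowers Havg from 1.5 to 1 at the price of 9 units instead of 6, which is the trade-off the memory penalty factor controls.
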
22
Decoder Construction (contd)
  • Theiling tree

(inst & 4): (000,l1,.25) (001,l2,.25) (01-,l3,.25) (1--,l4,.25)
  1 → (1--,l4,.25)
  0 → (inst & 2): (000,l1,.25) (001,l2,.25) (01-,l3,.25)
        1 → (01-,l3,.25)
        0 → (inst & 1): (000,l1,.25) (001,l2,.25)
              0 → (000,l1,.25)
              1 → (001,l2,.25)
Havg = 2

23
Decoder Construction (contd)
  • Deadlock breaking

Root: (0-1---,a,.2) (-10---,b,.2) (10----,c,.6), table indexed by (inst >> 4) & 3
  00 → (001---,a,.1)
  01 → (011---,a,.1) (010---,b,.1), decoded by (inst >> 3) & 1
         0 → (010---,b,.1)
         1 → (011---,a,.1)
  10 → (10----,c,.6)
  11 → (110---,b,.1)
Havg = 1.2
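
The tree on this slide can be written out as a two-level decoder and checked against the three original patterns. This is an illustrative transcription of the figure for 6-bit words; the names decode and reference_label are hypothetical.

```python
ENTRIES = [("0-1---", "a"), ("-10---", "b"), ("10----", "c")]


def decode(inst):
    """Two-level decoder: first (inst >> 4) & 3, then (inst >> 3) & 1 if needed."""
    index = (inst >> 4) & 3
    if index == 0b00:
        return "a"
    if index == 0b01:
        return "a" if (inst >> 3) & 1 else "b"
    if index == 0b10:
        return "c"
    return "b"


def reference_label(inst):
    """Label found by testing the original patterns, or None for unused encodings."""
    bits = format(inst, "06b")
    for pattern, label in ENTRIES:
        if all(p in ("-", b) for p, b in zip(pattern, bits)):
            return label
    return None


mismatches = [i for i in range(64)
              if reference_label(i) is not None and decode(i) != reference_label(i)]

# Average height: the split halves of a and b under index 01 need two steps.
havg = 0.1 * 1 + 0.1 * 2 + 0.1 * 2 + 0.6 * 1 + 0.1 * 1
```

Splitting the patterns for a and b on the don't-care bits gives the root table a usable field even though no single bit is significant in all three patterns.
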
24
Experimental Results
  • Two ISAs
      • ARM: 137 instructions, 50 unused patterns
      • PowerPC: 148 instructions, 130 unused patterns
  • Benchmarks
      • Training set: go, li, compress, gcc, gzip
      • Running set: mcf, parser, vortex, bzip2, twolf
  • Instruction set simulators (100X)
      • ARM: 8.88 MIPS
      • PowerPC: 8.15 MIPS
      • On PIII 800MHz Linux

IDEF(ld1_imm_p, 0x0f500000, 0x05100000, 1.923343e-01)
IDEF(mov_2, 0x0ff00010, 0x01a00000, 1.271039e-01)
IDEF(branch, 0x0f000000, 0x0a000000, 7.793878e-02)

25
Experimental Results (contd)
  • Average Decoding Height

[Figure: average decoding height plotted against the memory penalty factor.]

26
Experimental Results (contd)
  • Memory usage

27
Experimental Results (contd)
  • Comparison of decoders
      • Trained sequential
          • Dependent on benchmarks, compiler
          • For SPECfp (alvinn, art, equake): Havg = 29.9, 5.27 MIPS for PowerPC

28
Conclusions
  • Decision tree based binary decoder
      • Based on pattern decoding and table decoding primitives
      • Patterns split to reduce tree height
      • Field growing heuristics to prune search space
      • Huffman tree height as execution time estimate
      • Memory utilization ratio as memory estimate
  • Advantages
      • High quality with ensured correctness
      • Speed comparable with hand-coded decoders
      • No limitation on instruction set
          • Safe to use on ASIPs with irregular encoding
      • Simple input format
          • Can be obtained from any machine description with encoding information