Effective Compilation Support for Variable Instruction Set Architecture

Transcript and Presenter's Notes

1
Effective Compilation Support for Variable
Instruction Set Architecture
  • Jack Liu
  • Timothy Kong
  • Fred Chow
  • Cognigine Corp.
  • www.cognigine.com

2
Outline
  1. VISC Architecture
  2. Compile-time Configurable Code Generation
  3. Managing the Dictionary
  4. Concluding Remarks

3
Configurable Computing
  • Motivation
  • Higher performance
  • processor and instruction set customized to type
    of application
  • Lower hardware cost
  • non-essential features excluded
  • Shorter time-to-market

4
Variable Instruction Set Architecture
(VISC Architecture™)
  • A new approach to configurable computing
  • Fixed processor hardware
  • Many types of operations provided
  • Numerous instruction variants (CISC-style)
  • Per-program instruction set tailoring during
    compile time

5
Background of this work
  • Cognigine CGN16100 Network Processor
  • Single-chip, fully programmable network processor
  • Processing cores
  • 16 Re-configurable Communications Units (RCU)
    processor cores
  • VISC architecture
  • 4 64-bit parallel execution units
  • Multi-threaded
  • 512 KB on-chip memory (text and data)

6
VISC Architecture™
[Diagram: the dictionary holds the instruction set for the current
program. It has 256 entries; a dictionary entry is 32, 64, or 128 bits
wide, bundling 2, 4, or 8 operations respectively. An instruction
consists of an 8-bit opcode (the dictionary index) plus operand fields
opnd0 through opnd3.]
7
Motivation for VISC Architecture
  • Efficient way to encode/decode the many operation
    variants with different addressing modes
  • Not all used in each program
  • High instruction encoding density
  • Small opcode bit count
  • Operands shared among multiple operations
  • Simplified control logic for VLIW-style ILP
  • Up to 8 operations per cycle
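The encoding scheme above can be illustrated with a small decode sketch. This is a hypothetical model, not the RCU's actual decoder: `MicroOp`, `operand_slots`, and `decode` are invented names; the only facts taken from the slides are the 256-entry dictionary, the 8-bit opcode serving as a dictionary index, and operands shared among the bundled operations.

```python
from dataclasses import dataclass

DICT_SIZE = 256  # the 8-bit opcode indexes a 256-entry dictionary

@dataclass
class MicroOp:
    name: str            # e.g. "add"
    operand_slots: list  # indices into the instruction's shared operand fields

def decode(opcode, operands, dictionary):
    """Resolve one instruction: the opcode selects a dictionary entry,
    and each bundled operation draws its inputs from the instruction's
    shared operand fields."""
    assert 0 <= opcode < DICT_SIZE
    return [(op.name, [operands[i] for i in op.operand_slots])
            for op in dictionary[opcode]]
```

Because the dictionary entry, not the instruction, carries the operation details, the opcode stays at 8 bits no matter how many operation variants the program uses.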

8
Operation Specification
  • In Dictionary Entry (only specified once)
  • Operation name
  • Operation variants
  • Signed and unsigned
  • Operand and result sizes: 8-bit, 16-bit, 32-bit,
    64-bit
  • Support for different sizes among operand(s) or
    result
  • Vector types: 64v8, 64v16, 64v32, 32v8, 32v16
  • Data path to each operand/result
  • In Instruction
  • Operands encoding formats
  • Actual operands
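The split described above can be sketched as two record types: what is specified once in the dictionary entry versus what each instruction carries. This is only an illustration of where each piece of information lives; the field names are assumptions, not the hardware's actual layout.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class OperationSpec:
    """One operation slot of a dictionary entry: specified only once."""
    name: str              # operation name, e.g. "add"
    signed: bool           # signed vs. unsigned variant
    operand_size: int      # 8, 16, 32, or 64 bits (may differ from result)
    result_size: int
    vector: Optional[str]  # e.g. "64v8"; None for a scalar operation

@dataclass
class Instruction:
    """What each executed instruction carries: only operand information."""
    opcode: int            # index of the dictionary entry
    operand_formats: Tuple # per-operand encoding formats
    operands: Tuple        # the actual operand values
```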

9
RCU Architecture
  • 5 Stage Pipeline
  • 4-way multi-threaded
  • Hardware RSF synchronization
  • 128 bit reconfigurable address path
  • 256 bit reconfigurable data path

10
Roles of Compiler for VISC Architecture
  • Determine the best instruction set to store in the
    dictionary for best execution-time performance
  • Generate an optimized code sequence based on that
    instruction set
  • Cater to various hardware limitations
  • Dictionary limit
  • Data path constraints
  • Dictionary and Instruction encoding constraints

11
New Compilation Approach: Configurable Code
Generation
  • Exact form of generated instructions decided in
    the last instruction scheduling phase
  • Direct result of instruction compaction based on
    what is allowed by the hardware

12
Compiler Implementation Method
  • Retarget SGI Pro64 (Open64) compiler to an
    Abstract Machine
  • Code generator operates on an Abstract Operation
    Representation
  • Code generation optimizations left intact
  • Add new Instruction and Dictionary Finalization
    (IDF) phase as post-pass
  • IDF Phase 1
  • Instruction scheduling and folding
  • Abstract operations converted to target code
    sequence
  • IDF Phase 2
  • Output VISC instructions and dictionary entries

13
Compiler Phase Structure
[Diagram: C source feeds the GNU / Pro64™ Front-end, then the WHIRL
Optimizer, then the Pro64™ Back-end (Code Generator followed by IDF),
producing the assembly program: instructions plus dictionary.]
14
Abstract Operation Representation (AOR)
  • Each operation corresponds to a micro-operation
    in the core execution units
  • RISC-like formats
  • r1 = op r2, r3
  • r2 = load <offset>(<base>)
  • store r2, <offset>(<base>)
  • r1 = loadimm <imm>
  • Optimizations in AOR reflected in final code
  • No pre-disposition of compiler to any specific
    instruction format

15
Multiple AOR ops can be combined into a single
target operation
  • Operations taking an immediate operand
  • r2 = move <imm>      >   r3 = addi r1, <imm>
  • r3 = add r1, r2
  • Operations supporting memory operands
  • r2 = load 4(sp)      >   r3 = add r1, 4(sp)
  • r3 = add r1, r2
  • Post-increment/decrement memory operations
  • r2 = load 0(r1)      >   r2 = load 0(r1), with
  • r1 = addi r1, 4          post-increment of r1
  • Branches on condition codes
  • r1 = add r2, r3          r1 = add r2, r3
  • . . .                >   . . .
  • compare (r1 != 0)        br.z label (only if
  • br.z label               immediately after)
  • Others
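The first pattern above (an immediate move feeding an add) can be sketched as a peephole pass over a toy tuple IR. Everything here is illustrative: the tuple layout and `fold_move_add` are invented, and a real implementation would consult def-use information rather than rescanning the operation list.

```python
def fold_move_add(ops):
    """Fold (rX = move imm; rY = add rA, rX) into (rY = addi rA, imm)
    when rX has no further uses.  Tuples: ("move", dest, imm),
    ("add", dest, src1, src2), ("addi", dest, src, imm)."""
    out, i = [], 0
    while i < len(ops):
        cur = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (nxt is not None and cur[0] == "move" and nxt[0] == "add"
                and nxt[3] == cur[1]            # the add consumes the move
                and not any(cur[1] in op[2:]    # moved value dead afterwards
                            for op in ops[i + 2:])):
            out.append(("addi", nxt[1], nxt[2], cur[2]))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out
```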

16
IDF Approach
  • Instruction scheduling combined with the following tasks
  • Instruction folding
  • Opcode selection
  • Modelling of irregular hardware constraints
  • Modelling of encoding constraints
  • Monitoring of states of condition codes and
    transient registers
  • Keeping track of dictionary contents
  • Use an enumeration (branch-and-bound) approach

17
Example of IDF Processing
[Diagram: the input sequence of abstract operations
    w80 = move 0x55
    w91 = move 0xf8
    w70 = add w70, w80
    w71 = xor w92, w80
    w90 = sub w92, w91
    store w90, 8(p1)
is compacted using dictionary entry 3 (operations: add, xor, sub, nop)
into the single instruction
    op3  8(p1)  w70  0x55  0xf8 ]
  • move and store instructions subsumed
  • w71, w92 mapped to transient registers

18
IDF Scheduling Algorithm
[Flowchart: input is the sequence of operations in the basic block.
 Estimate the initial bound_sch; search for a schedule with length
 < bound_sch; if none is found, set bound_sch = bound_sch + 1 and
 search again.]
  • To speed up the search
  • Shrink the solution space by
  • Coming up with a tight initial bound_sch
  • Pruning useless search paths continuously
  • Tight hardware constraints help
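The loop in the flowchart can be sketched as iterative deepening around a branch-and-bound packer. This is a simplified model under stated assumptions: `compatible` stands in for all hardware, encoding, and dictionary constraints, and the real IDF also reorders operations, which this in-order sketch omits.

```python
def pack(ops, compatible, bound):
    """Branch and bound: pack ops, in order, into at most `bound`
    instruction bundles; return the bundles or None on failure."""
    def go(i, bundles):
        if i == len(ops):
            return bundles
        op = ops[i]
        # First try compacting the op into the current bundle.
        if bundles and compatible(bundles[-1] + [op]):
            found = go(i + 1, bundles[:-1] + [bundles[-1] + [op]])
            if found is not None:
                return found
        # Otherwise start a new bundle, pruning paths over the bound.
        if len(bundles) < bound:
            return go(i + 1, bundles + [[op]])
        return None
    return go(0, [])

def idf_schedule(ops, compatible, initial_bound):
    """Outer loop from the flowchart: search with the estimated bound_sch
    and relax it one cycle at a time until a schedule is found."""
    bound = initial_bound
    while True:
        sched = pack(ops, compatible, bound)
        if sched is not None:
            return sched
        bound += 1
```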
19
Managing the Dictionary
  • Dictionary usage increases due to
  • Larger program size: more variety of operations
  • Higher ILP: more combinations of operations
  • Library code linked in
  • Currently, dictionary contents fixed for each
    executable
  • Role of linker
  • Merge dictionary entries with identical contents
    across files/libraries
  • Error message on dictionary overflow
  • Role of compiler
  • Maximize dictionary entry re-use

20
Dictionary Compilation
  • Strategy
  • Keep track of existing dictionary entries during
    compilation
  • Extract dictionary entries from
  • Libraries and .s files being linked
  • .o files compiled before current file
  • Example: cc a.c b.o c.s
  • Maintain table of existing dictionary entries
  • Add to table as new entries are generated
  • Re-use existing dictionary entries
  • Bias scheduling towards dictionary conservation
    as dictionary fills up
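The entry-reuse bookkeeping above can be sketched as a content-indexed table. The class and method names are assumptions; the facts from the slides are the fixed entry budget, re-use of entries with identical contents, and an error on dictionary overflow.

```python
class DictionaryTable:
    """Tracks dictionary entries by content so identical entries are
    re-used instead of consuming one of the limited slots."""

    def __init__(self, capacity=256, preexisting=()):
        self.capacity = capacity
        self.index = {}               # entry contents -> opcode
        for entry in preexisting:     # e.g. entries from .o/.s files
            self.intern(entry)

    def intern(self, entry):
        """Return the opcode for `entry`, re-using an identical existing
        entry when possible; raise on dictionary overflow."""
        entry = tuple(entry)
        if entry in self.index:
            return self.index[entry]
        if len(self.index) >= self.capacity:
            raise OverflowError("dictionary overflow: re-compilation needed")
        self.index[entry] = len(self.index)
        return self.index[entry]
```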

21
User Control of Dictionary Compilation
  • Best program performance demands a near-full
    dictionary.
  • When the dictionary overflows, re-compilation is
    needed.
  • Provide user control mechanisms
  • Trade-off between dictionary consumption and
    program performance
  • Command line option: -CG:dict_usage=n (n = 0 to 10)
  • Embedded in code: #pragma dict_usage n
  • dict_usage is a dictionary budget guideline for IDF
  • Low dict_usage
  • Fewer new dictionary entries created
  • Low ILP
  • High dict_usage
  • Tighter instruction schedule
  • More dictionary entries created

22
IDF Support of dict_usage
  • Additional search bound: bound_dict
  • Number of new dictionary entries allowed for the
    current BB
  • Automatically adjusted lower as more pre-existing
    entries become available
  • When bound_dict is reached during enumeration,
    disallow creating new dictionary entries (unless
    the entry holds a single operation)

23
Experimental Results
  • Summary (with dict_usage=10)
  • ILP from IDF scheduling: 1.38 ops per instruction
  • ILP from relaxed scheduling: 1.51 ops per
    instruction
  • 23% of all subsumable operations subsumed
  • Each dictionary entry referred to by 2.63
    instructions (statically)
  • Scheduling via enumeration: 100 times slower than
    one-pass schedulers
  • Compilation time: 1 to 2 minutes per program

24
Concluding Remarks
  • VISC approach most suitable for embedded
    processors
  • Limited program size
  • Dictionary space less of an issue
  • Slow compilation tolerable
  • CISC-style instructions enable small code size
  • Compilation support key to deploying applications
    on VISC
  • Very hard to write in assembly language
  • Advanced optimizations performed by compiler
  • Dictionary managed by compiler with user hints
  • Compile-time configurable code generation enables
    RISC compilation techniques to generate CISC
    output