1
Introduction to the CRISP Software Platform
  • Tom Vander Aa
  • http://www.esat.kuleuven.ac.be/tvandera/crisp/
  • ESAT, K.U.Leuven
  • SMART, DESICS, IMEC
  • Feb, 2005

2
Overview
  • Architecture Overview
  • Tool Flow
  • Trimaran vs. Impact
  • Crisp vs. Trimaran
  • Dresc vs. Crisp
  • Infrastructure
  • GUI
  • Glue-scripts
  • Impact Front End
  • Stages
  • Additions of CRISP
  • Data Path Clustering
  • Elcor Back End
  • Stages
  • Loop Buffer Insertion
  • Software Pipelining
  • Simulator
  • Power Estimation
  • Report Generation
  • Conclusion
  • Possible reuse

3
Architecture Overview (data memory hierarchy is not shown)
4
Architecture: Key Concepts
  • Clustered VLIW
  • Reuse VLIW Compiler
  • High Performance
  • RISC-Type operations
  • High ILP
  • Low Power
  • Data Path Clustering
  • Multiple Register Files
  • Instruction Path Clustering
  • Multiple Loop Buffers
  • Scalable Fetch Path
  • Reconfigurable clusters are only used if needed
  • Clock Gating

5
Architecture: Software View
  • PXML
  • Processor XML
  • Extensible processor modeling language
  • Made for design space exploration
  • Extended with
  • Wires, Muxes, Power Info, Area Info, Benchmark Info

Demo
6
Tool flow based on XML and Trimaran
[Figure: tool flow diagram. Box labels: C code, Trimaran MDES, Enhanced Trimaran Compiler, XSLT converter, Delay calculation, Asm Code (XML), XML processor model, Annotated XML processor model, Area calculation, Trimaran Simulator, Power calculation, Trace, XSLT Based Report Generation, Total power calculation, Total power results, Report (HTML)]
7
[Figure: directory layout of trimaran_workspace/. Entries shown: benchmarks/ (link to impact benchmarks), params/, machines/, projects/, mediabench/, test/, myown/, std/, elcor/, vliw_c64.xml, impact/, simu/, vliw_c64.myown.g721_decode.h.1/, impact_intermediate/, html/, simu_intermediate/, elcor_intermediate/, power_intermediate/]
8
Extra Components in Trimaran
  • Compared to IMPACT/DRESC
  • GUI
  • Editor for parameters
  • Visualization tools based on xvg
  • trimaran_workspace
  • Params
  • Projects
  • Machines
  • Elcor Backend
  • Modulo scheduling
  • Rotating registers
  • Stand Alone, C
  • Simulator
  • Compiled Simulator

Demo
trimaran_workspace/projects/vliw_c67.lb.mpeg2dec.h.2/
9
Extra Components in CRISP
  • Machine Description
  • XML, High Level
  • Framework to add custom instructions
  • Reporting Tools
  • XSLT Based (XML to HTML)
  • Portable
  • Power Estimation
  • Machine file annotated
  • Thanks to XML/XSLT
  • Trace File Based
  • Stand Alone, C
  • Reuse of existing models
  • Wattch, Cacti, HotLeakage, IMEC SDRAM, ...
  • Tons of benchmarks
  • MediaBench, Spec, IMEC, TI, ...

10
Stages: Main Purpose
  • Impact Front End
  • Inlining
  • Loop Unrolling
  • Hyperblock Formation
  • Lower Level Optimizations
  • Memory Alias Determination
  • Elcor Back End
  • Loop Selection for Loop Buffer
  • Including Code Generation
  • Software Pipelining
  • Modulo Scheduling
  • Rotating Register Allocation
  • Scalar Register Allocation and Scheduling
  • Report Generation

11
Stages: Impact Details
  • EDG Front End
  • Commercial, but we have the source
  • HtoL
  • From AST Level to Register Transfer Level (virtual registers)
  • Lopti: classical compiler optimizations
  • Dead Code Elimination
  • Common Subexpression Elimination
  • Code Hoisting
  • LHyper/LSuperscalar
  • Hyperblock Creation
  • Optional (prog1.O_tr)
  • LHpl_pd
  • Machine Dep. Backend
  • HPLabs PlayDoh

12
Impact Front End Params (I)
  • Parameter Tuning
  • crisp/code/tools/trimaran/impact/parms/
  • trimaran_workspace/params/<name>/impact/
  • Inlining
  • Tune to your own needs
  • For Example

(Pinline impact_baseline_parms
    # Max. stack frame size
    max_sf_size_limit = 1000000;
    # Upper bound on code expansion. The number of
    # operations is the basis for code size.
    max_expansion_ratio = 3;
end)
13
Impact Front End Params (II)
  • Lcode Optimizations

(Lopti impact_baseline_parms
    opti_level = 4;
end)
  • Loop Unrolling
  • Disable
  • Do manually
  • Future with pragma

(Lsuperscalar impact_baseline_parms
    # Disable loop unrolling
    do_loop_unroll = no;
end)
14
Impact Front End Memory Aliasing (I)
  • Problem
  • Can a load/store pair be reordered?
  • Parallelism in software pipelined loops
  • Example
  • Can the load of w be done before the store to y of the previous iteration?
  • Solution
  • Compiler has some intelligence
  • Different base address
  • Same base, different offset
  • See l_indep_mem.c
  • Restrict pointer
  • Programmer says w and y do not alias

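int * restrict y;
int * restrict w;
for (i1 = 0; i1 < N1 - N2; i1++) {
    *(y) = 0.0;
    for (i2 = 0; i2 < N2; i2++)
        *y += *(w) * *(x);   /* operators lost in the transcript; multiply-accumulate assumed */
}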
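The disambiguation rules listed on this slide (different base address, same base with a different offset, restrict pointers) can be sketched as a small predicate. This is only an illustration, assuming each memory access is summarized as a base, a constant offset and a size. The `MemRef` type, the `known_object` flag and the `may_alias` function are hypothetical names and are not taken from l_indep_mem.c.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRef:
    base: str                   # symbolic base: a named array or a pointer variable
    offset: int                 # constant byte offset from the base
    size: int                   # access width in bytes
    known_object: bool = False  # True for a distinct array or a restrict pointer


def may_alias(a: MemRef, b: MemRef) -> bool:
    """Conservative answer: True unless independence can be shown."""
    if a.base == b.base:
        # Same base: independent only if the byte ranges do not overlap.
        return a.offset < b.offset + b.size and b.offset < a.offset + a.size
    # Different bases: independent only if both are known to be separate objects.
    return not (a.known_object and b.known_object)
```

A load and a store may be reordered only when `may_alias` returns False. Marking both `w` and `y` as restrict corresponds to setting `known_object=True` on both references.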
15
Impact Front End Memory Aliasing (II)
  • Before
  • After

16
CRISP Additions to Impact
New
  • EDG Front End
  • pragma support
  • Restrict pointer support
  • Custom Instructions Support
  • Code layout
  • Loop based code layout for the loop buffer
  • XML-based framework to add custom instructions
  • Trimedia SIMD instructions implemented
  • Done by Pieter
  • Ask Praveen
  • Various bug fixes

17
Elcor Stages
  • Optimizations
  • Disabled
  • Control CPR
  • ??
  • Find SWP Loops, Find Loop Bufferable Loops
  • Selection only
  • Liveness and Data Flow Analysis
  • Predication!
  • Modulo Scheduler
  • Incl. Rotating Register Allocation
  • Pre-pass Scalar Scheduling
  • List Based Scheduler
  • Register Allocation
  • Incl. Spill Code Insertion
  • Post-pass Scalar Scheduling
  • Add Loop Buffer Code

Elcor/src/Main.cpp/common_process_function()
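The list-based scheduler mentioned for scalar scheduling can be illustrated with a minimal sketch. It is a generic textbook list scheduler, not Elcor's implementation, and it assumes that the operations are given in topological (program) order, that every functional unit can execute every operation, and that priority is the latency-weighted height in the dependence graph. The function name and argument layout are hypothetical.

```python
def list_schedule(ops, deps, latency, num_fus):
    """Return {op: issue_cycle} for an acyclic dependence graph.

    ops     -- operation names in topological (program) order
    deps    -- {op: [predecessor ops]}
    latency -- {op: cycles until the result is available}
    num_fus -- number of operations that can issue per cycle
    """
    succs = {op: [] for op in ops}
    for op in ops:
        for pred in deps.get(op, []):
            succs[pred].append(op)

    # Priority: longest latency-weighted path from the op to the end of the block.
    height = {}
    for op in reversed(ops):
        height[op] = latency[op] + max((height[s] for s in succs[op]), default=0)

    issue = {}
    remaining = set(ops)
    cycle = 0
    while remaining:
        # An op is ready when all its predecessors have produced their results.
        ready = [
            op for op in ops
            if op in remaining
            and all(p in issue and issue[p] + latency[p] <= cycle
                    for p in deps.get(op, []))
        ]
        ready.sort(key=lambda op: -height[op])  # stable: ties keep program order
        for op in ready[:num_fus]:
            issue[op] = cycle
            remaining.discard(op)
        cycle += 1
    return issue
```

With fewer functional units the same dependence graph simply takes more cycles, which is the trade-off the register allocator and the post-pass scheduler then have to live with.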
18
Loop Buffer Usage (I)
  • machine.xml file
  • Define the loop buffer as a regular cache
  • Elcor selection of loops to be loop-buffered
  • Two phases
  • Phase One
  • before scheduling
  • pre-selection of loops
  • based on ops (static)
  • based on weight, invocations vs. iterations
  • adds pre-/post-loop block, but no lbon operation
  • Phase Two
  • After Scheduling
  • post-selection of loops
  • based on schedule length vs. loop buffer size
  • adds lbon (called intrinsic) operation in pre-/post-loop block
  • In Control/el_loop.cpp
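The two selection phases can be sketched as two filters over loop candidates. This is a rough illustration only: the `Loop` record, the thresholds `max_ops` and `min_iters_per_invocation`, and the function names are hypothetical, and the actual criteria in Control/el_loop.cpp may differ.

```python
from dataclasses import dataclass


@dataclass
class Loop:
    name: str
    static_ops: int           # operations in the loop body (static count)
    invocations: int          # times the loop is entered (profile data)
    iterations: int           # total iterations over all invocations (profile data)
    schedule_length: int = 0  # instruction words after scheduling, filled in later


def preselect(loops, max_ops, min_iters_per_invocation):
    """Phase one, before scheduling: static size and profile weight."""
    chosen = []
    for lp in loops:
        if lp.invocations == 0:
            continue  # never entered in the profile run
        weight = lp.iterations / lp.invocations
        if lp.static_ops <= max_ops and weight >= min_iters_per_invocation:
            chosen.append(lp)
    return chosen


def postselect(candidates, loop_buffer_size):
    """Phase two, after scheduling: the scheduled loop must fit in the buffer."""
    return [lp for lp in candidates if lp.schedule_length <= loop_buffer_size]
```

Only loops that survive `postselect` would receive the lbon operation in their pre-/post-loop blocks.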

19
Loop Buffer Usage (II)
  • Simulator
  • Code layout based on Loop Buffer
  • Logging
  • Turn dbg_status to at least 1
  • Check compilation.log
  • Also in html Report

[Figure: code layout in instruction memory, with address markers 0x0000, 0x8000 and 0x10000 and two regions labeled LB CODE]
20
Software Pipelining (Elcor)
  • Summary
  • No SWP means no IPC
  • Criteria
  • See elcor/src/Control/el_loop.cpp
  • Counted Loop (i.e. no data dependent while loops)
  • find_induction_variable_info
  • One Back Edge
  • One Exit Edge
  • No Function Calls
  • No Internal Control Flow
  • Use hyperblocks w. Predication
  • Logging
  • Turn dbg_status to at least 1
  • Check compilation.log
  • Also in html Report

Elcor/src/MSched/
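The software pipelining criteria on this slide amount to a checklist, which can be sketched as a predicate that also reports why a loop was rejected (useful for the kind of logging that ends up in compilation.log). The `LoopInfo` record and function names are hypothetical and only mirror the bullet list, not the code in el_loop.cpp.

```python
from dataclasses import dataclass


@dataclass
class LoopInfo:
    counted: bool                    # trip count known on entry (no data-dependent while)
    back_edges: int
    exit_edges: int
    has_calls: bool
    has_internal_control_flow: bool  # True if branches remain after if-conversion


def swp_rejection_reasons(loop: LoopInfo) -> list:
    """Return the list of violated criteria; an empty list means eligible."""
    reasons = []
    if not loop.counted:
        reasons.append("not a counted loop")
    if loop.back_edges != 1:
        reasons.append("more than one back edge")
    if loop.exit_edges != 1:
        reasons.append("more than one exit edge")
    if loop.has_calls:
        reasons.append("contains function calls")
    if loop.has_internal_control_flow:
        reasons.append("internal control flow (not a single hyperblock)")
    return reasons


def is_swp_candidate(loop: LoopInfo) -> bool:
    return not swp_rejection_reasons(loop)
```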
21
Data Path Clustering
  • Only for SWP Loops
  • Before SWP
  • Find No. of Clusters to use
  • Minimize II, with minimal resources
  • More clusters interesting if RecII > ResII
  • Cluster Assignment
  • Assign operations to clusters
  • Based on cost function
  • Load balancing between clusters
  • Minimize communication
  • Modulo Scheduling
  • Per Cluster
  • Cluster Communication
  • Dedicated copy functional unit
  • Connected to every register file

Elcor/src/MSched/
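The step "Minimize II, with minimal resources" can be illustrated with the usual lower bound from modulo scheduling, II >= max(ResII, RecII). The sketch below assumes identical clusters, ignores the extra copy operations needed for inter-cluster communication, and uses hypothetical function names. It is not the algorithm in Elcor/src/MSched/.

```python
from math import ceil


def res_ii(op_counts, fus_per_cluster, clusters):
    """Resource-constrained bound: busiest FU type determines the II."""
    return max(
        ceil(count / (fus_per_cluster[kind] * clusters))
        for kind, count in op_counts.items()
    )


def rec_ii(recurrences):
    """Recurrence-constrained bound.

    recurrences -- list of (total_latency, iteration_distance), one per
                   dependence cycle in the loop body
    """
    return max((ceil(lat / dist) for lat, dist in recurrences), default=1)


def pick_cluster_count(op_counts, fus_per_cluster, max_clusters, recurrences):
    """Smallest number of clusters that reaches the lowest II bound."""
    bounds = {
        c: max(res_ii(op_counts, fus_per_cluster, c), rec_ii(recurrences))
        for c in range(1, max_clusters + 1)
    }
    best = min(bounds.values())
    for c in range(1, max_clusters + 1):
        if bounds[c] == best:
            return c, best
```

Under this bound, extra clusters only lower the resource term, so a loop whose bound is already set by a recurrence ends up with a single cluster.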
22
Mismatch on Parallelism on Typical Applications
  • Non-loop code
  • Low Parallelism
  • Random Branches
  • 95% code size
  • Loop code
  • High Parallelism
  • Predictable Branches
  • 5% code size

Loop
Non-loop
23
Architecture for Non-Loop Code
  • Non-loop code
  • Low Parallelism
  • Random Branches
  • 95% code size

24
Architecture for Non-Loop Code
4 to 5 FUs should be enough for everyone
25
Architecture for Loop Code
  • Loop code
  • High Parallelism
  • Predictable Branches
  • 5% code size

[Figure: architecture for loop code. Labels: three Loop Buffers, ten FUs, RF, two Register Files, Main Cluster, two Reconfigurable Clusters]
26
Rotating Register Allocation
  • Concepts
  • Rotating register R6 is R7 after a special branch (BRF_B_B_F)
  • Modulo the number of registers
  • Virtual registers are REMAPped to rotating registers
  • Using static single assignment
  • If assigned to every iteration (predication!)
  • Benefit
  • Registers of different iterations are different, which enables Software Pipelining
  • Rotating Register Allocator
  • Does not insert spill code
  • Compiler will go into an infinite loop if there are not enough rotating registers!
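The renaming effect of a rotating register file can be shown with a small model. It assumes the common scheme in which a rotating register base is added to the register number modulo the file size and is decremented by the loop-closing branch. The class and method names are hypothetical, and the branch itself is reduced to a `rotate()` call.

```python
class RotatingRegisterFile:
    """Toy model: logical register numbers are offset by a rotating base."""

    def __init__(self, size):
        self.size = size
        self.values = [0] * size
        self.rrb = 0  # rotating register base

    def _physical(self, logical):
        return (logical + self.rrb) % self.size

    def read(self, logical):
        return self.values[self._physical(logical)]

    def write(self, logical, value):
        self.values[self._physical(logical)] = value

    def rotate(self):
        """Effect of the special loop branch: decrement the base, modulo size."""
        self.rrb = (self.rrb - 1) % self.size


rf = RotatingRegisterFile(8)
rf.write(6, 42)   # iteration i writes R6
rf.rotate()       # loop branch taken
value_seen_by_next_iteration = rf.read(7)  # the same value is now named R7
```

Because each iteration writes a fresh physical register under the same logical name, values from overlapping iterations do not overwrite each other, which is what the modulo scheduler relies on.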

27
Main Additions to Elcor
New
  • Support for Clustered Register Files
  • Only in modulo scheduled code
  • Using copy slots (bad idea)
  • Algorithm for UPC (Gonzalez)
  • Scalable Fetch Path
  • Enable more or fewer clusters depending on available ILP
  • Support for software controlled loop buffers
  • Analysis
  • Special Instructions
  • XML Output
  • Enables HTML Report generation
  • Source code references for debugging

28
Caveats in Elcor
  • HPL-PD Architecture Features
  • Read the HPL-PD Manual
  • Instruction Set
  • Branch and load/store are decoupled
  • Branch: Compare, then Prepare to branch, then Branch
  • Load: Add, then Load
  • Speculative Instructions
  • Special hardware support to catch exceptions
  • Hardware
  • Needs Rotating Registers
  • Needs Special Register Files
  • (Artifact that can be removed)
  • Not always clean code
  • Written by HP summer interns

29
Simulator
  • Features
  • Compiled Simulator
  • Cycle True
  • Cache Simulation
  • Multilevel
  • Additions in CRISP
  • Machine.xml input
  • Cache Simulation
  • Trace File
  • XML Output
  • Debugging Support!

[Figure: compiled simulator flow. Box labels: Asm Code (XML), XML processor model, Code Generator, Operation Emulation Library, C-source, Standard C Library, Host C Compiler, Machine State Library, Executable, Execute, Trace, Report]
30
Power Estimation Tools
New
  • Features
  • C based
  • plus XSLT
  • Plugins for
  • Wattch
  • Cacti
  • Custom SDRAMS
  • XML Philosophy
  • User writes a simple machine description
  • XSLT Transformations
  • Complex Machine Description comes out
  • Converted to HTML
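The XML philosophy (a simple description goes in, an annotated description comes out) can be mimicked in a few lines. The sketch below uses Python's ElementTree instead of XSLT, and the element names, attribute names and energy numbers are invented for illustration. They are not the PXML schema or real model output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified machine description written by the user.
MACHINE_XML = """
<machine name="example">
  <component id="alu0" type="alu"/>
  <component id="rf0" type="regfile"/>
</machine>
"""

# Hypothetical per-access energies, standing in for a power model plugin.
ENERGY_PER_ACCESS_NJ = {"alu": 0.5, "regfile": 0.2}


def annotate(machine, energy_table):
    """Add a per-component energy attribute to the machine description."""
    for comp in machine.iter("component"):
        comp.set("energy_per_access_nj", str(energy_table[comp.get("type")]))
    return machine


def total_energy_nj(machine, activation_counts):
    """Combine the annotated description with activation counts from a trace."""
    return sum(
        float(comp.get("energy_per_access_nj")) * activation_counts.get(comp.get("id"), 0)
        for comp in machine.iter("component")
    )


machine = annotate(ET.fromstring(MACHINE_XML), ENERGY_PER_ACCESS_NJ)
energy = total_energy_nj(machine, {"alu0": 1000, "rf0": 3000})
```

Keeping the annotations in the same XML document is what lets a later transformation step turn the result into an HTML report without knowing which model produced the numbers.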

[Figure: power estimation flow. Box labels: PXML (shown four times), Add Explicit Interconnect, Wattch, Cacti, Add Per Component Power Info, HotLeakage, Siemens, Add per Benchmark Energy, Simulator, Trace, Activation Count, HTML Report, Broken, XSLT Based Report Generation, Processor Heat Map (SVG)]
31
Report Generation
New
[Figure: report generation flow. Box labels: C code, XSLT Based Report Generation, Enhanced Trimaran Compiler, Per file/function statistics (HTML), Per basic block schedule (HTML), Asm Code (XML), XML processor model, Trimaran Simulator, Processor Heat Map (SVG), Broken, Per component power/energy consumption (HTML), Trace, Total power calculation, Total power results]
Demo
32
Conclusion
  • Potential Reuse of
  • Algorithms/Ideas
  • List Scheduler, Register Allocator
  • Modulo Scheduler for the Clustered VLIW
  • Cycle True Simulator
  • Standalone modules
  • Directory structure, scripts ???
  • GUI ??
  • Report Generation ???
  • Power Estimator ?
  • Models, Machine Description
  • Website Components
  • Doxygen, Bugzilla

First port done in 2 hours
Demo