The Microarchitecture of FPGA-Based Soft Processors



1
The Microarchitecture of FPGA-Based Soft Processors
  • Peter Yiannacouras
  • Jonathan Rose
  • Greg Steffan
  • University of Toronto
  • Electrical and Computer Engineering

2
Processors and FPGAs
  • Processors are present in many digital systems
[Figure: a digital system containing a processor alongside custom logic]
  • Soft processors: implemented in the FPGA fabric

3
Motivation for understanding soft processor architecture
  • Soft processors are popular
  • 16% of FPGA designs use a soft processor
  • FPGA Journal, November 2003
  • This number has increased and will continue to increase
  • Soft processors are end-user customizable
  • Application-specific architectural tradeoffs
  • Can be tuned by designers

4
Don't we already understand processor architecture?
  • Not accurately or completely
  • Cycle-to-cycle behaviour is modelled accurately
  • But area and power are only estimated
  • And the impact on clock frequency is not captured
  • Not in the FPGA domain
  • Lookup tables vs. transistors
  • Dedicated RAMs and multipliers are fast

5
Research Goals
  • Generate soft processor implementations
  • System for generating RTL
  • Develop measurement methodology
  • Metrics for comparing soft processors
  • Develop understanding of architectural tradeoffs
  • Analyze area/performance/power space

6
Soft Processor Rapid Exploration Environment
(SPREE)
7
Input Instruction Set Architecture (ISA) Description
  • Graph of Generic Operations (GENOPs)
  • Edges indicate flow of data
[Figure: GENOP graph for MIPS ADD (add rd, rs, rt) with nodes FETCH, RFREAD, RFREAD, ADD, RFWRITE]
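A GENOP graph like the ADD example can be held in a small directed-graph structure. The sketch below is a hypothetical Python model, not SPREE's actual input format: the class `GenopGraph`, its methods, and the node ids are assumptions for illustration, and it assumes the instruction word from FETCH feeds the register indices of both RFREADs and of the RFWRITE. A topological order gives one sequence in which the operations can run without violating any data-flow edge.

```python
from dataclasses import dataclass, field


@dataclass
class GenopGraph:
    """Hypothetical model of one instruction's GENOP graph.

    nodes maps a node id to its GENOP type; each edge (src, dst) means
    data produced by src is consumed by dst.
    """
    name: str
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_node(self, node_id: str, genop: str) -> None:
        self.nodes[node_id] = genop

    def connect(self, src: str, dst: str) -> None:
        self.edges.append((src, dst))

    def topological_order(self) -> list:
        """Kahn's algorithm: emit a node once all of its inputs are produced."""
        indegree = {n: 0 for n in self.nodes}
        for _, dst in self.edges:
            indegree[dst] += 1
        ready = [n for n in self.nodes if indegree[n] == 0]
        order = []
        while ready:
            node = ready.pop(0)
            order.append(node)
            for src, dst in self.edges:
                if src == node:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        ready.append(dst)
        if len(order) != len(self.nodes):
            raise ValueError(f"GENOP graph for {self.name} contains a cycle")
        return order


# MIPS "add rd, rs, rt" expressed as GENOPs (node ids are hypothetical).
add_graph = GenopGraph("ADD")
for node_id, genop in [("fetch", "FETCH"), ("read_rs", "RFREAD"),
                       ("read_rt", "RFREAD"), ("add", "ADD"),
                       ("write_rd", "RFWRITE")]:
    add_graph.add_node(node_id, genop)
for src, dst in [("fetch", "read_rs"), ("fetch", "read_rt"), ("fetch", "write_rd"),
                 ("read_rs", "add"), ("read_rt", "add"), ("add", "write_rd")]:
    add_graph.connect(src, dst)

print(add_graph.topological_order())
# ['fetch', 'read_rs', 'read_rt', 'add', 'write_rd']
```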
8
Input Datapath Description
  • Interconnection of hand-coded components
  • Allows efficient synthesis
  • Described using C
[Figure: datapath assembled from the SPREE Component Library (Ifetch, Reg File, ALU, Shifter, Mul, Data Mem, Write Back)]
9
Step 1. ISA vs. Datapath Verification
  • Components described using GENOPs
[Figure: SPREE verifies the ISA's GENOP graph (FETCH, RFREAD, RFREAD, ADD, RFWRITE) against the datapath components]
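One way to picture this step: each component advertises the GENOPs it can perform, and every GENOP used by the ISA must be covered by some component in the datapath. This is a hypothetical set-coverage simplification, not SPREE's actual check (it ignores, for example, whether the components are wired together correctly), and the component names and every GENOP name other than FETCH, RFREAD, ADD and RFWRITE are assumptions.

```python
# Hypothetical component library: which GENOPs each component can perform.
COMPONENT_GENOPS = {
    "ifetch": {"FETCH"},
    "reg_file": {"RFREAD", "RFWRITE"},
    "alu": {"ADD", "SUB", "AND", "OR"},
    "shifter": {"SHIFTLEFT", "SHIFTRIGHT"},
    "mul": {"MUL"},
    "data_mem": {"LOAD", "STORE"},
}


def unsupported_genops(isa, datapath):
    """Map each instruction to the GENOPs that no datapath component performs.

    An empty result means every instruction in the ISA can be implemented.
    """
    available = set()
    for component in datapath:
        available |= COMPONENT_GENOPS[component]
    missing = {}
    for instr, genops in isa.items():
        gaps = set(genops) - available
        if gaps:
            missing[instr] = gaps
    return missing


isa = {
    "add": ["FETCH", "RFREAD", "RFREAD", "ADD", "RFWRITE"],
    "mul": ["FETCH", "RFREAD", "RFREAD", "MUL", "RFWRITE"],
}
# A datapath without a hardware multiplier cannot implement "mul".
print(unsupported_genops(isa, ["ifetch", "reg_file", "alu", "data_mem"]))
# {'mul': {'MUL'}}
print(unsupported_genops(isa, ["ifetch", "reg_file", "alu", "mul", "data_mem"]))
# {}
```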
10
Step 2. Datapath Instantiation
  • Multiplexer insertion
  • Unused connection/component removal
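Both operations can be sketched over a list of port-to-port wires. In this hypothetical model (the `component.port` naming, the wire lists and the function `instantiate` are all assumptions), a wire no instruction uses is removed, a component left with no wires is removed, and any input port still driven by more than one source gets a multiplexer.

```python
from collections import defaultdict


def instantiate(datapath_wires, wires_per_instruction):
    """Hypothetical sketch of datapath instantiation.

    datapath_wires: every (src_port, dst_port) connection the datapath allows.
    wires_per_instruction: the connections each instruction actually uses.
    Returns (kept wires, muxes keyed by destination port, live components).
    """
    # Unused connection removal: drop wires that no instruction needs.
    used = set()
    for wires in wires_per_instruction.values():
        used.update(wires)
    kept = [w for w in datapath_wires if w in used]

    # Multiplexer insertion: a port driven by several sources needs a mux.
    sources = defaultdict(list)
    for src, dst in kept:
        sources[dst].append(src)
    muxes = {dst: srcs for dst, srcs in sources.items() if len(srcs) > 1}

    # Unused component removal: keep only components that still touch a wire.
    live = {port.split(".")[0] for wire in kept for port in wire}
    return kept, muxes, live


datapath_wires = [
    ("reg_file.a", "alu.a"), ("reg_file.b", "alu.b"), ("ifetch.imm", "alu.b"),
    ("alu.result", "reg_file.wdata"), ("alu.result", "data_mem.addr"),
    ("data_mem.rdata", "reg_file.wdata"),
    ("reg_file.a", "mul.a"), ("reg_file.b", "mul.b"), ("mul.result", "reg_file.wdata"),
]
wires_per_instruction = {
    "add":  [("reg_file.a", "alu.a"), ("reg_file.b", "alu.b"), ("alu.result", "reg_file.wdata")],
    "addi": [("reg_file.a", "alu.a"), ("ifetch.imm", "alu.b"), ("alu.result", "reg_file.wdata")],
    "lw":   [("reg_file.a", "alu.a"), ("ifetch.imm", "alu.b"), ("alu.result", "data_mem.addr"),
             ("data_mem.rdata", "reg_file.wdata")],
}
kept, muxes, live = instantiate(datapath_wires, wires_per_instruction)
print(muxes)
# {'alu.b': ['reg_file.b', 'ifetch.imm'], 'reg_file.wdata': ['alu.result', 'data_mem.rdata']}
print(sorted(live))
# ['alu', 'data_mem', 'ifetch', 'reg_file']
```

Because no instruction in this example ISA multiplies, every `mul` wire is dropped and the multiplier component disappears from the instantiated datapath.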
11
Step 3. Control Generation
[Figure: SPREE adds control logic to the instantiated datapath (Ifetch, Reg File, ALU, Mul, Data Mem, Write Back)]
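For an unpipelined datapath, part of control generation amounts to deciding, for each instruction, which input each multiplexer selects. The sketch below covers only that mux-select table; the function `control_table`, the port names and the wire lists are hypothetical, and pipeline control (stalls, hazards, multi-cycle operations) is not modelled.

```python
def control_table(muxes, wires_per_instruction):
    """Per-instruction multiplexer select values for an unpipelined datapath.

    muxes maps a destination port to its ordered list of sources. A mux that
    an instruction does not use is left out of its row (a "don't care").
    """
    table = {}
    for instr, wires in wires_per_instruction.items():
        row = {}
        for src, dst in wires:
            if dst in muxes:
                row[dst] = muxes[dst].index(src)
        table[instr] = row
    return table


muxes = {
    "alu.b": ["reg_file.b", "ifetch.imm"],
    "reg_file.wdata": ["alu.result", "data_mem.rdata"],
}
wires_per_instruction = {
    "add": [("reg_file.a", "alu.a"), ("reg_file.b", "alu.b"), ("alu.result", "reg_file.wdata")],
    "lw":  [("reg_file.a", "alu.a"), ("ifetch.imm", "alu.b"), ("alu.result", "data_mem.addr"),
            ("data_mem.rdata", "reg_file.wdata")],
    "sw":  [("reg_file.a", "alu.a"), ("ifetch.imm", "alu.b"), ("alu.result", "data_mem.addr"),
            ("reg_file.b", "data_mem.wdata")],
}
for instr, row in control_table(muxes, wires_per_instruction).items():
    print(instr, row)
# add {'alu.b': 0, 'reg_file.wdata': 0}
# lw {'alu.b': 1, 'reg_file.wdata': 1}
# sw {'alu.b': 1}
```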
12
Output Verilog RTL Description
[Figure: SPREE outputs the datapath (Ifetch, Reg File, ALU, Mul, Data Mem, Write Back) together with its control as a Verilog RTL description]
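SPREE's real output contains the whole processor; the hypothetical helper `emit_mux` below only shows how one generated piece, an N:1 multiplexer of the kind inserted in Step 2, might be written out as Verilog text. The module and port names are assumptions.

```python
def emit_mux(name: str, width: int, n_inputs: int) -> str:
    """Emit a combinational N:1 multiplexer as a Verilog-2001 module (hypothetical helper)."""
    sel_bits = max(1, (n_inputs - 1).bit_length())
    ports = [f"  input  [{sel_bits - 1}:0] sel"]
    ports += [f"  input  [{width - 1}:0] in{i}" for i in range(n_inputs)]
    ports.append(f"  output reg [{width - 1}:0] out")
    lines = [f"module {name} (", ",\n".join(ports), ");",
             "  always @(*) begin", "    case (sel)"]
    lines += [f"      {sel_bits}'d{i}: out = in{i};" for i in range(n_inputs)]
    # Unused select codes drive X so synthesis may treat them as don't-cares.
    lines.append(f"      default: out = {{{width}{{1'bx}}}};")
    lines += ["    endcase", "  end", "endmodule"]
    return "\n".join(lines)


print(emit_mux("alu_b_mux", 32, 2))
```

For a two-input, 32-bit mux this prints a module with a 1-bit `sel`, two 32-bit inputs, and a `case` statement whose `default` branch assigns `{32{1'bx}}`.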
13
Back-end Infrastructure
  • Benchmarks: MiBench, Dhrystone 2.1, RATES, XiRisc
  • Modelsim RTL Simulator: 1. Cycle Count
  • Quartus II 4.2 CAD Software, targeting a Stratix 1S40: 2. Resource Usage, 3. Clock Frequency, 4. Power

14
Metrics for Measurement
  • Area: equivalent Stratix Logic Elements (LEs)
  • RAMs/multipliers counted by their relative silicon area
  • Performance: wall-clock time
  • Cycle count / clock frequency
  • Arithmetic mean across the benchmark set
  • Energy: dynamic energy (e.g., nJ/instr)
  • Excluding I/O
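The performance and energy metrics reduce to simple arithmetic. In this sketch the helper names and all numbers are hypothetical, and the RAM and multiplier conversion factors for the LE-equivalent area are left as parameters because the slide does not give them.

```python
from statistics import mean


def wall_clock_seconds(cycle_count: int, clock_mhz: float) -> float:
    """Wall-clock time = cycle count / clock frequency."""
    return cycle_count / (clock_mhz * 1e6)


def energy_per_instr_nj(dynamic_power_mw: float, seconds: float, instructions: int) -> float:
    """Dynamic energy (power x time, I/O excluded) per instruction, in nJ."""
    joules = dynamic_power_mw * 1e-3 * seconds
    return joules / instructions * 1e9


def equivalent_les(les: int, rams: int, mults: int,
                   le_per_ram: float, le_per_mult: float) -> float:
    """Area in LEs, converting RAM and multiplier blocks by relative silicon area.

    le_per_ram and le_per_mult are placeholders the caller must supply.
    """
    return les + rams * le_per_ram + mults * le_per_mult


# Hypothetical numbers for illustration only.
clock_mhz = 80.0
cycles = {"bench_a": 2_000_000, "bench_b": 5_000_000}
times = [wall_clock_seconds(c, clock_mhz) for c in cycles.values()]
print(mean(times))                                   # about 0.04375 s
print(energy_per_instr_nj(100.0, 0.025, 1_000_000))  # about 2.5 nJ/instr
```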

15
Trace-Based Verification
  • Ensure SPREE generates functional processors

[Figure: each benchmark application is run on the generated RTL in Modelsim (RTL simulator) and on MINT (instruction-set simulator); the two output traces are compared]
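The comparison step can be sketched as a walk over both traces that reports the first disagreement. The trace format used here, one (register number, value written) pair per entry in program order, is an assumption for illustration; the function name `first_mismatch` is hypothetical.

```python
from itertools import zip_longest


def first_mismatch(rtl_trace, iss_trace):
    """Return (index, rtl_entry, iss_entry) for the first differing entry, or None.

    A trace that ends early shows up as None on its side.
    """
    for i, (rtl, iss) in enumerate(zip_longest(rtl_trace, iss_trace)):
        if rtl != iss:
            return i, rtl, iss
    return None


# Hypothetical trace format: (register number, value written), in program order.
iss_trace = [(2, 5), (3, 7), (4, 12)]
print(first_mismatch([(2, 5), (3, 7), (4, 12)], iss_trace))  # None
print(first_mismatch([(2, 5), (3, 8), (4, 13)], iss_trace))  # (1, (3, 8), (3, 7))
print(first_mismatch([(2, 5)], iss_trace))                   # (1, None, (3, 7))
```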
16
Architectural Exploration Results
17
Architectural Features Explored
  • Hardware vs software multiplication
  • Shifter implementation
  • Pipelining
  • Depth
  • Organization
  • Forwarding

18
Validation of SPREE Through Comparison to Altera's Nios II
  • Nios II has three variations
  • Nios II/e: unpipelined, no HW multiplier
  • Nios II/s: 5-stage, with HW multiplier
  • Nios II/f: 6-stage, dynamic branch prediction
  • Caveats: not a completely fair comparison
  • Very similar but tweaked ISA
  • Nios II supports exceptions, an OS, and caches
  • We do not, and so save their hardware cost

19
SPREE vs Nios II
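[Figure: SPREE-generated processors vs. the Nios II variants, performance vs. area (labels: faster, smaller); annotated SPREE design: 3-stage pipe, HW multiply, multiply-based shifter]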
20
Architectural Features Explored
  • Hardware vs software multiplication
  • Shifter implementation
  • Pipelining
  • Depth
  • Organization
  • Forwarding

21
Hardware vs Software Multiplication
  • Hardware multiply is fast but not always needed
  • Wastes area (220 LEs) and can waste energy
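Without a hardware multiplier, multiplication has to be done in software. A common approach is shift-and-add; the sketch below is one such routine (an assumption, not necessarily what the compiler's runtime library uses) and counts loop iterations to show why a software multiply takes many instructions where a hardware multiplier needs one.

```python
def soft_mul32(a: int, b: int) -> tuple:
    """Shift-and-add multiply of two unsigned 32-bit values.

    Returns (low 32 bits of the product, number of loop iterations).
    Each iteration would cost several instructions on a real processor.
    """
    mask = 0xFFFFFFFF
    a &= mask
    b &= mask
    product = 0
    iterations = 0
    while b:
        if b & 1:                        # add the shifted multiplicand for each set bit
            product = (product + a) & mask
        a = (a << 1) & mask              # next bit position of the multiplicand
        b >>= 1
        iterations += 1
    return product, iterations


print(soft_mul32(6, 7))  # (42, 3)
```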

22
Shifter Implementation
  • Shifters are expensive in FPGAs
  • We explore three implementations
  • Serial shifter (shift register)
  • Multiplier-based barrel shifter (hard multiplier)
  • LUT-based barrel shifter (multiplexer tree)
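The three implementations compute the same result in different ways. This is a behavioural sketch for a 32-bit datapath with shift amounts 0 to 31, modelling only logical shifts (arithmetic right shifts are omitted); the function names are hypothetical and say nothing about area or clock frequency.

```python
MASK = 0xFFFFFFFF  # 32-bit datapath


def serial_shift_left(x: int, s: int) -> tuple:
    """Shift register: moves one bit position per clock cycle.

    Returns (result, clock cycles used).
    """
    x &= MASK
    cycles = 0
    for _ in range(s):
        x = (x << 1) & MASK
        cycles += 1
    return x, cycles


def mult_shift_left(x: int, s: int) -> int:
    """Multiplier-based: x << s is the low 32 bits of x * 2**s."""
    return ((x & MASK) * (1 << s)) & MASK


def mult_shift_right_logical(x: int, s: int) -> int:
    """Multiplier-based: x >> s is the high 32 bits of x * 2**(32 - s).

    2**32 does not fit in a 32-bit operand, so s == 0 bypasses the multiplier.
    """
    if s == 0:
        return x & MASK
    return ((x & MASK) * (1 << (32 - s))) >> 32


def lut_barrel_shift_left(x: int, s: int) -> int:
    """LUT-based barrel shifter: 5 levels of 2:1 multiplexers for 32 bits.

    Level i shifts by 2**i when bit i of the shift amount is set.
    """
    x &= MASK
    for level in range(5):
        if (s >> level) & 1:
            x = (x << (1 << level)) & MASK
    return x


print(serial_shift_left(0x11, 4))  # (272, 4)
```

The serial shifter needs one cycle per bit position, while the other two produce any shift in a single pass through the multiplier or the mux tree.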

23
Performance-Area of Different Shifter
Implementations
[Figure: performance vs. area of the three shifter implementations (labels: faster, smaller)]
24
Pipeline Depth
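  • Explored between 2 and 7 stages
  • 1-stage and 6-stage pipelines not interesting
  • Stages: F = fetch, D = decode, R = register read, EX = execute, M = memory, WB = writeback
2-stage: F/D/R/EX/M | WB
3-stage: F/D | R/EX/M | WB
4-stage: F | D | R/EX/M | WB
5-stage: F | D | R/EX | EX/M | WB
7-stage (new): F | D | R | EX | EX | EX/M | WB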
25
Pipeline Depth and Performance
26
Pipeline Organization Tradeoff
4-stage (A): F | D | R/EX/M | WB
4-stage (B): F/D | R/EX | EX/M | WB
27
Pipeline Forwarding
5-stage pipeline: F | D/R | EX | M | WB
  • Forwarding prevents stalls when data hazards occur
  • MIPS instructions have two source operands (rs, rt)
  • Four forwarding configurations are possible
  • No forwarding
  • Forward rs
  • Forward rt
  • Forward both rs and rt
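The effect of each configuration can be sketched with a toy timing model of the 5-stage pipeline F | D/R | EX | M | WB. This is not the model used in the study: it assumes every instruction is an ALU operation whose result is ready after EX, that the register file writes before it reads within a cycle, and it ignores loads, branches and structural hazards. The function `stall_cycles` and the (dest, rs, rt) tuple format are hypothetical.

```python
def stall_cycles(program, forward_rs, forward_rt):
    """Count stall cycles for a list of ALU instructions given as (dest, rs, rt).

    An instruction entering D/R in cycle c executes in c + 1, passes M in
    c + 2 and writes back in c + 3.
    """
    issue = []        # cycle in which each instruction enters D/R
    last_writer = {}  # register number -> index of the latest instruction writing it
    for i, (dest, rs, rt) in enumerate(program):
        cycle = issue[-1] + 1 if issue else 0
        for reg, forwarded in ((rs, forward_rs), (rt, forward_rt)):
            if reg == 0 or reg not in last_writer:
                continue  # $zero, or a value produced before this sequence
            producer = issue[last_writer[reg]]
            if forwarded:
                # The result leaves EX at the end of cycle producer + 1 and is
                # forwarded into this instruction's EX stage (cycle + 1).
                cycle = max(cycle, producer + 1)
            else:
                # Must read the register file no earlier than the producer's WB.
                cycle = max(cycle, producer + 3)
        issue.append(cycle)
        if dest != 0:
            last_writer[dest] = i
    # With no stalls the last instruction would enter D/R in cycle len(program) - 1.
    return issue[-1] - (len(program) - 1) if program else 0


# add $1, $2, $3 followed by add $4, $1, $5: the dependency is through rs.
prog = [(1, 2, 3), (4, 1, 5)]
for fwd_rs, fwd_rt in [(False, False), (True, False), (False, True), (True, True)]:
    print(fwd_rs, fwd_rt, stall_cycles(prog, fwd_rs, fwd_rt))
# False False 2
# True False 0
# False True 2
# True True 0
```

Because this dependency is on rs, forwarding only rt leaves both stall cycles in place.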

28
Pipeline Forwarding
29
Summary of Presented Architectural Conclusions
  • Hardware multiplication can be wasteful
  • Multiplier-based shifter is a sweet spot
  • 3-stage pipelines are attractive
  • Tradeoffs exist within pipeline organization
  • Forwarding
  • Improves performance by 20%
  • Favours the rs operand

30
Future Work
  • Explore other exciting architectural axes
  • Branch prediction, aggressive forwarding
  • ISA changes
  • VLIW datapaths
  • Caches and memory hierarchy
  • Compiler optimizations
  • Port to other devices
  • Explore aggressive customization
  • Add exceptions and OS support