Title: EVE: A CAD Tool Providing Placement and Pipelining Assistance for High-Speed FPGA Circuit Designs
1 EVE: A CAD Tool Providing Placement and Pipelining Assistance for High-Speed FPGA Circuit Designs
- William Chow
- Supervisor: Prof. Jonathan Rose
- M.A.Sc. Thesis
- Edward S. Rogers Sr. Department of Electrical and Computer Engineering, University of Toronto
- September 28, 2001
2 Motivation
- Context: high-speed circuit designs, how?
- Push-button design flow
- Automatic: design -> circuit
- 0.18 µm, struggling to achieve 150 MHz
- Von Herzen's paper [VonH97]
- 250 MHz FPGA, 0.6 µm, in 1997!
- Useful Event Horizon concept (later)
- EVE: EVent horizon Editor
3 Xilinx Virtex-E CLB architecture
4 Event Horizon
src CLB
250 MHz -> budget 4 ns
Max clock skew 0.1 ns, clock-to-output delay 1.3 ns, LUT delay + FF setup time 1.5 ns
Max routing delay = 4.0 - 0.1 - 1.3 - 1.5 = 1.1 ns
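The budget arithmetic on this slide can be written as a small helper. This is only a sketch: the function and parameter names are made up for illustration, and the numbers are the ones quoted on the slide.

```python
def max_routing_delay_ns(clock_mhz, clock_skew_ns, clock_to_out_ns, lut_plus_setup_ns):
    """Routing delay left in one clock period after the fixed delays are paid."""
    period_ns = 1000.0 / clock_mhz  # 250 MHz gives a 4.0 ns period
    return period_ns - clock_skew_ns - clock_to_out_ns - lut_plus_setup_ns


# Slide values: skew 0.1 ns, clock-to-output 1.3 ns, LUT delay + FF setup 1.5 ns.
budget = max_routing_delay_ns(250.0, 0.1, 1.3, 1.5)
print(f"{budget:.1f} ns")  # prints "1.1 ns"
```

Any destination CLB reachable within this routing delay lies inside the Event Horizon of the source CLB.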
5 Context
- Von Herzen's approach
- Set speed goal
- Build by construction using Event Horizon concept
- EVE
- Start with placed and routed design
- Increase speed by manual editing of small designs
6 Push-button vs Event Horizon Methodology
7 Goals
- Construct a manual editor focussing on the packing/placement/pipelining level of the Event Horizon design methodology, to allow a designer to increase speed more easily
- Gain insights into better placement and routing techniques through extensive manual circuit editing experience
8 Design Objectives of EVE
1. Target real FPGA architecture: Xilinx Virtex-E
2. Give full low-level control
3. Give instant performance feedback
4. Assist pipelining
- (3) and (4) are not supported by Xilinx tools
9 EVE: two operating modes
- Timing Exact Microscopic Placement (TEMP) Mode
- Change placement and packing of circuit components
- Instant timing feedback
- Invoke horizon to suggest good placement positions
- Pipelining Mode
- Maintain correct functionality during flip-flop insertion and flip-flop motion
- Instant feedback of new circuit speed estimation
- Flip-flop placement optimizations
10 Horizon
(mode 1)
Definition: display the effect on critical path delay should a circuit element be moved to the indicated positions
- From Event Horizon
- Gradient of colours
- Horizon Radius
- Where to evaluate
- Limit computation
- Display timing
- -ve: speed improves
- +ve: speed degrades
Radius 1
11 Timing Exact Microscopic Placement (TEMP) Mode
- Placement
- Packing
- Timing Feedback
- Horizon
- More info
- Better answer
Radius 3
12 Implementation of TEMP mode
- Instant feedback
- Internal Timing Analysis
- Accurate timing
- Database of real delays
- Compression by 100x (100 MB -> 1 MB)
- High Interactivity
- Integrate tightly with Xilinx backend (FPGA Editor) for quick incremental P&R and timing
13 Partial Incremental Timing Analysis
- Full Timing Analysis (TA)
- O(n) forward and backward sweep, as in [HSC83]
- Faster: only rebuild the modified portion of the circuit
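A minimal sketch of the idea, not EVE's actual implementation: it covers the forward (arrival time) sweep only, and assumes the circuit is a DAG stored as a hypothetical `{node: {successor: delay_ns}}` dictionary. After an edit, only nodes in the transitive fanout of the changed nodes are re-evaluated.

```python
from collections import defaultdict, deque


def topo_order(fanout):
    """Kahn's algorithm over a DAG given as {node: {succ: delay_ns}}."""
    indeg = defaultdict(int)
    nodes = set(fanout)
    for succs in fanout.values():
        for v in succs:
            indeg[v] += 1
            nodes.add(v)
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in fanout.get(u, {}):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order


def forward_sweep(fanout, order):
    """Full O(n) sweep: arrival[v] = max over fanin u of arrival[u] + delay(u, v)."""
    arrival = {n: 0.0 for n in order}
    for u in order:
        for v, d in fanout.get(u, {}).items():
            arrival[v] = max(arrival[v], arrival[u] + d)
    return arrival


def incremental_sweep(fanout, order, arrival, changed_sources):
    """Recompute arrival times only in the transitive fanout of changed nodes."""
    dirty, stack = set(), list(changed_sources)
    while stack:
        u = stack.pop()
        for v in fanout.get(u, {}):
            if v not in dirty:
                dirty.add(v)
                stack.append(v)
    fanin = defaultdict(list)
    for u, succs in fanout.items():
        for v, d in succs.items():
            if v in dirty:
                fanin[v].append((u, d))
    for v in order:  # topological order, so fanin values are already final
        if v in dirty:
            arrival[v] = max(arrival[u] + d for u, d in fanin[v])
    return arrival


fanout = {"a": {"b": 1.0, "c": 2.0}, "b": {"d": 1.5}, "c": {"d": 0.5}, "d": {}}
order = topo_order(fanout)
arrival = forward_sweep(fanout, order)
fanout["b"]["d"] = 3.0  # a move changed the routing delay on edge b -> d
arrival = incremental_sweep(fanout, order, arrival, changed_sources=["b"])
```

For brevity this sketch still scans the full topological order and all edges; a real tool would keep fanin lists and per-node indices so that untouched nodes are never visited.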
14 Delay Database
- Delay Extraction
- RC models: Elmore, Penfield-Rubinstein
- Not possible in EVE
- Extracting Logic Delays
- Extracting Routing Delays
- Delay Database Compression
15 Routing Delay Compression
Symmetric!
Pin-to-pin delay (ns)
Dc(c)
BRAMs
Intersect
Dr(r)
Row of source pin
Column of source pin
16 Backend Integration
- Existing tools are insufficient
- Lack ease for incremental flow
- Full CAD flow is slow
- Solution: interface with the Xilinx manual editor, FPGA Editor
- Full set of commands for circuit editing
- Use named pipes on the Windows NT platform
17 Event Horizon Pipelining
(Mode 2)
Original Event Horizon
dst CLB
src CLB
- Pipeline to extend Event Horizon
18 Features of Pipelining Mode
- Represent circuit for easy pipelining
- Maintain correct functionality during flip-flop insertion and flip-flop motion
- Instant feedback of new circuit speed estimation
- Flip-flop placement optimizations
19 Pipelining Mode
(Leave for demo)
20 Baseline Circuits Generation
(Push-button flow baseline)
1. Input is VHDL or Verilog
2. Synthesize using Synplify Pro 6.2, with target frequency s
3. Place and route using Xilinx backend tools
4. Obtain frequency from reports
5. Repeat steps (2) to (4), increasing s by 10% until done
6. Using the frequency from (5), do Multi-Pass Place & Route (MPPR) for 10 runs and pick the best design
10
(skip!)
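The baseline procedure can be sketched as a loop. The callables `run_flow` and `run_place_and_route` are hypothetical stand-ins for the synthesis and Xilinx backend steps, and the stopping rule (stop when the achieved frequency no longer improves) is an assumption, since the slide only says "until done".

```python
def find_baseline_frequency(run_flow, start_mhz, step=0.10, max_iters=50):
    """Raise the synthesis target s by `step` per round while results improve.

    run_flow(target_mhz) is assumed to synthesize, place and route, and
    return the achieved frequency read from the reports.
    """
    target = start_mhz
    best = run_flow(target)
    for _ in range(max_iters):
        target *= 1.0 + step
        achieved = run_flow(target)
        if achieved <= best:  # assumed meaning of "until done"
            break
        best = achieved
    return best


def best_of_mppr(run_place_and_route, target_mhz, runs=10):
    """Run place and route `runs` times and keep the fastest result."""
    return max(run_place_and_route(target_mhz, i) for i in range(1, runs + 1))


# Toy stand-ins: this fake design saturates at 120 MHz.
fake_flow = lambda target: min(target, 120.0)
fake_par = lambda target, run: target - abs(run - 4)

baseline = find_baseline_frequency(fake_flow, start_mhz=50.0)
final = best_of_mppr(fake_par, baseline)
```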
21 Results: Using TEMP mode only
(Note: area is unchanged!)
12.7%!
22 Example: Vision
23 Vision: Before
203.3 MHz
24 Vision: After
224.8 MHz
25 Results: Using both TEMP and pipelining modes
(Note: FF inserted once only)
26 Observations (1)
- Packing and unpacking slices during placement and routing is good
Slice
Slice
27 Observations (2)
Focusing on improving the k most critical paths is effective
28 Observations (3)
Partial re-routing of timing-critical regions is effective
Reroute!
29 Observations (4)
CAD tools should show high-speed routing resources on the chip to help the user make better decisions
Fast Routing!
30 Live Demo
31 Conclusion
- Proposed a high-speed manual circuit design methodology
- Created a manual editor
- Targets real designs: Xilinx Virtex-E
- Focus on pipelining, placement, packing
- Full low-level control
- Instant exact timing feedback
- Results: speed increased by up to 19%, avg. 12.7%, for 8 ccts
32 Future Work
- Synthesis in Event Horizon framework
- Extend EVE to support Virtex-II, etc.
- Automate manual optimizations in EVE
- Make pipelining mode more useful
33 Xilinx Virtex-E Routing Architecture
34 Xilinx HDL-based hw design flow
35 Flip-Flop Insertability
- Non flip-flop insertable edges
- Routing edges COUT -> CIN
- Routing edges F5 -> F5IN
- Non-Routing edges
- Edges in transitive fanin of async reset pins
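The rules above amount to a filter over circuit edges. The sketch below assumes a hypothetical `Edge` record with a `routing` flag and a `drives_async_reset` flag; these names are illustrative and not taken from EVE. It also assumes the fanin cone is traced through every node, including existing flip-flops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    src: str
    src_pin: str
    dst: str
    dst_pin: str
    routing: bool = True              # False for non-routing edges
    drives_async_reset: bool = False  # True if dst_pin is an async reset pin


# Dedicated connections named on the slide.
DEDICATED = {("COUT", "CIN"), ("F5", "F5IN")}


def async_reset_cone(edges):
    """Nodes whose output can reach an asynchronous reset pin."""
    fanin = {}
    for e in edges:
        fanin.setdefault(e.dst, []).append(e)
    cone = set()
    stack = [e.src for e in edges if e.drives_async_reset]
    while stack:
        node = stack.pop()
        if node in cone:
            continue
        cone.add(node)
        stack.extend(e.src for e in fanin.get(node, []))
    return cone


def is_insertable(edge, cone):
    """True if a flip-flop may be inserted on this edge."""
    if not edge.routing:
        return False
    if (edge.src_pin, edge.dst_pin) in DEDICATED:
        return False
    if edge.drives_async_reset or edge.dst in cone:
        return False
    return True


edges = [
    Edge("g1", "X", "g2", "F1"),
    Edge("g2", "X", "ff1", "SR", drives_async_reset=True),
    Edge("add1", "COUT", "add2", "CIN"),
    Edge("g3", "X", "g4", "F1"),
]
cone = async_reset_cone(edges)
insertable = [e for e in edges if is_insertable(e, cone)]
```

In this toy netlist only the `g3 -> g4` edge remains insertable: the carry edge is dedicated, and the other two lie in the fanin of the asynchronous reset.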
36 Loop Elimination
37 Flip-flop Insertion
38 Flip-Flop Motion
39 Flip-Flop Tracing
40 Flip-Flop Synthesis and Placement
- Determine placement of flip-flop on most critical edge first
- An area of valid locations is explored, and the best location is picked for each FF
- Real routing delay values (previously stored in the delay database) are used to evaluate positions
41 Limitations
- Only synchronous, single clock.
- No tri-state buffers
- No Block RAM and LUT-RAM
- Synthesized without I/O pads
- XCV100E or below.
42 Software Architecture
43 Extracting Logic Delays
Logic Slice
- Enumerate all delay paths in a slice; for each path:
- Construct a circuit with the path using XDL
- Query Xilinx Timing Analyzer (TRACE)
- Record delay in delay-matching table
44 Extracting Routing Delays (1)
45 Extracting Routing Delays (2)
- Query each pin-to-pin routing delay from delay reporter
- Delay search space too big: can take 1 month to produce a database with Manhattan distance < 5, which takes up 100 MB of space
- Solution: a routing delay database compression scheme
46 Routing Delay Compression (2)
- Delay values are grouped into delay groups, each with a notation G(S1,P1,S2,P2,X,Y)
- S1, P1: source slice and pin
- S2, P2: target slice and pin
- X, Y: relative location of target pin to source pin
- An intersect point is identified for capturing column and row vectors of delay values to describe delays in the whole group
47 Routing Delay Compression (3)
- Convert delays into integers by a scaling factor of 0.02 ns
- Normalize delays using the delay at the intersect point
- Two 1-D vectors of delay values are collected at the intersect
- Zeroes in the delay vectors are eliminated
- Duplicates in the delay vectors are eliminated
- Make use of symmetry of pins: P1 = X is equivalent to P1 = Y, and P1 = XQ to P1 = YQ
- Record data points explicitly when the scheme fails
- Compression ratio achieved: 100x
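A rough sketch of how one delay group might be compressed. The slides do not give the reconstruction formula, so this assumes an additive model: delay(x, y) is approximated by the intersect delay plus a column offset Dc(x) plus a row offset Dr(y). All function names are illustrative, and duplicate elimination and pin symmetry are left out.

```python
SCALE_NS = 0.02  # scaling factor from the slide


def quantize(delay_ns):
    """Convert a delay in ns to an integer number of 0.02 ns steps."""
    return round(delay_ns / SCALE_NS)


def compress_group(delays, intersect):
    """delays: {(x, y): ns} for one delay group; intersect: (x0, y0) in it."""
    x0, y0 = intersect
    base = quantize(delays[(x0, y0)])
    # Offsets sampled along the row and column through the intersect point,
    # normalized by the intersect delay; zero offsets are not stored.
    col = {x: quantize(d) - base for (x, y), d in delays.items()
           if y == y0 and quantize(d) != base}
    row = {y: quantize(d) - base for (x, y), d in delays.items()
           if x == x0 and quantize(d) != base}
    # Points the additive model cannot reproduce are recorded explicitly.
    exceptions = {}
    for (x, y), d in delays.items():
        if base + col.get(x, 0) + row.get(y, 0) != quantize(d):
            exceptions[(x, y)] = quantize(d)
    return {"base": base, "col": col, "row": row, "exceptions": exceptions}


def lookup(group, x, y):
    """Reconstruct a delay in ns from the compressed group."""
    if (x, y) in group["exceptions"]:
        return group["exceptions"][(x, y)] * SCALE_NS
    return (group["base"] + group["col"].get(x, 0) + group["row"].get(y, 0)) * SCALE_NS


# Toy 3x3 group that is additive except for one outlier at (1, 1).
delays = {(x, y): 0.5 + 0.1 * abs(x) + 0.06 * abs(y)
          for x in (-1, 0, 1) for y in (-1, 0, 1)}
delays[(1, 1)] = 1.0
group = compress_group(delays, intersect=(0, 0))
```

Here nine delays reduce to one base value, two short vectors and one explicit exception; the savings grow with the size of the group.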
48 Pipelining Rule 1
- Forward Retiming
- Backward Retiming
49 Pipelining Rule 2
- Flip-Flop with CE
- Flip-Flop with SR
50 Loop Detection
51 Flip-Flop Insertion
- Based on a continuous forward and backward sweeping algorithm
FFs at 4->6, 4->7, 5->7, 8->9
52 Flip-Flop Motion
- Make use of transitive fanin/fanout calculations
FFs at 4->6, 7->9, 8->9
53 Flip-Flop Placement
- Edges sorted in increasing order of edge slack
- For each edge, E
- CLB locations in the neighbourhood of the two end points of E are explored inside out until N CLBs from the center (delay database stores delays with Manhattan distance < N)
- If not found, continue to explore for distance > N
- If still not found, report resource error
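The search for one edge can be sketched as follows. The callables `is_free` and `cost` are hypothetical: `is_free(loc)` says whether a flip-flop site is available at a grid location, and `cost(src, loc, dst)` stands in for a delay-database lookup. Treating "within N" as distance < N, and taking the first free site beyond N, are assumptions.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def ring(center, dist):
    """All grid locations at exactly Manhattan distance `dist` from `center`."""
    cx, cy = center
    if dist == 0:
        return [(cx, cy)]
    cells = []
    for dx in range(-dist, dist + 1):
        dy = dist - abs(dx)
        cells.append((cx + dx, cy + dy))
        if dy != 0:
            cells.append((cx + dx, cy - dy))
    return cells


def place_flip_flop(src, dst, is_free, cost, n, max_dist):
    """Pick a location for a flip-flop inserted on the edge src -> dst."""
    # Phase 1: inside the range covered by the delay database, evaluate every
    # free location around both end points and keep the cheapest one.
    near = [loc for dist in range(n) for end in (src, dst)
            for loc in ring(end, dist) if is_free(loc)]
    if near:
        return min(near, key=lambda loc: cost(src, loc, dst))
    # Phase 2: no delay data out here, so take the first free location found.
    for dist in range(n, max_dist + 1):
        for end in (src, dst):
            for loc in ring(end, dist):
                if is_free(loc):
                    return loc
    raise RuntimeError("resource error: no free flip-flop location")


free_sites = {(2, 0), (1, 1), (10, 10)}
worst_hop = lambda s, loc, d: max(manhattan(s, loc), manhattan(loc, d))
chosen = place_flip_flop((0, 0), (4, 0), free_sites.__contains__, worst_hop,
                         n=3, max_dist=8)
```

To follow the slide, the caller would invoke this once per edge, visiting edges in increasing order of slack so the most critical edge claims its location first.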
54 Period Estimation: Case 1
55 Period Estimation: Case 2
56 Horizon Calculation
- Evaluate multiple placement alternatives
- Governed by Horizon Radius
- First check if move is valid
- Build temporary circuit
- Full timing analysis to obtain cct speed
- Display the horizon
- Speed: 2 s to display a horizon of radius 3 on a 1 GHz Pentium III
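The steps above can be sketched as a loop over candidate locations. This is an illustration under assumptions: the placement is a hypothetical `{element: (x, y)}` dictionary, `is_valid_move` and `critical_path_ns` are caller-supplied stand-ins for EVE's legality check and full timing analysis, and the radius is treated as a square window.

```python
def horizon(placement, element, radius, is_valid_move, critical_path_ns):
    """Map each candidate location to the change in critical path delay (ns).

    Negative values mean the circuit gets faster if `element` moves there.
    """
    x0, y0 = placement[element]
    baseline = critical_path_ns(placement)
    result = {}
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            loc = (x0 + dx, y0 + dy)
            if loc == (x0, y0) or not is_valid_move(placement, element, loc):
                continue
            trial = dict(placement)  # temporary circuit with the move applied
            trial[element] = loc
            result[loc] = critical_path_ns(trial) - baseline
    return result


# Toy model: one net a -> b, delay grows by 0.5 ns per unit of Manhattan distance.
def toy_timing(p):
    (ax, ay), (bx, by) = p["a"], p["b"]
    return 1.0 + 0.5 * (abs(ax - bx) + abs(ay - by))


def toy_valid(p, element, loc):
    return loc not in p.values()  # cannot move onto an occupied location


deltas = horizon({"a": (0, 0), "b": (3, 0)}, "a", 1, toy_valid, toy_timing)
best = min(deltas, key=deltas.get)
```

Each candidate costs one full timing analysis, which is why the horizon radius is used to limit computation.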
57 Backend Integration Summary