Title: Physical Design for Reconfigurable Computing Systems using Firm Templates
1. Physical Design for Reconfigurable Computing Systems using Firm Templates
K. Bazargan, R. Kastner, M. Sarrafzadeh
Department of Electrical and Computer Engineering
Northwestern University
2. Outline
- FPGA: what and why?
- What is a Reconfigurable Computing System (RCS)?
- Application example
- RCS system components
- Online placement: problem definition and our approach
- Offline placement and scheduling
- Flexible modules and firm templates
- Conclusion and future work
4. The Architecture of a Reconfigurable System
[Figure: architecture block diagram. Labels: Data Memory, CPU, RFUOPs, Instruction Memory (Program); connections: Data, Control.]
5. Execution of a Sample Program
7. Application Example: Image Restoration
The value of the center pixel in the next iteration:
$x_{k+1} = \beta y + x_k - \beta\,(d\,x_k)$
- $y$: the pixel value from the original degraded image
- $x_k$: the pixel value from the previous iteration
- $d\,x_k$: the weighted sum r1 × (sum of the eight neighbor pixels) + r0 × (center pixel)
Weight mask:
r1 r1 r1
r1 r0 r1
r1 r1 r1
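A minimal sketch of one iteration of this update, assuming it has the form x_{k+1} = beta*y + x_k - beta*(d x_k), where d x_k is r1 times the sum of the eight neighbors plus r0 times the center pixel. The function name `restore_step`, the parameter name `beta`, and the choice to leave border pixels unchanged are assumptions made for illustration; they do not come from the slides.

```python
def restore_step(x, y, beta, r0, r1):
    """Apply one restoration iteration to the interior pixels of image x.

    x: current estimate (list of rows), y: degraded image (same shape).
    Border pixels are copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(x), len(x[0])
    nxt = [row[:] for row in x]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Sum of the eight neighbors of pixel (i, j).
            neighbors = sum(
                x[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
            )
            d_x = r1 * neighbors + r0 * x[i][j]  # weighted sum d x_k
            nxt[i][j] = beta * y[i][j] + x[i][j] - beta * d_x
    return nxt
```

Each output pixel depends only on a 3x3 window of the previous iteration, which is what makes the computation a natural fit for parallel hardware.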
8. Image Restoration (cont.)
- Incentive
  - Processing of large images using FPGAs with limited resources
- Strategy
  - Segmentation of the image into smaller images suitable for the FPGA
  - Segments of size m x n are surrounded by an overlap of o.
9. Image Restoration: Data Flow Strategy
- Data flow strategy
  - Pixels of individual segments are restored in parallel by hardware.
  - Restored segments are written back after the overlap is discarded.
[Figure: data flow between MEMORY and RFU]
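The segmentation strategy can be sketched as follows: cut the image into m x n core segments, pad each with an overlap of o pixels (clipped at the image border), process the padded tile, and write back only the core. The names `segment_bounds` and `restore_in_segments`, and the `restore` callback standing in for the hardware, are hypothetical.

```python
def segment_bounds(height, width, m, n, o):
    """Yield (core, padded) boxes as (top, left, bottom, right), end-exclusive."""
    for top in range(0, height, m):
        for left in range(0, width, n):
            bottom, right = min(top + m, height), min(left + n, width)
            core = (top, left, bottom, right)
            # The padded box adds an overlap of o pixels, clipped to the image.
            padded = (max(top - o, 0), max(left - o, 0),
                      min(bottom + o, height), min(right + o, width))
            yield core, padded


def restore_in_segments(image, m, n, o, restore):
    """Process each padded tile with restore(tile) and keep only its core."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for (t, l, b, r), (pt, pl, pb, pr) in segment_bounds(height, width, m, n, o):
        tile = [row[pl:pr] for row in image[pt:pb]]
        done = restore(tile)  # stand-in for the RFU computation
        for i in range(t, b):
            # Discard the overlap: copy back only the core columns.
            out[i][l:r] = done[i - pt][l - pl:r - pl]
    return out
```

Because every tile reads from the original image and writes to a separate output, the tiles are independent and could be processed in any order.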
10. Image Restoration Example
[Figure: Degraded Image; Restored Image]
12. System Components
[Figure: system components. Labels: CPU, Data Memory; connections: Data.]
14. Online Placement: Problem Definition
- Input
  - RFU dimensions (W, H)
  - List of RFUOP events (w, h, arrival, departure)
15. Online Placement
[Figure: Current Placement]
- When a new RFUOP arrives:
  - Is there enough room?
  - If yes, which location is best?
- Previous work
  - Bin-packing heuristics (1-D): O(n^2)
    - First Fit, Best Fit, Shelf, Look Ahead, ...
  - Chazelle83: the Bottom-Left heuristic, O(n^2)
  - Healy-Creavin97: O(n^2 lg n)
16. Our Online Placement
- Our approach
  - Divide the empty space into explicit empty rectangles (ERs)
  - When a new RFUOP arrives:
    - Is there enough room? (Is any ER large enough?)
    - If yes, which location is best? (Which ER is best?)
- Packing rule
  - Best Fit, Bottom Left, First Fit
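A minimal sketch of choosing an empty rectangle for an arriving RFUOP under the three packing rules named above. The `Rect` type, the function `choose_er`, and the exact rule definitions are assumptions: here Best Fit means the smallest-area ER that fits, Bottom Left means the lowest and then leftmost ER origin, and First Fit means the first ER in list order.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int  # lower-left corner
    y: int
    w: int  # width
    h: int  # height


def choose_er(empty_rects, w, h, rule="best_fit"):
    """Return the ER chosen for a w x h module, or None if none is large enough."""
    candidates = [er for er in empty_rects if er.w >= w and er.h >= h]
    if not candidates:
        return None  # not enough room: the RFUOP is rejected
    if rule == "first_fit":
        return candidates[0]
    if rule == "bottom_left":
        return min(candidates, key=lambda er: (er.y, er.x))
    if rule == "best_fit":
        return min(candidates, key=lambda er: er.w * er.h)
    raise ValueError(f"unknown rule: {rule}")
```

A linear scan is used here for clarity; it answers both questions on the slide (is any ER large enough, and which one is best) in a single pass over the candidates.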
17. Heuristics for Choosing an Empty Rectangle
18. Our Online Placement
- Our approach
  - Divide the empty space into explicit empty rectangles (ERs)
  - When a new RFUOP arrives:
    - Is there enough room? (Is any ER large enough?)
    - If yes, which location is best? (Which ER is best?)
- Managing the empty space
  - Keep empty rectangles explicitly; use a range tree to store and access them
  - Efficient use of RFU real estate
    - KAMER: Keep All O(n^2) Maximal Empty Rectangles
19. Keeping All Empty Rectangles
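To make "maximal empty rectangle" concrete, here is a brute-force sketch on a small occupancy grid. It is not the KAMER data structure from the talk, only an illustration of the definition: an empty rectangle is maximal if it cannot be extended by one cell in any of the four directions. The grid representation and the function name `maximal_empty_rects` are assumptions.

```python
def maximal_empty_rects(grid):
    """grid[y][x] is True if the cell is occupied. Returns a set of (x, y, w, h)."""
    H, W = len(grid), len(grid[0])

    def empty(x, y, w, h):
        # True if the rectangle lies inside the grid and covers no occupied cell.
        return (0 <= x and 0 <= y and x + w <= W and y + h <= H and
                all(not grid[j][i]
                    for j in range(y, y + h) for i in range(x, x + w)))

    result = set()
    for y in range(H):
        for x in range(W):
            for h in range(1, H - y + 1):
                for w in range(1, W - x + 1):
                    if not empty(x, y, w, h):
                        break  # wider rectangles at this origin also fail
                    grows = (empty(x - 1, y, w + 1, h) or   # extend left
                             empty(x, y - 1, w, h + 1) or   # extend toward y - 1
                             empty(x, y, w + 1, h) or       # extend right
                             empty(x, y, w, h + 1))         # extend toward y + 1
                    if not grows:
                        result.add((x, y, w, h))
    return result
```

Note that maximal empty rectangles may overlap one another, which is why keeping all of them uses the free area more effectively than a disjoint partition.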
20. Our Online Placement
- Our approach
  - Divide the empty space into explicit empty rectangles (ERs)
  - When a new RFUOP arrives:
    - Is there enough room? (Is any ER large enough?)
    - If yes, which location is best? (Which ER is best?)
- Managing the empty space
  - Keep empty rectangles explicitly; use a range tree to store and access them
  - Efficient use of RFU real estate
    - KAMER: Keep All O(n^2) Maximal Empty Rectangles
  - Fast but sub-optimal
    - Keep only O(n) empty rectangles
    - Shorter Segment (SSEG), Square Empty Rectangles (SQR), ...
21. Keeping O(n) Empty Rectangles: SSEG
22. Heuristics for Choosing a Segment
[Figure: the two candidate segments S1 and S2. One split yields empty rectangles A and B; the other yields C and D.]
- SSEG (Shorter Segment): chooses the shorter of the two segments. Case shown: S1 < S2.
- BER (Balanced Empty Rectangles): chooses the segment which creates less area difference. Case shown: Area(B) - Area(A) > Area(D) - Area(C).
- LSQR (Larger Rectangle Square): chooses the segment which creates the larger rectangle closer to a square. Case shown: AspectRatio(B) > AspectRatio(D).
- LER (Large Empty Rectangles): chooses the segment which creates the larger empty rectangle. Case shown: Area(B) > Area(D).
- SQR (Square Rectangles): chooses the segment which creates empty rectangles closer to squares. Case shown: Max(AR(A), AR(B)) < Max(AR(C), AR(D)), where AR = AspectRatio.
- LSEG (Longer Segment): chooses the longer of the two segments. Case shown: S1 < S2.
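The six rules can be sketched under one assumed geometry: a w x h module is placed in the lower-left corner of a W x H empty rectangle, leaving an L-shaped region. It can be cut either by the vertical segment above the module's right edge (length H - h) or by the horizontal segment to the right of the module's top edge (length W - w). This geometry, the split names, and the scoring functions are assumptions chosen to match the one-line descriptions of each rule; the lowest score wins.

```python
def area(rect):
    return rect[0] * rect[1]


def aspect(rect):
    # Aspect ratio >= 1, where 1 is a square; degenerate rectangles rank last.
    return float("inf") if min(rect) == 0 else max(rect) / min(rect)


def candidate_splits(W, H, w, h):
    """Two ways to cut the L-shaped remainder into two (width, height) rectangles."""
    return [
        {"name": "vertical", "seg": H - h, "rects": [(w, H - h), (W - w, H)]},
        {"name": "horizontal", "seg": W - w, "rects": [(W - w, h), (W, H - h)]},
    ]


RULES = {
    "SSEG": lambda s: s["seg"],                                   # shorter segment
    "LSEG": lambda s: -s["seg"],                                  # longer segment
    "LER": lambda s: -max(area(r) for r in s["rects"]),           # larger empty rect
    "BER": lambda s: abs(area(s["rects"][0]) - area(s["rects"][1])),  # balanced areas
    "SQR": lambda s: max(aspect(r) for r in s["rects"]),          # both near square
    "LSQR": lambda s: aspect(max(s["rects"], key=area)),          # larger one near square
}


def choose_split(W, H, w, h, rule):
    """Return the split with the lowest score under the named rule."""
    return min(candidate_splits(W, H, w, h), key=RULES[rule])
```

For example, `choose_split(10, 6, 4, 2, "SSEG")` compares a vertical segment of length 4 with a horizontal segment of length 6 and picks the vertical cut.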
23. How Good is a Placement?
- Acceptance rate
  - Percentage of modules accepted (placed)
- Volume penalty
  - Area corresponds to complexity
  - Time span in the system corresponds to loop iterations
  - Penalty of rejecting a module: penalty = volume = area × time
- Input data
  - Randomly generated dimensions
  - Randomly generated enter/leave times
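The two quality measures can be computed as in the sketch below, where the penalty of a rejected module is its volume, that is, area times time span. The dictionary keys and the function name `placement_quality` are hypothetical.

```python
def placement_quality(modules):
    """Return (acceptance rate in percent, total volume penalty).

    Each module is a dict with keys w, h, arrival, departure, accepted.
    """
    total = len(modules)
    accepted = sum(1 for m in modules if m["accepted"])
    # Penalty: sum of volume = area x time span over the rejected modules.
    penalty = sum(
        m["w"] * m["h"] * (m["departure"] - m["arrival"])
        for m in modules
        if not m["accepted"]
    )
    rate = 100.0 * accepted / total if total else 0.0
    return rate, penalty
```

The two measures can disagree: rejecting one large, long-running module costs more volume than rejecting several small, short ones.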
24. Program Snapshot
25. Online Placement Results
Percentage of accepted modules using different bin-packing and empty-space partitioning rules
26. Online Placement Results (cont.)
27. Online Placement Results (cont.)
29. 3-D Floorplanning
[Figure: DFG, Schedule, RFU/CPU assignment, and the RFU shown with axes "RFU area" and "time"]
34. Our Current 3-D Floorplanners
- No change in the schedule
  - Fixed insertion and deletion of RFUOPs
- Annealing-based
  - Move set
    - Move an operation from the CPU set to the RFU set
    - Move an operation from the RFU set to the CPU set
    - Displace an already placed RFUOP on the RFU
  - Cost function
    - Penalty for rejecting modules (sum of the volumes of the RFUOPs in the CPU set)
  - No overlap allowed during annealing
- Greedy
  - Sort the modules by decreasing volume, then apply KAMER
35. Our Current 3-D Floorplanners (cont.)
- KAMER-BF-Decreasing (KAMER-BFD)
  - Sort the modules by volume
  - Use KAMER to find a fast placement of the modules
- Low-temperature annealing (LTSA)
  - Similar to KAMER-BFD, but use KAMER to place only the X largest modules
  - Use low-temperature annealing to place the rest
- Zero-temperature annealing (ZTSA), greedy
  - Use KAMER to place as many modules as possible
  - Use only the "displace" and "move from CPU to RFU" annealing moves
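The slides do not state the acceptance rule used by the annealers. A common choice is the Metropolis criterion, sketched here as an assumption: a move that does not increase the cost is always accepted, and a worsening move is accepted with probability exp(-delta / T). At zero temperature this reduces to a greedy search, which matches the "greedy" label on ZTSA. The function name `accept` and its parameters are hypothetical.

```python
import math
import random


def accept(delta, temperature, rng=random):
    """Metropolis acceptance test for a move that changes the cost by delta."""
    if delta <= 0:
        return True  # improving or neutral moves are always taken
    if temperature <= 0:
        return False  # zero temperature: never accept a worsening move
    return rng.random() < math.exp(-delta / temperature)
```

Here `delta` would be the change in the penalty cost, that is, the change in the summed volume of the RFUOPs left in the CPU set.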
36. Our Current 3-D Floorplanners (cont.)
- BFOP: Best Fit Online Placement
  - Sort the RFUOPs by decreasing volume
  - For each RFUOP, find candidate corners
  - Choose the corner that results in the minimum wasted area (similar to the well-studied 2-D bin-packing problem)
[Figure: candidate corners on a floor corresponding to time t1, drawn in (x, y, t) space]
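A simplified greedy sketch in the spirit of BFOP, treating each RFUOP as a box in (x, y, t) space. It sorts by decreasing volume and tries candidate corners, but it swaps in a plain bottom-left scan for the minimum-wasted-area criterion, so it is not the BFOP algorithm itself. The tuple layouts, the corner generation, and the names `overlaps` and `greedy_offline` are assumptions.

```python
def overlaps(a, b):
    """Boxes are (x, y, w, h, start, end); boxes that only touch do not overlap."""
    return (a[0] < b[0] + b[2] and b[0] < a[0] + a[2] and
            a[1] < b[1] + b[3] and b[1] < a[1] + a[3] and
            a[4] < b[5] and b[4] < a[5])


def greedy_offline(ops, W, H):
    """ops: list of (name, w, h, start, end). Returns (placed boxes, CPU set)."""
    placed, cpu = {}, []
    by_volume = sorted(ops, key=lambda o: o[1] * o[2] * (o[4] - o[3]), reverse=True)
    for name, w, h, start, end in by_volume:
        # Candidate corners: the origin plus right and top edges of placed boxes.
        xs = sorted({0} | {b[0] + b[2] for b in placed.values()})
        ys = sorted({0} | {b[1] + b[3] for b in placed.values()})
        spot = None
        for y in ys:
            for x in xs:
                box = (x, y, w, h, start, end)
                inside = x + w <= W and y + h <= H
                if inside and not any(overlaps(box, b) for b in placed.values()):
                    spot = box
                    break
            if spot is not None:
                break
        if spot is not None:
            placed[name] = spot
        else:
            cpu.append(name)  # rejected: the operation stays on the CPU
    return placed, cpu
```

Two RFUOPs may share the same (x, y) location as long as their time intervals do not intersect, which is the point of treating placement as a 3-D problem.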
37. Annealing-Based Offline vs. Online
Percentage of accepted modules and penalties using two offline parameters. The higher the RFU acceptance rate and the lower the penalty, the better the algorithm.
38. Offline Placement Results: All
40. Flexible Modules
- Library of soft templates
  - Flexible shapes
    - Constant area, different width and height
  - Problem? Hard to build (physical design must be done for each shape)
- Median
  - Use the same area, but a square shape
- Rotation
- Placement method
  - Use the best shape (minimum wasted area)
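A minimal sketch of the "use the best shape" rule, assuming integer module dimensions and defining wasted area as the area of the chosen empty rectangle minus the module area. Both the definition of waste and the names `shapes_with_area` and `best_shape` are assumptions. Rotations are covered automatically because both (w, h) and (h, w) are enumerated.

```python
def shapes_with_area(area):
    """All integer (width, height) pairs with the given constant area."""
    return [(w, area // w) for w in range(1, area + 1) if area % w == 0]


def best_shape(area, empty_rects):
    """empty_rects: list of (W, H). Returns (waste, shape, er) or None."""
    best = None
    for er in empty_rects:
        for w, h in shapes_with_area(area):
            if w <= er[0] and h <= er[1]:
                waste = er[0] * er[1] - area
                if best is None or waste < best[0]:
                    best = (waste, (w, h), er)
    return best
```

With area 12 and empty rectangles of 2 x 7 and 5 x 3, the sketch picks the 2 x 6 shape in the 2 x 7 rectangle, which wastes 2 cells instead of 3.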
41. Using Flexible Modules in BFOP
Median uses a square module with the same area.
42. Flexible Modules (cont.)
- Firm templates
  - Slice the module into x horizontal or vertical strips
  - If the module cannot be placed, try the 2-split, then the 3-split, and so on, until it fits.
- Problem?
  - Routing!
  - Only limited module types can be split (such as carry chains, with minimal communication between stages)
[Figure: vertical 3-split]
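The try-more-splits loop can be sketched as follows, using vertical strips only. The sketch assumes the empty rectangles are disjoint, places each strip in its own rectangle, and assigns strips greedily (largest strip first, smallest fitting rectangle first), so it can miss an assignment that a full matching would find. Routing between strips is ignored. The names `split_strips` and `place_with_splits` are hypothetical.

```python
def split_strips(w, h, k):
    """Slice a w x h module into k vertical strips of near-equal width."""
    base, extra = divmod(w, k)
    return [(base + (1 if i < extra else 0), h) for i in range(k)]


def place_with_splits(w, h, empty_rects, max_k=4):
    """empty_rects: list of disjoint (W, H). Returns (k, assignment) or None."""
    for k in range(1, min(max_k, w) + 1):
        free = sorted(empty_rects, key=lambda r: r[0] * r[1])
        chosen = []
        for sw, sh in sorted(split_strips(w, h, k), reverse=True):
            fit = next((r for r in free if r[0] >= sw and r[1] >= sh), None)
            if fit is None:
                break  # this k does not fit; try one more split
            free.remove(fit)
            chosen.append(((sw, sh), fit))
        else:
            return k, chosen  # every strip found a rectangle
    return None
```

For a 6 x 4 module and empty rectangles of 3 x 4, 3 x 5, and 2 x 2, the unsplit module does not fit, but the 2-split does.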
43. Quality Improvements Using Firm Templates
45. Conclusion
- Which online algorithm?
  - If speed is an issue, SSEG; otherwise KAMER
- Online or offline?
  - If you have the schedule: offline
- Which offline algorithm?
  - BFOP is the best (faster and better quality)
- Median? Flexibility? Firm templates?
  - Surprisingly, median gives little improvement
  - If flexible shapes are available, they are better than splitting (no additional routing problem)
- How many splits?
  - From no-split to 2-split: 23% improvement
  - From 5-split to 6-split: 3% improvement