Physical Design for Reconfigurable Computing Systems using Firm Templates

Provided by: kiar
Learn more at: https://cseweb.ucsd.edu

1
Physical Design for Reconfigurable Computing
Systems using Firm Templates
K. Bazargan, R. Kastner, M. Sarrafzadeh
  • Department of Electrical and Computer Engineering
  • Northwestern University

2
Outline
  • FPGA: What and why?
  • What is a Reconfigurable Computing System (RCS)?
  • Application example
  • RCS system components
  • Online placement: problem definition and our
    approach
  • Offline placement and scheduling
  • Flexible modules and firm templates
  • Conclusion and future work

4
The Architecture of a Reconfigurable System
[Diagram: CPU, Data Memory, Instruction Memory (Program), and RFUOPs, connected by Data and Control paths]
5
Execution of a Sample Program
7
Application Example: Image Restoration
The value of the center pixel in the next iteration:
    x_{k+1} = \beta y + x_k - \beta (d * x_k)
  • y: the pixel value from the original degraded
    image
  • x_k: the pixel value from the previous iteration
  • d * x_k: the weighted sum
    r1 × Σ(eight neighbor pixels) + r0 × (center pixel)
Weight mask:
    r1 r1 r1
    r1 r0 r1
    r1 r1 r1
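A minimal sketch of one restoration iteration, assuming the update has the form x_next = beta*y + x - beta*(weighted sum), with a 3x3 mask that has r0 at the center and r1 at the eight neighbors. The names `weighted_sum` and `restore_step`, the parameter name `beta`, and the choice to leave border pixels unchanged are illustrative assumptions, not taken from the slides.

```python
def weighted_sum(img, i, j, r0, r1):
    """r1 * (sum of the eight neighbors) + r0 * (center pixel)."""
    neighbors = sum(
        img[i + di][j + dj]
        for di in (-1, 0, 1)
        for dj in (-1, 0, 1)
        if (di, dj) != (0, 0)
    )
    return r1 * neighbors + r0 * img[i][j]


def restore_step(y, x, beta, r0, r1):
    """Apply one iteration to the interior pixels of x.

    y is the degraded image, x the previous iterate (lists of rows).
    Border pixels are copied unchanged (an assumption of this sketch).
    """
    rows, cols = len(x), len(x[0])
    nxt = [row[:] for row in x]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            nxt[i][j] = (beta * y[i][j] + x[i][j]
                         - beta * weighted_sum(x, i, j, r0, r1))
    return nxt
```

Every interior pixel depends only on the previous iterate, so all pixels of a segment can be updated independently of one another.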
8
Image Restoration (cont.)
  • Incentive
  • Processing of large images using FPGAs with
    limited resources
  • Strategy
  • Segmentation of the image into smaller images
    that fit on the FPGA
  • Segments of size m x n are surrounded by an
    overlap of o.
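The segmentation can be sketched as follows, assuming m counts rows, n counts columns, and the overlap o is clamped at the image border. The function name `segment_windows` and the half-open index convention are assumptions made for illustration.

```python
def segment_windows(rows, cols, m, n, o):
    """Yield (core, padded) index ranges for each m x n segment.

    core   = (r0, r1, c0, c1): the pixels this segment is responsible for
    padded = the core grown by the overlap o, clamped to the image bounds
    All ranges are half-open: rows [r0, r1), columns [c0, c1).
    """
    for r0 in range(0, rows, m):
        for c0 in range(0, cols, n):
            r1, c1 = min(r0 + m, rows), min(c0 + n, cols)
            padded = (max(r0 - o, 0), min(r1 + o, rows),
                      max(c0 - o, 0), min(c1 + o, cols))
            yield (r0, r1, c0, c1), padded
```

The padded window is what would be sent to the hardware; only the core region would be written back, which corresponds to discarding the overlap.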

9
Image Restoration Data Flow Strategy
  • Data flow strategy
  • Pixels of individual segments are restored in
    parallel by hardware.
  • Restored segments are written back after the
    overlap is discarded

[Diagram: MEMORY and RFU]
10
Image Restoration Example
Degraded Image
Restored Image
12
System Components
[Diagram: CPU and Data Memory connected by data paths]
14
Online Placement Problem Definition
  • Input
  • RFU dimensions (W, H)
  • List of RFUOP events (w, h, arrival, departure)
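One way to represent this input is sketched below. The type name `RFUOP`, the function `event_stream`, and the rule that a departure is processed before an arrival at the same time step (so freed area can be reused immediately) are assumptions of this sketch.

```python
from typing import NamedTuple


class RFUOP(NamedTuple):
    w: int          # width on the RFU
    h: int          # height on the RFU
    arrival: int    # time the operation enters the system
    departure: int  # time the operation leaves the system


def event_stream(ops):
    """Return (time, kind, op) tuples in the order an online placer sees them."""
    events = []
    for op in ops:
        events.append((op.arrival, 1, "arrive", op))
        events.append((op.departure, 0, "depart", op))
    # Sort by time; at equal times the 0/1 flag puts departures first.
    events.sort(key=lambda e: (e[0], e[1]))
    return [(t, kind, op) for t, _, kind, op in events]
```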

15
Online Placement
Current Placement

  • When a new RFUOP arrives,
  • Is there enough room?
  • If yes, which location is best?
  • Previous work
  • Bin-packing heuristics (1-D): O(n^2)
  • First Fit, Best Fit, Shelf, Look ahead, ...
  • [Chazelle83] The Bottom-Left heuristic: O(n^2)
  • [Healy-Creavin97]: O(n^2 lg n)

16
Our Online Placement
  • Our approach
  • Divide the empty space into explicit empty
    rectangles
  • When a new RFUOP arrives
  • Is there enough room? (any ER large
    enough?)
  • If yes, which location is best? (which ER is
    best?)

  • Packing rule
  • Best Fit, Bottom Left, First Fit
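The two questions above (is there room, and which ER is best) can be sketched as a single lookup over the list of empty rectangles. The definitions used here are assumptions: Best Fit picks the fitting ER with the smallest area, Bottom Left picks the fitting ER with the lowest y and then the lowest x, and First Fit picks the first fitting ER in list order. The names `Rect` and `choose_empty_rect` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def choose_empty_rect(ers, w, h, rule="best_fit"):
    """Return the empty rectangle chosen for a w x h RFUOP, or None."""
    fitting = [er for er in ers if er.w >= w and er.h >= h]
    if not fitting:
        return None  # not enough room: the RFUOP is rejected
    if rule == "first_fit":
        return fitting[0]
    if rule == "best_fit":
        return min(fitting, key=lambda er: er.w * er.h)
    if rule == "bottom_left":
        return min(fitting, key=lambda er: (er.y, er.x))
    raise ValueError(f"unknown packing rule: {rule}")
```

A linear scan is used here for brevity; it does not reproduce any indexing structure used to speed up the lookup.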

17
Heuristics for Choosing an Empty Rectangle

18
Our Online Placement
  • Managing the empty space
  • Keep empty rectangles explicitly; use a range
    tree to store and access them.
  • Efficient use of RFU real estate
  • KAMER: keep all O(n^2) maximal empty rectangles
19
Keeping All Empty Rectangles
20
Our Online Placement
  • Fast but sub-optimal
  • Keep only O(n) empty rectangles
  • Shorter Seg. (SSEG), Square Empty Rects. (SQR),
    ...

21
Keeping O(n) Empty Rectangles - SSEG
22
Heuristics for Choosing a Segment
[Figure: segment S1 splits the empty space into rectangles A and B; segment S2 splits it into rectangles C and D.]
  • SSEG (Shorter Seg): chooses the shorter of the
    two segments. In the figure: S1 < S2.
  • BER (Balanced Empty Rects): chooses the segment
    which creates less area difference. In the
    figure: Area(B) - Area(A) > Area(D) - Area(C).
  • LSQR (Larger Rect Square): chooses the segment
    which creates the larger rectangle closer to
    square. In the figure:
    AspectRatio(B) > AspectRatio(D).
  • LER (Large Empty Rects): chooses the segment
    which creates the larger empty rectangle. In the
    figure: Area(B) > Area(D).
  • SQR (Square Rects): chooses the segment which
    creates empty rectangles closer to squares. In
    the figure: Max{AR(A), AR(B)} < Max{AR(C), AR(D)},
    where AR = AspectRatio.
  • LSEG (Longer Seg): chooses the longer of the two
    segments. In the figure: S1 < S2.
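The six rules can be sketched as scoring functions over the two candidate cuts. This sketch assumes the module is placed in the bottom-left corner of the chosen empty rectangle and is strictly smaller than it in both dimensions, and it defines the aspect ratio as longer side divided by shorter side (1.0 is a square). The names `Rect`, `candidate_splits`, `SCORES` and `split` are hypothetical, and the vertical and horizontal cuts are not meant to correspond to S1 and S2 in the figure.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def area(r):
    return r.w * r.h


def aspect(r):
    return max(r.w, r.h) / min(r.w, r.h)  # 1.0 means square


def candidate_splits(er, w, h):
    """Two ways to cut the L-shape left after placing w x h in er's corner.

    Each candidate is (segment_length, (rect1, rect2)).
    """
    if not (0 < w < er.w and 0 < h < er.h):
        raise ValueError("sketch assumes a module strictly smaller than the ER")
    vertical = (er.h - h, (Rect(er.x, er.y + h, w, er.h - h),
                           Rect(er.x + w, er.y, er.w - w, er.h)))
    horizontal = (er.w - w, (Rect(er.x + w, er.y, er.w - w, h),
                             Rect(er.x, er.y + h, er.w, er.h - h)))
    return vertical, horizontal


# Lower score wins; each function takes (segment_length, rects).
SCORES = {
    "SSEG": lambda seg, rs: seg,
    "LSEG": lambda seg, rs: -seg,
    "LER": lambda seg, rs: -max(area(r) for r in rs),
    "BER": lambda seg, rs: abs(area(rs[0]) - area(rs[1])),
    "SQR": lambda seg, rs: max(aspect(r) for r in rs),
    "LSQR": lambda seg, rs: aspect(max(rs, key=area)),
}


def split(er, w, h, rule="SSEG"):
    """Return the two empty rectangles kept after placing the module."""
    score = SCORES[rule]
    _, rects = min(candidate_splits(er, w, h), key=lambda c: score(*c))
    return rects
```

Either cut leaves two non-overlapping rectangles, which is what keeps the number of stored empty rectangles linear in the number of placed modules.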
23
How Good is a Placement?
  • Acceptance rate
  • percentage of modules accepted (placed)
  • Volume penalty
  • Area ~ complexity
  • Time-span in the system ~ loop iterations
  • Penalty of rejecting a module: penalty =
    volume = area × time
  • Input data
  • Randomly generated dimensions
  • Randomly generated enter/leave time
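The two quality measures can be computed as below, assuming the penalty of a rejected module is its volume, that is, its area multiplied by the time it would spend in the system. The type name `Module` and the function names are hypothetical.

```python
from typing import NamedTuple


class Module(NamedTuple):
    w: int
    h: int
    arrival: int
    departure: int


def volume(m):
    """Area times time-span in the system."""
    return m.w * m.h * (m.departure - m.arrival)


def acceptance_rate(accepted, rejected):
    """Percentage of modules that were placed on the RFU."""
    total = len(accepted) + len(rejected)
    return 100.0 * len(accepted) / total if total else 0.0


def volume_penalty(rejected):
    """Sum of the volumes of all rejected modules."""
    return sum(volume(m) for m in rejected)
```

The two numbers can disagree: rejecting one large, long-lived module costs more volume than rejecting several small, short-lived ones.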

24
Program snapshot
25
Online Placement Results
Percentage of accepted modules using different
bin-packing and empty space partitioning rules
26
Online Placement Results (cont.)
27
Online Placement Results (cont.)
29
3-D Floorplanning
[Figure labels: DFG, Schedule, RFU/CPU, RFU area, time]
34
Our Current 3-D Floorplanners
  • No change in the schedule
  • Fixed insertion and deletions of RFUOPs
  • Annealing based.
  • Move set
  • Move operation from CPU set to RFU set
  • Move operation from RFU set to CPU set
  • Displace an already placed RFUOP on the RFU
  • Cost function
  • Penalty in rejecting modules (sum of volumes of
    the RFUOPs in the CPU set)
  • No overlap allowed during annealing
  • Greedy
  • Sort the modules on decreasing vol., apply KAMER

35
Our Current 3-D Floorplanners (cont.)
  • KAMER-BF-Decreasing
  • Sort the modules on their volumes
  • Use KAMER to find a fast placement of the modules
  • Low-temp. annealing (LTSA)
  • Similar to KAMER-BFD, but use KAMER to place only
    the X largest modules
  • Use low-temp annealing to place the rest
  • Zero-temp. annealing (ZTSA) -- Greedy
  • Use KAMER to place as many modules as you can
  • Use only displace and move from CPU to RFU
    annealing moves.

36
Our Current 3-D Floorplanners (cont.)
  • BFOP - Best Fit Online Placement
  • Sort the RFUOPs on volume (decreasing)
  • For each RFUOP, find candidate corners
  • Choose the corner which results in min wasted
    area (similar to the well-studied 2-D bin-packing
    problem)
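A simplified sketch of greedy decreasing-volume placement over candidate corners follows. It is not the authors' BFOP: the slides do not define how wasted area is measured, so this sketch swaps in a plain bottom-left choice (lowest y, then lowest x) among the feasible corners. The names `Op`, `Box`, `overlaps` and `place_all`, the half-open time intervals, and the choice of candidate corners (the RFU origin plus the right and top corners of already placed modules that are alive at the same time) are all assumptions.

```python
from typing import NamedTuple


class Op(NamedTuple):
    name: str
    w: int
    h: int
    t0: int  # arrival
    t1: int  # departure (exclusive)


class Box(NamedTuple):
    op: Op
    x: int
    y: int


def volume(op):
    return op.w * op.h * (op.t1 - op.t0)


def overlaps(a, b):
    """True if two placed boxes intersect in x, y and time."""
    return (a.op.t0 < b.op.t1 and b.op.t0 < a.op.t1 and
            a.x < b.x + b.op.w and b.x < a.x + a.op.w and
            a.y < b.y + b.op.h and b.y < a.y + a.op.h)


def place_all(ops, W, H):
    """Place ops on a W x H RFU; return (placed boxes, rejected ops)."""
    placed, rejected = [], []
    for op in sorted(ops, key=volume, reverse=True):
        corners = {(0, 0)}
        for p in placed:
            if p.op.t0 < op.t1 and op.t0 < p.op.t1:  # alive at the same time
                corners.add((p.x + p.op.w, p.y))
                corners.add((p.x, p.y + p.op.h))
        feasible = []
        for x, y in corners:
            cand = Box(op, x, y)
            inside = x + op.w <= W and y + op.h <= H
            if inside and not any(overlaps(cand, p) for p in placed):
                feasible.append(cand)
        if feasible:
            # Stand-in for the min-wasted-area rule: bottom-left choice.
            placed.append(min(feasible, key=lambda b: (b.y, b.x)))
        else:
            rejected.append(op)
    return placed, rejected
```

In the usage below, B and C end up at the same (x, y) position because their time intervals do not intersect, which is the effect of treating time as a third dimension.

```python
ops = [Op("A", 4, 2, 0, 10), Op("B", 4, 2, 0, 5),
       Op("C", 4, 2, 5, 10), Op("D", 2, 2, 0, 10)]
placed, rejected = place_all(ops, W=4, H=4)
```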

[Figure: candidate corners on the floor (x-y plane) corresponding to time t1; axes x, y, t]
37
Annealing-Based Offline vs. Online
Percentage of accepted modules and penalties
using two offline parameters. The higher the RFU
acceptance rate and lower the penalty, the better
the algorithm.
38
Offline Placement Results - All
40
Flexible Modules
  • Library of soft templates
  • Flexible shapes
  • Constant area, different width and height
  • Problem: hard to build (physical design must be
    done for each shape)
  • Median
  • Use the same area, but square shape
  • Rotation
  • Placement method
  • Use best shape (min wasted area)
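Shape selection for a flexible module can be sketched as below, assuming integer dimensions, that every factor pair of the area is an available shape (which also covers rotation), and that wasted area means the area of the chosen empty rectangle minus the module area. The names `shapes` and `best_shape` and the (x, y, w, h) tuple layout for empty rectangles are hypothetical.

```python
def shapes(area):
    """All integer (w, h) with w * h == area, in both orientations."""
    return [(w, area // w) for w in range(1, area + 1) if area % w == 0]


def best_shape(area, ers):
    """Return (waste, (w, h), er) with the least wasted area, or None.

    ers is a list of empty rectangles given as (x, y, w, h) tuples.
    """
    best = None
    for w, h in shapes(area):
        for er in ers:
            if er[2] >= w and er[3] >= h:
                waste = er[2] * er[3] - area
                if best is None or waste < best[0]:
                    best = (waste, (w, h), er)
    return best
```

With an area of 12 and empty rectangles (0, 0, 2, 7) and (5, 0, 10, 10), a fixed 3 x 4 module fits only in the large rectangle, while the 2 x 6 shape fits in the narrow one and wastes far less.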

41
Using Flexible Modules in BFOP
Median uses a square module with the same area
42
Flexible Modules (cont.)
  • Firm templates
  • Slice the module into x horizontal or vertical
    strips
  • If the module cannot be placed, try the 2-split,
    3-split, ... until it fits.
  • Problem?
  • Routing!
  • Only limited module types can be split (such as
    carry chains, with minimal communication between
    stages)
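The escalation from no split to a 2-split, 3-split and so on can be sketched as below for vertical strips. The simplifications are assumptions of this sketch: the empty rectangles are pairwise disjoint, each strip goes into a different empty rectangle, strips are assigned greedily to the smallest rectangle that fits, and routing between strips is ignored. The names `strip_widths` and `place_with_splits` are hypothetical.

```python
def strip_widths(w, k):
    """Widths of k vertical strips covering width w as evenly as possible."""
    base, extra = divmod(w, k)
    return [base + 1] * extra + [base] * (k - extra)


def place_with_splits(w, h, ers, max_k):
    """Try the unsplit module, then the 2-split, 3-split, ... up to max_k.

    ers is a list of disjoint empty rectangles as (x, y, w, h) tuples.
    Returns (k, [(strip_width, er), ...]) for the first k that fits,
    or None if no split up to max_k fits.
    """
    for k in range(1, min(max_k, w) + 1):
        free = sorted(ers, key=lambda er: er[2] * er[3])  # smallest ER first
        assignment = []
        for sw in strip_widths(w, k):
            er = next((e for e in free if e[2] >= sw and e[3] >= h), None)
            if er is None:
                break  # this strip does not fit anywhere: try a finer split
            free.remove(er)
            assignment.append((sw, er))
        else:
            return k, assignment
    return None
```

The greedy assignment can miss a feasible matching that an exhaustive search would find; it is used here only to keep the sketch short.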

Vertical 3-split
43
Quality Improvements Using Firm Templates
45
Conclusion
  • Which online algorithm?
  • If speed is an issue, SSEG; otherwise KAMER
  • Online or offline?
  • If you have the schedule => offline
  • Which offline algorithm?
  • BFOP is the best (faster and better quality)
  • Median? Flexibility? Firm templates?
  • Surprisingly, median gives little improvement
  • If flexible shapes are available, they are better
    than splitting (no additional routing problem)
  • How many splits?
  • no-split -> 2-split: 23% improvement
  • 5-split -> 6-split: 3% improvement