1
Physically-aware HW-SW Partitioning For
Reconfigurable Architectures with Partial Dynamic
Reconfiguration
  • Sudarshan Banerjee, Elaheh Bozorgzadeh, Nikil Dutt
  • Center for Embedded Computer Systems (CECS)
  • Donald Bren School of Information and Computer
    Sciences
  • University of California, Irvine
  • http://www.cecs.uci.edu/aces

Work partially supported by NSF grants
CCR-0203813, CCR-0205712
2
Outline
  • Introduction
  • Problem overview
  • Dynamically reconfigurable architecture
  • Placement considerations in scheduling
  • Detailed problem description
  • Related work
  • Approach
  • Experiments
  • Conclusion

3
Introduction
  • Co-design flow: design specification, partitioning
    into SW and HW with an interface (I/f), communication
  • Partial Dynamic Reconfiguration (RTR): reuse
    hardware → more performance
[Figure: co-design flow diagram with blocks labelled M, HW and P]
4
Problem Overview
  • Task dependency graph
  • HW-SW platform
  • Objective: minimize application execution time
    (schedule length)
  • Map each task to hardware or software
5
Dynamically Reconfigurable Architecture
  • Off-chip memory, on-chip shared memory
  • Single context
  • Column-based partial RTR
[Figure: FPGA fabric of CLBs with Height and Width axes; task Tj placed on the fabric]
6
Criticality of linear placement: simple example
[Figure: tasks T1 to T4 placed on columns C1 to C4 (width) against execution time, with time t2 marked]
Simple example of infeasibility
7
Criticality of linear placement: simple example
[Figure: two placements of tasks T1 to T4 on columns C1 to C4 (width) against execution time, with time t2 marked; one is labelled Infeasible, the other Feasible]
8
Criticality of linear placement: detailed example
[Figure: placement on columns C1 to C9 with labels 1 to 10 and task T10]
Simultaneous scheduling and placement to
guarantee feasibility
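A minimal Python sketch of why aggregate free area alone does not guarantee feasibility under contiguous columnar placement. The function name `first_fit_column` and the boolean column map are illustrative assumptions, not taken from the slides; task widths are assumed to be at least one column.

```python
from typing import List, Optional


def first_fit_column(free: List[bool], width: int) -> Optional[int]:
    """Return the leftmost start column of `width` adjacent free columns, or None."""
    run = 0  # length of the current run of adjacent free columns
    for col, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == width:
            return col - width + 1
    return None


# Hypothetical snapshot of a 4-column device: C1 and C3 are occupied.
free_columns = [False, True, False, True]
task_width = 2

# An area-only check accepts the task, because two columns are free in total.
area_is_enough = sum(free_columns) >= task_width

# A placement-aware check rejects it, because the free columns are not adjacent.
slot = first_fit_column(free_columns, task_width)
```

Here `area_is_enough` is true while `slot` is `None`, which is the kind of mismatch a scheduler that ignores placement cannot detect.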
10
Detailed Problem Description
Key considerations for HW-SW partitioning
  • Consider physical constraints (columnar task
    placement) as an integral part of the problem
  • Consider configuration prefetch to hide
    reconfiguration overhead
  • Consider multiple task implementation points
    (compiler optimizations, heterogeneity)
  • Dedicated resources (embedded multipliers)

Feasible, high-quality solutions (short schedule
length)
  • For each task, a suitable implementation point
    is determined
  • For each task, the start of execution is
    determined
  • For each task implemented on HW, the task
    location is determined

11
Related work
  • Large body of work in HW-SW partitioning
  • Gupta et al. (93), Vahid et al. (97),
    Eles et al. (97), Chatha et al. (00)
  • → No partial RTR considerations
  • Work on joint scheduling and placement for
    dependency graphs
  • Fekete et al. (DATE 01), Yuh et al. (ICCAD 04)
  • → Theoretical treatment (closer to
    rectangle-packing)
  • No configuration prefetch considerations
  • Other work on hiding reconfiguration latency
  • Configuration reuse, configuration caching
    (Li et al., FCCM 00), etc.
  • → No joint scheduling, placement
    considerations

12
Related Work in HW-SW partitioning for partial RTR
  • Mei et al., ProRISC 2000
  • Genetic algorithm for HW-SW partitioning
    with partial RTR
  • Columnar architecture
  • No configuration prefetch considerations
  • Jeong et al., ASPDAC 2000
  • ILP, heuristic for HW-SW partitioning with
    partial RTR
  • Detailed configuration prefetch
    considerations
  • Bottleneck of reconfiguration mechanism
  • No placement considerations

14
Approach
  • Exact formulation (ILP: integer linear
    programming)
  • Key variables
  • x_{i,j,k}, r_{i,j,k}: task i scheduled
    (reconfigured) at time-step j, placed on the
    FPGA starting from column k
  • Constraints: traditional HW-SW partitioning,
    contiguous linear placement, configuration
    prefetch
  • Implementation on CPLEX: extremely slow
  • Heuristic formulation
  • KLFM-based approach

15
KLFM quick overview
  • KLFM (Kernighan-Lin/Fiduccia-Mattheyses)-based
    partitioning
  • Move-based approach
  • Widely used in circuit partitioning
  • Integration with scheduling for HW-SW
    partitioning
  • Chatha et al. (00), Vallejo et al. (03), etc.

Kernel of KLFM-based approaches:
    while (more unlocked nodes)
        bestNode = SELECT_NODE()
        MOVE_AND_LOCK(bestNode)
        UPDATE_NEIGHBOURS(bestNode)
    endwhile
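The kernel can be sketched in Python as a single move-based pass. This is a simplified illustration, not the presenters' implementation: the names `klfm_pass`, `flip` and `toy_cost`, the task data, and the penalty-based cost are all hypothetical. For brevity, costs are recomputed from scratch at each step instead of incrementally updating the gains of neighbours.

```python
from typing import Callable, Dict, Tuple

Partition = Dict[str, str]  # task name -> "HW" or "SW"


def flip(side: str) -> str:
    return "SW" if side == "HW" else "HW"


def klfm_pass(start: Partition,
              cost: Callable[[Partition], float]) -> Tuple[Partition, float]:
    """One pass: every node is moved exactly once; the best prefix is kept."""
    current = dict(start)
    best, best_cost = dict(current), cost(current)
    unlocked = set(current)
    while unlocked:
        # SELECT NODE: the unlocked node whose move gives the lowest cost.
        def cost_after_move(node: str) -> float:
            trial = dict(current)
            trial[node] = flip(trial[node])
            return cost(trial)

        best_node = min(sorted(unlocked), key=cost_after_move)
        # MOVE AND LOCK: apply the move even if it worsens the cost.
        current[best_node] = flip(current[best_node])
        unlocked.remove(best_node)
        # UPDATE NEIGHBOURS is implicit: costs are re-evaluated next iteration.
        moved_cost = cost(current)
        if moved_cost < best_cost:
            best, best_cost = dict(current), moved_cost
    return best, best_cost


# Hypothetical toy data: execution times and HW area in columns.
SW_TIME = {"a": 10, "b": 8, "c": 6}
HW_TIME = {"a": 2, "b": 3, "c": 5}
HW_COLUMNS = {"a": 2, "b": 2, "c": 2}
COLUMN_BUDGET = 4


def toy_cost(p: Partition) -> float:
    """Serial execution time, with a large penalty when HW area is exceeded."""
    columns = sum(HW_COLUMNS[t] for t, side in p.items() if side == "HW")
    time = sum(HW_TIME[t] if side == "HW" else SW_TIME[t] for t, side in p.items())
    return time + (1000 if columns > COLUMN_BUDGET else 0)


result, length = klfm_pass({"a": "SW", "b": "SW", "c": "SW"}, toy_cost)
```

Accepting a worsening move and then keeping the best intermediate partition is what lets a move-based pass climb out of local minima.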
16
Modified KLFM for HW-SW partitioning with partial
RTR
  • Additional considerations
  • Linear placement
  • Multiple implementation points

Modified KLFM kernel for partial RTR:
    while (more unlocked nodes)
        for each unlocked node
            for each non-current implementation point of node
                calculate makespan by physically-aware list-scheduling
        bestNode = SELECT_NODE()
        MOVE_AND_LOCK(bestNode, implementation point)
        UPDATE_NEIGHBOURS(bestNode)
    endwhile
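A hedged Python sketch of the modified kernel, in which a move rebinds a task to another implementation point and every candidate is scored by a makespan function. The scoring function here, `toy_makespan`, is only a stand-in for physically-aware list-scheduling; it, the task names, the implementation-point names and all numbers are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Binding = Dict[str, str]  # task name -> chosen implementation point


def modified_klfm_pass(start: Binding,
                       options: Dict[str, List[str]],
                       makespan: Callable[[Binding], float]) -> Tuple[Binding, float]:
    current = dict(start)
    best, best_len = dict(current), makespan(current)
    unlocked = set(current)
    while unlocked:
        moves = []
        for task in sorted(unlocked):              # for each unlocked node
            for point in options[task]:            # for each non-current point
                if point != current[task]:
                    trial = dict(current)
                    trial[task] = point
                    moves.append((makespan(trial), task, point))
        if not moves:
            break
        length, task, point = min(moves)           # SELECT NODE
        current[task] = point                      # MOVE AND LOCK
        unlocked.remove(task)
        if length < best_len:
            best, best_len = dict(current), length
    return best, best_len


# Hypothetical data: two tasks, each with several implementation points.
OPTIONS = {"dct": ["SW", "HW_small", "HW_fast"], "quant": ["SW", "HW_small"]}
TIME = {("dct", "SW"): 20, ("dct", "HW_small"): 8, ("dct", "HW_fast"): 4,
        ("quant", "SW"): 6, ("quant", "HW_small"): 3}
COLUMNS = {"SW": 0, "HW_small": 2, "HW_fast": 4}
COLUMN_BUDGET = 4


def toy_makespan(binding: Binding) -> float:
    """Stand-in scorer: serial time, infinite if the column budget is exceeded."""
    if sum(COLUMNS[p] for p in binding.values()) > COLUMN_BUDGET:
        return float("inf")
    return float(sum(TIME[(t, p)] for t, p in binding.items()))


best, length = modified_klfm_pass({"dct": "SW", "quant": "SW"}, OPTIONS, toy_makespan)
```

Because every candidate move triggers a full schedule evaluation, the cost of the scorer dominates the run-time of such a pass.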
17
Physically-aware List-scheduling
  • Each task bound to implementation point
  • Priority of HW tasks based on physical
    considerations

List-scheduling kernel for next task selection:
    for each schedulable task  // all parent dependencies satisfied
        compute EST (earliest start time of computation)
        compute EFT (earliest finish time): EFT = EST + task execution time
    choose task that maximizes f(EST, path length, width, EFT)
  • EST computation embeds physical and
    architectural constraints

f = -A * width - B * EST + C * (path length) - D * EFT
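A small Python sketch of next-task selection, assuming the priority function reads f = -A * width - B * EST + C * (path length) - D * EFT and that EFT = EST + task execution time. The class `Candidate`, the default weights of 1.0 and the sample tasks are hypothetical; the slides give no values for A, B, C or D.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    width: int          # columns needed on the FPGA
    est: float          # earliest start time of computation
    exec_time: float
    path_length: float  # longest path from this task to a sink

    @property
    def eft(self) -> float:
        # Earliest finish time follows directly from EST.
        return self.est + self.exec_time


def priority(c: Candidate, A: float = 1.0, B: float = 1.0,
             C: float = 1.0, D: float = 1.0) -> float:
    """Assumed form of f: favours narrow, early, critical-path tasks."""
    return -A * c.width - B * c.est + C * c.path_length - D * c.eft


def select_next(schedulable: List[Candidate]) -> Candidate:
    # Choose the schedulable task that maximizes f.
    return max(schedulable, key=priority)


t1 = Candidate("t1", width=2, est=0.0, exec_time=4.0, path_length=10.0)
t2 = Candidate("t2", width=1, est=3.0, exec_time=2.0, path_length=5.0)
chosen = select_next([t1, t2])
```

With unit weights, `t1` scores 4.0 and `t2` scores -4.0, so `t1` is scheduled first even though it is wider, because it starts earlier and lies on a longer path.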
18
EST computation: example
[Figure: schedule showing Task 6 on SW, EXECUTE Task 1, a PREFETCH gap, RECONFIG Task 5, and HW-SW communication (6,5)]
19
EST computation approach
[Figure: columns C1 to C6 against time rows 1 to 9, showing executions E1 to E4, reconfigurations R3 and R4, and a gap]
  • First-fit placement
  • Find earliest slot where the task can be placed
  • Find earliest slot where the reconfiguration
    controller is free (and space is available)
  • If (reconfig. finish time < parent finish time):
    EST = parent finish time (overhead hidden,
    possible gap)
  • Else: EST = reconfig. finish time
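The EST rule can be sketched in Python as follows. This is a simplified reading under stated assumptions: a single reconfiguration controller, a placement slot already found by first-fit, and a single parent finish time. The function name `earliest_start` and its parameters are hypothetical.

```python
from typing import Tuple


def earliest_start(parent_finish: float,
                   slot_free_at: float,
                   controller_free_at: float,
                   reconfig_time: float) -> Tuple[float, float]:
    """Return (EST, gap) for a HW task that must be reconfigured before it runs."""
    # Reconfiguration needs both free columns and a free controller.
    reconfig_start = max(slot_free_at, controller_free_at)
    reconfig_finish = reconfig_start + reconfig_time
    if reconfig_finish < parent_finish:
        # Prefetch hides the overhead; the task idles until its parent is done.
        return parent_finish, parent_finish - reconfig_finish
    # Reconfiguration is on the critical path.
    return reconfig_finish, 0


hidden = earliest_start(parent_finish=10, slot_free_at=2,
                        controller_free_at=3, reconfig_time=4)
exposed = earliest_start(parent_finish=5, slot_free_at=4,
                         controller_free_at=2, reconfig_time=4)
```

In the first call reconfiguration ends at time 7, so the task starts at 10 after a gap of 3; in the second it ends at time 8, which becomes the EST.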
20
Heterogeneous Architecture
  • Off-chip memory, on-chip shared memory
  • Heterogeneous, single context
  • Column-based partial RTR
[Figure: FPGA fabric of CLBs with Height and Width axes]
21
Heterogeneity considerations
  • Heterogeneous implementations often more
    efficient
  • Synthesis of 2-dimensional DCT on
    Virtex-II chip, XC2V2000
  • Columnar placement and routing
    constraints
  • Homogeneous implementation: 64 MHz
  • Heterogeneous implementation: 88 MHz
  • Embedded multipliers (MULTX18), embedded
    memory (BRAM)
  • Limited resources, only at fixed columnar
    locations
  • Extending approach for heterogeneity
  • Primary modification: task placement
  • Simple approach: add type descriptor
    for each column
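One way to picture a per-column type descriptor is the Python sketch below. It assumes a task needs a contiguous window of free columns that contains at least a given number of columns of each special type; that requirement model, the function name `first_fit_typed` and the column layout are hypothetical, not taken from the slides.

```python
from collections import Counter
from typing import Dict, List, Optional


def first_fit_typed(col_types: List[str], free: List[bool],
                    width: int, needs: Dict[str, int]) -> Optional[int]:
    """Leftmost window of `width` free columns that offers the needed column types."""
    for start in range(len(col_types) - width + 1):
        window = range(start, start + width)
        if not all(free[c] for c in window):
            continue
        have = Counter(col_types[c] for c in window)
        if all(have[kind] >= count for kind, count in needs.items()):
            return start
    return None


# Hypothetical device: multiplier columns sit at fixed positions 2 and 5.
COL_TYPES = ["CLB", "CLB", "MULT", "CLB", "CLB", "MULT"]
all_free = [True] * 6
mult_busy = [True, True, False, True, True, True]

slot_a = first_fit_typed(COL_TYPES, all_free, width=2, needs={"MULT": 1})
slot_b = first_fit_typed(COL_TYPES, mult_busy, width=2, needs={"MULT": 1})
slot_c = first_fit_typed(COL_TYPES, mult_busy, width=2, needs={})
```

A task that needs a multiplier column lands at column 1 on an empty device and is pushed to column 4 once column 2 is busy, while a homogeneous task of the same width still fits at column 0.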

23
Experimental Setup
  • Application graph structures and a large set of
    problem instances generated with TGFF
    (Dick et al., CODES 98)
  • Varying graph size
  • Varying in-degree/out-degree
  • Varying area constraints
  • Detailed JPEG case study
  • Multiple implementation points
  • Heterogeneity

24
Experiments on feasibility
[Table: schedule lengths Topt and Theu for placement-unaware and placement-aware formulations]
  • Placement-unaware: exact schedule length with area consideration
  • Placement-aware: exact schedule length with columnar placement
    (new ILP)
25
Heuristic quality (sample results)
LPF: nodes on Longest Path First
Sample experiments for graphs with 60 tasks
26
Heuristic quality (aggregate results)
Quality = (Tlpf - Theu) / Theu × 100
Gain increases as area constraint increases
Gain increases as graph size increases
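A short worked example of the quality metric in Python, assuming it is the percentage by which the LPF schedule length exceeds the heuristic's, that is (Tlpf - Theu) / Theu × 100. The function name and the sample schedule lengths are hypothetical and are not measurements from the experiments.

```python
def quality_percent(t_lpf: float, t_heu: float) -> float:
    """Relative gain of the heuristic over LPF, in percent (assumed definition)."""
    return (t_lpf - t_heu) / t_heu * 100


# Hypothetical schedule lengths: LPF needs 120 time units, the heuristic 100.
gain = quality_percent(120, 100)
no_gain = quality_percent(100, 100)
```

Under this reading a positive value means the heuristic produced the shorter schedule, and zero means both schedules have the same length.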
27
Case study on JPEG encoding
  • Basis for numerical data
  • Synthesis under placement, routing
    constraints on XC2V2000
  • Execution times corresponding to 256 X 256
    colour image

Aggregate unconstrained task area: 11 columns
(homogeneous). Area constraint: 8 columns
28
Execution time
Execution time: O(minutes) for graphs with
100 nodes, 20 columns
[Plot: execution time (s) against graph size (number of vertices)]
Run-time measurements: SunOS 5.8 with
502 MHz sparcv9 processor
29
Conclusion
Contribution
  • Comprehensive HW-SW partitioning for partial RTR
  • Partitioning, Scheduling, Linear placement
  • Multiple implementation points, Heterogeneity
  • Reasonable run-time: O(minutes) for graphs with
    100s of tasks
  • Current limitation of simple approach
  • More sophisticated placement considerations
  • Future work
  • More investigations on multiple
    implementations, heterogeneity

30
Thank You !
  • Questions/Comments?
  • E-mail: banerjee_at_uci.edu