Increasing Hardware Efficiency with Multifunction Loop Accelerators - PowerPoint PPT Presentation

1
Increasing Hardware Efficiency with Multifunction
Loop Accelerators
  • Kevin Fan, Manjunath Kudlur,
  • Hyunchul Park, Scott Mahlke
  • Advanced Computer Architecture Laboratory
  • University of Michigan
  • October 25, 2006

2
Introduction
  • Emerging applications have high performance,
    cost, and energy demands
  • H.264, wireless, software radio, signal
    processing
  • 10-100 Gops required
  • 200 mW power budget
  • Applications dominated by tight loops processing
    large amounts of streaming data

[Figure: CPU with attached accelerators]
3
Loop Accelerators
  • Order-of-magnitude performance and efficiency
    wins
  • Viterbi 100x speedup vs. ARM9

4
Prescribed Throughput Accelerators
  • Traditional behavioral synthesis
  • Directly translate C operators into gates

[Figure: operation graph and the datapath synthesized from it]
5
Outline
  • Loop accelerator schema and design flow
  • Cost sensitive scheduling
  • Designing multifunction accelerators
  • Naïve
  • Joint scheduling
  • Datapath union
  • Synthesis results

6
Loop Accelerator Template
  • Hardware realization of modulo scheduled loop
  • Parameterized execution resources, storage,
    connectivity
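
In steady state, a modulo-scheduled loop repeats an II-cycle pattern (the kernel) in which operations from several overlapping iterations execute together; II is the initiation interval. The Python sketch below folds a single-iteration schedule into that kernel to show the repeating pattern the hardware implements. The operation names and cycles are hypothetical, and the sketch does not check resource conflicts or dependences.

```python
def fold_into_kernel(schedule: dict, ii: int) -> list:
    """Fold a single-iteration schedule (op -> issue cycle) into an II-row kernel.

    Row r lists (op, stage) pairs issued at cycle r of every II-cycle period;
    stage s means the op belongs to the iteration that started s periods earlier.
    """
    kernel = [[] for _ in range(ii)]
    for op, cycle in sorted(schedule.items(), key=lambda kv: kv[1]):
        kernel[cycle % ii].append((op, cycle // ii))
    return kernel

# Hypothetical single-iteration schedule: op name -> issue cycle.
sched = {"ld": 0, "mul": 2, "add": 4, "st": 5}
for row, entries in enumerate(fold_into_kernel(sched, ii=2)):
    print(row, entries)
```

With II = 2, row 0 issues a load from the newest iteration alongside a multiply and an add from two older iterations.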

7
Loop Accelerator Design Flow
[Figure: design flow: C code (.c) and a performance (throughput) target feed FU allocation, producing an abstract architecture (FUs, RF)]
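
One simple way to turn a throughput target into an FU allocation, sketched here as an assumption rather than the flow's actual allocator: express throughput as an initiation interval II (one new iteration every II cycles) and give each operation type enough fully pipelined FUs to issue all of its operations within II cycles. The operation mix is hypothetical, and recurrences are ignored.

```python
import math

def allocate_fus(op_counts: dict, target_ii: int) -> dict:
    """Minimum FUs per operation type so one iteration issues within target_ii cycles.

    Assumes each FU is fully pipelined and accepts one op of its type per cycle.
    """
    if target_ii < 1:
        raise ValueError("target_ii must be a positive number of cycles")
    return {t: math.ceil(n / target_ii) for t, n in op_counts.items()}

ops = {"add": 4, "mul": 2, "mem": 3}  # hypothetical ops per loop iteration
print(allocate_fus(ops, target_ii=2))  # {'add': 2, 'mul': 1, 'mem': 2}
```

Relaxing the throughput target (a larger II) lets the same loop run on fewer FUs.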
8
Datapath Derived from Schedule
  • Schedule to abstract architecture (FUs)
  • Determine register and interconnect requirements
    from schedule

[Figure: source code fragment (a load r1 = Mem[r2] and an operation computing r3 from r1 and 12) mapped onto the datapath]
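
Deriving wiring and storage from a bound schedule can be sketched as follows (a minimal Python illustration under assumed inputs, not the authors' tool): every producer-to-consumer data edge needs a wire from the producer's FU to the consumer's FU, and the gap between when a value is ready and when it is read tells how long it must be held. The FU names, latencies, and the `+` in the example are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    fu: str     # function unit the operation is bound to
    cycle: int  # issue cycle in the single-iteration schedule

def derive_datapath(schedule, edges, latency):
    """Derive interconnect and storage needs from a bound schedule.

    schedule: op -> Placement; edges: (producer, consumer) data dependences;
    latency: op -> cycles until its result is available.
    Returns the set of FU-to-FU wires and, per edge, the cycles the value must
    wait in storage (0 means it is read the cycle it becomes ready).
    """
    wires = set()
    wait = {}
    for src, dst in edges:
        p, c = schedule[src], schedule[dst]
        wires.add((p.fu, c.fu))
        wait[(src, dst)] = c.cycle - (p.cycle + latency[src])
    return wires, wait

# Hypothetical bound schedule for: r1 = Mem[r2]; r3 = r1 + 12 (operator assumed)
schedule = {"ld": Placement("MEM0", 0), "add": Placement("ALU0", 3)}
wires, wait = derive_datapath(schedule, [("ld", "add")], {"ld": 2, "add": 1})
print(wires)  # {('MEM0', 'ALU0')}
print(wait)   # {('ld', 'add'): 1}
```

In a modulo-scheduled loop, a value held longer than II cycles overlaps with the same value from the next iteration and so needs more than one register; this sketch does not model that.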
9
Cost Sensitive Scheduling
  • Traditional scheduling is hardware unaware
  • Intelligent scheduling needed to reduce hardware
    cost

[Figure: two alternative schedules on FU1-FU3 over time (cycles 0-2), differing in where loads LD1 and LD2 are placed]
  • 27% cost reduction with same performance
    [MICRO '05]
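
A toy illustration of why hardware-aware placement matters, loosely following the LD1/LD2 example: both schedules below have the same initiation interval (II = 2, so the same throughput), but the second reuses one memory-capable FU for both loads. The cost numbers are hypothetical, dependences between operations are not modeled, and this is not the cost-sensitive scheduler from the MICRO '05 work.

```python
# Hypothetical gate cost of giving one FU the ability to execute each op kind.
COST = {"mem": 300, "add": 100}

def is_valid(schedule, ii):
    """No FU may issue two ops in the same modulo slot (cycle mod II)."""
    slots = [(fu, cycle % ii) for fu, cycle in schedule.values()]
    return len(slots) == len(set(slots))

def hardware_cost(schedule, kind):
    """Each FU must implement every op kind bound to it; sum the per-FU costs."""
    supported = {}
    for op, (fu, _cycle) in schedule.items():
        supported.setdefault(fu, set()).add(kind[op])
    return sum(COST[k] for kinds in supported.values() for k in kinds)

kind = {"LD1": "mem", "LD2": "mem", "ADD": "add"}
II = 2  # both schedules start a new iteration every 2 cycles: same throughput
unaware = {"LD1": ("FU1", 0), "LD2": ("FU2", 0), "ADD": ("FU3", 1)}
aware = {"LD1": ("FU1", 0), "LD2": ("FU1", 1), "ADD": ("FU2", 2)}
for name, s in (("unaware", unaware), ("aware", aware)):
    print(name, is_valid(s, II), hardware_cost(s, kind))  # 700 vs. 400
```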

10
Multifunction Accelerator
  • Map multiple loops to single accelerator
  • Improve hardware efficiency via reuse
  • Opportunities for sharing
  • Disjoint stages (loops 2, 3)
  • Pipeline slack (loops 4, 5)

[Figure: application structure: Loop 1, a "Frame Type?" decision, Loops 2-4, and Block 5]
11
Design Strategies
  • Naïve method: design single-function
    accelerators, place side by side
  • Misses potential hardware sharing of FUs,
    storage, interconnect

[Figure: naïve method: Loop 1 and Loop 2 each pass through their own cost-sensitive modulo scheduler, and the resulting FUs are placed side by side in the multifunction datapath]
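
To make the missed sharing concrete for FUs alone: if loops never execute at the same time (the disjoint stages case, such as loops 2 and 3), a shared datapath needs only the per-type maximum of each loop's FU requirements, while side-by-side placement pays for the per-type sum. The Python sketch below assumes type-specific FUs and hypothetical counts; it ignores the multiplexers and wiring that sharing adds, and does not model storage or interconnect sharing.

```python
from collections import Counter

def side_by_side(*accels: Counter) -> Counter:
    """Naive method: every single-function accelerator keeps its own FUs."""
    total = Counter()
    for a in accels:
        total += a
    return total

def time_shared(*accels: Counter) -> Counter:
    """Loops that never run concurrently can share FUs: keep the max count per type."""
    total = Counter()
    for a in accels:
        total |= a  # Counter union keeps the per-key maximum
    return total

loop2 = Counter(add=2, mul=1, mem=1)  # hypothetical FU needs of each loop
loop3 = Counter(add=1, mul=2, mem=1)
print(sum(side_by_side(loop2, loop3).values()))  # 8
print(sum(time_shared(loop2, loop3).values()))   # 5
```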
12
Joint Scheduling
[Figure: Loop 1 and Loop 2 scheduled together by a single joint cost-sensitive modulo scheduler]
  • Loops are independent → # of possible schedules is
    exponential in # of loops!
  • Infeasible even for modest problems
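
The growth is easy to see if one assumes (hypothetically) that each loop has some fixed number of candidate schedules: a joint scheduler choosing one schedule per loop faces their product. This Python sketch only counts combinations; it is not how the joint scheduler actually searches.

```python
import math

def joint_search_space(schedules_per_loop):
    """Joint scheduling picks one schedule per loop, so the options multiply."""
    return math.prod(schedules_per_loop)

# Hypothetical: 1,000 candidate schedules per loop.
for n_loops in (1, 2, 4, 8):
    print(n_loops, joint_search_space([1000] * n_loops))
```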

13
Multifunction Gate Costs
[Chart: multifunction gate costs for cases A-J]
  • 43% average savings over sum of accelerators
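
Read as a comparison against the sum of the single-function accelerators (an interpretation; the slides do not define the metric), the savings can be written as:

$$\text{savings} = 1 - \frac{C_{\text{multi}}}{\sum_{i=1}^{n} C_i}$$

where $C_i$ is the gate cost of the single-function accelerator for loop $i$ and $C_{\text{multi}}$ is the gate cost of the multifunction accelerator. For example, with hypothetical costs summing to 100,000 gates and a 57,000-gate multifunction design, savings = 1 - 57,000/100,000 = 43%.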

14
Datapath Union
[Figure: Loop 1 and Loop 2 each scheduled by a separate cost-sensitive modulo scheduler, yielding two FU datapaths]
15
Datapath Union
  • Combine similar components → better hardware
    sharing → lower cost
  • Trade off FU and register cost
  • Combining dissimilar FUs can enable register cost
    savings
  • ILP formulation minimizes FU and register cost


[Figure: Accel 1 and Accel 2 combined into a multifunction accel]
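
The slides formulate the union as an ILP over FU and register cost. The Python sketch below is a deliberately simplified stand-in, not that formulation: it considers FU cost only, under a hypothetical additive cost model in which a merged FU pays once for each operation kind it supports, and it finds the cheapest one-to-one pairing of FUs by exhaustive search (practical only for a handful of FUs).

```python
from itertools import permutations

# Hypothetical gate cost of supporting each operation kind on one FU.
OP_COST = {"add": 100, "sub": 100, "mul": 600, "mem": 300}

def fu_cost(ops: frozenset) -> int:
    """A (possibly merged) FU pays once for each op kind it supports."""
    return sum(OP_COST[o] for o in ops)

def union_datapaths(acc1: list, acc2: list):
    """Pair each FU of the smaller accelerator with a distinct FU of the larger one.

    Each pair becomes one merged FU supporting the union of both op sets;
    unpaired FUs are kept as-is. Register and interconnect costs are ignored.
    """
    small, large = sorted((acc1, acc2), key=len)
    best_cost, best = None, None
    for chosen in permutations(range(len(large)), len(small)):
        merged = [small[i] | large[j] for i, j in enumerate(chosen)]
        rest = [large[j] for j in range(len(large)) if j not in chosen]
        cost = sum(fu_cost(f) for f in merged + rest)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, merged + rest
    return best_cost, best

acc1 = [frozenset({"add"}), frozenset({"mul"})]
acc2 = [frozenset({"sub"}), frozenset({"mul"}), frozenset({"mem"})]
best_cost, merged = union_datapaths(acc1, acc2)
print(best_cost)  # 1100, versus 1700 for placing both FU sets side by side
```

Pairing the two multipliers is what saves cost here; register cost, which the slides note can favor merging dissimilar FUs, is outside this toy model.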
16
Multifunction Gate Costs
[Chart: multifunction gate costs for cases A-J]
  • Smart union within 3% of joint scheduling solution

17
Conclusion
  • Multifunction accelerators are highly effective at
    exploiting coarse-grained hardware sharing
  • Joint scheduling achieves 43% average cost
    savings, but is impractical
  • Smart union of independent accelerators achieves
    40% average savings
  • Compile times of 5 minutes to 1 hour

18
Questions?