An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction

Slides: 25

Provided by: Fei64

Learn more at: http://eda.ee.ucla.edu

1
An Efficient Chiplevel Time Slack Allocation
Algorithm for Dual-Vdd FPGA Power Reduction

Yan Lin (1), Yu Hu (1), Lei He (1) and Vijay Raghunathan (2)
(1) EE Department, UCLA
(2) Purdue University
Partially supported by NSF.
Address comments to lhe_at_ee.ucla.edu

2
Outline

Background, Motivation and Problem Formulation
Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al., DAC'06]
Network Flow Based Vdd Level Assignment Formulation
Experimental Results
Conclusions

3
Background

Existing FPGAs are power inefficient compared to ASICs.
Interconnect is the dominant component of FPGA power dissipation (dynamic and leakage) [Li, TCAD'05].
Power-aware FPGA architectures and CAD algorithms have been studied extensively:
  CAD algorithms to minimize power-delay product [Lamoureux, ICCAD'03]
  Configuration inversion for leakage reduction [Anderson, FPGA'04]
  Vdd-programmable FPGA logic blocks [Li, FPGA'04] [Li, DAC'04]
  Vdd-programmable FPGA interconnects [Li, ICCAD'04] [Gayasen, FPL'04] [Anderson, ICCAD'04] [Lin, DAC'05]

4
Vdd Programmable Interconnect Arch.

Island style with mixed wire segment lengths.
Routing switch/connection block: two PMOS power transistors, M3 and M4, are inserted between the tri-state buffer and the VddH and VddL power rails, respectively [Li, ICCAD'04].
Level-converter-free routing trees (guaranteeing that no VddL switch drives a VddH switch) with the LEAST area and power penalty [Lin, TCAD'06].
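
The level-converter-free condition can be checked mechanically on a routing tree. The following is a minimal sketch, not the authors' implementation: the tree encoding (a `parent` map from each switch to its driver), the `"H"`/`"L"` labels and all switch names are assumptions made for illustration.

```python
from typing import Dict, Optional


def is_level_converter_free(parent: Dict[str, Optional[str]],
                            vdd: Dict[str, str]) -> bool:
    """Return True if no VddL switch drives a VddH switch.

    parent maps each switch to the switch that drives it (None for the root).
    vdd maps each switch to "H" (VddH) or "L" (VddL).
    Both encodings are hypothetical and chosen only for this sketch.
    """
    for switch, driver in parent.items():
        if driver is not None and vdd[driver] == "L" and vdd[switch] == "H":
            return False
    return True


# Hypothetical routing tree: root -> a -> b, and root -> c
tree = {"root": None, "a": "root", "b": "a", "c": "root"}
legal = {"root": "H", "a": "H", "b": "L", "c": "L"}
illegal = {"root": "H", "a": "L", "b": "H", "c": "L"}  # a (VddL) drives b (VddH)

legal_ok = is_level_converter_free(tree, legal)
illegal_ok = is_level_converter_free(tree, illegal)
```

Checking each driver/load pair is enough here, because a violation anywhere in the tree always shows up on some single edge.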

5
Limitation of Existing Approaches

Uniform wire segment length was assumed, and the approach cannot be extended directly to mixed wire segment lengths.
The LP-based formulation is time consuming and computationally unstable.

Time consuming: runtime goes up quickly for large circuits.
Computational instability: even a small circuit can take a long runtime.

6
Problem Formulations

Dual-Vdd Level Assignment Problem
Given: placement and routing results of an FPGA design
Find: a Vdd-level assignment for each interconnect switch
Objective: minimize interconnect (dynamic and leakage) power
Constraints:
  Meet the delay target T_spec
  Vdd-level converters are inserted ONLY at CLB inputs/outputs

7
Outline

Background, Motivation and Problem Formulation
Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al., DAC'06]
  Interconnect Power Reduction Estimation
  LP Based Vdd-level Assignment Algorithm
Network Flow Based Vdd Level Assignment Formulation
Experimental Results
Conclusions

8
Delay and Power Model for Interconnect

Delay Model
  The intrinsic delay and effective driving resistance of each switch have been pre-characterized using SPICE.
  Elmore delay is used to calculate routing delay.
Interconnect Power Model
  Dynamic power: $P_d(Vdd_j) = 0.5 \cdot f_{clk} \cdot C \cdot Vdd_j^2$
  Leakage power: $P_l(Vdd_j)$ is pre-characterized using SPICE
Interconnect power reduction estimation is the essential part of the dual-Vdd assignment algorithm.
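
To give a feel for the dynamic power term, here is a small sketch that evaluates the formula at the two supply voltages reported later in the experimental settings (1.3 V for VddH, 0.8 V for VddL). The clock frequency and capacitance are hypothetical placeholders, and the leakage term is omitted because it is only available from SPICE pre-characterization.

```python
def dynamic_power(f_clk_hz: float, cap_farads: float, vdd_volts: float) -> float:
    """Dynamic power P_d = 0.5 * f_clk * C * Vdd^2, as in the slide's power model."""
    return 0.5 * f_clk_hz * cap_farads * vdd_volts ** 2


VDD_H = 1.3     # volts, from the experimental settings
VDD_L = 0.8     # volts, from the experimental settings
F_CLK = 100e6   # Hz, hypothetical value for illustration only
CAP = 50e-15    # farads, hypothetical switched capacitance

p_high = dynamic_power(F_CLK, CAP, VDD_H)
p_low = dynamic_power(F_CLK, CAP, VDD_L)

# The ratio depends only on the voltages: (0.8 / 1.3)^2
ratio = p_low / p_high
```

Because frequency and capacitance cancel, a switch moved from VddH to VddL keeps roughly 38% of its dynamic power under this model, whatever placeholder values are used.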

9
Review of Vdd Level Assignment Algorithm [Lin, DAC'05]

Interconnect power reduction estimation
Remaining problem: how to calculate the VddL possibility for mixed wire segments?
The net-level bottom-up Vdd assignment guarantees the legality of final solutions [Lin, DAC'05].
Leverage all extra slack with VddL switches [Lin, DAC'05].

10
VddL Possibility Calculation

Represent timing slack in number of switches:
$s_i = L_i \cdot (S_i / D_i)$
$s_i$ is the number of VddL switches that can be inserted on the path from the source to the i-th sink of the routing tree, i.e. how many switches can be turned to VddL along the source-to-sink-i path for the given timing slack $S_i$.
$L_i$ is the number of switches along this path.
VddL possibility for switch j at sink i, based on load capacitance:
$f(i,j) = s_i \cdot (c_{ij} / C_i)$
Key idea: distribute timing slack to each switch based on capacitance.

Example (from the figure): $L_2 = 3$, $D_2 = 12$, $s_2 = 3 \cdot (10/12) = 5/2$; $f(2,2) = 1$, $f(2,3) = 1$, $f(2,4) = 1/2$
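
The two formulas can be reproduced with exact arithmetic. This is a sketch under stated assumptions: the transcript does not define $D_i$ or $C_i$, so the code simply takes $D_i$ as given and assumes $C_i$ is the sum of the $c_{ij}$ along the path. The capacitance values 2, 2, 1 are hypothetical, chosen only because they reproduce the slide's f(2,2), f(2,3), f(2,4).

```python
from fractions import Fraction
from typing import List


def vddl_switch_budget(num_switches: int, slack: Fraction, d: Fraction) -> Fraction:
    """s_i = L_i * (S_i / D_i): timing slack expressed as a number of switches."""
    return num_switches * slack / d


def vddl_possibility(s_i: Fraction, caps: List[Fraction]) -> List[Fraction]:
    """f(i, j) = s_i * (c_ij / C_i) for every switch j on the path.

    C_i is assumed to be the sum of the c_ij values (not stated in the source).
    """
    total = sum(caps)
    return [s_i * c / total for c in caps]


# Slide example: L_2 = 3, S_2 = 10, D_2 = 12
s2 = vddl_switch_budget(3, Fraction(10), Fraction(12))

# Hypothetical capacitances in ratio 2 : 2 : 1 for switches 2, 3 and 4
f2 = vddl_possibility(s2, [Fraction(2), Fraction(2), Fraction(1)])
```

With these inputs `s2` is 5/2 and `f2` is 1, 1, 1/2, matching the figure; note that the per-switch values always sum back to `s2`.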
11
Power Reduction Estimation for Mixed Wire Segments

The lower-bound estimation [Y. Lin, DAC'05] for interconnect power reduction is no longer valid for mixed wire segments.
Our solution: develop an upper-bound estimate of the number of VddL switches.
  Consistent upper bound of power reduction
  Remove the non-linear term "min" and the corresponding extra LP constraints from the lower-bound estimation

Example (from the figure): the available slack is S = 2.7. Buffer b1 (16x) needs 1.8 slack and buffer b2 (8x) needs 1.0 slack.
Summing up all VddL possibilities, $f_n(i,1) = 0.9$ and $f_n(i,2) = 0.5$, gives a lower bound of 0.9 + 0.5 = 1.4 VddL switches.
Assigning b2 to VddL consumes 1.0 slack; the 1.7 slack left is less than the 1.8 needed by b1, so only 1.0 VddL switch can be assigned.
Problem here: lower bound > actual number!
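
The mismatch in the example above can be checked by brute force. The sketch below is an illustration only: it enumerates subsets of switches and ignores any ordering constraints within the routing tree, which is a simplifying assumption, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from typing import Sequence


def max_vddl_switches(slack: Fraction, needs: Sequence[Fraction]) -> int:
    """Largest number of switches whose combined slack need fits in the budget."""
    for k in range(len(needs), 0, -1):
        if any(sum(subset) <= slack for subset in combinations(needs, k)):
            return k
    return 0


slack = Fraction("2.7")                      # available slack S from the figure
needs = [Fraction("1.8"), Fraction("1.0")]   # b1 (16x) and b2 (8x)

# Sum of the VddL possibilities quoted on the slide
estimate = Fraction("0.9") + Fraction("0.5")

# What can really be assigned with unequal per-switch slack needs
actual = max_vddl_switches(slack, needs)
```

Here `estimate` is 7/5 while `actual` is 1, so the quantity that served as a lower bound for uniform segments overshoots once switches need different amounts of slack.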
12
LP formulation for dual-Vdd Level Assignment

Basic timing constraints
Slack constraints
Objective function

Arrival time for primary outputs
Arrival time for primary inputs
Arrival time constraints
Slack upper bound
Slack constraints
Slack non-negative

13
Outline

Motivation
Problem Formulations
Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al., DAC'06]
Network Flow Based Vdd Level Assignment Formulation
  Overview of network flow based timing slack budgeting
  Primal-dual reformulation
Experimental Results
Conclusions

14
Network Flow Based Timing Slack Budgeting

Motivated by [Ghiasi, ICCAD'04] for logic-level optimization
Step 1: Reorganize objective function
Step 2: Eliminate timing slack variables (by substitution)

15
Network Flow Based Timing Slack Budgeting (cont.)

Step 3: Reorganize objective function by timing nodes
Step 4: Generate dual problem

Equation annotations: constant terms (removed), constant coefficients, node by node, edge by edge.

16
Link Induced Network from Timing Graph
Flow in backward arc (dotted segments)
Flow in forward arc (solid segments)
Demand in node i

No negative-weight cycle exists in the induced network, so a min-cost flow can always be found.
A shortest-path-based algorithm is used to produce the solution for the primal problem.
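
Why the absence of negative-weight cycles matters can be seen with a textbook Bellman-Ford routine: shortest-path distances (node potentials) are well defined only when no such cycle is reachable. This is a generic sketch, not the authors' algorithm, and the small graph and its weights are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, float]  # (tail, head, weight)


def bellman_ford(nodes: List[str], edges: List[Edge],
                 source: str) -> Optional[Dict[str, float]]:
    """Shortest distances from source, or None if a negative cycle is reachable."""
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0.0
    # At most |V| - 1 rounds of relaxation are needed without negative cycles.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # Any edge that can still be relaxed proves a negative-weight cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            return None
    return dist


nodes = ["s", "a", "b", "t"]
# Hypothetical network: the cycle a -> b -> t -> a has weight -1 + 3 + 0 = 2
edges = [("s", "a", 2.0), ("a", "b", -1.0), ("s", "b", 4.0),
         ("b", "t", 3.0), ("t", "a", 0.0)]
potentials = bellman_ford(nodes, edges, "s")

# Making the back edge cheaper turns the same cycle negative (-1 + 3 - 3 = -1)
bad_edges = edges[:-1] + [("t", "a", -3.0)]
no_solution = bellman_ford(nodes, bad_edges, "s")
```

In the first network the routine returns finite potentials for every node; in the second it reports failure, which is the situation the slide rules out for the induced network.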

17
Outline

Motivation
Problem Formulations
Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al., DAC'06]
Network Flow Based Formulation
Experimental Results
Conclusions

18
Experimental Setting

Cluster-based island-style FPGA structure
  Size-10 cluster and size-4 LUT
  100% buffered interconnects, subset switch block
  60% length-4 and 40% length-8 wire segments
  25x buffer for length-4 and 10x buffer for length-8
ITRS 100nm technology, 1.3 V for VddH and 0.8 V for VddL
Use VPR [Betz-Rose-Marquardt] for placement and routing
Use fpgaEva-LP2 [Lin et al., FPGA'05] for power calculation
  Considering short-circuit power, glitch power and input vector
  8% average error compared to SPICE simulation
The 20 biggest sequential MCNC benchmarks are tested
Use LPsolver to solve the LP

19
Dual-Vdd Assignment for FPGAs with Mixed Wire Segments

Both the LP-based and the netflow-based algorithms achieve 85% VddL assignment on average.

20
Interconnect Power Reduction

52% total interconnect power reduction is achieved!

21
Runtime comparison

More significant speedup is expected for larger circuits.
The netflow-based algorithm achieves a consistent speedup and a stable runtime.

22
Outline

Motivation
Problem Formulations
Chip-level Vdd-level Assignment Algorithm for mixed length wire segments
Network Flow Based Formulation
Experimental Results
Conclusions

23
Conclusions

A min-cost network flow based timing budgeting formulation speeds up the budgeting procedure and the overall design flow by up to 6000x and 20x, respectively, compared to the LP-based one.
Both chip-level dual-Vdd assignment algorithms support mixed-length wire segments. Experimental results show an interconnect power reduction of 53% on average compared to single-Vdd FPGA designs.