Title: An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction
1An Efficient Chiplevel Time Slack Allocation
Algorithm for Dual-Vdd FPGA Power Reduction
- Yan Lin1, Yu Hu1, Lei He1 and Vijay Raghunathan2
- 1EE Department, UCLA
- 2Purdue University
- Partially supported by NSF.
- Address comments to lhe_at_ee.ucla.edu
2Outline
- Background, Motivation and Problem Formulation
- Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al, DAC06]
- Network Flow Based Vdd Level Assignment Formulation
- Experimental Results
- Conclusions
3Background
- Existing FPGAs are power inefficient compared to ASICs.
- Interconnect is the dominant component of FPGA power dissipation (dynamic and leakage) [Li, TCAD05].
- Power aware FPGA architectures and CAD algorithms have been studied extensively.
- CAD algorithms to minimize power-delay product [Lamoureux, ICCAD03]
- Configuration inversion for leakage reduction [Anderson, FPGA04]
- Vdd-programmable FPGA logic blocks [Li, FPGA04] [Li, DAC04]
- Vdd-programmable FPGA interconnects [Li, ICCAD04] [Gayasen, FPL04] [Anderson, ICCAD04] [Lin, DAC05]
4Vdd Programmable Interconnect Arch.
- Island style and mixed wire segment length.
- Routing switch/connection block: two PMOS power transistors, M3 and M4, are inserted between the tri-state buffer and the VddH and VddL power rails, respectively [Li, ICCAD04].
- Level converter free routing tree (guarantees that no VddL switch drives VddH switches) with the LEAST area and power penalty [Lin, TCAD06].
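The level-converter-free rule can be checked mechanically on a routing tree. The following is a minimal sketch, assuming the tree is stored as a parent map and each switch carries an "H" or "L" label; the names `parent`, `vdd`, and `is_level_converter_free` are hypothetical and do not come from the slides.

```python
from typing import Dict, Optional


def is_level_converter_free(parent: Dict[str, Optional[str]],
                            vdd: Dict[str, str]) -> bool:
    """Return True if no VddL switch drives a VddH switch.

    parent maps each switch to the switch that drives it (None for the root).
    vdd maps each switch to "H" (VddH) or "L" (VddL).
    """
    for switch, driver in parent.items():
        if driver is None:
            continue  # the root switch has no driver inside the tree
        if vdd[driver] == "L" and vdd[switch] == "H":
            return False  # a VddL driver here would require a level converter
    return True


# Hypothetical routing tree: src drives a and b, a drives c.
parent = {"src": None, "a": "src", "b": "src", "c": "a"}
legal = {"src": "H", "a": "H", "b": "L", "c": "L"}
illegal = {"src": "H", "a": "L", "b": "H", "c": "H"}

print(is_level_converter_free(parent, legal))
print(is_level_converter_free(parent, illegal))
```

Checking only direct driver/load pairs is enough, because any VddL-to-VddH transition along a path must occur across one such pair.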
5Limitation of Existing Approaches
- Uniform wire segment length was assumed, and the approach cannot be extended directly to mixed wire segment lengths.
- The LP based formulation is time consuming and computationally unstable.
- Time consuming: runtime goes up quickly for large circuits.
- Computational instability: some small circuits take a long runtime.
6Problem Formulations
- Dual-Vdd Level Assignment Problem
- Given: placement and routing results of an FPGA design
- Find: a Vdd-level assignment for each interconnect switch
- Objective: minimize interconnect (dynamic and leakage) power
- Constraints:
- Meet the delay target Tspec
- Vdd-level converters are inserted ONLY at CLB inputs/outputs
7Outline
- Background, Motivation and Problem Formulation
- Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al, DAC06]
- Interconnect Power Reduction Estimation
- LP Based Vdd-level Assignment Algorithm
- Network Flow Based Vdd Level Assignment Formulation
- Experimental Results
- Conclusions
8Delay and Power Model for Interconnect
- Delay Model
- Intrinsic delay and effective driving resistance of each switch have been pre-characterized using SPICE.
- Elmore delay is used to calculate routing delay.
- Interconnect Power Model
- Dynamic power: $P_d(V_{dd,j}) = 0.5 \cdot f_{clk} \cdot C \cdot V_{dd,j}^2$
- Leakage power $P_l(V_{dd,j})$ is pre-characterized using SPICE.
- Interconnect power reduction estimation is the essential part of the dual-Vdd assignment algorithm.
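As a rough illustration of the two models above, the sketch below evaluates the dynamic power formula and the Elmore delay of a simple RC chain. The function names and the unit-valued chain are hypothetical; the 1.3 V and 0.8 V supply levels are the ones given in the experimental setting, and leakage is left out because it comes from SPICE pre-characterization.

```python
def dynamic_power(f_clk: float, cap: float, vdd: float) -> float:
    """Dynamic power per the slide's model: 0.5 * f_clk * C * Vdd^2."""
    return 0.5 * f_clk * cap * vdd ** 2


def elmore_delay(resistances, capacitances):
    """Elmore delay to the far end of an RC chain.

    Stage k has series resistance resistances[k] and node capacitance
    capacitances[k]; each resistance charges all capacitance at or after
    its own stage.
    """
    total = 0.0
    for k, r in enumerate(resistances):
        total += r * sum(capacitances[k:])
    return total


VDD_H, VDD_L = 1.3, 0.8  # supply levels from the experimental setting

# With f_clk and C fixed, the power ratio depends only on (VddL / VddH)^2.
ratio = dynamic_power(1.0, 1.0, VDD_L) / dynamic_power(1.0, 1.0, VDD_H)
print(f"VddL/VddH dynamic power ratio: {ratio:.3f}")

# Two-stage chain with unit R and C (hypothetical values): 1*(1+1) + 1*1.
print(elmore_delay([1.0, 1.0], [1.0, 1.0]))
```

The quadratic dependence on Vdd is what makes VddL assignment attractive: under this model a switch moved from 1.3 V to 0.8 V keeps roughly 38% of its dynamic power.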
9Review of Vdd Level Assignment Algorithm [Lin, DAC'05]
- Interconnect power reduction estimation
- Remaining problem: how to calculate the VddL possibility for mixed wire segments?
- The net-level bottom-up Vdd assignment guarantees the legality of the final solution [Lin, DAC05].
- Leverage all extra slack with VddL switches [Lin, DAC05].
10VddL Possibility Calculation
- Represent timing slack in number of switches
- $s_i = L_i \cdot (S_i / D_i)$
- $s_i$ is the number of switches that can be turned to VddL along the path from the source to the $i$th sink in the routing tree, for the given timing slack $S_i$.
- $L_i$ is the number of switches along this path.
- VddL possibility for switch $j$ at sink $i$, based on load capacitance
- $f(i,j) = s_i \cdot (c_{ij} / C_i)$
- Key idea: distribute timing slack to each switch based on capacitance.
- Example: $L_2 = 3$, $S_2 = 10$, $D_2 = 12$, so $s_2 = 3 \cdot (10/12) = 5/2$; $f(2,2) = 1$, $f(2,3) = 1$, $f(2,4) = 1/2$.
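The slide's numeric example can be reproduced with exact fractions. This is a sketch under two assumptions that the slide does not state: that C_i is the sum of the per-switch loads c_ij on the path, and that the loads are in the ratio 2:2:1 (chosen here only because it yields the slide's f values). The function names are hypothetical.

```python
from fractions import Fraction


def vddl_switch_budget(num_switches, slack, delay):
    """s_i = L_i * (S_i / D_i): timing slack expressed in number of switches."""
    return Fraction(num_switches) * Fraction(slack, delay)


def vddl_possibility(s_i, caps):
    """f(i, j) = s_i * (c_ij / C_i), assuming C_i is the sum of the loads."""
    total = sum(caps)
    return [s_i * Fraction(c, total) for c in caps]


# Values from the slide: L_2 = 3, S_2 = 10, D_2 = 12.
s2 = vddl_switch_budget(3, 10, 12)

# Hypothetical loads for switches 2, 3 and 4 on the path to sink 2.
caps = [2, 2, 1]
f2 = vddl_possibility(s2, caps)

print(s2, f2)
```

Under the sum assumption the possibilities always add up to s_i, which matches the slide: 1 + 1 + 1/2 = 5/2.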
11Power Reduction Estimation for Mixed Wire Segments
- The lower bound estimation [Y. Lin, DAC'05] for interconnect power reduction is no longer valid for mixed wire segments.
- Our solution: develop an upper bound estimation of the number of VddL switches
- Consistent upper bound of power reduction
- Remove the non-linear term "min" and the corresponding extra LP constraints from the lower bound estimation
Example (figure): total slack S = 2.7; buffer b1 (16x) needs 1.8 slack and buffer b2 (8x) needs 1.0 slack.
Summing up all VddL possibilities: fn(i,1) = 0.9 and fn(i,2) = 0.5, so the lower bound of VddL switches is 0.9 + 0.5 = 1.4.
Assigning VddL to b2 consumes 1.0, leaving 1.7 slack, but b1 needs 1.8, so only 1.0 VddL switch can be assigned.
Problem here: the lower bound is greater than the actual number!
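The counterexample on this slide can be checked by brute force. The sketch below assumes that both buffers draw on the same 2.7 units of slack, so their needs add up; the fn values are taken from the slide as given, and the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

slack = Fraction("2.7")

# Slack each buffer needs before it can be switched to VddL (from the slide).
need = {"b1": Fraction("1.8"), "b2": Fraction("1.0")}

# VddL possibilities fn(i,1) and fn(i,2) as printed on the slide.
possibility = {"b1": Fraction("0.9"), "b2": Fraction("0.5")}

# The estimate sums up all VddL possibilities.
estimate = sum(possibility.values())

# Exhaustively find the largest set of buffers whose total need fits the slack.
actual = 0
for k in range(len(need) + 1):
    for subset in combinations(need, k):
        if sum(need[b] for b in subset) <= slack:
            actual = max(actual, k)

print(estimate, actual)
```

Both buffers together need 2.8, which exceeds 2.7, so at most one switch can take VddL while the estimate is 7/5. The estimate therefore cannot be a lower bound for this case.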
12LP formulation for dual-Vdd Level Assignment
- Basic timing constraints
- Slack constraints
- Objective function
Equation annotations (figure): arrival time for primary outputs, arrival time for primary inputs, arrival time constraints, slack upper bound, slack constraints, slack non-negativity.
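The equations of this slide did not survive, only their labels. As a reading aid, the following is one generic slack-budgeting LP that matches those labels. It is a sketch, not necessarily the authors' exact formulation: the symbols $a_v$ (arrival time), $d_{uv}$ (edge delay), $s_{uv}$ (slack allocated to an edge), $\bar{s}_{uv}$ (slack upper bound), $w_{uv}$ (a constant weight standing in for estimated power reduction per unit slack), and the sets $PI$, $PO$, $E$ are all assumptions.

```latex
\begin{align}
\max\quad & \sum_{(u,v) \in E} w_{uv}\, s_{uv} \\
\text{s.t.}\quad
  & a_v = 0 && \forall v \in PI \\
  & a_v \le T_{spec} && \forall v \in PO \\
  & a_u + d_{uv} + s_{uv} \le a_v && \forall (u,v) \in E \\
  & 0 \le s_{uv} \le \bar{s}_{uv} && \forall (u,v) \in E
\end{align}
```

In this form every constraint involves at most a difference of two arrival times, which is the structure that the network flow reformulation relies on.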
13Outline
- Motivation
- Problem Formulations
- Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al, DAC06]
- Network Flow Based Vdd Level Assignment Formulation
- Overview of network flow based timing slack budgeting
- Primal-dual reformulation
- Experimental Results
- Conclusions
14Network Flow Based Timing Slack Budgeting
- Motivated by [Ghiasi, ICCAD04] for logic level optimization
- Step 1: Reorganize objective function
- Step 2: Eliminate timing slack variables (by substitution)
15Network Flow Based Timing Slack Budgeting (cont.)
- Step 3: Reorganize objective function by timing nodes
- Step 4: Generate dual problem
Equation annotations (figure): constant terms (removed), constant coefficients, node-by-node and edge-by-edge summations.
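Steps 2 and 3 can be illustrated numerically. The sketch below assumes the usual substitution s_uv = a_v - a_u - d_uv for the slack on edge (u, v) and a weighted-sum objective; the slides' own equations are not available, so the graph, delays, weights, and arrival times are all hypothetical.

```python
# Hypothetical timing graph: edge (u, v) -> (delay d_uv, weight w_uv).
edges = {
    ("in", "x"): (2.0, 3.0),
    ("in", "y"): (1.0, 1.0),
    ("x", "out"): (2.0, 2.0),
    ("y", "out"): (4.0, 5.0),
}
# Hypothetical arrival times at each timing node.
arrival = {"in": 0.0, "x": 3.0, "y": 2.0, "out": 7.0}

# Edge-by-edge form: sum of w_uv * s_uv with s_uv = a_v - a_u - d_uv.
edge_form = sum(w * (arrival[v] - arrival[u] - d)
                for (u, v), (d, w) in edges.items())

# Node-by-node form: each node collects incoming minus outgoing weights.
coeff = {n: 0.0 for n in arrival}
for (u, v), (d, w) in edges.items():
    coeff[v] += w
    coeff[u] -= w

# The delay terms do not depend on the arrival times, so they form a constant.
constant = sum(w * d for (d, w) in edges.values())
node_form = sum(coeff[n] * arrival[n] for n in arrival) - constant

print(edge_form, node_form)
print(sum(coeff.values()))
```

The two forms agree, and the node coefficients sum to zero, which is what lets them act as balanced node demands in a flow network.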
16Link Induced Network from Timing Graph
Flow in backward arc (dotted segments)
Flow in forward arc (solid segments)
Demand in node i
- No negative weight cycle exists in the induced network, so a min-cost flow can always be found.
- A shortest path based algorithm is used to produce the solution for the primal problem.
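The absence of negative weight cycles matters because shortest path distances are only well defined in that case. The slides do not say which shortest path algorithm is used; the sketch below uses Bellman-Ford as a stand-in on a small hypothetical graph, and shows that the resulting distances satisfy every edge constraint dist[v] <= dist[u] + w.

```python
def bellman_ford(nodes, edges, source):
    """Single-source shortest paths; raises if a negative cycle is reachable."""
    dist = {n: float("inf") for n in nodes}
    dist[source] = 0.0
    # At most len(nodes) - 1 relaxation rounds are needed without negative cycles.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break
    # Any further improvement means a negative weight cycle exists.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative weight cycle")
    return dist


# Hypothetical network with one negative edge but no negative cycle.
nodes = ["s", "a", "b", "t"]
edges = [("s", "a", 4.0), ("s", "b", 2.0), ("b", "a", -1.0),
         ("a", "t", 3.0), ("t", "b", 0.0)]

dist = bellman_ford(nodes, edges, "s")
print(dist)
```

Distances of this kind serve as node potentials, which is how a shortest path computation can recover values for the primal variables from the flow network.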
17Outline
- Motivation
- Problem Formulations
- Chip-level Vdd-level Assignment Algorithm for mixed length wire segments [Hu et al, DAC06]
- Network Flow Based Formulation
- Experimental Results
- Conclusions
18Experimental Setting
- Cluster-based Island Style FPGA Structure
- Size-10 cluster and size-4 LUT
- 100% buffered interconnects, subset switch block
- 60% length-4 and 40% length-8 wire segments
- 25x buffer for length-4 and 10x buffer for length-8
- ITRS 100nm technology, 1.3 V for VddH and 0.8 V for VddL
- Use VPR [Betz-Rose-Marquardt] for placement and routing
- Use fpgaEva-LP2 [Lin et al, FPGA05] for power calculation
- Considering short-circuit power, glitch power and input vector
- 8% average error compared to SPICE simulation
- 20 biggest sequential MCNC benchmarks are tested
- Use LPsolver to solve LP
19Dual-Vdd Assignment for FPGAs with Mixed Wire
Segments
- Both the LP-based and the Netflow-based algorithms achieve 85% VddL assignment on average.
20Interconnect Power Reduction
52% total interconnect power reduction is achieved!
21Runtime comparison
More significant speedup is expected for larger circuits.
The Netflow based algorithm achieves consistent speedup and stable runtime.
22Outline
- Motivation
- Problem Formulations
- Chip-level Vdd-level Assignment Algorithm for mixed length wire segments
- Network Flow Based Formulation
- Experimental Results
- Conclusions
23Conclusions
- A min-cost network flow based timing budgeting formulation speeds up the budgeting procedure and the overall design flow by up to 6000x and 20x, respectively, compared to the LP based one.
- Both chip-level dual-Vdd assignment algorithms support mixed length wire segments. Experimental results show an interconnect power reduction of 53% on average compared to single-Vdd FPGA designs.