An Efficient Chiplevel Time Slack Allocation Algorithm for Dual-Vdd FPGA Power Reduction

Learn more at: http://eda.ee.ucla.edu

1
An Efficient Chiplevel Time Slack Allocation
Algorithm for Dual-Vdd FPGA Power Reduction
  • Yan Lin¹, Yu Hu¹, Lei He¹ and Vijay Raghunathan²
  • ¹EE Department, UCLA
  • ²Purdue University
  • Partially supported by NSF.
  • Address comments to lhe_at_ee.ucla.edu

2
Outline
  • Background, Motivation and Problem Formulation
  • Chip-level Vdd-level Assignment Algorithm for
    mixed length wire segments [Hu et al., DAC'06]
  • Network Flow Based Vdd Level Assignment
    Formulation
  • Experimental Results
  • Conclusions

3
Background
  • Existing FPGAs are power inefficient compared to
    ASICs.
  • Interconnect is the dominant component of FPGA
    power dissipation (dynamic and leakage)
    [Li, TCAD'05].
  • Power-aware FPGA architectures and CAD algorithms
    have been studied extensively.
  • CAD algorithms to minimize power-delay
    product [Lamoureux, ICCAD'03]
  • Configuration inversion for leakage
    reduction [Anderson, FPGA'04]
  • Vdd-programmable FPGA logic blocks
    [Li, FPGA'04] [Li, DAC'04]
  • Vdd-programmable FPGA interconnects
    [Li, ICCAD'04] [Gayasen, FPL'04]
    [Anderson, ICCAD'04] [Lin, DAC'05]

4
Vdd-Programmable Interconnect Architecture
  • Island style with mixed wire segment lengths.
  • Routing switch/connection block: two PMOS power
    transistors, M3 and M4, are inserted between the
    tri-state buffer and the VddH and VddL power
    rails, respectively [Li, ICCAD'04].
  • Level-converter-free routing trees (guaranteeing
    that no VddL switch drives a VddH switch) with
    the least area and power penalty [Lin, TCAD'06].

5
Limitation of Existing Approaches
  • Uniform wire segment length was assumed, and the
    approach cannot be extended directly to mixed
    wire segment lengths.
  • The LP based formulation is time-consuming and
    computationally unstable.

Time-consuming: runtime goes up quickly for large
circuits.
Computationally unstable: even small circuits can
take a long time to solve.

6
Problem Formulations
  • Dual-Vdd Level Assignment Problem
  • Given: placement and routing results of an FPGA
    design
  • Find: a Vdd-level assignment for each interconnect
    switch
  • Objective: minimize interconnect (dynamic and
    leakage) power
  • Constraints:
  • Meet the delay target Tspec
  • Vdd-level converters are inserted ONLY at CLB
    inputs/outputs

7
Outline
  • Background, Motivation and Problem Formulation
  • Chip-level Vdd-level Assignment Algorithm for
    mixed length wire segments [Hu et al., DAC'06]
  • Interconnect Power Reduction Estimation
  • LP Based Vdd-level Assignment Algorithm
  • Network Flow Based Vdd Level Assignment
    Formulation
  • Experimental Results
  • Conclusions

8
Delay and Power Model for Interconnect
  • Delay Model
  • The intrinsic delay and effective driving
    resistance of each switch have been
    pre-characterized using SPICE.
  • Elmore delay is used to calculate routing delay.
  • Interconnect Power Model
  • Dynamic power: $P_d(V_{dd,j}) = 0.5 \, f_{clk} \, C \, V_{dd,j}^2$
  • Leakage power: $P_l(V_{dd,j})$ is pre-characterized
    using SPICE
  • Interconnect power reduction estimation is the
    essential part of the dual-Vdd assignment
    algorithm.
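As a rough illustration of the two models above, the following Python sketch computes an Elmore delay and the dynamic power formula from the slide. It makes several assumptions that are not in the slides: the path is simplified to an RC ladder rather than a full routing tree, the function names are made up, and the resistance, capacitance and clock values are arbitrary. Only the 1.3 V and 0.8 V supply levels are taken from the Experimental Setting slide.

```python
def elmore_delay(stages):
    """Elmore delay of an RC ladder (a simplification of a routing tree).

    stages: list of (resistance_ohm, capacitance_farad) from driver to sink.
    Each resistor charges all capacitance at or downstream of it.
    """
    delay = 0.0
    downstream_cap = sum(c for _, c in stages)
    for r, c in stages:
        delay += r * downstream_cap
        downstream_cap -= c
    return delay


def dynamic_power(f_clk, cap, vdd):
    """Dynamic power 0.5 * f_clk * C * Vdd^2, as written on the slide."""
    return 0.5 * f_clk * cap * vdd ** 2


# Hypothetical three-stage path: values are illustrative only.
path = [(1000.0, 10e-15), (500.0, 20e-15), (500.0, 5e-15)]
delay = elmore_delay(path)

# Supply levels from the Experimental Setting slide; f_clk and C are made up.
VDD_H, VDD_L = 1.3, 0.8
saving = 1.0 - dynamic_power(100e6, 50e-15, VDD_L) / dynamic_power(100e6, 50e-15, VDD_H)

print(f"Elmore delay: {delay:.3e} s")
print(f"Dynamic power saving of one switch at VddL: {saving:.1%}")
```

Because the formula is quadratic in the supply voltage, the relative dynamic saving of a single switch depends only on the ratio of the two Vdd levels, not on the clock frequency or capacitance chosen here.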

9
Review of Vdd Level Assignment Algorithm
[Lin, DAC'05]
Interconnect power reduction estimation
Remaining problem: how to calculate VddL
possibility for mixed wire segments?
The net-level bottom-up Vdd assignment guarantees
that the final solution is legal. [Lin, DAC'05]
Leverage all extra slack with VddL switches.
[Lin, DAC'05]
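The legality condition behind the bottom-up assignment (no VddL switch may drive a VddH switch, since level converters are not allowed inside a routing tree) can be sketched as a post-order tree walk. This is a minimal illustration of the constraint only, not the algorithm from [Lin, DAC'05]; the function name, the dictionary-based tree and the switch names are all hypothetical.

```python
def legalize_bottom_up(children, root, wants_low):
    """Return the set of switches that can legally stay at VddL.

    children:  dict mapping a switch to the list of switches it drives.
    wants_low: set of switches tentatively marked VddL.
    A switch stays at VddL only if every switch it drives is also at VddL,
    so no VddL switch ever drives a VddH switch.
    """
    legal_low = set()

    def visit(node):
        # Post-order: decide every driven switch before its driver.
        kids_low = [visit(child) for child in children.get(node, [])]
        is_low = node in wants_low and all(kids_low)
        if is_low:
            legal_low.add(node)
        return is_low

    visit(root)
    return legal_low


# Hypothetical routing tree: s0 drives s1 and s2, s1 drives s3.
tree = {"s0": ["s1", "s2"], "s1": ["s3"]}

# s3 must stay at VddH, so its drivers s1 and s0 are forced to VddH too.
print(sorted(legalize_bottom_up(tree, "s0", {"s0", "s1", "s2"})))
```

In this example only the leaf `s2` keeps VddL, because `s3` staying at VddH propagates upward through `s1` to `s0`.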
10
VddL Possibility Calculation
  • Represent timing slack in number of switches
  • $s_i = L_i \cdot (S_i / D_i)$
  • $s_i$ is the number of VddL switches that can be
    inserted in the path from the source to the ith
    sink in the routing tree.
  • $L_i$ is the number of switches along this path.
  • $s_i$: how many switches can be turned to VddL
    along the source-to-sink-i path for the given
    timing slack $S_i$.
  • VddL possibility for switch j at sink i, based on
    load capacitance
  • $f(i,j) = s_i \cdot (c_{ij} / C_i)$
  • Key idea: distribute timing slack to each switch
    based on capacitance.

Example: $L_2 = 3$, $D_2 = 12$, $s_2 = 3 \cdot (10/12) = 5/2$,
giving $f(2,2) = 1$, $f(2,3) = 1$, $f(2,4) = 1/2$.
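The two formulas on this slide can be checked against its worked example with a few lines of Python. The sketch below assumes that D_i is the delay of the source-to-sink-i path and that C_i is the total capacitance of the switches on that path; neither is stated explicitly in the transcript. The capacitance values are hypothetical, chosen in a 2:2:1 ratio so that the slide's f values come out.

```python
from fractions import Fraction


def slack_in_switches(num_switches, slack, path_delay):
    """s_i = L_i * (S_i / D_i): timing slack expressed as a switch count."""
    return Fraction(num_switches) * Fraction(slack) / Fraction(path_delay)


def vddl_possibility(s_i, switch_caps):
    """f(i, j) = s_i * (c_ij / C_i), with C_i taken as the sum of the c_ij."""
    total_cap = sum(switch_caps.values())
    return {j: s_i * Fraction(c, total_cap) for j, c in switch_caps.items()}


# Slide example: L_2 = 3, S_2 = 10, D_2 = 12.
s2 = slack_in_switches(3, 10, 12)

# Hypothetical integer capacitances for switches 2, 3 and 4 (ratio 2:2:1).
caps = {2: 2, 3: 2, 4: 1}
f2 = vddl_possibility(s2, caps)

print("s_2 =", s2)
for j, value in f2.items():
    print(f"f(2,{j}) = {value}")
```

Exact fractions are used so that the results match the slide's 5/2 and 1/2 without floating-point rounding. Note that the possibilities sum to s_2, which is what "distribute timing slack to each switch" amounts to.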
11
Power Reduction Estimation for Mixed Wire Segments
  • The lower-bound estimation of [Y. Lin, DAC'05] for
    interconnect power reduction is no longer valid
    for mixed wire segments.
  • Our solution: develop an upper-bound estimate of
    the number of VddL switches
  • Consistent upper bound of power reduction
  • Remove the non-linear term "min" and the
    corresponding extra LP constraints from the
    lower-bound estimation

Example (total slack S = 2.7):
b1 (16x) needs 1.8 slack; b2 (8x) needs 1.0 slack.
fn(i,1) = 0.9 and fn(i,2) = 0.5, so the lower bound
on VddL switches is 0.9 + 0.5 = 1.4.
Assigning b2 to VddL consumes 1.0 slack, leaving
1.7, but b1 needs 1.8, so only 1.0 VddL switch can
be assigned.
Problem: the lower bound is greater than the actual
number!
Upper-bound estimate: sum up all VddL possibilities.
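The arithmetic of this counterexample can be reproduced directly. The sketch below takes the slack, the per-buffer slack requirements and the two possibility values from the slide as given, and compares the summed estimate with the number of switches that actually fit. The brute-force helper `max_vddl_switches` is a hypothetical illustration, not part of the authors' flow.

```python
from itertools import combinations


def max_vddl_switches(slack, needs):
    """Largest number of switches whose combined slack need fits in `slack`."""
    for k in range(len(needs), 0, -1):
        for subset in combinations(needs.values(), k):
            # Small tolerance guards against floating-point noise.
            if sum(subset) <= slack + 1e-9:
                return k
    return 0


slack = 2.7                               # total slack S from the slide
needs = {"b1": 1.8, "b2": 1.0}            # slack needed to move each buffer to VddL
possibility = {"b1": 0.9, "b2": 0.5}      # fn(i,1) and fn(i,2) from the slide

estimate = sum(possibility.values())      # the supposed lower bound
actual = max_vddl_switches(slack, needs)  # what really fits

print(f"estimated VddL switches: {estimate:.1f}")
print(f"actual VddL switches:    {actual}")
```

Both buffers together would need 2.8 units of slack, slightly more than the 2.7 available, so only one can be moved to VddL even though the estimate is 1.4.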
12
LP formulation for dual-Vdd Level Assignment
  • Basic timing constraints
  • Slack constraints
  • Objective function

Equation labels: arrival time at primary outputs,
arrival time at primary inputs, arrival time
constraints, slack upper bound, slack constraints,
slack non-negativity.
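The LP equations themselves are not preserved in this transcript. As a hedged illustration of what the basic timing constraints express, the sketch below runs a textbook static timing analysis on a small timing graph: arrival times satisfy a(v) >= a(u) + d(u,v), primary inputs arrive at time 0, primary outputs must meet T_spec, and the slack of each edge is what remains. The function name, graph and delays are hypothetical, and this is not the authors' LP.

```python
from graphlib import TopologicalSorter


def edge_slacks(delays, t_spec):
    """Slack of every edge in a timing DAG.

    delays: dict {(u, v): delay} describing the timing graph.
    t_spec: required arrival time at every primary output.
    """
    preds, succs = {}, {}
    for u, v in delays:
        preds.setdefault(v, set()).add(u)
        preds.setdefault(u, set())
        succs.setdefault(u, set()).add(v)
        succs.setdefault(v, set())

    order = list(TopologicalSorter(preds).static_order())

    # Forward pass: latest arrival time; primary inputs arrive at 0.
    arrival = {}
    for v in order:
        arrival[v] = max((arrival[u] + delays[(u, v)] for u in preds[v]), default=0.0)

    # Backward pass: earliest required time; primary outputs need t_spec.
    required = {}
    for u in reversed(order):
        required[u] = min((required[v] - delays[(u, v)] for v in succs[u]), default=t_spec)

    return {(u, v): required[v] - arrival[u] - d for (u, v), d in delays.items()}


# Hypothetical timing graph with two paths from "in" to "out".
delays = {("in", "a"): 2, ("in", "b"): 1, ("a", "out"): 3, ("b", "out"): 1}
slacks = edge_slacks(delays, t_spec=6)
for edge, slack in slacks.items():
    print(edge, slack)
```

Non-negative slack on every edge means the delay target is met. A budgeting algorithm then decides how much of that slack each switch may spend on the extra delay of running at VddL.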
13
Outline
  • Motivation
  • Problem Formulations
  • Chip-level Vdd-level Assignment Algorithm for
    mixed length wire segments [Hu et al., DAC'06]
  • Network Flow Based Vdd Level Assignment
    Formulation
  • Overview of network flow based timing slack
    budgeting
  • Primal-dual reformulation
  • Experimental Results
  • Conclusions

14
Network Flow Based Timing Slack Budgeting
  • Motivated by [Ghiasi, ICCAD'04] for logic-level
    optimization
  • Step 1: Reorganize the objective function
  • Step 2: Eliminate the timing slack variables (by
    substitution)

15
Network Flow Based Timing Slack Budgeting (cont.)
  • Step 3: Reorganize the objective function by
    timing nodes
  • Step 4: Generate the dual problem

Equation annotations: constant terms (removed),
constant coefficients, node-by-node and edge-by-edge
groupings.
16
Link Induced Network from Timing Graph
Flow in backward arcs (dotted segments)
Flow in forward arcs (solid segments)
Demand at node i
  • No negative-weight cycle exists in the induced
    network, so a min-cost flow can always be found.
  • A shortest-path-based algorithm is used to
    produce the solution for the primal problem.
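For readers unfamiliar with min-cost flow, here is a generic successive-shortest-path solver that uses Bellman-Ford on the residual graph. It is a textbook sketch, not the authors' solver: the construction of the induced network from the timing graph is not preserved in this transcript, so the node numbering, edges, capacities and costs below are arbitrary. Like the slide, it relies on the input network having no negative-weight cycle.

```python
def min_cost_flow(n, edges, source, sink, amount):
    """Send up to `amount` units from source to sink at minimum cost.

    n:     number of nodes, labelled 0 .. n-1.
    edges: list of (u, v, capacity, cost) with u != v.
    Returns (flow_sent, total_cost).
    """
    # Residual graph: graph[u] holds mutable [to, capacity, cost, reverse_index].
    graph = [[] for _ in range(n)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    inf = float("inf")
    sent, total = 0, 0
    while sent < amount:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [inf] * n
        dist[source] = 0
        prev = [None] * n  # (previous node, index of edge used)
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == inf:
                    continue
                for i, (v, cap, cost, _rev) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        changed = True
            if not changed:
                break
        if dist[sink] == inf:
            break  # no augmenting path left

        # Bottleneck capacity along the path found.
        push = amount - sent
        v = sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u

        # Augment: update forward and reverse residual capacities.
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u

        sent += push
        total += push * dist[sink]
    return sent, total


# Hypothetical 4-node network: (u, v, capacity, cost).
network = [(0, 1, 2, 1), (0, 2, 1, 2), (1, 2, 1, 1), (1, 3, 1, 3), (2, 3, 2, 1)]
print(min_cost_flow(4, network, source=0, sink=3, amount=2))
```

Bellman-Ford is used rather than Dijkstra because residual edges carry negative costs. The shortest-path distances it computes also act as node potentials, which is the usual way a dual solution is recovered in this family of methods.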

17
Outline
  • Motivation
  • Problem Formulations
  • Chip-level Vdd-level Assignment Algorithm for
    mixed length wire segments [Hu et al., DAC'06]
  • Network Flow Based Formulation
  • Experimental Results
  • Conclusions

18
Experimental Setting
  • Cluster-based Island Style FPGA Structure
  • Size-10 cluster and size-4 LUT
  • 100% buffered interconnects, subset switch block
  • 60% length-4 and 40% length-8 wire segments
  • 25x buffer for length-4 and 10x buffer for
    length-8
  • ITRS 100 nm technology, 1.3 V for VddH and 0.8 V
    for VddL
  • Use VPR [Betz-Rose-Marquardt] for placement and
    routing
  • Use fpgaEva-LP2 [Lin et al., FPGA'05] for power
    calculation
  • Considering short-circuit power, glitch power and
    input vector
  • 8% average error compared to SPICE simulation
  • The 20 biggest sequential MCNC benchmarks are
    tested
  • Use LPsolver to solve the LP

19
Dual-Vdd Assignment for FPGAs with Mixed Wire
Segments
  • Both the LP-based and netflow-based algorithms
    achieve 85% VddL assignment on average.

20
Interconnect Power Reduction
52% total interconnect power reduction is
achieved!

21
Runtime comparison
More significant speedup is expected for larger
circuits.
The netflow-based algorithm achieves consistent
speedup and stable runtime.

22
Outline
  • Motivation
  • Problem Formulations
  • Chip-level Vdd-level Assignment Algorithm for
    mixed length wire segments
  • Network Flow Based Formulation
  • Experimental Results
  • Conclusions

23
Conclusions
  • A min-cost network flow based timing budgeting
    formulation speeds up the budgeting procedure
    and the overall design flow by up to 6000x and
    20x, respectively, compared to the LP based one.
  • Both chip-level dual-Vdd assignment algorithms
    support mixed-length wire segments. Experimental
    results show an interconnect power reduction of
    53% on average compared to single-Vdd FPGA
    designs.

24
  • Thank you!
  • Q/A