Virtual Embedded Blocks: A Methodology for Evaluating Embedded Elements in FPGAs

Chun Hok Ho, Philip Leong, Wayne Luk (Imperial College London); S. Wilton and S. Lopez-Buedo (University of British Columbia, Universidad Autónoma de Madrid)

1
Virtual Embedded Blocks: A Methodology for
Evaluating Embedded Elements in FPGAs
  • Chun Hok Ho, Philip Leong, Wayne Luk
  • Imperial College London
  • S. Wilton and S. Lopez-Buedo
  • University of British Columbia, Universidad
    Autónoma de Madrid

2
Motivation
  • Coarse-grained elements common in FPGA devices
  • Memories, multipliers, DSP blocks, microprocessors (µP)
  • Effect of other blocks?
  • FPUs, barrel shifter, function generator

3
This work
  • Methodology to study the effects of embedded
    elements in FPGAs.
  • High level estimation of area and delay of an
    application.
  • Model embedded blocks using logic cell (LC) resources:
    the virtual embedded block (VEB).
  • Explore effects of changing area/delay of VEB
    parameters.

4
Virtual Embedded Blocks
  • Area matched by instantiating same area of logic
    cells
  • Delay matched by introducing combinatorial delays
    using logic cells
  • Position matched by specifying placement
    constraints
  • Can be used with most of the development tools
    (commercial or otherwise)
  • Requires timing analysis tool and floorplanner
  • Retiming

5
VEB design flow (generic)

6
Area Model
  • Use logic cells to model real ASIC embedded
    blocks
  • Estimate LC area (normalised against feature
    size) from die photos

Xilinx Virtex II XQR2V3000 (0.12 µm, 8 metal layers, 16 x 16 mm)
Xilinx Virtex II XC2V1000 (0.12 µm, 8 metal layers, 9.7 x 9.7 mm)

7
Area Model
  • Estimates of logic cell area include
    configuration bit, buffer and interconnect.
  • Normalised to feature size
  • Area model can be used to compare logic cell area
    to any embedded block
  • Logic cell (LC): normalised area 442,000, i.e. 1 LC
  • Multiplier: normalised area 2,751,000, i.e. about 6 LC
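
The conversion from normalised area to logic-cell equivalents on this slide is a simple ratio. A minimal sketch, using the two figures quoted above; the function and constant names are illustrative and not from the presentation, and the unit of the normalised areas is not stated in the transcript:

```python
# Normalised areas as quoted on the slide (unit not given in the transcript).
LC_AREA = 442_000
MULTIPLIER_AREA = 2_751_000


def area_in_logic_cells(block_area: float, lc_area: float = LC_AREA) -> float:
    """Express a block's normalised area as a multiple of one logic cell."""
    if lc_area <= 0:
        raise ValueError("lc_area must be positive")
    return block_area / lc_area


ratio = area_in_logic_cells(MULTIPLIER_AREA)
print(f"{ratio:.2f} LC, rounded to {round(ratio)} LC")  # 6.22 LC, rounded to 6 LC
```

The same ratio could be applied to any embedded block whose normalised area is known, which is how the area of a VEB would be sized in logic cells.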

8
Delay Model
  • Match delays using LC
  • Used adder carry chain to model delay
  • Delay chain formula
  • Use timing analysis tools to verify
  • For small blocks, may fail to match both area and
    delay
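
The delay chain formula itself is not preserved in this transcript. As a rough illustration only, the sketch below assumes a simple linear model, delay(n) = fixed_ns + n * per_stage_ns, and picks the shortest carry chain that reaches a target delay. The model, the function name and all parameter names are assumptions, not the authors' formula; real values would come from the timing analysis tool.

```python
import math


def carry_chain_length(target_ns: float, fixed_ns: float, per_stage_ns: float) -> int:
    """Smallest number of carry-chain stages whose modelled delay reaches target_ns.

    Assumed (hypothetical) model: delay(n) = fixed_ns + n * per_stage_ns.
    """
    if per_stage_ns <= 0:
        raise ValueError("per_stage_ns must be positive")
    if target_ns <= fixed_ns:
        return 0  # the fixed overhead alone already meets the target
    return math.ceil((target_ns - fixed_ns) / per_stage_ns)


# Hypothetical numbers, for illustration only.
stages = carry_chain_length(target_ns=5.0, fixed_ns=1.0, per_stage_ns=0.5)
print(stages)  # 8
```

If the chain needs more logic cells than the area model allows, area and delay cannot both be matched, which is the small-block caveat noted on the slide.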

9
VEB design flow (vendor specific)
  • Benchmark circuit
  • VEB (matched area and timing) in an RPM
    (Relationally Placed Macro)
  • FPGA toolchain: synthesis, place and route, timing
    analysis
  • System performance: timing, area
  • Synthesis: Synplicity Synplify
  • Floorplan, place and route: Xilinx ISE 7.1i
  • Device: Xilinx Virtex II XC2V6000-6-FF1152 FPGA

10
Embedded Multiplier
  • Verify VEB approach
  • Compare a VEB-modelled multiplier with an existing
    embedded multiplier
  • Explore the speedup by increasing the performance
    of embedded multiplier (EM)
  • Normalise the result against original EM

11
Benchmarks
  • Large unsigned multiplier (mul34, mul68, mul136)
  • BGM interest rate model via Monte Carlo
    simulation
  • A good example for retiming because of its
    complexity
  • Fly compiler generated circuits
  • Allows a single description to generate a circuit
    with different arithmetic operations
  • Digital sine cosine generator (dscg)
  • Ordinary differential equation solver (ode)
  • 3x3 matrix multiplication (mm3)
  • Finite impulse response filter (fir4)
  • Butterfly stage for discrete Fourier transform
    (bfly)
  • 18-bit fixed point and double precision floating
    point circuits generated

12
Verification of VEB using EM
  • Difference at most 11

13
System performance vs EM performance (bgm)

14
Embedded FPU
  • Evaluate effect of embedding FPU on application
    performance
  • FPU delay and area based on Blue Gene data
  • 4.26 mm² (570 LCs)
  • 700 MHz, 5-stage pipeline
  • For FPGAs, reduce latency and clock frequency by
    a factor of 5: 140 MHz, one clock cycle of latency
  • Explore the speedup by increasing the performance
    of FPU
  • Test on floating-point butterfly circuit (bfly)
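
The factor-of-5 scaling above can be checked with a few lines of arithmetic. This is a sketch under one reading of the slide, namely that both the clock frequency and the latency in cycles are divided by the same factor; the function names are illustrative and not from the presentation.

```python
def scale_fpu_for_fpga(clock_mhz: float, pipeline_stages: int, factor: float):
    """Divide clock frequency and cycle latency by the same factor."""
    return clock_mhz / factor, pipeline_stages / factor


def latency_ns(clock_mhz: float, cycles: float) -> float:
    """Absolute latency in nanoseconds for a given clock and cycle count."""
    return cycles * 1000.0 / clock_mhz


fpga_clock_mhz, fpga_cycles = scale_fpu_for_fpga(700.0, 5, 5.0)
print(fpga_clock_mhz, fpga_cycles)  # 140.0 1.0

# Absolute latency before and after scaling, in nanoseconds.
print(round(latency_ns(700.0, 5), 2))                     # 7.14
print(round(latency_ns(fpga_clock_mhz, fpga_cycles), 2))  # 7.14
```

Under this reading the absolute latency stays at roughly 7.14 ns: the five pipeline stages are collapsed into one slower cycle.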

15
System Performance for Different Benchmarks
  • Speedup due to reduced clock cycles in the
    embedded FPU

16
System performance vs FPU performance (bfly)

17
Conclusion
  • Methodology for estimating the effects of
    introducing embedded blocks to existing FPGA
    devices
  • Allows commercial EDA tools to be used
  • Does not require an actual FPGA implementation
  • Predict impact of embedded FPU
  • 3.7 times area reduction and speedup of 4.4
  • Future work
  • Modelling connection costs
  • More extensive benchmarks