A Survey of Logic Block Architectures - PowerPoint PPT Presentation

About This Presentation
Title:

A Survey of Logic Block Architectures

Description:

... the 'issues' in reconfigurable fabric design for compute intensive applications ... involved in making a fabric to accelerate multimedia reconfigurable ... – PowerPoint PPT presentation

Slides: 74
Provided by: MustafaI7

Transcript and Presenter's Notes

Title: A Survey of Logic Block Architectures


1
A Survey of Logic Block Architectures
  • For Digital Signal Processing Applications

2
Presentation Outline
  • Considerations in Logic Block Design
  • Computation Requirements
  • Why Inefficiencies?
  • Representative Logic Block Architectures
    • Proposed
    • Commercial
  • Conclusions: What Is Suitable Where?

3
Why DSP? The Context
  • Representative of the computationally intensive
    class of applications → datapath-oriented and
    arithmetic-oriented
  • Increasingly large use of FPGAs for DSP →
    multimedia signal processing, communications, and
    much more
  • To study the issues in reconfigurable fabric
    design for compute-intensive applications → What
    is involved in building a fabric that makes
    reconfigurable acceleration of multimedia possible?

4
Elements of a Reconfigurable Architecture
  • Logic Block/Processing Element
  • Differing Grains: Fine >> Coarse >> ALUs
  • Routing
  • Dynamic Reconfiguration

5
So What's Wrong with the Typical FPGA?
  • Meant to be general purpose → lower risks
  • Too Flexible! → Result: Efficiency Gap
  • Higher Implementation Cost, Larger Delay, Larger
    Power Consumption than ASICs
  • Performance vs. Flexibility Tradeoff → Postponing
    Mapping and Silicon Re-use

6
Solution? See How FPGAs Are Used
  • FPGAs are being used for classes of
    applications → encryption, DSP, multimedia, etc.
  • Here lies the key → design FPGAs for a class of
    applications
  • Application Domain Characterization → Application
    Domain Tuning

7
Domain Specialization
  • COMPUTATION → defines → ARCHITECTURE
  • Target Application Characteristics known
    beforehand? Yes
  • Characterize the application domain
  • Determine a balance between flexibility and efficiency
  • Tune the architecture accordingly

8
Categorizing the Computation
  • Control → Random Logic Implementation
  • Datapath → Processing of Multi-bit Data
  • Conflicting Requirements?

9
Datapath Element Requirements
  • Operates on Word Slices or Bit Slices
  • Produces multi-bit outputs
  • Requires many smaller elements, each producing
    one output bit, i.e. multiple small LUTs
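To make the bit-slice idea concrete, here is a minimal sketch, assuming a generic k-input LUT modeled as a list of 2^k configuration bits. The `LUT` class and `ripple_add` function are hypothetical names, not taken from the slides. A W-bit adder built this way needs two small 3-input LUTs per output bit (one for the sum, one for the carry), so 2W LUTs in total.

```python
from typing import Callable, Tuple


class LUT:
    """A k-input lookup table stored as 2**k configuration bits."""

    def __init__(self, k: int, func: Callable[..., int]):
        self.k = k
        # Bit i holds func applied to the binary digits of i (input 0 = LSB).
        self.bits = [func(*[(i >> j) & 1 for j in range(k)]) & 1
                     for i in range(2 ** k)]

    def __call__(self, *inputs: int) -> int:
        address = sum((bit & 1) << j for j, bit in enumerate(inputs))
        return self.bits[address]


# One bit slice: a 3-input LUT for the sum bit and one for the carry bit.
SUM_LUT = LUT(3, lambda a, b, c: a ^ b ^ c)
CARRY_LUT = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))


def ripple_add(a: int, b: int, width: int) -> Tuple[int, int]:
    """Add two unsigned width-bit words slice by slice; returns (sum, carry_out)."""
    carry, result = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= SUM_LUT(ai, bi, carry) << i   # one small LUT per output bit
        carry = CARRY_LUT(ai, bi, carry)        # carry feeds the next slice
    return result, carry
```

For example, `ripple_add(11, 6, 4)` returns `(1, 1)`: 11 + 6 = 17 overflows four bits, leaving sum bits 0001 and a carry-out of 1.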

10
Control Logic Requirements
  • Produces a single output from many single bit
    inputs
  • Benefits from large-grain LUTs, since the number
    of logic levels is reduced
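The logic-level argument can be quantified with a small sketch. It assumes a wide associative function such as a many-input AND, OR or XOR, where every LUT can absorb up to k partial results, so a tree of k-input LUTs covers n inputs after the smallest number of levels d with k^d >= n. Arbitrary functions may need more levels; `lut_levels` is a hypothetical helper for illustration only.

```python
def lut_levels(n_inputs: int, k: int) -> int:
    """Depth of a tree of k-input LUTs that reduces n_inputs signals to one
    output, for associative functions such as a wide AND, OR or XOR."""
    if n_inputs < 1 or k < 2:
        raise ValueError("need n_inputs >= 1 and k >= 2")
    levels, reach = 1, k      # a single LUT level covers up to k inputs
    while reach < n_inputs:
        levels += 1           # each extra level multiplies the reach by k
        reach *= k
    return levels


if __name__ == "__main__":
    for k in (2, 4, 6):
        print(f"32-input AND with {k}-input LUTs: {lut_levels(32, k)} levels")
```

For a 32-input AND this gives 5, 3 and 2 levels for 2-, 4- and 6-input LUTs respectively, which is why control logic favors larger LUTs.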

11
Logic Block Design Considerations
  • How much of what kinds of computations to
    support?
  • Tradeoff: Generality vs. Specialization

12
How Much of What? Application Benchmarking
13
So what do we have to support?
  • Datapath functionality, in particular arithmetic,
    is dominant in DSP.
  • The datapath functions have different bit-widths.
  • DSP designs heavily use multiplexers of various
    sizes. Thus, efficient mapping of multiplexers
    should be supported.
  • DSP functions do contain random logic. The amount
    of random logic varies per design.
  • Some DSP designs use wide boolean functions.
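Why multiplexers deserve special attention can be seen by counting pins. An N:1 multiplexer needs N data inputs plus ceil(log2 N) select inputs, so only a 2:1 multiplexer fits in a single 4-input LUT. The sketch below illustrates this simple counting model; the helper names are hypothetical, and the naive tree count assumes every LUT implements one 2:1 multiplexer while ignoring any dedicated multiplexer resources.

```python
def mux_inputs(n_ways: int) -> int:
    """Pins of an n_ways:1 multiplexer: data inputs plus select bits."""
    if n_ways < 1:
        raise ValueError("n_ways must be >= 1")
    select_bits = (n_ways - 1).bit_length()   # equals ceil(log2(n_ways))
    return n_ways + select_bits


def fits_in_one_lut(n_ways: int, k: int) -> bool:
    """True if the whole multiplexer fits in a single k-input LUT."""
    return mux_inputs(n_ways) <= k


def naive_mux_tree_luts(n_ways: int) -> int:
    """LUT count for a tree of 2:1 multiplexers, one 3-input function per LUT."""
    return max(n_ways - 1, 0)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n}:1 mux needs {mux_inputs(n)} inputs; "
              f"fits a 4-LUT: {fits_in_one_lut(n, 4)}; "
              f"naive 2:1 tree: {naive_mux_tree_luts(n)} LUTs")
```

A 4:1 multiplexer already needs 6 inputs, so in a plain 4-input-LUT fabric it must be split across several LUTs.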

14
DSP Building Blocks
  • Some techniques widely used to achieve area- and
    speed-efficient DSP implementations:
  • Bit-Serial Computation
    • Routing efficient
    • Bit-level pipelining increases throughput even
      more
  • Digit-Serial Computation
    • Combines the area efficiency of bit-serial with
      the time efficiency of bit-parallel

15
Classes of DSP-optimized FPGA Architectures
  • Architectures with Dedicated DSP Logic
    • Homogeneous
    • Heterogeneous
    • Globally Homogeneous, Locally Heterogeneous
  • Architectures of Coarser Granularity
  • With DSP-Specific Improvements (e.g., Carry
    Chains, Input Sharing, CBS)

16
Some Representative Architectures
17
Bit-Serial FPGA with SR LUT
  • The bit-serial paradigm suits the existing FPGA, so
    why not optimize the FPGA for it?
  • Logic block to support efficient implementation
    of bit-serial datapaths and bit-level pipelining
  • LUTs can be used for combinational logic as well
    as for Shift Registers
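A LUT that doubles as a shift register can be modeled in a few lines. This is a behavioral sketch with a hypothetical `ShiftRegisterLUT` class, assuming that in shift mode the 16 configuration bits of a 4-input LUT are chained and clocked, while in logic mode the same bits act as an ordinary truth table; it is not the circuit from the slides.

```python
from typing import List


class ShiftRegisterLUT:
    """Behavioral model of a k-input LUT usable as logic or as a shift register."""

    def __init__(self, k: int = 4):
        self.k = k
        self.bits: List[int] = [0] * (2 ** k)   # configuration memory

    def configure(self, truth_table: List[int]) -> None:
        """Logic mode: load a truth table, as for any combinational function."""
        if len(truth_table) != 2 ** self.k:
            raise ValueError("truth table must have 2**k entries")
        self.bits = [b & 1 for b in truth_table]

    def evaluate(self, *inputs: int) -> int:
        """Logic mode: the inputs form the address (input 0 is the LSB)."""
        address = sum((bit & 1) << j for j, bit in enumerate(inputs))
        return self.bits[address]

    def clock(self, data_in: int) -> None:
        """Shift mode: every stored bit moves one position; data_in enters at 0."""
        self.bits = [data_in & 1] + self.bits[:-1]

    def tap(self, delay: int) -> int:
        """Shift mode: the bit clocked in `delay` cycles ago (1 <= delay <= 2**k)."""
        return self.bits[delay - 1]
```

After clocking in 1, 0, 0, 1, 1, the taps for delays 1 to 6 read 1, 1, 0, 0, 1, 0. Reusing the 16 configuration bits as storage gives up to 16 cycles of delay per LUT instead of spending 16 flip-flops, which matters for the many delay lines in bit-serial datapaths.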

18
A Bit-Serial Adder
A Bit-Serial Adder which processes two bits at a
time
Interface Block Diagram
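As an illustration of the bit-serial style, here is a generic sketch, not the two-bits-per-cycle adder in the figure. The classic bit-serial adder is one full adder plus a carry flip-flop: operands stream in LSB first, one bit of each per clock, and one sum bit streams out. The helper names `to_bits`, `from_bits` and `bit_serial_add` are hypothetical.

```python
from typing import Iterable, Iterator, List


def to_bits(value: int, width: int) -> List[int]:
    """LSB-first bit stream of an unsigned value."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: Iterable[int]) -> int:
    """Reassemble an unsigned value from an LSB-first bit stream."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(a_bits: Iterable[int], b_bits: Iterable[int]) -> Iterator[int]:
    """One full adder plus one carry flip-flop: each loop iteration is one
    clock cycle that consumes one bit of each operand (LSB first)."""
    carry_ff = 0                                     # cleared at word start
    for a, b in zip(a_bits, b_bits):
        s = a ^ b ^ carry_ff                         # full-adder sum
        carry_ff = (a & b) | (carry_ff & (a ^ b))    # registered carry
        yield s
```

`from_bits(bit_serial_add(to_bits(13, 5), to_bits(9, 5)))` yields 22 after five clock cycles. The only state carried between cycles is the single carry bit, which is why the structure is so cheap in area and routing.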
19
A Bit-Serial Multiplier Cell
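The multiplier cell itself is shown only as a figure. As a hedged behavioral sketch of how such cells are commonly combined, the function below models a generic serial-parallel multiplier: the multiplicand is held in parallel, multiplier bits arrive LSB first, and one product bit leaves per clock. The integer accumulator stands in for the chain of cells; the name `serial_parallel_multiply` is hypothetical, and operands are assumed to be unsigned and fit in `width` bits.

```python
def serial_parallel_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Unsigned width x width multiply producing the 2*width-bit product
    one bit per clock cycle, least significant bit first."""
    acc = 0          # partial sum held in the chain of cells
    product = 0
    for cycle in range(2 * width):
        # Multiplier bits enter serially; zeros are fed in to flush the chain.
        bit = (multiplier >> cycle) & 1 if cycle < width else 0
        if bit:
            acc += multiplicand             # cells add the gated multiplicand
        product |= (acc & 1) << cycle       # one product bit leaves this cycle
        acc >>= 1                           # remaining sum moves toward the output
    return product
```

For example, `serial_parallel_multiply(13, 11, 4)` produces 143 over 8 clock cycles.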
20
The Proposed Bit Serial Logic Block Architecture
  • Four 4-input LUTs and 6 flip-flops.
  • The two multiplexers in front of the LUTs are
    targeted mainly for carry-save operations which
    are frequently used in bit-serial computations.
  • There are 18 signal inputs and 6 signal outputs,
    plus a clock input.
  • Feedback inputs c2, c3, c4, c5 can be connected
    to GND, to VDD, or to one of the 4 outputs
    d0, d1, d2, d3. Therefore, each LUT can implement
    any 4-input function controlled by inputs a0,
    a1, a2, a3 or b0, b1, b2, b3.
  • Programmable switches connected to inputs a4 and
    b4 control the functionality of the four
    multiplexers at the outputs of the LUTs. As a
    result, 2 LUTs can implement any 5-input function.
  • The final outputs d0, d1, d2, d3 can be either
    the direct outputs of the multiplexers or the
    outputs of the flip-flops. All bit-serial operators
    use the flip-flop outputs, so for them the attached
    programmable switches are unnecessary; the switches
    are present only to implement logic functions
    other than bit-serial datapath circuits.
  • Two flip-flops are added (inputs c0 and c1) to
    implement shift registers which are frequently
    used in bit-serial operations.
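The claim that two 4-input LUTs plus an output multiplexer implement any 5-input function follows from Shannon expansion, f = x4'·f(x0..x3, 0) + x4·f(x0..x3, 1), where each cofactor depends on only four inputs. The sketch below checks this exhaustively. It assumes the multiplexer select is the fifth input (a4 or b4 on the slide); the helper names are hypothetical.

```python
from itertools import product
from typing import Callable, List, Sequence


def make_lut(func: Callable[..., int], k: int) -> List[int]:
    """Configuration bits (truth table) of a k-input function; input 0 is the LSB."""
    return [func(*[(i >> j) & 1 for j in range(k)]) & 1 for i in range(2 ** k)]


def lut_read(table: List[int], inputs: Sequence[int]) -> int:
    return table[sum((bit & 1) << j for j, bit in enumerate(inputs))]


def five_input_block(func5: Callable[..., int]) -> Callable[..., int]:
    """Map func5 onto two 4-input LUTs and a 2:1 multiplexer selected by x4."""
    lut_lo = make_lut(lambda a, b, c, d: func5(a, b, c, d, 0), 4)  # cofactor x4 = 0
    lut_hi = make_lut(lambda a, b, c, d: func5(a, b, c, d, 1), 4)  # cofactor x4 = 1

    def block(x0: int, x1: int, x2: int, x3: int, x4: int) -> int:
        lo = lut_read(lut_lo, (x0, x1, x2, x3))
        hi = lut_read(lut_hi, (x0, x1, x2, x3))
        return hi if x4 else lo      # output multiplexer driven by the fifth input

    return block


def agrees_everywhere(f: Callable[..., int], g: Callable[..., int], n: int) -> bool:
    """Exhaustively compare two n-input Boolean functions."""
    return all(f(*x) == g(*x) for x in product((0, 1), repeat=n))
```

With `parity5 = lambda *x: sum(x) % 2`, `agrees_everywhere(five_input_block(parity5), parity5, 5)` returns `True`, and the same holds for any other 5-input function.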

21
The Modified LUT Implementing a Shift Register
22
Performance Results
23
Digit-Serial Logic Block Architecture
  • Digit-Serial Architectures process one digit (N = 4
    bits) at a time
  • They offer area efficiency similar to bit-serial
    architectures and time efficiency close to
    bit-parallel architectures
  • N = 4 bits can serve as an optimal granularity for
    processing larger digit sizes (N = 8, 16, etc.)
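Here is a behavioral sketch of digit-serial addition, assuming an N-bit adder whose carry-out is registered between digits. The function name and the cycle counter are illustrative, not from the slides. Setting N = 1 gives bit-serial operation and setting N to the word width gives bit-parallel operation, which is the spectrum the bullets describe.

```python
from typing import Tuple


def digit_serial_add(a: int, b: int, width: int, digit: int = 4) -> Tuple[int, int]:
    """Add two unsigned width-bit words one digit (`digit` bits) per clock.
    Returns (sum modulo 2**width, number of clock cycles used)."""
    mask = (1 << digit) - 1
    carry_ff, result, cycles = 0, 0, 0
    for shift in range(0, width, digit):        # one iteration = one clock
        a_d = (a >> shift) & mask
        b_d = (b >> shift) & mask
        total = a_d + b_d + carry_ff            # the N-bit adder
        result |= (total & mask) << shift
        carry_ff = total >> digit               # carry registered for next digit
        cycles += 1
    return result & ((1 << width) - 1), cycles


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, digit_serial_add(40000, 20000, 16, n))
```

Adding two 16-bit words takes 16 cycles with N = 1, 4 cycles with N = 4 and 1 cycle with N = 16, while the adder hardware grows only with N.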

24
Digit-Serial Building Blocks
A Digit-Serial Adder
A Digit-Serial Unsigned Multiplier
25
Digit-Serial Building Blocks
A Pipelined Digit-Serial Unsigned Multiplier for
Y = 8 bits
26
Digit-Serial Signed Multiplier Blocks
Middle Stages Module
First Stage Module
Last Stage Module
27
Signed Digit-Serial Multiplier
A Digit-Serial Signed Booth's Pipelined
Multiplier with Y = 8
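The slides do not spell out the recoding. As a hedged illustration, assume radix-4 (modified) Booth recoding, which turns an 8-bit two's-complement multiplier Y into four digits in {-2, -1, 0, 1, 2}; a digit-serial multiplier could then consume one recoded digit (two bits of Y) per cycle. The function names are hypothetical.

```python
from typing import List


def booth_radix4_digits(y: int, width: int = 8) -> List[int]:
    """Radix-4 (modified) Booth recoding of a width-bit two's-complement
    multiplier y into width // 2 digits in {-2, -1, 0, 1, 2}, LSB digit first.
    Digit i = -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0."""
    if width % 2:
        raise ValueError("width must be even")
    bits = y & ((1 << width) - 1)        # two's-complement bit pattern
    digits = []
    prev = 0                             # implicit y[-1] = 0
    for i in range(0, width, 2):
        y0 = (bits >> i) & 1
        y1 = (bits >> (i + 1)) & 1
        digits.append(-2 * y1 + y0 + prev)
        prev = y1
    return digits


def booth_multiply(x: int, y: int, width: int = 8) -> int:
    """Signed product as the sum of recoded partial products d_i * x * 4**i."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, width)))
```

For example, `booth_radix4_digits(-3)` gives `[1, -1, 0, 0]`, i.e. 1 - 4 = -3. Because each digit needs only the multiples 0, ±X and ±2X, every partial product is a shift and an optional negation of X.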
28
Proposed Digit-Serial Logic Block
29
Detailed Structure of Digit-Serial Logic Block
30
The Basic Logic Module (LM)
Table of Functions Implemented
The Structure of the LM
31
Examples of Implementations
N = 4 Unsigned Multiplier
N = 4 Signed Multiplier
Two N = 2 Multipliers
Bit-Level Pipelined
32
Area Comparison with Xilinx 4000 Series
33
Mixed-Grain Logic Block Architecture
  • Exploits the adder inverting property
  • Efficiently implements both datapath and random
    logic in the same logic block design

34
Adder Inverting Property
Full Adder and Equations Showing The Inverting
Property
An optimal structure derived from the property
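We read the inverting property as the self-duality of the full adder: complementing all three inputs complements both the sum and the carry-out. The following sketch verifies this exhaustively; the function names are illustrative, and the slide's derived structure is not reproduced.

```python
from itertools import product
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Return (sum, carry_out) of a 1-bit full adder."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)   # majority function
    return s, cout


def inverting_property_holds() -> bool:
    """Check FA(~a, ~b, ~cin) == (~sum, ~cout) for all 8 input combinations."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if (s_inv, cout_inv) != (1 - s, 1 - cout):
            return False
    return True
```

By induction along a ripple-carry chain, inverting every operand bit and the carry-in inverts every sum bit and the carry-out, so inverters can be moved across the adder without extra logic.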
35
LUT Bits Utilization in Datapath and Logic Modes
36
Structure of a Single Slice
37
Complete Logic Block
38
Modified ALU-Like Functionality
39
Comparison Results
40
Comparison Results (Cont.)
40
Comparison Results (Cont.)
42
Coarser ALU-Like Architectures
43
CHESS Architecture
44
CHESS ALU-Based Logic Block
45
Structure of a Switch Box
46
Comparison Results
47
Computation Field Programmable Architecture
  • A heterogeneous architecture with clusters of
    datapath logic blocks
  • Separate LUT-based logic blocks to support
    random logic mapping
  • The basic logic block is called a Partial Adder
    Subtraction Multiplier (PASM) module

48
PASM Logic Block of CFPA
49
Cluster of PASM Logic Blocks
50
Comparison Results
51
Some Industry Architecture Designs
52
Altera APEX II Logic Element
53
Altera MAX II Logic Element
54
LE Configuration in Arithmetic Mode
55
LE in Random Logic Implementation
56
Altera Stratix Logic Element
57
Altera Stratix II Architecture
58
Stratix II Adaptive Logic Module
59
Stratix II ALM in Arithmetic Mode
60
Various Configurations in an ALM of Stratix II
61
Multiplier Resources in Stratix II
62
Structure of a DSP Block in Stratix II
63
Xilinx Virtex-II Pro Architecture
64
Basic Logic Element of Virtex-II Pro
65
Dedicated Multipliers in Virtex-II Pro
66
Processor-Programmable Logic Coupled Architecture
67
PiCoGA Architecture Coupled with a VLIW processor
68
PiCoGA Logic Block
69
Conclusions
  • Traditional general-purpose FPGAs are inefficient
    for datapath mapping
  • Logic blocks with DSP-specific enhancements seem
    a promising solution
  • Coarse-grained logic can achieve better
    application mapping for datapaths but sacrifices
    flexibility
  • Dedicated blocks (multipliers) increase
    performance but also increase cost significantly

70
Conclusions
  • PDSPs with embedded FPGAs can achieve a good
    balance between performance and power consumption
  • So which approach is best? → No single best
    approach exists

71
Suitability of Approaches
  • Highly computationally intensive applications
    with large amounts of parallelism can use
    platform FPGAs, where large resources are often
    required and power consumption is not an issue.
  • Here, cost per function will be lowest

72
Suitability of Approaches
  • Field-programmable-logic-based coprocessors can
    benefit from coarse-grained blocks in systems
    where most control functions are implemented by
    the PDSP itself

73
Suitability of Approaches
  • Higher flexibility and lower cost can be achieved
    with logic blocks that have DSP-specific
    enhancements while retaining the flexibility to
    implement control logic efficiently.