Logic Emulation - PowerPoint PPT Presentation


Title: Logic Emulation


1
Part III
  • Logic Emulation

2
What is a Logic Emulation System?
  • 1. Programmable hardware built from programmable
    logic (FPGAs) and programmable interconnect
    devices (PIDs).
  • 2. Software that automatically programs the
    hardware according to the circuit under design.
  • 3. Control hardware and software that support
    operation of the emulated design as a hardware
    component operating in real time.

3
Typical Logic Emulation Environment
Compiler, runtime software
Stimulus generator, logic analyzer
4
Why do we need Logic Emulation?
  • Design verification issues.
  • Real-time operation.
  • System-level testing.
  • Rapid prototyping.

5
Design Verification Issues
  • Simulation-based verification methods run out of
    steam as chip complexity grows.
  • Emulation is a verification technology that grows
    along with design size.

6
Real-Time Operation
  • Simulation requires test vector development, which
    is costly and difficult.
  • Verification depends on test vector correctness.
  • Certain applications must be verified in real
    time, such as audio and video judged by human
    perception.
  • Emulation connected to actual hardware can run
    - real diagnostic code,
    - operating systems, and
    - applications.

7
System-Level Testing
  • Often the chip meets its specifications but it
    fails in the system.
  • We have to verify the system-level interactions
    between the chip and other components. They are
    hard to formalize.
  • Internal probing is impossible once the chip is
    fabricated and placed in a system, but it is
    possible using emulation.

8
Rapid Prototyping
  • Once the emulated design is debugged, it is
    immediately available to software developers for
    software debugging.
  • The emulated design is available for demos and
    for architecture experiments on real applications
    and data.

9
Programmable Hardware includes programmable
interconnect
10
Considerations for programmable interconnect
  • The capacity of logic and interconnection depends
    on package constraints.
  • This forces a hierarchical system:
    chips > boards > boxes > system.
  • The interconnect structure must
    - 1. provide successful connectivity,
    - 2. maximize FPGA utilization, and
    - 3. minimize delay and skew.
  • Rent's rule can be used to predict the interconnect
    needs.
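
Rent's rule relates the number of external terminals T of a block to the number of gates g inside it as T = t * g^p. The Python sketch below shows how such an estimate could be used to size a partition against a package pin budget. The coefficient t = 2.5 and exponent p = 0.6 are assumed placeholder values, not figures from these slides, and the function names are hypothetical.

```python
def rent_pins(gates: int, t: float = 2.5, p: float = 0.6) -> float:
    """Estimate external terminals with Rent's rule, T = t * g**p.

    t and p are assumed example constants; real values are fitted
    to the design style being partitioned.
    """
    if gates <= 0:
        raise ValueError("gates must be positive")
    return t * gates ** p


def max_gates_for_pins(pins: int, t: float = 2.5, p: float = 0.6) -> int:
    """Invert Rent's rule: largest gate count whose estimate fits in `pins`."""
    if pins <= 0:
        raise ValueError("pins must be positive")
    return int((pins / t) ** (1.0 / p))


if __name__ == "__main__":
    budget = 200  # hypothetical number of usable I/O pins on one FPGA
    g = max_gates_for_pins(budget)
    print(f"about {g} gates fit behind {budget} pins "
          f"(estimate: {rent_pins(g):.1f} pins)")
```

Because p is below 1, pin demand grows more slowly than gate count, but package pin counts grow more slowly still, which is why the pin budget rather than the logic capacity often limits how much of an FPGA can be used.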

11
Structures of Multi-FPGA Systems
  • Topologies
    - Mesh - nearest neighbor.
    - Crossbar - full and partial.
  • Interconnect scheme
    - Circuit switched.
    - Time multiplexed.

12
Nearest Neighbor Interconnection
13
Advantages and Disadvantages of Nearest Neighbor
Interconnection
  • Advantages
    - Uniform: all chips are the same.
    - Easy to lay out on a PCB.
  • Disadvantages
    - Routing is easily blocked.
    - The through pins limit the logic utilization of
      the FPGAs.
    - Long and unpredictable delays.
    - No natural hierarchical extension.

14
Nearest Neighbor Extensions
Connect to non-neighbors
Add more neighbors
15
Advantages and Disadvantages of nearest-neighbor
extended architectures
  • Advantages
    - More choices for the router, by adding diagonal
      lines and skip lines.
  • Disadvantages
    - More complex PCB.
    - More complex routing software.

16
Partial Crossbar Interconnect
[Figure: partial crossbar interconnect. Labels: logic blocks; crossbars; A, B, C, and D pins; second-level crossbars.]
17
Partial Crossbar Interconnect
  • A partial crossbar consists of a set of small full
    crossbars that are connected to the logic blocks
    but not to each other.
  • The I/O pins of each FPGA are divided into subsets.
  • Each subset is connected by a full crossbar
    circuit switch.
  • A partial crossbar is a potentially blocking
    network.
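
The pin-subset idea and the blocking behaviour can be sketched in a few lines of Python. This is a simplified model, not any vendor's router: the class name, the first-fit choice of crossbar, and the pin counts are all assumptions made for illustration. Crossbar k is wired to pin subset k of every FPGA, so a net can only be routed through a crossbar whose subset still has a free pin on every FPGA the net touches.

```python
class PartialCrossbar:
    """Toy model: free[f][k] = free pins of FPGA f wired to crossbar k."""

    def __init__(self, num_fpgas: int, num_crossbars: int, pins_per_subset: int):
        self.free = [[pins_per_subset] * num_crossbars for _ in range(num_fpgas)]

    def route(self, fpgas):
        """Route one net among `fpgas`.

        Returns the index of the crossbar used, or None if the net is
        blocked. First-fit selection is an assumption; a real router
        would balance the load across crossbars.
        """
        fpgas = set(fpgas)
        for k in range(len(self.free[0])):
            if all(self.free[f][k] > 0 for f in fpgas):
                for f in fpgas:
                    self.free[f][k] -= 1  # consume one pin of subset k
                return k
        return None


if __name__ == "__main__":
    xb = PartialCrossbar(num_fpgas=3, num_crossbars=2, pins_per_subset=1)
    print(xb.route([0, 1]))  # uses crossbar 0
    print(xb.route([1, 2]))  # uses crossbar 1
    # FPGA 0 and FPGA 2 each still have one free pin, but on different
    # crossbars, so this net is blocked.
    print(xb.route([0, 2]))
```

The last call shows why the network is only potentially blocking: free pins exist on both FPGAs, yet no single crossbar reaches a free pin on each of them.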

18
Characteristics of Partial Crossbar Architecture
  • Partial crossbar size is proportional to the
    number of FPGA pins.
  • Every interconnection goes through one crossbar
    chip in a one-level partial crossbar and through
    three crossbar chips in a two-level one, so
    delays are uniform and bounded.

19
Mixed Full and Partial Crossbar
External connections
Partial crossbar
Full crossbar
20
Circuit Switched versus Time Multiplexed
Interconnect Schemes
  • Trade-offs between the operating speed and the
    hardware cost.
  • The time-multiplexing method
    - can greatly expand the available interconnect,
    - allows lower-cost IC packages and PCBs, and
    - makes partitioning easier.
  • BUT
    - System power increases due to frequent signal
      switching (higher hardware cost).
    - Complex scheduling software.
    - Slow operating speed.

21
Virtual Wires
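[Figure: two pairs of FPGAs, contrasting dedicated physical wires between logical outputs and logical inputs with physical wires shared through a Mux and a DeMux.]
Virtual wires trade space for time.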
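
The space-for-time trade can be sketched as follows. This is a minimal Python model under stated assumptions: one value crosses each physical wire per micro-cycle, the schedule is a fixed round-robin, and all function names are hypothetical. Real virtual-wire compilers schedule signals according to their dependencies.

```python
import math


def micro_cycles_needed(logical_signals: int, physical_wires: int) -> int:
    """Micro-cycles per emulation clock when signals share the wires evenly."""
    if physical_wires <= 0:
        raise ValueError("need at least one physical wire")
    return math.ceil(logical_signals / physical_wires)


def mux_send(logical_outputs):
    """Sending FPGA: put one logical output on the wire per time slot."""
    for slot, value in enumerate(logical_outputs):
        yield slot, value


def demux_receive(wire, num_signals: int):
    """Receiving FPGA: latch each time slot into its logical input."""
    logical_inputs = [None] * num_signals
    for slot, value in wire:
        logical_inputs[slot] = value
    return logical_inputs


if __name__ == "__main__":
    outputs = [1, 0, 1, 1]  # four logical signals, one physical wire
    print(demux_receive(mux_send(outputs), len(outputs)))
    print(micro_cycles_needed(10, 3), "micro-cycles for 10 signals on 3 wires")
```

The cost is visible in `micro_cycles_needed`: the more logical signals share a wire, the more micro-cycles each emulation clock takes, which is the slow operating speed listed among the trade-offs.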
22
Logic Emulation Systems and their interconnection
schemes
  • Systems with mesh topology - Quickturn's RPM and
    Virtual Machine Works (IKOS).
  • Systems with partial crossbar - Quickturn's
    Enterprise, Mars, and System Realizer.
  • Systems with mixed full and partial crossbar -
    Aptix Prototyping System.
  • Systems using time-multiplexed interconnect -
    Virtual Machine Works (IKOS), and CoBALT and Arkos
    (Quickturn).

23
Memory Solutions in Emulators and future
devices/systems
  • Goal: programmable memories with different
    width/depth/port combinations.
  • FPGA-based memories
    - use logic resources inefficiently,
    - make timing correctness difficult to ensure, and
    - require large or highly multi-ported memories to
      be partitioned across several FPGAs.
  • SRAMs with dedicated or programmable controllers.
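
A rough feel for why FPGA-based memories consume so many logic resources comes from counting the primitive RAM cells needed to tile a given width and depth. The Python sketch below assumes a hypothetical 16-word by 1-bit primitive and ignores extra ports, address decoding, and output multiplexing, all of which add further logic.

```python
import math


def primitives_for_memory(width: int, depth: int,
                          prim_width: int = 1, prim_depth: int = 16) -> int:
    """Number of primitive RAM cells needed to tile a width x depth memory.

    prim_width and prim_depth describe an assumed 16x1 primitive; they
    are illustrative defaults, not values taken from the slides.
    """
    if min(width, depth, prim_width, prim_depth) <= 0:
        raise ValueError("all dimensions must be positive")
    return math.ceil(width / prim_width) * math.ceil(depth / prim_depth)


if __name__ == "__main__":
    # A modest 256-word by 8-bit memory built from 16x1 primitives.
    print(primitives_for_memory(8, 256), "primitives")
```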

24
Logic Emulation Design Flow
25
Logic Emulation Design Compiler and its components
  • The logic emulation design compiler is a large and
    complex EDA tool that includes
    - a front-end design importer,
    - an HDL-based synthesizer,
    - a clock and timing analyzer,
    - a partitioner,
    - a system-level placer and router, and
    - an FPGA-based placer and router.

26
Objectives of logic emulation compiler
  • Fast compilation time.
  • Fast emulation clock.
  • Timing correctness.
  • Easy ECO (Engineering Change Order).
  • Minimize circuit size.

27
Design Considerations for Logic Emulators
  • HDL synthesis
    - Trade-off between run time and quality.
    - CLB-based vs. gate-based designs.
  • Clock and timing analysis
    - Timing correctness, free of hold-time violations.
    - Clock skew minimization.
  • Partitioning
    - Run time.
    - Timing and area.

28
Design Considerations for Logic Emulators
  • System placement and routing
    - Timing.
    - Completeness of routing.
  • FPGA-based placement and routing
    - Fast run time.
    - Parallel compilation.

Remember: the logic you emulate is not the same as the
logic of your design.
29
Hold-Time Violation
Clock distribution problem (skew).
A hold-time violation occurs when routing delay >
LUT delay.
30
Timing Correctness
Delay insertion
[Figure: a delay element inserted in the data path. Labels: delay element, CLB, routing delay.]
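
The hold-time condition and the delay-insertion fix can be sketched numerically. The Python example below uses the usual hold check at a destination flip-flop: the data path delay must be at least the clock skew plus the flip-flop hold time. The delay values and function names are assumptions for illustration, not figures from the slides.

```python
import math


def hold_slack(data_delay: float, clock_skew: float, t_hold: float) -> float:
    """Hold slack at the destination flip-flop; negative means a violation.

    data_delay: clock-to-Q plus LUT and routing delay of the data path.
    clock_skew: how much later the clock reaches the destination than
                the source (the routing delay of the clock path).
    """
    return data_delay - (clock_skew + t_hold)


def delay_elements_needed(data_delay: float, clock_skew: float,
                          t_hold: float, element_delay: float) -> int:
    """Delay elements to insert in the data path to remove a hold violation."""
    if element_delay <= 0:
        raise ValueError("element_delay must be positive")
    slack = hold_slack(data_delay, clock_skew, t_hold)
    if slack >= 0:
        return 0
    return math.ceil(-slack / element_delay)


if __name__ == "__main__":
    # Hypothetical numbers in nanoseconds: the clock route is slower than
    # the data path through one LUT, so the new data arrives too early.
    print(hold_slack(2.0, 3.0, 0.5))
    print(delay_elements_needed(2.0, 3.0, 0.5, element_delay=1.0))
```

Inserted delay fixes the hold violation at the price of a longer data path, so it also lengthens the minimum emulation clock period.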
31
Timing Correctness
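Use clock enables for gated clocks
[Figure: flip-flops inside a CLB driven by the primary clock over a low-skew net, with the gating logic moved to the clock enable. Labels: D, Q, CK, CE, LUT, CLB, clock path, primary clock, low-skew net.]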
32
Methodology and components of Logic Emulator
System
  • Pre-configuration preparation - prepare netlists
    and control files for configuration.
  • Testbed preparation - prepare emulation-based
    operation environment.
  • Full-chip configuration - download design to the
    emulator.
  • In-circuit emulation - test the design.

33
Pre-Configuration in Emulator System
  • Translate the leaf-cell libraries into emulation
    primitives.
  • Translated libraries must be verified for
    functional equivalence to original.
  • Modify and redesign some components to attain
    compatibility with emulation techniques, such as
    precharge logic circuits.
  • Assemble all the gate-level netlists for the
    entire design.

34
Testbed in Logic Emulator
  • Design and implement the target ICE board
    combining the emulated design with real hardware.
  • Slow down the testbed to emulation speed.
  • Assemble the testbed and emulation equipment.

35
Full-Chip Configuration and In-Circuit Emulation
  • Full-chip configuration
    - Prepare control files.
    - Partition the design to fit into the emulation
      system.
    - Download the design into the system.
    - Verify that the emulation model faithfully
      implements the design as specified by the RTL.
  • In-circuit emulation

36
Part IV
  • Reconfigurable Computing and Systems

37
General-Purpose Computing vs. Custom Computing
  • General-purpose computing - running applications
    on a general-purpose computer.
  • Custom computing - running applications on
    custom-made, application-specific hardware.
  • Field-programmable devices make custom computing
    a reality.

38
Goals of Reconfigurable Computing
  • Tailor the architecture to the application.
  • Minimize or eliminate instruction interpretation.
  • Exploit fine-grained parallelism.
  • Map software to hardware.

39
Applications of reconfigurable computing
  • Database search and analysis.
  • Image processing and machine vision.
  • Data compression.
  • Signal processing.
  • Neural networks.
  • Biology computing.
  • Medical computing.
  • Design Automation (PSU)
  • Many more.

40
Multi-Mode Systems map various applications to a
reconfigurable system
[Figure: a ROM holding the configurations for Application 1 and Application 2, feeding a reconfigurable system.]
  • Different configurations for the read and write
    operations of a tape drive (Honeywell).
  • Different configurations for different
    printer controllers (Tektronix).

41
Run-Time Reconfiguration in military image
recognition system
[Figure: image data enters through I/O and is matched against candidate targets such as "Jeep?" and "Tank?".]
  • Break a single computation into multiple pieces.
  • Page in components as needed (virtual
    hardware),
  • e.g., automatic target recognition.
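
Paging configurations in on demand can be sketched in the same way as a small cache. The Python model below is an illustration only: the slides do not say which replacement policy such a system uses, so least-recently-used eviction is an assumption, and the class and configuration names are hypothetical.

```python
from collections import OrderedDict


class VirtualHardware:
    """Toy model of paging configurations into a fixed number of FPGA slots."""

    def __init__(self, slots: int):
        if slots <= 0:
            raise ValueError("need at least one slot")
        self.slots = slots
        self.loaded = OrderedDict()  # configuration name -> True, in LRU order
        self.reconfigurations = 0

    def run(self, name: str) -> bool:
        """Use configuration `name`; return True if it had to be paged in."""
        if name in self.loaded:
            self.loaded.move_to_end(name)  # mark as most recently used
            return False
        if len(self.loaded) >= self.slots:
            self.loaded.popitem(last=False)  # evict least recently used
        self.loaded[name] = True
        self.reconfigurations += 1
        return True


if __name__ == "__main__":
    hw = VirtualHardware(slots=2)
    for step in ["edge", "match_tank", "edge", "match_jeep", "match_tank"]:
        paged_in = hw.run(step)
        print(step, "paged in" if paged_in else "already loaded")
    print("reconfigurations:", hw.reconfigurations)
```

Each page-in costs reconfiguration time, so the pieces of the computation have to run long enough to amortize it.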

42
Custom Computing
  • Application-specific systems.
  • Numerous applications for similar reconfigurable
    systems.
  • Offers hardware performance, flexibility to
    handle numerous algorithms.
  • Multi-FPGA systems can be viewed as hardware
    supercomputers.

Tell about DEC PeRLe
43
Reconfigurable Co-processors
[Figure: a program and its instruction (labels: Program 2, Inst2) mapped onto the reconfigurable co-processor.]
  • Provide custom instructions on a per-application
    basis.
44
Types of Reprogrammable Systems
Three ways to attach custom computing units
[Figure labels: attached processing unit; PU = processing unit.]
45
Types of Reprogrammable Systems
  • Attached and standalone processing units are
    reprogrammable systems on computer add-on cards
    and separate reprogrammable cabinets.
  • Consideration: large communication overhead may
    overshadow the speed gain.
  • Application-specific coprocessors can achieve
    significant improvement over a wide range of
    applications.

46
Types of Reprogrammable Systems
  • Integrate the reprogrammable logic into the
    processor itself.
  • A reprogrammable functional unit can be
    configured on a per-algorithm basis.
  • Providing some special-purpose instructions
    tailored to the needs of a given application.

47
Architectures of Multi-FPGA (Reconfigurable)
Systems
  • The most commonly used topologies
    - Mesh: 1D (linear array), 2D, and 3D.
    - Crossbar: full, partial, mixed, and
      hierarchical.
    - Hybrid between mesh and crossbar.
    - Application-specific architecture.

48
Hybrid Topology of a reconfigurable system
Splash 2 augments a linear array of FPGAs with
a crossbar switch. Goal: supporting
systolic circuits.
49
Hybrid Topology
Host interface
Anyboard: a linear array of FPGAs augmented
by global buses.
50
Hybrid Topology
[Figure: a 4 x 4 mesh of FPGAs with a RAM on each side and a host interface.]
DECPeRLe-1: a 4 x 4 mesh of FPGAs augmented
with shared global buses.
51
Application-Specific Topology of MARC-1, one
subsystem
[Figure: the Marc-1 subsystem 1, with elements numbered 1 to 5 and connections to other FPGAs.]
52
Application-Specific Topology of Marc-1, cont.
  • Application in circuit simulation, where the
    program to be executed can be optimized on a
    per-run basis.
  • This is done for values that are constant within
    a run but may vary from dataset to dataset.

[Figure: the Marc-1, with elements numbered 1 to 5.]
53
Application-Specific Topology
[Figure: the RM-nc neural network system, built from FPGAs and RAMs.]
54
Architecture for Computer Prototyping
[Figure: the Mushroom processor prototyping system. Labels: VME bus, FPGAs, cache memory, register file, ALU, FPU.]
55
Expandable Topologies
  • A hierarchical crossbar topology can be expanded
    by adding an extra level.
    - Quickturn systems.
  • An expandable mesh topology can be expanded by
    connecting individual boards to form a larger
    mesh.
    - The Virtual Wires Emulation System (IKOS).

56
Topology for Adapting Other Components
  • Many multi-FPGA systems include non-FPGA
    resources to provide more general purpose
    solutions.
  • The MORRPH system - sockets next to the FPGAs
    allow arbitrary devices to be added to the array.
  • The G800 board - contains two FPGAs and four
    sockets.

57
Topology for Adapting Other Components
  • The COBRA system contains
    - base modules (expandable into a 2D mesh),
    - RAM modules,
    - I/O modules,
    - and bus modules.
  • The Springbok system
    - A pre-made daughter board can hold an arbitrary
      device (on the top) and an FPGA (on the bottom).
    - Daughter boards are mounted on a baseplate.

58
Topology for Adapting Other Components
  • The Quickturn systems - external component
    adapters.
  • The Aptix FPCB - a reprogrammable PCB.

59
Design Methodology for general-purpose
configurable systems
Mapping
60
Typical Software Methodology for general-purpose
configurable systems
61
Typical Software Methodology for general-purpose
configurable systems
62
Considerations for such complex software systems
  • Architecture-specific design tasks.
  • Design automation process.
  • The mapping time dominates the setup time for
    operating the system.
  • Run-time reconfigurability.

63
Design Specification and Languages for
reconfigurable software systems
  • Standard software programming languages,
    e.g., C, C++, FORTRAN, and assembly language,
    vs. HDLs.
  • Standard software programming languages - a
    sequential execution model.
  • HDLs - a parallel execution model.
  • Who will use it, and which one is more suitable
    for system description?

64
Compilation Issues
  • Translate code from software languages into
    hardware without losing the inherent concurrency
    of hardware.
  • Compiler techniques for parallelizing code.
  • Straight-line code, control flow, and loops.
  • Transmogrifier C compiler.

65
System-level and High-level Synthesis
  • System-level design evaluation and analysis.
  • Design estimation.
  • Hardware-software partitioning.
  • Interface synthesis.
  • RTL synthesis.
  • Logic synthesis and technology mapping.

66
Partitioning and Placement
  • Topology-aware partitioning methods.
  • Partitioning onto a multi-FPGA system is
    equivalent to a placement problem.
  • Logic utilization and timing.
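
On a mesh, a signal between non-adjacent FPGAs has to pass through the FPGAs in between, so deciding which partition goes on which FPGA is effectively a placement decision. The Python sketch below scores a candidate assignment by the total number of mesh hops. The Manhattan-distance cost, the two-pin nets, and the block names are assumptions made for illustration, not the cost function of any particular tool.

```python
def mesh_wire_cost(placement, nets):
    """Total Manhattan hop count of all two-pin nets on an FPGA mesh.

    placement: maps a block name to the (row, col) of its FPGA.
    nets: list of (block_a, block_b) pairs.
    Blocks on the same FPGA cost nothing; each extra hop stands for
    through pins used on an intermediate FPGA.
    """
    cost = 0
    for a, b in nets:
        (row_a, col_a), (row_b, col_b) = placement[a], placement[b]
        cost += abs(row_a - row_b) + abs(col_a - col_b)
    return cost


if __name__ == "__main__":
    # Hypothetical three-block design on a 2 x 2 mesh.
    placement = {"cpu": (0, 0), "cache": (0, 1), "io": (1, 1)}
    nets = [("cpu", "cache"), ("cpu", "io"), ("cache", "io")]
    print("total hops:", mesh_wire_cost(placement, nets))
```

A topology-aware partitioner would compare such costs across candidate assignments while also keeping each FPGA within its logic and pin capacity.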

67
Pin Assignment and Routing
  • Pin assignment - the process of determining which
    I/O pins are used for each inter-FPGA signal.
  • Pin assignment for a pre-fabricated multi-FPGA
    system is equivalent to the global routing
    problem.
  • Pin assignment greatly affects FPGA logic
    utilization and routability.

68
Run-Time Reconfigurability
This is a new issue in system design: how much of
the processor is virtual, and when to reconfigure?
  • Virtual hardware <-> virtual memory. What are
    their relations? Artificial intelligence,
    robotics, vision.
  • Hardware on demand.
  • What is the initial unconfigured structure? What
    are the reconfiguring methods?
  • Software supporting time-varying mapping.
  • Many open problems need to be solved in the
    forthcoming years.

69
Applications: Splash 2
  • Stream-oriented systolic and SIMD applications.
  • Scalable linear array of 16 to 256 processing
    elements (each one XC4010 with 1/2 Mbyte of memory).
  • VHDL based.
  • Sequence comparison - 2300M vs. 0.75M cell
    updates/sec (Splash 2 vs. Sparc 10).
  • Edge detection - 10M vs. 242K pixels/sec (Splash
    2 vs. Sparc 10).

70
Applications: PAM (DEC)
  • Programmable Active Memory (PAM).
  • C based and mesh arrays of XC3090 (DECPeRLe-1).
  • Applications
    - Multiple precision arithmetic.
    - RSA encryption.
    - Video compression (JPEG, MPEG, DCT).
    - High energy physics.
    - Telecommunications.

71
Sources of some slides
  • Peter Alfke
  • Xilinx, Inc
  • peter.alfke_at_xilinx.com
Date added: 2 November 2020
Slides: 72
Provided by: MarekPe2
Learn more at: http://web.cecs.pdx.edu