ECE 697F Reconfigurable Computing Lecture 24 Course Wrapup - PowerPoint PPT Presentation


About This Presentation

Slides: 40

Provided by: RussTe7

Transcript and Presenter's Notes

1
ECE 697F Reconfigurable Computing
Lecture 24: Course Wrap-up
2
What is Reconfigurable Computing?

Computation using hardware that can adapt at the logic level to solve specific problems
Why is this interesting?
Some applications are poorly suited to microprocessors.
VLSI explosion provides increasing resources.
Hardware/Software
Relatively new research area.
Acknowledgement: Wolf text

3
Design abstractions
4
Processor + FPGA
Three possibilities
[Figure labels: daughtercard, Proc, FPGA, chip, backplane bus (e.g. PCI)]
1. FPGA serves as coprocessor for data-intensive applications (possible project).
[Figure labels: FPGA, chip, Proc]
2. FPGA serves as embedded computer for low-latency transfer.
Reconfigurable Functional Unit
5
Xilinx XC4000 Cell

2 4-input look-up tables
1 3-input look-up table
2 D flip-flops
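
A look-up table is just a small memory indexed by its input bits. The sketch below models the cell's two 4-input LUTs and one 3-input LUT in Python. How the three tables are wired together is an assumption made for illustration (the 3-input LUT is fed by the two 4-input LUT outputs plus one extra input); the names make_lut, lut_read and cell_output are hypothetical, and the flip-flops are not modeled.

```python
def make_lut(func, n_inputs):
    """Precompute the truth table of func over every n_inputs-bit pattern."""
    table = []
    for pattern in range(2 ** n_inputs):
        # Bit i of the pattern is the value of input i.
        bits = [(pattern >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table


def lut_read(table, bits):
    """Look up the output for a list of input bits (input 0 is the LSB)."""
    index = 0
    for i, bit in enumerate(bits):
        index |= (bit & 1) << i
    return table[index]


def cell_output(f_lut, g_lut, h_lut, f_in, g_in, h_extra):
    """Combine two 4-input LUTs through a 3-input LUT (assumed wiring)."""
    f = lut_read(f_lut, f_in)
    g = lut_read(g_lut, g_in)
    return lut_read(h_lut, [f, g, h_extra])


# Example configuration: F = 4-input AND, G = 4-input XOR,
# H = 2:1 multiplexer that picks G when the extra input is 1, else F.
F_LUT = make_lut(lambda a, b, c, d: a & b & c & d, 4)
G_LUT = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
H_LUT = make_lut(lambda f, g, sel: g if sel else f, 3)
```

Reprogramming the cell amounts to loading different truth tables; the lookup logic itself never changes.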

6
Xilinx XC4000 Routing
7
Actel Programmable Gate Arrays
Rows of programmable logic building blocks + rows of interconnect
Anti-fuse technology: program once
Use anti-fuses to build up long wiring runs from short segments
8-input, single-output combinational logic blocks
FFs constructed from discrete cross-coupled gates
[Figure labels: I/O buffers, programming and test logic (repeated around the array); logic module; wiring tracks]
8
Altera Max 7000 Macrocell
9
Example DPGA Prototype
10
FPGA vs. DPGA Compare
11
Min-cut bisecting partitioning
[Figure labels: nodes A, B, C, D; partition 1; partition 2]
12
Hill Climbing Algorithms

To avoid getting trapped in local minima, consider a hill-climbing approach.
Need to accept worse solutions or make bad moves to reach the global minimum.
Acceptance is probabilistic. Only accept cost-increasing moves some of the time.
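
One common way to make acceptance probabilistic is the acceptance rule used in simulated annealing. The slide does not name a specific rule, so the exponential form and the temperature parameter below are assumptions, and accept_move is a hypothetical helper name.

```python
import math
import random


def accept_move(delta_cost, temperature, rng=random):
    """Decide whether to keep a proposed move.

    delta_cost  -- new cost minus old cost (negative means improvement)
    temperature -- assumed control parameter; higher values accept more
                   cost-increasing moves
    rng         -- any object with a random() method returning [0, 1)
    """
    if delta_cost <= 0:
        return True   # improving or neutral moves are always accepted
    if temperature <= 0:
        return False  # at zero temperature, behave greedily
    # Cost-increasing moves are accepted only some of the time.
    return rng.random() < math.exp(-delta_cost / temperature)
```

Lowering the temperature over time makes bad moves progressively rarer, so the search settles into a minimum after it has had a chance to escape shallow ones.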

[Figure: cost plotted over the solution space]
13
Routing Tradeoffs

Bias router to find first, best route.
Vary number of node expansions using
$pcost_i = (1 - a) \times (pcost_{i-1} + ncost_i) + a \times dist_i$
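
The effect of the bias factor on node expansions can be sketched with a small grid search. This assumes the cost reads pcost_i = (1 - a) x (pcost_(i-1) + ncost_i) + a x dist_i, with dist_i taken as the Manhattan distance to the target. The grid model and the names directed_cost and route are hypothetical, not the router from the lecture.

```python
import heapq


def directed_cost(prev_pcost, ncost, dist, a):
    """Blend accumulated path cost with distance-to-target (assumed form)."""
    return (1.0 - a) * (prev_pcost + ncost) + a * dist


def route(node_cost, source, target, a):
    """Best-first search on a grid; returns (path, number_of_expansions).

    node_cost is a 2-D list of per-node costs. With a = 0 the search is a
    plain lowest-cost expansion; larger a pulls it toward the target.
    """
    rows, cols = len(node_cost), len(node_cost[0])
    heap = [(0.0, source, None)]   # (pcost, node, predecessor)
    parent = {}                    # expanded node -> predecessor
    expansions = 0
    while heap:
        pcost, node, prev = heapq.heappop(heap)
        if node in parent:
            continue               # already expanded via a cheaper entry
        parent[node] = prev
        expansions += 1
        if node == target:
            break
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in parent:
                dist = abs(target[0] - nr) + abs(target[1] - nc)
                cost = directed_cost(pcost, node_cost[nr][nc], dist, a)
                heapq.heappush(heap, (cost, (nr, nc), node))
    if target not in parent:
        return None, expansions    # target unreachable
    # Walk predecessors back from the target to recover the route.
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    path.reverse()
    return path, expansions
```

On an open 5 x 5 grid with unit node costs, routing corner to corner with a = 0 expands every node, while a = 0.5 expands far fewer. That is the tradeoff the slide describes: fewer expansions in exchange for a route that is no longer guaranteed to be the cheapest.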

14
Architectural Limitation

Routing architecture necessitates domain
selection.
Bigger effect for multi-fanout nets

15
Two-dimensional Layout

Control network supports distributed signals.
Data routed as four-bit values.

16
Rapid Datapath

Segmented linear architecture
All RAMs and ALUs are pipelined
Bus connectors also contain registers

17
Basic Functional Unit

Two inputs from adjacent blocks.
Local memory for instructions, data.

18
Chess Interconnect

More like an FPGA
Takes advantage of near-neighbor connectivity

19
FPICs

High internal connectivity
Not always cost effective

20
Hierarchical Crossbar

Full connectivity occurs at top level
Routing between FPGAs requires determining level
at which source and destination share an
ancestor.
Simplifies routing
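
Finding the level at which two FPGAs share an ancestor can be sketched as follows. This assumes the FPGAs are numbered consecutively as leaves of a tree in which every crossbar has the same fanout; the numbering scheme and the name shared_level are hypothetical.

```python
def shared_level(src, dst, fanout):
    """Return the lowest tree level at which src and dst share an ancestor.

    Level 0 means src and dst are the same FPGA; level 1 means they hang
    off the same first-level crossbar, and so on.
    """
    level = 0
    while src != dst:
        # Move both endpoints up one level to their parent crossbars.
        src //= fanout
        dst //= fanout
        level += 1
    return level
```

With a fanout of 4, FPGAs 0 and 3 meet at level 1, while FPGAs 0 and 4 must go up to level 2. The route is then fixed: up to that level and back down.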

21
Linear Array

Current hardware
Programs implemented as systolic array
Input key
Search each RAM bank for sequence

22
Emulation Software Steps
Netlist Translation
Technology Mapping
Partitioner: divide netlist into fixed-sized chunks
Global Placer: locate an FPGA for a chunk
Global Router: make connections between devices
FPGA-specific P&R (Xilinx P&R)
FPGA bitstreams
Many of these are dependent on device interconnect topology
23
Simulation Acceleration

FPGA system takes the place of one portion of
simulated design
Inputs transported to FPGA system.
Outputs returned from FPGA system.

24
Network Routing

FPGAs popular in network hardware
New protocols implemented directly in silicon
Easy to upgrade in the field
Washington University Gigabit Switch (WUGS)
Switch provides up to 160 Gbps of bandwidth.

25
Pyramid Operations

Gaussian Pyramid
Down sample image to compress image size for
communication.
Average over a set of points to create new point
Laplacian Pyramid
Determine error found from Gaussian Pyramid
Expand contracted picture and compare with
original
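
The two pyramid steps can be sketched on a small grayscale image stored as a list of rows. This is a simplification under stated assumptions: a plain 2x2 box average stands in for the Gaussian weighting, expansion is done by pixel replication, and the image has even width and height. The function names are hypothetical.

```python
def reduce_level(image):
    """Downsample by averaging each 2x2 block into one point."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def expand_level(image):
    """Expand a contracted picture back to double size by replication."""
    expanded = []
    for row in image:
        wide = [value for value in row for _ in range(2)]
        expanded.append(wide)
        expanded.append(list(wide))
    return expanded


def laplacian_level(image):
    """Error between the original and its reduced-then-expanded version."""
    approx = expand_level(reduce_level(image))
    return [
        [image[r][c] - approx[r][c] for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

For the 2x2 image [[1, 3], [5, 7]], the reduced level is the single point 4.0 and the error image is [[-3, -1], [1, 3]]. Adding the error back to the expanded picture recovers the original exactly.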

26
Gaussian Pyramid Implementation

Systolic array in which each device performs a
separate function.
Limited by clock rate of slowest device.

27
Proposed Data Acquisition System
[Block diagram; recoverable component labels listed below, bus-width annotations omitted]
FPGA1 Stratix EP1S40 (Data Processing)
FPGA2 Stratix EP1S40 (Storage Control)
Radar Unit: analog H channel and V channel
AD6645 (105 MSPS), two instances
AD974 (200 KSPS): radar positioner data channel
Radar Control Interface, 3.3 V buffer
Gigabit Ethernet Interface: Gigabit Ethernet core, 64K x 16 dual-port RAM, Gigabit Ethernet PHY, RJ45
Hard Disk Interface: ATA66 IDE channel 0 and channel 1, 3.3 to 5 V buffers
SRAM 1 x 512K x 36 data processing memory
SRAM 3 x 512K x 36 data processing memory
Atmel AT91RM9200 microcontroller, ARM RISC core (209 MHz, 32-bit)
Ethernet controller, 10/100 Mbps Ethernet Interface, Ethernet PHY, RJ45
MAX 7000A PLD
Software flash 1 x 4M x 16 (configuration memory)
Boot flash 2 x 1M x 16 (program memory)
SDRAM 2 x 8M x 16 (data memory)
USB interface, USB block
JTAG port
RS232 driver, serial port
28
Detailed View of Dharma
29
Chimaera Architecture

Live copy of register file values feed into array
Each row of array may compute from register values or intermediates
Tag on array to indicate RFUOP

30
Chimaera Architecture

Array can operate on values as soon as placed in
register file.
Logic is combinational
When RFUOP matches
Stall until result ready
Drive result from matching row
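
The tag-matching step can be sketched in software. This is a behavioral illustration only: the array is modeled as a list of (tag, function) rows, each row is a pure function of the register values (mirroring the combinational logic), and the stall is not modeled. The name run_rfuop and the data layout are hypothetical.

```python
def run_rfuop(rfuop_id, array_rows, register_file):
    """Return the result of the row whose tag matches rfuop_id.

    array_rows    -- list of (tag, compute) pairs; compute takes the
                     register values and returns the row's result
    register_file -- list of current register values
    Returns None when no row is configured for rfuop_id.
    """
    for tag, compute in array_rows:
        if tag == rfuop_id:
            # Drive the result from the matching row.
            return compute(register_file)
    return None


# Example configuration with two loaded operations (illustrative only).
ROWS = [
    (1, lambda regs: regs[0] + regs[1]),
    (2, lambda regs: regs[2] ^ regs[3]),
]
```

Because each row sees a live copy of the register values, its output is available as soon as the operands are written, without a separate operand fetch.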

31
Chimaera Results

Three Spec92 benchmarks
Compress: 1.11 speedup
Eqntott: 1.8 speedup
Life: 2.06 speedup
Small arrays with limited state
Small speedup
Perhaps focus on global router rather than local
optimization.

32
Garp

Integrate as coprocessor
Similar bandwidth to processor as functional unit
Own access to memory
Support multi-cycle operation
Allow state
Cycle counter to track operation
Configuration cache, path to memory

33
Garp Array

Row-oriented logic
Dedicated path for processor/memory
Processor does not have to be involved in
array-memory path

34
System Model Adaptive Viterbi Decoder
35
Compression Techniques

Effectively we can consider an FPGA device as a
collection of cells, each with (x, y) location.
Instead of using a serial bit stream, could
consider loading data cell-by-cell like a
standard memory.
Specify location of cell through use of two
registers.
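
Cell-by-cell loading through two location registers can be sketched as a small memory-like port. The class and method names are hypothetical, and a single integer stands in for one cell's configuration data.

```python
class CellConfigPort:
    """Configuration memory addressed by a row register and a column register."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        self.cells = [[0] * cols for _ in range(rows)]
        self.row_reg = 0
        self.col_reg = 0

    def set_location(self, row, col):
        """Load the two location registers that select the target cell."""
        if not (0 <= row < self.rows and 0 <= col < self.cols):
            raise ValueError("cell location out of range")
        self.row_reg = row
        self.col_reg = col

    def write(self, data):
        """Write configuration data to the currently selected cell."""
        self.cells[self.row_reg][self.col_reg] = data
```

Unlike a serial bit stream, this lets a loader touch only the cells that differ from the current configuration.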

36
Hardware Support for Runlength

Initially latch in base
Down counter indicates number of strides to take.
Offset used to augment initial base
Fairly simple to implement.
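
The base, down counter and offset can be modeled as a small address generator. This sketch assumes the base address itself is written first and each stride then adds the offset once; the function names and the dictionary used as configuration memory are hypothetical.

```python
def runlength_addresses(base, offset, strides):
    """Generate the addresses covered by one run."""
    address = base        # initially latch in base
    counter = strides     # down counter: number of strides still to take
    addresses = [address]
    while counter > 0:
        address += offset  # offset augments the current address
        addresses.append(address)
        counter -= 1
    return addresses


def load_run(config_memory, base, offset, strides, value):
    """Write the same configuration value to every address in the run."""
    for address in runlength_addresses(base, offset, strides):
        config_memory[address] = value
```

For example, a base of 10, an offset of 4 and 3 strides covers addresses 10, 14, 18 and 22, so one (base, offset, count, value) record replaces four separate writes.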

37
Determining Communication Level
Send, Receive, Wait
Application hardware (custom)
Register reads/writes
I/O driver
Interrupt service
Bus transactions
I/O bus
Interrupts