Welcome to the ECE 449 Computer Design Lab - PowerPoint PPT Presentation

About This Presentation
Description:

High Level Language (HLL) Design Methodology. 2. ECE 448 FPGA and ASIC Design with VHDL ... High Level Language (HLL) Design Methodology. Handel C. 27 ... – PowerPoint PPT presentation

Provided by: kam878

Transcript and Presenter's Notes


1
Lecture 18: FPGA Boards, FPGA-based
Supercomputers, High Level Language (HLL) Design
Methodology
2
Resources
PCI: http://en.wikipedia.org/wiki/Peripheral_Component_Interconnect
PCI-X: http://en.wikipedia.org/wiki/PCI-X
Reconfigurable Supercomputing, T. El-Ghazawi, K. Gaj, D. Buell, D. Pointer,
tutorial at the Supercomputing 2005 conference:
http://hpcl.seas.gwu.edu/openfpga/tutorial_html/index.html
3
FPGA Device Capacity Trends
[Figure: Xilinx Device Complexity, clock frequency and gate count per device family plotted against year]
Virtex-5: 550 MHz, 24M gates
Virtex-II Pro: 450 MHz, 8M gates
Virtex-4: 500 MHz, 16M gates
Virtex-II: 450 MHz, 8M gates
Spartan-3: 326 MHz, 5M gates
Virtex-E: 240 MHz, 4M gates
Virtex: 200 MHz, 1M gates
XC4000: 100 MHz, 250K gates
Spartan-II: 200 MHz, 200K gates
Spartan: 80 MHz, 40K gates
XC3000: 85 MHz, 7.5K gates
XC5200: 50 MHz, 23K gates
XC2000: 50 MHz, 1K gates
Year axis: 1985, 1987, 1991, 1995, 1998, 1999, 2000, 2002, 2003, 2004, 2006
Source: http://class.ece.iastate.edu/cpre583/lectures/Lect-01.ppt
4
FPGA Boards
5
General Architecture of an FPGA-Based Board
6
Reconfigurable Computing Boards
  • Boards may have one or several interconnected
    FPGA chips
  • Support different bus standards, e.g. PCI, PCI-X,
    USB, etc.
  • May have direct real-time data I/O through a
    daughter board
  • Boards may have local onboard memory (OBM) to
    handle large data while avoiding the system bus
    (e.g. PCI) bottleneck

7
Reconfigurable Computing Boards
  • Many boards per node can be supported
  • A host program (e.g. in C) interfaces the user
    (and the µP) with the board via a board API
  • Driver API functions may include functionalities
    such as Reset, Open, Close, Set Clocks, DMA,
    Read, Write, Download Configurations, Interrupt,
    Readback
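
The host-side pattern described above can be sketched in plain C. This is only a minimal mock, not any vendor's API: the names board_t, board_open, board_write, board_read and board_close are hypothetical, and an in-memory array stands in for the onboard memory (OBM) that a real driver would reach over PCI or DMA.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OBM_BYTES 1024  /* size of the mock onboard memory (assumption) */

/* Hypothetical board handle; a real driver would wrap a device mapping. */
typedef struct {
    int     is_open;
    uint8_t obm[OBM_BYTES];  /* stands in for onboard memory (OBM) */
} board_t;

/* Open: clear the mock memory and mark the board usable. 0 on success. */
int board_open(board_t *b) {
    if (b == NULL) return -1;
    memset(b->obm, 0, sizeof b->obm);
    b->is_open = 1;
    return 0;
}

/* Write: copy len bytes from the host buffer into OBM at offset. */
int board_write(board_t *b, size_t offset, const void *src, size_t len) {
    if (b == NULL || !b->is_open) return -1;
    if (offset > OBM_BYTES || len > OBM_BYTES - offset) return -1;
    memcpy(b->obm + offset, src, len);
    return 0;
}

/* Read: copy len bytes from OBM at offset back into the host buffer. */
int board_read(const board_t *b, size_t offset, void *dst, size_t len) {
    if (b == NULL || !b->is_open) return -1;
    if (offset > OBM_BYTES || len > OBM_BYTES - offset) return -1;
    memcpy(dst, b->obm + offset, len);
    return 0;
}

/* Close: further reads and writes are rejected until the board is reopened. */
int board_close(board_t *b) {
    if (b == NULL || !b->is_open) return -1;
    b->is_open = 0;
    return 0;
}
```

A host program would call board_open once, move input data with board_write, collect results with board_read, and finish with board_close. Functions such as Reset, Set Clocks or Download Configurations would follow the same handle-plus-status-code shape.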

8
Common Interface - PCI
  • PCI: Peripheral Component Interconnect

64-bit bus
32-bit bus
9
PCI - Conventional hardware specifications
  • 32-bit or 64-bit bus width
  • 33.33 MHz clock with synchronous transfers
  • peak transfer rate of 133 MB per second for
    32-bit bus width (33.33 MHz × 32 bits ÷ 8 bits
    per byte = 133 MB/s)
  • peak transfer rate of 266 MB/s for 64-bit bus
    width
  • 32-bit address space (4 gigabytes)
  • 32-bit port space
  • 5-volt signaling
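
The peak-rate arithmetic above (clock frequency times bus width, divided by 8 bits per byte) can be written as a small C helper. The function name is illustrative, and 1 MB is taken as 10^6 bytes, which is the convention that reproduces the 133 MB/s and 266 MB/s figures.

```c
/* Peak transfer rate in MB/s (1 MB = 10^6 bytes, by assumption) of a
   synchronous parallel bus that moves one bus word per clock cycle:
   clock (MHz) x width (bits) / 8 bits per byte. */
double peak_rate_mb_per_s(double clock_mhz, unsigned bus_width_bits) {
    return clock_mhz * (double)bus_width_bits / 8.0;
}
```

Calling peak_rate_mb_per_s(33.33, 32) gives about 133.3, and peak_rate_mb_per_s(33.33, 64) gives about 266.6. These are upper bounds: real transfers also pay for arbitration and addressing phases.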

10
PCI-X (PCI eXtended)
  • PCI-X doubles the width to 64 bits, revises the
    protocol, and increases the maximum signaling
    frequency to 133 MHz (peak transfer rate of 1014
    MiB/s, about 1064 MB/s)
  • PCI-X 2.0 permits a 266 MHz rate (peak transfer
    rate of 2035 MiB/s, about 2133 MB/s) and also a
    533 MHz rate, adds a 16-bit bus variant and
    allows for 1.5 volt signaling

11
Some Reconfigurable Boards Vendors
  • ANNAPOLIS MICRO SYSTEMS, INC. (www.annapmicro.com)
  • University of Southern California - USC/ISI
    (http://www.east.isi.edu)
  • AMONTEC (www.amontec.com/chameleon.shtml)
  • XESS Corporation (www.xess.com)
  • CELOXICA (www.celoxica.com)
  • CESYS (www.cesys.com)
  • TRAQUAIR (www.traquair.com)
  • SILICON SOFTWARE (www.silicon-software.com)
  • COMPAQ (www.research.compaq.com/SRC/pamette/)
  • ALPHA DATA (www.alpha-data.com)
  • Associated Professional Systems
    (www.associatedpro.com)
  • NALLATECH (www.nallatech.com)

12
WILDSTAR II Pro
Reproduced and displayed with permission
13
WILDSTAR II Pro
Reproduced and displayed with permission
14
Reconfigurable Supercomputers
15
Scalable Reconfigurable Systems
  • Large numbers of reconfigurable processors and
    microprocessors
  • Everything can be configured
  • Functional units
  • Interconnects
  • Interfaces
  • High level of scalability
  • Suitable for a wide range of applications
  • Everything can be reconfigured over and over at
    run time (Run-Time Reconfiguration) to suit the
    underlying applications
  • Can be easily programmed by application
    scientists, at least as easily as conventional
    parallel computers

16
Early Reconfigurable Architecture
17
Current Reconfigurable Architecture
µP
. . .
µP memory
Shared Memory and/or NIC
18
Possible Classes of Reconfigurable Supercomputers

µP 1
µP N

RP 1
RP N
Independent Board Design
µP Board
RP Board

µP 1
µP N

RP 1
RP N
Joint Board Design
Joint µP/RP Board
Tighter Integration
19
Possible Classes of Reconfigurable
Supercomputers cont.
µP inside of RP Design

µP 1
µP N
RP 1
RP N
Joint µP/RP Board
RP inside of µP Design

RP 1
RP N
µP 1
µP N
Joint µP/RP Board
Tighter Integration
20
FPGA based supercomputers
Machine                            Released
SRC 6 from SRC Computers           2002
Cray XD1 from Cray                 2005
SGI Altix from SGI                 2005
SRC 7 from SRC Computers, Inc.     2006
21
How to choose the system that best suits your
needs?
Typical users criteria
1. Clock speed
2. Amount of memory
3. Cost of Ownership
22
How to choose the system that best suits your
needs?
Recommended users criteria
  • 1. Tools
  • - right level of abstraction
  • - ease of development and verification
  • - progress and backward compatibility
  • 2. Libraries
  • - basic operations
  • - examples of full applications
  • 3. Technical support

23
How to choose the system that best suits your
needs?
Recommended users criteria (cont.)
4. Data Bandwidth
Reconfigurable Processor System
µP system
external I/O devices
24
How to choose the system that best suits your
needs?
Recommended users criteria (cont.)
5. Scalability - variable power and price
- efficient communication among the modules
25
Recommended users criteria (cont.)
6. Transfer of control overhead
Actual behavior
Theoretical behavior
FPGA
µP
µP
FPGA
Control transfer overhead
time
26
High Level Language (HLL) Design
Methodology: Handel C
27
Behavioral Synthesis
I/O Behavior
Target Library
Algorithm
Behavioral Synthesis
RTL Design
Logic Synthesis
Classic RTL Design Flow
Gate level Netlist
28
Need for High-Level Design
  • Higher level of abstraction
  • Modeling complex designs
  • Reduce design efforts
  • Fast turnaround time
  • Technology independence
  • Ease of HW/SW partitioning

29
Advantages of Behavioral Synthesis
  • Easy to model higher levels of complexity
  • Source code is smaller than the equivalent RTL code
  • Generates RTL much faster than the manual method
  • Multi-cycle functionality
  • Loops
  • Memory Access

30
High-Level Languages
  • C/C++-Based
  • Handel C: Celoxica Ltd., UK
  • Impulse C: Impulse Accelerated Technologies
  • Catapult C: Mentor Graphics
  • SystemC: The Open SystemC Initiative
  • Java-based
  • Forge: Xilinx
  • JHDL: Brigham Young University

31
Other High-Level Design Flows
  • Matlab-based
  • System Generator for DSP: Xilinx
  • AccelChip DSP Synthesis: AccelChip
  • GUI Data-Flow based
  • CoreFire: Annapolis Micro Systems
  • RC Toolbox: DSPlogic

32
Handel C Design Flow
33
Design Flow
Executable Specification
Handel-C
VHDL
Synthesis
EDIF
EDIF
Place & Route
34
Handel-C/ANSI-C Comparisons
ANSI-C
HANDEL-C
Handel-C Standard Library
ANSI-C Standard Library
Preprocessors, i.e. #define
Parallelism
Pointers
Structures
Channels
Side Effects, i.e. X = Y++
ANSI-C Constructs: for, while, if, switch
Arbitrary width variables
Arrays
Bitwise logical operators
Enhanced bit manipulation
Recursion
Logical operators
Arithmetic operators
RAM, ROM
Signals
Functions
Floating Point
Interfaces
35
Variables
  • Only one fundamental type for variables: int
  • int 5 x;
  • unsigned int 13 y;
  • Default types
  • char: 8 bits
  • short: 16 bits
  • long: 32 bits
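
ANSI-C has no arbitrary-width integers, so a software model of a variable such as int 5 x or unsigned int 13 y has to mask values by hand. The sketch below is plain C, not Handel-C. It assumes results are truncated to the declared width with two's complement wrap-around, and the helper names are illustrative.

```c
#include <stdint.h>

/* Keep only the low `width` bits, as an unsigned variable of that width
   would (valid for 1 <= width <= 32). */
uint32_t wrap_unsigned(uint32_t value, unsigned width) {
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return value & mask;
}

/* Interpret the low `width` bits of value as a two's complement signed
   number (valid for 1 <= width <= 32). */
int32_t wrap_signed(int32_t value, unsigned width) {
    uint32_t v = wrap_unsigned((uint32_t)value, width);
    uint32_t sign = 1u << (width - 1);
    /* (v XOR sign) - sign sign-extends; 64-bit math avoids overflow. */
    return (int32_t)((int64_t)(v ^ sign) - (int64_t)sign);
}
```

With these helpers, a 5-bit signed value holds -16 to 15, so wrap_signed(15 + 1, 5) yields -16, and a 13-bit unsigned value holds 0 to 8191, so wrap_unsigned(8191 + 1, 13) yields 0.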

36
Type Summary
37
Arrays
  • Same way as in ANSI-C
  • int 6 x[7];
  • 7 registers, each 6 bits wide
  • unsigned int 6 x[4][5][6];
  • 120 registers, each 6 bits wide
  • Index must be a compile time constant. If random
    access is required, consider using RAM or ROM

38
Internal RAMs and ROMs
  • Using ram and rom keywords
  • ram int 6 a[43];
  • a RAM consisting of 43 entries, each 6 bits wide
  • rom int 16 b[4];
  • a ROM consisting of 4 entries, each 16 bits wide
  • RAMs and ROMs are accessed the same way that
    arrays are accessed in ANSI-C
  • Index need not be a compile time constant

39
Restrictions on RAMs and ROMs
  • RAMs and ROMs are restricted to performing
    operations sequentially. Only one element may be
    addressed in any given clock cycle
  • ram unsigned int 8 x[4];
  • x[1] = x[3] + 1; // illegal
  • if (x[0] == 0)
  •     x[1] = 1; // illegal
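
The one-access-per-cycle rule can be illustrated with a small C model of a single-port RAM. This is a behavioural sketch only: in a real design the compiler enforces the rule, there is no run-time check, and the type and function names are hypothetical.

```c
#include <stdint.h>

#define RAM_ENTRIES 4

/* Single-port RAM model that counts accesses within the current cycle. */
typedef struct {
    uint8_t  data[RAM_ENTRIES];
    unsigned accesses_this_cycle;  /* reads + writes since the last tick */
} ram1p_t;

/* Advance to the next clock cycle: the port is free again. */
void ram_tick(ram1p_t *r) { r->accesses_this_cycle = 0; }

/* Read one entry; -1 if the port was already used this cycle. */
int ram_read(ram1p_t *r, unsigned addr, uint8_t *out) {
    if (addr >= RAM_ENTRIES || r->accesses_this_cycle >= 1) return -1;
    r->accesses_this_cycle++;
    *out = r->data[addr];
    return 0;
}

/* Write one entry; -1 if the port was already used this cycle. */
int ram_write(ram1p_t *r, unsigned addr, uint8_t value) {
    if (addr >= RAM_ENTRIES || r->accesses_this_cycle >= 1) return -1;
    r->accesses_this_cycle++;
    r->data[addr] = value;
    return 0;
}
```

A statement that reads one element and writes another in the same cycle corresponds to a ram_read followed by a ram_write with no ram_tick in between, and the model rejects the second call. Splitting the work over two cycles, with the read value held in a register, makes both calls succeed.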

40
Multi-port RAMs
  • static mpram Fred {
  •     ram <unsigned 8> ReadWrite[256]; (read/write port)
  •     rom <unsigned 8> Read[256]; (read only port)
  • };
  • Now we can read and write in the same clock cycle

41
Dual Port Memory
42
Handel-C Language (1)
  • A subset of ANSI-C
  • Sequential software style with a par construct
    to implement parallelism
  • A channel chan statement allows for
    communication and synchronization between
    parallel branches
  • Level of design abstraction is above RTL but
    below behavioral

43
Handel-C Language (2)
  • Each assignment and delay statement takes one
    clock cycle
  • Automatic generation of the state machine from an
    algorithmic description of the circuit in terms
    of parallel and sequential blocks
  • Automatic scheduling of parallel and sequential
    blocks, that is the code following a group is
    scheduled only after that whole group has
    completed
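
A rough way to reason about this timing model is to count cycles, assuming each assignment costs one cycle, sequential statements add up, and a parallel group finishes when its slowest branch finishes. The helpers below are plain C bookkeeping with illustrative names, not something the Handel-C tools provide.

```c
/* Cycles for statements executed one after another: the costs add up. */
unsigned seq_cycles(const unsigned *cycles, unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; i++) total += cycles[i];
    return total;
}

/* Cycles for the branches of a parallel group: code after the group
   starts only once the slowest branch has completed. */
unsigned par_cycles(const unsigned *cycles, unsigned n) {
    unsigned longest = 0;
    for (unsigned i = 0; i < n; i++)
        if (cycles[i] > longest) longest = cycles[i];
    return longest;
}
```

Three assignments in sequence cost seq_cycles of {1, 1, 1}, which is 3 cycles, while the same three assignments in one parallel group cost 1 cycle. Branches of 2 and 5 cycles run in parallel cost 5 cycles in total.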

44
Parallelism
Statement
Parallel blocks
45
Par construct - Examples
46
Par constructs - timing
47
Par construct shift register
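
The code from this slide is not part of the transcript. As a stand-in, the sketch below models in plain C what a shift register built from parallel assignments does: within one clock cycle every stage reads its neighbour's old value, and all stages update together. The four-stage depth and the names are assumptions.

```c
#include <stdint.h>

#define STAGES 4

typedef struct {
    uint8_t stage[STAGES];  /* stage[0] is nearest the input */
} shiftreg_t;

/* One clock cycle: compute every new value from the OLD state first,
   then commit them all at once, as simultaneous assignments would. */
void shiftreg_clock(shiftreg_t *s, uint8_t input) {
    uint8_t next[STAGES];
    next[0] = input;
    for (unsigned i = 1; i < STAGES; i++) next[i] = s->stage[i - 1];
    for (unsigned i = 0; i < STAGES; i++) s->stage[i] = next[i];
}
```

Starting from all zeros and clocking in 1, 2, 3 leaves the stages holding 3, 2, 1, 0. Writing the same assignments sequentially in ascending stage order without the temporary array would copy the new input into every stage, which is the difference the parallel form avoids.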
48
Handel C vs. C - functions
  • Functions may not be called recursively, since
    all logic must be expanded at compile-time to
    generate hardware
  • You can only call functions in expression
    statements. These statements must not contain
    any other calls or assignments.
  • Variable length parameter lists are not
    supported.
  • Old-style ANSI-C function declarations (where
    the type of the parameters is not specified)
    are not supported.
  • main() functions take no arguments and return no
    values.
  • Each main() function is associated with a clock.
    If you have more than one main() function in the
    same source file, they must all use the same
    clock.
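
Because recursion is unavailable, an algorithm that is usually written recursively has to be restated as a loop before it can be ported. The example below shows the idea in plain C with Euclid's greatest common divisor. It is an illustration of the rewrite only, and the function name is an assumption.

```c
/* Euclid's algorithm as a loop rather than the recursive form
   gcd(a, b) = gcd(b, a % b); the state lives in two variables
   instead of a call stack. */
unsigned gcd_iterative(unsigned a, unsigned b) {
    while (b != 0) {
        unsigned r = a % b;  /* remainder becomes the next divisor */
        a = b;
        b = r;
    }
    return a;
}
```

For example, gcd_iterative(48, 18) returns 6. The loop form also keeps to the rule above that a call appears alone in its statement, since the result can be assigned in a single statement such as g = gcd_iterative(x, y).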

49
Handel-C Overview
  • High-level language based on ISO/ANSI-C for the
    implementation of algorithms in hardware
  • Allows software engineers to design hardware
    without retraining
  • Clean extensions for hardware design including
    flexible data widths, parallelism and
    communications
  • Based on the Communicating Sequential Processes (CSP) model
  • Independent parallel processes
  • par construct to specify parallel computation
    blocks within a process
  • Well defined timing model
  • Each statement takes a single clock cycle
  • Includes extended operators for bit manipulation,
    and high-level mathematical macros (including
    floating point)