Syllabus Summary - PowerPoint PPT Presentation

Syllabus Summary

Slides: 55
Provided by: spaane

1
Syllabus Summary
Microelectronics for the Global World
Collaborative Engineering ECE992/777
2
Lecture 1. Introduction
  • Overview of Field-Programmable Gate Array (FPGA)
    design development within the appropriate
    software/firmware component development
    environment.
  • In the global design world, we will have to deal
    with Intellectual Property (IP), IP testing,
    trust and design efficiency.
  • Differences in technology status, design
    environments and proficiency create the need for
    tools for design-space excursions and
    optimizations.
  • The design elements taught in this course make
    globally distributed engineering achievable.

3
  • The course is divided into three parts:
  • collaborative design elements,
  • the collaborative development process, and
  • the subsequent approach for integration and
    optimization.

4
Part I.
Collaborative Design Elements
5
Lecture 2. Outsourcing Economy
  • Outsourcing has been approached fearfully, but it
    can also be approached as an opportunity for
    innovation.
  • We will review traditional versus
    outsourcing-driven design methodology.
  • Security/trust issues related to foreign killer
    chips will also be discussed.

6
Global Collaboration in Outsourcing
7
Disruptive Technologies
(Figure: performance versus time. One trajectory
shows the present technology, driven by sustaining
technological improvements; a second shows the
performance that customers can absorb or utilize; a
disruptive technology starts a new performance
trajectory.)
Clayton M. Christensen, The Innovator's Dilemma:
When New Technologies Cause Great Firms to Fail,
HarperBusiness, 2000 (Revised Edition)
8
Lecture 3. Reconfigurable Computing
  • Positioned in computing density between
    Application-Specific Integrated Circuits (ASICs)
    and Digital Signal Processors (DSPs), FPGAs
    provide increased flexibility in computational
    details such as the degree of parallelism and
    pipelining, as well as advantages in real estate
    and power consumption over DSPs and
    General-Purpose (GP) microprocessors.

9
Estrin's Fixed Plus Variable Structure Computer
Organization of Computer Systems: The Fixed Plus
Variable Structure Computer, Gerald Estrin, Proc.
WJCC, 1960
10
FPGA Architectural Developments
  • Traditional Sea-of-CLBs (Xilinx, Altera)
  • Extreme-DSP: FPGAs with 192 embedded 18x18
    multipliers (Xilinx, Altera), and with embedded
    PowerPC cores and RapidIO cells (Xilinx)
  • Fixed-plus-Variable, that is core processor with
    FPGA (Quicksilver, Stretch)
  • Macro-Pipeline Processor (PipeRench)
  • Sea-of-ALUs, chunky arrays (MorphoSys,
    MathStar)
  • Dynamic Reconfiguration (IPFlex)
  • DARPA-sponsored Polymorphous Computing
    Architectures (PCA) developments

11
Xilinx Virtex-4 FPGA Family
12
MathStar FPOA
  • Chunky, coarse-grain array
  • Five Silicon Object types:
  • Arithmetic Logic Unit (ALU)
  • Content Addressable Memory (CAM)
  • Cyclic Redundancy Check (CRC)
  • Multiply Accumulator (MAC)
  • Register File (RF)
  • In addition, RAM resources are distributed
    in the array.
  • The function and ratio of these different Silicon
    Objects are chosen based on a detailed study of
    the application space for the product offerings.

13
Processing Spectrum Continuum
(Figure: a spectrum running from ASIC through FPGA
and DSP to GPU. Horizontal ranges mark where each
architecture style sits on that spectrum:
Sea-of-CLBs, Sea-of-ALUs, Fixed-plus-Variable,
Macro-Pipeline and Dynamic Reconfiguration. Further
ranges mark the design languages: VHDL/Verilog,
C/C++ and SystemC.)

14
Efficiency versus Application Space
(Figure: SWEPT efficiency plotted against
application type (vectors/streaming, structured
bit-operations, symbolic operations) for ASIC, FPGA,
GP and PCA devices. Caption: Optimized Performance
Over Broad Application Space.)
15
Native Stream Mode
16
Native Threaded Mode
17
Application Flow
(Figure: application flow with numbered steps 1 to
5. Labels: Control; Stream Processing, with three
MC-SM blocks; Threaded Processing, with two MC-TM
blocks; Inter-chip I/O (crossbar); Inter-chip
Memory Transfer; Parcel Interface.)
18
Lecture 4. Levels of Abstraction
  • It is a misconception that FPGA bit-level
    personalization code can be reused to update or
    upgrade a design.
  • Too many technology-specific design decisions
    have been made to arrive at that particular
    synthesized code pattern.
  • Only optimization at higher levels of abstraction
    will pay off in the long run.
  • [Liev01] P. Lieverse, P. van der Wolf, E.
    Deprettere, K. Vissers, A Methodology for
    Architecture Exploration of Heterogeneous Signal
    Processing Systems, Journal of VLSI Signal
    Processing, 29, 197-206 (2001), Kluwer Academic
    Publishers, Boston

19
The Design Pyramid [Liev01]
20
Effect of Abstraction Level
(Figure: relative efficiency versus abstraction
level, with VHDL, SystemC and UML marked along the
abstraction axis. Labels: "Compiler Performance" and
"Tradeoff Curve Optimization Potential".)
21
Lecture 5. Design Flow
  • Design elements from UML down to VHDL, including
    SystemC, MathWorks Simulink and Xilinx
    SystemBuilder will be reviewed, as well as
    general design/test flows.

22
(No Transcript)
23
SystemC-based Hardware/Software Co-Design
(Figure: system behavior and system architecture
feed a mapping step, followed by performance
simulation, refinement, and implementation in
software and hardware.)
K. Keutzer, S. Malik, R. Newton, J. Rabaey,
A. Sangiovanni-Vincentelli, System-Level Design:
Orthogonalization of Concerns and Platform-Based
Design, IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 19(12), 2000
24
Lecture 6. Tools for Design
  • The state of the art in design elements needed
    for collaborative design development, including
    verification, trade-off and optimization tools,
    will be described and evaluated.

25
MILAN
  • MILAN is a model-based, extensible simulation
    framework that provides a unified environment
    capable of:
  • modeling a large class of embedded systems and
    applications
  • seamlessly integrating different widely used
    simulators into a single framework
  • enabling rapid evaluation of performance metrics
    such as power, latency, and throughput
  • facilitating simulation at various levels of
    granularity
  • rapidly evaluating a large design space

MILAN, Institute for Software Integrated Systems,
Vanderbilt University, Nashville
26
The MILAN Architecture
(Figure: GME 2000 models feed model interpreters
that drive the simulators and tools: design space
exploration tools, functional simulators, high-level
power estimators, cycle-accurate power simulators,
and system generation and synthesis tools for the
target system. A second set of model interpreters
feeds results back.)
MILAN, Institute for Software Integrated Systems,
Vanderbilt University, Nashville
27
Part II.
Collaborative Development
28
Lecture 7. Intellectual Property (IP)
  • The IP business model and some of its limitations
    will be reviewed. Several other business
    propositions, such as the open model and fabless
    design companies, will be analyzed.
  • Business Proposition, Cost Model
  • Re-use Potential, Patentable
  • Hardcore or Softcore IP
  • Hardware versus Software Components

29
Lecture 8. Open Standards: VIA, OCP, VSIA
  • Interface standards defined and developed for
    System-on-Chip design in particular will be
    reviewed and analyzed for compatibility with IP
    component development.
  • Open Core Protocol International Partnership
    (OCP-IP) www.ocpip.org
  • Virtual Interface Architecture (VIA)
  • Virtual Socket Interface Alliance (VSIA)
    www.vsia.org

30
Lecture 9. Component/System Testing
  • Testability aspects of firmware components,
    including generation of test vectors, assessment
    of coverage, JTAG testing and the test monitor
    concept, will be illustrated.
  • Intellitech (Durham, NH) TEST-IP
  • Plug and Play Scan Components
  • Boundary Scan
  • Self-Test
  • Observability

31
Lecture 10. Trusted Circuits
  • The growing use of globally developed ICs has
    increased the need for tools that support the
    trustworthy development of complex and
    performance-sensitive applications.
  • "...develop enabling trusted assembly,
    integration, and test technologies that verify
    the correctness, reliability, and functionality
    of designed Integrated Circuits (ICs), i.e.,
    approaches that enable IC users to fully trust
    the ICs they employ." DARPA SBIR 2005.2

32
Part III.
Collaborative Integration and Optimization
33
Lecture 11. Component Tradeoffs
  • In heterogeneous computing environments, the
    constituent functions and subsystems can be
    implemented at various points along their
    respective design-space tradeoff curves.
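
As a rough illustration of a design-space tradeoff curve, the sketch below extracts the non-dominated (cost, time) points from a set of candidate implementations of one function. The candidate names and numbers are invented for the example, and a two-dimensional cost/time space is an assumption; a real study would use measured data and possibly more dimensions.

```python
def pareto_front(points):
    """Return the (cost, time) points not dominated by any other point.

    A point dominates another if it is no worse in both cost and time
    and strictly better in at least one of them.
    """
    front = []
    best_time = float("inf")
    # Sweep in order of increasing cost; keep a point only if it is
    # faster than every cheaper (or equally cheap) point seen so far.
    for cost, time in sorted(set(points)):
        if time < best_time:
            front.append((cost, time))
            best_time = time
    return front

# Hypothetical implementations of one subsystem: (relative cost, run time).
candidates = {
    "GPP": (1, 100),
    "DSP": (3, 40),
    "FPGA-small": (5, 45),   # dominated by the DSP point
    "FPGA-large": (8, 10),
    "ASIC": (20, 2),
}

if __name__ == "__main__":
    for cost, time in pareto_front(candidates.values()):
        print(cost, time)
```

The points that survive form the tradeoff curve for that component; each subsystem in a heterogeneous system would have its own curve.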

34
Performance/Cost Tradeoffs
The Analysis of Processor-Time Trade-Off
Opportunities in a Reconfigurable Multi-Processor
System, H.A.E. Spaanenburg, Syracuse University,
1979
35
Lecture 12. Design Excursions (SPADE)
  • In the Leiden University approach [STEF02],
    particular computational instances are
    transformed by small perturbations in the
    design space. These techniques support a system
    designer in exploring alternative instances of an
    application mapped onto an architecture template.
  • [STEF02] T. Stefanov, B. Kienhuis, E. Deprettere,
    Algorithmic Transformation Techniques for
    Efficient Exploration of Alternative Application
    Instances, Proceedings 10th International
    Symposium on Hardware/Software Codesign
    (CODES 2002), Estes Park, Colorado, May 6-8, 2002
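
To give a feel for what an alternative application instance can look like, here is a minimal Python sketch of loop unfolding, one of the transformations named in this lecture. It is a generic illustration, not the algorithm of the cited paper; the loop body `3 * x + 1` and the function names are made up for the example.

```python
def original(xs):
    """Reference instance: a single process handles every iteration."""
    return [3 * x + 1 for x in xs]

def unfolded(xs, factor=2):
    """Unfold the loop by `factor`: iteration i goes to instance i % factor.

    Each instance could be mapped onto its own processing element;
    here the instances run one after another and the results are
    re-interleaved into the original order.
    """
    streams = [xs[k::factor] for k in range(factor)]       # distribute
    partial = [[3 * x + 1 for x in s] for s in streams]    # independent work
    out = [None] * len(xs)
    for k, results in enumerate(partial):                  # merge in order
        out[k::factor] = results
    return out

if __name__ == "__main__":
    data = list(range(7))
    print(original(data) == unfolded(data, factor=3))
```

Both functions compute the same result, so the unfolding factor becomes a parameter that can be varied while exploring how the application maps onto the architecture.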

36
The Y-chart extended with the Application
Transformation Layer [STEF02].
37
Alternative instances of the application have to
be generated, mapped onto the architecture
template and explored in order to evaluate the
performance of the Application-Architecture pair
[STEF02].
38
Simple example illustrating the unfolding and
skewing transformations [STEF02].
39
Lecture 13. Optimization (SPIRAL)
  • SPIRAL [PUCH05], a program generation technique
    developed at Carnegie Mellon University,
    automatically generates high-performance code
    tuned to the given platform. SPIRAL generates
    code for a broad set of DSP transforms, including
    the discrete Fourier transform, other
    trigonometric transforms, filter transforms, and
    discrete wavelet transforms.
  • [PUCH05] M. Püschel, J. Moura, J. Johnson, D.
    Padua, M. Veloso, B. Singer, J. Xiong, F.
    Franchetti, A. Gacic, Y. Voronenko, K. Chen, R.
    W. Johnson, and N. Rizzolo, SPIRAL: Code
    Generation for DSP Transforms, Proceedings of
    the IEEE, Special Issue on Program Generation,
    Optimization, and Adaptation, Vol. 93, No. 2,
    2005, pp. 232-275
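
The idea of platform tuning can be sketched in a few lines of Python: several mathematically equivalent DFT algorithms are generated, timed on the machine at hand, and the fastest one is kept. This is only a toy with a single tuning knob (the size at which recursion stops), assumed here for illustration; it does not reproduce SPIRAL's formula language or its search strategy.

```python
import cmath
import time

def dft_naive(x):
    """Direct O(n^2) evaluation of the discrete Fourier transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]

def fft(x, leaf=1):
    """Radix-2 Cooley-Tukey FFT that falls back to dft_naive at size <= leaf.

    len(x) must be a power of two. Every `leaf` value computes the same
    transform, but the run time differs from platform to platform.
    """
    n = len(x)
    if n <= leaf or n == 1:
        return dft_naive(x)
    even, odd = fft(x[0::2], leaf), fft(x[1::2], leaf)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def tune(n, leaves=(1, 2, 4, 8, 16), repeats=3):
    """Time each candidate on this machine; return the fastest leaf size."""
    x = [complex(i % 7, 0) for i in range(n)]
    timings = {}
    for leaf in leaves:
        start = time.perf_counter()
        for _ in range(repeats):
            fft(x, leaf)
        timings[leaf] = time.perf_counter() - start
    return min(timings, key=timings.get)

if __name__ == "__main__":
    print("fastest leaf size for n=256:", tune(256))
```

The chosen leaf size may differ between machines and interpreter versions, which is the point: the selection is made by measurement rather than fixed in advance.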

40
SPIRAL
Automates the
A library generator for highly optimized,
platform-adapted signal processing transforms
J. Moura et al, Generating Platform-Adapted DSP
Libraries Using SPIRAL, HPEC 2001
41
SPIRAL Methodology
given
DSP Transform (DFT, DCT, Wavelets etc.)
given
Computer Architecture
J. Moura et al, Generating Platform-Adapted DSP
Libraries Using SPIRAL, HPEC 2001
42
SPIRAL vs. FFTW (lower better)
(Figure: run-time comparison on three platforms:
Pentium III/Linux/gcc, Athlon/Linux/gcc and
Pentium III/Win2000/Intel compiler. Annotation:
comparable performance.)
J. Moura et al, Generating Platform-Adapted DSP
Libraries Using SPIRAL, HPEC 2001
43
Lecture 14. System Optimization
  • The total system solution can be evaluated to
    find the right combination of design-space points
    for its constituent elements.
  • Within the total system constraint, this
    procedure provides an efficient way to increase
    benefit at the least incremental cost.
  • These procedures especially facilitate the
    introduction of technology updates, since they
    allow the proper computational operating point to
    be reestablished for the combination of the old
    and new technology.
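
One simple way to read "increasing benefits for the least incremental cost" is a greedy marginal analysis over the tradeoff curves of the individual components. The Python sketch below takes that reading under several assumptions: system time is the sum of the component times, total cost is bounded by a single budget, and the component names and numbers are hypothetical. It is not the procedure from the course, and a greedy pass is not guaranteed to find the optimum in general.

```python
def optimize(curves, budget):
    """Greedy marginal analysis over per-component tradeoff curves.

    curves: {name: [(cost, time), ...]}, each list sorted by rising cost.
    Returns ({name: chosen index}, total cost, total time).
    """
    choice = {name: 0 for name in curves}
    cost = sum(points[0][0] for points in curves.values())
    if cost > budget:
        raise ValueError("budget too small for the cheapest configuration")
    while True:
        best, best_ratio = None, 0.0
        for name, points in curves.items():
            i = choice[name]
            if i + 1 == len(points):
                continue                    # already at the last point
            d_cost = points[i + 1][0] - points[i][0]
            d_time = points[i][1] - points[i + 1][1]
            if d_cost <= 0 or cost + d_cost > budget:
                continue                    # not a valid or affordable step
            if d_time / d_cost > best_ratio:
                best, best_ratio = name, d_time / d_cost
        if best is None:
            break
        i = choice[best]
        cost += curves[best][i + 1][0] - curves[best][i][0]
        choice[best] = i + 1
    total_time = sum(curves[n][i][1] for n, i in choice.items())
    return choice, cost, total_time

# Hypothetical tradeoff curves: (cost, time) per component.
curves = {
    "filter": [(1, 50), (3, 20), (6, 10)],
    "fft": [(2, 80), (4, 30), (9, 12)],
    "control": [(1, 10), (2, 8)],
}

if __name__ == "__main__":
    print(optimize(curves, budget=14))
```

For a technology update, the curve of the updated component would be replaced and `optimize` rerun, which reestablishes the operating point for the mix of old and new parts.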

44
Processor-Time System Tradeoffs
The Analysis of Processor-Time Trade-Off
Opportunities in a Reconfigurable Multi-Processor
System, H.A.E. Spaanenburg, Syracuse University,
1979
45
Order-of-Magnitude Improvements
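Insertion of a next-level processor into an
embedded heterogeneous environment needs to
present an order-of-magnitude improvement
potential.
(Figure: an axis labelled "MOPS Kg.Watt", with tick
marks 10, 100 and 1000, plotted against time for
RISC, DSP, FPGA and ASIC devices, with a point
marked X. Annotations: >3.3x, 18 months, 5 years.)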
46
Lecture 15. Heterogeneous Systems
  • Heterogeneous processing systems currently
    contain a continuum of processing alternatives,
    from general-purpose processors (GPPs), to
    digital signal processors (DSPs), to
    Field-Programmable Gate Arrays (FPGAs) and
    Application-Specific Integrated Circuits (ASICs).
  • The FPGA domain in particular has recently
    produced its own range of architectural
    alternatives along that processing continuum.

47
Not One Machine Does Everything
"Since no single architecture can satisfy the
needs of all users, it has been desirable to have
a compute system whose architecture can be defined
and varied dynamically." S.S. Reddi and E.A.
Feustel, A Conceptual Framework for Computer
Architecture, Computing Surveys, Vol. 8, No. 2,
June 1976
(Images: top of the Empire State Building in New
York; top of the Foshay Tower in Minneapolis; two
airport views.)
48
Performance-Flexibility Trades
(Figure: energy efficiency in MOPS/mW (or MIPS/mW),
on a scale from 0.1 to 1000, versus flexibility
(coverage). Dedicated ASICs sit at the
high-efficiency end.)
Pleiades Ultra-Low Power Hybrid and
Reconfigurable Computing, Jan Rabaey, UC
Berkeley, 1999
49
Lecture 16. Upgrade/Updates, Technology
Transparency
  • System developers must continue to reevaluate
    which combination of implementation alternatives
    will best meet their overall system requirements.
  • This question is not only important for the
    initial design, but also for subsequent
    technology updates and upgrades, especially when
    they have to be implemented in the same
    constrained real estate.

50
Upgrade/Update Approach
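(Figure: tool flow from UML through a UML-to-SystemC
front-end to SystemC, then through a
SystemC-to-VHDL compiler to VHDL for each target
technology, and finally through a VHDL-to-FPGA
synthesizer to the design in Technology 1, e.g.
Xilinx Virtex-4, or Technology 2, e.g. MathStar
FPOA.)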
51
Lecture 17. Virtualization
  • A virtual middleware architecture can be
    carefully mapped onto an FPGA architecture.
  • This approach results in effective performance of
    the virtual architecture, with maximum
    parallelism and throughput.
  • To the system programmer, the virtual
    (middleware) machine becomes the programming
    environment.
  • Programming and code generation for the
    virtual machine will make use of conventional
    software tools, such as compilers and assemblers.
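
The programmer's view of such a virtual machine can be hinted at with a toy example. In the Python sketch below, a tiny assembler targets a three-instruction stack machine; the instruction set (PUSH, ADD, MUL) and both function names are hypothetical. In the concept described above the machine itself would be realized in FPGA logic, whereas here an interpreter loop stands in for it.

```python
def assemble(source):
    """Turn text such as 'PUSH 2' into (opcode, operand) tuples."""
    program = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue                      # skip blank lines
        op = parts[0].upper()
        arg = int(parts[1]) if len(parts) > 1 else None
        program.append((op, arg))
    return program

def run(program):
    """Execute the program on a stack machine; return the final stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack

if __name__ == "__main__":
    code = """
    PUSH 2
    PUSH 3
    ADD
    PUSH 4
    MUL
    """
    print(run(assemble(code)))
```

The program text depends only on the instruction set, so the same code would keep working if the machine behind `run` were remapped onto a different device.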

52
Virtual Middleware Concept
53
Virtual PSP Middleware Concept
54
Conclusion
  • In a recent interview with Electronics Weekly (9
    May 2005), Wim Roelandts, president and CEO of
    Xilinx, made the following observation:
  • "The next step is really to make FPGAs disappear.
    Today our customers are hardware engineers. But
    FPGAs are programmable devices. If we can create
    a level of abstraction that appeals to software
    engineers, we can increase our customer base by
    at least 10x. That's really where our future is.
    As long as you have a set of interfaces that you
    can programme to, you don't have to know what the
    hardware looks like."