Title: VLSI DESIGN 1998 TUTORIAL Part 1. Core Building Blocks and Building Systems using Cores
1VLSI DESIGN 1998 TUTORIAL Part 1. Core Building
Blocks and Building Systems using Cores
- What are cores?
- Building systems using cores
- Challenges in using cores
- Rajesh K. Gupta
- University of California, Irvine.
2Available Core Building Blocks
68030
ARM810
PPC401
3What Is A Core Cell?
- Working definition
- at least 5K gates
- pre-designed
- pre-verified
- re-usable
- Examples
- Processor: LSI Logic CW4001/4010/4100, ARM7TDMI, ARM810, NEC 85x, Motorola 680x0, IBM PPC
- DSP cores: TI TMS320C54X, Pine, Oak
- Encryption: PKuP, DES
- Controllers: USB, PCI, UART
- Multimedia: JPEG comp., MPEG decoder, DAC
- Networking: ATM SAR, Ethernet
4Core Types
- Soft cores (code)
- HDL description
- flexible, i.e., can be changed to suit an application
- technology independent, may be resynthesized across processes
- significant IP protection risks
- Firm cores (code + structure)
- gate-level netlist to be placed and routed
- technology sampled
- Hard cores (physical)
- ready for drop in
- include layout and timing (technology dependent)
- IP is easily protected
- mostly processors and memory
- functional test vectors or ATPG vectors available.
5Core Types and Their Use
Technology ASIC or FPGA
6Core Portability
- Determined by technology independence and data format.
- Technology independence based on the type of core
- both open and proprietary data formats are currently in use.
- DEF: Design Exchange Format (Cadence)
- SPEF: Standard Parasitic Extended Format (Cadence)
- GDSII: Layout format (Cadence)
- ITL: Interpolated Table Lookup cell-level timing model (Mentor)
- LEF: Layout Exchange Format (Cadence)
- MMF: Motive Modeling Format (Viewlogic)
- NLDM: Non-linear Delay Model (Synopsys)
- TLF: Table Lookup Format (Cadence)
- VCD: Verilog Change Dump (Cadence)
- WGL: Waveform Graphical Language (TSSI)
7Timing Information in Firm and Hard Cores
- Timing behavior can be generated from SPICE inputs
- However, it is not always possible for big cores
- static timing information is necessary
- Basic delay model
- propagation delay model from inputs to outputs
- slew model (as a function of load and input slew)
- input/output capacitances
- setup and hold constraints on inputs.
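A slew or delay model that depends on load and input slew is commonly stored as a lookup table, in the spirit of the table-based timing formats (NLDM, TLF) listed earlier. The following is a minimal sketch of such a lookup with bilinear interpolation. The axis points, the delay values and the function names are made up for illustration and do not describe any real core.

```python
from bisect import bisect_right

# Hypothetical characterization data (illustrative values only).
SLEW_AXIS = [0.1, 0.5, 1.0]      # input slew, ns
LOAD_AXIS = [0.01, 0.05, 0.10]   # output load, pF
DELAY_NS = [                     # DELAY_NS[slew_index][load_index]
    [0.20, 0.40, 0.60],
    [0.30, 0.50, 0.70],
    [0.40, 0.60, 0.80],
]

def _segment(axis, x):
    """Index i of the table segment [axis[i], axis[i+1]] used for x.

    Points outside the table use the nearest end segment (linear extrapolation).
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def lookup_delay(slew, load):
    """Bilinear interpolation of the delay table at (slew, load)."""
    i = _segment(SLEW_AXIS, slew)
    j = _segment(LOAD_AXIS, load)
    s0, s1 = SLEW_AXIS[i], SLEW_AXIS[i + 1]
    c0, c1 = LOAD_AXIS[j], LOAD_AXIS[j + 1]
    u = (slew - s0) / (s1 - s0)   # position between the two slew points
    v = (load - c0) / (c1 - c0)   # position between the two load points
    low = DELAY_NS[i][j] * (1 - v) + DELAY_NS[i][j + 1] * v
    high = DELAY_NS[i + 1][j] * (1 - v) + DELAY_NS[i + 1][j + 1] * v
    return low * (1 - u) + high * u
```

A static timing tool would keep one such table per timing arc, plus a second table of the same shape for the output slew.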
8- What are cores?
- Building systems using cores
- Challenges in using cores
9Building Systems-On-A-Chip Using Cores
Commodity Hardware: compression, encryption, modem, signal proc., image proc.
Commodity Software: encryption/decryption, device drivers, legacy code, operating/runtime system
SOC is a SM of LSI Logic Corporation.
10S-O-C Application Classes
11Systems-On-A-Chip (SOCs)
- Two Types
- Technology-Driven
- Developed in-house, maximum leverage of technology crown jewels
- Close cooperation between module developers and system designers
- or wide-ranging cross-licensing agreements between partners
- Component-Driven
- Core cells as IP carriers
- IP encapsulated into usable products
- design reuse is critical to IP products
12Component-Driven SOC
- Core supplier different from core user
- Third party IP providers
- Significant technology packaging without importing it
- The IP provider wants to sell a product and not the technology behind the product
- Enormous technical and legal challenges
- can it be done successfully?
- who guarantees that an SOC works as required?
- who is liable in case the end product does not perform?
13ASIC Cores Availability
- 3Soft: uC, DSP, LAN, SCSI, PI
- ARM: uC, uP
- Plessey: per. controllers, DSP
- Scenix: uC, PCI, DMA
- Western Design Center: uC
- TI: DSP
- NEC: DSP, uC
- Symbios: ARM7 TC
- VAutomation: uP, controllers
- CAST: 2910A, IDT49C410, DMAc
One-Stop Shops:
- LSI Logic: CoreWare
- IBM Microelectronics
- Motorola: FlexWare
- Lucent
- Digital Design Dev: MIDI
- Hitachi: MPEG, PCI, SCSI, uC
- Palmchip: MPEG, UART, ECC
- Silicon Engg.: micro VGA
- Butterfly DSP: DSP, FFT, DFT, ADSL, OFDM
- Int. Sil. Systems: ADPCM, FIR
- Analog Devices: DSP
- DSP Group: Pine, Oak
- LogicVision: BIST, JTAG
- ROHM: UART, SIO, PIO, FIFOc, Add, Mpy, ALU
- Synopsys: DesignWare, ISA, Intel uC
- Chip Express: FIFO, RAM, ROM
- VLSI Libraries: Memory, Mpy
- Eureka: PCI
- Virtual Chips: PCI, USB
- Logic Innovations: PCI, ATM
- OKI: PCI, PCMCIA, DMA, UART
- Sand: USB, PCI
- Sierra: ATM SAR, Ether, R3000
- Focus Semi: PLL, VCXO
- VLSI Cores: Encryption, DES
- ASIC Intl: DES
NOT EXHAUSTIVE.
14FPGA/CPLD Cores Availability
- Capacity constrained cores
- do not include wide/high performance PCI, ATM SAR, or Microprocessors
- Altera
- 8-bit 6502
- DMAC 8237
- Xilinx
- PCI
- Actel
- System Programmable Gate Array (SPGA)
- combine FPGA with customer ASIC
- ASIC examples PCI, Router, DMA controller.
15Current Core Market Models
Three ways
- 1. A design house licenses design and tools
- DSP Group (Pine and Oak Cores), 3Soft, ARM (RISC)
- offering includes HDL simulation model, tool and/or an emulator
- customer does the design, fab.
- 2. Core vendor designs and fabs ICs
- TI, Motorola, Lucent
- VLSI, SSI, Cirrus, Adaptec
- 3. Core vendor sells cores, takes customer designs and fabs ICs
- LSI Logic, TI, Lucent
Licensable
Foundry Captive
Vendors of foundry captive cores do not have to reveal the internal design and layout of the core. The foundry provides a bounding box.
16Core Trends1997 Survey of Designers
Months to completion
- 74 hardware designers.
- 26 plan to purchase core for next design
- 40 hard, 68 soft, 32 firm
Source Integrated System Design
17Application Needs
Source Integrated System Design
18Using Cores PCI
- Class of interface cores such as
- USB, UART, SCSI, PCI, 1394 etc.
- Identify target technology
- ASIC, FPGA
- PCI (Peripheral Component Interconnect)
- processor independent CPU interface to peripherals
- multi-master, peer-to-peer protocol
- synchronous 8-33 MHz (132 MB/s)
- arbitration central, access oriented, hidden
- variable length bursting on reads and writes
- (I/O, Mem) x (Read, Write) and IACK commands
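The 132 MB/s figure quoted above follows from the bus width and the clock rate. A small sketch of that arithmetic, assuming a 32-bit data path and one data transfer per clock during a burst (the function name is illustrative):

```python
def peak_bandwidth_mb_s(bus_width_bits, clock_mhz):
    """Peak burst bandwidth in MB/s, assuming one data transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz

print(peak_bandwidth_mb_s(32, 33))   # 4 bytes x 33 MHz = 132.0
```

This is a peak number: arbitration, address phases and wait states lower the sustained rate.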
19PCI Cores
- VHDL/Verilog synthesizable cores with options
- PCI-Host, PCI-Satellite
- 32-bit (33 MHz) or 64-bit (66 MHz)
- FIFO or register data storage
- Synchronous or Asynchronous host interface
- Core components
- Master/Target Read/Write FIFOs,
- Master/Target State Machines
- Configuration registers
- Timing requirements
- input setup time 7 ns, clock to output delay 11 ns
- DC Specs: input pin caps 10 pF, clk pin 12 pF, IDSEL 8 pF
20User Experience
- Hughes Network Systems
- DirecPC ASIC in a satellite receiver card
- 80K gates device on Chip Express process
- DirecPC consists of
- IDT R3041 RISC controller
- Memory, Demodulator, Error-check, PCI core
- PCI core from Virtual Chips
- 17K gates including asynchronous FIFOs
- Guesstimate 4K extra gates due to the core (5%)
- Comments
- "Their test vectors assume you have direct access to the internal interface of the core. I looked through their test vectors and tried to do the same things using my back end."
- "They were kind of giving us a reference documentation. It wasn't turnkey."
Source EE Times
21Using Cores DSPs
- 16-bit fixed point processors are most commonly used.
- DSPs
- simple: Clarkspur Design CD2450 (variable data width)
- compatible: DSPGroup, TI, SGS-T 320C5x
- clone
- Options
- memory, mem controller, interrupt controller, host port, serial port
- Criticals
- power consumption, as most DSP applications go into portable products
22Design using DSP Cores
- Core vendors often supply a development chip or core version of the COTS processor
- board-level prototyping fairly common
- followed by single-chip solution
- To avoid board-level prototyping, a full-functional simulation model is a must, particularly for foundry captive cores.
- Software tools provided
- assembler, linker, instruction set simulator, debugger, (high-level language compiler?)
23DSP Sample Points
- TI TEC320C52
- 16-bit fixed-point TMS320C52
- 1Kx16 data RAM, 4Kx16 program RAM
- 2 serial ports, 1 16-bit timer
- and 0.8 micron 15,000-gate gate array
- Motorola 7-Day CSIC
- 8-16 MHz HC08, DMA, MMU, ..
- SGS-Thomson ST18932, ST18950
- 16-bit fixed-point DSPs, 0.5 u, 3.3 volt CMOS, 80MHz
- has no off-the-shelf DSP IC
- used in PC sound cards, 950 has a better assembly
Not exhaustive, only a representative sample.
24Third Party DSP Cores
- DSPGroup Pine
- 16-bit fixed-point, 0.8u CMOS, 5.0/3.3 V, 40 MHz
- 36-bit ALU, 16-bit MPY, 2Kx16 RAM/ROM (prog mem is outside core)
- used in pagers and answering machines
- DSPGroup Oak
- same as Pine, plus includes a bit manipulation unit
- Viterbi decoding support instructions (min, max)
- used in digital cellular telephony
- Clarkspur CD2400, CD2450
- 16-bit fixed-point
- 24-bit ALU, MPY, Acc, 2x 256x16 data RAM/450 makes it 48 bits
- used in fax-modem
25One-Stop Shops LSI Logic CoreWare
- Cores for building ASIC for most embedded applications
- laser printer, ATM, PDA, Set-top, Router, Graphics accelerators, etc.
- CPU cores: miniRISC CW4K, Oak DSP
- miniRISC compatible with MIPS R4000
- 0.5u CMOS, 2mW/MHz, 60MHz, 3-stage pipeline
- 32-bit address/data bus
- full scan, 99% fault coverage, gate-level timing model
- Interface: PCI, Fibre Channel, SerialLink
- Networking: Ethernet, ATM (SAR), Viterbi, RS
- Compression etc.: MPEG, JPEG, DAC/ADC.
26Core Examples
- Only a representative sample of cores. Not exhaustive or even comparative.
- Processor cores
- LSI Logic CW4001, CW4010
- ARM (7) processors
- Motorola FlexCore
- Memory cores
- 16M/18M Rambus DRAM
- Multimedia cores
- CompCore CD2
- Networking
- Media Access Controller (MAC)
- Encryption cores
- VLSI cores, ASIC international.
27LSI Logic CW4001 Core
- Behavioral Verilog/VHDL model
- Gate-level timing accurate model
- Specifications
- 60 MHz, 60 MIPS (45 MIPS average), 3 stage pipeline
- 0.5 micron CMOS process, 4 sq. mm., 2mW/MHz
- Full-scan with 99% fault coverage.
- Interfaces
- CBUS, Computational Bolt-On (CBO), Co-processor, MMU
- Customizability
- BIU, cache controller, MDU, MMU, DRAM/SRAM controllers, timers, caches (<16K), RAM/ROM, DMAc
- Up to 3 Co-processors (FPU, Graphics, Compression, Network Protocol), MPY/DIV unit, CRC, direct access to CPU GPRs
28Using CW4001
- Co-processor has its own instruction set including
- read data bus for instruction, rd/wr to external mem.
- read/write to CPU registers, stall and interrupt CPU
- CW delivers bits 0-5 and 26-31 opcode fields to Co-processor instr. decoder
- Coprocessor executes in lockstep with CPU pipeline stages.
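As an illustration of the opcode fields handed to the co-processor decoder, the sketch below splits a 32-bit instruction word into its low six bits and its high six bits. Reading the two fields as bits 0-5 and bits 26-31 is an assumption, based on the MIPS-style instruction layout; the function name is hypothetical.

```python
def coprocessor_fields(instr):
    """Return (bits 26-31, bits 0-5) of a 32-bit instruction word."""
    instr &= 0xFFFFFFFF            # keep only 32 bits
    high = (instr >> 26) & 0x3F    # bits 26-31
    low = instr & 0x3F             # bits 0-5
    return high, low

high, low = coprocessor_fields(0x48000021)
print(f"{high:06b} {low:06b}")     # 010010 100001
```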
29CW4010 CPU Core
- Verilog/VHDL model with gate-level timing
- 80MHz, 160 MIPS (110 MIPS average), 6 stage pipeline
- 0.5 micron CMOS, 9 sq. mm., 5 mW/MHz
- Integrated cache controllers with separate I and D caches
- cache size from 2-16 KB
- 64-bit memory and cache interface
- Up to 3 co-processors
- Full-scan with 99% fault coverage.
30Advanced RISC Machines (ARM )
- A family of 32-bit RISC processor cores
- ARM6, ARM7: MPU with Cache, MMU, Write Buffer and JTAG
- ARM7TDMI: ARM7 with Thumb ISA, ICE, Debug, MPY
- ARM8: cached, low power, 5-stage pipe (vs 3 in others)
- StrongARM1, StrongARM2: available as Digital SA-110 (21285)
- Piccolo: DSP co-processor for ARM, shares system bus (AMBA)
- support for Viterbi, bit manipulation operations
- four nestable zero-overhead hardware loop constructs
- splittable ALU, 1 cycle dual 16-bit operations
- saturation arithmetic
- 1024 point in place complex radix 2 FFT in 33,331 cycles
- Manufacturing partnerships and/or licensing with
- Cirrus Logic, GEC Plessey, Sharp, TI and VLSI Tech.
31ARM Processor Cores
Source ARM Inc.
- Enhancements: ARM7D, ARM7DM, ARM7DMI
- M: 64-bit result hardware multiplier running at 8 bits/cycle
- D: 2 boundary scan chains for basic debug
- I: Embedded ICE debug
- Thumb instruction set
32ARM Enhancements Embedded ICE
- The EmbeddedICE core cell allows debugging of an ARM core embedded within an ASIC
- real time address and data-dependent breakpoints
- full access and control of the CPU
- can be reduced for size savings once the part goes into production.
[Figure: a debug host running ARMsd connects through the boundary scan pins to the EmbeddedICE cell and the ARM core inside the ASIC; software download at 40 KB/s.]
Source ARM Inc.
33ARM Enhancements Thumb ISA
- 8- or 16-bit external, 32-bit internal
- Thumb instruction set is a subset of 32-bit ARM instruction set
- 16-bit instructions
- expanded into 32-bit ARM instructions at run time without any penalty
- Up to 65-70% smaller code size compared to ARM
- 130% of ARM performance with 8/16 bit memory
- 85% of ARM performance with 32-bit memory
[Figure: a 16-bit Thumb instruction, ADD Rd, constant (major opcode 001, minor opcode 10, Rd, constant), expanded into a 32-bit ARM instruction with fields 1110 (always), 001, 01001, 0 Rd (destination and source), 0 Rd, and 0000 plus the zero-extended constant.]
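The field-by-field expansion of the Thumb ADD instruction into its ARM equivalent can be sketched in a few lines. This is only a software illustration of the mapping drawn on the slide, not a description of the actual decompressor hardware; the function name is hypothetical, and the bit positions of the Thumb fields (opcode in bits 11-15, Rd in bits 8-10, constant in bits 0-7) are an assumption taken from the ARM/Thumb instruction formats.

```python
def expand_thumb_add_imm(thumb):
    """Expand a 16-bit Thumb 'ADD Rd, constant' into a 32-bit ARM word."""
    major = (thumb >> 13) & 0b111
    minor = (thumb >> 11) & 0b11
    if major != 0b001 or minor != 0b10:
        raise ValueError("not a Thumb ADD Rd, constant encoding")
    rd = (thumb >> 8) & 0b111      # 3-bit register number
    constant = thumb & 0xFF        # 8-bit immediate
    return (
        (0b1110 << 28)             # condition field: always
        | (0b001 << 25)            # data processing, immediate operand
        | (0b01001 << 20)          # ADD opcode with flag update
        | (rd << 16)               # source register: 0 Rd
        | (rd << 12)               # destination register: 0 Rd
        | constant                 # 0000 followed by the 8-bit constant
    )

print(hex(expand_thumb_add_imm(0x3105)))   # ADD r1, 5 -> 0xe2911005
```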
34ARM Applications
- Widely used in a variety of applications
- low cost 16-bit applications
- mobile phones, modems, fax machines, pagers
- hard disk and CD drive controllers
- engine management
- low cost 32-bit applications
- smart cards
- ATM and ethernet network interfaces
- low power, on-chip application code
- high performance 32-bit applications
- digital cameras
- set top boxes, network switches, laser printers
- external memory system (RAM, ROMs)
Courtesy S. Dey, ICCAD96
35Motorola FlexCore
- CPU cores based on 680x0 family
- EC000, EC020, EC030
- all with static operation, 5/3.3 volt supplies
- performance
- EC000: 2.7 MIPS @ 16.67 MHz, 33 mW
- EC020: 7.4 MIPS @ 25 MHz, 150 mW
- EC030: 11.8 MIPS @ 33 MHz, 258 mW
- Serial I/O cores: 68681 UART, MBus, SPI
- RT clock, Dual timer cores
- SCSI, Parallel I/O, 8051 interfaces
- DRAM, Interrupt, JTAG controllers
- PLA, PLL, oscillators, power management cells.
36Memory Core Example
- Virtual Chips 16M/18M bit Rambus DRAM
- Verilog/VHDL simulation model
- Organization
- two banks, 512 pages per bank, 72x256 per page
- dual internal banks, 2K byte cache per bank
- Programmable ack, write, read delays through control registers
- Synchronous protocol for fast block oriented xfrs.
- Modes of operation
- reset, stand-by, power-down, active
- Deliverable: VHDL, Verilog source, test bench, test vectors, documentation.
- Others: Sand DRAM, VRAM Verilog models.
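The organization figures above can be checked against the quoted 16M/18M capacity with a short calculation. The sketch assumes that "72x256 per page" means 256 words of 72 bits, and that the 16M variant uses 64-bit words; both readings are assumptions, as are the names used.

```python
BANKS = 2
PAGES_PER_BANK = 512
WORDS_PER_PAGE = 256

def capacity_mbit(word_bits):
    """Total capacity in Mbit (1 Mbit = 2**20 bits) for a given word width."""
    total_bits = BANKS * PAGES_PER_BANK * WORDS_PER_PAGE * word_bits
    return total_bits / 2**20

print(capacity_mbit(72))   # 18.0
print(capacity_mbit(64))   # 16.0
```

Under the same reading, one 72x256 page holds 18,432 bits, which is 2K bytes of nine bits each and matches the 2K byte cache per bank.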
37Multimedia Cores
- JPEG compression, MPEG decoding, Video DAC, etc.
- IBM Microelectronics, LSI Logic, PalmChip, Silicon Engineering, Mentor Graphics, CompCore, Intrinsix VGA
- Example: MPEG-2 decoder from CompCore
- 70K-80K gates
- 18K bits of internal SRAM
- 16Mbit SDRAM (external)
- bitstream buffering, frames
- 54MHz, 16-bit external mem. bus
[Figure: CD2 decoder block diagram with MPEG input, microc. interface, audio decoder, video decoder, three SRAMs, synchronization, virtual mem. controller, phy. mem. controller, external 1Mx16 SDRAM, and audio stream and video stream outputs.]
Source CompCore
38Other Core Categories
Networking
- Protocol choices
- switched Ether, s. TR, ATM155, ATM25
- Example: SYM1000 from Symbios
- HDL code, 3.3 V, 0.5u
- CSMA/CD ethernet
- programmable inter-packet gap.
- Optional CRC insertion, and check
- MII interface to physical layer device
- Host bus interface
- LSI Logic ATMizer
Encryption
- VLSI Cores
- PKuP encryption core
- implements modular exponentiation
- synthesizable HDL core
- DES core as a synthesizable Verilog model
- two models: 8 bytes/8 cycles, 8 bytes/16 cycles
- ASIC International
- DES cores
- Exponentiator Engine
- Hash function cores
39- What are cores?
- Building systems using cores
- Challenges in using cores
40Challenges in Using Cores
- A core cell is not a single product
- a PCI cell consists of 25 separate Verilog files
- plus as many synthesis scripts
- immature interface abstraction
- e.g., there is no direct access to the core from the end product. Access must be created.
- A core is not an end product
- a core cell is design know-how; it must be adapted to a particular process, tools and even application
- Testability and testing is a challenge
- as opposed to design, testing is not a hierarchical problem
- using 90% testable cores does not give 90% system testability
- tests are core-specific, not applicable from primary IO
- What is an efficient design methodology using cores?
41SOC Design Problem Components
1. Design environment, co-simulation, constraint analysis.
2. HDL modeling, architectural synthesis, logic synthesis, physical synthesis.
3. Software synthesis, optimization, retargetable code gen., debugging and programming environ.
4. Test issues: test access, isolation, ATPG.
[Figure: SOC block diagram with Processor, ASIC, Memory, DMA, Interface and Analog I/O blocks.]
Processor cores introduce software part of system
design.
42Co-Design Components
- Specification, Modeling and Analysis
- How to capture designer intent efficiently in a design language?
- HDL optimizations
- Constraint modeling and analysis
- System Validation
- How to use description in building a (computational) prototype capable of running actual applications?
- Co-simulation, Formal Verification
- System Design and Synthesis
- Delayed partitioning of hardware and software
- Software synthesis and optimizations
- Interface design and optimizations.
43System Specification Goals Characteristics
- Main purpose: provide a clear and unambiguous description of the system function, and provide documentation of the initial design process
- Support
- diverse models of computation
- allow the application of computer-aided design tools for
- design space exploration
- partitioning
- software-hardware synthesis
- validation (verification, simulation)
- testing
- Should not constrain the implementation options.
- diverse implementation technologies.
44Embedded System Modeling
- Reactive and time-constrained interactions
- Consist of structural and behavioral components.
- Hierarchically organized components.
- Synchronous and asynchronous communications.
- Locally or globally clocked.
- Idealized as Synchronous Reactive Systems.
45Synchronous Reactive Modeling
- Zero computation time
- System outputs produced in synchrony with inputs
- Instantaneous broadcast communications
- Deterministic behavior
- a given sequence of inputs always produces the same output sequence.
- Example languages using this model
- ESTEREL, LUSTRE.
- More later.
46Example Esterel
- Reactive and atomicity of reactions
- watching implements a generalized watchdog
- Time as discrete instants
- Easily translated into a transducer (FSM generation)
- Perfect synchrony hypothesis
- Instantaneous broadcast
- Implicit communication architecture.
- Using signals which are present or absent and may carry a value.
- Pure signals do not carry a value.
47Constraint and Interface Modeling
- Source of timing constraints
- Time-constrained interactions between system components and environment
- Specified using statement tags on HDL descriptions.
- Types of constraints
- Delay and interval constraints (latency-type)
- Rate constraints (throughput-type)
- Constraint satisfiability
- Are constraints satisfied for a given implementation?
- Given an implementation, resynthesize to satisfy a given set of constraints.
48Example
Derived from events at system interfaces.
49Interface Modeling using Constraints
- Interface described using events.
- Events are instances of actions.
- Most common interface action is a signal transition on a wire.
- Temporal relationship between events
- Propagation delays
- Bounds on event separation intervals: min, max, linear
- Absolute versus relative rate constraints.
50Binary Delay Constraints
[Figure: two constraint graphs over events i, j and k, one labeled MAX with max-type edges and one labeled MIN with min-type edges.]
51Interface Delay Timing Constraints
- Three types (McMillan and Dill)
- Given events $i$ and $j$ with time stamps $t_i$ and $t_j$ respectively, and $d_{ij}$ as the delay from event $i$ to event $j$, such that $l_{ij} < d_{ij} < u_{ij}$
- min constraints: $t_j = \min_{i < j} (t_i + d_{ij})$
- max constraints: $t_j = \max_{i < j} (t_i + d_{ij})$
- linear constraints: $t_j - t_i < s_{ij}$, where $s_{ij}$ is the maximum achievable separation between $i$ and $j$.
- Constraint graph
- nodes <-> events, edges <-> constraints.
- Synthesis: find maximum achievable separation between pairs of events (minimum separation depends upon operation delays.)
- Rate constraint analysis and debugging.
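For the linear constraint type alone, the maximum achievable separation between every pair of events can be found with an all-pairs shortest path computation on the constraint graph. The sketch below is a minimal illustration under stated assumptions: bounds are treated as non-strict, min and max constraints are not handled, and the event names, data layout and function name are hypothetical.

```python
import math

def max_separations(events, linear):
    """All-pairs maximum separation s[i][j] = max achievable (t_j - t_i).

    linear maps (i, j) to (lower, upper), meaning lower <= t_j - t_i <= upper.
    """
    s = {a: {b: (0 if a == b else math.inf) for b in events} for a in events}
    for (i, j), (lower, upper) in linear.items():
        s[i][j] = min(s[i][j], upper)    # t_j - t_i <= upper
        s[j][i] = min(s[j][i], -lower)   # t_i - t_j <= -lower
    # Floyd-Warshall: tighten each bound through every intermediate event.
    for k in events:
        for a in events:
            for b in events:
                if s[a][k] + s[k][b] < s[a][b]:
                    s[a][b] = s[a][k] + s[k][b]
    if any(s[a][a] < 0 for a in events):
        raise ValueError("constraints cannot all be satisfied")
    return s

s = max_separations(
    ["a", "b", "c"],
    {("a", "b"): (1, 3), ("b", "c"): (2, 5), ("a", "c"): (0, 4)},
)
print(s["a"]["b"], s["a"]["c"], -s["c"]["a"])   # 2 4 3
```

In the example the bound on b relative to a tightens from 3 to 2, because c may be at most 4 after a and must be at least 2 after b. The last printed value is the minimum separation of c from a.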
52Hardware Modeling As A Programming Activity
- Programming languages are often used for constructing system models
- Core based designs assume that all new designs originate as an HDL model
- Hardware
- concurrency in operations
- I/O ports and interconnection of blocks
- exact event timing is important: open computation
- Software
- typically sequential execution
- structural information is less important
- exact event timing is not important: closed computation.
53HDL Semantic Necessities
- Abstraction
- provide a mechanism for building larger systems by composing smaller ones
- Reactive programming
- provide mechanisms to model non-terminating interaction with other components
- watching (signal) and waiting (condition)
- must be separate (else one is an implementation of the other)
- exception handling
- Determinism
- provide a predictable simulation behavior
- Simultaneity
- model hardware parallelism, multiple clocks
54HDL Pragmatics
- Data types
- simple (bit/Boolean): HardwareC, Verilog
- complex (records): VHDL
- Interface abstraction
- provide an external view independent of implementation
- Classes (packages) in C++, VHDL
- Entity interfaces or Tasks: VHDL, ADA
55Pragmatics (contd.)
- Communication
- shared variables using explicit communication architectures
- synchronous handshaking using implicit communications (ADA task entry call)
- instantaneous broadcast (Esterel)
- asynchronous message passing using explicit communication architectures
- Time
- global, multiple clocks, logics.
56Going from HLL to HDL
[Figure: flow from a (restricted) HLL description to an HDL description through two refinement steps, labeled DATA and CONTROL: refine data types (bit true, fixed point, saturation arithmetic) and add reactivity (clock(s), waiting, watching).]
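The data type refinement step (bit-true, fixed point, saturation arithmetic) can be made concrete with a small sketch. It contrasts a saturating add with the wrap-around add of ordinary two's-complement hardware and shows one way to quantize a real value to fixed point. The 16-bit width, the Q15 format and all function names are illustrative assumptions.

```python
def saturate(value, bits=16):
    """Clamp an integer to the signed two's-complement range of the width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))

def sat_add(a, b, bits=16):
    """Saturating add: overflow sticks at the largest or smallest value."""
    return saturate(a + b, bits)

def wrap_add(a, b, bits=16):
    """Bit-true modular add, for comparison: overflow wraps around."""
    result = (a + b) & ((1 << bits) - 1)
    return result - (1 << bits) if result >> (bits - 1) else result

def to_fixed(x, frac_bits=15, bits=16):
    """Quantize a real number to fixed point (Q15 by default), saturating."""
    return saturate(round(x * (1 << frac_bits)), bits)

print(sat_add(30000, 10000))    # 32767
print(wrap_add(30000, 10000))   # -25536
print(to_fixed(0.5))            # 16384
```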
57HLL Restrictions
- Classes for synthesis target: do not use
- unions, floating, pointers (only interface with lib)
- type casts
- virtual functions (restricted to only library classes)
- policy of use on shared variables
- Suggestions
- explicit initialization blocks
- use #defines instead of conditional process enables for statically determined conditions
58Adding Reactivity
- Reactivity can be added in one of three ways
- 1. use annotations, comments
- commonly used in home-grown C-based HDLs
- sometimes use semantic overloads, that is, associating alternative interpretations.
- 2. use library assists
- additional library elements that can be used by the programmer in modeling hardware.
- example: additional classes in C++
- 3. use additional language constructs
- new constructs require a specific language front-end, new debugging tools.
- example: divide operations across cycles using next()
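The idea of dividing operations across cycles with next() can be imitated in Python, where a generator's `yield` plays the role of the clock boundary and the built-in `next()` advances the process by one cycle. This is only an analogy, not the construct of any particular C-based HDL; the process, the simulator loop and their names are hypothetical.

```python
def mac_process(a, b, c):
    """Hypothetical process: a*b + c split over two clock boundaries."""
    product = a * b        # first cycle: multiply
    yield                  # clock boundary
    total = product + c    # second cycle: accumulate
    yield                  # clock boundary
    return total           # third cycle: result available

def run(process):
    """Tiny simulator: call next() once per cycle until the process ends.

    Returns (result, number of clock boundaries crossed).
    """
    boundaries = 0
    while True:
        try:
            next(process)
        except StopIteration as done:
            return done.value, boundaries
        boundaries += 1

print(run(mac_process(3, 4, 5)))   # (17, 2)
```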
59Adding Data Types
- Identify signals
- storage elements, structured memory blocks
- Type variables signed, unsigned, std_logic
- Size state variables on instantiation
60Language Comparisons
- Verilog, VHDL: compiler produces inputs to run a discrete-event simulator.
- Esterel: compiler produces a single deterministic FSM.
- Scenic: compiler produces (synthesizable) processes and a simulator.
61From HDL to Circuit/System: Compilation and Synthesis
- Compilation spans programming language theory, architecture and algorithms
- Synthesis spans concurrency, finite automata, switching theory and algorithms
- In practice, the two tasks are inter-related.
- Compilation and synthesis tasks are done in three steps
- front-end, intermediate optimizations, back-end.
62Compilation
- Program compilation for software target
- Front-end parsing into intermediate form
- Optimization over the intermediate form
- Back-end code-generation for a given processor
- HDL compilation for hardware target
- Front-end parsing into intermediate form
- Optimization over the intermediate form
- Back-end architecture, logic and physical
synthesis.
63Synthesis and Optimization
- Substantial growth in last twenty years
- Industry-standard tools in
- Logic synthesis
- Physical synthesis
- Behavioral synthesis just becoming commercial.
- Substantial room for growth when considered
together with software compilation.
64Behavioral to RTL
- Basic transformations needed
- 1. Operation scheduling
- 2. Resource binding
- 3. Control generation: central or distributed.
- Evolutionary growth to synthesis tools
- Designer expertise today lies in the RTL coding
- Synthesis tools are strongly dependent upon design methodology.
- Generate a structure suitable for synchronous and single-phase circuits
- resource performance in terms of execution delay
- in number of clock cycles
- Design space
- area, cycle time, latency, throughput
65Synthesis Tasks
- Operation scheduling, resource binding, control generation
- Scheduling: determines operation start times
- minimize latency
- Resource binding resource selection, allocation
- minimize area (maximize sharing)
- Control synthesis
- data-path connectivity synthesis
- detailed resource connections
- steering logic
- connection to the interface
- control synthesis
- synthesize controller that provides
operations/resource enables, operation
synchronization, resource arbitration
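Scheduling, the first of the synthesis tasks above, can be illustrated with the simplest latency-minimizing scheme: as-soon-as-possible (ASAP) scheduling, in which each operation starts as soon as all of its predecessors have finished. The sketch ignores resource limits, so binding and control generation are not covered; the operation names, delays and function name are made up for illustration.

```python
def asap_schedule(delays, deps):
    """Return the earliest start cycle of every operation.

    delays maps each operation to its execution delay in clock cycles.
    deps maps an operation to the list of operations it depends on.
    """
    start = {}

    def visit(op, path=()):
        if op in path:
            raise ValueError("dependency cycle at " + op)
        if op not in start:
            start[op] = max(
                (visit(p, path + (op,)) + delays[p] for p in deps.get(op, [])),
                default=0,
            )
        return start[op]

    for op in delays:
        visit(op)
    return start

delays = {"mul1": 2, "mul2": 2, "add1": 1, "add2": 1}
deps = {"add1": ["mul1", "mul2"], "add2": ["add1"]}
schedule = asap_schedule(delays, deps)
print(schedule)   # {'mul1': 0, 'mul2': 0, 'add1': 2, 'add2': 3}
print(max(schedule[op] + delays[op] for op in delays))   # latency: 4
```

Both multiplications start in cycle 0, so this schedule needs two multipliers. Trading that area against latency is the job of resource-constrained scheduling and binding.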
66A CAD Methodology for SW
- Automated software synthesis from specs.
- Synthesis tools generate implementation
- Global optimization of the program.
- Optimization used to achieve design goals.
- Analysis and verification tools for feedback.
- Compilation for embeddable software
- Software Optimizations
- Code compression
- Optimization for power
- Instruction-set generation
- Static memory allocation
67Compression
- Block-based compression
- Program compressed in small blocks to preserve random-access properties (e.g., cache line blocks)
- Transparent code compression
- ISA unchanged. Compression uses compiler output.
- Decompression performed by cache refill engine.
- Processor sees only uncompressed code.
- Techniques: Huffman coding.
- Key issue: code location in memory after compression?
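The key issue named above, finding a block in memory once blocks have different compressed sizes, is usually handled with a table of block start addresses. The sketch below shows the idea under stated assumptions: `zlib` stands in for the Huffman coder mentioned on the slide, the 32-byte block size is arbitrary, and the function names are hypothetical. On blocks this small zlib will not actually shrink the data; the point is only the address translation.

```python
import zlib

BLOCK = 32  # bytes per block, e.g. one cache line (assumed size)

def compress_blocks(code):
    """Compress each block on its own; return (blob, table of start offsets)."""
    blob = bytearray()
    table = []
    for i in range(0, len(code), BLOCK):
        table.append(len(blob))              # where this block starts
        blob += zlib.compress(code[i:i + BLOCK])
    table.append(len(blob))                  # end marker for the last block
    return bytes(blob), table

def fetch_block(blob, table, n):
    """What a refill engine would do: locate block n and expand only it."""
    return zlib.decompress(blob[table[n]:table[n + 1]])

program = bytes(range(256)) * 2
blob, table = compress_blocks(program)
assert fetch_block(blob, table, 5) == program[5 * BLOCK:6 * BLOCK]
```

Because every block is compressed independently, any block can be fetched without touching its neighbours, which is what preserves random access.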
68Compilation What is New?
- Machine description
- in terms of architecture -> programming
- in terms of organization -> hardware
- Retargetable code generation has traditionally addressed the problem of compilation for an architecture.
- SOCs also need input about machine organization in order to perform timing analysis on generated code
- Two approaches
- describe detailed machine
- extract ISA from machine organization
69Co-Design Framework
Hardware Design Synthesis
70Test Strategy for Firm/Hard Cores
- System-level test strategy
- build test sets for cores
- generate functional vectors
- fault grade for interconnects
- prepare cores for test application from primary inputs through access/isolation, Scan/DFT
- if BIST, schedule BIST application and signature analysis.
- System-level DFT
- goal is to reduce testing cost
- increase accessibility of the internal nodes
- controllability: ability to establish a specific signal value at each node from primary inputs (PIs)
- observability: determine signal value by controlling PIs and observing primary outputs
- tradeoffs: area, I/O pins, performance, yield, TTM
71DFT Techniques
- Commonly used approach is to modify a sequential circuit into a combinational one during test.
- Automatic test generation is much easier for combinational circuits
- Current monitoring techniques.
- For sequential circuits, scan techniques are often used
- link memory elements into a shift register
- serially load and read out
- boundary scan is commonly used to test board-level devices
- Built-In Self Test
- minimal external support, high fault coverage, easy access requirements, protect IP
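The scan technique (link memory elements into a shift register, serially load and read out) can be shown with a small behavioral model. This is a sketch only: the class, its method names and the three-bit example logic are hypothetical and do not model any specific scan standard.

```python
class ScanChain:
    """Flip-flops linked into a shift register for test (behavioral sketch)."""

    def __init__(self, length):
        self.ffs = [0] * length

    def shift(self, scan_in):
        """One test clock: shift by one position, return the bit leaving."""
        scan_out = self.ffs[-1]
        self.ffs = [scan_in] + self.ffs[:-1]
        return scan_out

    def load(self, pattern):
        """Serially load a pattern; returns the old contents as they exit."""
        return [self.shift(bit) for bit in reversed(pattern)]

    def capture(self, logic):
        """One functional clock: flip-flops capture the logic response."""
        self.ffs = list(logic(self.ffs))

chain = ScanChain(3)
chain.load([1, 1, 0])                                   # scan in a test pattern
chain.capture(lambda s: [s[0] ^ s[1], s[1] & s[2], s[2] | s[0]])
response = chain.load([0, 0, 0])                        # scan out, last flip-flop first
print(response)                                         # [1, 0, 0]
```

During test the sequential circuit is reduced to its combinational logic: the pattern sets every state bit directly, and the captured response is read out the same way.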
72Test Access for Cores
- Peripheral access techniques
- parallel access, serial access or functional access
- Parallel access
- add MUXs to connect core IOs, high routing overhead, pin limitations may prevent parallel access
- Serial access
- most common is ring approach, during test core I/Os are connected via a scan chain, low overhead, delay penalty, easy to test user-defined logic, long test application time
- Functional access
- sensitize path through cores, low hardware cost, parallel test pattern translation possible.
- Also need isolation mechanisms for cores.
73Summary of Part I
- Core cells present a new market opportunity
- core cells are breathing life into many old designs (6502)
- a new class of third-party vendors who bridge the gap between design houses and EDA vendors.
- Productization of cores faces many challenges
- portability of cores versus design reuse
- socketing standards (portability and reuse)
- IP protection: encryption, product versus technology
- design and test methodologies
- Research outlook is aligned with industry expectations
- all new designs start with HDL description
- immediate focus on validation, testability issues
- long term focus on software optimization, complexity management.