Title: Programmable Logic: Introduction
1Programmable Logic Introduction
- Digital electronics: a few reminders of basic ideas and concepts
- Combinational logic
- Sequential logic
- Synchronous vs. Asynchronous designs
- Programmable logic devices (PLD)
- Hardware overview
- Field-programmable gate arrays (FPGA)
- Basics
- Design flow
- Introduction to VHDL
- Course material at http://elearning.uni-giessen.de/studip
2Basic Logic Gates
3Example
4Designing with NAND and NOR Gates
- Implementation of NAND and NOR gates is easier
than that of AND and OR gates (e.g., CMOS)
5Combinational Logic
- Has no memory => the present output depends only on the present input
X = (x1, x2, ..., xn)
Z = (z1, z2, ..., zm)
(Figure: combinational block with inputs x1, x2, ..., xn and outputs z1, z2, ..., zm)
6Combinational-Circuit Building Blocks
- Multiplexers
- Decoders
- Encoders
- Code Converters
- Comparators
- Adders/Subtractors
- Multipliers
- Shifters
7Example Multiplexers: 2-to-1 Multiplexer
- Have a number of data inputs, one or more select inputs, and one output
- It passes the signal value on one of the data inputs to the output
(Figure: (a) graphical symbol with data inputs w0 and w1, select input s and output f; (b) truth table; (c) sum-of-products circuit)
Truth table:
s  f
0  w0
1  w1
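The 2-to-1 multiplexer can be sketched in VHDL as follows. This is a minimal illustration, not taken from the slides: the entity name mux2to1 and the use of std_logic are assumptions, while the port names w0, w1, s and f follow the figure labels.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- 2-to-1 multiplexer: f follows w0 when s = '0' and w1 when s = '1'
entity mux2to1 is
  port (
    w0, w1 : in  std_logic;  -- data inputs
    s      : in  std_logic;  -- select input
    f      : out std_logic   -- output
  );
end entity mux2to1;

architecture sum_of_products of mux2to1 is
begin
  -- two AND terms feeding one OR, as in a sum-of-products circuit
  f <= (not s and w0) or (s and w1);
end architecture sum_of_products;
```

The same behavior could also be written as a conditional assignment (f <= w0 when s = '0' else w1); the sum-of-products form is used here only because it mirrors the circuit.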
8Example Full Adder
(Figure: full adder module and its truth table)
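As a rough sketch of what the full adder module could look like in VHDL: the entity and port names below (full_adder, a, b, cin, sum, cout) are assumptions, since the transcript keeps only the slide labels. The equations are the usual full adder equations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity full_adder is
  port (
    a, b, cin : in  std_logic;  -- two operand bits and the carry in
    sum, cout : out std_logic   -- sum bit and carry out
  );
end entity full_adder;

architecture dataflow of full_adder is
begin
  -- sum is '1' when an odd number of inputs is '1'
  sum  <= a xor b xor cin;
  -- carry out is '1' when at least two inputs are '1'
  cout <= (a and b) or (cin and (a xor b));
end architecture dataflow;
```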
9Sequential Circuits
- Circuits with Feedback
- Outputs = f(inputs, past inputs, past outputs)
- Basis for building "memory" into logic circuits
- Door combination lock is an example of a sequential circuit
- State => memory
- State can be "output" and "input" to combinational logic or to other sequential logic
10Simplest Circuits with Feedback
- Two inverters form a static memory cell
- Will hold value as long as it has power applied
- How to get a new value into the memory cell?
- Selectively break feedback path
- Load new value into cell
11Clocks
- Used to keep time
- Wait long enough for inputs to settle
- Then allow to have effect on value stored
- Clocks are regular periodic signals
- Period (time between ticks)
- Duty-cycle (time clock is high between ticks, expressed as % of period)
(Figure: clock waveform with the period and the duty cycle marked; in this case the duty cycle is 50%)
12Edge-Triggered Flip-Flops
- Positive edge-triggered
- Inputs sampled on rising edge; outputs change after rising edge
- Negative edge-triggered flip-flops
- Inputs sampled on falling edge; outputs change after falling edge
(Figure: timing diagram of D, CLK, Qpos, Qpos', Qneg, Qneg' for a positive edge-triggered FF and a negative edge-triggered FF)
13Comparison of Latches and Flip-Flops
(Figure: timing diagram of D, CLK, Qedge and Qlatch for a positive edge-triggered flip-flop and a transparent (level-sensitive) latch, both driven by CLK)
Behavior is the same unless the input changes while the clock is high.
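The difference between the two storage elements can be sketched in VHDL as below. This is an illustrative model, not part of the slides: the entity name latch_vs_ff and the port names are assumptions chosen to match the Qedge and Qlatch traces.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity latch_vs_ff is
  port (
    clk, d  : in  std_logic;
    q_edge  : out std_logic;  -- positive edge-triggered flip-flop output
    q_latch : out std_logic   -- transparent (level-sensitive) latch output
  );
end entity latch_vs_ff;

architecture rtl of latch_vs_ff is
begin
  -- flip-flop: d is sampled only on the rising edge of clk
  ff : process (clk)
  begin
    if rising_edge(clk) then
      q_edge <= d;
    end if;
  end process ff;

  -- latch: q_latch follows d for as long as clk is high
  latch : process (clk, d)
  begin
    if clk = '1' then
      q_latch <= d;
    end if;
  end process latch;
end architecture rtl;
```

If d changes while clk is high, q_latch changes with it, whereas q_edge keeps the value sampled at the last rising edge.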
14Timing Methodologies
- Rules for interconnecting components and clocks
- Guarantee proper operation of system when strictly followed
- Approach depends on building blocks used for memory elements
- Focus on systems with edge-triggered flip-flops
- Found in programmable logic devices
- Basic rules for correct timing
- (1) Correct inputs, with respect to time, are provided to the flip-flops
- (2) No flip-flop changes state more than once per clocking event
15Timing Methodologies (contd)
- Definition of terms
- clock: periodic event, causes state of memory element to change; can be rising or falling edge, or high or low level
- setup time: minimum time before the clocking event by which the input must be stable (Tsu)
- hold time: minimum time after the clocking event until which the input must remain stable (Th)
(Figure: data and clock waveforms showing when the data is stable and when it is changing)
There is a timing "window" around the clocking event during which the input must remain stable and unchanged in order to be recognized.
16Typical Timing Specifications
- Positive edge-triggered D flip-flop
- Setup and hold times
- Minimum clock width
- Propagation delays (low to high, high to low, max
and typical)
All measurements are made from the clocking event, that is, the rising edge of the clock.
17Synchronous vs. Asynchronous Designs
- Clocked synchronous circuits
- Inputs, state, and outputs sampled or changed in relation to a common reference signal (the clock)
- Asynchronous circuits
- Inputs, state, and outputs sampled or changed independently of a common reference signal (glitches/hazards a major concern)
- Stay away from asynchronous designs!
- Asynchronous inputs to synchronous circuits
- Inputs can change at any time, will not meet setup/hold times
- Dangerous, synchronous inputs are greatly preferred
- Cannot be avoided (e.g., reset signal, memory wait, user input)
- Solution: synchronize with clock as early as possible!
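One common way to synchronize an asynchronous input is a chain of two flip-flops. The slides do not name a specific circuit, so the two-stage structure and all names below (synchronizer, async_in, sync_out, stage1, stage2) are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity synchronizer is
  port (
    clk      : in  std_logic;
    async_in : in  std_logic;  -- may change at any time
    sync_out : out std_logic   -- changes only in relation to clk
  );
end entity synchronizer;

architecture rtl of synchronizer is
  signal stage1, stage2 : std_logic;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stage1 <= async_in;  -- first flip-flop may see a setup/hold violation
      stage2 <= stage1;    -- second flip-flop gives stage1 one clock period to settle
    end if;
  end process;

  sync_out <= stage2;
end architecture rtl;
```

The rest of the design would then use only sync_out, so that every other flip-flop sees an input that changes in relation to the clock.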
18Overview IC Technology
- In the early 80s
- Generic logic circuits (example: TTL SN7400)
- Complex applications assembled from basic building blocks: chips with few (< 10) hardwired logic functions
- Many PCBs, interconnects, inflexibility, cost ...
- 90s: VLSI circuits and glue logic
- Now: three types of IC technologies
- Full-custom ASIC
- Semi-custom ASIC (gate array and standard cell)
- PLD (Programmable Logic Device)
19NRE and unit cost metrics
- Unit cost
- the monetary cost of manufacturing each copy of the system, excluding NRE cost
- NRE cost (Non-Recurring Engineering cost)
- The one-time monetary cost of designing the system
- total cost = NRE cost + unit cost * # of units
- per-product cost = total cost / # of units
- = (NRE cost / # of units) + unit cost
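A short worked example of the two cost formulas. The numbers are hypothetical and chosen only to make the arithmetic easy: NRE cost of $2000, unit cost of $100, and 10 units.

```latex
\begin{align*}
\text{total cost} &= \text{NRE cost} + \text{unit cost} \times \text{\# of units}
  = 2000 + 100 \times 10 = 3000 \\
\text{per-product cost} &= \frac{\text{total cost}}{\text{\# of units}}
  = \frac{3000}{10} = 300
  = \frac{2000}{10} + 100
\end{align*}
```

With these numbers, each unit carries $200 of NRE cost on top of its $100 manufacturing cost; at higher volumes the NRE share per unit shrinks.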
20General-purpose processors
- Programmable device used in a variety of applications
- Also known as "microprocessor"
- Features
- Program memory
- General datapath with large register file and general ALU
- User benefits
- Low time-to-market and NRE costs
- High flexibility
- Examples: Pentium, ARM, ...
21Application-specific processors
- Programmable processor optimized for a particular class of applications having common characteristics
- Features
- Program memory
- Optimized datapath
- Special functional units
- Benefits
- Some flexibility, good performance, size and power
- Examples: DSP, media processor
(Figure: controller (control logic and state register, IR, PC) and datapath (registers, custom ALU), connected to program memory and data memory; the program memory holds assembly code for "total = 0; for i = 1 to ...")
22Single-purpose hardware
- Digital circuit designed to execute exactly one program
- coprocessor, accelerator
- Features
- Contains components needed to execute a single program
- No program memory
- Benefits
- Fast
- Low power
- Small size
23Full-custom/VLSI
- All layers are optimized for an embedded system's particular digital implementation
- Placing transistors
- Sizing transistors
- Routing wires
- Benefits
- Excellent performance, small size, low power
- Drawbacks
- High NRE cost (e.g., $300k), long time-to-market
24Semi-custom
- Lower layers are fully or partially built
- Designers are left with routing of wires and maybe placing some blocks
- Benefits
- Good performance, good size, less NRE cost than a full-custom implementation (perhaps $10k to $100k)
- Drawbacks
- Still requires weeks to months to develop
25PLD (Programmable Logic Device)
- All layers already exist
- Designers can purchase an IC
- Connections on the IC are either created or destroyed to implement desired functionality
- Field-Programmable Gate Array (FPGA) very popular
- Benefits
- Low NRE costs, almost instant IC availability
- Drawbacks
- Bigger, expensive (perhaps $30 per unit), power hungry, slower
26Comparison of different technologies
(Figure: the technologies compared along two axes, flexibility and speed)
27Roadmap for Programmable Logic
28PLD Definition
- Programmable Logic Device (PLD)
- An integrated circuit chip that can be configured by the end user to implement different digital hardware
- Also known as "Field Programmable Logic Device" (FPLD)
29PLD Advantages
- Short design time
- Less expensive at low volume
(Figure: cost versus volume for PLD and ASIC, with the nonrecurring engineering cost marked)
30PLD Categorization
PLD
- SPLD (Simple PLD)
  - PLA (Programmable Logic Array)
  - PAL (Programmable Array Logic)
- HCPLD (High Capacity PLD)
  - FPGA (Field Programmable Gate Array)
  - CPLD (Complex PLD)
31Programmable ROM (PROM)
2^N x M ROM, N inputs, M outputs
- Address: N bits; output word: M bits
- ROM contains 2^N words of M bits each
- The input bits decide the particular word that becomes available on the output lines
32Logic Diagram of 8x3 PROM
33Combinational Circuit Implementation using PROM
(Figure: truth table with inputs I0, I1, I2 and outputs F0, F1, F2, and the PROM that produces F0, F1, F2)
34PROM Types
- Programmable PROM
- Break links through current pulses
- Write once, Read multiple times
- Erasable PROM (EPROM)
- Program with ultraviolet light
- Write multiple times, Read multiple times
- Electrically Erasable PROM (EEPROM)/ Flash Memory
- Program with electrical signal
- Write multiple times, Read multiple times
35PROM Advantages and Disadvantages
- Widely used to implement functions with a large number of inputs and outputs
- Design of control units (micro-programmed control units)
- For combinational circuits with lots of don't-care terms, PROM is a waste of logic resources
36CPLDs and FPGAs
Complex Programmable Logic Device (CPLD)
Field-Programmable Gate Array (FPGA)
                CPLD                     FPGA
Architecture    PAL/22V10-like           Gate array-like
                More combinational       More registers, RAM
Density         Low-to-medium            Medium-to-high
                0.5-10K logic gates      1K to 3.2M system gates
Performance     Predictable timing       Application dependent
                Up to 250 MHz today      Up to 200 MHz today
Interconnect    Crossbar switch          Incremental
37CPLD
(Figure: CPLD architecture with logic blocks and I/O blocks connected through a programmable interconnect)
38FPGA Overview
- Basic idea: 2D array of combinational logic blocks (CL) and flip-flops (FF) with a means for the user to configure both
- 1. the interconnection between the logic blocks,
- 2. the function of each block.
(Figure: simplified version of FPGA internal architecture)
39Where are FPGAs in the IC Zoo?
Source: Dataquest
(Figure: IC families: programmable logic devices (PLDs), comprising SPLDs (PALs) and FPGAs; gate arrays; cell-based ICs; full custom ICs)
Acronyms:
- SPLD: Simple Prog. Logic Device
- PAL: Prog. Array of Logic
- CPLD: Complex PLD
- FPGA: Field Prog. Gate Array
(Standard logic is SSI or MSI buffers, gates)
Common resources:
- Configurable Logic Blocks (CLB): memory look-up table, AND-OR planes, simple gates
- Input / Output Blocks (IOB): bidirectional, latches, inverters, pullup/pulldowns
- Interconnect or routing: local, internal feedback, and global
40FPGA Variations
- Families of FPGAs differ in
- physical means of implementing user programmability,
- arrangement of interconnection wires, and
- basic functionality of logic blocks
- Most significant difference is in the method for providing flexible blocks and connections
- Anti-fuse based (e.g., Actel)
- + Non-volatile, relatively small
- - fixed (non-reprogrammable)
- (Almost used in 150 Lab: only 1-shot at getting it right!)
41User Programmability
- Latches are used to
- 1. make or break cross-point connections in interconnect
- 2. define function of logic blocks
- 3. set user options
- within the logic blocks
- in the input/output blocks
- global reset/clock
- "Configuration bit stream" loaded under user control
- All latches are strung together in a shift chain
- "Programming" => creating bit stream
- Latch-based (Xilinx, Altera, ...)
- + reconfigurable
- - volatile
- - relatively large die size
- Note: today 90% of the die is interconnect, 10% is gates
42Xilinx Programmable Logic
43Xilinx CPLD/FPGA
(Figure: Xilinx devices by features and density in system gates (axis marks at 10K, 600K, 10M): CPLDs, low power; Spartan-IIE FPGAs, SRAM-based, feature rich, low cost; Virtex-II FPGAs, SRAM-based, feature rich, high performance)
44Idealized FPGA Logic Block
- 4-input Look Up Table (4-LUT)
- implements combinational logic functions
- Register
- optionally stores output of LUT
- Latch determines whether the output comes from the register or the LUT
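A behavioral sketch of such an idealized logic block is given below. It only models the idea and is not vendor code: the entity name logic_block, the generics LUT_INIT and USE_REG (standing in for the configuration bits) and the port names are all assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity logic_block is
  generic (
    -- truth table of the LUT; x"8000" makes it a 4-input AND
    LUT_INIT : std_logic_vector(15 downto 0) := x"8000";
    -- models the configuration latch that selects the output
    USE_REG  : boolean := false
  );
  port (
    clk : in  std_logic;
    i   : in  std_logic_vector(3 downto 0);  -- LUT inputs
    o   : out std_logic
  );
end entity logic_block;

architecture rtl of logic_block is
  signal lut_out, reg_out : std_logic;
begin
  -- 4-LUT: the inputs address one of the 16 stored bits
  lut_out <= LUT_INIT(to_integer(unsigned(i)));

  -- register: stores the output of the LUT on every rising clock edge
  process (clk)
  begin
    if rising_edge(clk) then
      reg_out <= lut_out;
    end if;
  end process;

  -- output select, fixed at configuration time
  o <= reg_out when USE_REG else lut_out;
end architecture rtl;
```

Changing LUT_INIT changes the combinational function without touching the structure, which is the point of a LUT-based block.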
45XILINX SPARTAN IIE CLB Structure
- Each slice has 2 LUT-FF pairs with associated carry logic
- Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs
46LUT Implementation
- n-bit LUT is actually implemented as a 2^n x 1 memory
- inputs choose one of 2^n memory locations.
- memory locations (latches) are normally loaded with values from user's configuration bit stream.
- Inputs to mux control are the CLB (Configurable Logic Block) inputs.
- Result is a general purpose "logic gate".
- n-LUT can implement any function of n inputs!
47LUT as general logic gate
Example 4-lut
- An n-lut as a direct implementation of a function truth-table
- Each latch location holds value of function corresponding to one input combination
Example 2-lut
Implements any function of 2 inputs.
How many functions of n inputs?
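The question can be answered by counting truth tables: an n-input function has 2^n rows, and each row can independently hold 0 or 1. A short worked count:

```latex
\[
\#\,\text{functions of } n \text{ inputs} = 2^{2^{n}},
\qquad n = 2:\; 2^{4} = 16,
\qquad n = 4:\; 2^{16} = 65536
\]
```

So a 2-LUT can be configured in 16 ways and a 4-LUT in 65536 ways, one for each possible content of its memory.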
48More functionality for free?
- Given basic idea
- LUT built from RAM
- Latches connected as shift register
- What other functions could be provided at very little extra cost?
- Using CLB latches as little RAM vs. logic
- Using CLB latches as shift register vs. logic
491. Distributed RAM
- CLB LUT configurable as Distributed RAM
- A LUT equals 16x1 RAM
- Implements Single and Dual-Ports
- Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/Asynchronous read
- Accompanying flip-flops used for synchronous read
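A 16x1 RAM with synchronous write and asynchronous read can be described behaviorally as below. Whether a given tool maps this description onto a LUT is tool dependent; the entity name ram16x1 and the port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity ram16x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                     -- write enable
    addr : in  std_logic_vector(3 downto 0);  -- the four LUT inputs
    din  : in  std_logic;
    dout : out std_logic
  );
end entity ram16x1;

architecture rtl of ram16x1 is
  signal mem : std_logic_vector(15 downto 0);  -- the 16 LUT storage bits
begin
  -- synchronous write
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- asynchronous read; registering dout in a flip-flop would make it synchronous
  dout <= mem(to_integer(unsigned(addr)));
end architecture rtl;
```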
502. Shift Register
- Each LUT can be configured as shift register
- Serial in, serial out
- Saves resources: can use less than 16 FFs
- Faster: no routing
- Note: CAD tools determine which CLB is used as LUT, RAM, or shift register, rather than leaving it up to the designer
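A 16-stage serial-in, serial-out shift register can be written as below. As the note above says, it is the CAD tool that decides whether such a description ends up in a LUT or in flip-flops; the entity name shift16 and the port names are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity shift16 is
  port (
    clk     : in  std_logic;
    ser_in  : in  std_logic;  -- serial in
    ser_out : out std_logic   -- serial out, taken from the 16th stage
  );
end entity shift16;

architecture rtl of shift16 is
  signal sr : std_logic_vector(15 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- shift by one position; the new bit enters at index 0
      sr <= sr(14 downto 0) & ser_in;
    end if;
  end process;

  ser_out <= sr(15);
end architecture rtl;
```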
51System Interfaces
19 different standards supported!
- Supports multiple voltage and signal standards simultaneously
- Eliminates costly bus transceivers
52SelectI/OTM Standards
- VCCO defines output voltage
- VREF defines input threshold reference voltage
- Available as user I/O when using internal
reference
53I/Os Separated into 8 Banks
(Figure: IOBs (I/O blocks) arranged around the die in eight banks, Bank 0 to Bank 7, with the global clock inputs GCLK0 to GCLK3; Banks 2 and 3 are used during configuration)
54I/O Signal Types
I/O signal types:
- Single-ended: LVTTL, LVCMOS, HSTL, SSTL
- Differential: LVDS, Bus LVDS, LVPECL
Note: only the popular I/O types are shown here
55Single Ended I/O
- Traditional means of data transfer
- Data is carried on a single line
- Bigger voltage swing between logic Low and High
(Figure: single-ended data transfer from driver (Data Out) to receiver (Data In); LVTTL input levels: logic high from 2 V up to 3.3 V, logic low below 0.8 V, 1.2 V swing)
56SystemI/OSingle-Ended I/O Standards Summary
57Differential I/O
- Latest means of data transfer
- One data bit is carried through two signal lines
- Voltage difference determines logic High or Low
- Smaller voltage swing between logic Low and High
- Higher performance
- Lower power
- Lower noise
(Figure: differential signal data transfer (Data Out); LVDS input levels: 1.3 V and 1.7 V on a 3.3 V scale, 0.4 V swing)
58Differential I/O Types
- LVDS (Low Voltage Differential Signal)
- Unidirectional data transfer
- Bus LVDS
- Bi-directional communication between 2 or more devices
- Can transmit and receive LVDS signals through the same pins
- LVPECL (Low Voltage Positive Emitter Coupled Logic)
- Unidirectional data transfer
- Popular industry standard for fast clocking
59Programmable Logic Design Flow
Design entry in schematic, ABEL, VHDL, and/or Verilog.
Implementation includes placement & routing and bitstream generation, timing analysis, layout viewing, and more.
Download directly to the Xilinx hardware device(s) with unlimited reconfigurations.
60How to Program an FPGA: Generic Design Flow
- Design Entry
- Create your design files using
- schematic editor or
- hardware description language (Verilog, VHDL)
- Design "implementation" on FPGA
- Partition, place, and route (PPR) to create bit-stream file
- Divide into CLB-sized pieces, place into blocks, route to blocks
- Design verification
- Use simulator to check function,
- Other software determines max clock frequency.
- Load onto FPGA device (cable connects PC to board)
- check operation at full speed in real environment.
61The Routing software automatically determines which switches should be on and which should be off to make the tens of thousands of connections required between logic blocks.
- The goal in this stage is to minimize the length of connections and the number of switches required to route a signal.
- Shorter connections mean faster circuits: unwanted parasitic elements can add considerable RC interconnect delay if the number of anti-fuses connected in series is not kept to an absolute minimum. Clever routing techniques are therefore crucial to anti-fuse-based FPGAs.
62The Placement software chooses which of the thousands of logic blocks within an FPGA will implement each part of the circuit. The goals in this stage are: 1. minimize the amount of routing required to make all the necessary connections between the logic blocks, and 2. maximize the circuit speed.
72What is VHDL?
- Programming Language + Hardware Modeling Language
- It has all of the following:
- Sequential procedural language: PASCAL- and ADA-like
- Concurrency: statically allocated network of processes
- Timing constructs
- Discrete-event simulation semantics
- Object-oriented goodies: libraries, packages, polymorphism
74What is Logic or Register-Transfer Level
Synthesis?
- A register-transfer system is a sequential machine
- Combinational logic connecting registers
- Functionality described as new values of registers in a clock cycle
- Depends on input and current register values
- Depends on the "transfer functions" associated with the various combinational logic blocks
- Register-transfer design is structural: complex combinations of state machines may not be easily described solely by a large state transition graph.
- Register-transfer design concentrates on functionality, not details of logic design.
75Register-Transfer Simulation
- Simulates to clock-cycle accuracy. Doesn't guarantee timing.
- Important to get proper function of machine before jumping into detailed logic design.
- But be sure to take into account critical delays when choosing register-transfer organization
76A NAND Gate Example
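-- black-box definition (interface)
-- (note: nand is a reserved word in VHDL, so a compilable design needs another entity name, e.g. NAND2)
entity NAND is
  port ( A, B: in bit; Y: out bit );
end NAND;
-- an implementation (contents)
architecture BEHAVIOR_1 of NAND is
begin
  Y <= A nand B;
end BEHAVIOR_1;
Important concepts: entity, architecture, generic, port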
77Another Implementation of NAND
-- there can be multiple implementations
architecture BEHAVIOR_2 of NAND is
begin
  Y <= '1' when A='0' or B='0' else '0';
end BEHAVIOR_2;
78The process Statement
- [label:] process [(sensitivity_list)]
    declarations
  begin
    sequential_statements
  end process [label];
- It defines an independent sequential process which repeatedly executes its body
- The following are equivalent:
    process (A, B)            process
    begin                     begin
      C <= A or B;              C <= A or B;
    end process;                wait on A, B;
                              end process;
- No wait statements allowed in the body if there is a sensitivity_list
79The wait Statement
- wait [on list_of_signals] [until boolean_expression] [for time_expression];
- This is the ONLY sequential statement during which time advances!
- examples
-- wait for a rising or falling edge on CLK
wait on CLK;
wait until CLK'EVENT;   -- this is equivalent to the above
-- wait for rising edge of CLK
wait on CLK until CLK='1';
wait until CLK='1';     -- this is equivalent to the above
-- wait for 10 ns
wait for 10 ns;         -- only for simulation purposes, cannot be used in synthesis!
-- wait for ever (the process effectively dies!)
wait;
80A Simple Producer-Consumer Example
(Figure: Producer and Consumer connected by the signals DATA, REQ and ACK)
81Producer-Consumer in VHDL
entity producer_consumer is
end producer_consumer;
architecture two_phase of producer_consumer is
  signal REQ, ACK: bit;
  signal DATA: integer;
begin
  -- produce and consume are assumed to be declared elsewhere
  P: process begin
    DATA <= produce;
    REQ <= not REQ;
    wait on ACK;
  end process P;
  C: process begin
    wait on REQ;
    consume(DATA);
    ACK <= not ACK;
  end process C;
end two_phase;
82Producer-Consumer in VHDL: 4-Phase Case
architecture four_phase of producer_consumer is
  signal REQ, ACK: bit := '0';
  signal DATA: integer;
begin
  -- produce and consume are assumed to be declared elsewhere
  P: process begin
    DATA <= produce;
    REQ <= '1';
    wait until ACK='1';
    REQ <= '0';
    wait until ACK='0';
  end process P;
  C: process begin
    wait until REQ='1';
    consume(DATA);
    ACK <= '1';
    wait until REQ='0';
    ACK <= '0';
  end process C;
end four_phase;
83Behavior vs. Structure Description
- An entity can be described by its behavior or by its structure, or in a mixed fashion
- Example: a 2-input XOR gate
84XOR in VHDL Behavior
entity XOR is
  port ( A, B: in bit; Y: out bit );
end XOR;
architecture BEHAVIOR of XOR is
begin
  Y <= (A and not B) or (not A and B);
end BEHAVIOR;
85XOR in VHDL Structure
architecture STRUCTURE of XOR is
  component NAND
    port ( A, B: in bit; Y: out bit );
  end component;
  signal C, D, E: bit;
begin
  G1: NAND port map (A, B, C);
  G2: NAND port map (A => A, B => C, Y => D);
  G3: NAND port map (C, B => B, Y => E);
  G4: NAND port map (D, E, Y);
end STRUCTURE;
Component instantiation is just another concurrent statement!
86XOR in VHDL Mixed
architecture MIXED of XOR is
  component NAND
    port ( A, B: in bit; Y: out bit );
  end component;
  signal C, D, E: bit;
begin
  D <= A nand C;
  E <= C nand B;
  G1: NAND port map (A, B, C);
  G4: NAND port map (D, E, Y);
end MIXED;
87Concurrent vs. Sequential Statements
- Concurrent Statements
- Process: independent sequential process
- Block: groups concurrent statements
- Concurrent Procedure, Concurrent Assertion, Concurrent Signal Assignment: convenient syntax for commonly occurring form of processes
- Component Instantiation: structure decomposition
- Generate Statement: regular description
88Concurrent vs. Sequential Statements
- Sequential Statements
- Wait: synchronization of processes
- Assertion
- Signal Assignment
- Variable Assignment
- Procedure Call
- If
- Case
- Loop (for, while)
- Next
- Exit
- Return
- Null
89VHDL's Model of a System
- Static network of concurrent processes communicating using signals
- A process has drivers for certain signals
- A signal may be driven by multiple processes
90XILINX Design Tools
91Signals versus Variables
architecture DUMMY_1 of JUNK is
  signal Y: bit := '0';
begin
  process
    variable X: bit := '0';
  begin
    wait for 10 ns;
    X := '1';
    Y <= X;
    wait for 10 ns;
    -- What is Y at this point? '1'
    ...
  end process;
end DUMMY_1;

architecture DUMMY_2 of JUNK is
  signal X, Y: bit := '0';
begin
  process begin
    wait for 10 ns;
    X <= '1';
    Y <= X;
    wait for 10 ns;
    -- What is Y at this point? '0'
    ...
  end process;
end DUMMY_2;
Signal assignments with 0 delay take effect only after a delta delay, i.e., in the next simulation cycle.
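The two cases can be checked side by side in one small simulation. The following is a sketch for simulation only; the entity name sig_vs_var and the signal and variable names XS, XV, Y1 and Y2 are assumptions, chosen so that both cases fit in one process.

```vhdl
entity sig_vs_var is
end entity sig_vs_var;

architecture demo of sig_vs_var is
  signal XS, Y1, Y2 : bit := '0';
begin
  process
    variable XV : bit := '0';
  begin
    wait for 10 ns;
    XV := '1';   -- variable: updated immediately
    Y1 <= XV;    -- schedules the new value '1' for Y1
    XS <= '1';   -- signal: new value visible only after a delta delay
    Y2 <= XS;    -- still reads the old value '0' of XS
    wait for 10 ns;
    assert Y1 = '1' report "Y1 should be '1'" severity error;
    assert Y2 = '0' report "Y2 should be '0'" severity error;
    report "Y1 = " & bit'image(Y1) & ", Y2 = " & bit'image(Y2);
    wait;        -- stop the process
  end process;
end architecture demo;
```

Neither assertion should fire: Y1 picks up the variable's new value, while Y2 picks up the value XS had before its own assignment took effect.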
92Transaction Delay Models
- VHDL has two distinct ways of modeling delays
- Transport delay model
- signal <= transport waveform;
- for ideal devices with infinite frequency response in which every input pulse, however small, produces an output pulse
- Inertial delay model
- signal <= [reject time] inertial waveform;
- for devices with "inertia" so that not all pulses go through
- pulses < reject time are rejected
(Figure labels: projected waveform, preemptive timing, transport delay, inertial delay)
93Case 1 Transport Delay Model
- Let
- Tnew = earliest transaction scheduled by a new signal assignment
- Model
- All pending transactions at time >= Tnew are deleted
- All new transactions are added
94Transport Delay Model (contd.)
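- Y <= 0 after 0 ns, 2 after 2 ns, 4 after 4 ns, 6 after 6 ns;
- wait for 1 ns;
- Y <= transport 3 after 2 ns, 5 after 4 ns, 7 after 6 ns;
- Here Tnew = 3 ns: the pending transaction at 2 ns is kept, those at 4 ns and 6 ns are deleted, and 3, 5, 7 are added at 3 ns, 5 ns, 7 ns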
95Case 2 Inertial Delay Model
- Let
- Tnew = earliest transaction scheduled by a new signal assignment
- Tr = pulse rejection limit
- Model
- All pending transactions at time >= Tnew are deleted
- All new transactions are added
- Any pending transactions in the interval Tnew-Tr to Tnew are examined. If there is a run of consecutive transactions immediately preceding the earliest new transaction at Tnew with the same value as that new transaction, they are kept. Other pending transactions in the interval are deleted.
96Inertial Delay Model (contd.)
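- Y <= 0 after 0 ns, 2 after 2 ns, 4 after 4 ns, 6 after 6 ns;
- wait for 1 ns;
- Y <= 3 after 2 ns, 5 after 4 ns, 7 after 6 ns;
- Here Tnew = 3 ns and Tr = 2 ns (the delay of the first waveform element, since no reject limit is given): the pending transaction at 2 ns lies in the interval and has a different value than 3, so it is deleted along with those at 4 ns and 6 ns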
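The transport and inertial examples can be compared in one simulation. This is a sketch for simulation only; the entity name delay_models, the signals YT and YI and the process labels are assumptions. The checks rely on the first event on each signal: with transport delay the value 2 still appears at 2 ns, with inertial delay the first change is the 3 at 3 ns.

```vhdl
entity delay_models is
end entity delay_models;

architecture demo of delay_models is
  signal YT, YI : integer := 0;
begin
  drive : process
  begin
    YT <= 0 after 0 ns, 2 after 2 ns, 4 after 4 ns, 6 after 6 ns;
    YI <= 0 after 0 ns, 2 after 2 ns, 4 after 4 ns, 6 after 6 ns;
    wait for 1 ns;
    YT <= transport 3 after 2 ns, 5 after 4 ns, 7 after 6 ns;
    YI <= 3 after 2 ns, 5 after 4 ns, 7 after 6 ns;  -- inertial by default
    wait;
  end process drive;

  -- transport: the transaction at 2 ns survives, then 3 arrives at 3 ns
  check_transport : process
  begin
    wait on YT;
    assert YT = 2 and now = 2 ns report "transport: expected 2 at 2 ns" severity error;
    wait on YT;
    assert YT = 3 and now = 3 ns report "transport: expected 3 at 3 ns" severity error;
    wait;
  end process check_transport;

  -- inertial: the transaction at 2 ns is rejected, first change is 3 at 3 ns
  check_inertial : process
  begin
    wait on YI;
    assert YI = 3 and now = 3 ns report "inertial: expected 3 at 3 ns" severity error;
    wait;
  end process check_inertial;
end architecture demo;
```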
97Inertial Delay Model (contd.)
- -- suppose current time is 10 ns
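- S <= reject 5 ns inertial '1' after 8 ns;
(Figure: pending transactions on a time axis; the new transaction is at 18 ns and the pulse rejection interval runs from 13 ns to 18 ns; the pending transactions are marked retained, retained and deleted)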
98Signals with Multiple Drivers
- Y <= A; -- in process1
  and, Y <= B; -- in process2
- What is the value of the signal in such a case?
- VHDL uses the concept of a Resolution Function that is attached to a signal or a type, and is called every time the value of the signal needs to be determined, that is, every time a driver changes value
99Resolution Function Example: Wire-And (open collector)
package RESOLVED is
  function wired_and (V: bit_vector) return bit;
  subtype rbit is wired_and bit;
end RESOLVED;
package body RESOLVED is
  function wired_and (V: bit_vector) return bit is
  begin
    for I in V'RANGE loop
      if V(I) = '0' then
        return '0';
      end if;
    end loop;
    return '1';
  end wired_and;
end RESOLVED;
100Guarded Signals: register and bus
- Guarded signals are those whose drivers can be turned off
- What happens when all drivers of a guarded signal are off?
- Case 1: retain the last driven value
- signal X: bit register;
- useful for modeling charge storage nodes
- Case 2: float to a user defined default value
- signal Y: bit bus;
- useful for modeling busses
101Guarded Signals (contd.)
- Two ways to turn off the drivers
- null waveform in sequential signal assignment
- signal_name <= null after time_expression;
- guarded concurrent signal assignment
  block (data_bus_enable='1')
  begin
    data_bus <= guarded "0011";
  end block;
102Using VHDL Like C!
- Normal sequential procedural programs can be written in VHDL without ever utilizing the event scheduler or the concurrent concepts.
- Example
entity HelloWorld is end;
architecture C_LIKE of HelloWorld is
  use std.textio.all;
begin
  main: process
    variable buf: line;
  begin
    -- the string' qualifier removes the ambiguity between the string and bit_vector versions of write
    write(buf, string'("Hello World!"));
    writeln(output, buf);
    wait;  -- needed to terminate the program
  end process main;
end C_LIKE;
103Language Features Types
- TYPE = Set of Values + Set of Operations
- VHDL types:
  SCALAR
    ENUMERATION  e.g. character, bit, boolean
    INTEGER      e.g. integer
    FLOATING     e.g. real
    PHYSICAL     e.g. time
  COMPOSITE
    ARRAY        e.g. bit_vector, string
    RECORD
  ACCESS
  FILE
104Examples of VHDL Types
type bit is ('0', '1');
type thor_bit is ('U', '0', '1', 'Z');
type memory_address is range 0 to 2**32-1;
type small_float is range 0.0 to 1.0;
type weight is range 0 to 1E8
  units
    Gm;               -- base unit
    Kg = 1000 Gm;     -- kilogram
    Tonne = 1000 Kg;  -- tonne
  end units;
105Language Features Subtypes
- SUBTYPE = TYPE + constraints on values
- TYPE is the base-type of SUBTYPE
- SUBTYPE inherits all the operators of TYPE
- SUBTYPE can be more or less used interchangeably with TYPE
- Examples
subtype natural is integer range 0 to integer'HIGH;
subtype good_thor_bit is thor_bit range '0' to '1';
subtype small_float is real range 0.0 to 1.0;
106Array and Record Types
-- unconstrained array (defines an array type)
type bit_vector is array (natural range <>) of bit;
-- constrained array (defines an array type and subtype)
type word is array (0 to 31) of bit;
-- another unconstrained array
type memory is array (natural range <>) of word;
-- following is illegal!
type memory is array (natural range <>) of bit_vector;
-- an example record
type PERSON is record
  name: string(1 to 20);
  age: integer range 0 to 150;
end record;
107Language Features Overloading
- Pre-defined operators (e.g., +, -, and, nand etc.) can be overloaded to call functions
- Example
function "and"(L, R: thor_bit) return thor_bit is
begin
  if L='0' or R='0' then
    return '0';
  elsif L='1' and R='1' then
    return '1';
  else
    return 'U';
  end if;
end "and";
-- now one can say
C <= A and B;  -- where A, B and C are of type thor_bit
108Overloading (contd.)
- Two subprograms (functions or procedures) can have the same name, i.e., the names can be overloaded. They are distinguished by parameter types, e.g.,
- function MAX(A, B: integer) return integer;
- function MAX(A, B: real) return real;
109Language Features Configurations
- Component declarations really define a template for a design entity
- The binding of an entity to this template is done through a configuration declaration
entity data_path is
  ...
end data_path;
architecture INCOMPLETE of data_path is
  component alu
    port(function: in alu_function;
         op1, op2: in bit_vector_32;
         result: out bit_vector_32);
  end component;
begin
  ...
end INCOMPLETE;
110Configurations (contd.)
configuration DEMO_CONFIG of data_path is
  for INCOMPLETE
    for all: alu
      use entity work.alu_cell(BEHAVIOR)
        port map (function_code => function, operand1 => op1,
                  operand2 => op2, result => result, flags => open);
    end for;
  end for;
end DEMO_CONFIG;
111Language Features Packages
- A package is a collection of reusable declarations (constants, types, functions, procedures, signals etc.)
- A package has a
- declaration (interface), and a
- body (contents) [optional]
112Example of a Package
package SIMPLE_THOR is
  type thor_bit is ('U', '0', '1', 'Z');
  function "and"(L, R: thor_bit) return thor_bit;
  function "or"(L, R: thor_bit) return thor_bit;
  ...
end SIMPLE_THOR;
package body SIMPLE_THOR is
  function "and"(L, R: thor_bit) return thor_bit is
  begin
    ...
  end "and";
  ...
end SIMPLE_THOR;
-- and then it can be used after saying
library my_lib; use my_lib.SIMPLE_THOR.all;
113Language Features Design Units and Libraries
- VHDL constructs are written in a design file and the compiler puts them into a design library.
- Libraries are made up of design units
- primary design units
- entity declarations
- package declarations
- configuration declarations
- secondary design units
- architecture bodies
- package bodies
114Design Units and Libraries (contd.)
- Libraries have a logical name and the OS maps the logical name to a physical name
- for example, directories on UNIX
- Two special libraries
- work: the working design library
- std: contains packages standard and textio
- To declare libraries that are referenced in a design unit
- library library_name;
- To make certain library units directly visible
- use library_name.unit_name;
- use also defines dependency among design units.
115Logic Simulation in VHDL
- The 2-state bit data type is insufficient for low-level simulation
- Multi-valued types can easily be built in VHDL
- several packages available
- IEEE standard logic
- Example: one may use a 4-value (U, 0, 1, Z) system
- but no notion of strength
- only wired-X resolution
- Multi-state/strength systems with interval logic
116A Typical VHDL Simulator (MCC)
117Common VHDL Issues: Combinational Processes
process (A, B, SEL)
begin
  if (SEL='1') then
    OUT <= A;
  else
    OUT <= B;
  end if;
end process;
- Sensitivity list must consist of all signals that are read inside the process
- Synthesis tools often ignore the sensitivity list, but simulation tools do not; a forgotten signal will lead to a difference in behavior between the simulated model and the synthesized design
118Common VHDL Issues: Combinational Processes
process (A, B, SEL)
begin
  if (SEL='1') then
    OUT <= A;
  else
    OUT <= B;
  end if;
end process;

process
begin
  if (SEL='1') then
    OUT <= A;
  else
    OUT <= B;
  end if;
  wait on A, B, SEL;
end process;
- Can use WAIT ON instead of sensitivity list
- But not both!
119Common VHDL Issues: Wait-free Paths
process
begin
  if (some condition) then
    wait until CLK'event and CLK='1';
    X <= A + B;
  end if;
end process;
- Every possible path that the code can take through the process body in a process without sensitivity list must have a WAIT
- Otherwise the process can hang (feedback loop)
120Common VHDL Issues: Mistakenly Inferred Latches
process (A, B)
begin
  if (cond1) then
    X <= A + B;
  elsif (cond2) then
    X <= X + B;
  end if;
end process;
- Remember, incomplete assignments imply latches
- in the above example, if neither cond1 nor cond2 is true then X will retain its value; basically, X is stored in a latch
- if you are writing combinational logic, make sure that every output gets assigned a value along each path (e.g. if statements, case statements) through the process body
- in general, latches are not recommended anyway in synchronous designs (not testable via scan paths)
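One common way to make sure every output is assigned on each path is a default assignment at the top of the process. The sketch below is a hypothetical example, not the slide's circuit: the entity name no_latch, the ports and the logic functions are assumptions chosen to keep it self-contained.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity no_latch is
  port (
    cond1, cond2 : in  std_logic;
    a, b         : in  std_logic;
    x            : out std_logic
  );
end entity no_latch;

architecture rtl of no_latch is
begin
  process (cond1, cond2, a, b)
  begin
    x <= '0';  -- default: x gets a value on every path, so no latch is implied
    if cond1 = '1' then
      x <= a and b;
    elsif cond2 = '1' then
      x <= a or b;
    end if;
  end process;
end architecture rtl;
```

An explicit else branch would work equally well; the default assignment is often preferred when a process drives many outputs.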
121Common VHDL Issues: Implicit Register Inference
process
begin
  wait until CLK'event and CLK='1';
  if (COUNT >= 9) then
    COUNT <= 0;
  else
    COUNT <= COUNT + 1;
  end if;
end process;
(Figure: COUNT register clocked by CLK, fed by an incrementer)
- Storage registers are synthesized for all signals that are driven within a clocked process
- Storage registers are also synthesized for all variables that are read before being updated
122Common VHDL Issues: Reset (or Set) in Synthesis
process
begin
  wait until CLK'event and CLK='1';
  if (RST='1') then
    -- synchronous reset
  else
    -- combinational stuff
  end if;
end process;

process (CLK, RST)
begin
  if (RST='1') then
    -- asynchronous reset
  elsif (CLK'event and CLK='1') then
    -- combinational stuff
  end if;
end process;
- Must reset all registers, otherwise the synthesized chip won't work
- unlike simulation, you can't set initial values in synthesis!
- Asynchronous reset is possible only with a process that has a sensitivity list
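Putting the asynchronous-reset template together with a small counter gives a complete example. This is a sketch; the entity name decade_counter, the ports and the internal signal cnt are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity decade_counter is
  port (
    clk, rst : in  std_logic;
    count    : out integer range 0 to 9
  );
end entity decade_counter;

architecture rtl of decade_counter is
  signal cnt : integer range 0 to 9;
begin
  process (clk, rst)
  begin
    if rst = '1' then
      cnt <= 0;               -- asynchronous reset
    elsif rising_edge(clk) then
      if cnt >= 9 then
        cnt <= 0;             -- wrap around after 9
      else
        cnt <= cnt + 1;
      end if;
    end if;
  end process;

  count <= cnt;
end architecture rtl;
```

The register cnt has no initial value in its declaration; its starting state comes from the reset, in line with the rule that every register must be reset.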
123Common VHDL Issues: Coding Style Influence
process (A, B, C, SEL)
begin
  if (SEL='1') then
    Z <= A + B;
  else
    Z <= A + C;
  end if;
end process;

process (A, B, C, SEL)
  variable TMP: bit;
begin
  if (SEL='1') then
    TMP := B;
  else
    TMP := C;
  end if;
  Z <= A + TMP;
end process;
(Figure: the first version yields two adders, A + B and A + C, followed by a multiplexer on SEL producing Z; the second yields a multiplexer on SEL choosing B or C, followed by one adder producing Z)
- Structure of initially generated hardware is determined by the VHDL code itself
- Synthesis optimizes that initially generated hardware, but cannot do dramatic changes
- Therefore, coding style matters!
124Common VHDL Issues: IF vs. CASE
- IF-THEN-ELSIF-THEN-...-ELSE maps to a chain of 2-to-1 multiplexers, each checking for the successive conditions
  if (COND1) then Z <= X1;
  elsif (COND2) then Z <= X2;
  ...
  else Z <= Xn;
  end if;
- CASE maps to a single N-to-1 multiplexer
  case EXPRESSION is
    when VALUE1 =>
      Z <= X1;
    when VALUE2 =>
      Z <= X2;
    ...
    when others =>
      Z <= Xn;
  end case;
125Common VHDL Issues: Let the Tool Do the Synthesis
- Don't do synthesis by hand!
- Do not come up with Boolean functions for outputs of arithmetic operators
- Let Synopsys decide which adder, multiplier to use
- You will only restrict the synthesis process
- Best to use IEEE signed and unsigned types, and convert to integers if needed (IEEE NUMERIC_STD and NUMERIC_BIT packages)
- Example:
  A_INT <= TO_INTEGER(A_VEC);
  B_INT <= TO_INTEGER(B_VEC);
  C_INT <= A_INT + B_INT;
  C_VEC <= TO_UNSIGNED(C_INT, 4);
126Other VHDL Issues
- Let the synthesis tool decide the numeric encoding of the FSM states
- use enumerated type for state
- Split into multiple simpler processes
- Keep module outputs registered
- simplifies timing constraints
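The guidelines above can be sketched as a small state machine. This is only an illustration under assumptions: the entity handshake_fsm, its ports, and the three states are hypothetical and do not appear in the slides. The state uses an enumerated type, the logic is split into a combinational process and a clocked process, and the output ack is registered.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical request/acknowledge FSM used only to illustrate the style.
entity handshake_fsm is
  port (
    clk, rst : in  std_logic;
    req      : in  std_logic;
    ack      : out std_logic
  );
end entity handshake_fsm;

architecture rtl of handshake_fsm is
  -- Enumerated type: the synthesis tool chooses the binary encoding.
  type state_t is (IDLE, BUSY, DONE);
  signal state, next_state : state_t;
begin
  -- Combinational next-state logic; the default assignment avoids latches.
  next_logic : process (state, req)
  begin
    next_state <= state;
    case state is
      when IDLE =>
        if req = '1' then
          next_state <= BUSY;
        end if;
      when BUSY =>
        next_state <= DONE;
      when DONE =>
        if req = '0' then
          next_state <= IDLE;
        end if;
    end case;
  end process next_logic;

  -- State register and registered output, with asynchronous reset.
  regs : process (clk, rst)
  begin
    if rst = '1' then
      state <= IDLE;
      ack   <= '0';
    elsif rising_edge(clk) then
      state <= next_state;
      if next_state = DONE then
        ack <= '1';
      else
        ack <= '0';
      end if;
    end if;
  end process regs;
end architecture rtl;
```

Since ack comes straight from a flip-flop, its timing relative to the clock does not depend on the depth of the next-state logic, which is the sense in which registered outputs simplify timing constraints.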
127VHDL
128Programmable Logic Array
(Figure: PLA structure: n inputs, two sets of n x k links, k AND gates, k x m links, m OR gates, m outputs)
- Programmable AND array + programmable OR array
- An n x k x m PLA has 2n x k + k x m links
129Programmable Array Logic (PAL)
- Programmable AND array
- Fixed OR array
- Each output line permanently connected to a specific set of product terms
- Number of switching functions that can be implemented with a PAL is more limited than with a PROM or PLA
130CPLD
(Figure: CPLD structure: four logic blocks and two I/O blocks connected by a programmable interconnect)
131CPLD Logic Block
- Simple PLD
- Inputs
- Product-term array
- Product term allocation function
- Macro-cells (registers)
- Logic blocks execute sum-of-products expressions and store the results in macro-cell registers
- Programmable interconnects route signals to and from logic blocks
132Major CPLD Resources
- Number of macro-cells per logic block
- Number of inputs from programmable interconnect to logic block
- Number of product terms in logic block
133Structure of FPGA (Xilinx)
(Figure: FPGA structure, showing logic blocks, I/O blocks, and interconnect)
134Configurable Logic Block (CLB)
135Logic Function
- Implemented as look-up table (LUT)
- K-input LUT corresponds to a 2^K x 1 bit memory
- K-input LUT can implement any K-input 1-output logic function
136Configuring FPGA
- Configure CLB and IOB
- Configure interconnect
- Interconnect technology
- SRAM
- Anti-fuse (program once)
- EPROM / EEPROM
137Programming Technology
138FPGA Applications
- Glue Logic (replace SSI and MSI parts)
- Rapid turnaround
- Prototype design
- Emulation
- Custom computing
- Dynamic reconfiguration
139PLD Logic Capacity
- SPLD about 200 gates
- CPLD
- Altera FLEX (250K logic gates)
- Xilinx XC9500
- FPGA
- Xilinx Virtex-E (3 million logic gates)
- Xilinx Spartan (10K logic gates)
- Altera
140FPGA Design Flow
Design Entry
Design Implementation
Design Verification
FPGA Configuration
141Design Entry (VHDL in our case)
(Figure: design entry flow. Labels: Schematic, HDL, Compile, Logic Equations, Minimize, Test vectors, Reduced Logic Equations (Netlist), Simulation)
142Design Implementation
- Input: netlist; output: bitstream
- Map the design onto FPGA resources
- Break up the circuit so that each block has a maximum of n inputs
- NP-hard problem
- However, optimal solution is not required
143Design Implementation (Cont.)
- Place: assigns logic blocks created during the mapping process to specific locations on the FPGA
- Goal: minimize length of wires
- Again NP-hard
- Route: routes interconnect paths between logic blocks
- NP-hard
144Design Implementation Techniques
- Simulated annealing
- Genetic algorithm
- Mincut method
- Heuristic method
145Design Verification and FPGA Configuration
- Functional Simulation
- Timing Simulation
- Download bitstream into FPGA
146Introduction to Programmable Logic
- FPGAs Overview
- Why use FPGAs? (a short history lesson).
- FPGA variations
- Internal logic blocks.
- Designing with FPGAs.
- Specifics of Xilinx Virtex-E series.
147FPGA Overview
- Basic idea: 2D array of combinational logic blocks (CL) and flip-flops (FF) with a means for the user to configure both
- 1. the interconnection between the logic blocks,
- 2. the function of each block.
Simplified version of FPGA internal architecture
148Why FPGAs? (1 / 5)
- By the early 1980s most of the logic circuits in typical systems were absorbed by a handful of standard large scale integrated circuits (LSI ICs).
- Microprocessors, bus/IO controllers, system timers, ...
- Every system still needed random small glue logic ICs to help connect the large ICs
- generating global control signals (for resets etc.)
- data formatting (serial to parallel, multiplexing, etc.)
- Systems had a few LSI components and lots of small low density SSI (small scale IC) and MSI (medium scale IC) components.
Printed Circuit (PC) board with many small SSI
and MSI ICs and a few LSI ICs
149Why FPGAs? (2 / 5)
- Custom ICs sometimes designed to replace glue logic
- reduced complexity/manufacturing cost, improved performance
- But custom ICs are expensive to develop, and delay introduction of the product (time to market) because of increased design time
- Note: need to worry about two kinds of costs
- 1. cost of development, Non-Recurring Engineering (NRE), fixed
- 2. cost of manufacture per unit, variable
- Usually a tradeoff between NRE cost and manufacturing costs
150Why FPGAs? (3 / 5)
- Therefore the custom IC approach was only viable for products with very high volume (where NRE could be amortized) that were not sensitive to time to market (TTM)
- FPGAs were introduced as an alternative to custom ICs for implementing glue logic
- improved PC board density vs. discrete SSI/MSI components (within around 10x of custom ICs)
- computer aided design (CAD) tools meant circuits could be implemented quickly (no physical layout process, no mask making, no IC manufacturing), relative to Application Specific ICs (ASICs) (3-6 months for these steps for custom IC)
- lowers NREs (Non Recurring Engineering)
- shortens TTM (Time To Market)
- Because of Moore's law the density (gates/area) of FPGAs continued to grow through the '80s and '90s to the point where major data processing functions can be implemented on a single FPGA.
151Why FPGAs? (4 / 5)
- FPGAs continue to compete with custom ICs for special processing functions (and glue logic) but now also try to compete with microprocessors in dedicated and embedded applications
- Performance advantage over microprocessors because circuits can be customized for the task at hand. Microprocessors must provide special functions in software (many cycles)
- MICRO: highest NRE; SW: fastest TTM
- ASIC: highest performance, worst TTM
- FPGA: highest cost per chip (unit cost)
152Why FPGAs? (5 / 5)
- As Moore's Law continues, FPGAs work for more applications, as they can both fit more logic in 1 chip and run faster
- Can easily be patched vs. ASICs
- Perfect for courses
- Can change design repeatedly
- Low TTM yet reasonable speed
- With Moore's Law, now can do full CS 152 project easily inside 1 FPGA
153Where are FPGAs in the IC Zoo?
Source: Dataquest
Programmable Logic Devices (PLDs)
Gate Arrays
Cell-Based ICs
Full Custom ICs
SPLDs (PALs)
FPGAs
Acronyms: SPLD = Simple Prog. Logic Device; PAL = Prog. Array Logic; CPLD = Complex PLD; FPGA = Field Prog. Gate Array. (Standard logic is SSI or MSI buffers, gates.)
Common resources:
- Configurable Logic Blocks (CLB): memory look-up table, AND-OR planes, simple gates
- Input / Output Blocks (IOB): bidirectional, latches, inverters, pullups/pulldowns
- Interconnect or routing: local, internal feedback, and global
154FPGA Variations
- Families of FPGAs differ in
- physical means of implementing user programmability,
- arrangement of interconnection wires, and
- basic functionality of logic blocks
- Most significant difference is in the method for providing flexible blocks and connections
- Anti-fuse based (e.g., Actel)
- + Non-volatile, relatively small
- - fixed (non-reprogrammable)
- (Almost used in 150 Lab: only 1 shot at getting it right!)
155User Programmability
- Latches are used to
- 1. make or break cross-point connections in interconnect
- 2. define function of logic blocks
- 3. set user options
- within the logic blocks
- in the input/output blocks
- global reset/clock
- Configuration bit stream loaded under user control
- All latches are strung together in a shift chain
- Programming => creating bit stream
- Latch-based (Xilinx, Altera, ...)
- + reconfigurable
- - volatile
- - relatively large die size
- Note: today 90% of the die is interconnect, 10% is gates
156Idealized FPGA Logic Block
- 4-input Look Up Table (4-LUT)
- implements combinational logic functions
- Register
- optionally stores output of LUT
- Latch determines whether the output is read from the register or the LUT
157 4-LUT Implementation
- n-bit LUT is actually implemented as a 2^n x 1 memory
- inputs choose one of 2^n memory locations.
- memory locations (latches) are normally loaded with values from the user's configuration bit stream.
- Inputs to mux control are the CLB (Configurable Logic Block) inputs.
- Result is a general purpose logic gate.
- n-LUT can implement any function of n inputs!
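The idea of a LUT as a small memory addressed by its inputs can be modelled behaviourally. The sketch below is an assumption-laden illustration rather than a vendor primitive: the entity lut4 and its generic INIT are hypothetical names, and INIT stands in for the values that would normally be loaded from the configuration bit stream.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioural model of a 4-input LUT (16 x 1 memory).
entity lut4 is
  generic (
    -- Bit i of INIT is the function value for input combination i.
    -- x"8000" sets only bit 15, which gives a 4-input AND.
    INIT : std_logic_vector(15 downto 0) := x"8000"
  );
  port (
    addr : in  std_logic_vector(3 downto 0);
    f    : out std_logic
  );
end entity lut4;

architecture rtl of lut4 is
begin
  -- The inputs act as the address (mux select) into the stored truth table.
  f <= INIT(to_integer(unsigned(addr)));
end architecture rtl;
```

Changing INIT changes the implemented function without touching the structure; for example, x"FFFE" would give a 4-input OR, since every entry except entry 0 is 1.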
158LUT as general logic gate
Example: 4-LUT
- An n-LUT is a direct implementation of a function truth table
- Each latch location holds the value of the function corresponding to one input combination
Example: 2-LUT
Implements any function of 2 inputs.
How many functions of n inputs?
159More functionality for free?
- Given basic idea
- LUT built from RAM
- Latches connected as shift register
- What other functions could be provided at very little extra cost?
- Using CLB latches as little RAM vs. logic
- Using CLB latches as shift register vs. logic
160 1. Distributed RAM
- CLB LUT configurable as Distributed RAM
- A LUT equals 16x1 RAM
- Implements Single and Dual-Ports
- Cascade LUTs to increase RAM size
- Synchronous write
- Synchronous/Asynchronous read
- Accompanying flip-flops used for synchronous read
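A 16x1 memory with synchronous write and asynchronous read can be described as follows. This is a behavioural sketch under assumptions: the entity name ram16x1 and its port names are illustrative, and whether a given tool maps this description onto LUT-based distributed RAM depends on the tool and the device.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical single-port 16 x 1 RAM; names are illustrative only.
entity ram16x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(3 downto 0);
    din  : in  std_logic;
    dout : out std_logic
  );
end entity ram16x1;

architecture rtl of ram16x1 is
  signal mem : std_logic_vector(15 downto 0);
begin
  -- Synchronous write: the addressed bit is updated on the rising clock edge.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address without a clock.
  dout <= mem(to_integer(unsigned(addr)));
end architecture rtl;
```

Registering dout in a clocked process would turn this into the synchronous-read variant, using the flip-flop that accompanies the LUT.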
161 2. Shift Register
- Each LUT can be configured as shift register
- Serial in, serial out
- Saves resources: can use fewer than 16 FFs
- Faster: no routing
- Note: CAD tools determine whether a CLB is used as LUT, RAM, or shift register, rather than leaving it up to the designer
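A serial-in, serial-out shift register of the kind a tool might pack into a single LUT can be written as below. The entity shift16 and its ports are hypothetical. The description deliberately has no reset; as an assumption about typical tool behaviour, resetting every stage usually forces an implementation with discrete flip-flops instead of a LUT.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 16-stage serial-in, serial-out shift register.
entity shift16 is
  port (
    clk  : in  std_logic;
    en   : in  std_logic;
    sin  : in  std_logic;
    sout : out std_logic
  );
end entity shift16;

architecture rtl of shift16 is
  signal sr : std_logic_vector(15 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if en = '1' then
        -- Shift towards the MSB and take the new bit in at the LSB.
        sr <= sr(14 downto 0) & sin;
      end if;
    end if;
  end process;

  -- The serial output is the oldest bit, 16 enabled clock edges after it entered.
  sout <= sr(15);
end architecture rtl;
```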
162How to Program an FPGA: Generic Design Flow
- Design Entry
- Create your design files using
- schematic editor or
- hardware description language (Verilog, VHDL)
- Design implementation on FPGA
- Partition, place, and route (PPR) to create bit-stream file
- Divide into CLB-sized pieces, place into blocks, route to blocks
- Design verification
- Use simulator to check function,
- Other software determines max clock frequency.
- Load onto FPGA device (cable connects PC to board)
- check operation at full speed in real environment.
163Decoders: 2-to-4 Decoder
(Figure: 2-to-4 decoder with inputs w0 and w1, enable input En, and outputs y0 to y3)
164Encoders
- Opposite of decoders
- Encode given information into a more compact form
- Binary encoders
- 2^n inputs into n-bit code
- Exactly one of the input signals should have a value of 1, and the outputs present the binary number that identifies which input is equal to 1
- Used to reduce the number of bits (transmitting and storing information)
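For n = 2, the binary encoder described above maps a one-hot 4-bit input to a 2-bit code. The following is a minimal sketch with assumed names (encoder4to2, w, y); inputs that are not one-hot are mapped to "XX", which is one possible convention and not something the slides specify.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 4-to-2 binary encoder; exactly one bit of w is expected to be '1'.
entity encoder4to2 is
  port (
    w : in  std_logic_vector(3 downto 0);
    y : out std_logic_vector(1 downto 0)
  );
end entity encoder4to2;

architecture rtl of encoder4to2 is
begin
  with w select
    y <= "00" when "0001",   -- w(0) is set
         "01" when "0010",   -- w(1) is set
         "10" when "0100",   -- w(2) is set
         "11" when "1000",   -- w(3) is set
         "XX" when others;   -- input is not one-hot: output unspecified
end architecture rtl;
```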
165Designing with NAND and NOR Gates (2)
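- Any logic function can be realized using only NAND or NOR gates => NAND/NOR is complete
- NAND function is complete: it can be used to generate any logical function (below, $a \mid b = \overline{ab}$ denotes NAND)
- $1 = a \mid (a \mid a) = a \mid \bar{a} = 1$
- $0 = (a \mid (a \mid a)) \mid (a \mid (a \mid a)) = 1 \mid 1 = 0$
- $\bar{a} = a \mid a = \overline{aa} = \bar{a}$
- $ab = (a \mid b) \mid (a \mid b) = \overline{(a \mid b)} = ab$
- $a + b = (a \mid a) \mid (b \mid b) = \bar{a} \mid \bar{b} = a + b$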
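The identities above translate directly into a circuit that uses nothing but NAND gates. The sketch below builds NOT, AND, and OR from the VHDL nand operator; the entity nand_only and its signal names are illustrative assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demonstration entity: NOT, AND, and OR built from NAND only.
entity nand_only is
  port (
    a, b    : in  std_logic;
    not_a   : out std_logic;
    a_and_b : out std_logic;
    a_or_b  : out std_logic
  );
end entity nand_only;

architecture rtl of nand_only is
  signal na, nb, nab : std_logic;
begin
  na  <= a nand a;            -- NOT a
  nb  <= b nand b;            -- NOT b
  nab <= a nand b;            -- NAND of a and b

  not_a   <= na;
  a_and_b <= nab nand nab;    -- AND: invert the NAND
  a_or_b  <= na nand nb;      -- OR: NAND of the inverted inputs (De Morgan)
end architecture rtl;
```

The same construction works with NOR gates by duality, swapping the roles of AND and OR.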
166Multiplexers: 4-to-1 Multiplexer
(Figure: (a) graphic symbol with data inputs w0 to w3, select inputs s1 and s0, and output f; (b) truth table; (c) circuit)
s1 s0 | f
0  0  | w0
0  1  | w1
1  0  | w2
1  1  | w3
167Multiplexers: Building Larger Multiplexers
s
0
s
1
s
1
w
s
0
0
w
3
w
0
0
w