Asynchronous Circuit Compilation

1
Asynchronous Circuit Compilation
  • Dr. Doug Edwards
  • doug_at_cs.man.ac.uk

2
Overview
  • Asynchronous circuits
  • Advantages
  • Asynchronous Design Paradigms
  • Syntax Directed Compilation
  • Handshake Circuits
  • Balsa
  • Datapath Compilation
  • Design Example - DMA Controller

3
Asynchronous (self-timed) Basics
  • Synchronous circuits
      • a global clock separates system states
      • a time domain view of system activity
  • Asynchronous circuits
      • input changes separate system states
      • a sequence or trace domain view of system
        activity

4
Why Asynchronous?
  • Low Power
      • data-driven: power is only used to do useful
        work
      • zero power when idle, with instant restart
  • Low EMI
      • In a clocked circuit, all noise is correlated
      • Async circuits have distributed switching
        activity, leading to uncorrelated EMI

5
Why Asynchronous?
  • No clock distribution problems
  • Composability/Modularity
      • facilitates IP reuse
  • Average Case Performance
      • exploit the fact that the worst case occurs
        infrequently

6
Timing Models
  • Delay Insensitive (DI)
      • Delays in circuits and wires are arbitrary
  • Quasi-Delay Insensitive (QDI)
      • Similar to DI, but assuming isochronic forks
  • Speed Independent (SI)
      • Wires have no delays; gate delays are arbitrary
  • Bounded Delay
      • Single-sided timing constraints

7
Asynchronous Design Paradigms
  • AFSMs - for fast controllers etc
  • Traditionally hard
  • hazards, races, state assignment problems
  • Research has led to new techniques
  • STG/Petri net based SI circuits
  • Burst-Mode circuits
  • Macromodule-like for larger systems
  • micropipeline approach, handshake circuits

8
Asynchronous Control
  • With no clock, some other means is required to
    co-ordinate control flow
  • Use a request/acknowledge handshake

[Figure: Sender with Req and Ack handshake signals]

9
Signalling Protocols
  • req and ack are abstractions
      • layer a signalling protocol on top of them
  • Two common protocols
      • 2-phase (transition signalling, NRZ)
      • 4-phase (Return-to-Zero signalling)

10
Data Validity Models
  • Self Timed
      • The validity of the data is encoded within the
        data itself: redundant coding
      • e.g. Dual Rail: each data bit requires two wires.
        00 -> no data, 01 -> 0, 10 -> 1
  • Bundled Data approach
      • conventional datapath
      • validity is assured by imposing timing
        constraints
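
The dual-rail code listed above can be illustrated with a small Python sketch. Only the 00/01/10 mapping comes from the slide; the function names (`encode`, `is_complete`, `decode`), the least-significant-bit-first ordering and the treatment of "11" as an error are assumptions made for illustration.

```python
# Dual-rail codes from the slide: "00" = no data, "01" = 0, "10" = 1.
# "11" is unused; rejecting it here is an assumption of this sketch.
NO_DATA, ZERO, ONE = "00", "01", "10"


def encode(value, width):
    """Encode an integer as dual-rail pairs, least significant bit first."""
    return [ONE if (value >> k) & 1 else ZERO for k in range(width)]


def is_complete(codeword):
    """Data is valid once every bit has left the no-data state."""
    if any(pair not in (NO_DATA, ZERO, ONE) for pair in codeword):
        raise ValueError("illegal dual-rail pair")
    return all(pair != NO_DATA for pair in codeword)


def decode(codeword):
    """Return the integer value, or None while any bit is still 'no data'."""
    if not is_complete(codeword):
        return None
    return sum(1 << k for k, pair in enumerate(codeword) if pair == ONE)
```

Because validity is carried by the code itself, a receiver can wait on `is_complete` instead of relying on a timing assumption, which is the contrast with the bundled-data approach.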

11
2-phase Protocol
  • Events are transitions

[Figure: 2-phase timing diagram of Req, Ack and data, showing two transactions with data marked valid in each]

12
4-phase protocol
  • Signals are returned to initial state after each
    transaction
  • Several possible interleavings of the signal
    transitions
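
The difference between the two protocols can be sketched as wire-level event traces. This is a minimal Python illustration, not part of the presentation: the function names are hypothetical, and the 4-phase trace shows only one of the possible orderings, with data validity left out.

```python
def four_phase(transactions):
    """Return-to-zero: four wire transitions per transaction."""
    trace = []
    for _ in range(transactions):
        trace += ["req+", "ack+", "req-", "ack-"]
    return trace


def two_phase(transactions):
    """Transition signalling: every edge is an event, two per transaction."""
    trace = []
    req = ack = 0
    for _ in range(transactions):
        req ^= 1                                  # toggle the request wire
        trace.append("req+" if req else "req-")
        ack ^= 1                                  # toggle the acknowledge wire
        trace.append("ack+" if ack else "ack-")
    return trace
```

In this model the wires behave identically for two 2-phase transactions and one 4-phase transaction; what differs is how many transactions those transitions represent.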

13
Comparison of Approaches
  • 2-phase/4-phase
      • 2-phase is conceptually simpler (once an event
        mind-set is adopted)
      • 2-phase circuits are slower and more complex
      • think 2-phase, build 4-phase
  • Bundled-Data/Dual-rail
      • Current orthodoxy: bundled data is faster, lower
        power and smaller in area, with a tolerancing
        task no worse than for a clocked design

14
Current Approach
  • QDI control
  • Bounded-Delay (bundled-data) datapath
  • 4-phase signalling
  • Amulet3i

15
Asynchronous HDLs
  • Conventional programming languages lack 3
    necessary constructs
      • communication
      • parallelism/concurrency
      • sharing (of hardware)
  • Conventional HDLs lack adequate
      • fine-grain concurrency
      • channel-based communication primitives

16
Asynchronous HDLs 2
  • Tangram, Balsa
  • CSP based data types
  • based on underlying formal semantics
  • guarantees correct composition rules
  • easier composition than in sync circuits???
  • transparent compilation
  • each production rule in the language translates
    to an intermediate handshake circuit
  • allows designer to infer circuit cost and
    performance from the program

17
Handshake Circuits - 1
  • Circuits communicate along channels
  • Channels connect ports at circuit interface
  • Ports have
      • Type
      • Direction
      • Sense

18
Handshake Circuits - 2
  • Port type determines the number of data wires
      • no data wires: control-only port!
  • Port direction is input, output or control only
  • Port sense
      • Active: initiates transfers
      • Passive: responds to requests
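
The three port attributes can be captured in a small data structure. The Python sketch below is illustrative only: the `Port` class and the `can_connect` rule are hypothetical and are not taken from the Balsa tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    width: int       # type: number of data wires, 0 for a control-only port
    direction: str   # "input", "output" or "control"
    sense: str       # "active" initiates transfers, "passive" responds


def can_connect(a, b):
    """Assumed rule: a channel joins one active and one passive port of equal
    width, pairing an output with an input or two control-only ports."""
    senses_ok = {a.sense, b.sense} == {"active", "passive"}
    widths_ok = a.width == b.width
    directions_ok = {a.direction, b.direction} in ({"input", "output"}, {"control"})
    return senses_ok and widths_ok and directions_ok
```

For example, an active 16-bit output can drive a passive 16-bit input, while two active ports cannot share a channel under this rule because neither would respond.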

19
Micropipeline-Style Circuits
Push Circuits: circuit waits for data
[Figure: circuit (cct) with req, data and ack signals on a passive input port and an active output port]

20
Micropipeline-Style Circuits
Push Circuits: data arrives

21
Micropipeline-Style Circuits
Push Circuits: data validity signalled

22
Micropipeline-Style Circuits
Push Circuits: circuit accepts data

23
Micropipeline-Style Circuits
Push Circuits: circuit signals data taken

24
Micropipeline-Style Circuits
Push Circuits: circuit outputs data

25
Micropipeline-Style Circuits
Push Circuits: circuit signals validity

26
Micropipeline-Style Circuits
Push Circuits: receiver takes data

27
Micropipeline-Style Circuits
  • 4-phase protocol not detailed
  • Previous circuit decoupled input and output
  • implies a latch inside the handshake circuit
  • An alternative is for the input handshake to
    enclose the output handshake
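
The two orderings can be compared as event traces. In this Python sketch the trace contents and the names `encloses` and `needs_latch` are assumptions for illustration; return-to-zero phases and the data wires are omitted.

```python
# Request/acknowledge events on the input (in) and output (out) channels.
DECOUPLED = ["in.req", "in.ack", "out.req", "out.ack"]
ENCLOSED = ["in.req", "out.req", "out.ack", "in.ack"]


def encloses(trace, outer, inner):
    """True if the inner handshake starts and finishes inside the outer one."""
    return (trace.index(outer + ".req") < trace.index(inner + ".req")
            and trace.index(inner + ".ack") < trace.index(outer + ".ack"))


def needs_latch(trace):
    """If the input is acknowledged before the output is taken, the sender may
    withdraw its data, so the circuit has to store it."""
    return not encloses(trace, "in", "out")
```

With the enclosed ordering the sender keeps its data stable until the receiver has taken it, which is why no storage is needed inside the circuit.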

28
Enclosed Handshake
Push Circuits: data arrives
[Figure: circuit (cct) with req, data and ack signals on its input and output ports]

29
Enclosed Handshake
Push Circuits: data validity signalled

30
Enclosed Handshake
Push Circuits: circuit accepts data

31
Enclosed Handshake
Push Circuits: circuit outputs data

32
Enclosed Handshake
Push Circuits: circuit signals validity

33
Enclosed Handshake
Push Circuits: receiver takes data

34
Enclosed Handshake
Push Circuits: input handshake completes. No latch required.

35
Tangram Style Circuits
Pull Circuits: active-ported circuits, control driven
[Figure: circuit (cct) with req, data and ack signals; active input port]

36
Tangram Style Circuits
Pull Circuits: circuit demands data

37
Tangram Style Circuits
Pull Circuits: data is sent on demand

38
Tangram Style Circuits
Pull Circuits: data is accepted and can then be released

39
Balsa
  • Language for synthesising large async circuits
    and systems
  • CSP/OCCAM background
  • Tangram-like
  • based on Tangram compilation function
  • compiles to a small (but expanding) set of
    handshake circuits
  • origins: ESPRIT EXACT project

40
Balsa Language Features
  • Data types based on sequence of bits
  • Arrays and records are bit-based
  • Element extraction is by array slicing
  • Strict data typing
  • Structural iteration
  • Arrayed channels
  • Parameterised recursive functions

41
Balsa Language Features
  • Enclosed selection semantics
  • Allows passive ported circuits
  • Allows push (micropipeline-style) circuits
  • Allows unbuffered (latch-free) circuits
  • Can be considered a restricted form of Burns
    probe construct.

42
Balsa Source
43
Example Single Place Buffer
    import [balsa.types.basic]      -- library mechanism
    public                          -- visibility
    type word is 16 bits            -- type declaration
    -- procedure definition; i and o are the channel declarations
    procedure buffer (input i : word; output o : word) is
      local variable x : word       -- implies latch
    begin
      loop                          -- repeat forever
        i -> x                      -- Input communication: read input channel into local variable x
        ;                           -- sequential operation
        o <- x                      -- Output communication: output local variable x to output channel
      end
    end
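
The behaviour of the buffer loop can be mimicked in a few lines of Python. This is only a behavioural sketch under the assumption of a finite input list; the function name and the trace format are hypothetical, and nothing here reflects how the Balsa compiler implements the circuit.

```python
def single_place_buffer(inputs):
    """Model the loop 'input into x, then output x' for a finite input list."""
    outputs, trace = [], []
    for value in inputs:
        x = value                    # input communication: channel i into variable x
        trace.append(("in", x))
        outputs.append(x)            # output communication: variable x onto channel o
        trace.append(("out", x))
    return outputs, trace
```

The trace makes the sequential operation visible: each value is output before the next one is accepted, so the buffer never holds more than one item.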
44
Buffer Handshake Circuit
Single-place buffer
[Figure: handshake circuit with activation channel, repeater, sequencer, two transferrers (T), variable x, input i and output o]

45
Buffer Handshake Circuit
Single-place buffer: repeater is activated

46
Buffer Handshake Circuit
Single-place buffer: sequencer handshakes to left transferrer

47
Buffer Handshake Circuit
Single-place buffer: transferrer requests data from environment

48
Buffer Handshake Circuit
Single-place buffer: data transferred to variable x

49
Buffer Handshake Circuit
Single-place buffer: variable handshake completes

50
Buffer Handshake Circuit
Single-place buffer: transferrer handshake completes to environment

51
Buffer Handshake Circuit
Single-place buffer: transferrer handshake completes

52
Buffer Handshake Circuit
Single-place buffer: sequencer handshakes to right transferrer

53
Buffer Handshake Circuit
Single-place buffer: transferrer reads variable

54
Buffer Handshake Circuit
Single-place buffer: transferrer outputs to environment

55
Buffer Handshake Circuit
Single-place buffer: handshakes complete

56
Buffer Handshake Circuit
Single-place buffer: sequencer completes its input handshake

57
Buffer Handshake Circuit
Single-place buffer: repeater initiates another transfer, etc.

58
Example Single Place Buffer
    import [balsa.types.basic]
    public
    type word is 16 bits
    procedure buffer (input i : word; output o : word) is
      local variable x : word
    begin
      loop
        i -> x   -- Input communication
        ;
        o <- x   -- Output communication
      end
    end

59
Example 2-place buffer
    import [balsa.types.basic]
    import [buffer1a]               -- reuse component
    public
    type word is 16 bits
    procedure buffer2c (input i : word; output o : word) is
      local channel c : word        -- internal channel connects two 1-place buffers
    begin
      buffer (i, c) ||              -- parallel composition
      buffer (c, o)                 -- buffers connected by common signal name
    end
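
A rough occupancy model shows what chaining two one-place buffers buys. The Python sketch below is an assumption-laden illustration: the time-stepped simulation, the `run_pipeline` name and the `sink_ready` parameter are hypothetical and are not how Balsa or its simulator model the circuit.

```python
from collections import deque


def run_pipeline(places, inputs, sink_ready):
    """Push inputs through `places` chained one-place buffers.

    sink_ready holds one boolean per time step, saying whether the
    environment accepts an output in that step.
    Returns (outputs, values still stored, in buffer order).
    """
    pending = deque(inputs)
    slots = [None] * places            # slots[k] models variable x of buffer k
    outputs = []
    for ready in sink_ready:
        # The last buffer outputs to the environment if it is willing.
        if ready and slots[-1] is not None:
            outputs.append(slots[-1])
            slots[-1] = None
        # Each internal channel moves a value one buffer to the right.
        for k in range(places - 1, 0, -1):
            if slots[k] is None and slots[k - 1] is not None:
                slots[k], slots[k - 1] = slots[k - 1], None
        # The first buffer inputs from the environment when it is empty.
        if slots[0] is None and pending:
            slots[0] = pending.popleft()
    return outputs, [value for value in slots if value is not None]
```

With the output blocked, `run_pipeline(2, [1, 2, 3], [False] * 5)` stores two values and refuses the third, whereas a single buffer stores only one.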
60
2-place Buffer Handshake Circuit
61
2-place Buffer Handshake Circuit
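[Figure: two single-place buffer circuits, each with a variable x and two transferrers (T), started by a par component and joined on channel c by a passivator; input i, output o]
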
62
Peephole Optimisation
  • Composition of handshake circuits leads to
    inefficiencies at circuit boundaries
  • Straightforward peephole optimizations

63
2-place Buffer Handshake Circuit
[Figure: unoptimised 2-place buffer circuit: par component, passivator on channel c, variables x, transferrers (T), input i, output o]

64
Optimized 2-place Buffer Circuit
[Figure: optimised circuit with control-only channels, input i, two variables x and two transferrers (T)]

65
The Repeater
  • Formal Definition
  • REP(a°, b•) = a° : #[b•]

• denotes active port
: denotes handshake enclosure
#[ ] denotes repeat
° denotes passive port

66
The Repeater
  • Formal Definition
  • REP(a°, b•) = a° : #[b•]
  •   = (a↑ : #[b↑ ; b↓])
  •   = (ar↑ ; #[br↑ ; ba↑ ; br↓ ; ba↓])
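
The repeater's wire-level behaviour can be sketched as a trace generator. This Python fragment assumes 4-phase signalling and uses the wire names ar, br and ba from the slide; the function name and the finite iteration count are illustrative assumptions, since the real component repeats forever.

```python
def repeater_trace(iterations):
    """Wire events of a repeater: one activation request, then repeated
    4-phase handshakes on the active port b."""
    trace = ["ar+"]                        # activation request arrives on a
    for _ in range(iterations):
        trace += ["br+", "ba+", "br-", "ba-"]
    return trace                           # the activation is never acknowledged
```

Note that no acknowledge on the activation port ever appears in the trace: the activating handshake encloses an unbounded number of handshakes on b.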

67
The Transferrer
  • Several Implementations
  • simplest: wire-only

[Figure: wire-only transferrer connecting signals ar, aa, br, ba, cr, ca and n-bit data]

68
Balsa Toolkit -1
  • balsa-c
      • The compiler for the language
  • breeze2dot
      • Produces a PostScript plot of the generated
        handshake circuits
  • breezecost
      • Reports the cost of the compiled circuit in
        arbitrary units

69
Balsa Toolkit -2
  • breeze2lard
      • The interface to the LARD simulation environment
      • Balsa source is translated to LARD
      • simple test harness is generated
  • balsa-md
      • An automatic makefile generation facility
  • balsa-mgr
      • A GUI project manager

70
Mod-16 Counter (all even)
71
Bundled-Data Datapaths
  • Problems
      • random standard cell layout
      • mixed control and datapath
      • timing analysis required
      • robustness of design reduced
  • Possible Solutions
      • DI codes
      • hybrid bundled and DI
      • simpler timing analysis

72
DI Codes
  • Dual Rail (used in 1st Tangram system)
  • Can use standard cell approach without timing
    analysis
  • no need to distinguish between control and data
  • abandoned in favour of bundled-data
  • area cost in extra wires
  • area and time cost in completion detection
  • Tangram/Balsa generates push-pull pipelines with
    expensive synchronization

73
Generic Pipeline
  • Passivators join compiled procedure

[Figure: generic pipeline; label: passivator]

74
Passivator Implementation
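  • Bundled Data

[Figure: bundled-data passivator with signals ar, br, ba, aa, a C-element (C) and n-bit data; label: n-wide C-gate]

  • Dual Rail

[Figure: dual-rail passivator, n bits wide, with data wires d0, d1 ... dn-1 and signals br, ba, aa]
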
75
DI Code Synchronizations
  • Expensive
      • need C-element synchronisation tree
  • A partial solution (not always possible/desirable)
    is
      • transform to push-style datapath
      • (not possible in Tangram, only Balsa)
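
The cost of such a synchronisation tree can be made concrete with a small model. The Python sketch below assumes 2-input Muller C-elements arranged in a balanced tree; the class names and the `cost` measure are hypothetical and say nothing about the cells Balsa actually generates.

```python
class CElement:
    """2-input Muller C-element: the output follows the inputs when they
    agree and holds its previous value when they differ."""

    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == b:
            self.out = a
        return self.out


class CTree:
    """Balanced tree of 2-input C-elements synchronising n >= 1 signals."""

    def __init__(self, n):
        self.levels = []
        width = n
        while width > 1:
            self.levels.append([CElement() for _ in range(width // 2)])
            width = (width + 1) // 2

    def update(self, signals):
        values = list(signals)
        for level in self.levels:
            merged = [c.update(values[2 * k], values[2 * k + 1])
                      for k, c in enumerate(level)]
            if len(values) % 2:
                merged.append(values[-1])   # an odd signal passes straight up
            values = merged
        return values[0]

    def cost(self):
        """Number of C-elements in the tree."""
        return sum(len(level) for level in self.levels)
```

In this model an n-input tree needs n - 1 elements and its output rises only after every input has risen, which is where the area and time cost of completion detection comes from.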

76
Push Pipeline
[Figure: push pipeline with passive input port and wires-only connector]

77
Hybrid Solutions
  • Use DI coding within bundled datapath framework
  • e.g. use dual-rail carry signals within a
    conventional adder
  • early completion easily detected
  • Average-case performance
  • Only applicable to a few datapath operations
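
The average-case benefit of early completion in an adder can be estimated with a simple model. This Python sketch assumes a ripple-carry adder with unit delay per carry stage and a carry-in that is known at time 0; the function names are hypothetical and the model ignores sum logic and completion-detection delay.

```python
def carry_settle_time(a, b, width):
    """Time, in carry-stage delays, until every carry is known."""
    t = 0          # the carry into bit 0 is known at time 0
    worst = 0
    for k in range(width):
        if ((a >> k) ^ (b >> k)) & 1:
            t += 1     # propagate: carry-out has to wait for carry-in
        else:
            t = 1      # generate or kill: carry-out follows from this bit alone
        worst = max(worst, t)
    return worst


def average_settle_time(width):
    """Mean settle time over all operand pairs of the given width."""
    total = sum(carry_settle_time(a, b, width)
                for a in range(1 << width)
                for b in range(1 << width))
    return total / (1 << (2 * width))
```

A dual-rail carry makes the "carry is known" event observable, so a circuit can finish as soon as the longest propagate chain has settled rather than always waiting for the worst case.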

78
Simpler Timing Analysis
  • Separate control and datapath
  • generate regular, compiled, datapath
  • area improvement over standard cell (because of
    regular layout)
  • generate matched delay paths (c.f. self-timed
    PLAs)
  • must be able to recognize datapath
  • difficult: control often contains datapath-like
    elements
  • e.g. start at variables and work backwards ...

79
Datapath meets Control
  • Example: Balsa case statement

[Figure: case statement example; labels: 1-hot encoding, data n bits wide, true/complement lines (dual-rail expansion)]

80
Case Component
  • input from datapath
  • dual-rail simplifies internal logic
  • expansions parameterisable
  • encode component is similar
  • opposite of case with true/false expansion
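
The idea of decoding from true/complement lines can be sketched as follows. This Python fragment is an assumption about the logic function only: the names `dual_rail_expand`, `case_decode` and `encode` are hypothetical, and the actual Case and Encode components may be built differently.

```python
def dual_rail_expand(value, width):
    """Return the true and complement lines of an n-bit value."""
    true_lines = [(value >> k) & 1 for k in range(width)]
    complement_lines = [1 - bit for bit in true_lines]
    return true_lines, complement_lines


def case_decode(value, width):
    """One-hot select lines: each is an AND of one line per input bit."""
    true_lines, complement_lines = dual_rail_expand(value, width)
    one_hot = []
    for choice in range(1 << width):
        line = 1
        for k in range(width):
            line &= true_lines[k] if (choice >> k) & 1 else complement_lines[k]
        one_hot.append(line)
    return one_hot


def encode(one_hot):
    """Opposite of case: recover the value from a one-hot vector."""
    if sum(one_hot) != 1:
        raise ValueError("exactly one line must be set")
    return one_hot.index(1)
```

Because both polarities of every bit are available, each select line needs only AND terms and no inverters in this sketch, which is one reading of "dual-rail simplifies internal logic".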

81
Simpler Timing Analysis
  • Tool support required
  • use existing (non-Balsa) tools if possible
  • automatically add matched paths/delays to
    synthesised datapaths
  • Design own cells where appropriate
  • e.g. hybrid stages

82
Future Work
  • Provide support for DI, hybrid and
    datapath-compiled datapaths
  • even with datapath compilation, some datapath
    would still be standard cell
  • e.g. instruction decoder (control heavy)
  • datapath in control
  • cost of connecting separate blocks in layout
  • Test Design required (datapath heavy)

83
Tool Enhancement
  • balsa-c
      • support for attribution to select compilation
        mechanisms/optimisation schemes
  • breeze2lard
      • new models
  • balsa-netlist
      • new tech-mapping descriptions
      • interface to datapath compilers

84
AMULET3i
  • Asynchronous macrocell
  • ARM compatible processor core
  • Full custom RAM
  • Compiled ROM
  • Balsa compiled DMA controller
  • Test I/F, synchronous and off-chip bus bridges
  • Synchronous peripherals
  • Designed by commercial partner ...

85
AMULET3 System
[Figure: block diagram with three peripherals (labelled Periph1), CPU / RAM, Sync bridge, MARBLE, SOCB, ROM and DMAC]

86
DMA Local RAM Access
[Figure: the same block diagram, illustrating a DMA access to local RAM]

87
DMA Peripheral Accesses
[Figure: the same block diagram, illustrating DMA peripheral accesses; label: DMA requests]

88
Requirements / Specification
  • 16 clients, 32 channels
  • 3 channel types - complicated register structure
  • Programmable client <-> channel (1 <-> many) mapping
  • Support synchronous requests
  • Transfers mostly between synchronous clients

89
Controller Structure
90
Two Controller Descriptions
  • Sequential (previous slides)
      • Very simple control flow
      • Requires two passes through register bank
      • Slow! Only memory decoupling helps
  • Parallel (next slides)
      • Decouple TE actions from memory R/W with a new
        unit: Transfer Interface (TI)
      • Interrupt the register bank on end of transfer

91
Parallel Design
92
The Design
  • 919 lines of Balsa describing register bank
    control, TE and TI.
  • Custom register banks and Synchronous Peripheral
    Interface
  • Miscellaneous glue standard cells
  • Register bank controllers
  • MARBLE interfaces
  • Compass Design Automation CAD

93
Implementation Technology
  • 0.35 µm, 3LM CMOS
  • Standard cells from ARM Ltd.
  • Locally designed complex gates and asynchronous
    elements/gates.
  • Automated standard cell P&R
  • Only essential and simple gate level
    optimisation (by hand)

94
Design Partitioning
Marble BUS outside of DMA controller
95
Design Partitioning
Balsa synthesised standard cells
96
Design Partitioning
Custom regular layout
97
Design Partitioning
Hand designed standard cells
98
DMA Controller Floor-Plan