A Deterministic Globally Asynchronous Locally Synchronous (GALS) Methodology for Validation, Debug, and Test

1
A Deterministic Globally Asynchronous Locally
Synchronous (GALS) Methodology for Validation,
Debug, and Test
  • Matthew Heath
  • University of Massachusetts Amherst
  • http://www-unix.ecs.umass.edu/mheath/
  • mheath_at_ecs.umass.edu
  • This research is funded by NSF grant 0204134 and
    SRC task 1075.

2
Outline
  • Globally Asynchronous Locally Synchronous (GALS)
    is a natural clocking style for SoCs
      • Each synchronous core is locally clocked
      • Asynchronous communication between cores
  • Existing GALS methodologies have limitations
      • Many are nondeterministic, which is bad for
        validation, debug, and test
      • Others achieve determinism by imposing
        environmental constraints that are valid only
        for limited applications
  • Synchro-Tokens: a novel deterministic GALS
    methodology
      • Flexible constraints for a wide variety of
        applications
      • Uses token rings to control local clocks and
        regulate data flow between clock domains
  • Current status: Verilog simulation has validated
    the concept
  • Future work: circuit studies and formal methods

3
Modern synchronous design
[Figure: a PLL with phase adjustment and frequency scaling feeds a low-skew global clock distribution to the clock domains (SoC cores), each with its own low-skew local distribution; flip-flop repeaters break up long wires. Logic delay + wire delay on inter-domain paths is designed to avoid races and critical paths.]
4
Clock design for SoCs
  • Fully synchronous isn't feasible for large SoCs
      • Difficult to design a chip-level clock with known
        skew
      • Pre-designed clocks of different cores may be
        incompatible
      • Cores run at different frequencies
      • Ratioed clocks cause critical path dependencies
        and impede dynamic frequency scaling
      • Chip-level timing convergence and flip-flop
        repeater placement
  • Fully asynchronous also has drawbacks
      • Not always better than synchronous due to
        handshaking overhead
      • Legacy cores are likely to be synchronous designs
      • Most tools handle async circuits inadequately or
        not at all
      • Many designers lack async design experience

5
GALS: Globally Asynchronous Locally Synchronous
[Figure: synchronous blocks, each with a local ring oscillator and low-skew local clock distribution; asynchronous communication between blocks passes through wire delay and a synchronizer.]
D. Chapiro, "Globally-Asynchronous Locally-Synchronous
Systems," PhD Thesis, Stanford University, Report
No. STAN-CS-84-1026, Oct. 1984.
6
Nondeterminism: A behavioral view
  I1  ADD R3, R1, R2
  I2  MUL R5, R3, R4
  I3  SUB R4, R2, R1
  I4  MOV R6, R3
  I5  ADD R4, R3, R2
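The architectural spec only fixes the order of instructions that share a register. As a sketch, assuming the usual read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) rules define that partial order, the pairs that must stay ordered can be listed mechanically. The `Instr` type and `hazards` function are hypothetical helpers, not part of the original work.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    op: str
    dest: str     # destination register
    srcs: tuple   # source registers


# The five instructions listed above.
PROGRAM = [
    Instr("I1", "ADD", "R3", ("R1", "R2")),
    Instr("I2", "MUL", "R5", ("R3", "R4")),
    Instr("I3", "SUB", "R4", ("R2", "R1")),
    Instr("I4", "MOV", "R6", ("R3",)),
    Instr("I5", "ADD", "R4", ("R3", "R2")),
]


def hazards(program):
    """Return (earlier, later, kind, register) for each pair whose order is fixed."""
    found = []
    for i, a in enumerate(program):
        for b in program[i + 1:]:
            if a.dest in b.srcs:
                found.append((a.name, b.name, "RAW", a.dest))  # b reads what a writes
            if b.dest in a.srcs:
                found.append((a.name, b.name, "WAR", b.dest))  # b overwrites what a reads
            if a.dest == b.dest:
                found.append((a.name, b.name, "WAW", a.dest))  # both write one register
    return found


if __name__ == "__main__":
    for edge in hazards(PROGRAM):
        print(*edge)
```

Pairs that never appear (for example I2 and I4, or I4 and I5) may complete in either order, and the writes of I2 (to R5) and I5 (to R4) may also be swapped, since only I2's read of R4 must precede I5's write.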
7
Simulated expectation
[Figure: simulated timeline of the instructions. Clk is the clock of the RF SB; the tester observes the RF state after each Clk.]
8
Same sequence, different cycles
[Figure: simulated expectation vs. silicon. Clk is the clock of the RF SB; the tester observes the RF state after each Clk.]
Silicon Test 1 results: I3 exec/write 1 cycle late;
I5 sequence delayed; I2 and I5 writes on time
9
Different sequence
[Figure: simulated expectation vs. two silicon tests. Clk is the clock of the RF SB; the tester observes the RF state after each Clk.]
Silicon Test 2 results: I2 exec 1 cycle late; I2 and
I5 writes swapped
Silicon Test 1 results: I3 exec/write 1 cycle late;
I5 sequence delayed; I2 and I5 writes on time
10
More than one right answer
  • Architectural spec defines partially ordered
    sequence of events
  • Implementation is correct if it conforms to the
    spec
  • Single events can occur on nondeterministic clock
    cycles
  • Multiple events with no specified order can occur
    in a nondeterministic sequence
  • Many partially ordered sets of events induce a
    large number of possible correct event traces
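How quickly the number of correct traces grows can be checked by counting linear extensions: total orders that respect every constraint of a partial order. The sketch below does this by memoized search over subsets of already-placed events. The examples are assumptions chosen for illustration: the register dependences of I1 to I5 above taken as the only constraints, and two independent three-event chains standing in for two unrelated clock domains.

```python
from functools import lru_cache


def count_linear_extensions(n, edges):
    """Count total orders of events 0..n-1 that respect every (before, after) edge."""
    preds = [0] * n                    # bitmask of required predecessors per event
    for before, after in edges:
        preds[after] |= 1 << before
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(done):
        if done == full:
            return 1
        total = 0
        for v in range(n):
            placed = (done >> v) & 1
            ready = (preds[v] & done) == preds[v]
            if not placed and ready:
                total += count(done | (1 << v))
        return total

    return count(0)


# I1..I5 as events 0..4 with their RAW/WAR/WAW ordering constraints.
INSTR_EDGES = [(0, 1), (0, 3), (0, 4), (1, 2), (1, 4), (2, 4)]
# Two independent chains of three events each (e.g., two clock domains).
TWO_CHAINS = [(0, 1), (1, 2), (3, 4), (4, 5)]
```

Five instructions admit only 4 legal orders, but two unrelated three-event chains already admit 20 interleavings, and n unordered events admit n! orders; this is the growth behind the "large number of possible correct event traces".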

11
Lack of a unique trace makes validation, debug,
and test much harder
  • Validating many possible traces requires more
    computing resources
  • Analyzing whether a response is correct slows
    down debug
  • Finding all possible traces consumes test
    creation time
  • On-chip storage for BIST costs die area
  • Off-chip storage needs expensive tester memory
  • Comparing test results with all possible traces
    costs test time
  • If a fault effect maps to another correct trace,
    coverage is lowered
  • Divide-and-conquer doesn't allow testing of the
    entire chip at once
  • Waiting for the test to reach a naturally
    deterministic state provides insufficient
    observability

12
Nondeterminism: A signal-level view
[Figure: timing diagrams of clk, asynchronous data input d (switching from D1 to D2), and flip-flop output q for three cases.]
Sequence at output q:
  Case                                     Cycle 1  Cycle 2    Cycle 3
  d switches after the first clock edge    D1       D1         D2
  d switches before the first clock edge   D1       D2         D2
  d switches very close to the clock edge  D1       D1 or D2   D2
  • Async data input switches after the clock edge:
    the output switches after the second clock edge
  • Async data input switches before the first clock
    edge: the output switches after the first clock
    edge
  • Data input switches very close to the clock edge:
    the flip-flop goes metastable, and its output
    resolves to a random value after a random time
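The three cases above can be reproduced with a small behavioral model. This is a sketch under assumed conditions: an ideal flip-flop with a setup window `t_setup` before and a hold window `t_hold` after each clock edge (the names and numbers are hypothetical), with metastability reduced to "either value" and its random resolution time ignored.

```python
from itertools import product


def possible_q_sequences(t_switch, edges, t_setup, t_hold, old="D1", new="D2"):
    """Every q sequence (initial value, then one value per clock edge) a
    flip-flop may produce when input d switches from `old` to `new` at t_switch."""
    choices = [{old}]                  # q holds `old` before the first edge
    for t_edge in edges:
        if t_switch <= t_edge - t_setup:
            choices.append({new})      # setup met: this edge captures the new value
        elif t_switch >= t_edge + t_hold:
            choices.append({old})      # d still stable at the old value here
        else:
            # Setup/hold window violated: metastable, may resolve either way.
            choices.append({old, new})
    return set(product(*choices))


EDGES = (10, 20)   # hypothetical clock edges, setup = hold = 1 time unit
```

A switch at t = 12 gives only (D1, D1, D2), a switch at t = 5 gives only (D1, D2, D2), and a switch exactly at the edge (t = 10) gives both, which is the nondeterminism the slide describes.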
13
Sources of nondeterminism
The cycle of Clk_B on which a signal transition
caused by Clk_A is captured depends on:
  • tPA: propagation delay
      • Includes FF delay plus any combinational logic
      • May vary within one test on one chip due to
        data-dependence
  • tSkew + nT: skew between local clocks, plus an
    integral number of clock cycles
      • May vary between test runs of one chip due to
        clock initialization and frequency shmoo
  • tWire: wire delay
      • May vary between chips due to process variation
[Figure: a flip-flop clocked by Clk_A drives logic and a wire ("out" to "in") into a flip-flop clocked by Clk_B; the timing diagram marks tPA, tWire, tSetup, and tSkew + nT between the Clk_A and Clk_B edges.]
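Put as an inequality, a transition launched by a Clk_A edge at time 0 is captured on the first Clk_B edge n for which tSkew + n·T_B - tSetup ≥ tPA + tWire. The sketch below evaluates that condition, assuming an ideal flip-flop with a sharp setup threshold (hold time and metastability ignored) and integer picosecond delays to avoid rounding; all numbers are hypothetical.

```python
def capture_edge(t_pa, t_wire, t_setup, t_skew, period_b):
    """Index n of the first Clk_B edge (at t_skew + n * period_b) that captures a
    transition launched by a Clk_A edge at time 0. All values are integers (ps)."""
    arrival = t_pa + t_wire
    # Smallest n with t_skew + n * period_b - t_setup >= arrival (ceiling division).
    n = -(-(arrival + t_setup - t_skew) // period_b)
    return max(n, 0)
```

With tPA = 300, tWire = 150, tSetup = 50, tSkew = 100, and T_B = 400, the transition is captured on edge 1; a 1 ps data-dependent increase in tPA, or a 1 ps change in tSkew, moves it to edge 2. Each term listed above can therefore shift the capture cycle.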
14
Asynchronous signals sampled with unrelated
clocks are nondeterministic...
[Figure: examples: double-flip-flop synchronizers; source-synchronous transfer; clock recovery from data (encode, decode, PLL).]
15
...regardless of what gets synchronized
[Figure: handshaking with dual-rail data (D) and handshaking with bundled data (REQ, ACK, DATA) between handshake logic blocks.]
16
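Mutex Element (ME) hides metastability but is
still nondeterministic
Initial condition: R1 = R2 = 0, A1 = A2 = 0,
V1 = V2 = 1
Requests are acknowledged one at a time on a
first-come, first-served basis
During the metastable period, V1, V2 < Vt and
A1 = A2 = 0
[Figure: ME schematic (requests R1, R2; internal nodes V1, V2; transistors T1, T2; acknowledges A1, A2) and waveforms of R1, R2, A1, A2 showing a metastable interval.]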
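A behavioral sketch of the ME's observable outcome: it always grants one request at a time, but when two requests arrive close together, which one wins is unpredictable. The function name and the `window` parameter (a hypothetical metastability window) are illustrative assumptions, and the unpredictable outcome is modeled as a random choice.

```python
import random


def mutex_grant_order(t_r1, t_r2, window, rng=random):
    """Order in which a behavioral mutex acknowledges two rising requests.

    Requests are served first-come, first-served. If they arrive within
    `window` of each other, the ME may go metastable: it still grants one
    request at a time, but the winner is not predictable (random here).
    """
    if abs(t_r1 - t_r2) < window:
        return rng.choice([("A1", "A2"), ("A2", "A1")])
    return ("A1", "A2") if t_r1 < t_r2 else ("A2", "A1")
```

Well-separated requests always produce the same grant order; nearly simultaneous requests do not, so hiding metastability does not remove the nondeterminism.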
17
Stoppable clocks
  • Ring oscillator, but not a PLL
  • Aligns clock to data to avoid metastability-induced
    system failure
  • Clock stops while the ME is metastable or while
    acknowledging a data request
  • Each synchronous block has independent frequency
    and phase
  • Nondeterministic: can't predict on which clock
    cycle an async request arrives
  • Muttersbach, Villiger, and Fichtner, "Practical
    Design of GALS Systems," ASYNC '00

[Figure: stoppable clock driving the synchronous logic, with a mutex (R1/A1, R2/A2) arbitrating against the asynchronous Req/Ack handshake.]
18
Self-timed FIFOs
[Figure: a self-timed FIFO carrying req/ack handshakes between Sync Block A (Clk_A) and Sync Block B (Clk_B).]
  • Self-timed FIFOs pipeline the async communication
    channel
      • Use bundled data and careful timing (shown above)
      • Or embed the request in dual-rail data
  • Same nondeterministic stoppable clock
  • Yun and Dooply, "Pausible Clocking-Based
    Heterogeneous Systems," Trans. VLSI, Dec. '99

19
Determinism by environmental constraints
  • Like all synchronous designs, each synchronous
    block produces deterministic state and output
    sequences in response to a given input sequence
  • To eliminate all nondeterminism, make the input
    sequence of each synchronous block deterministic
    by constraining the input data

20
Determinism for low-bandwidth I/O
[Figure: Sync Blocks A, B, and C with stoppable clocks Clk_A, Clk_B, and Clk_C, exchanging req/ack handshakes.]
  • Accept new, asynchronous input only after a
    deterministic, synchronous local event has
    stopped the clock
  • Don't restart the clock until the input data has
    arrived
  • Nilsson and Torkelson, "A Monolithic Digital Clock
    Generator for On-Chip Clocking of Custom DSPs,"
    JSSC, May '96

21
Determinism for constant I/O
[Figure: a self-timed FIFO with req/ack handshakes between Sync Block A and Sync Block B, both clocked by Clk.]
  • Prevent the FIFO from becoming empty or full
      • Initialize the FIFO to ½ full
      • Use a global reference clock for exact frequency
        matching
      • Add and remove data at equal rates
  • Each end of the FIFO is effectively synchronized
    to the local clock
  • Greenstreet, "Implementing a STARI Chip," ICCD 1995
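Why exact frequency matching matters can be seen in a tick-level sketch of FIFO occupancy. The function, its parameters, and the periods are hypothetical: the writer adds one item every `write_period` ticks, the reader removes one every `read_period` ticks, and the FIFO starts half full. Equal periods model the shared reference clock.

```python
def fifo_occupancy_range(depth, write_period, read_period, duration):
    """Simulate a FIFO initialized half full; return (min, max) occupancy,
    sampled at the end of each tick. Raises RuntimeError on overflow/underflow."""
    occupancy = depth // 2             # initialize the FIFO to half full
    lowest = highest = occupancy
    for t in range(1, duration + 1):
        if t % write_period == 0:      # writer side adds one item
            if occupancy == depth:
                raise RuntimeError(f"FIFO full at tick {t}")
            occupancy += 1
        if t % read_period == 0:       # reader side removes one item
            if occupancy == 0:
                raise RuntimeError(f"FIFO empty at tick {t}")
            occupancy -= 1
        lowest, highest = min(lowest, occupancy), max(highest, occupancy)
    return lowest, highest
```

With both periods equal, an 8-entry FIFO stays at exactly 4 entries indefinitely; with a 10% mismatch (periods of 9 and 10 ticks) it drifts until it overflows or, in the opposite direction, runs empty, which is why this approach depends on a global reference clock.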

22
Making GALS Deterministic
  • Each SB must receive each transition on each of
    its asynchronous inputs during a local clock
    cycle which is known in advance.
  • A transition must not be recognized if it occurs
    earlier than expected, and the local clock must
    stop to wait for a transition if it occurs later
    than expected.
  • Such complete knowledge is never available in
    practice
      • Its existence would imply that the inputs carry
        no information and thus aren't even needed!
  • This knowledge can be inferred for all inputs if
    it is available for select inputs and if the
    timing relationship between those and all other
    inputs is known.

23
Types of Signals
                      Transition time known?
                      N                        Y
Value known?    N     Asynchronous Data        Synchronous
                Y     Asynchronous Handshake   Redundant
24
Bundled Data
[Figure: async data signals bundled with an async handshake signal.]
  • Use timing verification during design to ensure
    that the logic level of a data signal at the time
    of a transition of its associated handshake
    signal is deterministic
  • Easier than synchronous design because data
    signal and timing signal have the same source and
    destination, and thus can have similar routes

25
Master Handshake
  • All handshake signals with a common source SB and
    a common destination SB are bundled to a single
    master handshake signal
  • Timing verification ensures that the values of
    all bundled handshake signals at the time of a
    transition of the master handshake signal are
    deterministic

[Figure: a master handshake bundling a request handshake and an acknowledge handshake, each with its own bundled data.]
26
Stoppable Clock
  • Use the master handshake signal as the
    asynchronous enable of a stoppable clock

[Figure: stoppable clock and synchronous logic, with a flip-flop (D, Q, Clk) and the enable signals SyncEn, AsyncEn, and En; waveforms of SyncEn, AsyncEn, En, and Clk.]
27
Synchro-Tokens: Flexible constraints
  • Synchro-Tokens does NOT impose requirements on
    the outputs of other synchronous blocks
  • Instead, control logic added to the asynchronous
    inputs of blocks constrains them before they
    reach the synchronous logic
      • Ensures that asynchronous input transitions are
        captured on deterministic clock cycles
      • Uses token rings to control local clocks and
        regulate data flow between clock domains
      • No synchronizers → zero probability of
        metastability failure
  • Flexible constraints for a wide variety of
    applications
      • Clocks don't stop for asynchronous data transfer
      • Time-varying asynchronous data rates are supported

32
Synchro-Tokens System Overview
[Figure: synchronous blocks, each containing token ring nodes and FIFO interfaces, connected by self-timed FIFOs; each node's En enables its block's local Clk.]
  • Synchronous blocks (SB)
  • Self-timed FIFOs for inter-block communication
  • Async/sync interface in each SB
  • One token ring for each communicating SB pair
      • Any number of FIFOs
      • Node in each SB
      • One inverting link for 2-phase handshake
  • Internal blocks have local clock generators
    enabled by token ring nodes
  • System I/O blocks are externally synchronized

33
Token Ring Nodes: Control when the token is received
and sent
[Figure: a node between the async token ring (TokenIn, TokenOut) and the synchronous block, producing Clock_En and FIFO_En and counting on Clk.]
  • Node has two clock cycle counters
      • Decrement once per local clock
      • Initial values are programmable architectural
        parameters
  • Hold counter
      • Tracks time between receiving and sending the
        token
      • Nonzero value enables the clock AND the
        interfaces of the associated FIFOs
  • Recycle counter
      • Tracks how long after sending the token it is
        expected to return
      • Nonzero value enables the clock but NOT the FIFO
        interfaces
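The counter rules above can be sketched as a cycle-level model of one node. The class and method names (`TokenRingNode`, `clock_edge`, `token_arrives`) are hypothetical, the sketch assumes one ring per node and counts of at least 1, and it models the control behavior described on these slides rather than the actual circuit. Early tokens are held pending until the recycle count expires, and a missing token leaves the count at zero, which drops Clock_En.

```python
class TokenRingNode:
    """Cycle-level sketch of one synchro-tokens node (hypothetical encoding)."""

    def __init__(self, hold, recycle, has_token):
        if hold < 1 or recycle < 1:
            raise ValueError("hold and recycle counts must be at least 1")
        self.hold, self.recycle = hold, recycle
        self.holding = has_token
        self.count = hold if has_token else recycle
        self.token_pending = False    # token arrived but not yet received
        self.tokens_sent = 0

    @property
    def fifo_en(self):
        return self.holding            # the hold count is nonzero while holding

    @property
    def clock_en(self):
        return self.count > 0          # zero only when recycling ended with no token

    def token_arrives(self):
        """Asynchronous TokenIn transition."""
        self.token_pending = True
        if not self.clock_en:
            self._receive()            # late token: re-enable the clock asynchronously

    def clock_edge(self):
        """One rising edge of the local clock."""
        if not self.clock_en:
            raise RuntimeError("clock edge while the clock is stopped")
        self.count -= 1
        if self.count > 0:
            return
        if self.holding:               # hold count expired: send the token
            self.holding = False
            self.count = self.recycle
            self.tokens_sent += 1
        elif self.token_pending:       # early token is received only now
            self._receive()
        # Otherwise count stays 0, Clock_En drops, and the clock stops synchronously.

    def _receive(self):
        self.token_pending = False
        self.holding = True
        self.count = self.hold
```

For a node with hold = 2 and recycle = 3, a token that returns right after being sent is still received on local edge 5 (hold + recycle); a token that returns after the recycle count expires is also received at that edge index, only after the clock has been stopped and restarted.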

34
Stoppable Clock: Ensures the token is received on a
deterministic clock cycle
  • Programmable frequency
  • Enabled by all nodes in the SB
  • If a token returns early, it is ignored by the
    node until the recycle count reaches zero
  • If the recycle count reaches zero before the token
    returns, the clock is synchronously disabled
  • When the late token arrives, the clock is
    asynchronously re-enabled
  • Counters and token are deterministically
    initialized
  • Token is always received on a deterministic clock
    cycle
[Figure: Node 1 (Token Ring 1) and Node 2 (Token Ring 2) in one SB, each with TokenIn, TokenOut, Clock_En, and FIFO_En; both Clock_En signals enable a shared stoppable clock that drives Clk.]
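The determinism argument can be checked with a coarse wall-clock sketch of two blocks sharing one token ring. Everything here is a hypothetical simplification: the periods, counts, and delays are made up, only one ring is modeled (so a late token on another ring cannot stall a block), and a stopped clock is modeled as resuming at the token's arrival time. Each block alternates between holding for `hold` cycles and recycling for `recycle` cycles, waiting if the token is late.

```python
from dataclasses import dataclass


@dataclass
class SyncBlock:
    name: str
    period: int               # local clock period (hypothetical time units)
    hold: int                 # hold count loaded when the token is received
    recycle: int              # recycle count loaded when the token is sent
    time: int = 0             # wall-clock time of the latest local edge
    cycle: int = 0            # local clock edges elapsed so far
    pending_recycle: int = 0  # recycle cycles to run before the next receipt

    def run(self, cycles):
        """Let `cycles` local edges elapse with the clock enabled."""
        self.time += cycles * self.period
        self.cycle += cycles


def exchange(a, b, delay_ab, delay_ba, first_recycle_b, handoffs):
    """Pass one token between a and b, starting at a.
    Returns (receiver name, local cycle, wall time) for every receipt."""
    delay = {(a.name, b.name): delay_ab, (b.name, a.name): delay_ba}
    b.pending_recycle = first_recycle_b
    holder, receiver = a, b
    log = []
    for _ in range(handoffs):
        holder.run(holder.hold)                  # hold counter counts down to zero
        arrival = holder.time + delay[(holder.name, receiver.name)]
        holder.pending_recycle = holder.recycle  # recycle counter starts after sending
        receiver.run(receiver.pending_recycle)   # an early token waits for this
        if receiver.time < arrival:              # late token: the clock was stopped
            receiver.time = arrival              # and resumes when the token arrives
        log.append((receiver.name, receiver.cycle, receiver.time))
        holder, receiver = receiver, holder
    return log


def simulate(delay):
    a = SyncBlock("A", period=10, hold=4, recycle=6)
    b = SyncBlock("B", period=7, hold=3, recycle=5)
    return exchange(a, b, delay, delay, first_recycle_b=2, handoffs=4)


if __name__ == "__main__":
    fast, slow = simulate(delay=3), simulate(delay=50)
    print([(n, c) for n, c, _ in fast] == [(n, c) for n, c, _ in slow])
    print([t for _, _, t in fast], [t for _, _, t in slow])
```

With a short token delay (3) and a long one (50), both blocks receive the token on the same local cycles (B on 2 and 10, A on 10 and 20) while the wall-clock receive times differ; only the counts, not the delays, decide the cycle.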
35
FIFO Interfaces: Ensure deterministic data
accompanies the token
  • FIFO handshakes use bundled data
      • Many data bits per req/ack pair
  • FIFO timing coupled to token ring
      • Arrival of the token indicates that async FIFO
        control and data inputs have stabilized
  • Mutually exclusive FIFO access
      • FIFO interface enabled while the associated node
        holds the token
      • FIFO can't asynchronously become non-full or
        non-empty as a result of activity at the other
        end (which would otherwise allow
        nondeterministic data exchange)
  • FIFO shifts fast enough to exchange data on every
    local clock cycle
[Figure: node (TokenIn, TokenOut, Clock_En, FIFO_En) and stoppable clock; SB output FIFO interface (En, Clk, Valid, Full, Data; Req/Ack to the self-timed FIFO) and SB input FIFO interface (En, Clk, Read, Empty, Data; Req/Ack from the self-timed FIFO).]
36
Waveforms for One Node
[Figure: waveforms of TokenIn, TokenOut, Clk, Clock_En, FIFO_En, and the Hold Counter and Recycle Counter values for one node.]
37
Waveforms for One Node
[Figure: same waveforms, event A marked.]
A: The incoming token arrives early, but is not
received because the recycle counter is nonzero.
38
Waveforms for One Node
[Figure: same waveforms, events A to B marked.]
B: The recycle counter reaches zero.
39
Waveforms for One Node
[Figure: same waveforms, events A to C marked.]
C: FIFO_En is asserted to enable the FIFO
interfaces associated with the node.
40
Waveforms for One Node
[Figure: same waveforms, events A to D marked.]
D: The hold counter decrements on each local clock
cycle.
41
Waveforms for One Node
[Figure: same waveforms, events A to E marked.]
E: When the hold counter reaches zero...
42
Waveforms for One Node
[Figure: same waveforms, events A to F marked.]
F: ...the token is sent out of the node...
43
Waveforms for One Node
[Figure: same waveforms, events A to G marked.]
G: ...and FIFO_En is de-asserted to disable the FIFO
interfaces.
44
Waveforms for One Node
[Figure: same waveforms, events A to H marked.]
H: The recycle counter decrements on each local
clock cycle.
45
Waveforms for One Node
[Figure: same waveforms, events A to I marked.]
I: Because the token hasn't arrived when the recycle
counter reaches zero, Clock_En is de-asserted...
46
Waveforms for One Node
[Figure: same waveforms, events A to J marked.]
J: ...and the local clock stops synchronously.
47
Waveforms for One Node
[Figure: same waveforms, events A to K marked.]
K: When the late token arrives...
48
Waveforms for One Node
[Figure: same waveforms, events A to L marked.]
L: ...Clock_En is re-asserted and the local clock is
asynchronously re-enabled.
49
Waveforms for One Node
[Figure: same waveforms, events A to M marked.]
M: A late token at another node stops the clock to
the entire synchronous block, even if this node
is holding the token or recycling.
50
Results: Out-of-order processor core
  • Implemented an out-of-order processor core in
    Verilog with variable, nonzero delays
      • Out-of-order engine and execution units in
        different blocks
      • Note on which out-of-order engine clock cycle
        each instruction issues, reads, executes, and
        writes
  • Design 1: Fully synchronous
      • Deterministic, provided synchronous design rules
        are obeyed
  • Design 2: Fully asynchronous
      • Nondeterministic due to first-come, first-served
        bus arbitration
  • Design 3: Standard GALS
      • Nondeterministic due to synchronizers
  • Design 4: Synchro-tokens GALS
      • Deterministic!

51
Results: Determinism Validation
  • Implemented a synchro-tokens system called
    Thrasher
      • Processes data with LFSR, bitwise logic, and
        arithmetic functions
      • No data hazards or partial orders: just churns
        garbage data
      • Clocks only stop for late tokens, never for
        functional constraints
  • Chose nominal clock periods, FIFO and token delays,
    and hold and recycle counts such that tokens
    always arrive just in time
  • Generate expected response
  • Simulate with different parameter combinations
      • Each delay can be 50%, 75%, 100%, 150%, or 200%
        of nominal
      • 16,285 permutations
      • Observe state sequence in each SB on the first
        100 local clocks
  • Exact matches on all states for all delay
    permutations show the system is deterministic!
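A toy analogue of this experiment, not the Verilog Thrasher testbench, can sweep a two-block, one-ring model over the same five scale factors. Assumptions: block 0 starts with the token, block 1 starts recycling with its recycle count, a late token simply delays the receiving block's clock, and the nominal values are made up. Four scaled parameters (two clock periods, two token delays) give 625 combinations rather than the slide's 16,285, and the sweep compares only the local cycles at which tokens are received.

```python
import itertools

SCALES = (50, 75, 100, 150, 200)   # percent of nominal, as on the slide


def receive_trace(period, hold, recycle, delay, handoffs=6):
    """Local cycles at which blocks 0 and 1 receive one shared token.

    Each argument is a pair indexed by block; delay[i] is the token's travel
    time from block i to the other block. Returns (trace, final wall time).
    """
    time, cycle = [0, 0], [0, 0]
    pending = [0, recycle[1]]          # block 1 starts out recycling
    trace, holder = [], 0
    for _ in range(handoffs):
        other = 1 - holder
        time[holder] += hold[holder] * period[holder]   # hold count expires
        cycle[holder] += hold[holder]
        arrival = time[holder] + delay[holder]
        pending[holder] = recycle[holder]
        time[other] += pending[other] * period[other]   # recycle count expires
        cycle[other] += pending[other]
        time[other] = max(time[other], arrival)         # late token stops the clock
        trace.append((other, cycle[other]))
        holder = other
    return tuple(trace), max(time)


def sweep(nominal_period=(10, 7), nominal_delay=(20, 15), hold=(4, 3), recycle=(6, 5)):
    traces, finish_times = set(), set()
    for pa, pb, da, db in itertools.product(SCALES, repeat=4):
        period = (nominal_period[0] * pa // 100, nominal_period[1] * pb // 100)
        delay = (nominal_delay[0] * da // 100, nominal_delay[1] * db // 100)
        trace, finish = receive_trace(period, hold, recycle, delay)
        traces.add(trace)
        finish_times.add(finish)
    return traces, finish_times
```

All 625 combinations produce a single receive trace, while the wall-clock finishing times vary widely, a small-scale version of the "exact matches for all delay permutations" result.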

52
Muller C-element
[Figure: C-element symbol with inputs X, Y and output Z.]
  X  Y  |  Z
  0  0  |  0
  0  1  |  Hold state
  1  0  |  Hold state
  1  1  |  1
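The truth table above translates directly into a one-line function: the output follows the inputs when they agree and otherwise holds its previous value. The names `c_element` and `c_trace` are hypothetical helpers for illustration.

```python
def c_element(x, y, z_prev):
    """Muller C-element: Z follows X and Y when they agree, otherwise holds."""
    return x if x == y else z_prev


def c_trace(inputs, z0=0):
    """Apply a sequence of (x, y) input pairs, returning Z after each one."""
    z, out = z0, []
    for x, y in inputs:
        z = c_element(x, y, z)
        out.append(z)
    return out
```

For inputs (0,0), (1,0), (1,1), (0,1), (0,0) the output is 0, 0, 1, 1, 0: Z changes only once both inputs have switched, which is the rendezvous behavior the asynchronous shift register relies on.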
53
1-bit asynchronous shift register
  • Dual-rail data
  • An empty bit (neither 0 nor 1) is available to hold
    incoming data
  • To shift the chain:
      • Assert Ack_in of the chain head to remove a data
        bit
      • Wait for the data bubble to ripple backward to
        the chain tail
      • Assert Req0_in or Req1_in of the chain tail to
        add a data bit
  • Add extra empty cells to the chain tail so reverse
    bubble propagation doesn't limit shifting
    frequency
[Figure: one cell built from C-elements, with dual-rail inputs Req0_in and Req1_in, outputs Req0_out and Req1_out, and acknowledges Ack_in and Ack_out.]
54
Loadable C-element
[Figure: C-element symbol with inputs X, Y, load L, data D, and output Z.]
  X  Y  L  D  |  Z
  0  0  0  -  |  0
  0  1  0  -  |  Hold state
  1  0  0  -  |  Hold state
  1  1  0  -  |  1
  -  -  1  0  |  0
  -  -  1  1  |  1
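Read as a function, the table says: with L = 1 the output is forced to D regardless of X and Y, and with L = 0 the cell behaves as an ordinary Muller C-element. The function name is a hypothetical helper; "-" in the table means "don't care".

```python
def loadable_c_element(x, y, load, d, z_prev):
    """Loadable C-element: L = 1 forces Z to D; otherwise a plain C-element."""
    if load:
        return d                       # load: X and Y are don't-cares
    return x if x == y else z_prev     # C-element: follow agreeing inputs, else hold
```

Enumerating all input combinations against both previous output values reproduces every row of the table above.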
55
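Results: 1-bit boundary scan cell
[Figure: the 1-bit asynchronous shift-register cell (C-elements; Req0_in/Req0_out, Req1_in/Req1_out, Ack_in/Ack_out) extended with D_in, D_out, a 0/1 selection, and Drive, Update, and Capture controls.]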
56
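Results: Nondestructive ATPG scan cell
[Figure: the 1-bit asynchronous shift-register cell (C-elements; Req0_in/Req0_out, Req1_in/Req1_out, Ack_in/Ack_out) extended with D_in, D_out, a 0/1 selection, and Update, Capture, and Clk.]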
57
Results: Test Methodologies
[Figure: test architecture with a wrapper, an I/O SB and internal SBs, an internal TCK-domain scan chain, a test FIFO, a test SB, a boundary scan chain, an 1149.1 TAP, and system I/O.]
58
Future Work
  • SPICE simulations and timing analysis of
    synchro-tokens logic
  • Apply to a large system using FPGAs or a test chip
  • Investigate area, power, and performance impact
  • Investigate more aggressive protocol variations
      • Data-dependent, deterministically-varying hold
        and recycle counts
      • Use local empty/full bits to keep FIFO interfaces
        enabled after releasing the token
  • Formal methods
      • Prove determinism
      • Show how to avoid deadlock

59
Summary
  • GALS is a natural clocking methodology for SoCs
  • Typical GALS designs are nondeterministic because
    asynchronous signals unpredictably transition
    before or after the sampling clock edge
  • A nondeterministic implementation which conforms
    to a higher-level specification is functionally
    correct
  • Nondeterminism makes validation, debug, and test
    harder because the expected response is not
    unique
  • Synchro-tokens eliminates nondeterminism by
    adding control logic to the interface of
    synchronous blocks so that asynchronous input
    transitions are captured on deterministic local
    clock cycles
  • Key components of the synchro-tokens architecture:
      • Token ring nodes, hold counters, and recycle
        counters control when tokens are received and
        sent
      • Stoppable clocks ensure tokens are received on
        deterministic clock cycles
      • FIFO interfaces ensure deterministic data
        accompanies the token
  • The synchro-tokens concept has been validated
    with HDL simulations
  • Future work: circuit design and formal analyses

60
Time for Questions!