Title: Synchro-Tokens: Eliminating Nondeterminism to Enable Chip-Level Test of Globally-Asynchronous Locally-Synchronous SoCs
1Synchro-Tokens: Eliminating Nondeterminism to Enable Chip-Level Test of Globally-Asynchronous Locally-Synchronous SoCs
- Matthew W. Heath, Wayne P. Burleson
- University of Massachusetts Amherst, USA
- Ian G. Harris
- University of California Irvine, USA
This work was funded in part by NSF Grant No.
0204134, SRC Task 1075, and Intel Corporation.
2Introduction
- Definition of nondeterminism
- Nondeterminism in design and test
- Existing deterministic GALS methodologies
- Synchro-tokens methodology
- Why it's deterministic
- Debug and test features
- Performance and area cost estimation
3Nondeterministic Sequences
[Figure: a synchronizer flip-flop (d, q, clk) sampling asynchronous input d, which switches from D1 to D2, over clock cycles 1 to 3]
- Async input d switches just after clock rises: sequence at q is D1, D1, D2
- Async input d switches just before clock rises: sequence at q is D1, D2, D2
- Async input d switches as clock is rising, so q goes metastable: sequence at q is D1, ?, D2 (the middle value resolves to D1 or D2)
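The three cases on this slide can be reproduced with a minimal sketch. The function name, the `setup_hold` window, and the edge times are illustrative assumptions, not values from the talk; metastability is modeled simply as an arbitrary choice between the old and new value.

```python
import random

def sample_sequence(switch_time, clock_edges, setup_hold=0.05, rng=None):
    """Return the value captured at q on each rising clock edge.

    d changes from "D1" to "D2" at switch_time. If the change falls within
    setup_hold of an edge, the flip-flop is treated as metastable and
    resolves to either value.
    """
    rng = rng or random.Random()
    captured = []
    for edge in clock_edges:
        if abs(switch_time - edge) < setup_hold:
            captured.append(rng.choice(["D1", "D2"]))  # metastable outcome
        elif switch_time < edge:
            captured.append("D2")  # d switched before this edge
        else:
            captured.append("D1")  # d has not switched yet
    return captured

edges = [1.0, 2.0, 3.0]                  # rising edges of cycles 1, 2, 3
after = sample_sequence(2.2, edges)      # switches just after edge 2
before = sample_sequence(1.8, edges)     # switches just before edge 2
during = sample_sequence(2.0, edges)     # switches as edge 2 rises
```

Only the third case depends on the random choice, which is exactly the run-time nondeterminism the slide illustrates.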
4Nondeterminism in Design is OK
- Deterministic: engineer chooses at design time
- Worst-case timing analysis for synchronous design
- Nondeterministic: silicon chooses at run time
- Asynchronous arbiters or GALS synchronizers
[Figure: a partially-ordered specification over events 1, 2A, 2B, 3A, 3B, and 4, with three event-versus-cycle tables labeled Conforming Implementation, Different Cycle Mapping, and Different Sequence]
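A small sketch of what "partially ordered" means for the observer. The precedence pairs below are a hypothetical ordering over the event names used on this slide (the actual ordering in the figure is not recoverable from the text); the point is only that several distinct sequences can all conform.

```python
from itertools import permutations

EVENTS = ["1", "2A", "2B", "3A", "3B", "4"]

# Hypothetical precedence constraints (an assumption for illustration):
# 1 comes first, the A and B chains are independent, 4 comes last.
PRECEDES = [("1", "2A"), ("1", "2B"), ("2A", "3A"),
            ("2B", "3B"), ("3A", "4"), ("3B", "4")]

def conforms(sequence, precedes=PRECEDES):
    """True if every 'a before b' constraint holds in the sequence."""
    position = {event: i for i, event in enumerate(sequence)}
    return all(position[a] < position[b] for a, b in precedes)

# Every total order of the events that the specification allows.
legal = [seq for seq in permutations(EVENTS) if conforms(seq)]
```

Under these assumed constraints, more than one sequence is legal, so an implementation that picks among them at run time is still correct by design.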
5Efficient Test Needs Determinism
- Expected response must be unique
- Compare observation with expectation
- Mismatch indicates an error
- Nondeterminism implies multiple correct behaviors
- Analyzing whether response is correct slows debug
- Finding all sequences consumes test creation time
- On-chip storage for BIST costs die area
- Off-chip storage needs expensive tester memory
- Comparing observations with multiple sequences takes time
- Faults causing alternate correct behavior lower coverage
- Divide-and-conquer doesn't allow chip-level testing
- Waiting for naturally deterministic state provides insufficient observability
6How to Make GALS Deterministic
- Each synchronous block (SB) has deterministic response to a given block input sequence
- To eliminate all nondeterminism in the GALS system, make the input sequence of each SB deterministic
- Each SB must receive each transition on each of its asynchronous inputs during a local clock cycle which is known in advance
- Transition not recognized if it occurs earlier than expected
- Local clock stops to wait for late transition
- Existing deterministic GALS methodologies do this by constraining the async data to a certain profile
- Side effect of conservative metastability prevention
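The rule for early and late transitions can be sketched as a toy simulation. Everything here (the function name, a single asynchronous input, time measured in clock periods) is an assumption made for illustration; it only shows that the cycle in which the block sees the transition is fixed, while wall-clock time absorbs the variation.

```python
def run_block(arrival_time, expected_cycle, period, n_cycles):
    """Simulate one synchronous block with a single asynchronous input.

    Returns (cycle in which the transition is seen, time the clock stopped).
    """
    t = 0.0            # wall-clock time at the start of the current cycle
    seen_cycle = None
    stalled = 0.0
    for cycle in range(n_cycles):
        if cycle == expected_cycle:
            if arrival_time > t:
                # Late transition: the local clock stops until it arrives.
                stalled = arrival_time - t
                t = arrival_time
            # An early transition was ignored until now; the input is only
            # recognized in the cycle that was fixed in advance.
            seen_cycle = cycle
        t += period
    return seen_cycle, stalled

early = run_block(arrival_time=0.5, expected_cycle=3, period=1.0, n_cycles=6)
late = run_block(arrival_time=4.5, expected_cycle=3, period=1.0, n_cycles=6)
```

Both runs report the same cycle number; only the late case pays with stopped-clock time.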
7Low-Bandwidth Async Data (DSP)
- Nilsson and Torkelson, "A Monolithic Digital Clock Generator for On-Chip Clocking of Custom DSPs," JSSC, May 1996
- Asynchronous communication and local clocking are mutually exclusive
- Clock stops synchronously after a deterministic local event
- Clock restarts asynchronously after an input event
[Flowchart: stop local clock -> enable SB inputs -> new async input data? (N: keep waiting, Y: continue) -> sample async data -> disable SB inputs -> start local clock -> local processing -> repeat]
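The flowchart steps can be written as a loop. This is a behavioral sketch only: the function name, the queue standing in for the asynchronous input, and the `process` callback are assumptions, and waiting for data is modeled as taking the next queued item.

```python
from collections import deque

def dsp_block(async_inputs, process):
    """Trace the stop / sample / restart loop over a queue of async inputs."""
    trace = []
    pending = deque(async_inputs)
    while pending:
        trace.append("stop local clock")
        trace.append("enable SB inputs")
        data = pending.popleft()      # block waits here for new async data
        trace.append(f"sample {data}")
        trace.append("disable SB inputs")
        trace.append("start local clock")
        # Local processing runs with the inputs disabled, so communication
        # and clocked computation never overlap.
        trace.append(f"result {process(data)}")
    return trace

trace = dsp_block([1, 2], lambda x: x * 2)
```

Because inputs are sampled only while the clock is stopped, the block's behavior depends on the order of input data and not on when each item arrived.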
8Constant Async Data (STARI)
[Figure: Sync Block A (Output Port) and Sync Block B (Input Port), clocked by Clk, connected through a self-timed FIFO with bundled data; req, ack, and data signals link the ports and the FIFO stages]
- Greenstreet, "Implementing a STARI Chip," 1995 ICCD
- Initialize self-timed FIFO to half full
- Add and remove data at equal rates
- FIFO never asynchronously becomes empty or full
- Each end of FIFO synchronized to local clock
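A toy occupancy model shows why a half-full start with equal rates keeps the FIFO away from its limits. The depth, cycle count, and the random ordering used to stand in for clock skew are assumptions for illustration, not STARI parameters.

```python
import random

def stari_occupancy(depth=8, cycles=1000, seed=0):
    """Track min and max FIFO occupancy with one add and one remove per cycle."""
    rng = random.Random(seed)
    occupancy = depth // 2            # initialized to half full
    lowest = highest = occupancy
    for _ in range(cycles):
        # Skew between the two ends decides which operation lands first.
        for op in rng.sample(["add", "remove"], 2):
            occupancy += 1 if op == "add" else -1
            lowest = min(lowest, occupancy)
            highest = max(highest, occupancy)
    return lowest, highest

lowest, highest = stari_occupancy()
```

In this model the occupancy never drifts more than one entry from half full, so neither end ever has to react to an empty or full FIFO.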
9Arbitrary Data (Synchro-Tokens)
[Figure: Synchronous Block A and Synchronous Block B, each with its own Clock and a Node on a shared Token Ring; data travels in each direction from an Output Port through FIFO Stages to an Input Port, with req, ack, and data signals between stages]
10Handshake Done in 1 Local Cycle
- Ensures that handshake delays do not cause the availability of ports to be nondeterministic
- Data need not pass through a port every cycle
- Output port doesn't receive acknowledge if FIFO is full
- Input port doesn't receive new request if FIFO is empty
d1 + d2 < TA
d3 + d4 < TB
[Figure: Output Port, two FIFO Stages, and Input Port linked by req, ack, and data signals, annotated with delays d1 to d4 and periods TA and TB]
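The one-cycle handshake requirement amounts to a round-trip delay at each port that is shorter than that block's local clock period. A minimal check, with hypothetical delay values in picoseconds and reading TA and TB as the two local clock periods (an assumption):

```python
def handshake_fits(delay_out, delay_back, clock_period):
    """True if a port's handshake round trip completes within one local cycle."""
    return delay_out + delay_back < clock_period

# Hypothetical numbers, chosen only to exercise the check.
T_A, T_B = 1000, 1250          # local clock periods of blocks A and B (ps)
d1, d2 = 300, 350              # handshake delays at the output port (ps)
d3, d4 = 700, 600              # handshake delays at the input port (ps)

output_port_ok = handshake_fits(d1, d2, T_A)   # 650 < 1000
input_port_ok = handshake_fits(d3, d4, T_B)    # 1300 < 1250 fails
```

A failing port would mean its availability in a given cycle depends on analog delay, which is the nondeterminism this constraint rules out.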
11Data Bundled to its Handshake
- Ensures data received by FIFO stages and input port are deterministic
- Data may have to wait longer if receiver is unavailable
d1 > d2
d3 > d4
d5 > d6
[Figure: Output Port, two FIFO Stages, and Input Port linked by req, ack, and data signals, annotated with delays d1 to d6]
12Req Handshakes Bundled to Token
- Data ports enabled by the token
- Data added to tail of empty FIFO just before departing token disables output port
- Same data deterministically available at head of FIFO when token enables input port
d1 > d2 + d3 + d4
[Figure: the Nodes of Synchronous Block A and Synchronous Block B joined by the Token Ring (delay d1); Output Port, two FIFO Stages, and Input Port joined by req and data signals (delays d2, d3, d4)]
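The bundling constraint says the token must be slower than the request path it races through the FIFO. A sketch of the margin calculation, with made-up delays in picoseconds (the function name and the numbers are assumptions):

```python
def token_margin(token_delay, path_delays):
    """Slack by which the token trails the handshake path; must be positive."""
    return token_delay - sum(path_delays)

# Hypothetical delays: d1 is the token ring delay, the list holds the
# per-stage request delays d2, d3, d4 through the FIFO.
margin = token_margin(900, [200, 250, 300])
data_ready_before_token = margin > 0
```

With a positive margin, data written just before the token departs is already at the head of the FIFO when the token enables the input port, whatever the exact stage delays are.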
13Ack Handshakes Bundled to Token
- Data removed from head of full FIFO, leaving a bubble
- Token races bubble to other synchronous block
- FIFO space deterministically available for new data as soon as arriving token enables output port
d1 > d5 + d6 + d7
[Figure: the Nodes of Synchronous Block A and Synchronous Block B joined by the Token Ring (delay d1); Output Port, two FIFO Stages, and Input Port joined by ack and data signals (delays d5, d6, d7)]
14Hold and Recycle Counters in Node
- Node counts local clock cycles between token events
- Hold counter: cycles from capture to release
- Recycle counter: cycles from release to next capture
- Decrementing counters typically start at < 10
- Associated data ports enabled while holding token
- Local clock enabled while holding or recycling
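[Figure: Node 1 and Node 2. Token Captured starts the Hold Counter; while it is > 0, Data Port Enable is asserted; at 0 the node issues Release Token and starts the Recycle Counter; Clock Enable goes to the stoppable clock while a counter is > 0]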
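The counter behavior described on this slide can be sketched as a small state machine. The class and method names are hypothetical, and token arrival is reduced to an explicit `capture_token()` call; this is a behavioral model, not the node's actual circuit.

```python
from dataclasses import dataclass

@dataclass
class Node:
    hold_init: int       # cycles from capture to release
    recycle_init: int    # cycles from release to next expected capture
    hold: int = 0
    recycle: int = 0

    def capture_token(self):
        """Token arrives and is accepted: start the hold counter."""
        self.hold = self.hold_init
        self.recycle = 0

    @property
    def port_enable(self):
        return self.hold > 0                      # ports enabled while holding

    @property
    def clock_enable(self):
        return self.hold > 0 or self.recycle > 0  # holding or recycling

    def tick(self):
        """Advance one local clock cycle; return True when the token is released."""
        if self.hold > 0:
            self.hold -= 1
            if self.hold == 0:
                self.recycle = self.recycle_init
                return True
        elif self.recycle > 0:
            self.recycle -= 1
        return False

node = Node(hold_init=2, recycle_init=3)
node.capture_token()
released = [node.tick() for _ in range(5)]   # release happens on the 2nd tick
```

After the recycle counter expires without a new capture, `clock_enable` is false, which corresponds to the local clock stopping to wait for a late token.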
15Stoppable Clock for Internal SBs
- Not a gated clock
- Aligns local clock to token
- If token arrives early, ignore with SyncEn = 1
- If token is late, stop clock synchronously at a deterministic cycle with SyncEn → 0
- When late token arrives, restart clock asynchronously with AsyncEn → 1
[Figure: block diagram with Nodes, SyncEn, AsyncEn, En, Ring Oscillator, and Clk to SB; timing diagram of SyncEn, AsyncEn, En, and Clk for a late token and an early token]
16Externally Synchronized I/O SBs
- I/O block and environment are synchronized
- GALS chip provides synchronous output enable
- Environment must pause while Enable = 0
[Figure: I/O Synchronous Block (Node 1 and Node 2, each on a Token Ring) exchanging Enable and data with the Board or Tester, which contains Clock, Gated Clock, and a flip-flop]
17Debug and Test Features
- Test synchronous block with 1149.1-compliant TAP
- Test clock pin
- Interlocked mode for deterministic test and debug
- Independent mode for board test and mission mode
- Scan chains
- Boundary scan, P1500, ATPG, BIST, custom features
- Self-timed shifting
- Head and tail synchronized to test clock
- Deterministic breakpointing
- Test block withholds tokens from other blocks
- After clocks stop, deterministically control and observe internal state with scan chains
- Scannable hold and recycle registers for adjustable breakpoints
18Performance Cost of Determinism
- Worst case comparison with STARI
- Synchro-tokens assumptions
- Data added / removed every hold cycle
- Empty / fill half of FIFO each time token is held
- FIFO stage delay ¼ cycle
- Recycle count = 2x hold count so clocks don't stop
- Throughput = 1/3 of STARI
- Adjust parameters to optimize for expected
dataflow without risking nondeterminism
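One way to read the 1/3 figure, as a sketch under the slide's assumptions: a port moves one word per cycle only while its node holds the token, so its throughput relative to a channel that moves a word every cycle is the fraction of cycles spent holding. The function name and the assumption that the STARI reference moves one word every cycle are illustrative.

```python
from fractions import Fraction

def relative_throughput(hold_count, recycle_count):
    """Fraction of local cycles in which the data port is enabled."""
    return Fraction(hold_count, hold_count + recycle_count)

hold = 4                     # hypothetical hold count
recycle = 2 * hold           # recycle count = 2x hold count, per the slide
ratio = relative_throughput(hold, recycle)
```

The ratio is independent of the absolute hold count, which is why the hold and recycle values can be tuned to the expected dataflow without changing the deterministic behavior.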
19Area Overhead Estimate
- Sum library cell areas, divide by average 2-input gate area
- Two nodes per pair of communicating blocks
- In a system with N blocks, area < 136N^2
- Expect N = O(10)
- To compare with other GALS
- Don't count data ports (always needed)
- Don't count FIFO stages (always optional)
Component     2-Input Gates
Node          136
Data Port     13 + 4.5W
FIFO Stage    4 + 4.5W
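The N-squared bound follows from counting nodes: with two nodes per communicating pair and at most N(N-1)/2 pairs, the node count is at most N(N-1). A short sketch of that arithmetic, assuming the worst case where every pair of blocks communicates (the function name is hypothetical):

```python
NODE_GATES = 136   # per-node estimate from the slide, in 2-input gate equivalents

def node_overhead(n_blocks):
    """Upper bound on token-ring node area if every pair of blocks communicates."""
    pairs = n_blocks * (n_blocks - 1) // 2
    return 2 * pairs * NODE_GATES      # two nodes per communicating pair

overhead_10 = node_overhead(10)        # 90 nodes for N = 10
```

For N = 10 this gives 12240 gate equivalents, below the 136 * 10^2 = 13600 bound; real systems with fewer communicating pairs would need fewer nodes.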
20Conclusion
- Synchro-tokens: a deterministic GALS methodology
- Asynchronous communication channels, bundled data, optionally pipelined with self-timed FIFOs
- Token rings control data ports and local clocks
- Hold and recycle counters control in which local clock cycles tokens arrive and depart
- Clocks stop synchronously to wait for a late token and restart asynchronously when the token arrives
- Test and debug features compatible with IEEE standards
- Parameters can be chosen to optimize performance
- Assume a data profile without risking nondeterminism
- Modest area overhead for token ring nodes
21Time for Questions!