Synchro-Tokens: Eliminating Nondeterminism to Enable Chip-Level Test of Globally-Asynchronous Locally-Synchronous SoCs

1
Synchro-Tokens: Eliminating Nondeterminism to
Enable Chip-Level Test of Globally-Asynchronous
Locally-Synchronous SoCs
  • Matthew W. Heath, Wayne P. Burleson
  • University of Massachusetts Amherst, USA
  • Ian G. Harris
  • University of California Irvine, USA

This work was funded in part by NSF Grant No.
0204134, SRC Task 1075, and Intel Corporation.
2
Introduction
  • Definition of nondeterminism
  • Nondeterminism in design and test
  • Existing deterministic GALS methodologies
  • Synchro-tokens methodology
  • Why it's deterministic
  • Debug and test features
  • Performance and area cost estimation

3
Nondeterministic Sequences
  • Synchronizer flip-flop (inputs d and clk, output q) observed over
    clock cycles 1, 2, 3 while async input d changes from D1 to D2
  • Async input d switches just after clock rises: sequence at q is
    D1, D1, D2
  • Async input d switches just before clock rises: sequence at q is
    D1, D2, D2
  • Async input d switches as clock is rising, q goes metastable:
    sequence at q is D1, ?, D2 (the middle value resolves to D1 or D2)
4
Nondeterminism in Design is OK
  • Deterministic: engineer chooses at design time
  • Worst-case timing analysis for synchronous design
  • Nondeterministic: silicon chooses at run time
  • Asynchronous arbiters or GALS synchronizers

[Figure: a partially-ordered specification of events 1, 2A, 2B, 3A, 3B, 4, shown with event-to-cycle tables for a conforming implementation, a different cycle mapping, and a different sequence.]
5
Efficient Test Needs Determinism
  • Expected response must be unique
  • Compare observation with expectation
  • A mismatch indicates an error
  • Nondeterminism implies multiple correct behaviors
  • Analyzing whether response is correct slows debug
  • Finding all sequences consumes test creation time
  • On-chip storage for BIST costs die area
  • Off-chip storage needs expensive tester memory
  • Comparing observations with multiple sequences
    takes time
  • Faults causing alternate correct behavior lower
    coverage
  • Divide-and-conquer doesn't allow chip-level
    testing
  • Waiting for naturally deterministic state
    provides insufficient observability

6
How to Make GALS Deterministic
  • Each synchronous block (SB) has deterministic
    response to a given block input sequence
  • To eliminate all nondeterminism in the GALS
    system, make the input sequence of each SB
    deterministic
  • Each SB must receive each transition on each of
    its asynchronous inputs during a local clock
    cycle which is known in advance
  • Transition not recognized if it occurs earlier
    than expected
  • Local clock stops to wait for late transition
  • Existing deterministic GALS methodologies do this
    by constraining the async data to a certain
    profile
  • Side effect of conservative metastability
    prevention
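
A minimal sketch of the rule above, contrasted with a plain synchronizer. The function names, the unit clock period, and the idea of measuring a "stall" are illustrative assumptions, not part of the presented design. With a plain synchronizer the cycle in which a transition is seen depends on its arrival time; under the rule above the cycle is fixed in advance and only the wall-clock stall varies.

```python
import math


def naive_sample_cycle(arrival_time, period=1.0):
    """Plain synchronizer: the transition is seen at the first clock edge
    at or after it arrives, so the cycle number depends on arrival time."""
    return math.ceil(arrival_time / period)


def deterministic_sample(arrival_time, expected_cycle, period=1.0):
    """Rule from the slide: the transition is recognized in a cycle known
    in advance. An early transition waits; a late one stops the clock.

    Returns (cycle_seen, stall), where stall is the time the local clock
    is stopped waiting for a late transition (0 if it was early).
    """
    expected_edge = expected_cycle * period
    stall = max(0.0, arrival_time - expected_edge)
    return expected_cycle, stall


if __name__ == "__main__":
    for t in (2.2, 2.9, 3.4):
        print(t, naive_sample_cycle(t), deterministic_sample(t, expected_cycle=3))
```

The naive cycle changes as the arrival time moves across a clock edge, while the deterministic cycle stays at 3 in every case.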

7
Low-Bandwidth Async Data (DSP)
  • Nilsson & Torkelson, "A Monolithic Digital Clock
    Generator for On-Chip Clocking of Custom DSPs,"
    JSSC, May 1996
  • Asynchronous communication and local clocking are
    mutually exclusive
  • Clock stops synchronously after a deterministic
    local event
  • Clock restarts asynchronously after an input event

[Flowchart: stop local clock → enable SB inputs → new async input data? (N: wait; Y: continue) → sample async data → disable SB inputs → start local clock → local processing → repeat]
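
The flowchart steps can be sketched as a loop. This is an illustrative model only: `run_block`, its `work_cycles` parameter, and the use of a queue to stand in for "wait until data arrives" are assumptions, not taken from the cited paper. The point it shows is that the order of local events does not depend on when the inputs arrive.

```python
from collections import deque


def run_block(async_inputs, work_cycles=3):
    """Model one synchronous block that samples inputs only while its
    clock is stopped. Returns the ordered log of local events."""
    pending = deque(async_inputs)
    log = []
    while pending:
        log.append("stop local clock")
        log.append("enable SB inputs")
        # Waiting for "new async input data?" is modeled by taking the next
        # queued item; real hardware would idle here with the clock stopped.
        data = pending.popleft()
        log.append(f"sample {data}")
        log.append("disable SB inputs")
        log.append("start local clock")
        for cycle in range(work_cycles):
            log.append(f"process {data} (cycle {cycle})")
    return log


if __name__ == "__main__":
    for event in run_block(["a", "b"], work_cycles=2):
        print(event)
```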
8
Constant Async Data (STARI)
[Figure: Sync Block A (output port) and Sync Block B (input port), clocked by Clk, connected by a self-timed FIFO with bundled data; adjacent FIFO stages exchange req, ack, and data.]
  • Greenstreet, "Implementing a STARI Chip," ICCD
    1995
  • Initialize self-timed FIFO to half full
  • Add and remove data at equal rates
  • FIFO never asynchronously becomes empty or full
  • Each end of FIFO synchronized to local clock
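
A small occupancy model of the half-full FIFO described above. It is a sketch under stated assumptions: one item is added and one removed per cycle, and `skew` is a hypothetical parameter standing in for the receiver starting a few cycles late. It returns the lowest and highest occupancy reached, so a result that stays strictly between 0 and `depth` means the FIFO never became empty or full in this model.

```python
def stari_occupancy_range(depth, cycles, skew=0):
    """Return (lowest, highest) occupancy of a FIFO initialized half full.

    Each cycle the transmitter adds one item; the receiver removes one item
    per cycle once `skew` cycles have passed (hypothetical start-up offset).
    """
    occupancy = depth // 2
    low = high = occupancy
    for cycle in range(cycles):
        occupancy += 1                  # transmitter inserts
        high = max(high, occupancy)
        if cycle >= skew:
            occupancy -= 1              # receiver removes
            low = min(low, occupancy)
    return low, high


if __name__ == "__main__":
    print(stari_occupancy_range(depth=8, cycles=100))
    print(stari_occupancy_range(depth=8, cycles=100, skew=2))
```

With equal rates the occupancy only oscillates around its starting point; a skew of half the depth or more is what would push this model to full.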

9
Arbitrary Data (Synchro-Tokens)
[Figure: Synchronous Block A and Synchronous Block B, each with its own clock and a token-ring node, joined by a token ring and by data channels in both directions; each channel runs from an output port through FIFO stages to an input port using req, ack, and data.]
10
Handshake Done in 1 Local Cycle
  • Ensures that handshake delays do not cause the
    availability of ports to be nondeterministic
  • Data need not pass through a port every cycle
  • Output port doesn't receive acknowledge if FIFO
    is full
  • Input port doesn't receive new request if FIFO is
    empty

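d1 + d2 < TA
d3 + d4 < TB
[Figure: output port (local clock period TA) → FIFO stages → input port (local clock period TB), linked by req, ack, and data; d1 and d2 label the req and ack delays at the output port, d4 and d3 the req and ack delays at the input port.]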
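
A hedged sketch of the timing check implied by this slide, reading the two constraints as "the handshake delays at each port sum to less than that port's local clock period". That reading, the function name, and the returned dictionary keys are assumptions made for illustration.

```python
def one_cycle_handshake_ok(d1, d2, d3, d4, t_a, t_b):
    """Check that each port completes its handshake within one local cycle.

    d1, d2: handshake delays at the output port (clock period t_a)
    d3, d4: handshake delays at the input port (clock period t_b)
    All values are in the same time unit.
    """
    return {
        "output_port": d1 + d2 < t_a,
        "input_port": d3 + d4 < t_b,
    }


if __name__ == "__main__":
    print(one_cycle_handshake_ok(d1=2, d2=3, d3=4, d4=4, t_a=10, t_b=7))
```

A port that fails the check could see its handshake finish in either of two cycles, which is the nondeterminism the slide is ruling out.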
11
Data Bundled to its Handshake
  • Ensures data received by FIFO stages and input
    port are deterministic
  • Data may have to wait longer if receiver is
    unavailable

d1 > d2
d3 > d4
d5 > d6
[Figure: output port → FIFO stages → input port, linked by req, ack, and data; on each link the req delay (d1, d3, d5) exceeds the matching data delay (d2, d4, d6).]
12
Req Handshakes Bundled to Token
  • Data ports enabled by the token
  • Data added to tail of empty FIFO just before
    departing token disables output port
  • Same data deterministically available at head of
    FIFO when token enables input port

d1 > d2 + d3 + d4
[Figure: Synchronous Block A (node, output port) and Synchronous Block B (node, input port) joined by the token ring and by FIFO stages carrying req and data; d1 labels the token-ring delay and d2, d3, d4 the req delays through the FIFO.]
13
Ack Handshakes Bundled to Token
  • Data removed from head of full FIFO, leaving a
    bubble
  • Token races bubble to other synchronous block
  • FIFO space deterministically available for new
    data as soon as arriving token enables output
    port

d1 > d5 + d6 + d7
[Figure: Synchronous Block A (node, output port) and Synchronous Block B (node, input port) joined by the token ring and by FIFO stages carrying ack and data; d1 labels the token-ring delay and d5, d6, d7 the ack delays through the FIFO.]
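
The two token-bundling constraints (this slide and the previous one) can be checked together. This is a sketch that assumes the token-ring delay must exceed the summed per-stage req delays in one direction and the summed per-stage ack delays in the other; the function name and the slack values it returns are illustrative.

```python
def token_bundling_slack(token_delay, req_delays, ack_delays):
    """Return the timing slack for both token-bundling constraints.

    token_delay: delay of the token between the two nodes (d1 on the slides)
    req_delays:  per-stage req delays the data must cross before the token lands
    ack_delays:  per-stage ack delays the bubble must cross before the token lands
    A positive slack means the data (or bubble) wins the race against the token.
    """
    return {
        "req": token_delay - sum(req_delays),
        "ack": token_delay - sum(ack_delays),
    }


def is_deterministic(token_delay, req_delays, ack_delays):
    """True when both races are won with strictly positive slack."""
    slack = token_bundling_slack(token_delay, req_delays, ack_delays)
    return all(value > 0 for value in slack.values())


if __name__ == "__main__":
    print(token_bundling_slack(10, [2, 3, 4], [3, 3, 3]))
    print(is_deterministic(10, [2, 3, 4], [3, 3, 5]))
```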
14
Hold and Recycle Counters in Node
  • Node counts local clock cycles between token
    events
  • Hold counter: cycles from capture to release
  • Recycle counter: cycles from release to next
    capture
  • Decrementing counters typically start at < 10
  • Associated data ports enabled while holding token
  • Local clock enabled while holding or recycling

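[Figure: node block diagram for Node 1 and Node 2. Token Captured starts the hold counter; hold counter > 0 drives Data Port Enable; hold counter = 0 drives Release Token and starts the recycle counter; either counter > 0 drives Clock Enable to the stoppable clock.]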
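
A minimal model of the two counters, written from the bullets above. It assumes the node starts by capturing the token, that both counts are at least 1, and that a late token simply stops the local clock (so it never shows up as an extra local cycle). The function name is hypothetical.

```python
def port_enable_schedule(hold, recycle, cycles):
    """Return, per local clock cycle, whether the data ports are enabled.

    hold:    local cycles from token capture to release (>= 1)
    recycle: local cycles from release to the next capture (>= 1)
    """
    schedule = []
    hold_count, recycle_count = hold, 0     # assume the token was just captured
    for _ in range(cycles):
        if hold_count > 0:
            schedule.append(True)           # holding the token: ports enabled
            hold_count -= 1
            if hold_count == 0:
                recycle_count = recycle     # release token, start recycling
        else:
            schedule.append(False)          # recycling: ports disabled
            recycle_count -= 1
            if recycle_count == 0:
                hold_count = hold           # token recaptured at a known cycle
    return schedule


if __name__ == "__main__":
    print(port_enable_schedule(hold=2, recycle=3, cycles=7))
```

Because the schedule is a function of the two counter values only, the local cycles in which the ports are enabled are known in advance.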
15
Stoppable Clock for Internal SBs
  • Not a gated clock
  • Aligns local clock to token
  • If token arrives early, ignore with SyncEn = 1
  • If token is late, stop clock synchronously at a
    deterministic cycle with SyncEn → 0
  • When late token arrives, restart clock
    asynchronously with AsyncEn → 1

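[Figure: stoppable clock circuit, in which the nodes drive SyncEn and AsyncEn to form En for a ring oscillator that produces Clk to the SB, with a timing diagram of SyncEn, AsyncEn, En, and Clk for an early token and a late token.]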
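
The early and late cases can be illustrated by listing clock edge times. This is a simplified sketch: it assumes the clock restarts exactly when the late token arrives and ignores oscillator start-up time; `clock_edges` and its parameters are illustrative names.

```python
def clock_edges(period, n_edges, wait_cycle, token_time):
    """Rising-edge times of a stoppable local clock.

    wait_cycle: index of the local cycle in which the token is expected
    token_time: time at which the token actually arrives
    """
    edges = []
    t = 0
    for cycle in range(n_edges):
        if cycle == wait_cycle and token_time > t:
            # Late token: the clock stopped synchronously before this edge
            # and restarts asynchronously when the token arrives.
            t = token_time
        # An early token (token_time <= t) changes nothing.
        edges.append(t)
        t += period
    return edges


if __name__ == "__main__":
    print("early:", clock_edges(10, 4, wait_cycle=2, token_time=5))
    print("late: ", clock_edges(10, 4, wait_cycle=2, token_time=27))
```

In both runs the token is consumed in local cycle 2; only the wall-clock time of that edge differs.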
16
Externally Synchronized I/O SBs
  • I/O block and environment are synchronized
  • GALS chip provides synchronous output enable
  • Environment must pause while Enable = 0

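[Figure: I/O synchronous block connected to the board or tester. Node 1 and Node 2 sit on token rings to other blocks; the nodes produce Enable, which gates the clock (Gated Clock) of the I/O block's data flip-flops and is sent off-chip alongside data and Clock.]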
17
Debug and Test Features
  • Test synchronous block with 1149.1-compliant TAP
  • Test clock pin
  • Interlocked mode for deterministic test and debug
  • Independent mode for board test and mission mode
  • Scan chains
  • Boundary scan, P1500, ATPG, BIST, custom features
  • Self-timed shifting
  • Head and tail synchronized to test clock
  • Deterministic breakpointing
  • Test block withholds tokens from other blocks
  • After clocks stop, deterministically control and
    observe internal state with scan chains
  • Scannable hold and recycle registers for
    adjustable breakpoints

18
Performance Cost of Determinism
  • Worst-case comparison with STARI
  • Synchro-tokens assumptions
  • Data added / removed every hold cycle
  • Empty / fill half of FIFO each time token is held
  • FIFO stage delay = ¼ cycle
  • Recycle count = 2x hold count so clocks don't
    stop
  • Throughput = 1/3 of STARI
  • Adjust parameters to optimize for expected
    dataflow without risking nondeterminism
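
The 1/3 figure follows from the counter settings above, under the assumption that STARI moves one word every cycle while synchro-tokens moves one word per hold cycle and none while recycling. A short arithmetic sketch (the function name is illustrative):

```python
from fractions import Fraction


def relative_throughput(hold, recycle):
    """Fraction of local cycles in which data moves, relative to a channel
    that moves one word every cycle (the STARI baseline assumed here)."""
    return Fraction(hold, hold + recycle)


if __name__ == "__main__":
    hold = 4
    print(relative_throughput(hold, recycle=2 * hold))
```

With the recycle count set to twice the hold count the ratio is hold / (hold + 2 × hold) = 1/3 for any hold value; a shorter recycle count raises throughput at the cost of the clock stopping more often.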

19
Area Overhead Estimate
  • Sum library cell areas and divide by the area of an
    average 2-input gate
  • Two nodes per pair of communicating blocks
  • In a system with N blocks, area < 136N² 2-input gates
  • Expect N = O(10)
  • To compare with other GALS
  • Don't count data ports (always needed)
  • Don't count FIFO stages (always optional)

Component     2-Input Gates
Node          136
Data Port     13 + 4.5W
FIFO Stage    4 + 4.5W
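
A worked version of the estimate, as a sketch. It assumes the worst case where every pair of blocks communicates, reads W as the data width in bits, and reads the port and stage entries as sums (13 + 4.5W and 4 + 4.5W); none of these readings is confirmed by the slides. With N blocks there are at most N(N-1)/2 pairs and two nodes per pair, giving 136·N(N-1) gates, which is below 136N².

```python
NODE_GATES = 136  # 2-input gate equivalents per token-ring node (from the slide)


def node_area_upper_bound(n_blocks):
    """Node area if every pair of blocks communicates (worst case)."""
    pairs = n_blocks * (n_blocks - 1) // 2
    return 2 * pairs * NODE_GATES


def data_port_gates(width):
    """Data port area, reading the slide's entry as 13 + 4.5 * W."""
    return 13 + 4.5 * width


def fifo_stage_gates(width):
    """FIFO stage area, reading the slide's entry as 4 + 4.5 * W."""
    return 4 + 4.5 * width


if __name__ == "__main__":
    n = 10
    print(node_area_upper_bound(n), "<", NODE_GATES * n * n)
    print(data_port_gates(32), fifo_stage_gates(32))
```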
20
Conclusion
  • Synchro-tokens: a deterministic GALS methodology
  • Asynchronous communication channels, bundled
    data, optionally pipelined with self-timed FIFOs
  • Token rings control data ports and local clocks
  • Hold and recycle counters control which local
    clock cycles tokens arrive and depart
  • Clocks stop synchronously to wait for a late
    token and restart asynchronously when the token
    arrives
  • Test and debug features compatible with IEEE
    standards
  • Parameters can be chosen to optimize performance
  • Assume a data profile without risking
    nondeterminism
  • Modest area overhead for token ring nodes

21
Time for Questions!