Clockless Computing - PowerPoint PPT Presentation

Slides: 39
Provided by: Montek5
Learn more at: http://www.cs.unc.edu

Transcript and Presenter's Notes


1
Clockless Computing
  • Montek Singh
  • Thu, Sep 13, 2007

2
Dynamic Logic Pipelines (contd.)
  • Drawbacks of Williams' PS0 Pipelines
  • Lookahead Pipelines (Singh/Nowick 2000)
  • High-Capacity Pipelines (Singh/Nowick 2000)

3
Drawbacks of PS0 Pipelining
  • Poor throughput
  • long cycle time: 6 events per cycle
  • data tokens are forced far apart in time
  • Limited storage capacity
  • at most 50% of stages can hold distinct tokens
  • data tokens must be separated by at least one spacer
  • My research goals: address both issues
  • while still maintaining very low latency
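
The 50% capacity limit follows directly from the spacer rule. A minimal sketch, assuming a linear pipeline in which every stage holds either a data token or a spacer; the helper names `densest_occupancy` and `capacity` are hypothetical and exist only for this illustration:

```python
def densest_occupancy(num_stages: int, needs_spacer: bool) -> list[str]:
    """Return the densest legal occupancy pattern for a linear pipeline.

    "D" marks a stage holding a distinct data token, "S" a spacer.
    Hypothetical helper, not taken from the slides.
    """
    if not needs_spacer:
        return ["D"] * num_stages
    # Tokens must be separated by at least one spacer, so alternate D and S.
    return ["D" if i % 2 == 0 else "S" for i in range(num_stages)]


def capacity(pattern: list[str]) -> float:
    """Fraction of stages that hold a distinct data token."""
    return pattern.count("D") / len(pattern)


ps0_like = densest_occupancy(8, needs_spacer=True)
print(ps0_like, capacity(ps0_like))
```

With an even number of stages the spacer rule gives exactly half the stages to data; a protocol that needs no spacers could, in this model, fill every stage.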

4
Recent Approaches
  • 3 novel styles for high-speed async pipelining
  • MOUSETRAP Pipelines (Singh/Nowick, TAU-00, ICCD-01)
  • Lookahead Pipelines (LP) (Singh/Nowick, Async-00)
  • High-Capacity Pipelines (HC) (Singh/Nowick, WVLSI-00)
  • Goal: significantly improve throughput of PS0
  • Two Distinct Strategies
  • LP: introduce protocol optimizations
  • shave off components from the critical cycle
  • HC: fundamentally new protocol
  • greater concurrency: loosely-coupled stages

5
Outline
  • New Asynchronous Pipelines
  • MOUSETRAP Pipelines
  • Lookahead Pipelines (LP)
  • High-Capacity Pipelines (HC)

6
Lookahead Pipeline Styles
  • Singh and Nowick
  • Async-2000
  • Best Paper Award

7
Lookahead Pipelines: Strategy 1
  • Use non-neighbor communication
  • stage receives information from multiple later stages
  • allows early evaluation

Benefit: the stage gets a head start on its next cycle
8
Lookahead Pipelines: Strategy 2
  • Use early completion detection
  • completion detector moved before the stage (not after)
  • stage indicates "early done" in parallel with computation

[Figure: stage with an early completion detector]
Benefit: again, the stage gets a head start on its next cycle
9
Lookahead Pipelines: Overview
  • 5 New Designs
  • Dual-Rail Data Signaling
  • LP3/1: early evaluation
  • LP2/2: early done
  • LP2/1: early evaluation + early done
  • Single-Rail Bundled-Data Signaling
  • LPSR2/2: early done
  • LPSR2/1: early evaluation + early done

10
Dual-Rail Design 1: LP3/1
[Figure: stages N, N+1, N+2, each with a Processing Block and a Completion Detector; labels: PC, Eval, Data in, Data out, From N+2]
  • Optimization: early evaluation
  • each stage has two control inputs, from stages N+1 and N+2
  • Idea: shorten the precharge phase
  • terminate precharge early, when N+2 is done evaluating

11
LP3/1 Protocol
  • PRECHARGE N: when N+1 completes evaluation
  • EVALUATE N: when N+2 completes evaluation
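
The two rules above can be sketched as a small decision function. This is a simplified model, not the circuit from the slides: it assumes the two control inputs are merged so that stage N precharges only while N+1 is done and N+2 is not yet done. The name `lp31_phase` and the boolean encoding of the "done" signals are assumptions made for illustration.

```python
def lp31_phase(n1_done: bool, n2_done: bool) -> str:
    """Phase of stage N given the 'done' outputs of stages N+1 and N+2.

    Assumed merge rule: precharge only while N+1 has finished evaluating
    and N+2 has not. Once N+2 finishes, precharge is cut short and
    stage N is enabled to evaluate again.
    """
    if n1_done and not n2_done:
        return "precharge"
    return "evaluate"


# One pass of a data token through N+1 and N+2, as seen by stage N.
trace = [(False, False), (True, False), (True, True), (False, True)]
for n1_done, n2_done in trace:
    print(n1_done, n2_done, lp31_phase(n1_done, n2_done))
```

In this model the precharge window closes as soon as N+2 signals done, which is the "early evaluation" head start; stage N does not wait for N+1 to finish its own precharge.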

[Figure: timing of stages N, N+1, N+2; labels: N evaluates, N+1 evaluates, N+2 evaluates, N+2 indicates done]
12
LP3/1: Comparison with PS0
[Figure: cycle through stages N, N+1, N+2 under each protocol]
  • LP3/1: only 4 events in cycle!
  • PS0: 6 events in cycle
13
LP3/1 Performance
[Figure: timing diagram with the saved path highlighted]
Savings over PS0: 1 Precharge + 1 Completion Detection
14
LP3/1 Inside a Stage
  • Merging 2 Control Inputs

[Figure: stage control inputs; labels: old Eval, early Eval]
  • Timing Issues
  • must satisfy several simple constraints
  • Example: PC must arrive before Eval is de-asserted
  • 1-sided timing requirement
  • easily satisfied in practice

15
Dual-Rail Design 2: LP2/2
  • Optimization: early done
  • Idea: move the completion detector before the processing block
  • stage indicates when it is about to precharge/evaluate

[Figure: early Completion Detector placed before the Processing Block; labels: early done, Data in, Data out]
16
LP2/2 Completion Detector
  • Modified completion detectors needed
  • Done = 1 when the stage starts evaluating and its inputs are valid
  • Done = 0 when the stage starts precharging
  • implemented with an asymmetric C-element
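
The behavior of such a detector can be sketched in software. This is a behavioral model only, assuming two boolean inputs (whether the stage is being told to evaluate, and whether its inputs are valid); the class name `EarlyDoneDetector` and its method are hypothetical. The asymmetry is that both inputs are needed for Done to rise, but the control input alone makes it fall.

```python
class EarlyDoneDetector:
    """Behavioral model of an asymmetric C-element used as a done detector."""

    def __init__(self) -> None:
        self.done = 0

    def update(self, evaluating: bool, inputs_valid: bool) -> int:
        if evaluating and inputs_valid:
            self.done = 1  # both conditions are required to rise
        elif not evaluating:
            self.done = 0  # starting precharge alone is enough to fall
        # Otherwise (evaluating, inputs not valid): hold the previous value.
        return self.done


det = EarlyDoneDetector()
print(det.update(True, False))   # stage enabled, inputs not yet valid
print(det.update(True, True))    # inputs arrive
print(det.update(True, False))   # inputs reset while stage still evaluates
print(det.update(False, False))  # stage starts precharging
```

The third call shows why a plain AND gate would not do: once Done has risen, it holds until the stage itself starts precharging, regardless of what the inputs do.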

17
LP2/2 Protocol
  • Completion Detection
  • performed in parallel with evaluation/precharge of the stage

[Figure: timing of stages N, N+1, N+2; labels: N evaluates, N+1 evaluates]
18
LP2/2 Performance
[Figure: timing diagram with numbered events]
LP2/2 savings over PS0: 1 Evaluation + 1 Precharge
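
The savings quoted for LP3/1 and LP2/2 can be turned into a rough cycle-time comparison. A minimal sketch, assuming the six events of a PS0 cycle are three evaluations, two completion detections and one precharge (an assumption consistent with the savings listed on the slides, but not stated there), and ignoring any extra gate delay needed to merge control signals. The delay values are made-up unit delays, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageDelays:
    t_eval: float   # evaluation delay of one stage
    t_prech: float  # precharge delay of one stage
    t_cd: float     # completion-detector delay


def ps0_cycle(d: StageDelays) -> float:
    # Assumed breakdown of the 6 events: 3 evaluations, 2 detections, 1 precharge.
    return 3 * d.t_eval + 2 * d.t_cd + d.t_prech


def lp31_cycle(d: StageDelays) -> float:
    # LP3/1 saves one precharge and one completion detection.
    return ps0_cycle(d) - d.t_prech - d.t_cd


def lp22_cycle(d: StageDelays) -> float:
    # LP2/2 saves one evaluation and one precharge.
    return ps0_cycle(d) - d.t_eval - d.t_prech


unit = StageDelays(t_eval=1.0, t_prech=1.0, t_cd=1.0)  # hypothetical unit delays
print(ps0_cycle(unit), lp31_cycle(unit), lp22_cycle(unit))
```

With unit delays each lookahead style removes two of the six events from the cycle; with real delays the relative benefit depends on how precharge, evaluation and detection times compare.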
19
Dual-Rail Design 3: LP2/1
  • Hybrid of LP3/1 and LP2/2; combines
  • early evaluation of LP3/1
  • early done of LP2/2

20
Lookahead Pipelines: Overview
  • 5 New Designs
  • Dual-Rail Data Signaling
  • LP3/1: early evaluation
  • LP2/2: early done
  • LP2/1: early evaluation + early done
  • Single-Rail Bundled-Data Signaling
  • LPSR2/2: early done
  • LPSR2/1: early evaluation + early done

21
Single-Rail Design: LPSR2/1
  • Derivative of LP2/1, adapted to single-rail
  • bundled data: matched delays instead of completion detectors

22
Inside an LPSR2/1 Stage
23
LPSR2/1 Protocol
[Figure: timing of stages N, N+1, N+2; label: N evaluates]
24
FIFO Results (simulations)
0.19µ CMOS, 3.3 V, 300 K
[Figure: results for dual-rail and single-rail designs]
  • LP dual-rail: over 80% faster than Williams' PS0
  • comparable latency
  • LP single-rail: even faster

25
Practicality of Gate-Level Pipelining
When the datapath is wide
  • Can often split into narrow streams
  • Use a localized completion detector for each stream
  • need to examine only a few bits
  • → small fan-in
  • send done to only a few gates
  • → small fan-out
  • completion detection is fairly low cost!

26
High-Capacity Pipelines
  • Singh/Nowick WVLSI-00, ISSCC-02, Async-02

27
HC Pipeline Style
  • High-Capacity Pipelines (HC)
  • bundled datapaths; dynamic logic function blocks
  • latch-free: no explicit latches needed
  • dynamic logic provides implicit latching
  • novel highly-concurrent protocol maximizes storage capacity
  • traditional latch-free approaches: spacers limit capacity to 50%
  • Key Idea: obtain greater control of each stage's operation
  • separate control of pull-up/pull-down
  • result: new "isolate" phase
  • stage holds its outputs and is impervious to input changes
  • Advantage: each stage can hold a distinct data item
  • 100% storage capacity
  • Extra Benefit: obtain greater concurrency
  • → high throughput

28
HC Basic Structure
  • Key Idea
  • 2 independent control signals
  • pc controls precharge
  • eval controls evaluation
  • Allows novel 3-phase cycle
  • Evaluate
  • Isolate (hold)
  • Precharge

[Figure: stages N, N+1, N+2; labels: pc, eval, ack, and a delay element per stage]
29
HC Inside a Stage
  • Independent control of pull-up and pull-down
  • allows a new 3rd phase: isolate
  • pc asserted: precharge
  • eval asserted: evaluate
  • pc and eval both de-asserted: enter isolate (hold) phase
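
The three phases can be illustrated with a small behavioral model. This is a sketch under simplifying assumptions, not the transistor-level stage: `None` stands for a precharged (empty) output, evaluation is modeled as applying a function to the current input, and asserting pc and eval together is treated as an error. The class name `HCStage` and its `step` method are hypothetical.

```python
class HCStage:
    """Behavioral model of a stage with separate pc and eval controls."""

    def __init__(self, function):
        self.function = function
        self.output = None  # None models a precharged (empty) output

    def step(self, pc: bool, eval_: bool, data_in):
        if pc and eval_:
            # Assumption of this model: the two controls are never both asserted.
            raise ValueError("pc and eval must not be asserted together")
        if pc:
            self.output = None  # precharge: the stored item is deleted
            return "precharge"
        if eval_:
            self.output = self.function(data_in)  # evaluate: compute new output
            return "evaluate"
        return "isolate"  # hold: output kept, input changes ignored


stage = HCStage(lambda x: x + 1)
print(stage.step(False, True, 3), stage.output)
print(stage.step(False, False, 99), stage.output)
print(stage.step(True, False, 99), stage.output)
```

The second call is the point of the isolate phase: the input has changed, yet the stage still holds the result of the earlier evaluation, so the previous stage is free to move on.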

30
HC Protocol
[Figure: Stage N and Stage N+1, each cycling through Eval, Isolate, Precharge]
  • Our protocol: only 2 synchronization arcs
  • only 1 backward arc
  • once stage N+1 evaluates, N can complete its entire next cycle!
  • Most existing protocols: 3 synchronization arcs
  • 1 forward arc: data dependency
  • 2 backward arcs: control synchronization

31
Formal Specification of Controller
  • Problem: specification is too concurrent for direct synthesis
  • desired precharge condition: N and N+1 have evaluated the same data
  • problem: this condition is not uniquely captured by the given signals!
  • N may evaluate the next data item while N+1 is stuck on the current item!

32
Modified Specification of Controller
  • Solution: add a state variable, ok2pc
  • ok2pc records whether N+1 has absorbed N's data item
  • ok2pc resets immediately when N deletes the item (N precharges)
  • ok2pc is set when N+1 deletes the item (N+1 precharges)
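
The set and reset rules for ok2pc can be sketched as a tiny state machine. The set/reset events come from the bullets above; the initial value and the way ok2pc is combined with N+1's "has evaluated" signal to permit precharge are assumptions made for illustration, and the class and method names are hypothetical.

```python
class Ok2pc:
    """State variable for stage N's controller."""

    def __init__(self) -> None:
        self.value = True  # assumed initial state: pipeline starts empty

    def on_n_precharge(self) -> None:
        self.value = False  # reset when N deletes its item

    def on_n1_precharge(self) -> None:
        self.value = True  # set when N+1 deletes its item

    def may_precharge(self, n1_has_evaluated: bool) -> bool:
        # Assumed gating: N precharges only if N+1 has evaluated N's current item.
        return self.value and n1_has_evaluated


ok = Ok2pc()
print(ok.may_precharge(True))   # N+1 evaluated item A: N may precharge
ok.on_n_precharge()             # N deletes A, then evaluates item B
print(ok.may_precharge(True))   # N+1 still holds A: its signal is stale
ok.on_n1_precharge()            # N+1 deletes A
print(ok.may_precharge(False))  # N+1 has not evaluated B yet
print(ok.may_precharge(True))   # N+1 evaluated B: N may precharge again
```

The second query is the ambiguous case from the previous slide: N+1's "evaluated" signal is still high from the old item, and only the extra state variable tells the controller that it does not refer to N's new data.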

33
Controller implementation
  • Controller implementation is very simple
  • each signal implemented using a single gate
  • ok2pc typically off the critical path

34
HC Stage Implementation
[Figure: stage implementation; gates: NAND, INV; signals: eval, pc, ack, req, done; delay element]
35
HC Operation
[Figure: timing of stages N and N+1; labels: N evaluates, N+1 starts to evaluate, N precharges, N enables itself for next evaluation]
Cycle Time: 8 CMOS gate delays
36
Performance
[Figure: timing of stages N, N+1, N+2; labels: N evaluates, N+1 evaluates, N precharges, N enables itself for next evaluation]
37
FIFO Results (simulations)
0.19µ CMOS, 3.3 V, 300 K
[Figure: results for dual-rail and single-rail designs]
  • LP dual-rail: over 80% faster than Williams' PS0
  • comparable latency
  • LP single-rail: even faster

38
Fabricated Chip HC FIFO
  • 2.5 GHz in 0.18u