Title: EEL 5722 FPGA Design Fall 2003 Self-Timed FPGAs Part II
2. Phased Logic
Phased logic (PL) is a delay-insensitive design
methodology which supports the synchronous design
paradigm without using a clock or imposing timing
constraints. Timing and value information are
combined using a dual-rail encoding in which one
logical signal consists of a pair of conventional
signals. Any one of the dual-rail approaches can
be used, but phased logic is usually used with the
level-encoded two-phase dual-rail (LEDR) scheme.
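A minimal Python sketch may make the dual-rail encoding concrete. It models one logical signal as a pair of rails named v (value) and t (timing), matching the t and v outputs mentioned in the delay-constraint slides. The convention that the phase is even when the two rails agree is an assumption made for illustration, and the function names `phase` and `next_token` are hypothetical.

```python
def phase(v: int, t: int) -> str:
    """Phase of a dual-rail pair; 'even' when the rails agree (assumed convention)."""
    return "even" if v == t else "odd"


def next_token(v: int, t: int, new_value: int) -> tuple[int, int]:
    """Send new_value in the opposite phase by changing exactly one rail."""
    if new_value != v:
        # Value changes: flipping v alone also flips the parity (phase).
        return new_value, t
    # Value repeats: flip only the timing rail to signal a new token.
    return v, 1 - t


if __name__ == "__main__":
    v, t = 0, 0
    for bit in [1, 1, 0, 0, 1]:
        v, t = next_token(v, t, bit)
        print(f"value={v} rails=(v={v}, t={t}) phase={phase(v, t)}")
```

Each new token changes exactly one rail, so the phase alternates on every transfer whether or not the logical value changes.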
4. PL Gates
The PL gate timing convention is based on
associating a phase with the gate. A gate can be
in the even or odd phase, just like a signal. A
PL gate recalculates its output values when the
phases of all its inputs match the gate's
phase. For example, if a gate's phase is even,
the gate will recalculate its outputs, or fire,
when all of its inputs become even. A gate is
enabled when all input phases match the gate's
phase. When an enabled gate fires, the gate's
phase and the phase of the gate's output signals
toggle to the opposite phase.
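The firing rule can be sketched as a small behavioural model in Python. This is an illustration under stated assumptions, not a cell design: the class `PLGate`, its field names, and the initial output phase are all hypothetical, and each input is represented as a (value, phase) pair.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

EVEN, ODD = 0, 1


@dataclass
class PLGate:
    """Hypothetical behavioural model of a single-output PL gate."""
    func: Callable[..., int]   # Boolean function applied to the input values
    phase: int = EVEN          # current gate phase
    out_value: int = 0
    out_phase: int = EVEN      # assumed initial output phase

    def enabled(self, inputs: List[Tuple[int, int]]) -> bool:
        # Enabled when every input phase matches the gate phase.
        return all(p == self.phase for _, p in inputs)

    def try_fire(self, inputs: List[Tuple[int, int]]) -> bool:
        """Fire if enabled: recompute the output and toggle both phases."""
        if not self.enabled(inputs):
            return False
        self.out_value = self.func(*(v for v, _ in inputs))
        self.phase ^= 1        # gate phase toggles
        self.out_phase ^= 1    # output phase toggles
        return True


if __name__ == "__main__":
    gate = PLGate(func=lambda a, b: a & b)
    print(gate.try_fire([(1, EVEN), (1, ODD)]))   # one input still odd
    print(gate.try_fire([(1, EVEN), (1, EVEN)]))  # all inputs even
    print(gate.out_value, gate.phase)
```

After firing, the gate is in the odd phase and stays idle until all of its inputs have moved to the odd phase as well.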
7. PL Multi-Output Gates
A PL gate may have several outputs.
8. PL Gate Constraints
The firing rule of a PL gate can be summarized in
two constraints:
Internal Constraint: The gate fires if and only if
it is enabled. The firing is defined to occur when
the first output change is observed.
External Constraint: The phase of each input and
output of the gate toggles once between the nth
and (n+1)th firings of the gate. In particular,
after the gate is enabled, the inputs do not change
phase again until the gate fires. After the gate
fires, all the outputs change phase before the
gate is re-enabled.
9. Muller-C Element
The concept of indication, or acknowledgement,
plays an important role in asynchronous circuits (ACs).
An observer of an OR gate's output learns the state
of both inputs only when the output is 0, which
indicates, or acknowledges, that both inputs are 0.
For any other input combination the output is 1,
and no indication can be inferred. A similar
argument can be made for the AND gate.
10. Muller-C Element
A Muller C-element is a state-holding element
much like an asynchronous set-reset latch. Its
output changes to 0 when both inputs are 0 and to
1 when both inputs are 1; otherwise it holds its
previous value.
This element acknowledges when the inputs are
either 00 or 11. An observer seeing the output
of this element change can safely conclude that
both inputs are identical.
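The state-holding behaviour of a two-input C-element can be sketched in a few lines of Python. The class name `CElement` and its initial output value are assumptions for illustration; the model is purely behavioural and ignores gate delays.

```python
class CElement:
    """Behavioural two-input Muller C-element (hypothetical helper)."""

    def __init__(self, initial: int = 0):
        self.out = initial  # assumed reset state

    def update(self, a: int, b: int) -> int:
        # Output follows the inputs only when they agree; otherwise it holds.
        if a == b:
            self.out = a
        return self.out


if __name__ == "__main__":
    c = CElement()
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(f"a={a} b={b} -> out={c.update(a, b)}")
```

The output changes only on the (1, 1) and (0, 0) steps, which is why a change at the output acknowledges that both inputs have arrived.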
11. PL LUT Design
A PL LUT can be designed efficiently by
separating control and compute logic. This
separation eliminates computation transitions for
cases when the gate fires, but the input values
remain constant. In this LUT design, the input
completion is detected using a Muller-C element
(C-gate) and the gate phase is held at its
output.
12. PL LUT Design
When a new set of inputs is detected, the C-gate
toggles and causes the output latches to be
updated. The delay block is necessary to ensure
that the internal timing constraints are met.
13. Delay Constraints in PL LUT
The D-latch must have a minimum pulse width on
the enable input. The D input of the latch
must be stable one setup time before the
trailing edge of the pulse.
The firing rule requires that the PL gate t and v
outputs change only once per gate firing. The
result is that the D inputs must arrive before
the latch is enabled.
14. Delay Constraints in PL LUT
In order to ensure that the new_v value is
defined before the latches are enabled, the
following inequality must hold:
$D_{\text{complete}} + D_{\text{delay}} > D_{\text{LUT4}}$
In order to ensure that the final value of new_t
is latched, enforcing the requirement of a single
transition per PL gate firing, the following must
be true:
$D_{G2} > D_{G1}$
15. Delay Constraints in PL LUT
Since the pulse width of the enable signal is
defined by the enable-to-Q delay of the latch
and the delays of gates G2 and G3, then
$D_{\text{EN-Q}} + D_{G2} + D_{G3} > T_{\text{pulse-min}}$
The worst-case delay is given by the sum of the
completion logic delay, the delay block, the delay
of G2, and the latch delay:
$D_{\text{PL-GATE}} = D_{\text{complete}} + D_{\text{delay}} + D_{G2} + D_{\text{EN-Q}}$
Note that the PL gate delay can be approximated
by the sum of the LUT4 delay and the latch delay.
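The delay constraints above can be collected into a small Python check. This is a sketch under assumptions: the first inequality is read as the completion-logic delay plus the delay block exceeding the LUT4 delay, the class and function names are hypothetical, and the numbers in the usage example are made up rather than taken from any real device.

```python
from dataclasses import dataclass


@dataclass
class PLLutDelays:
    """Delays in arbitrary but consistent units; all values are hypothetical."""
    d_complete: float   # completion-detection (C-gate) logic
    d_delay: float      # delay block
    d_lut4: float       # LUT4 compute path
    d_g1: float         # gate G1
    d_g2: float         # gate G2
    d_g3: float         # gate G3
    d_en_q: float       # latch enable-to-Q delay
    t_pulse_min: float  # minimum latch enable pulse width


def check(d: PLLutDelays) -> dict:
    """Evaluate the three inequalities from the delay-constraint slides."""
    return {
        "new_v defined before enable": d.d_complete + d.d_delay > d.d_lut4,
        "final new_t latched": d.d_g2 > d.d_g1,
        "enable pulse wide enough": d.d_en_q + d.d_g2 + d.d_g3 > d.t_pulse_min,
    }


def pl_gate_delay(d: PLLutDelays) -> float:
    """Worst-case PL gate delay: completion + delay block + G2 + latch."""
    return d.d_complete + d.d_delay + d.d_g2 + d.d_en_q


if __name__ == "__main__":
    d = PLLutDelays(d_complete=0.5, d_delay=1.0, d_lut4=1.25, d_g1=0.25,
                    d_g2=0.5, d_g3=0.25, d_en_q=0.5, t_pulse_min=1.0)
    for name, ok in check(d).items():
        print(f"{name}: {'ok' if ok else 'VIOLATED'}")
    print("worst-case PL gate delay:", pl_gate_delay(d))
```

A check of this kind makes it easy to see how much margin the delay block must add once the LUT4 and gate delays are known.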
16. Power and Performance Comparisons
Two PL design examples are compared with their
clocked counterparts. The first example is a
32-bit accumulator with a synchronous clear. A
carry look-ahead (CLA) adder is used for the
clocked version and a ripple carry adder is used
for the PL version. The second example is a 16 x
16 iterative multiplier using a single adder. The
clocked version uses a CLA adder while the PL
version uses a ripple adder.
17. Performance Comparisons
18. Power Comparisons
VHDL simulations are instrumented to track
compute and control signal transitions. In both
versions (PL and clocked), a compute transition
is counted as any change of value on a LUT4
input. In the clocked version, a control
transition is counted as any arrival of a clock
edge at a D flip-flop (FF).
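The counting rules can be illustrated with a short Python sketch that operates on recorded signal traces. It is not the VHDL instrumentation used for the comparison; the function names, the trace format (one list of sampled values per LUT4 input), and the example numbers are all assumptions.

```python
def count_transitions(trace):
    """Number of value changes between consecutive samples of one signal."""
    return sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)


def compute_transitions(lut_input_traces):
    """Compute transitions: every value change on any LUT4 input."""
    return sum(count_transitions(t) for t in lut_input_traces.values())


def clocked_control_transitions(num_ffs, num_clock_edges):
    """Clocked design: one control transition per clock edge per flip-flop."""
    return num_ffs * num_clock_edges


if __name__ == "__main__":
    traces = {"a": [0, 1, 1, 0], "b": [0, 0, 0, 0], "c": [1, 0, 1, 0]}
    print("compute transitions:", compute_transitions(traces))
    print("control transitions:", clocked_control_transitions(32, 10))
```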
20. Power and Performance
While PL designs can reduce compute transitions,
they can dramatically increase control
transitions, since they impose control in every
cell and every cell changes phase (a control
transition) during a compute cycle. Because of
this overhead, it is critical that the ratio of
compute transitions to control transitions be
large enough to amortize the control overhead.