Title: SiGe HBT BiCMOS Field Programmable Gate Arrays for Fast Reconfigurable Computing
1SiGe HBT BiCMOS Field Programmable Gate Arrays
for Fast Reconfigurable Computing
- Bryan S. Goda
- Rensselaer Polytechnic Institute
- Troy, New York
2Agenda
- Introduction
- BiCMOS FPGA History
- SiGe HBT BiCMOS Process
- Current Mode Logic
- Xilinx 6200 FPGA Design
- Configuration Memory
- Performance Results
- Conclusions and Future Work
3Current Role of SiGe
- More Zip per Chip
- Wireless Phones -> Watch-Sized Phone
- Direct Broadcast Satellite
- Fiber-Optic Lines, Switches, and Routers
4Programmable Bipolar Logic
- 1983 Fairchild ECL Field Programmable Logic Array
- Fuse Based
- 4 ns Cycle Rate
- High Power
- Scaling Problems
- 1990 Algotronix 1.2 um 256 Cell Configurable Logic Array
- fT 6 GHz, 200 ps Gate Delay
- 4 Transistor Static RAM Memory Cells
- ASIC Emulation and Signal Processing
- Forerunner of XC6200
5US Patent CMOS Switchable 2 Input Multiplexer
6SiGe Heterojunction Bipolar Transistor
- Selectively introduce Ge into the base of a Si BJT
- Smaller Base Bandgap increases e- injection, giving higher Beta (100)
- Higher Beta allows a more heavily doped base, lowering RB (125 Ohm)
- Graded Bandgap decreases base transit time, raising fT
8SiGe HBT
- 50 GHz Process, 100 GHz process within a year (30 uA at 50 GHz)
- 5 layers of metal
- Used in RPI VLSI Class
- co-integrated with CMOS process
- can have HBT logic with CMOS memory
- low power and high speed
9fT Curves for Various Emitter Lengths
10SiGe HBT Layout
Emitter
Base
Collector Sub-Collector
11Band Diagram
[Figure: band diagram of the graded-base SiGe HBT. Labels: n Si emitter, p-SiGe base, n- Si collector, p-Si, Ge, EC, EV, e-, h, Drift Field]
Eg,Ge(grade) = Eg,Ge(x=Wb) - Eg,Ge(x=0)
0.031 eV
Dielectric Constant: Si 11.7, Ge 16.2, SiGe (7.5% Ge) 12.03
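The SiGe dielectric constant quoted on this slide is consistent with a straight-line interpolation between the Si and Ge values. The sketch below checks that arithmetic. It assumes the "7.5 Ge" label means a 7.5% Ge mole fraction and that linear interpolation is the intended model; the function name is illustrative only.

```python
# Relative dielectric constants quoted on the slide.
EPS_SI = 11.7
EPS_GE = 16.2


def sige_dielectric_constant(ge_fraction):
    """Linearly interpolate the dielectric constant of Si(1-x)Ge(x).

    Assumption: a straight-line mix between the Si and Ge values.
    """
    if not 0.0 <= ge_fraction <= 1.0:
        raise ValueError("ge_fraction must be between 0 and 1")
    return EPS_SI + ge_fraction * (EPS_GE - EPS_SI)


eps_sige = sige_dielectric_constant(0.075)  # 7.5% Ge (assumed reading)
print(f"SiGe (7.5% Ge): {eps_sige:.4f}")
```

Under these assumptions the result is about 12.04, which agrees with the slide's 12.03 to within rounding.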
12CML Branch Current vs. Differential DC Voltage
13IBM SiGe and CMOS Load Gate Delays on M1, M2, LM
14Current Steering Logic
Vcc 0 V
Level 1 (0 V / -250 mV): Fastest Logic Level, Limited Drive Capability
Level 2 (-950 mV / -1.2 V): Inter-block Signal Level, Good Fan-Out (10)
Level 3 (-1.90 V / -2.15 V): Clock Signal, Slowest Level (Level 4 Possible)
Vee -4.5 V
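The level values on this slide follow a regular pattern: each level sits 0.95 V below the previous one, with a 250 mV swing. The sketch below regenerates the listed values and extrapolates the "Level 4 Possible" case. The 0.95 V step is inferred from the listed numbers rather than stated on the slide, and Vee is taken as -4.5 V, as on the CML XOR schematic slide.

```python
V_SHIFT = 0.95   # volts between successive levels (inferred from the slide values)
V_SWING = 0.25   # logic swing in volts (from the slide)
V_EE = -4.5      # negative supply in volts


def cml_level(n):
    """Return (high, low) voltages for logic level n, with Vcc = 0 V."""
    high = 0.0 - (n - 1) * V_SHIFT
    low = high - V_SWING
    return round(high, 2), round(low, 2)


levels = {n: cml_level(n) for n in range(1, 5)}
for n, (high, low) in levels.items():
    headroom = round(low - V_EE, 2)
    print(f"Level {n}: high {high} V, low {low} V, headroom above Vee {headroom} V")
```

With these numbers a fourth level would sit at roughly -2.85 V / -3.10 V, which still leaves about 1.4 V above Vee for the current source.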
15Current Steering Logic In SiGe
- 13 ps Transistor Switching Time (75 GHz)
- 6 ps Process Next Year
- Small Voltage Swings (250 mV) vs 3.3 or 5 V
- Less Power
- Smaller Swing, Faster
- Steer Currents, Use Differential Logic
- Less Switching Noise
- Fewer Transistors Needed, Complement Signal Present
- Flip-Flops and Multiplexers Easy to Implement
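16CML XOR Logic Schematic
[Figure: two-level CML XOR current tree between Vcc (0 V) and Vee (-4.5 V), with differential input A on Level 1 (0 / -0.25 V), differential input B on Level 2 (-0.95 / -1.2 V), a Vref node, differential outputs A XOR B and its complement, and example input/output bit patterns]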
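A behavioral sketch may help show how a two-level current tree produces XOR and its complement at the same time. This is an idealized model, not the circuit on the slide: the transistors are treated as perfect switches, the tail current always flows through exactly one load, and only the 250 mV swing is taken from the slides.

```python
SWING_V = 0.25  # drop across a load when it carries the tail current


def cml_xor(a, b):
    """Idealized two-level current-steering XOR.

    Returns (v_out, v_out_bar) relative to Vcc = 0 V.
    """
    # Level 2 pair: B steers the tail current into one of the two
    # Level 1 differential pairs.
    # Level 1 pairs: A steers that current into one of the two loads.
    # The two Level 1 pairs are cross-coupled at the loads, so the pair
    # chosen by B decides whether A or NOT A pulls "out" low.
    if b:
        current_in_out_load = a        # B = 1: out is low when A = 1
    else:
        current_in_out_load = not a    # B = 0: out is low when A = 0
    v_out = -SWING_V if current_in_out_load else 0.0
    v_out_bar = 0.0 if current_in_out_load else -SWING_V
    return v_out, v_out_bar


def to_logic(v_true, v_comp):
    """A differential signal is 1 when the true rail is above the complement."""
    return v_true > v_comp


for a in (False, True):
    for b in (False, True):
        v_out, v_out_bar = cml_xor(a, b)
        print(int(a), int(b), v_out, v_out_bar, int(to_logic(v_out, v_out_bar)))
```

Because both rails are always available, the inverted function comes for free by swapping the two output wires.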
17General FPGA Structure
I/O Cell
Logic Cell
Routing Network
Configuration Memory
18High Speed FPGA Applications
- Real Time Image Processing
- Radar
- Pattern Recognition
- Digital Networks
- Mobile Subscriber Equipment
- Command Information Systems
- High Speed Switching Nodes
- Control Systems
- Guidance Systems
- Reprogrammable Survivability
19Image Correlation
Search Image
Desired Image
1. Desired Image is programmed into chip (1 pixel = 1 CLB)
2. Load a section of search image
3. If enough pixels match, then turn found bit on
4. Load another section, or reprogram with new desired image
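The four steps above can be sketched in software as a sliding-window match count. This is only an illustration of the procedure, not the hardware mapping: the images are small binary grids, and the function name and threshold parameter are hypothetical.

```python
def correlate(desired, search, threshold):
    """Return the (row, col) offsets where at least `threshold` pixels match.

    `desired` plays the role of the image programmed into the chip and
    each window of `search` is one loaded section of the search image.
    """
    dh, dw = len(desired), len(desired[0])
    sh, sw = len(search), len(search[0])
    found = []
    for r in range(sh - dh + 1):
        for c in range(sw - dw + 1):
            # Count matching pixels (one comparison per CLB in the slide's scheme).
            matches = sum(
                desired[i][j] == search[r + i][c + j]
                for i in range(dh)
                for j in range(dw)
            )
            if matches >= threshold:  # "found bit" turns on
                found.append((r, c))
    return found


desired_image = [[1, 0],
                 [0, 1]]
search_image = [[0, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0]]
exact_hits = correlate(desired_image, search_image, threshold=4)
print(exact_hits)
```

Lowering the threshold makes the search tolerate mismatched pixels, at the cost of more false hits.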
20Samples From XC6200 CAD Tools
IO Blocks
CLBs
Pins
21FPGA Drawbacks
- Slowdown
- 200 MHz Internal Speed down to 30-60 MHz External
- Pass Transistor Low Pass Filter
- Limited Bandwidth
- Relatively Long Configuration Times (Seconds)
- Vendor Guarded Information
- More Expensive than Comparable ASIC
22Pass Transistor Interconnect Modeling
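[Figure: switch point joining Nodes 1 to 4 through pass transistors M, each gated by a memory bit, with the interconnect and pass transistor labeled; equivalent circuit from Node 3 to Node 2 with the pass transistor on]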
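One common way to reason about a route through several "on" pass transistors is to treat it as an RC ladder and estimate its delay with the Elmore approximation. The sketch below does that. It is not the model used on the slide, and the resistance and capacitance values are hypothetical placeholders chosen only to show the scaling.

```python
def elmore_delay(r_on, c_node, n_switches):
    """Elmore delay of a chain of identical pass-transistor segments.

    Each segment is one on-resistance followed by one node capacitance.
    Node i is charged through i resistances in series, so the total is
    r_on * c_node * n * (n + 1) / 2.
    """
    return sum(i * r_on * c_node for i in range(1, n_switches + 1))


R_ON = 1.0e3      # ohms, hypothetical on-resistance of one pass transistor
C_NODE = 10e-15   # farads, hypothetical capacitance per routed node

for n in (1, 2, 4, 8):
    delay_ps = elmore_delay(R_ON, C_NODE, n) * 1e12
    print(f"{n} pass transistors in series: about {delay_ps:.0f} ps")
```

The delay grows roughly with the square of the number of series switches, which is one reason pass-transistor routing acts as a low pass filter on long routes.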
23Field Programmable Gate Arrays (FPGA)
- Hierarchy Level Organization (Sea of Gates)
- Simple Cells (Configurable Logic Blocks)
- 4x4, 16x16, 64x64 groupings
- Hierarchy of routing resources at each level
- I/O Blocks (external interface)
24Design Parameters
- Logic Swing Levels
- Based on Differential Pair Switching
- Current Levels
- Redesign of the Configurable Logic Block
- Take Advantage of Differential Wiring
- What Parts Can be Turned off if not Used?
- Supply Levels
- How Many Levels of Logic?
- Routing Resources
- CMOS Voltage Levels
- Integrate CMOS into Bipolar Current Tree
25Current Tree with CMOS Routing
26Bipolar vs Bipolar/CMOS Current Trees
CMOS Bipolar
Pulse Width 50ps 60ps
70ps 100ps
27 4:1 Multiplexer
Level 1 Inputs
Level 1 Output
Level 1 Output
Level 2 Input
Level 2 Input
Level 3 Input
Level 3 Input
CMOS Version
W/L 5:1
28Sample Logic Using Multiplexers
A AND B: X1 = a (select), X2 = b (Y2), X3 = a (Y3)
- If a = 1 then select Y2, output = b
- If a = 0 then select Y3, output = 0
A OR B: X1 = a (select), X2 = a (Y2), X3 = b (Y3)
- If a = 1 then select Y2, output = 1
- If a = 0 then select Y3, output = b
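The two multiplexer wirings on this slide can be checked exhaustively with a few lines of code. The sketch below models the 2:1 multiplexer as a plain function; the function names are illustrative and not taken from the XC6200 tools.

```python
def mux2(select, y2, y3):
    """2:1 multiplexer: pass Y2 when select is 1, otherwise Y3."""
    return y2 if select else y3


def and_via_mux(a, b):
    # X1 = a (select), X2 = b (Y2 input), X3 = a (Y3 input)
    return mux2(a, b, a)


def or_via_mux(a, b):
    # X1 = a (select), X2 = a (Y2 input), X3 = b (Y3 input)
    return mux2(a, a, b)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_via_mux(a, b), "OR:", or_via_mux(a, b))
```

Only the input wiring changes between the two functions, which is what lets a single multiplexer-based cell be reconfigured through routing alone.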
29Redesign of XC6200 Logic
- Original XC6200 Design
- Have to Track Inversions
- Inverted Output
- Revised Design
- Use Differential Pair Logic
- Eliminate XC6200 Fast Logic
- No Inversion Tracking
- Non-Inverted Output
[Figure: both designs drawn as a 1/0 multiplexer with X1 = a, X2 = b (Y2), X3 = a (Y3)]
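30Original XC6200 Architecture and Redesigned Architecture
[Figure: CLB schematics for the Original XC6200 Architecture and the Redesigned Architecture. Labels in both: X1, X2, X3, Y2, Y3, 1/0 multiplexer, CS Multiplexer, RP Multiplexer, C, F, S, D flip-flop (D, Q, Clk, Clr). Additional labels on the redesign: Bipolar with CMOS Routing, Switchable]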
31 10 GHz Three CLB Simulation
32CLB Layout
4:1 Mux (off switchable) CMOS Control
Master/Slave Latch (off switchable)
(off switchable)
4:1 Mux High Speed Logic
2:1 Mux CMOS Control
Buffer
33Sample CLB Test Circuit
Vref
CLB
8:1 Mux
Vref
Buffer
8/1 Divide
Pad Drivers
34Actual Fabricated Test Circuit
Pads (110u x 110u)
35Outgoing CLB Routing
Incoming CLB Routing
N S E W N4 S4 E4 W4
X3
N S E W N4 S4 E4 W4
N S E W N4 S4 E4 W4
X1
X2
CLB
F
36 4x4 Block Boundary Routing
N Switches
E Switches
W Switches
S Switches
- Length 4 FastLane (4x4)
- Length 16 FastLane (16x16)
- Chip Length FastLane (64x64)
- Local Routing
- Magic Routing
37Local CLB Routing
N S E W N4 S4 E4 W4
N S E F
X3
Eout
N S E W N4 S4 E4 W4
N S E W N4 S4 E4 W4
X1
X2
CLB
- Nearest Neighbor Routing
- Output (F) or Local Through
S E W F
F
Sout
Example: Route East Signal Through to Next CLB
Note: Can't Route Signal Back to Origin at this Level
38Normal CMOS Memory-CML Interface
[Figure: SRAM Bits in Memory Planes feed a CMOS to CML Buffer (labels: Data, decode, VSS, VREF, VEE), which delivers the New Configuration to the CLB Multiplexer Inputs]
39Memory Design
D Latch M/S 40 Transistors
D Latch M/S 18 Transistors
RAM Cell 6 Transistors Parallel Load
40 3-D Chip Stacking
Memory Planes
CLBs
- Shorter Wires
- More CLBs/Area
- Optimize Memory
41CLB with Routing and RAM (2)
CLB Select
RAM2
CLB
RAM1
MUX
MUX
MUX
MUX Selects
42Layout of Configurable Logic Block with 2 sets of RAM
RAM
2:1 Mux
Master/Slave Latch (memory)
8:1 Mux (routing) CMOS Selects
CLB (logic)
Circuit Elements: 240 nfets, 122 pfets, 36 resistors, 98 npn1 HBTs, 16 npnhb1 HBTs
43SiGe Performance
Circuit Type            Propagation Delay
Buffer                  17 ps
CML (XOR, AND, OR)      22-25 ps
MUX (XOR, AND, OR)      23-26 ps
CLB                     100 ps

Power Decreasing Ideas
Date      Idea                                                     Power Consumption/CLB
Dec 98    Original CLB                                             73 mW
June 99   CLB Redesign I                                           34 mW
Aug 99    CLB Redesign II                                          24 mW
Dec 99    Widlar Current Mirror with CMOS Control, CMOS Routing    10.8 mW
Mar 00    Supply Voltage 4.5 -> 3.3 V                              7 mW
Dec 00    7HP Process                                              0.3 mW

Projected Power Levels for 7HP Process: At 50 GHz, 30 uA, 20x reduction in power
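The power figures on this slide can be turned into cumulative reduction factors relative to the original CLB. The sketch below only does arithmetic on the numbers listed above; the variable and function names are illustrative.

```python
# (date, idea, power per CLB in mW), as listed on the slide.
power_history = [
    ("Dec 98", "Original CLB", 73.0),
    ("June 99", "CLB Redesign I", 34.0),
    ("Aug 99", "CLB Redesign II", 24.0),
    ("Dec 99", "Widlar Current Mirror with CMOS Control, CMOS Routing", 10.8),
    ("Mar 00", "Supply Voltage 4.5 -> 3.3 V", 7.0),
    ("Dec 00", "7HP Process", 0.3),
]


def reduction_factors(history):
    """Map each idea to its power reduction relative to the first entry."""
    baseline_mw = history[0][2]
    return {idea: baseline_mw / mw for _, idea, mw in history}


factors = reduction_factors(power_history)
for date, idea, mw in power_history:
    print(f"{date:8s} {mw:5.1f} mW  {factors[idea]:6.1f}x lower than the original CLB")

# Applying the quoted 20x process reduction to the 7 mW figure.
projected_7hp_mw = 7.0 / 20
print(f"7 mW / 20 = {projected_7hp_mw:.2f} mW")
```

Dividing the 7 mW figure by the quoted 20x gives 0.35 mW, close to the 0.3 mW projected for the 7HP process.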
44Multiplexer Performance vs Temperature
Normal 250 mV Swing
200 mV Min Swing
45Widlar Current Mirror with CMOS Control
[Figure labels: Vcc, Input, Vref, Vee]
46XC6200 Design Improvements
- Developed at the University of Scotland
- Inversion of Signal at Every CLB
- Taken care of due to differential pair wiring
- No Pass Transistors, Use Multiplexers for Routing
- Able to turn off unused parts with CMOS controlled current mirror
- No CMOS-CML Conversion circuits needed, CMOS in current trees
- Handcrafted, dense layouts
- Context Switching
47Power Delay Product
[Figure: power delay product in uW/gate/MHz (log scale, 0.001 to 1) versus Year (1998 to 2002). Curves: PDP CMOS High, PDP CMOS Low, PDP BiCMOS with points labeled 5HP, 7HP, 8HP]
48Data Dependent Switching
Differential Logic has Complement Switching In Opposite Direction
[Figure: adjacent differential signals A, B, C and their complements, shown for a Slow Transition case and a Fast Transition case]
Could Vary Signals Up to 30%
Setup Time Violations
Bit Line Twisting
49Future Work
- Testing
- Overall FPGA Architecture
- Scaling
- Integrate with Other Systems
- Projected Graduation May 2001, work to continue at USMA
- Power Reduction
- 7HP Process
50CLB Context Switch Example
Pattern 1: 0001100100 (70 ps, 7.1 GHz)
Pattern 2: 1011011100 (70 ps)
Select: AND, OR, AND, OR
Pattern 1 AND Pattern 2 = 0001000100
Pattern 1 OR Pattern 2 = 1011111100
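The expected outputs in this example can be reproduced by applying the two selectable functions bit by bit to the input patterns. The sketch below models only the logic of the context switch, not its timing; the function name and the `context` argument are hypothetical.

```python
PATTERN_1 = "0001100100"
PATTERN_2 = "1011011100"


def clb_output(pattern_a, pattern_b, context):
    """Apply the function selected by `context` ("AND" or "OR") bitwise."""
    a = int(pattern_a, 2)
    b = int(pattern_b, 2)
    if context == "AND":
        result = a & b
    elif context == "OR":
        result = a | b
    else:
        raise ValueError("context must be 'AND' or 'OR'")
    # Keep leading zeros so the result lines up with the input patterns.
    return format(result, f"0{len(pattern_a)}b")


for context in ("AND", "OR", "AND", "OR"):
    print(context, clb_output(PATTERN_1, PATTERN_2, context))
```

Switching `context` between calls mirrors the Select line toggling the CLB between its two stored configurations.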
51Redesigned CLB Cell with Routing and Memory (2x)
Three 8-1 Input Mux
2x24 Bit RAM
M1 M2 M3 M4
Four 4-1 Output Mux
CLB
52CLB Row 4x1
N/S Input Output
Memory Bus Lines
Circuit Elements: 1520 Nfets, 792 Pfets, 260 Resistors, 140 NPN1 HB, 576 NPN1
Switch
53XC6200 Device Family
Device         XC6209    XC6216    XC6236    XC6264
Gate Count     9-13K     16-24K    36-55K    64-100K
Number Cells   2304      4096      9216      16384
I/O Blocks     192       256       384       512
Row x Col      48x48     64x64     96x96     128x128
54Typical Routing Delays
Symbol   Parameter                      XC6200    SiGe Redesign
TNN      Route Nearest Neighbor         1 ns      23 ps
Tmagic   Route X2/X3 to Magic Out       1.5 ns    47 ps
TL4      Length 4 FastLane              1.5 ns    47 ps
TL16     Length 16 FastLane             2 ns      70 ps
TCL64    Chip-Length (64) Delay         3 ns      94 ps
31x improvement
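The improvement factor can be recomputed for each routing resource from the delays listed above. The sketch below is plain arithmetic on the slide's numbers, with both columns converted to picoseconds; the dictionary layout is illustrative.

```python
# Symbol: (XC6200 delay in ps, SiGe redesign delay in ps), from the slide.
routing_delays_ps = {
    "TNN": (1000.0, 23.0),
    "Tmagic": (1500.0, 47.0),
    "TL4": (1500.0, 47.0),
    "TL16": (2000.0, 70.0),
    "TCL64": (3000.0, 94.0),
}

speedup = {
    symbol: xc6200 / sige
    for symbol, (xc6200, sige) in routing_delays_ps.items()
}

for symbol, factor in speedup.items():
    print(f"{symbol:7s} {factor:5.1f}x faster")
```

The ratios range from about 29x (TL16) to about 43x (TNN). The chip-length, Length 4, and Magic routes each come out near 31.9x, in line with the quoted 31x improvement.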
55 4x4 CLB Layout Cell
- Largest Basic Block
- Over 13,000 Transistors
- Commercial Product Size is a 4x4 Array of this Cell
58Example High Speed Switch of 2 Incoming Signals
[Figure: bit-stream waveforms for the two incoming signals and the switched output, with the Switch Point marked]
Pattern 1: 0001100100
Pattern 2: 1011011100
60 5 Stage Ring Oscillator
Condition     Frequency    Speed Relative to Schematic    Current
Schematic     6.36 GHz     --                             8.4 mA
Parasitics    5.71 GHz     89%                            8.6 mA
50 C          5.26 GHz     82%                            8.85 mA
75 C          4.87 GHz     76%                            9.1 mA
100 C         4.16 GHz     65%                            9.34 mA
125 C         3.12 GHz     49%                            9.5 mA
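The relative-speed column, and a rough per-stage delay, can be derived from the listed frequencies. The sketch below assumes the usual ring oscillator relation f = 1 / (2 N td) for N = 5 stages, which is not stated on the slide, and it truncates the percentages because that reproduces the listed values.

```python
N_STAGES = 5

# Oscillation frequency in GHz for each condition, from the slide.
frequency_ghz = {
    "Schematic": 6.36,
    "Parasitics": 5.71,
    "50 C": 5.26,
    "75 C": 4.87,
    "100 C": 4.16,
    "125 C": 3.12,
}


def stage_delay_ps(f_ghz, n_stages=N_STAGES):
    """Per-stage delay in ps, assuming f = 1 / (2 * n_stages * td)."""
    return 1e3 / (2 * n_stages * f_ghz)


def relative_speed_percent(f_ghz, reference_ghz):
    """Speed relative to the reference, truncated to a whole percent."""
    return int(100 * f_ghz / reference_ghz)


reference = frequency_ghz["Schematic"]
for condition, f in frequency_ghz.items():
    print(f"{condition:10s} {relative_speed_percent(f, reference):3d}%  "
          f"{stage_delay_ps(f):5.1f} ps per stage")
```

Under that assumption the schematic-level stage delay is about 15.7 ps, rising to about 32 ps per stage at 125 C.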
63BiCMOS and CMOS Characteristics
Technology          Size, V threshold            Effective Size, Vdd           PDP Level (uW/gate/MHz)
1998 CMOS           Ldrawn=0.5u, Vth=0.87V       Leff=0.36u, Vdd=3.3V          Hi=0.36, Low=0.2
2000 CMOS           Ldrawn=0.25u, Vth=0.5V       Leff=0.18u, Vdd=2.5V          Hi=0.18, Low=0.08
2002 CMOS           Ldrawn=0.22u, Vth=0.4V       Leff=0.12u, Vdd=1.8V          Hi=0.1, Low=0.05
1999 BiCMOS 5HP     Vbe=0.85V                    Vdd=4.5V                      0.36
2000 BiCMOS 7HP     TBD                          TBD                           0.01