1
VLSI Design Challenges for Gigascale Integration
  • Shekhar Borkar
  • Intel Corp.
  • October 25, 2005

2
Outline
  • Technology scaling challenges
  • Circuit and design solutions
  • Microarchitecture advances
  • Multi-everywhere
  • Summary

3
Goal: 10 TIPS (tera-instructions per second) by 2015
[Chart: performance trend across the 8086, 286, 386, 486, Pentium, Pentium Pro, and Pentium 4 architectures, extrapolated toward the 10 TIPS goal]
4
Technology Scaling
[Diagram: MOSFET cross-section labeling gate, source, drain, and body, with oxide thickness Tox, junction depth Xj, and effective channel length Leff]
Dimensions scale down by 30% → doubles transistor density
Oxide thickness scales down → faster transistor, higher performance
Vdd and Vt scaling → lower active power
Scaling will continue, but with challenges!
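As a quick check of the arithmetic behind these bullets, here is a minimal sketch of the idealized constant-field scaling rules (the 0.7x linear factor is the standard per-generation assumption; real factors deviate from it, which is the point of the outlook table on the next slide):

```python
# Idealized per-generation scaling: linear dimensions shrink by ~0.7x.
s = 0.7

area_per_transistor = s * s        # ~0.49x -> transistor density roughly doubles
density = 1 / area_per_transistor
gate_delay = s                     # delay ~ CV/I scales with dimensions -> faster transistors
frequency = 1 / s                  # ~1.4x clock at the same design

print(f"density: {density:.2f}x, delay: {gate_delay:.2f}x, frequency: {frequency:.2f}x")
```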
5
Technology Outlook
High Volume Manufacturing    2004    2006    2008    2010    2012    2014    2016    2018
Technology Node (nm)           90      65      45      32      22      16      11       8
Integration Capacity (BT)       2       4       8      16      32      64     128     256
Delay = CV/I scaling          0.7     0.7    >0.7    --- delay scaling will slow down ---
Energy/Logic Op scaling     >0.35    >0.5    >0.5    --- energy scaling will slow down ---
Bulk Planar CMOS             High Probability  --->  Low Probability
Alternate, 3G etc.           Low Probability   --->  High Probability
Variability                  Medium  --->  High  --->  Very High
ILD (K)                         3      <3     --- reduces slowly towards 2-2.5 ---
RC Delay                        1       1       1       1       1       1       1       1
Metal Layers                  6-7     7-8     8-9     --- 0.5 to 1 layer per generation ---
6
The Leakage(s)
7
Must Fit in Power Envelope
[Chart: Power (W) and power density (W/cm²) versus technology node (90nm, 65nm, 45nm, 32nm, 22nm, 16nm) for a 10 mm die, split into active power, sub-threshold (SD) leakage, and SiO2 gate leakage]
8
Solutions
  • Move away from Frequency alone to deliver
    performance
  • More on-die memory
  • Multi-everywhere
  • Multi-threading
  • Chip level multi-processing
  • Throughput oriented designs
  • Valued performance by higher level of integration
  • Monolithic → Polylithic

9
Leakage Solutions
For a few generations, then what?
10
Active Power Reduction
11
Leakage Control
12
Optimum Frequency
  • Maximum performance with
  • Optimum pipeline depth
  • Optimum frequency

13
Memory Latency
[Diagram: CPU with a small cache (a few clocks away) backed by a large main memory (50-100ns away)]
Assume 50ns memory latency. A cache miss hurts performance, and it hurts more at higher frequency.
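To put numbers on the frequency point (the 50ns latency is from the slide; the clock rates are illustrative):

```python
# A fixed 50ns memory latency costs more core cycles as the clock rate rises.
memory_latency_ns = 50.0

for freq_ghz in (1.0, 2.0, 4.0):
    stall_cycles = memory_latency_ns * freq_ghz   # ns * (cycles per ns)
    print(f"{freq_ghz:.0f} GHz core: a cache miss stalls ~{stall_cycles:.0f} cycles")
```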
14
Increase on-die Memory
  • Large on-die memory provides:
  • Increased data bandwidth and reduced latency
  • Hence, higher performance for much lower power

15
Multi-threading
Thermals and power delivery are designed for full HW utilization
[Diagram: execution timelines. A single thread (ST) alternates between full HW utilization and waiting for memory; with multi-threading, threads MT1-MT3 overlap so one computes while the others wait for memory]
Multi-threading improves performance without impacting thermals or power delivery
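A rough model of why the waits can be hidden; this is a sketch under the simplifying assumptions that memory stalls overlap perfectly and threads do not contend for execution resources (the 35%/65% split is illustrative, not from the slide):

```python
compute_frac = 0.35   # fraction of time one thread does useful work (illustrative)
wait_frac    = 0.65   # fraction of time it waits for memory (illustrative)

def hw_utilization(threads):
    # Each additional thread fills another thread's memory-wait slots,
    # capped once the execution hardware is fully busy.
    return min(1.0, threads * compute_frac / (compute_frac + wait_frac))

for n in (1, 2, 3):
    print(f"{n} thread(s): ~{hw_utilization(n):.0%} hardware utilization")
```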
16
Single Core Power/Performance
Moore's Law → more transistors for advanced architectures. This delivers higher peak performance, but lower power efficiency.
17
Chip Multi-Processing
[Diagram: four cores C1-C4 sharing an on-die cache]
  • Multi-core, each core Multi-threaded
  • Shared cache and front side bus
  • Each core has different Vdd and frequency
  • Core hopping to spread hot spots
  • Lower junction temperature

18
Dual Core
Rule of thumb:
  Voltage   Frequency   Power   Performance
    1%         1%         3%       0.66%
(a 1% change in voltage allows ~1% change in frequency, ~3% change in power, and ~0.66% change in performance)

In the same process technology:
  Single core:  Voltage 1,     Freq 1,     Area 1,  Power 1,  Perf 1
  Dual core:    Voltage -15%,  Freq -15%,  Area 2,  Power 1,  Perf 1.8
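Working the slide's rule of thumb through the -15% dual-core point (a linearized sketch using the 1% / 1% / 3% / 0.66% sensitivities above; it assumes the workload splits across both cores):

```python
dv = 0.15                               # 15% reduction in voltage and frequency
cores = 2                               # dual core, same process technology

power_per_core = 1.0 - 3.0 * dv         # ~0.55x: ~3% less power per 1% less voltage
perf_per_core  = 1.0 - 0.66 * dv        # ~0.90x: ~0.66% less perf per 1% less frequency

print(f"power: ~{cores * power_per_core:.1f}x")   # ~1.1x, close to the original power budget
print(f"perf:  ~{cores * perf_per_core:.1f}x")    # ~1.8x, if the workload parallelizes
```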
19
Multi-Core
[Chart: relative power and performance bars for large vs. small cores. A small core runs at roughly 1/4 the power and 1/2 the performance of a large core, so four small cores fit in the same power envelope (4 × 1/4 = 1) while delivering about 2x the performance (4 × 1/2 = 2)]
Multi-core is power efficient, with better power and thermal management.
20
Special Purpose Hardware
TCP/IP Offload Engine
2.23 mm X 3.54 mm, 260K transistors
Opportunities: network processing engines, MPEG encode/decode engines, speech engines
Special-purpose HW provides the best MIPS/Watt
21
Performance Scaling
Amdahl's Law: Parallel speedup = 1 / (Serial + (1 - Serial)/N)
Serial = 6.7%, N = 16 (N/2 = 8): 16 cores yield perf of only 8
Serial = 20%, N = 6 (N/2 = 3): 6 cores yield perf of only 3
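The two data points follow directly from the formula; a minimal check:

```python
def amdahl_speedup(serial_fraction, n_cores):
    # Amdahl's Law: speedup = 1 / (serial + (1 - serial) / N)
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

print(round(amdahl_speedup(0.067, 16), 1))  # ~8.0: 16 cores, only 8x performance
print(round(amdahl_speedup(0.20, 6), 1))    # ~3.0: 6 cores, only 3x performance
```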
Parallel software is key to multi-core success
22
From Multi to Many
13mm, 100W, 48MB Cache, 4B Transistors, in 22nm
23
Future Multi-core Platform
Heterogeneous Multi-Core Platform
24
The New Era of Computing
Multi-everywhere: MT, CMP
25
Summary
  • Business as usual is not an option
  • Performance at any cost is history
  • Must make a Right Hand Turn (RHT)
  • Move away from frequency alone
  • Future µarchitectures and designs:
  • More memory (larger caches)
  • Multi-threading
  • Multi-processing
  • Special purpose hardware
  • Valued performance with higher integration