Yale Patt - PowerPoint PPT Presentation

About This Presentation

Title:

Yale Patt

Description:

Future Microprocessors: Multi-core, Multi-nonsense, and What we must do differently moving forward. Yale Patt, The University of Texas at Austin. Universidade de Brasilia.


Slides: 51


Transcript and Presenter's Notes

Title: Yale Patt

1
Future Microprocessors: Multi-core,
Multi-nonsense, and What we must do differently
moving forward

Yale Patt
The University of Texas at Austin

Universidade de Brasilia, Brasilia, DF, Brazil
August 12, 2009
2
At this year's ISC in Hamburg

Moore's Law in the future will mean doubling
the number of cores on the chip.
I don't think so.
How will we effectively utilize a 10 million core
supercomputer?
I hope that one was a typo.
Do the math.
Will the chip be homogeneous or heterogeneous?
That one's easy: heterogeneous
What I have been calling PentiumX/NiagaraY

3
and

Will there be a standard ISA, like x86 for
example?
Who cares?
Are there any good tools for automatically
generating parallel programs?
Why does it have to be automatic?

4
What I want to do today

Given all the Multi-core hype
Is it really the Holy Grail?
Will it cure cancer?
What multi-core is and what it is not
And where we go from here

5
The Compile-time Outline

Multi-core: how we got here
Mis-information
Where do we go from here
The microprocessor of the future

6
Outline

Multi-core: how we got here
Mis-information
Where we go from here
The microprocessor of the future

7
How we got here (Moore's Law)

The first microprocessor (Intel 4004), 1971
2300 transistors
106 kHz
The Pentium chip, 1992
3.1 million transistors
66 MHz
Today
more than one billion transistors
Frequencies in excess of 5 GHz
Tomorrow?
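The transistor counts above imply the familiar doubling period. A minimal sketch, assuming smooth exponential growth between two data points; the second call also assumes "today" means 2009 (the date of this talk) with one billion transistors. The function name is illustrative.

```python
import math

def doubling_time(count_start, count_end, year_start, year_end):
    """Years per doubling, assuming smooth exponential growth between two points."""
    doublings = math.log2(count_end / count_start)
    return (year_end - year_start) / doublings

# Intel 4004 (1971) to Pentium (1992), using the figures on the slide.
print(f"{doubling_time(2300, 3.1e6, 1971, 1992):.2f} years per doubling")
# "Today": one billion transistors, assuming 2009 as the end year.
print(f"{doubling_time(2300, 1e9, 1971, 2009):.2f} years per doubling")
```

Both spans come out to roughly two years per doubling.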

8
How have we used the available transistors?
9
Intel Pentium M
10
Intel Core 2 Duo

Penryn, 2007
45nm, 3MB L2

11
Why Multi-core chips?

In the beginning: a better and better
uniprocessor
improving performance on the hard problems
until it just got too hard
Followed by a uniprocessor with a bigger L2
cache
forsaking further improvement on the hard
problems
poorly utilizing the chip area
and blaming the processor for not delivering
performance
Today: dual core, quad core, octo core
Tomorrow: ???

12
Why Multi-core chips?

It is easier than designing a much better
uni-core
It was embarrassing to continue making L2 bigger
It was the next obvious step

13
So, What's the Point?

Yes, Multi-core is a reality
No, it wasn't a technological solution to
performance improvement
Ergo, we do not have to accept it as is
i.e., we can get it right the second time,
and that means
What goes on the chip
What are the interfaces

14
Outline

Multi-core: how we got here
Mis-information, or more accurately
Multi-nonsense
Where do we go from here
The microprocessor of the future

15
Multi-nonsense

Multi-core was a solution to a performance
problem
Hardware works sequentially
Make the hardware simple: thousands of cores

16
The Asymmetric Chip Multiprocessor (ACMP)
17
Large core vs. Small Core
LargeCore
SmallCore

Out-of-order
Wide fetch e.g. 4-wide
Deeper pipeline
Aggressive branch predictor (e.g. hybrid)
Many functional units
Trace cache
Memory dependence speculation

In-order
Narrow Fetch e.g. 2-wide
Shallow pipeline
Simple branch predictor (e.g. Gshare)
Few functional units
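Gshare, named above as the small core's predictor, indexes a table of 2-bit saturating counters with the XOR of the branch address and a global history register. A minimal sketch, assuming a 12-bit history, one counter per history value, and 4-byte-aligned branch addresses; the class and function names are hypothetical and not taken from any real simulator.

```python
class GsharePredictor:
    """Gshare: global history XOR branch address indexes 2-bit counters."""

    def __init__(self, history_bits=12):
        self.mask = (1 << history_bits) - 1
        self.history = 0                            # recent outcomes, newest in bit 0
        self.counters = [1] * (1 << history_bits)   # start weakly not-taken

    def _index(self, pc):
        # Drop the low 2 bits (assumed 4-byte instructions), then XOR with history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # counter value 2 or 3 means taken

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(pattern, repeats, pc=0x400):
    """Fraction of correct predictions for one branch that repeats `pattern`."""
    p = GsharePredictor()
    correct = total = 0
    for _ in range(repeats):
        for taken in pattern:
            correct += p.predict(pc) == taken
            p.update(pc, taken)
            total += 1
    return correct / total

# A loop branch: taken 7 times, then falls through, over and over.
print(f"{accuracy([True] * 7 + [False], 1000):.3f}")
```

After a short warm-up, each position in the loop's period produces a distinct history, so the loop exit is predicted correctly as well.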

18
Throughput vs. Serial Performance
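The throughput vs. serial performance trade-off behind the ACMP can be made concrete with a different, well-known model: Hill and Marty's extension of Amdahl's law. In that model a chip has n base core equivalents (BCEs), and a core built from r BCEs runs serial code sqrt(r) times faster. This model is swapped in for illustration and is not from these slides. The parallel fraction f = 0.9 and budget n = 256 are arbitrary assumptions.

```python
import math

def perf(r):
    """Serial performance of a core built from r BCEs (Hill-Marty's sqrt(r) assumption)."""
    return math.sqrt(r)

def symmetric_speedup(f, n, r):
    """n / r identical cores, each built from r BCEs."""
    return 1.0 / ((1 - f) / perf(r) + f * r / (perf(r) * n))

def asymmetric_speedup(f, n, r):
    """One large core of r BCEs plus n - r single-BCE small cores (an ACMP)."""
    return 1.0 / ((1 - f) / perf(r) + f / (perf(r) + n - r))

f, n = 0.9, 256                      # assumed parallel fraction and chip budget in BCEs
sizes = [2 ** k for k in range(9)]   # r = 1, 2, 4, ..., 256
best_sym = max(sizes, key=lambda r: symmetric_speedup(f, n, r))
best_asym = max(sizes, key=lambda r: asymmetric_speedup(f, n, r))
print("symmetric:  best r =", best_sym, round(symmetric_speedup(f, n, best_sym), 1))
print("asymmetric: best r =", best_asym, round(asymmetric_speedup(f, n, best_asym), 1))
```

Under these assumptions the best asymmetric configuration is more than twice as fast as the best symmetric one. That result matches the intuition for keeping a large core to run the serial part.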
19
Multi-nonsense

Multi-core was a solution to a performance
problem
Hardware works sequentially
Make the hardware simple: thousands of cores
Do in parallel at a slower clock and save power
ILP is dead

20
ILP is dead

We double the number of transistors on the chip
Pentium M: 77 million transistors (50M for the L2
cache)
2nd generation: 140 million (110M for the L2
cache)
We see 5% improvement in IPC
Ergo, ILP is dead!
Perhaps we have blamed the wrong culprit.
The EV4, 5, 6, 7, 8 data: from EV4 to EV8
Performance improvement: 55X
Performance from frequency: 7X
Ergo, 55/7 > 7: more than half due to
microarchitecture
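The slide's arithmetic can be checked directly. In this small sketch, the first part splits the EV4-to-EV8 speedup into frequency and microarchitecture factors and reports the microarchitecture share on a logarithmic scale, which is one reading of "more than half." The second part subtracts the L2 cache from each Pentium M generation's transistor count, taking the slide's figures at face value. The function name is illustrative.

```python
import math

def microarch_share(total_speedup, frequency_speedup):
    """Split a speedup into frequency x microarchitecture factors.

    Returns the microarchitecture factor and its share of the total
    on a log scale, so the two shares add up to 1.
    """
    micro = total_speedup / frequency_speedup
    return micro, math.log(micro) / math.log(total_speedup)

factor, share = microarch_share(55, 7)   # EV4 -> EV8 figures from the slide
print(f"microarchitecture factor {factor:.2f}, log-scale share {share:.0%}")

# Pentium M generations: total transistors minus L2 cache, in millions.
core_gen1 = 77 - 50
core_gen2 = 140 - 110
growth = core_gen2 / core_gen1 - 1
print(f"non-cache transistors: {core_gen1}M -> {core_gen2}M ({growth:.0%} growth)")
```

On these numbers, almost all of the doubled transistor budget went to cache, not to the core. That is one way to read "Perhaps we have blamed the wrong culprit."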

21
(No Transcript)
22
Moore's Law

A law of physics
A law of process technology
A law of microarchitecture
A law of psychology

23
Multi-nonsense

Multi-core was a solution to a performance
problem
Hardware works sequentially
Make the hardware simple: thousands of cores
Do in parallel at a slower clock and save power
ILP is dead
Examine what is (rather than what can be)

24
Examine what is (rather than what can be)

Should sample benchmarks drive future
designs?
Another bridge over the East
River?

25
Multi-nonsense

Multi-core was a solution to a performance
problem
Hardware works sequentially
Make the hardware simple: thousands of cores
Do in parallel at a slower clock and save power
ILP is dead
Examine what is (rather than what can be)
Communication off-chip hard, on-chip easy
Abstraction is a pure good
Programmers are all dumb and need to be protected
Thinking in parallel is hard

26
Outline

Multi-core: how we got here
Mis-information
Where do we go from here
The microprocessor of the future

27
In the next few years

Process technology: 50 billion transistors
Gelsinger says we can go down to 10
nanometers
(I like to say 100 angstroms just to keep us
focused)
Dreamers will use whatever we come up with
What should we put on the chip?
How should software interface to it?

28
How will we use 50 billion transistors?

How have we used the transistors up to now?
and why haven't we seen
comparable benefit?

29
In my opinion, the reason is

Our inability to effectively exploit
-- The transformation hierarchy
-- Parallel programming

30
Problem

Algorithm
Program
ISA (Instruction Set Arch)
Microarchitecture
Circuits

Electrons
31
Up to now

Maintain the artificial walls between the layers
Keep the abstraction layers secure
Makes for a better comfort zone
In the beginning, improving the Microarchitecture
Pipelining, Branch Prediction, Speculative
Execution
Out-of-order Execution, Caches, Trace Cache
Lately, blindly doubling the number of cores
Today, we have too many transistors
BANDWIDTH and POWER are blocking improvement
We MUST change the paradigm

32
We Must Break the Layers

(We already have in limited cases)
Pragmas in the Language
The Refrigerator
X Superscalar
The algorithm, the language, the compiler,
the microarchitecture all working together

33
IF we break the layers

Compiler, Microarchitecture
Multiple levels of cache
Block-structured ISA
Part by compiler, part by uarch
Fast track, slow track
Algorithm, Compiler, Microarchitecture
X superscalar the Refrigerator
Niagara X / Pentium Y
Microarchitecture, Circuits
Verification Hooks
Internal fault tolerance

34
Unfortunately

We train computer people to work within their
layer
Too few understand anything outside their layer
and, as to multiple cores
People think sequential

35
At least two problems
36
Conventional Wisdom Problem 1: Abstraction is
Misunderstood

Taxi to the airport
The Scheme Chip (Deeper understanding)
Sorting (choices)
Microsoft developers (Deeper understanding)

37
Conventional Wisdom Problem 2: Thinking in
Parallel is Hard

Perhaps Thinking is Hard
How do we get people to believe
Thinking in parallel is natural

38
How do we solve these two problems?

FIRST, do not accept the premise:
Parallel programming is hard
SECOND, do not accept the premise:
It is okay to know only one layer

39
Parallel Programming is Hard?

What if we start teaching parallel thinking
in the first course to freshmen
For example
Factorial
Streaming
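Here is one way the factorial example might look when taught in parallel from the start. Split 1..n into independent chunks, compute each chunk's product separately, then combine the partial products. This is a minimal sketch using Python's standard concurrent.futures, and the function names are hypothetical. In CPython, threads will not speed up this CPU-bound work because of the global interpreter lock, so the point is the decomposition, not the timing.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def range_product(lo, hi):
    """Product of the integers lo..hi-1 (1 for an empty range)."""
    return math.prod(range(lo, hi))

def parallel_factorial(n, workers=4):
    """n! computed as independent chunk products that can run concurrently."""
    if n < 2:
        return 1
    # Chunk boundaries: bounds[0] == 1 and bounds[-1] == n + 1, so 1..n is covered.
    bounds = [1 + (n * k) // workers for k in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(range_product, bounds[:-1], bounds[1:]))
    # Multiplication is associative, so the partial products combine in any grouping.
    return math.prod(partials)

print(parallel_factorial(20) == math.factorial(20))
```

The partial products could also be combined pairwise as a tree. That grouping makes the combining step itself parallel, with logarithmic depth.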

40
How do we solve these problems?

FIRST, do not accept the premise:
Parallel programming is hard
SECOND, do not accept the premise:
It is okay to know only one layer

41
Students can understand more than one layer

What if we get rid of top-down FIRST
Students do not get it: they have no
underpinnings
Objects are too high a level of abstraction
So, students end up memorizing
Memorizing isn't learning (and certainly not
understanding)
What if we START with motivated bottom-up
Students build on what they already know
Memorizing is replaced with real learning
Continually raising the level of abstraction
The student sees the layers from the start
The student makes the connections
The student understands what is going on
The layers get broken naturally

42
We have an Education Problem; We have an Education
Opportunity

Too many computer professionals don't get it.
We can exploit all these transistors
IF we can understand each others layer
Thousands of cores, hundreds of accelerators
Ability to power on/off under program control
Algorithms, Compiler, Microarchitecture, Circuits
all talking to each other
Harnessing 50 billion transistor chips

43
IF we understand

50 billion transistors means we can have
A very large number of simple processors, AND
A few large very heavyweight processors, AND
Enough refrigerators for handling special tasks
Some programmers can take advantage of all this
Those who can't need support
We need software that can enable all of the above

44
that is

IF we are willing to continue to pursue ILP
IF we are willing to break the layers
IF we are willing to embrace parallel programming
IF we are willing to provide more than one
interface
IF we are willing to understand more than
our own layer of the abstraction hierarchy
so we really can talk to each other

Then maybe we can really harness the resources
of the multi-core and many-core chips

46
Outline

Multi-core: how we got here
Mis-information
Where do we go from here
The microprocessor of the future

47
The future microprocessor WILL BE a Multi-core
chip

But it will be a PentiumX/NiagaraY chip
With multiple interfaces to the software
It will tackle off-chip bandwidth
It will tackle power consumption (ON/OFF
switches)
It will tackle soft errors (internal fault
tolerance)
It will tackle security
And it WILL CONTAIN a few
heavyweight ILP processors
With lots of Refrigerators
And with the levels of transformation integrated
And with multiple interfaces

48
The Heavyweight Processor

Compiler/Microarchitecture Symbiosis
Multiple levels of cache
Fast track / Slow track
Part by compiler, part by microarchitecture
Block-structured ISA
Better Branch Prediction (e.g., indirect jumps)
Ample sprinkling of Refrigerators
SSMT (also known as helper threads)
Power Awareness (more than ON/OFF switches)
Verification hooks (CAD a first class citizen)
Internal Fault tolerance (for soft errors)
Better security

49
and very importantly

At least two interfaces
One for programmers who understand
One for programmers who don't understand
With layers of software for those who don't.

Thank you!
