Title: Are New Languages Necessary for Manycore?
1. Are New Languages Necessary for Manycore?
- David I. August
- Department of Computer Science
- Princeton University
2. THIS is the Problem!
[Figure: SPEC CPU INTEGER PERFORMANCE versus TIME; labels "2004" and "?"]
3. Why New Multicore Languages Will Fail
- Money is earned by relieving customer pain
- The Market
- Legacy, Legacy, Legacy
- Programmers adopt new programming models
- Parallel programming is more difficult
- Parallel programming models have longevity issues
- Automatic Thread Extraction (ATE)
4. Automatic Thread Extraction
"That isn't to say we are parallelizing arbitrary C code, that's a fool's errand!" - Richard Lethin
"Compiler can't determine a tree from a graph" - Burton Smith
"Compiler can't determine dependences without type information. Even then" - Burton Smith
"Decades of automatic parallelization work has been a failure" - James Larus
"All that icky pointer chasing code..." - Tim Mattson
5. How To Get Parallelism For Multicore?
- Nine months ago, with an open mind
- A priori select ALL C programs from SPEC CINT 2000
- Our objective function (in priority order)
- Extract meaningful parallelism
- Prefer automatic over manual
- Minimize impact to the programmer when manual
6. Our Results
Benchmark      Threads at Peak   Speedup   LOCs Changed
164.gzip       32                29.91     26
175.vpr        15                 3.59      1
176.gcc        16                 5.06     17
181.mcf        32                 2.84      0
186.crafty     32                25.18      9
197.parser     32                24.50      2
253.perlbmk     5                 1.21      0
254.gap        10                 1.94      1
255.vortex     32                 4.92      0
256.bzip2      12                 6.72      0
300.twolf       8                 2.06      1
GEOMEAN        17                 5.54
ARITHMEAN      20                 9.81
M.L.O.P. 5 Generations 32 Cores 5.3x Speedup
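As a quick check on the summary rows, the sketch below recomputes the geometric and arithmetic means of the Speedup column. The numbers are copied from the table above; the function and variable names are my own and are not from the talk.

```python
import math

# Speedups copied from the "Our Results" table (SPEC CINT 2000 C benchmarks).
SPEEDUPS = {
    "164.gzip": 29.91, "175.vpr": 3.59, "176.gcc": 5.06, "181.mcf": 2.84,
    "186.crafty": 25.18, "197.parser": 24.50, "253.perlbmk": 1.21,
    "254.gap": 1.94, "255.vortex": 4.92, "256.bzip2": 6.72, "300.twolf": 2.06,
}


def arithmetic_mean(values):
    """Plain average: sum divided by count."""
    values = list(values)
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product, computed through logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    print(f"GEOMEAN   {geometric_mean(SPEEDUPS.values()):.2f}")
    print(f"ARITHMEAN {arithmetic_mean(SPEEDUPS.values()):.2f}")
```

The geometric mean is much lower than the arithmetic mean here because three benchmarks (gzip, crafty, parser) have speedups near 25x to 30x while most of the others are below 7x; the arithmetic mean is pulled up by those outliers.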
7. Our Recipe
- Recent Compiler Technology
- Decoupled Software Pipelining (DSWP) (MICRO 05)
- Parallel-Stage DSWP (PS-DSWP)
- Speculative DSWP (Spec-DSWP) (PACT 07)
- Existing Technology: Speculative DOALL, TLS
- Targeted Memory Profiling
- Procedure Boundary Elimination (PLDI 06)
- Hardware Support
- Compiler-Controlled Speculation
- Streaming Communication (MICRO 06)
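To make the DSWP item concrete, here is a minimal hand-written sketch of the general idea as I understand it: a loop is split into stages that run on separate threads, and values flow in one direction between them through a queue. Everything here is an assumption for illustration: the linked-list workload, the names `Node`, `build_list`, and `dswp_sum_of_squares`, and the use of `queue.Queue` as a software stand-in for streaming communication hardware. It shows the structure only; Python threads will not reproduce the speedups reported in the talk.

```python
import queue
import threading


class Node:
    """Singly linked list node (hypothetical workload)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


def build_list(values):
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def sequential_sum_of_squares(head):
    """Original loop: pointer chasing and work interleaved in one thread."""
    total, node = 0, head
    while node is not None:
        total += node.value * node.value
        node = node.next_node
    return total


_DONE = object()  # sentinel marking the end of the stream


def dswp_sum_of_squares(head):
    """Same loop split into two pipeline stages joined by a queue."""
    q = queue.Queue(maxsize=8)
    result = []

    def traverse():  # stage 1: carries the node -> node.next_node dependence
        node = head
        while node is not None:
            q.put(node.value)
            node = node.next_node
        q.put(_DONE)

    def work():  # stage 2: consumes values, never sends anything back
        total = 0
        while True:
            v = q.get()
            if v is _DONE:
                break
            total += v * v
        result.append(total)

    threads = [threading.Thread(target=traverse), threading.Thread(target=work)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result[0]
```

For example, `dswp_sum_of_squares(build_list(range(1, 101)))` returns the same value as `sequential_sum_of_squares` on the same list. The point of the split is that the pointer-chasing recurrence stays inside one thread, so the only cross-thread traffic is the forward flow of values.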
8. Typical Example: 197.parser
Threads run on multicore model with Itanium 2 cores.
Find English Sentences
Parse Sentences (95)
Emit Results
DSWP
PS-DSWP (Spec DOALL Middle Stage)
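The three stages listed above can be sketched as a pipeline whose middle stage is replicated across several workers, which is roughly the shape the PS-DSWP label suggests. This is a sketch under stated assumptions, not the compiler's output: `parse_sentence` is a hypothetical stand-in that just counts words, sentences are assumed independent so no speculation or rollback is modelled, and all function names are mine.

```python
import queue
import threading

_DONE = object()  # end-of-stream sentinel


def find_sentences(text):
    """Stage 1 (sequential): split the input into sentences."""
    for s in text.split("."):
        s = s.strip()
        if s:
            yield s


def parse_sentence(sentence):
    """Stage 2 body. Hypothetical stand-in for real parsing: count words."""
    return len(sentence.split())


def ps_dswp_parse(text, workers=4):
    to_parse, parsed, results = queue.Queue(), queue.Queue(), []

    def stage1():
        for i, s in enumerate(find_sentences(text)):
            to_parse.put((i, s))          # tag each sentence with its index
        for _ in range(workers):
            to_parse.put(_DONE)           # one sentinel per worker

    def stage2():                         # replicated middle stage
        while True:
            item = to_parse.get()
            if item is _DONE:
                parsed.put(_DONE)
                break
            i, s = item
            parsed.put((i, parse_sentence(s)))

    def stage3():                         # sequential: emit in original order
        pending, next_index, finished = {}, 0, 0
        while finished < workers:
            item = parsed.get()
            if item is _DONE:
                finished += 1
                continue
            i, r = item
            pending[i] = r
            while next_index in pending:
                results.append(pending.pop(next_index))
                next_index += 1

    threads = [threading.Thread(target=stage1), threading.Thread(target=stage3)]
    threads += [threading.Thread(target=stage2) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling `ps_dswp_parse("The cat sat. Dogs bark loudly at night. Hi.")` returns `[3, 5, 1]`. The index tags let the last stage restore sequential output order even though the middle workers finish in arbitrary order.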
9. What We Learned
- A new way of thinking about dependences
- Go With the Flow
- TLP is easier to extract than ILP
- A holistic approach is better
- A limitation exists in the sequential model
- Determinism
10. Determinism: A Double-Edged Sword
    while (<cond>) { <work>; x = Rand(); <work>; }
    int Rand() { state = f2(state); return f1(state); }
[Figure: schedules of iterations 1 through 4 under DOALL and under SEQUENTIAL execution]
- 56 LOCs in 11 programs
- 22 annotations
- Only 2 programs needed more
- Most common culprit: Custom Allocators
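The Rand() loop on this slide can be illustrated with a small sketch of why hidden state ties iterations to sequential order. The concrete `f1` and `f2` below (a tiny linear congruential step and the identity) are hypothetical choices, as are the names `Rand` and `run_loop`; the slide does not define them. Reordering is simulated in a single thread so the outcome is repeatable.

```python
class Rand:
    """Generator with hidden state, mirroring Rand() on the slide."""

    def __init__(self, seed=1):
        self.state = seed

    def __call__(self):
        self.state = (5 * self.state + 3) % 16   # f2: hypothetical update
        return self.state                        # f1: hypothetical, identity


def run_loop(iteration_order):
    """Run one Rand() call per iteration, in the given order.

    Returns a dict mapping iteration number to the x it received.
    """
    rand = Rand()
    x = {}
    for i in iteration_order:
        x[i] = rand()
    return x


sequential = run_loop([0, 1, 2, 3])   # program order
reordered = run_loop([2, 0, 3, 1])    # one order a DOALL schedule might produce

if __name__ == "__main__":
    print("sequential:", sequential)
    print("reordered: ", reordered)
```

Each iteration receives a different value under the two orders, yet the same set of random numbers is consumed overall. If the program only needs some valid random stream, either outcome is acceptable; if it must match the sequential output exactly, the dependence through `state` forces the calls back into program order.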
11. What about Manycore?
- Multicore
- New languages aren't necessary
- Legacy code easily adjusted
- Manycore
- Implicitly Parallel Sequential Programming
- No optimization for sequential (custom allocators)
- Points of non-determinism specified
- Parallel algorithms in sequential codes
- Debuggability, Understandability, Sanity
12. The Answer Originates with ATE
- The Old Way: PL folks would write languages, Architecture folks would make HW, and Compiler folks would dutifully connect the two.
- This will fail for Manycore
- Unduly burden the programmer
- Performance will suffer
- There's a New Way
13. DO NOT POST ANYTHING AFTER THIS SLIDE
14. How Code Was Transformed
Benchmark      LOC (All)   LOC (Model)   Model Techniques   Compiler Techniques Applied
164.gzip       26          2             Y-Branch           TLS Memory, DSWP
175.vpr         1          1             PURE               Alias, Value, Control Spec, TLS Mem, DSWP
176.gcc        17          7             PURE               Alias Control Spec, TLS Mem, DSWP
181.mcf         0          0             -                  Alias, Silent Store, Control Spec, TLS Mem, DSWP, Nested
186.crafty      9          9             PURE               TLS Mem, DSWP, Nested
197.parser      2          2             PURE               TLS Mem, DSWP
253.perlbmk     0          0             -                  Alias, Control, Value Spec, DSWP
254.gap         1          1             PURE               TLS Memory, DSWP, Alias Spec
255.vortex      0          0             -                  Alias Value Spec, TLS Mem, DSWP
256.bzip2       0          0             -                  TLS Memory, DSWP
300.twolf       1          1             PURE               Alias Control Spec, TLS Mem, DSWP
15. PURE
16. Y-Branch
17. SPEC 2006: 403.gcc
Threads run on multicore model with Itanium 2 cores.