For thousand-core microprocessors - PowerPoint PPT Presentation

1
An IMplicitly PArallel Compiler Technology Based
on Phoenix
  • For thousand-core microprocessors
  • Wen-mei Hwu
  • with
  • Ryoo, Ueng, Rodrigues, Lathara, Kelm, Gelado,
    Stone, Yi, Kidd, Barghsorkhi, Mahesri, Tsao,
    Stratton, Navarro, Lumetta, Frank, Patel
  • University of Illinois, Urbana-Champaign

2
Background
  • Academic compiler research infrastructure is a
    tough business
  • IMPACT, Trimaran, and ORC for VLIW and Itanium
    processors
  • Polaris and SUIF for multiprocessors
  • LLVM for portability and safety
  • In 2001, the IMPACT team moved into many-core
    compilation with MARCO FCRC funding
  • A new implicitly parallel programming model that
    balances the burden between programmers and the
    compiler in parallel programming
  • Infrastructure work has slowed down
    ground-breaking work
  • Timely visit by the Phoenix team in January 2007
  • Rapid progress has since been taking place
  • Future IMPACT research will be built on Phoenix

3
The Next Software Challenge
Big picture
  • Today, multi-core chips make more effective use
    of area and power than large ILP CPUs
  • Scaling from 4-core to 1000-core chips could
    happen in the next 15 years
  • All semiconductor market domains converging to
    concurrent system platforms
  • PCs, game consoles, mobile handsets, servers,
    supercomputers, networking, etc.

We need to make these systems effectively
execute valuable, demanding apps.
4
The Compiler Challenge
Compilers and tools must extend the human's
ability to manage parallelism by doing the heavy
lifting.
  • To meet this challenge, the compiler must
  • Allow simple, effective control by programmers
  • Discover and verify parallelism
  • Eliminate tedious efforts in performance tuning
  • Reduce testing and support cost of parallel
    programs

5
An Initial Experimental Platform
  • A quiet revolution and potential build-up
  • Calculation: 450 GFLOPS (GPU) vs. 32 GFLOPS (CPU)
  • Memory bandwidth: 86.4 GB/s (GPU) vs. 8.4 GB/s
    (CPU)
  • Until last year, programmed only through a
    graphics API
  • A GPU in every PC and workstation: massive volume
    and potential impact

6
GeForce 8800
  • 16 highly threaded SMs, >128 FPUs, 450 GFLOPS,
    768 MB DRAM, 86.4 GB/s memory BW, 4 GB/s BW to CPU
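
The table on the next slide lists thousands of simultaneous threads per kernel. As a rough illustration of how such thread counts arise on this hardware, here is a minimal CUDA sketch; the saxpy kernel, array sizes, and launch configuration are illustrative assumptions, not code from the presentation.

// Minimal CUDA sketch: each thread computes one output element, so a
// single launch puts thousands of threads in flight at once.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard the final partial block
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // threads per block
    int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
    saxpy<<<blocks, threads>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f\n", hy[0]);              // expect 4.0
    cudaFree(dx); cudaFree(dy);
    delete[] hx; delete[] hy;
    return 0;
}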

7
Some Hand-coded Results
App     | Architectural Bottleneck                       | Simult. Threads | Kernel Speedup | App Speedup
H.264   | Registers, global memory latency               | 3,936           | 20.2x          | 1.5x
LBM     | Shared memory capacity                         | 3,200           | 12.5x          | 12.3x
RC5-72  | Registers                                      | 3,072           | 17.1x          | 11.0x
FEM     | Global memory bandwidth                        | 4,096           | 11.0x          | 10.1x
RPES    | Instruction issue rate                         | 4,096           | 210.0x         | 79.4x
PNS     | Global memory capacity                         | 2,048           | 24.0x          | 23.7x
LINPACK | Global memory bandwidth, CPU-GPU data transfer | 12,288          | 19.4x          | 11.8x
TRACF   | Shared memory capacity                         | 4,096           | 60.2x          | 21.6x
FDTD    | Global memory bandwidth                        | 1,365           | 10.5x          | 1.2x
MRI-Q   | Instruction issue rate                         | 8,192           | 457.0x         | 431.0x
(HKR, HotChips 2007)
8
Computing Q Performance
  • GPU (V8): 96 GFLOPS
  • CPU (V6): 230 MFLOPS
  • Speedup: 446x
9
Lessons Learned
  • Parallelism extraction requires global
    understanding
  • Most programmers only understand parts of an
    application
  • Algorithms need to be re-designed
  • Programmers benefit from a clear view of the
    algorithmic effect on parallelism
  • Real but rare dependencies often need to be
    ignored (see the sketch after this list)
  • Error checking code, etc.; the parallel code is
    often not equivalent to the sequential code
  • Getting more than a small speedup over sequential
    code is very tricky
  • Typically around 20 versions were tried for each
    application to move away from architecture
    bottlenecks
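
As a hedged illustration of the "real but rare dependencies" lesson above (not from the slides), the sketch below shows an otherwise data-parallel loop whose error-reporting path updates shared state; that update is the only obstacle to parallelization, so it must be speculated around or explicitly ignored. All names are invented for illustration.

// Illustrative only: the shared error counter is the rare cross-iteration
// dependence; the squaring work itself is fully independent.
#include <cstdio>
#include <vector>

static int g_error_count = 0;              // shared state: the rare dependence

float process(float v) {
    if (v < 0.0f) {                        // rare error path
        ++g_error_count;                   // write to shared state
        return 0.0f;
    }
    return v * v;                          // common, independent work
}

int main() {
    std::vector<float> data(1024, 2.0f);
    data[100] = -1.0f;                     // a single rare error
    for (size_t i = 0; i < data.size(); ++i)   // parallel except for the counter
        data[i] = process(data[i]);
    printf("errors: %d, data[0] = %f\n", g_error_count, data[0]);
    return 0;
}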

10
Implicitly Parallel Programming Flow
  • Stylized C/C++ or DSL with assertions (a
    hypothetical assertion sketch follows below)
  • Deep analysis with feedback assistance (human in
    the loop): concurrency discovery
  • Visualizable concurrent form (for increased
    composability)
  • Systematic search for best/correct code gen:
    code-gen space exploration
  • Visualizable sequential assembly code with
    parallel annotations (for increased scalability)
  • Parallel HW with sequential state gen: parallel
    execution with sequential semantics (for
    increased supportability)
  • Debugger
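
As a hypothetical sketch of the "stylized C/C++ with assertions" entry point, the code below asserts that two buffers do not overlap so that concurrency discovery may treat the loop iterations as independent. The ASSERT_NO_ALIAS macro is an invented stand-in, not the actual IMPACT or Phoenix annotation syntax.

// Hypothetical programmer assertion: the macro name and form are assumptions.
#include <cassert>

// Asserts that [a, a+n) and [b, b+n) do not overlap; a parallelizing
// compiler could use the same fact to prove loop iterations independent.
#define ASSERT_NO_ALIAS(a, b, n)                              \
    assert((const char *)((a) + (n)) <= (const char *)(b) ||  \
           (const char *)((b) + (n)) <= (const char *)(a))

void scale(float *dst, const float *src, int n, float k) {
    ASSERT_NO_ALIAS(dst, src, n);      // checked at run time, usable as a hint
    for (int i = 0; i < n; ++i)        // parallelizable given the assertion
        dst[i] = k * src[i];
}

int main() {
    float src[8] = {1, 2, 3, 4, 5, 6, 7, 8}, dst[8];
    scale(dst, src, 8, 2.0f);          // buffers are distinct, assertion holds
    return 0;
}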
11
Key Ideas
  • Deep program analyses that extend programmer and
    DSE knowledge for parallelism discovery
  • Key to reduced programmer parallelization efforts
  • Exclusion of infrequent but real dependences
    using HW STU (Speculative Threading with Undo)
    support (a conceptual sketch follows this list)
  • Key to successful parallelization of many real
    applications
  • Rich program information maintained in IR for
    access by tools and HW
  • Key to integrating multiple programming models
    and tools
  • Intuitive, visual presentation to programmers
  • Key to good programmer understanding of algorithm
    effects
  • Managed parallel execution arrangement search
    space
  • Key to reduced programmer performance tuning
    efforts
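
As a conceptual software analogy for the STU idea above (an assumption about the general technique, not the hardware mechanism from the talk), the sketch below runs updates speculatively while recording old values in an undo log, then rolls back if a rare dependence is detected.

// Conceptual undo-log sketch; illustrative only.
#include <cstdio>
#include <utility>
#include <vector>

struct UndoLog {
    std::vector<std::pair<int *, int>> entries;   // (address, old value)
    void record(int *p) { entries.push_back({p, *p}); }
    void rollback() {                             // restore in reverse order
        for (auto it = entries.rbegin(); it != entries.rend(); ++it)
            *it->first = it->second;
        entries.clear();
    }
    void commit() { entries.clear(); }            // speculation succeeded
};

int main() {
    int data[4] = {1, 2, 3, 4};
    UndoLog log;
    bool conflict = true;                         // pretend a rare dependence fired

    for (int i = 0; i < 4; ++i) {                 // speculative updates
        log.record(&data[i]);
        data[i] *= 10;
    }
    if (conflict) log.rollback(); else log.commit();

    printf("data[0] = %d (restored after rollback)\n", data[0]);
    return 0;
}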

12
Parallelism in Algorithms (H.263 motion
estimation example)
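
As a hedged sketch of where the parallelism lies in motion estimation (not the presentation's code), the block-matching routine below computes an independent best motion vector per macroblock, so the two outer macroblock loops can run concurrently.

// Illustrative block-matching motion estimation; parameters are assumptions.
#include <climits>
#include <cstdio>
#include <cstdlib>
#include <vector>

const int MB = 16;       // macroblock size in pixels
const int RANGE = 8;     // motion search range in pixels

// Sum of absolute differences between the current macroblock at (x, y)
// and the reference macroblock displaced by (dx, dy).
int sad(const unsigned char *cur, const unsigned char *ref,
        int w, int x, int y, int dx, int dy) {
    int s = 0;
    for (int j = 0; j < MB; ++j)
        for (int i = 0; i < MB; ++i)
            s += abs(cur[(y + j) * w + (x + i)] -
                     ref[(y + dy + j) * w + (x + dx + i)]);
    return s;
}

// Each macroblock's search is independent: the two outer loops expose
// the macroblock-level parallelism.
void motion_estimate(const unsigned char *cur, const unsigned char *ref,
                     int w, int h, int *mvx, int *mvy) {
    int mb = 0;
    for (int y = RANGE; y + MB + RANGE <= h; y += MB)
        for (int x = RANGE; x + MB + RANGE <= w; x += MB, ++mb) {
            int best = INT_MAX;
            for (int dy = -RANGE; dy <= RANGE; ++dy)
                for (int dx = -RANGE; dx <= RANGE; ++dx) {
                    int s = sad(cur, ref, w, x, y, dx, dy);
                    if (s < best) { best = s; mvx[mb] = dx; mvy[mb] = dy; }
                }
        }
}

int main() {
    const int W = 64, H = 48;
    std::vector<unsigned char> cur(W * H, 128), ref(W * H, 128);
    std::vector<int> mvx(64), mvy(64);
    motion_estimate(cur.data(), ref.data(), W, H, mvx.data(), mvy.data());
    printf("first macroblock motion vector: (%d, %d)\n", mvx[0], mvy[0]);
    return 0;
}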
13
MPEG-4 H.263 Encoder Parallelism Rediscovery
(Figure with panels (a) through (e) not reproduced)
14
Code Gen Space Exploration
15
Moving an Accurate Interprocedural Analysis into
Phoenix
(Figure: unification-based analysis vs. Fulcra)
16
Getting Started with Phoenix
  • Meetings with Phoenix team in January 2007
  • Determined the set of Phoenix API routines
    necessary to support IMPACT analyses and
    transformations
  • Received custom build of Phoenix that supports
    full type information

17
Fulcra to Phoenix Action!
  • Four-step process
  • Convert IMPACT's data structures to Phoenix's
    equivalents, and from C to C++/CLI.
  • Create the initial constraint graph using
    Phoenix's IR instead of IMPACT's IR (a generic
    constraint-graph sketch follows this list).
  • Convert the pointer-analysis solver.
  • Consists of porting from C to C++/CLI and dealing
    with any changes to Fulcra's ported data
    structures.
  • Annotate the points-to information back into
    Phoenix's alias representation.
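
For step 2, here is a generic sketch of the kind of constraint graph an inclusion-based (Andersen-style) points-to analysis builds and solves. This is an assumption about the general technique only; it is not Fulcra's actual data structure or the Phoenix IR interface.

// Generic inclusion-based points-to sketch: nodes hold points-to sets,
// subset edges propagate them until a fixed point is reached.
#include <cstdio>
#include <set>
#include <vector>

struct Node {
    std::set<int> pts;    // ids of abstract memory locations pointed to
    std::set<int> succ;   // subset edges: pts(this) is a subset of pts(succ)
};

void solve(std::vector<Node> &g) {
    bool changed = true;
    while (changed) {                      // iterate to a fixed point
        changed = false;
        for (size_t n = 0; n < g.size(); ++n)
            for (int s : g[n].succ)
                for (int loc : g[n].pts)
                    if (g[s].pts.insert(loc).second)
                        changed = true;
    }
}

int main() {
    // Constraints for "p = &a; q = p;": pts(p) contains a, and pts(p) flows to q.
    enum { A = 0, P = 1, Q = 2 };
    std::vector<Node> g(3);
    g[P].pts.insert(A);
    g[P].succ.insert(Q);
    solve(g);
    printf("q may point to a: %s\n", g[Q].pts.count(A) ? "yes" : "no");
    return 0;
}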

18
Phoenix Support Wish List
  • Access to code across file boundaries
  • LTCG (link-time code generation)
  • Access to multiple files within a pass
  • Full (source-code-level) type information
  • Feed results from Fulcra back to Phoenix
  • Need more information on Phoenix's alias
    representation
  • In the long run, we need a highly extensible IR
    and API for Phoenix

19
Conclusion
  • Compiler research for many-cores will require a
    very high-quality infrastructure with strong
    engineering support
  • New language extensions, new user models, new
    functionalities, new analyses, new
    transformations
  • We chose Phoenix based on its robustness,
    features and engineering support
  • Our current industry partners are also moving
    into Phoenix
  • We also plan to share our advanced extensions
    with other academic Phoenix users