EECS 583 Advanced Compilers Course Introduction - PowerPoint PPT Presentation

About This Presentation
Title:

EECS 583 Advanced Compilers Course Introduction


Slides: 34
Provided by: scottm3

Transcript and Presenter's Notes



1
EECS 583 Advanced Compilers: Course Introduction
  • Fall 2007, University of Michigan
  • September 5, 2007

2
About Me
  • Mahlke: pronounced "mall-key"
  • But just call me Scott
  • 6 years here at Michigan
  • Compiler guy who likes hardware
  • Program optimization and building custom hardware
    for high performance
  • Before this: HP Labs
  • Compiler research for Itanium-like processors
  • PICO: automatic design of NPAs (nonprogrammable
    accelerators)
  • Before before: Grad student at UIUC
  • Before³: Undergrad at UIUC

3
Class Overview
  • This class is NOT about
  • Programming languages
  • Parsing, syntax checking, semantic analysis
  • Handling advanced language features: virtual
    functions, etc.
  • Frontend transformations
  • Debugging
  • Simulation
  • What this class IS about: the compiler backend
  • Mapping applications to processor hardware
  • Retargetability: work for multiple platforms
    (not hard coded)
  • Work at the assembly-code level
  • Processor independent → machine code
  • Speed/Efficiency
  • How to make the application run fast
  • Use less memory (text, data)

4
Background You Should Have
  • 1. Programming
  • Good C programmer (essential)
  • Linux, gcc, gdb, emacs
  • Compiler system not ported to Windows or Mac
  • 2. Computer architecture
  • EECS 370 is good; 470 is better but not essential
  • Basics: caches, pipelining, function units,
    registers, virtual memory, branches, branch
    prediction, assembly code
  • 3. Compilers
  • Frontend stuff is not very relevant for this
    class
  • Basic backend stuff: we will go over it fast
  • Non-EECS 483 people will have to do some
    supplemental reading
  • 4. PowerPoint
  • You will have to make a presentation in this class

5
Textbook and Other Classroom Material
  • No required text: lecture notes, papers
  • Other useful material
  • Trimaran webpage: http://www.trimaran.org
  • UIUC Impact webpage: http://www.crhc.uiuc.edu/Impact
  • Course webpage and course newsgroup
  • http://www.eecs.umich.edu/mahlke/583f07
  • Lecture notes available the night before class
  • Newsgroup: forum for helping each other. I will
    try to check it regularly, but I won't be able to
    answer everything
  • http://phorum.eecs.umich.edu

6
What the Class Will be Like
  • Class meeting time: 10:30-12:30, MW
  • 2 hrs is hard to handle
  • We'll go for an hour, then take a 10-min break
  • Core backend stuff
  • Textbook material: some overlap with 483
  • A few homeworks to apply classroom material
  • Research papers
  • I'll present research material along the way
  • Presentations by students: you guys are going to
    teach

7
What the Class Will be Like (2)
  • Learning compilers
  • No memorizing definitions, terms, formulas,
    algorithms, etc
  • Learn by doing: writing code
  • Substantial amount of programming
  • Big learning curve for Trimaran compiler
  • Reasonable amount of reading
  • Classroom
  • Attendance: you should be here
  • Discussion is important
  • Work out examples, discuss papers, etc.
  • Each of you will teach some advanced material to
    the rest of us
  • Essential to stay caught up
  • Special interest groups: smaller meetings
    outside of class, each focused on specific
    compiler topics

8
Course Grading
  • Yes, everyone will get a grade
  • Distribution of grades, scale, etc - ???
  • Most (hopefully all) will get As and Bs
  • Slackers will be obvious and will suffer
  • Components
  • Midterm exam: 25%
  • Project: 45%
  • Homeworks: 10%
  • Paper presentation: 10%
  • Class participation: 10%

9
Homeworks
  • Around 2-3 of these
  • Small/modest programming assignments
  • Design and implement something we discussed in
    class
  • Goals
  • Learn the important concepts
  • Learn the compiler infrastructure so you can do
    the project
  • Grading
  • Good / weak effort but did something / did nothing
    (2/1/0)
  • Working together is fine (and encouraged!)
  • Make sure you understand things or it will come
    back to bite you
  • For now, everyone must turn in their own
    assignment

10
Projects: Most Important Part of the Class
  • Design and implement an interesting compiler
    technique and demonstrate its usefulness
  • Topic/scope/work
  • 1-3 people per project
  • You will pick the topics (I have to agree)
  • Projects will be planned/organized at the SIG
    level
  • You will have to
  • Read background material
  • Plan and design
  • Implement and debug
  • Deliverables
  • Working implementation
  • Project report: 5-page paper describing what
    you did and your results
  • 20-min presentation at the end (demo if you want)

11
Types of Projects
  • New idea
  • Small research idea
  • Design and implement it, see how it works
  • Extend existing idea
  • Take an existing paper, implement their technique
  • Then, extend it to do something interesting
  • Generalize strategy, make more efficient/effective
  • Implementation
  • Take existing idea, create quality implementation
    in Trimaran
  • Generate code for a real architecture

12
Class Participation
  • Interaction and discussion is essential in a
    graduate class
  • Be here
  • Don't just stare at the wall
  • Be prepared to discuss the material
  • Have something useful to contribute
  • Opportunities for participation
  • Research paper discussions: thoughts, comments,
    etc.
  • Saying what you think in the special interest
    group meetings
  • Solving class problems

13
Special Interest Groups
  • Divide up the class into focus groups
  • Each group will meet at times TBD
  • Identify research papers, discuss papers and
    project ideas
  • Start SIGs about halfway through the class
  • SIG topics from previous semesters
  • Analysis and optimization
  • Code generation (scheduling, register allocation,
    ... )
  • Managing the memory hierarchy
  • Power/energy management
  • Reliability
  • Multiple threads

14
Special Interest Groups (2)
  • FAQ
  • Do I have to be in a group? Yes
  • Can I be in more than 1 group? No
  • Do I get to pick which group I am in? Sort of
  • What if I get put in a group that I do not want
    to be in? Tough
  • Do I have to go to the SIG meetings? Yes
  • Can I do my project with someone in another SIG?
    No

15
Contact Information
  • Office: 4633 CSE
  • Email: mahlke@umich.edu
  • Office hours
  • Mon, Wed: briefly after class; Wed 4-5pm
  • Visiting office hrs
  • No GSI for this class
  • I don't have the time or energy to debug
    everyone's code
  • You will have to be independent in this class
  • Read the documentation and look at the code
  • Come to me when you are really REALLY stuck or
    confused
  • Helping each other is encouraged
  • Use the phorum

16
Role of the Compiler: My Biased View
  • Hardware people have to understand compilers
  • No attention to compilers → bad processor design
  • Frontend material is not what real compiler
    people do
  • Parsing, syntax checking, etc.: standard, mature
    field
  • Buy a frontend from EDG
  • Backend is where the action is at
  • How to make code run fast (approach hand coding)
  • How to reduce power/energy
  • How to reduce code size
  • How to reduce memory stalls
  • How to make use of unusual architectural features
  • How to design better processors

17
Superscalar Processors
  • Do everything in hardware
  • Sequential code comes in
  • Hardware parallelizes the code on the fly
  • Traditional computer architecture class
  • Emphasis on Pentium-class architectures
  • Desktop architecture is the only thing that is
    important
  • In this class ...
  • Very Long Instruction Word (VLIW) architectures
    and multicore VLIWs are the focus
  • Why? Dumb hardware, smart compiler
  • Burden shifted to the compiler to exploit machine
    resources

18
VLIW/EPIC Architectures
  • Our target processor for this class is VLIW/EPIC
  • EPIC = Explicitly Parallel Instruction Computing
  • Think of these as synonyms for this class
  • Desktop
  • IA-64 aka Itanium I and II, Merced, McKinley
  • Embedded processors
  • All high-performance DSPs are VLIW
  • Why? Cost/power of superscalar, more scalability
  • TI-C6x, Philips Trimedia, Starcore, ST-200
  • Itanium (aka Itanic) Is it a bad idea?

19
VLIW/EPIC Philosophy
  • Compiler creates a complete plan of run-time
    execution (POE)
  • At what time and using what resource
  • POE communicated to hardware via the instruction
    set
  • Processor obediently follows POE
  • No dynamic scheduling, out-of-order execution
    (these second-guess the compiler's plan)
  • Compiler allowed to play the statistics
  • Many types of info only available at run-time
    (branch directions, locations accessed via
    pointers)
  • Traditionally, compilers behave conservatively →
    handle the worst-case possibility
  • Allow the compiler to gamble when it believes the
    odds are in its favor: feedback-directed
    optimization
  • Expose microarchitecture to the compiler:
    memory system, branch execution

20
Defining Feature I - MultiOp
  • Superscalar
  • Operations are sequential
  • Hardware figures out resource assignment, time of
    execution
  • MultiOp instruction
  • Set of independent operations that are to be
    issued simultaneously (no sequential notion
    within a MultiOp)
  • 1 instruction issued every cycle provides
    notion of time
  • Resource assignment indicated by position in
    MultiOp
  • POE communicated to hardware via MultiOps

[Figure: example MultiOp instruction whose slots hold add, sub, load, load, store, mpy, shift, branch]
21
Defining Feature II - Exposed Latency
  • Superscalar
  • Sequence of atomic operations
  • Sequential order defines semantics
  • Unit assumed latency (UAL)
  • Each conceptually finishes before the next one
    starts
  • VLIW: non-atomic operations
  • Register reads/writes for 1 operation separated
    in time
  • Semantics determined by relative ordering of
    reads/writes
  • Assumed latency (NUAL if > 1 for at least one op)
  • Contract between the compiler and hardware
  • Instruction issuance provides common notion of
    time

22
UAL vs NUAL example
Assume latencies: load = 4 cycles, add = 1, mpy = 3
Operations:
  r1 = load(r2)
  r1 = load(r3)
  r4 = mpy(r1, r5)
  r4 = add(r1, r6)
  r7 = mpy(r4, r9)
  r7 = add(r7, r8)
NUAL: each operation is split into two phases
  Phase 1 (read operands): v1 = load(r2), v2 = load(r3), v3 = mpy(r1, r5),
    v4 = add(r1, r6), v5 = mpy(r4, r9), v6 = add(r7, r8)
  Phase 2 (write results): r1 = v1, r1 = v2, r4 = v4, r4 = v3, r7 = v6, r7 = v5
[Figure: timeline over instructions/time 1-13 comparing NUAL and traditional (UAL) execution]
23
Other Architectural Features of VLIW/EPIC
  • Add features into the architecture to support
    VLIW/EPIC philosophy
  • Create more efficient POEs
  • Expose the microarchitecture
  • Play the statistics
  • Example features
  • Register files with explicit register renaming
  • Unbundled branches
  • Control/data speculation
  • Memory hierarchy management
  • Predicated execution

24
Explicit Register Renaming
  • Superscalar
  • Small number of architectural registers
  • Rename using large pool of physical registers at
    run-time
  • VLIW
  • Compiler responsible for all resource allocation,
    including registers
  • Rename at compile time: large pool of regs
    needed
  • Static renaming
  • Modify operands explicitly
  • Dynamic renaming
  • Operands not explicitly modified
  • Is this feature lost? NO!

[Figure: static renaming example with operations Op1-Op4, where a definition of r13 is renamed r13 → r67]
25
Fancier Renaming With Rotating Registers
  • Overlap loop iterations
  • How do you prevent register overwrite in later
    iterations?
  • Compiler-controlled dynamic register renaming
  • Rotating registers
  • Each iteration writes to r13
  • But this gets mapped to a different physical
    register
  • A block of consecutive regs is allocated for each
    reg in the loop, sized by the number of iterations
    its value is needed

[Figure: overlapped iterations n (RRB = 7) and n+1 (RRB = 6), started II cycles apart; in each, Op1 writes r13 and Op2 reads it]
Mapping: actual reg = (reg + RRB) % NumRegs
At the end of each iteration, RRB--
26
Unbundled Branches
  • Branch separated into 3 distinct operations
  • 1. Prepare to branch: compute target address,
    prefetch instructions from likely target
  • Executed well in advance of branch
  • 2. Compute branch condition: comparison
    operation
  • 3. Branch itself

Example:
  PBR  btr1, TARGET       (1. prepare to branch)
  CMPP pr0, (x > 100)?    (2. compute condition)
  BR   btr1, pr0          (3. branch)
27
Control/Data Speculation
Control speculation: hoist conditionally executed instructions above
the condition
[Figure: operations originally guarded by if (a > b) are moved above the branch]
Data speculation: hoist loads/uses over potentially aliased stores
[Figure: a load and its use are moved above an earlier store that may alias it]
28
Predicated Execution
Source code:
  a = b + c;
  if (a > 0)
    e = f + g;
  else
    e = f / g;
  h = i - j;

Traditional branching code:
  BB1:      add a, b, c
            bgt a, 0, L1
  BB3:      div e, f, g
            jump L2
  BB2: L1:  add e, f, g
  BB4: L2:  sub h, i, j

Predicated code (BB1-BB4 merged into one block; p2 → BB2, p3 → BB3):
  add a, b, c     if T     (BB1)
  p2 = a > 0      if T     (BB1)
  p3 = a <= 0     if T     (BB1)
  div e, f, g     if p3    (BB3)
  add e, f, g     if p2    (BB2)
  sub h, i, j     if T     (BB4)
29
Scaling VLIW Architectures
Conventional Architecture
  • Register file access latency
  • Grows linearly with number of registers
  • Grows quadratically with number of ports
  • Increasing processor width requires increases to
    both
  • Clustered Approach
  • Decentralized architecture
  • Break design down into multiple chunks aka
    clusters
  • Communication through interconnection network
  • Used in Alpha 21264, TI C6x, Analog Devices
    TigerSHARC, and others

[Figure: conventional architecture (one register file shared by five FUs) vs. clustered architecture (Cluster 1 and Cluster 2, each with its own register file and FUs)]
30
Basics of Multicluster Compilation
  • Objectives
  • Divide instructions across clusters to maximize
    parallelism
  • Minimize critical intercluster communication

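[Figure: Cluster 1 and Cluster 2, each with its own register file and I and MEM units, joined by an interconnection network; a >> operation and an LW placed on different clusters exchange a value through an intercluster move]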
31
Multicore VLIWs
[Figure: one core of a multicore VLIW: instruction fetch/decode; FUs plus Mem and Comm units, with links to the north and west neighbors; register files (GPR, FPR, PR, BTR); L1 instruction and data caches connected to a banked L2]
Scalar operand network enables multicores to
behave as a multicluster VLIW
32
Speculating Larger Chunks of Work with a Transactional Memory
  • Atomic and isolated execution
  • Replace locks for critical sections
  • No lock granularity problem
  • Software Error Recovery
  • Allow programmers to abort/rollback transactions
    when errors are detected
  • Convenient interface for exception handling
  • Enables thread level speculation

[Figure: CPU with L1 D-cache and write buffer]
33
What if I Don't Care About These Architectures?
  • How do we compile for superscalars?
  • How do we compile for RISCs?
  • All the basic compiler analyses and
    transformations are the same for all processor
    types
  • They were developed for RISCs
  • Superscalar compilers work by pretending the
    processor is a VLIW
  • But must worry about hardware undoing what the
    compiler did
  • Other resources to worry about (i.e., reorder
    buffer, reservation stations, etc.)
  • Not all hardware features available