Title: Compilers: Principles, Techniques, and Tools, Chapters 1-3
1 Compilers: Principles, Techniques, and Tools, Chapters 1-3
Based on the Florida State University Spring 2007 COP5621 slides:
http://www.cs.fsu.edu/engelen/courses/COP5621
2 Chapter 1: Introduction to Compiling
3 Compilers
- Compilation: translation of a program written in a source language into a semantically equivalent program written in a target language
[Diagram: the Compiler translates the Source Program into a Target Program, emitting Error messages along the way; the Target Program then maps Input to Output.]
4 The Analysis-Synthesis Model of Compilation
- There are two parts to compilation:
- Analysis determines the operations implied by the source program, which are recorded in a tree structure
- Synthesis takes the tree structure and translates the operations therein into the target program
5 Cousins of the Compiler
Skeletal Source Program → Preprocessor → Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker (with Libraries and Relocatable Object Files) → Absolute Machine Code
6 The Phases of a Compiler
7 The Grouping of Phases
- Compiler front and back ends:
- Front end: analysis (machine independent)
- Back end: synthesis (machine dependent)
- Compiler passes:
- A collection of phases is done only once (single pass) or multiple times (multi pass)
- A single-pass compiler usually requires everything to be defined before being used in the source program
- A multi-pass compiler may have to keep the entire program representation in memory
8 Goals of a Semantic Analyzer
- The compiler must do more than recognize whether a sentence belongs to the language...
- It must find all remaining errors that would make the program invalid:
- undefined variables, undefined types
- type errors that can be caught statically
- Terminology:
- Static checks: done by the compiler
- Dynamic checks: done at run time
9 Chapter 2: A Simple One-Pass Compiler
10 Structure of the Compiler Front End
[Diagram: Lexical analyzer → token stream → Syntax-directed translator → intermediate representation]
11 Syntax Definition
- A context-free grammar is a 4-tuple with:
- A set of tokens (terminal symbols)
- A set of nonterminals
- A set of productions
- A designated start symbol
12 Derivation
- Given a CF grammar, we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation
- We begin with the start symbol
- In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal
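The derivation steps above can be sketched in code. This is a minimal illustration (the grammar below is an assumption, not taken from the slides): expand the leftmost nonterminal of each sentential form in every possible way, and collect the forms that contain only terminals.

```python
from collections import deque

# Assumed toy grammar: list -> list + digit | list - digit | digit
GRAMMAR = {
    "list":  [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    "digit": [(d,) for d in "0123456789"],
}

def generate(start, max_len):
    """Return all terminal strings of at most max_len symbols."""
    strings, seen = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len or form in seen:
            continue                      # prune long or repeated forms
        seen.add(form)
        for i, sym in enumerate(form):
            if sym in GRAMMAR:            # leftmost nonterminal found
                for rhs in GRAMMAR[sym]:  # one derivation step per production
                    queue.append(form[:i] + rhs + form[i + 1:])
                break
        else:                             # no nonterminals left: a generated string
            strings.add("".join(form))
    return strings

print(sorted(generate("list", 3))[:5])  # -> ['0', '0+0', '0+1', '0+2', '0+3']
```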
13 Parse Tree
- The root of the tree is labeled by the start symbol
- Each leaf of the tree is labeled by a terminal (token) or ε
- Each interior node is labeled by a nonterminal
- If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a (non)terminal or ε (ε denotes the empty string)
14 Ambiguity
[Figure: two distinct parse trees for the string 9-5+2, one grouping it as (9-5)+2 and the other as 9-(5+2), showing that the grammar is ambiguous.]
15 Associativity of Operators
Left-associative operators have left-recursive productions:
  left → left + term | term
The string a+b+c has the same meaning as (a+b)+c.
Right-associative operators have right-recursive productions:
  right → term = right | term
The string a=b=c has the same meaning as a=(b=c).
16 Precedence of Operators
Operators with higher precedence bind more tightly:
  expr → expr + term | term
  term → term * factor | factor
  factor → number | ( expr )
The string 2+3*5 has the same meaning as 2+(3*5).
[Figure: parse tree for 2+3*5 in which 3*5 forms a term subtree under the expr for the whole string.]
17 Syntax-Directed Translation
- Uses a CF grammar to specify the syntactic structure of the language
- AND associates a set of attributes with the terminals and nonterminals of the grammar
- AND associates with each production a set of semantic rules to compute values of attributes
- A parse tree is traversed and the semantic rules are applied; after the computations are completed, the attributes contain the translated form of the input
18 Synthesized Attributes
- An attribute is said to be synthesized if its value at a parse-tree node is determined from the attribute values at the children of the node
19 Annotated Parse Tree
[Figure: annotated parse tree for 9-5+2 with synthesized attribute t holding the postfix translation: term.t = 9, term.t = 5, term.t = 2 at the leaves; expr.t = 9, expr.t = 95-, and expr.t = 95-2+ at the root.]
20 Translation Schemes
A translation scheme is a CF grammar embedded with semantic actions, e.g.
  rest → + term { print('+') } rest
where { print('+') } is the embedded semantic action.
[Figure: parse-tree fragment for rest with children +, term, { print('+') }, rest.]
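A translation scheme of this kind can be sketched directly as recursive procedures, with each semantic action executed at its position in the production. This assumes the classic infix-to-postfix scheme (expr → term rest; rest → + term {print('+')} rest | - term {print('-')} rest | ε, with single-digit terms), which is not spelled out in full on the slide.

```python
def translate(s):
    """Translate infix s (single digits, + and -) to postfix by running
    the embedded semantic actions at their positions in each production."""
    out, pos = [], 0

    def term():
        nonlocal pos
        out.append(s[pos])        # action for term -> digit: emit the digit
        pos += 1

    def rest():
        nonlocal pos
        if pos < len(s) and s[pos] in "+-":
            op = s[pos]
            pos += 1
            term()
            out.append(op)        # embedded action: emit the operator
            rest()
        # otherwise: rest -> epsilon, emit nothing

    term()                        # expr -> term rest
    rest()
    return "".join(out)

print(translate("9-5+2"))  # -> 95-2+
```

Note that the operator is emitted after its right operand, exactly where the action sits in the production body.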
21 Parsing
- Parsing: the process of determining whether a string of tokens can be generated by a grammar
- For any CF grammar there is a parser that takes at most O(n³) time to parse a string of n tokens
- Linear algorithms suffice for parsing programming language source code
- Top-down parsing constructs a parse tree from the root to the leaves
- Bottom-up parsing constructs a parse tree from the leaves to the root
22 Predictive Parsing
- Recursive descent parsing is a top-down parsing method
- Every nonterminal has one (recursive) procedure responsible for parsing that nonterminal's syntactic category of input tokens
- When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information
- Predictive parsing is a special form of recursive descent parsing where we use one lookahead token to unambiguously determine the parse operations
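As a sketch (tokenization and function names are assumptions, not code from the slides), here is a predictive recursive-descent evaluator for the expression grammar of slide 16, with the left recursion replaced by iteration so one lookahead token suffices:

```python
def evaluate(tokens):
    """Predictive recursive-descent parse of
       expr   -> term { (+|-) term }
       term   -> factor { * factor }
       factor -> number | ( expr )
    where each token is a one-character string."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1
            v = expr()
            assert peek() == ")", "expected ')'"
            pos += 1
            return v
        v = int(tokens[pos])      # single-digit number token
        pos += 1
        return v

    def term():
        nonlocal pos
        v = factor()
        while peek() == "*":      # one lookahead token picks the branch
            pos += 1
            v *= factor()
        return v

    def expr():
        nonlocal pos
        v = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            v = v + term() if op == "+" else v - term()
        return v

    return expr()

print(evaluate(list("2+3*5")), evaluate(list("9-5+2")))  # -> 17 6
```

One procedure per nonterminal, and every branch is chosen by inspecting a single lookahead token, as the slide describes.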
23 FIRST
FIRST(α) is the set of terminals that appear as the first symbols of one or more strings generated from α.

  type → simple | ↑ id | array [ simple ] of type
  simple → integer | char | num dotdot num

  FIRST(simple) = { integer, char, num }
  FIRST(↑ id) = { ↑ }
  FIRST(type) = { integer, char, num, ↑, array }

When a nonterminal A has two (or more) productions, as in
  A → α | β
then FIRST(α) and FIRST(β) must be disjoint for predictive parsing to work.
24 Left Recursion
When a production for nonterminal A starts with a self reference, a predictive parser loops forever:
  A → A α | β
We can eliminate left-recursive productions by systematically rewriting the grammar using right-recursive productions:
  A → β R
  R → α R | ε
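The rewrite above is mechanical, so it can be applied to grammar data directly. A sketch (the helper name and grammar encoding are assumptions):

```python
def eliminate_left_recursion(nt, prods, new_nt):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 R | ... ; R -> a1 R | ... | epsilon."""
    recursive = [p[1:] for p in prods if p and p[0] == nt]   # the "alpha" tails
    base = [p for p in prods if not p or p[0] != nt]         # the "beta" bodies
    return {
        nt: [b + [new_nt] for b in base],                    # A -> beta R
        new_nt: [a + [new_nt] for a in recursive] + [[]],    # R -> alpha R | epsilon ([])
    }

print(eliminate_left_recursion("expr", [["expr", "+", "term"], ["term"]], "rest"))
# -> {'expr': [['term', 'rest']], 'rest': [['+', 'term', 'rest'], []]}
```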
25 AST
- An Abstract Syntax Tree is a tree representation of the program, used for:
- semantic analysis (type checking)
- some optimization (e.g., constant folding)
- intermediate code generation (sometimes the intermediate code is an AST with a somewhat different set of nodes)
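A minimal AST sketch (node names and shapes are assumptions) illustrating one of the uses listed above, constant folding of + and * over number leaves:

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str           # "+" or "*"
    left: object
    right: object

def fold(node):
    """Return an equivalent tree with constant subtrees evaluated."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            value = (left.value + right.value if node.op == "+"
                     else left.value * right.value)
            return Num(value)
        return BinOp(node.op, left, right)
    return node

tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(5)))   # the AST of 2 + 3 * 5
print(fold(tree))  # -> Num(value=17)
```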
26 Lexical Analysis
- Typical tasks of the lexical analyzer:
- Remove white space and comments
- Encode constants as tokens
- Recognize keywords
- Recognize identifiers and store identifier names in a global symbol table
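A hand-written lexer performing these tasks might look like the following sketch (token names and the tiny keyword set are assumptions; comment handling is omitted for brevity):

```python
KEYWORDS = {"if", "else", "while"}

def tokenize(src):
    tokens, symtab, i = [], {}, 0          # symtab: identifier -> index
    while i < len(src):
        c = src[i]
        if c.isspace():                    # remove white space
            i += 1
        elif c.isdigit():                  # encode constants as num tokens
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("num", int(src[i:j])))
            i = j
        elif c.isalpha():                  # letter followed by letters/digits
            j = i
            while j < len(src) and src[j].isalnum():
                j += 1
            word = src[i:j]
            if word in KEYWORDS:           # recognize keywords
                tokens.append((word, None))
            else:                          # store identifiers in the symbol table
                tokens.append(("id", symtab.setdefault(word, len(symtab))))
            i = j
        else:                              # single-character operator tokens
            tokens.append((c, None))
            i += 1
    return tokens, symtab

print(tokenize("while x1 < 10"))
# -> ([('while', None), ('id', 0), ('<', None), ('num', 10)], {'x1': 0})
```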
27 Chapter 3: Lexical Analysis
28 Interaction of the Lexical Analyzer with the Parser
[Diagram: the Parser asks the Lexical Analyzer to get the next token; the Lexical Analyzer reads the Source Program and returns (token, tokenval) pairs; both components report errors and share the Symbol Table.]
29 Tokens, Patterns, Lexemes
- A token is a classification of lexical units
- For example: id and num
- Lexemes are the specific character strings that make up a token
- For example: abc and 123
- Patterns are rules describing the set of lexemes belonging to a token
- For example: "letter followed by letters and digits" and "non-empty sequence of digits"
30 How To Describe Tokens
- Programming language tokens can be described using regular expressions
- A regular expression R describes some set of strings L(R)
- L(R) is the language defined by R
- L(abc) = { abc }
- L(hello | goodbye) = { hello, goodbye }
- Idea: define each kind of token using a RE
31 Regular Expression Matching
- Sketch of an efficient implementation:
- start in some initial state
- look at each input character in sequence, updating the scanner state accordingly
- if the state at the end of the input is an accept state, the input string matches the RE
- For tokenizing, we only need a finite amount of state: a (deterministic) finite automaton (DFA), or finite state machine
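The loop sketched above can be written in a few lines. This example (state names assumed) is a two-state DFA matching the RE digit+, i.e. a num token:

```python
START, IN_NUM = 0, 1
ACCEPTING = {IN_NUM}

def move(state, ch):
    """Transition function; None means the DFA rejects immediately.
    From either state, a digit leads to IN_NUM; anything else is stuck."""
    return IN_NUM if ch.isdigit() else None

def matches(s):
    state = START
    for ch in s:                      # scan each input character once
        state = move(state, ch)
        if state is None:
            return False
    return state in ACCEPTING         # accept iff we end in an accept state

print(matches("123"), matches(""), matches("12a"))  # -> True False False
```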
32 DFA vs. NFA
- DFA: the action of the automaton on each input symbol is fully determined
- obvious table-driven implementation
- NFA:
- the automaton may have a choice on each step
- the automaton accepts a string if there is any way to make the choices so as to arrive at an accepting state / every path from the start state to an accept state spells a string accepted by the automaton
- not obvious how to implement efficiently!
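One standard answer to the "any path" rule is to simulate the NFA by tracking the *set* of states it could be in after each symbol. A sketch, with an assumed example NFA that accepts strings over {a, b} ending in "ab":

```python
NFA = {                  # (state, symbol) -> set of possible next states
    (0, "a"): {0, 1},    # on 'a', state 0 may loop or guess "the ending starts here"
    (0, "b"): {0},
    (1, "b"): {2},
}
ACCEPTING = {2}

def nfa_accepts(s):
    states = {0}
    for ch in s:          # take every possible transition from every current state
        states = set().union(*(NFA.get((q, ch), set()) for q in states))
    return bool(states & ACCEPTING)   # accept if any sequence of choices works

print(nfa_accepts("aab"), nfa_accepts("aba"))  # -> True False
```

This runs in time proportional to the input length times the number of states, and precomputing the state-set transitions for every symbol is exactly the subset construction that yields an equivalent DFA.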