Title: Compilers: Principles, Techniques, and Tools, Chapters 1-3
1 Compilers: Principles, Techniques, and Tools, Chapters 1-3
Based on the Florida State University Spring 2007 COP5621 slides:
http://www.cs.fsu.edu/engelen/courses/COP5621
2 Chapter 1: Introduction to Compiling
3 Compilers
- Compilation: translation of a program written in a source language into a semantically equivalent program written in a target language
[Diagram: the Compiler translates the Source Program into a Target Program, emitting Error messages along the way; the Target Program then maps Input to Output.]
4 The Analysis-Synthesis Model of Compilation
- There are two parts to compilation:
- Analysis determines the operations implied by the source program, which are recorded in a tree structure
- Synthesis takes the tree structure and translates the operations therein into the target program
5 Cousins of the Compiler
Skeletal Source Program → Preprocessor → Source Program → Compiler → Target Assembly Program → Assembler → Relocatable Object Code → Linker (with Libraries and Relocatable Object Files) → Absolute Machine Code
6 The Phases of a Compiler
7 The Grouping of Phases
- Compiler front and back ends:
- Front end: analysis (machine independent)
- Back end: synthesis (machine dependent)
- Compiler passes:
- A collection of phases is done only once (single pass) or multiple times (multi pass)
- A single-pass compiler usually requires everything to be defined before being used in the source program
- A multi-pass compiler may have to keep the entire program representation in memory
8 Goals of a Semantic Analyzer
- The compiler must do more than recognize whether a sentence belongs to the language...
- It must find all remaining errors that would make the program invalid:
- undefined variables, undefined types
- type errors that can be caught statically
- Terminology:
- Static checks: done by the compiler
- Dynamic checks: done at run time
9 Chapter 2: A Simple One-Pass Compiler
10 Structure of the Compiler Front End
[Diagram: Lexical analyzer → token stream → Syntax-directed translator → intermediate representation]
11 Syntax Definition
- A context-free grammar is a 4-tuple with:
- A set of tokens (terminal symbols)
- A set of nonterminals
- A set of productions
- A designated start symbol
12 Derivation
- Given a CF grammar, we can determine the set of all strings (sequences of tokens) generated by the grammar using derivation
- We begin with the start symbol
- In each step, we replace one nonterminal in the current sentential form with one of the right-hand sides of a production for that nonterminal
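The derivation steps above can be sketched in code. This is a minimal illustration (the grammar below is an assumption, not taken from the slides): expand the leftmost nonterminal of each sentential form in every possible way, and collect the forms that contain only terminals.

```python
from collections import deque

# Assumed toy grammar: list -> list + digit | list - digit | digit
GRAMMAR = {
    "list":  [("list", "+", "digit"), ("list", "-", "digit"), ("digit",)],
    "digit": [(d,) for d in "0123456789"],
}

def generate(start, max_len):
    """Return all terminal strings of at most max_len symbols."""
    strings, seen = set(), set()
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        if len(form) > max_len or form in seen:
            continue                      # prune long or repeated forms
        seen.add(form)
        for i, sym in enumerate(form):
            if sym in GRAMMAR:            # leftmost nonterminal found
                for rhs in GRAMMAR[sym]:  # one derivation step per production
                    queue.append(form[:i] + rhs + form[i + 1:])
                break
        else:                             # no nonterminals left: a generated string
            strings.add("".join(form))
    return strings

print(sorted(generate("list", 3))[:5])  # -> ['0', '0+0', '0+1', '0+2', '0+3']
```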
13 Parse Tree
- The root of the tree is labeled by the start symbol
- Each leaf of the tree is labeled by a terminal (token) or ε
- Each interior node is labeled by a nonterminal
- If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a (non)terminal or ε (ε denotes the empty string)
14 Ambiguity
[Figure: two distinct parse trees for the string 9-5+2, one grouping it as (9-5)+2 and the other as 9-(5+2), showing that the grammar is ambiguous.]
15 Associativity of Operators
Left-associative operators have left-recursive productions:
  left → left + term | term
The string a+b+c has the same meaning as (a+b)+c.
Right-associative operators have right-recursive productions:
  right → term = right | term
The string a=b=c has the same meaning as a=(b=c).
16 Precedence of Operators
Operators with higher precedence bind more tightly:
  expr → expr + term | term
  term → term * factor | factor
  factor → number | ( expr )
The string 2+3*5 has the same meaning as 2+(3*5).
[Figure: parse tree for 2+3*5 in which 3*5 forms a term subtree under the expr for the whole string.]
17 Syntax-Directed Translation
- Uses a CF grammar to specify the syntactic structure of the language
- AND associates a set of attributes with the terminals and nonterminals of the grammar
- AND associates with each production a set of semantic rules to compute values of attributes
- A parse tree is traversed and the semantic rules are applied; after the computations are completed, the attributes contain the translated form of the input
18 Synthesized Attributes
- An attribute is said to be synthesized if its value at a parse-tree node is determined from the attribute values at the children of the node
19 Annotated Parse Tree
[Figure: annotated parse tree for 9-5+2 with synthesized attribute t holding the postfix translation: term.t = 9, term.t = 5, term.t = 2 at the leaves; expr.t = 9, expr.t = 95-, and expr.t = 95-2+ at the root.]
20 Translation Schemes
A translation scheme is a CF grammar embedded with semantic actions, e.g.
  rest → + term { print('+') } rest
where { print('+') } is the embedded semantic action.
[Figure: parse-tree fragment for rest with children +, term, { print('+') }, rest.]
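A translation scheme of this kind can be sketched directly as recursive procedures, with each semantic action executed at its position in the production. This assumes the classic infix-to-postfix scheme (expr → term rest; rest → + term {print('+')} rest | - term {print('-')} rest | ε, with single-digit terms), which is not spelled out in full on the slide.

```python
def translate(s):
    """Translate infix s (single digits, + and -) to postfix by running
    the embedded semantic actions at their positions in each production."""
    out, pos = [], 0

    def term():
        nonlocal pos
        out.append(s[pos])        # action for term -> digit: emit the digit
        pos += 1

    def rest():
        nonlocal pos
        if pos < len(s) and s[pos] in "+-":
            op = s[pos]
            pos += 1
            term()
            out.append(op)        # embedded action: emit the operator
            rest()
        # otherwise: rest -> epsilon, emit nothing

    term()                        # expr -> term rest
    rest()
    return "".join(out)

print(translate("9-5+2"))  # -> 95-2+
```

Note that the operator is emitted after its right operand, exactly where the action sits in the production body.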
21 Parsing
- Parsing: the process of determining whether a string of tokens can be generated by a grammar
- For any CF grammar there is a parser that takes at most O(n³) time to parse a string of n tokens
- Linear algorithms suffice for parsing programming language source code
- Top-down parsing constructs a parse tree from the root to the leaves
- Bottom-up parsing constructs a parse tree from the leaves to the root
22 Predictive Parsing
- Recursive descent parsing is a top-down parsing method
- Every nonterminal has one (recursive) procedure responsible for parsing that nonterminal's syntactic category of input tokens
- When a nonterminal has multiple productions, each production is implemented in a branch of a selection statement based on input look-ahead information
- Predictive parsing is a special form of recursive descent parsing where we use one lookahead token to unambiguously determine the parse operations
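As a sketch (tokenization and function names are assumptions, not code from the slides), here is a predictive recursive-descent evaluator for the expression grammar of slide 16, with the left recursion replaced by iteration so one lookahead token suffices:

```python
def evaluate(tokens):
    """Predictive recursive-descent parse of
       expr   -> term { (+|-) term }
       term   -> factor { * factor }
       factor -> number | ( expr )
    where each token is a one-character string."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        if peek() == "(":
            pos += 1
            v = expr()
            assert peek() == ")", "expected ')'"
            pos += 1
            return v
        v = int(tokens[pos])      # single-digit number token
        pos += 1
        return v

    def term():
        nonlocal pos
        v = factor()
        while peek() == "*":      # one lookahead token picks the branch
            pos += 1
            v *= factor()
        return v

    def expr():
        nonlocal pos
        v = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            v = v + term() if op == "+" else v - term()
        return v

    return expr()

print(evaluate(list("2+3*5")), evaluate(list("9-5+2")))  # -> 17 6
```

One procedure per nonterminal, and every branch is chosen by inspecting a single lookahead token, as the slide describes.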
23 FIRST
FIRST(α) is the set of terminals that appear as the first symbols of one or more strings generated from α.

  type → simple | ↑ id | array [ simple ] of type
  simple → integer | char | num dotdot num

  FIRST(simple) = { integer, char, num }
  FIRST(↑ id) = { ↑ }
  FIRST(type) = { integer, char, num, ↑, array }

When a nonterminal A has two (or more) productions, as in
  A → α | β
then FIRST(α) and FIRST(β) must be disjoint for predictive parsing to work.
24 Left Recursion
When a production for nonterminal A starts with a self reference, a predictive parser loops forever:
  A → A α | β
We can eliminate left-recursive productions by systematically rewriting the grammar using right-recursive productions:
  A → β R
  R → α R | ε
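The rewrite above is mechanical, so it can be applied to grammar data directly. A sketch (the helper name and grammar encoding are assumptions):

```python
def eliminate_left_recursion(nt, prods, new_nt):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 R | ... ; R -> a1 R | ... | epsilon."""
    recursive = [p[1:] for p in prods if p and p[0] == nt]   # the "alpha" tails
    base = [p for p in prods if not p or p[0] != nt]         # the "beta" bodies
    return {
        nt: [b + [new_nt] for b in base],                    # A -> beta R
        new_nt: [a + [new_nt] for a in recursive] + [[]],    # R -> alpha R | epsilon ([])
    }

print(eliminate_left_recursion("expr", [["expr", "+", "term"], ["term"]], "rest"))
# -> {'expr': [['term', 'rest']], 'rest': [['+', 'term', 'rest'], []]}
```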
25 AST
- An Abstract Syntax Tree is a tree representation of the program, used for:
- semantic analysis (type checking)
- some optimization (e.g., constant folding)
- intermediate code generation (sometimes the intermediate code is an AST with a somewhat different set of nodes)
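A minimal AST sketch (node names and shapes are assumptions) illustrating one of the uses listed above, constant folding of + and * over number leaves:

```python
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str           # "+" or "*"
    left: object
    right: object

def fold(node):
    """Return an equivalent tree with constant subtrees evaluated."""
    if isinstance(node, BinOp):
        left, right = fold(node.left), fold(node.right)
        if isinstance(left, Num) and isinstance(right, Num):
            value = (left.value + right.value if node.op == "+"
                     else left.value * right.value)
            return Num(value)
        return BinOp(node.op, left, right)
    return node

tree = BinOp("+", Num(2), BinOp("*", Num(3), Num(5)))   # the AST of 2 + 3 * 5
print(fold(tree))  # -> Num(value=17)
```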
26 Lexical Analysis
- Typical tasks of the lexical analyzer:
- Remove white space and comments
- Encode constants as tokens
- Recognize keywords
- Recognize identifiers and store identifier names in a global symbol table
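A hand-written lexer performing these tasks might look like the following sketch (token names and the tiny keyword set are assumptions; comment handling is omitted for brevity):

```python
KEYWORDS = {"if", "else", "while"}

def tokenize(src):
    tokens, symtab, i = [], {}, 0          # symtab: identifier -> index
    while i < len(src):
        c = src[i]
        if c.isspace():                    # remove white space
            i += 1
        elif c.isdigit():                  # encode constants as num tokens
            j = i
            while j < len(src) and src[j].isdigit():
                j += 1
            tokens.append(("num", int(src[i:j])))
            i = j
        elif c.isalpha():                  # letter followed by letters/digits
            j = i
            while j < len(src) and src[j].isalnum():
                j += 1
            word = src[i:j]
            if word in KEYWORDS:           # recognize keywords
                tokens.append((word, None))
            else:                          # store identifiers in the symbol table
                tokens.append(("id", symtab.setdefault(word, len(symtab))))
            i = j
        else:                              # single-character operator tokens
            tokens.append((c, None))
            i += 1
    return tokens, symtab

print(tokenize("while x1 < 10"))
# -> ([('while', None), ('id', 0), ('<', None), ('num', 10)], {'x1': 0})
```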
27 Chapter 3: Lexical Analysis
28 Interaction of the Lexical Analyzer with the Parser
[Diagram: the Parser asks the Lexical Analyzer to get the next token; the Lexical Analyzer reads the Source Program and returns (token, tokenval) pairs; both components report errors and share the Symbol Table.]
29 Tokens, Patterns, Lexemes
- A token is a classification of lexical units
- For example: id and num
- Lexemes are the specific character strings that make up a token
- For example: abc and 123
- Patterns are rules describing the set of lexemes belonging to a token
- For example: "letter followed by letters and digits" and "non-empty sequence of digits"
30 How To Describe Tokens
- Programming language tokens can be described using regular expressions
- A regular expression R describes some set of strings L(R)
- L(R) is the language defined by R
- L(abc) = { abc }
- L(hello | goodbye) = { hello, goodbye }
- Idea: define each kind of token using a RE
31 Regular Expression Matching
- Sketch of an efficient implementation:
- start in some initial state
- look at each input character in sequence, updating the scanner state accordingly
- if the state at the end of the input is an accept state, the input string matches the RE
- For tokenizing, we only need a finite amount of state: a (deterministic) finite automaton (DFA), or finite state machine
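The loop sketched above can be written in a few lines. This example (state names assumed) is a two-state DFA matching the RE digit+, i.e. a num token:

```python
START, IN_NUM = 0, 1
ACCEPTING = {IN_NUM}

def move(state, ch):
    """Transition function; None means the DFA rejects immediately.
    From either state, a digit leads to IN_NUM; anything else is stuck."""
    return IN_NUM if ch.isdigit() else None

def matches(s):
    state = START
    for ch in s:                      # scan each input character once
        state = move(state, ch)
        if state is None:
            return False
    return state in ACCEPTING         # accept iff we end in an accept state

print(matches("123"), matches(""), matches("12a"))  # -> True False False
```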
32 DFA vs. NFA
- DFA: the action of the automaton on each input symbol is fully determined
- obvious table-driven implementation
- NFA:
- the automaton may have a choice on each step
- the automaton accepts a string if there is any way to make the choices so as to arrive at an accepting state / every path from the start state to an accept state spells a string accepted by the automaton
- not obvious how to implement efficiently!
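One standard answer to the "any path" rule is to simulate the NFA by tracking the *set* of states it could be in after each symbol. A sketch, with an assumed example NFA that accepts strings over {a, b} ending in "ab":

```python
NFA = {                  # (state, symbol) -> set of possible next states
    (0, "a"): {0, 1},    # on 'a', state 0 may loop or guess "the ending starts here"
    (0, "b"): {0},
    (1, "b"): {2},
}
ACCEPTING = {2}

def nfa_accepts(s):
    states = {0}
    for ch in s:          # take every possible transition from every current state
        states = set().union(*(NFA.get((q, ch), set()) for q in states))
    return bool(states & ACCEPTING)   # accept if any sequence of choices works

print(nfa_accepts("aab"), nfa_accepts("aba"))  # -> True False
```

This runs in time proportional to the input length times the number of states, and precomputing the state-set transitions for every symbol is exactly the subset construction that yields an equivalent DFA.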