1
Compilers: Principles, Techniques, and Tools
Chapters 1-3
Based on Florida State University Spring 2007 COP5621 slides
http://www.cs.fsu.edu/~engelen/courses/COP5621
  • Tamim Sookoor
  • 10/25/2007

2
Chapter 1: Introduction to Compiling
3
Compilers
  • Compilation
  • Translation of a program written in a source
    language into a semantically equivalent program
    written in a target language

[Figure: the compiler takes a source program and produces a target program
plus error messages; the target program then maps input to output]
4
The Analysis-Synthesis Model of Compilation
  • There are two parts to compilation
  • Analysis determines the operations implied by the
    source program which are recorded in a tree
    structure
  • Synthesis takes the tree structure and translates
    the operations therein into the target program

5
Cousins of the Compiler
[Figure: skeletal source program → preprocessor → source program →
compiler → target assembly program → assembler → relocatable object code →
linker (with libraries and relocatable object files) → absolute machine code]
6
The Phases of a Compiler
7
The Grouping of Phases
  • Compiler front and back ends
  • Front end: analysis (machine independent)
  • Back end: synthesis (machine dependent)
  • Compiler passes
  • A collection of phases is done only once (single
    pass) or multiple times (multi pass)
  • Single pass usually requires everything to be
    defined before being used in source program
  • Multi pass compiler may have to keep entire
    program representation in memory

8
Goals of a Semantic Analyzer
  • Compiler must do more than recognize whether a
    sentence belongs to the language...
  • Find all possible remaining errors that would
    make program invalid
  • undefined variables, types
  • type errors that can be caught statically
  • Terminology
  • Static checks: done by the compiler
  • Dynamic checks: done at run time

9
Chapter 2: A Simple One-Pass Compiler
10
Structure of Compiler Front End
[Figure: lexical analyzer → token stream → syntax-directed translator →
intermediate representation]
11
Syntax Definition
  • A context-free grammar is a 4-tuple with
  • A set of tokens (terminal symbols)
  • A set of nonterminals
  • A set of productions
  • A designated start symbol

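For example (an illustration in the spirit of the slides, not taken from
them), a tiny grammar for digits separated by plus and minus signs has the
tokens 0, 1, ..., 9, +, and -, the nonterminals list and digit, the start
symbol list, and the productions

  list  → list + digit | list - digit | digit
  digit → 0 | 1 | 2 | ... | 9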
12
Derivation
  • Given a CF grammar we can determine the set of
    all strings (sequences of tokens) generated by
    the grammar using derivation
  • We begin with the start symbol
  • In each step, we replace one nonterminal in the
    current sentential form with one of the
    right-hand sides of a production for that
    nonterminal

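As a worked example (using the illustrative digit-list grammar above), one
derivation of 9-5+2, the string used on several later slides, is

  list ⇒ list + digit ⇒ list - digit + digit ⇒ digit - digit + digit
       ⇒ 9 - digit + digit ⇒ 9 - 5 + digit ⇒ 9 - 5 + 2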
13
Parse Tree
  • The root of the tree is labeled by the start
    symbol
  • Each leaf of the tree is labeled by a terminal
    (token) or ε
  • Each interior node is labeled by a nonterminal
  • If A → X1 X2 … Xn is a production, then node A
    has immediate children X1, X2, …, Xn, where Xi is
    a (non)terminal or ε (ε denotes the empty string)

14
Ambiguity
[Figure: two different parse trees for the same string 9-5+2, built from
productions of the nonterminal string, one grouping it as (9-5)+2 and the
other as 9-(5+2); a grammar that allows more than one parse tree for a
sentence is ambiguous]
15
Associativity of Operators
Left-associative operators have left-recursive
productions

  left → left + term | term

The string a+b+c has the same meaning as (a+b)+c

Right-associative operators have right-recursive
productions

  right → term = right | term

The string a=b=c has the same meaning as a=(b=c)
16
Precedence of Operators
Operators with higher precedence bind more
tightly

  expr   → expr + term | term
  term   → term * factor | factor
  factor → number | ( expr )

The string 2+3*5 has the same meaning as 2+(3*5)
[Figure: parse tree for 2+3*5 under this grammar; 3*5 is grouped under a
single term node, so * binds more tightly than +]
17
Syntax-Directed Translation
  • Uses a CF grammar to specify the syntactic
    structure of the language
  • AND associates a set of attributes with the
    terminals and nonterminals of the grammar
  • AND associates with each production a set of
    semantic rules to compute values of attributes
  • A parse tree is traversed and the semantic rules
    are applied; after the computations are completed,
    the attributes contain the translated form of the
    input

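As an illustration (consistent with the annotated parse tree two slides
below, but not copied from the slides), infix-to-postfix translation can be
specified with a string-valued attribute t and one semantic rule per
production:

  expr → expr1 + term     expr.t = expr1.t || term.t || '+'
  expr → expr1 - term     expr.t = expr1.t || term.t || '-'
  expr → term             expr.t = term.t
  term → 9                term.t = '9'   (and similarly for the other digits)

where || denotes string concatenation; for the input 9-5+2 the attribute at
the root evaluates to 95-2+.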
18
Synthesized Attributes
  • An attribute is said to be synthesized if its
    value at a parse-tree node is determined from the
    attribute values at the children of the node

19
Annotated Parse Tree
[Figure: annotated parse tree for 9-5+2, with the synthesized attribute t
shown at each node: term.t = 9, term.t = 5, term.t = 2 at the leaves,
expr.t = 9, then expr.t = 95-, and at the root expr.t = 95-2+, the postfix
form of the input]
20
Translation Schemes
A translation scheme is a CF grammar with semantic
actions embedded in the production bodies

  rest → + term { print('+') } rest
[Figure: the node for rest has the children +, term, { print('+') }, and
rest; the embedded semantic action appears as an extra leaf of the parse
tree]
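A fuller version of this scheme (a sketch in the textbook's style, not
copied from the slide) translates infix expressions over digits into
postfix:

  expr → term rest
  rest → + term { print('+') } rest
       | - term { print('-') } rest
       | ε
  term → 0 { print('0') } | 1 { print('1') } | ... | 9 { print('9') }

Applied to the input 9-5+2 it prints 95-2+.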
21
Parsing
  • Parsing: the process of determining if a string of
    tokens can be generated by a grammar
  • For any CF grammar there is a parser that takes
    at most O(n³) time to parse a string of n tokens
  • Linear algorithms suffice for parsing programming
    language source code
  • Top-down parsing constructs a parse tree from
    root to leaves
  • Bottom-up parsing constructs a parse tree from
    leaves to root

22
Predictive Parsing
  • Recursive descent parsing is a top-down parsing
    method
  • Every nonterminal has one (recursive) procedure
    responsible for parsing the nonterminal's
    syntactic category of input tokens
  • When a nonterminal has multiple productions, each
    production is implemented in a branch of a
    selection statement based on input look-ahead
    information
  • Predictive parsing is a special form of recursive
    descent parsing where we use one lookahead token
    to unambiguously determine the parse operations

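As a minimal sketch (Python, not part of the slides; all names are
illustrative), the infix-to-postfix translation scheme above can be
implemented as a predictive parser in which each nonterminal becomes one
procedure and a single lookahead token selects the production:

  # Predictive parser for  expr -> term rest,
  #   rest -> + term {print '+'} rest | - term {print '-'} rest | epsilon,
  #   term -> digit {print digit};  translates infix to postfix.
  class Parser:
      def __init__(self, text):
          self.text = text
          self.pos = 0
          self.out = []                  # collects the postfix translation

      def lookahead(self):
          return self.text[self.pos] if self.pos < len(self.text) else None

      def match(self, token):
          if self.lookahead() == token:
              self.pos += 1
          else:
              raise SyntaxError("expected %r at position %d" % (token, self.pos))

      def expr(self):                    # expr -> term rest
          self.term()
          self.rest()

      def rest(self):                    # one lookahead token picks the production
          la = self.lookahead()
          if la == '+':
              self.match('+'); self.term(); self.out.append('+'); self.rest()
          elif la == '-':
              self.match('-'); self.term(); self.out.append('-'); self.rest()
          # otherwise: the epsilon production, do nothing

      def term(self):                    # term -> digit {print(digit)}
          la = self.lookahead()
          if la is not None and la.isdigit():
              self.out.append(la)
              self.match(la)
          else:
              raise SyntaxError("expected a digit at position %d" % self.pos)

  p = Parser("9-5+2")
  p.expr()
  print(''.join(p.out))                  # prints 95-2+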
23
FIRST
FIRST(α) is the set of terminals that appear as
the first symbols of one or more strings
generated from α

  type   → simple | ↑ id | array [ simple ] of type
  simple → integer | char | num dotdot num

  FIRST(simple) = { integer, char, num }
  FIRST(↑ id)   = { ↑ }
  FIRST(type)   = { integer, char, num, ↑, array }

When a nonterminal A has two (or more)
productions, as in

  A → α | β

then FIRST(α) and FIRST(β) must be disjoint for
predictive parsing to work
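The FIRST sets can also be computed mechanically. Below is a minimal Python
sketch (not part of the slides; '^' stands in for the ↑ token) that iterates
to a fixed point over the example grammar:

  # Grammar: nonterminal -> list of alternatives, each alternative a list
  # of grammar symbols.  Symbols not appearing as keys are terminals.
  EPSILON = 'epsilon'
  grammar = {
      'type':   [['simple'], ['^', 'id'],
                 ['array', '[', 'simple', ']', 'of', 'type']],
      'simple': [['integer'], ['char'], ['num', 'dotdot', 'num']],
  }

  def first_sets(grammar):
      nonterminals = set(grammar)
      first = {nt: set() for nt in grammar}
      changed = True
      while changed:                       # iterate until nothing changes
          changed = False
          for nt, alternatives in grammar.items():
              for alt in alternatives:
                  for sym in alt:
                      if sym in nonterminals:
                          new = first[sym] - {EPSILON}
                          if not new <= first[nt]:
                              first[nt] |= new
                              changed = True
                          if EPSILON not in first[sym]:
                              break        # this symbol cannot vanish
                      else:                # terminal: it is a first symbol
                          if sym not in first[nt]:
                              first[nt].add(sym)
                              changed = True
                          break
                  else:
                      # every symbol of the alternative can derive epsilon
                      if EPSILON not in first[nt]:
                          first[nt].add(EPSILON)
                          changed = True
      return first

  print(first_sets(grammar))
  # FIRST(simple) = {integer, char, num}; FIRST(type) = {integer, char, num, ^, array}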
24
Left Recursion
When a production for nonterminal A starts with a
self-reference, a predictive parser loops forever

  A → A α | β

We can eliminate left-recursive productions by
systematically rewriting the grammar using
right-recursive productions

  A → β R
  R → α R | ε
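As a worked example (applying this rewriting to the precedence grammar on
slide 16), the left-recursive production

  expr → expr + term | term

has α = + term and β = term, so it becomes

  expr → term rest
  rest → + term rest | ε

which is exactly the right-recursive shape used in the translation scheme
and predictive parser sketches above.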
25
AST
  • Abstract Syntax Tree is a tree representation of
    the program. Used for
  • semantic analysis (type checking)
  • some optimization (e.g. constant folding)
  • intermediate code generation (sometimes the
    intermediate code is an AST with a somewhat
    different set of nodes)

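A minimal sketch (Python, not part of the slides) of AST nodes for the
arithmetic expressions used earlier, together with a constant-folding pass
as an example of an optimization performed directly on the AST:

  from dataclasses import dataclass

  class Expr:                            # base class for all expression nodes
      pass

  @dataclass
  class Num(Expr):
      value: int

  @dataclass
  class BinOp(Expr):
      op: str                            # '+' or '-'
      left: Expr
      right: Expr

  def fold(node):
      """Replace constant subexpressions with their computed value."""
      if isinstance(node, BinOp):
          left, right = fold(node.left), fold(node.right)
          if isinstance(left, Num) and isinstance(right, Num):
              value = (left.value + right.value if node.op == '+'
                       else left.value - right.value)
              return Num(value)
          return BinOp(node.op, left, right)
      return node

  # 9 - 5 + 2 parsed left-associatively: (9 - 5) + 2
  tree = BinOp('+', BinOp('-', Num(9), Num(5)), Num(2))
  print(fold(tree))                      # Num(value=6)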
26
Lexical Analysis
  • Typical tasks of the lexical analyzer
  • Remove white space and comments
  • Encode constants as tokens
  • Recognize keywords
  • Recognize identifiers and store identifier names
    in a global symbol table

27
Chapter 3: Lexical Analysis
28
Interaction of the Lexical Analyzer with the
Parser
[Figure: the lexical analyzer reads the source program and, on each
"get next token" request from the parser, returns a (token, tokenval)
pair; both the lexical analyzer and the parser report errors and access
the symbol table]
29
Tokens, Patterns, Lexemes
  • A token is a classification of lexical units
  • For example id and num
  • Lexemes are the specific character strings that
    make up a token
  • For example abc and 123
  • Patterns are rules describing the set of lexemes
    belonging to a token
  • For example "letter followed by letters and
    digits" and "non-empty sequence of digits"

30
How To Describe Tokens
  • Programming language tokens can be described
    using regular expressions
  • A regular expression R describes some set of
    strings L(R)
  • L(R) is the language defined by R
  • L(abc) = { abc }
  • L(hello | goodbye) = { hello, goodbye }
  • Idea: define each kind of token using an RE

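As a minimal sketch (Python's re module, not part of the slides), each kind
of token can be defined by a regular expression and the definitions combined
into one scanner:

  import re

  TOKEN_SPEC = [
      ('NUM',  r'\d+'),                   # non-empty sequence of digits
      ('ID',   r'[A-Za-z][A-Za-z0-9]*'),  # letter followed by letters and digits
      ('OP',   r'[+\-*/=]'),
      ('SKIP', r'\s+'),                   # white space, discarded
  ]
  MASTER_RE = re.compile('|'.join('(?P<%s>%s)' % spec for spec in TOKEN_SPEC))

  def tokenize(text):
      for match in MASTER_RE.finditer(text):
          if match.lastgroup != 'SKIP':
              yield match.lastgroup, match.group()   # (token, lexeme) pairs

  print(list(tokenize('abc = 123 + x1')))
  # [('ID', 'abc'), ('OP', '='), ('NUM', '123'), ('OP', '+'), ('ID', 'x1')]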
31
Regular Expression Matching
  • Sketch of an efficient implementation
  • start in some initial state
  • look at each input character in sequence, update
    scanner state accordingly
  • if state at end of input is an accept state, the
    input string matches the RE
  • For tokenizing, only a finite amount of state is
    needed: a (deterministic) finite automaton (DFA),
    or finite state machine

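As a minimal Python sketch (not part of the slides), the loop above can be
written as a table-driven DFA that recognizes identifiers (a letter followed
by letters and digits):

  def char_class(c):
      if c.isalpha():
          return 'letter'
      if c.isdigit():
          return 'digit'
      return 'other'

  # transition table: state -> {input class -> next state}
  TRANSITIONS = {
      'start': {'letter': 'in_id'},
      'in_id': {'letter': 'in_id', 'digit': 'in_id'},
  }
  ACCEPTING = {'in_id'}

  def matches(text):
      state = 'start'
      for c in text:                      # look at each character in sequence
          state = TRANSITIONS.get(state, {}).get(char_class(c))
          if state is None:               # no transition: reject
              return False
      return state in ACCEPTING           # accept iff we end in an accept state

  print(matches('abc1'), matches('1abc'), matches(''))   # True False False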
32
DFA vs. NFA
  • DFA: the action of the automaton on each input
    symbol is fully determined
  • obvious table-driven implementation
  • NFA
  • automaton may have choice on each step
  • the automaton accepts a string if there is some
    way to make choices that leads to an accepting
    state; equivalently, every path from the start
    state to an accept state spells out an accepted
    string
  • not obvious how to implement efficiently!
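One standard approach (a minimal Python sketch, not part of the slides) is
to simulate the NFA directly by tracking the set of states it could be in
after each input symbol; the example NFA below, which has no ε-transitions,
accepts strings matching (a|b)*abb:

  # NFA given as  state -> {symbol -> set of next states}
  NFA = {
      0: {'a': {0, 1}, 'b': {0}},
      1: {'b': {2}},
      2: {'b': {3}},
  }
  START, ACCEPTING = {0}, {3}

  def accepts(nfa, text):
      current = set(START)
      for symbol in text:
          # union of all states reachable from any current state on this symbol
          current = {s for state in current
                       for s in nfa.get(state, {}).get(symbol, set())}
      return bool(current & ACCEPTING)

  print(accepts(NFA, 'aabb'), accepts(NFA, 'abab'))   # True False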