Chapter 2: A Simple One Pass Compiler - PowerPoint PPT Presentation

Provided by: stevenad
Learn more at: http://www.cse.uconn.edu
1
Chapter 2: A Simple One Pass Compiler
Aggelos Kiayias
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-1155
Storrs, CT 06269
aggelos_at_cse.uconn.edu
http://www.cse.uconn.edu/akiayias
2
The Entire Compilation Process
  • Grammars for Syntax Definition
  • Syntax-Directed Translation
  • Parsing - Top Down & Predictive
  • Pulling Together the Pieces
  • The Lexical Analysis Process
  • Symbol Table Considerations
  • A Brief Look at Code Generation
  • Concluding Remarks/Looking Ahead

3
Grammars for Syntax Definition
  • A Context-free Grammar (CFG) Is Utilized to
    Describe the Syntactic Structure of a Language
  • A CFG Is Characterized By:
  • 1. A Set of Tokens or Terminal Symbols
  • 2. A Set of Non-terminals
  • 3. A Set of Production Rules; Each Rule Has the
    Form NT → {T, NT}*
  • 4. A Non-terminal Designated As the Start Symbol

4
Grammars for Syntax Definition: Example CFG
list → list + digit
list → list - digit
list → digit
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(the "|" means OR)
(So we could have written: list → list + digit | list - digit | digit)
5
Grammars are Used to Derive Strings
Using the CFG defined on the previous slide, we
can derive the string 9 - 5 + 2 as follows:
list ⇒ list + digit            (P1: list → list + digit)
     ⇒ list - digit + digit    (P2: list → list - digit)
     ⇒ digit - digit + digit   (P3: list → digit)
     ⇒ 9 - digit + digit       (P4: digit → 9)
     ⇒ 9 - 5 + digit           (P4: digit → 5)
     ⇒ 9 - 5 + 2               (P4: digit → 2)
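The derivation of 9 - 5 + 2 can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides: the helper names `apply_leftmost` and `derive` are hypothetical, and a sentential form is assumed to be a list of symbol strings. Each step rewrites the leftmost non-terminal, which is how the slide's derivation proceeds.

```python
# Hypothetical helpers (not from the slides): replay a leftmost derivation
# in the list/digit grammar. A sentential form is a list of symbol strings.
NONTERMINALS = {"list", "digit"}

def apply_leftmost(form, lhs, rhs):
    """Replace the leftmost non-terminal in `form` (it must be `lhs`) by `rhs`."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if sym != lhs:
                raise ValueError(f"leftmost non-terminal is {sym!r}, not {lhs!r}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to expand")

# The productions used on the slide, in the order they are applied.
STEPS = [
    ("list", ["list", "+", "digit"]),   # P1
    ("list", ["list", "-", "digit"]),   # P2
    ("list", ["digit"]),                # P3
    ("digit", ["9"]),                   # P4
    ("digit", ["5"]),                   # P4
    ("digit", ["2"]),                   # P4
]

def derive(start="list"):
    """Return every sentential form of the derivation, as strings."""
    form = [start]
    trace = [" ".join(form)]
    for lhs, rhs in STEPS:
        form = apply_leftmost(form, lhs, rhs)
        trace.append(" ".join(form))
    return trace

if __name__ == "__main__":
    print("\n⇒ ".join(derive()))
```

Running the script prints the seven sentential forms, from `list` down to the token string itself.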
6
Grammars are Used to Derive Strings
This derivation could also be represented via a
Parse Tree (parents on left, children on right):
list ⇒ list + digit
     ⇒ list - digit + digit
     ⇒ digit - digit + digit
     ⇒ 9 - digit + digit
     ⇒ 9 - 5 + digit
     ⇒ 9 - 5 + 2
7
A More Complex Grammar
block → begin opt_stmts end
opt_stmts → stmt_list | ε
stmt_list → stmt_list ; stmt | stmt
What is this grammar for? What does ε
represent? What kind of production rule is this?
8
Defining a Parse Tree
  • More Formally, a Parse Tree for a CFG Has the
    Following Properties:
  • Root Is Labeled With the Start Symbol
  • Leaf Node Is a Token or ε
  • Interior Node (Non-Leaf) Is a Non-Terminal
  • If A → x1 x2 … xn, Then A Is an Interior Node and
    x1 x2 … xn Are Children of A and May Be
    Non-Terminals or Tokens

9
Other Important Concepts: Ambiguity
Two derivations (Parse Trees) for the same token
string.
Grammar:
string → string + string | string - string | 0 | 1 | … | 9
Why is this a Problem?
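One way to see why ambiguity is a problem: two parse trees for the same token string can carry different meanings. The Python sketch below is an illustration added here, not taken from the slides. It assumes the token string 9 - 5 + 2 and represents each parse tree as a nested tuple `(left, op, right)`.

```python
# Two parse trees the ambiguous grammar allows for the same token string
# 9 - 5 + 2 (an assumed example), written as nested tuples (left, op, right).
left_tree = (("9", "-", "5"), "+", "2")    # groups as (9 - 5) + 2
right_tree = ("9", "-", ("5", "+", "2"))   # groups as 9 - (5 + 2)

def leaves(tree):
    """Read the token string off the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, op, right = tree
    return leaves(left) + [op] + leaves(right)

def evaluate(tree):
    """Evaluate the tree bottom-up, following its shape."""
    if isinstance(tree, str):
        return int(tree)
    left, op, right = tree
    a, b = evaluate(left), evaluate(right)
    return a + b if op == "+" else a - b

if __name__ == "__main__":
    print(" ".join(leaves(left_tree)), "=", evaluate(left_tree))
    print(" ".join(leaves(right_tree)), "=", evaluate(right_tree))
```

Both trees yield the same leaves, yet the first evaluates to 6 and the second to 2, so a translator driven by the tree shape cannot know which result is intended.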
10
Other Important Concepts: Associativity of
Operators
Left vs. Right
right → letter = right | letter
letter → a | b | c | … | z
list → list + digit | list - digit | digit
digit → 0 | 1 | 2 | … | 9
11
Embedding Associativity
  • The language of arithmetic expressions with + and -
  • (ambiguous) grammar that does not enforce
    associativity:
  • string → string + string | string - string | 0 | 1 | … | 9
  • non-ambiguous grammar enforcing left
    associativity (parse tree will grow to the left):
  • string → string + digit | string - digit | digit
  • digit → 0 | 1 | 2 | … | 9
  • non-ambiguous grammar enforcing right
    associativity (parse tree will grow to the right):
  • string → digit + string | digit - string | digit
  • digit → 0 | 1 | 2 | … | 9

12
Other Important Concepts: Operator Precedence
What does 9 + 5 * 2 mean?
Typically, ( ) then * / then + - is the precedence order (highest first).
This can be incorporated into a grammar via rules:
expr → expr + term | expr - term | term
term → term * factor | term / factor | factor
factor → digit | ( expr )
digit → 0 | 1 | 2 | 3 | … | 9
Precedence Achieved by expr & term for each
precedence level. Rules for each are left
recursive, or associate to the left.
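To make the effect of the layered grammar concrete, here is a small Python sketch of an evaluator with one function per precedence level (`expr`, `term`, `factor`). It is an illustration under stated assumptions, not the slides' code. The left-recursive rules are realized as loops, an implementation choice made here so that the functions terminate while still associating to the left. Operands are single digits, as in the grammar.

```python
def evaluate(text):
    """Evaluate single-digit arithmetic with the usual precedence (a sketch)."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def factor():                      # factor → digit | ( expr )
        tok = peek()
        if tok is not None and tok in "0123456789":
            advance()
            return int(tok)
        if tok == "(":
            advance()
            value = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return value
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                        # term → term * factor | term / factor | factor
        value = factor()
        while peek() in ("*", "/"):    # loop stands in for the left recursion
            op = peek()
            advance()
            rhs = factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def expr():                        # expr → expr + term | expr - term | term
        value = term()
        while peek() in ("+", "-"):
            op = peek()
            advance()
            rhs = term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected trailing token {peek()!r}")
    return result

if __name__ == "__main__":
    print(evaluate("9 + 5 * 2"))
    print(evaluate("(9 + 5) * 2"))
```

Because `term` is called from inside `expr`, multiplication binds tighter than addition, so `9 + 5 * 2` groups as `9 + (5 * 2)`.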
13
Syntax-Directed Translation
  • Associate Attributes With Grammar Rules &
    Constructs and Translate As Parsing Occurs
  • The translation will follow the parse tree
    structure (and as a result the structure and form
    of the parse tree will affect the translation).
  • First example: Inductive Translation.
  • Infix to Postfix Notation Translation for
    Expressions
  • Translation defined inductively as Postfix(E)
    where E is an Expression.

Rules:
1. If E is a variable or constant, then Postfix(E) = E
2. If E is E1 op E2, then Postfix(E) = Postfix(E1 op E2) = Postfix(E1) Postfix(E2) op
3. If E is (E1), then Postfix(E) = Postfix(E1)
14
Examples
  • Postfix( ( 9 - 5 ) + 2 )
  • = Postfix( ( 9 - 5 ) ) Postfix( 2 ) +
  • = Postfix( 9 - 5 ) Postfix( 2 ) +
  • = Postfix( 9 ) Postfix( 5 ) - Postfix( 2 ) +
  • = 9 5 - 2 +
  • Postfix( 9 - ( 5 + 2 ) )
  • = Postfix( 9 ) Postfix( ( 5 + 2 ) ) -
  • = Postfix( 9 ) Postfix( 5 + 2 ) -
  • = Postfix( 9 ) Postfix( 5 ) Postfix( 2 ) + -
  • = 9 5 2 + -
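The inductive definition of Postfix(E) translates almost directly into a recursive function. The Python sketch below is an added illustration. It assumes an expression is either a string (a variable or constant) or a tuple `(E1, op, E2)`; parentheses need no node of their own because rule 3 says they do not change the result.

```python
def postfix(e):
    """Return the postfix form of expression e.

    e is either a str (variable or constant) or a tuple (e1, op, e2).
    This tree encoding is an assumption made for the sketch.
    """
    if isinstance(e, str):            # rule 1: Postfix(E) = E
        return e
    e1, op, e2 = e                    # rule 2: Postfix(E1) Postfix(E2) op
    return f"{postfix(e1)} {postfix(e2)} {op}"

if __name__ == "__main__":
    print(postfix((("9", "-", "5"), "+", "2")))   # ( 9 - 5 ) + 2
    print(postfix(("9", "-", ("5", "+", "2"))))   # 9 - ( 5 + 2 )
```

The two calls correspond to the two worked examples: the first yields `9 5 - 2 +` and the second `9 5 2 + -`.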

15
Syntax-Directed Definition
  • Each Production Has a Set of Semantic Rules
  • Each Grammar Symbol Has a Set of Attributes
  • For the Following Example, String Attribute t
    is Associated With Each Grammar Symbol
  • Recall: What is a Derivation for 9 + 5 - 2?

list ⇒ list - digit ⇒ list + digit - digit
     ⇒ digit + digit - digit ⇒ 9 + digit - digit
     ⇒ 9 + 5 - digit ⇒ 9 + 5 - 2
16
Syntax-Directed Definition (2)
  • Each Production Rule of the CFG Has a Semantic
    Rule
  • Note: Semantic Rules for expr define t as a
    synthesized attribute, i.e., the various copies
    of t obtain their values from the t attributes
    of their children

17
Semantic Rules are Embedded in Parse Tree
  • How Do Semantic Rules Work ?
  • What Type of Tree Traversal is Being Performed?
  • How Can We More Closely Associate Semantic Rules
    With Production Rules ?

18
Translation Schemes
Embed Semantic Actions into the right sides of
the productions.
19
Parsing: Top-Down & Predictive
  • Top-Down Parsing: the parse tree / derivation of
    a token string is constructed in a top-down fashion.
  • For Example, Consider:

Start symbol: type
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
Suppose the input is: array [ num dotdot num ] of integer
Parsing would begin with type → ???
20
Top-Down Parse (type = start symbol)
Input: array [ num dotdot num ] of integer
(Figure: parse tree under construction, with the lookahead symbol marked in the input.)
21
Top-Down Parse (type = start symbol)
Input: array [ num dotdot num ] of integer
(Figure: parse tree under construction, with the lookahead symbol marked in the input.)
22
Top-Down Process: Recursive Descent or Predictive
Parsing
  • Parser Operates by Attempting to Match Tokens in
    the Input Stream
  • Utilize both Grammar and Input Below to Motivate
    Code for Algorithm

Input: array [ num dotdot num ] of integer
type → simple | ↑ id | array [ simple ] of type
simple → integer | char | num dotdot num

procedure match ( t : token );
begin
    if lookahead = t then
        lookahead := nexttoken
    else error
end;
23
Top-Down Algorithm (Continued)
procedure type;
begin
    if lookahead is in { integer, char, num } then
        simple
    else if lookahead = ↑ then begin
        match ( ↑ ); match ( id )
    end
    else if lookahead = array then begin
        match ( array ); match ( [ ); simple; match ( ] );
        match ( of ); type
    end
    else error
end;

procedure simple;
begin
    if lookahead = integer then
        match ( integer )
    else if lookahead = char then
        match ( char )
    else if lookahead = num then begin
        match ( num ); match ( dotdot ); match ( num )
    end
    else error
end;
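The procedures above carry over to a runnable form with little change. The following Python sketch is one possible rendering, not the slides' code. The class name `Parser`, the `ParseError` exception, the end marker `"$"`, and the spelling `"^"` for the pointer symbol are all assumptions introduced for illustration. Tokens are plain strings.

```python
class ParseError(Exception):
    """Raised where the pseudocode says `error`."""

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" is an assumed end-of-input marker
        self.pos = 0
        self.lookahead = self.tokens[0]

    def match(self, t):
        # Consume the lookahead if it is the expected token t.
        if self.lookahead != t:
            raise ParseError(f"expected {t!r}, found {self.lookahead!r}")
        self.pos += 1
        self.lookahead = self.tokens[self.pos]

    def type_(self):
        # type → simple | ^ id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.simple()
            self.match("]")
            self.match("of")
            self.type_()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r} in type")

    def simple(self):
        # simple → integer | char | num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r} in simple")

def parse(tokens):
    """Return True if tokens form a valid type; raise ParseError otherwise."""
    parser = Parser(tokens)
    parser.type_()
    if parser.lookahead != "$":
        raise ParseError(f"unexpected trailing token {parser.lookahead!r}")
    return True

if __name__ == "__main__":
    print(parse("array [ num dotdot num ] of integer".split()))
```

Each non-terminal becomes one method, and the lookahead alone decides which production to use, which is what makes the parser predictive.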
24
Tracing
  • Input: array [ num dotdot num ] of integer
  • To initialize the parser:
  • set global variable lookahead = array
  • call procedure type
  • Procedure call to type with lookahead = array
    results in the actions:
  • match( array ); match( [ ); simple; match( ] );
    match( of ); type
  • Procedure call to simple with lookahead = num
    results in the actions:
  • match( num ); match( dotdot ); match( num )
  • Procedure call to type with lookahead = integer
    results in the actions:
  • simple
  • Procedure call to simple with lookahead = integer
    results in the actions:
  • match( integer )

25
Limitations
  • Can we apply the previous technique to every
    grammar?
  • NO
  • type → simple
  •      | array [ simple ] of type
  • simple → integer
  •        | array digit
  • digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
  • consider the string "array 6"
  • the predictive parser starts with type and
    lookahead = array
  • apply production type → simple (then simple → array digit)
    OR type → array [ simple ] of type ??

26
Designing a Predictive Parser
  • Consider A → α
  • FIRST(α) = set of leftmost tokens that appear in α
    or in strings generated by α.
  • E.g. FIRST(type) = { ↑, array, integer, char, num }
  • Consider productions of the form A → α, A → β: the
    sets FIRST(α) and FIRST(β) should be disjoint
  • Then we can implement predictive parsing
    (initially: start NT, lookahead = leftmost token)
  • Starting with A → ?, we find which FIRST() set
    the lookahead symbol belongs to and we use this
    production.
  • Any non-terminal results in the corresponding
    procedure call
  • Terminals are matched.
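FIRST sets and the disjointness check can be computed by a simple fixed-point iteration. The Python sketch below is an added illustration under assumptions: a grammar is a dict mapping each non-terminal to a list of right-hand sides, `"^"` stands for the pointer symbol, and the function names are hypothetical. The check looks only at FIRST sets, as the bullets above do; it does not handle the extra conditions that ε-productions require.

```python
EPSILON = "ε"

# The type grammar from the earlier slides ("^" stands for the pointer symbol).
GRAMMAR = {
    "type": [["simple"], ["^", "id"], ["array", "[", "simple", "]", "of", "type"]],
    "simple": [["integer"], ["char"], ["num", "dotdot", "num"]],
}

def first_of_string(symbols, first):
    """FIRST of a sequence of symbols, given FIRST sets of the non-terminals."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}   # terminal: itself
        result |= sym_first - {EPSILON}
        if EPSILON not in sym_first:
            return result
    result.add(EPSILON)        # every symbol (or none at all) can vanish
    return result

def compute_first(grammar):
    """Iterate until no FIRST set grows any further."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first

def is_predictive(grammar):
    """True if, for every non-terminal, its alternatives have disjoint FIRST sets."""
    first = compute_first(grammar)
    for alternatives in grammar.values():
        seen = set()
        for rhs in alternatives:
            rhs_first = first_of_string(rhs, first)
            if seen & rhs_first:
                return False
            seen |= rhs_first
    return True

if __name__ == "__main__":
    print(sorted(compute_first(GRAMMAR)["type"]))
    print(is_predictive(GRAMMAR))
```

Applied to the grammar from the Limitations slide, the same check fails, because both alternatives of type can begin with array.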

27
Problems with Top Down Parsing
  • Left Recursion in CFG May Cause Parser to Loop
    Forever.
  • Indeed:
  • For the production A → Aα we write the
    program: procedure A: if lookahead belongs to
    FIRST(Aα) then call the procedure A
    (no input is consumed before the call, so it repeats forever)
  • Solution: Remove Left Recursion...
  • without changing the Language defined by the
    Grammar.

28
Dealing with Left recursion
  • Solution: Algorithm to Remove Left Recursion

BASIC IDEA: A → Aα | β becomes A → βR, R → αR | ε
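The basic idea can be sketched as a small grammar transformation. The Python function below is an added illustration, not the slides' algorithm in full: it handles only immediate left recursion on a single non-terminal, and the function name and the default name of the new non-terminal R are hypothetical.

```python
EPSILON = "ε"

def remove_immediate_left_recursion(nt, alternatives, new_nt=None):
    """Rewrite A → Aα | β as A → βR, R → αR | ε.

    `alternatives` is a list of right-hand sides, each a list of symbols.
    Returns a dict with the productions of A and, if needed, of R.
    """
    new_nt = new_nt or nt + "_rest"       # hypothetical name for R
    # Left-recursive alternatives A → Aα contribute their tails α.
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == nt]
    # All other alternatives are the β's.
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != nt]
    if not alphas:
        return {nt: alternatives}         # nothing to do
    if not betas:
        raise ValueError("no non-recursive alternative: nothing can be derived")
    return {
        nt: [beta + [new_nt] for beta in betas],
        new_nt: [alpha + [new_nt] for alpha in alphas] + [[EPSILON]],
    }

if __name__ == "__main__":
    result = remove_immediate_left_recursion(
        "expr",
        [["expr", "+", "term"], ["expr", "-", "term"], ["term"]],
        new_nt="rest",
    )
    for lhs, rhss in result.items():
        print(lhs, "→", " | ".join(" ".join(rhs) for rhs in rhss))
```

For the expr grammar this produces expr → term rest and rest → + term rest | - term rest | ε, the shape used when semantic actions are added.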
29
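What happens to semantic actions?
With left recursion:
expr → expr + term { print('+') }
expr → expr - term { print('-') }
expr → term
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }

After removing left recursion:
expr → term rest
rest → + term { print('+') } rest
rest → - term { print('-') } rest
rest → ε
term → 0 { print('0') }
term → 1 { print('1') }
…
term → 9 { print('9') }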
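The grammar without left recursion can be turned into a predictive translator in which each semantic action runs at the point where it sits in the production. The Python sketch below is an added illustration: the function name `infix_to_postfix` is hypothetical, output is collected in a list instead of being printed, and input is assumed to be single digits separated by + and -.

```python
def infix_to_postfix(text):
    """Translate e.g. '9-5+2' to postfix using expr → term rest (a sketch)."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0
    out = []                      # collects what the print actions would emit

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(t):
        nonlocal pos
        if lookahead() != t:
            raise SyntaxError(f"expected {t!r}, found {lookahead()!r}")
        pos += 1

    def term():                   # term → d { print(d) } for each digit d
        tok = lookahead()
        if tok is None or tok not in "0123456789":
            raise SyntaxError(f"expected a digit, found {tok!r}")
        match(tok)
        out.append(tok)

    def rest():
        tok = lookahead()
        if tok in ("+", "-"):     # rest → op term { print(op) } rest
            match(tok)
            term()
            out.append(tok)       # the action sits between term and rest
            rest()
        # otherwise rest → ε: do nothing

    def expr():                   # expr → term rest
        term()
        rest()

    expr()
    if lookahead() is not None:
        raise SyntaxError(f"unexpected trailing token {lookahead()!r}")
    return "".join(out)

if __name__ == "__main__":
    print(infix_to_postfix("9-5+2"))
```

Placing the action before the recursive call to `rest` keeps the output order the same as in the left-recursive scheme.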
30
Comparing Grammars with Left Recursion
  • Notice Location of Semantic Actions in Tree
  • What is Order of Processing?

31
Comparing Grammars without Left Recursion
  • Now, Notice Location of Semantic Actions in Tree
    for Revised Grammar
  • What is Order of Processing in this Case?

32
The Lexical Analysis Process: A Graphical Depiction
  • lexan ( ), the lexical analyzer, uses getchar ( ) to read a character
  • pushes back c using ungetc (c , stdin)
  • returns token to caller
  • sets global variable tokenval to the attribute value
33
The Lexical Analysis Process: Functional
Responsibilities
  • Input Token String Is Broken Down
  • White Space and Comments Are Filtered Out
  • Individual Tokens With Associated Values Are
    Identified
  • Symbol Table Is Initialized and Entries Are
    Constructed for Each Appropriate Token
  • Under What Conditions will a Character be Pushed
    Back?

34
Example of a Lexical Analyzer
function lexan : integer;
var lexbuf : array [ 0 .. 100 ] of char;
    c : char;
begin
    loop begin
        read a character into c;
        if c is a blank or a tab then
            do nothing
        else if c is a newline then
            lineno := lineno + 1
        else if c is a digit then begin
            set tokenval to the value of this and
            following digits;
            return NUM
        end
35
Algorithm for Lexical Analyzer
        else if c is a letter then begin
            place c and successive letters and digits into lexbuf;
            p := lookup ( lexbuf );
            if p = 0 then
                p := insert ( lexbuf, ID );
            tokenval := p;
            return the token field of table entry p
        end
        else begin
            set tokenval to NONE; /* there is no attribute */
            return integer encoding of character c
        end
    end
end
Note: Insert / Lookup operations occur against
the Symbol Table!
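A runnable counterpart of the lexan pseudocode might look like the Python sketch below. It is an illustration under assumptions, not the slides' implementation: the token codes (`NUM`, `DIV`, `MOD`, `ID`, `DONE`), the class name `Lexer`, and the simple list-based symbol table are hypothetical. Instead of reading a character and pushing it back with ungetc, the sketch peeks at the next character before consuming it.

```python
# Assumed token codes, chosen above the range of single-character codes.
NUM, DIV, MOD, ID, DONE = 256, 257, 258, 259, 260
NONE = -1

def is_digit(c):
    return "0" <= c <= "9"

class Lexer:
    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.lineno = 1
        self.tokenval = NONE
        # Entry 0 is unused so lookup can return 0 for "not found".
        # Reserved words are preloaded, as the slides suggest.
        self.symtable = [None, ("div", DIV), ("mod", MOD)]

    def lookup(self, s):
        for p in range(1, len(self.symtable)):
            if self.symtable[p][0] == s:
                return p
        return 0

    def insert(self, s, tok):
        self.symtable.append((s, tok))
        return len(self.symtable) - 1

    def lexan(self):
        while self.pos < len(self.text):
            c = self.text[self.pos]
            self.pos += 1
            if c in " \t":
                continue                          # skip blanks and tabs
            if c == "\n":
                self.lineno += 1
            elif is_digit(c):
                value = int(c)
                while self.pos < len(self.text) and is_digit(self.text[self.pos]):
                    value = value * 10 + int(self.text[self.pos])
                    self.pos += 1
                self.tokenval = value
                return NUM
            elif c.isalpha():
                start = self.pos - 1
                while self.pos < len(self.text) and self.text[self.pos].isalnum():
                    self.pos += 1
                lexeme = self.text[start:self.pos]
                p = self.lookup(lexeme)
                if p == 0:
                    p = self.insert(lexeme, ID)
                self.tokenval = p
                return self.symtable[p][1]        # token field of entry p
            else:
                self.tokenval = NONE              # there is no attribute
                return ord(c)                     # integer encoding of c
        return DONE

if __name__ == "__main__":
    lexer = Lexer("count div 12\n+ i")
    token = lexer.lexan()
    while token != DONE:
        print(token, lexer.tokenval)
        token = lexer.lexan()
```

A character would need to be pushed back in the original design whenever the analyzer reads one character past the end of a number or identifier; peeking avoids that step here.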
36
Symbol Table Considerations
OPERATIONS: Insert (string, token_ID)
            Lookup (string)
NOTICE: Reserved words are placed into the
symbol table for easy lookup.
Attributes may be associated with each
entry, i.e.,
  Semantic Actions
  Typing Info: id → integer
  etc.

ARRAY symtable
index   lexptr   token   attributes
0
1                div
2                mod
3                id
4                id
(each lexptr points into ARRAY lexemes)

ARRAY lexemes
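The two-array layout can be sketched in Python as follows. This is an added illustration under assumptions: the class name `SymbolTable`, the `EOS` terminator character, and the example identifiers `count` and `i` are hypothetical and do not come from the slides. Each symtable entry stores a lexptr, the index at which its lexeme starts in the lexemes array.

```python
EOS = "\0"    # assumed end-of-string marker separating lexemes

class SymbolTable:
    def __init__(self):
        self.lexemes = ""        # all lexemes, each terminated by EOS
        self.entries = [None]    # entry 0 unused: lookup returns 0 for "absent"

    def lexeme(self, p):
        """Recover the lexeme of entry p by following its lexptr."""
        lexptr = self.entries[p]["lexptr"]
        return self.lexemes[lexptr:self.lexemes.index(EOS, lexptr)]

    def lookup(self, s):
        """Return the index of the entry for s, or 0 if s is not present."""
        for p in range(len(self.entries) - 1, 0, -1):
            if self.lexeme(p) == s:
                return p
        return 0

    def insert(self, s, token, **attributes):
        """Append a new entry for s and return its index."""
        self.entries.append(
            {"lexptr": len(self.lexemes), "token": token, "attributes": attributes}
        )
        self.lexemes += s + EOS
        return len(self.entries) - 1

if __name__ == "__main__":
    table = SymbolTable()
    table.insert("div", "div")            # reserved words go in first
    table.insert("mod", "mod")
    table.insert("count", "id", type="integer")
    table.insert("i", "id")
    print(table.lookup("mod"), table.lookup("unknown"))
    print([entry["lexptr"] for entry in table.entries[1:]])
```

Storing lexemes in one shared array lets entries have a fixed size even though identifiers vary in length.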
37
A Brief Look at Code Generation
  • Back-end of Compilation Process - Which Will Not
    Be Our Emphasis
  • We'll Focus on Front-end
  • Important Concepts to Re-emphasize:

Abstract Stack Machine for Intermediate
Code Generation: (i) basic arithmetic,
(ii) stack, (iii) flow control
L-value vs. R-value of an identifier:
I := 5        L - Location
I := I + 1    R - Contents
38
A Brief Look at Code Generation
  • Employ Statement Templates for Code Generation.
  • Each Template Characterizes the Translation
  • Different Templates for Each Major Programming
    Language Construct, if, while, procedure, etc.

IF                  WHILE
                    label test
code for expr       code for expr
gofalse out         gofalse out
code for stmt       code for stmt
                    goto test
label out           label out
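The IF and WHILE templates can be written as functions that splice already-generated code for the expression and the statement into the fixed instruction pattern. The Python sketch below is an added illustration: the function names are hypothetical, code is represented as a list of instruction strings, and the instructions used for the sample expression and statement in the usage example are placeholders, not a definition of the stack machine.

```python
def gen_if(expr_code, stmt_code, out="out"):
    """IF template: evaluate expr, skip stmt when the result is false."""
    return [
        *expr_code,
        f"gofalse {out}",
        *stmt_code,
        f"label {out}",
    ]

def gen_while(expr_code, stmt_code, test="test", out="out"):
    """WHILE template: re-test expr after every execution of stmt."""
    return [
        f"label {test}",
        *expr_code,
        f"gofalse {out}",
        *stmt_code,
        f"goto {test}",
        f"label {out}",
    ]

if __name__ == "__main__":
    # Placeholder instruction strings, assumed only for this example.
    expr_code = ["code for expr"]
    stmt_code = ["code for stmt"]
    print("\n".join(gen_while(expr_code, stmt_code)))
```

In a real translator each use of a template would need fresh label names, so that nested or repeated statements do not jump to each other's labels.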
39
Concluding Remarks / Looking Ahead
  • We've Reviewed / Highlighted the Entire Compilation
    Process
  • Introduced Context-free Grammars (CFG) and
    Indicated / Illustrated Their Relationship to Compiler
    Theory
  • Reviewed Many Different Versions of Parse Trees
    That Assist in Both Recognition and Translation
  • We'll Return to the Beginning - Lexical Analysis
  • We'll Explore the Close Relationship of Lexical
    Analysis to Regular Expressions, Grammars, and
    Finite Automata