Chapter 2 (part) Chapter 4: Syntax Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Chapter 2 (part) Chapter 4: Syntax Analysis

Description:

Chapter 2 (part) + Chapter 4: Syntax Analysis S. M. Farhad * – PowerPoint PPT presentation

Slides: 63
Provided by: farhad
Learn more at: http://sydney.edu.au
Transcript and Presenter's Notes

Title: Chapter 2 (part) Chapter 4: Syntax Analysis


1
Chapter 2 (part) Chapter 4 Syntax Analysis
  • S. M. Farhad

3
Grammars
  • Specify the syntax of a language
  • Hierarchical structure
  • Java if-else statement
  • if ( expr ) stmt else stmt
  • A production rule for if-else statement
  • stmt → if ( expr ) stmt else stmt
  • Terminals and nonterminals

4
Context Free Grammars
  • The notation to specify syntax
  • Context Free Grammar (CFG)
  • Backus-Naur Form (BNF)
  • A context-free grammar
  • Analyze the syntax
  • Also used to translate the programs
  • Context-free grammar ≡ grammar (the two terms are used interchangeably here)

5
Components of Grammars
  • A set of terminal symbols
  • For example: tokens such as +, -, and keywords
  • A set of nonterminals
  • Sets of strings help define the language
  • Nonterminals impose a hierarchical structure
  • For example expr, stmt as follows
  • stmt → if ( expr ) stmt else stmt

6
Components of Grammars
  • A set of productions
  • The head or left side
  • Consists of a nonterminal
  • An arrow (→), read as "can have the form"
  • Body or right side
  • A sequence of terminals and nonterminals
  • Start symbol
  • A special nonterminal symbol
  • The productions for the start symbol are listed
    first

7
Example
  • Arithmetic expressions built from integers and operators such as + and *
  • E → E + E | E - E | E * E | (E) | int
  • int → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

8
Derivations
  • Beginning with the start symbol
  • Each rewriting step replaces a nonterminal by the
    body of one of its productions
  • Leftmost derivation
  • Leftmost nonterminal is always chosen
  • LL grammar (parsed left to right, leftmost derivation)
  • Rightmost derivation
  • Rightmost nonterminal is always chosen
  • LR grammar (parsed left to right, rightmost derivation in reverse)

9
Left Most Derivation
  • Given
  • E → E + E | E - E | E * E | (E) | int
  • String: int * int + int
  • E ⇒ E + E
  • ⇒ E * E + E
  • ⇒ int * E + E
  • ⇒ int * int + E
  • ⇒ int * int + int

10
Right Most Derivation
  • String: int * int + int

11
Parse Tree
  • String: int * int + int

12
Ambiguity
  • Grammar that produces more than one parse tree
    for some sentence

For the string int * int + int
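To make the ambiguity concrete, the following minimal sketch counts the parse trees of a token list. It assumes the ambiguous grammar E → E + E | E * E | (E) | int (only the + and * alternatives are used), and the function name `count_parse_trees` is purely illustrative.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E + E | E * E | ( E ) | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of distinct ways E can derive tokens[i:j].
        n = 0
        if j - i == 1 and tokens[i] == "int":
            n += 1                                  # E -> int
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            n += count(i + 1, j - 1)                # E -> ( E )
        for k in range(i + 1, j - 1):               # E -> E op E, operator at k
            if tokens[k] in ("+", "*"):
                n += count(i, k) * count(k + 1, j)
        return n

    return count(0, len(tokens))


print(count_parse_trees("int * int + int".split()))      # 2
print(count_parse_trees("( int * int ) + int".split()))  # 1
```

A count above 1 for any sentence is enough to show that the grammar is ambiguous; adding parentheses removes the second tree.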
13
Reasons for Ambiguity
  • Associativity and Precedence
  • +, -, *, / are left associative
  • *, / have higher precedence than +, -
  • Use E and T for two levels of precedence
  • Use F for basic units of expression

14
Non Ambiguous
  • F → int | (E)
  • T → T * F | T / F | F
  • E → E + T | E - T | T
  • String: int * (int + int)

15
Ambiguity The Dangling Else
  • Consider the grammar
  • S → if E then S
  • | if E then S else S
  • | other
  • This grammar is also ambiguous

16
Ambiguity The Dangling Else
  • The expression
  • if E1 then if E2 then S1 else S2
  • has two parse trees

Typically we want the second form
17
The Dangling Else A Fix
  • else matches the closest unmatched then
  • We can describe this in the grammar
  • S → MS /* all then are matched */
  • | US /* some then are unmatched */
  • MS → if E then MS else MS
  • | other
  • US → if E then S
  • | if E then MS else US

18
The Dangling Else The Parse Tree
  • The expression
  • if E1 then if E2 then S1 else S2

19
CFG vs RE
  • Grammars are more powerful notation than RE
  • For RE (a | b)*abb
  • A0 → aA0 | bA0 | aA1
  • A1 → bA2
  • A2 → bA3
  • A3 → ε
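As a quick check that this grammar and the regular expression describe the same language, the sketch below simulates the grammar as a set of reachable nonterminals and compares it with Python's `re` module on every string over {a, b} up to length 7. The dictionary encoding and the name `derives` are assumptions made for this illustration.

```python
import re
from itertools import product

# Right-linear grammar from the slide: nonterminal -> (terminal, next nonterminal).
GRAMMAR = {
    "A0": [("a", "A0"), ("b", "A0"), ("a", "A1")],
    "A1": [("b", "A2")],
    "A2": [("b", "A3")],
    "A3": [],               # A3 has only the epsilon production
}
NULLABLE = {"A3"}           # nonterminals that can derive the empty string


def derives(s, start="A0"):
    """Return True if the grammar derives the string s."""
    current = {start}
    for ch in s:
        # Follow every production whose terminal matches the next character.
        current = {nxt for nt in current for (t, nxt) in GRAMMAR[nt] if t == ch}
    return bool(current & NULLABLE)


pattern = re.compile(r"(a|b)*abb")
for n in range(8):
    for chars in product("ab", repeat=n):
        s = "".join(chars)
        assert derives(s) == bool(pattern.fullmatch(s))
```

The reverse direction does not hold in general: a grammar for balanced parentheses has no equivalent regular expression.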

20
Why Use RE in Lexical Analysis
  • Separating lexical from syntax analysis gives two manageable-sized components
  • Simpler
  • More concise
  • Construction of the lexical analyzer becomes easier
    and more efficient

21
RE vs CFG
  • REs are most useful for
  • Identifiers, constants, keywords, and white space
  • Grammars are most useful for describing nested
    structure
  • Balanced parentheses, matching begin-end's,
    corresponding if-then-else
  • Nested structure cannot be described by RE

22
Parsing
  • Top down parsing
  • Starts at the root and proceeds towards the leaves
  • Easier to understand and program manually
  • Bottom up parsing
  • Starts at the leaves and proceeds towards the
    root
  • more powerful, used by most parser generators

23
Recursive Descent Parsing
  • Consider the grammar
  • E → T + E | T
  • T → int | int * T | ( E )
  • Token stream is int * int
  • Start with top-level non-terminal E
  • Try the rules for E in order

24
Recursive Descent Parsing - Example
  • Try E → T + E
  • Then try a rule for T → ( E )
  • But ( does not match input token int
  • Try T → int - Token matches.
  • But + after T does not match input token *
  • Try T → int * T
  • This will match but + after T will be unmatched
  • Has exhausted the choices for T
  • Backtrack to choice for E

25
Recursive Descent Parsing - Example
  • Token stream is int * int
  • Try E → T
  • Follow same steps as before for T
  • And succeed with T → int * T and T → int
  • With the following parse tree
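The backtracking search walked through above can be sketched with Python generators. This is a minimal recognizer, assuming the grammar E → T + E | T and T → ( E ) | int | int * T with the alternatives tried in the order the slides use; the function names are illustrative. Each function yields every position at which the nonterminal could end, so a caller that fails later simply asks for the next alternative.

```python
def parse_T(tokens, pos):
    """Yield every position where a T starting at `pos` could end."""
    # T -> ( E )
    if pos < len(tokens) and tokens[pos] == "(":
        for end in parse_E(tokens, pos + 1):
            if end < len(tokens) and tokens[end] == ")":
                yield end + 1
    if pos < len(tokens) and tokens[pos] == "int":
        # T -> int
        yield pos + 1
        # T -> int * T
        if pos + 1 < len(tokens) and tokens[pos + 1] == "*":
            yield from parse_T(tokens, pos + 2)


def parse_E(tokens, pos):
    """Yield every position where an E starting at `pos` could end."""
    # E -> T + E
    for end in parse_T(tokens, pos):
        if end < len(tokens) and tokens[end] == "+":
            yield from parse_E(tokens, end + 1)
    # E -> T  (reached by backtracking when E -> T + E fails)
    yield from parse_T(tokens, pos)


def accepts(tokens):
    """True if the whole token list can be derived from E."""
    return any(end == len(tokens) for end in parse_E(tokens, 0))


print(accepts("int * int".split()))   # True
print(accepts("int + ".split()))      # False
```

For int * int the search first fails under E → T + E, exactly as in the walkthrough, and then succeeds under E → T.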

26
When Recursive Descent Does Not Work
  • Consider the left-recursive grammar
  • S → S α | β
  • S calls itself without consuming any input symbol
  • Gets into an infinite loop
  • Recursive descent does not work in such cases

27
Elimination of Left Recursion
  • Consider the left-recursive grammar
  • S → S α | β
  • S generates all strings starting with a β and
    followed by any number of α
  • Can rewrite using right-recursion
  • S → β S'
  • S' → α S' | ε

28
More Elimination of Left-Recursion
  • In general
  • S → S α1 | … | S αn | β1 | … | βm
  • All strings derived from S start with one of
    β1,…,βm and continue with several instances of
    α1,…,αn
  • Rewrite as
  • S → β1 S' | … | βm S'
  • S' → α1 S' | … | αn S' | ε
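The rewrite rule above is mechanical, as this minimal sketch suggests. It handles immediate left recursion only and assumes every recursive tail is non-empty; the function name and the tuple encoding of production bodies (with `()` standing for ε) are assumptions for illustration. The example input is the expression grammar E → E + T | E - T | T.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite head -> head a1 | ... | head an | b1 | ... | bm.

    `bodies` is a list of tuples of symbols; the empty tuple stands for epsilon.
    Returns a dict mapping each nonterminal to its new list of bodies.
    """
    tails = [b[1:] for b in bodies if b and b[0] == head]     # the alpha_i
    others = [b for b in bodies if not b or b[0] != head]     # the beta_j
    if not tails:
        return {head: list(bodies)}                           # nothing to do
    new = head + "'"                                          # fresh nonterminal
    return {
        head: [b + (new,) for b in others],                   # S  -> beta_j S'
        new: [a + (new,) for a in tails] + [()],              # S' -> alpha_i S' | eps
    }


result = eliminate_immediate_left_recursion(
    "E", [("E", "+", "T"), ("E", "-", "T"), ("T",)]
)
for nonterminal, new_bodies in result.items():
    for body in new_bodies:
        print(nonterminal, "->", " ".join(body) or "ε")
```

This prints E -> T E', E' -> + T E', E' -> - T E' and E' -> ε.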

29
General Left Recursion
  • The grammar
  • S → A α | δ
  • A → S β
  • is also left-recursive because
  • S ⇒+ S β α
  • This left-recursion can also be eliminated
  • See book, Section 4.3 for general algorithm

30
Summary of Recursive Descent
  • Simple and general parsing strategy
  • Left-recursion must be eliminated first
  • but that can be done automatically
  • Unpopular because of backtracking
  • Thought to be too inefficient
  • In practice, backtracking is eliminated by
    restricting the grammar

31
Predictive Parsers
  • Like recursive-descent but parser can predict
    which production to use
  • By looking at the next few tokens
  • No backtracking
  • Predictive parsers accept LL(k) grammars
  • The first L means left-to-right scan of input
  • The second L means leftmost derivation
  • k means predict based on k tokens of lookahead
  • In practice, LL(1) is used

32
LL(1) Languages
  • In recursive-descent, for each non-terminal and
    input token, may be a choice of production
  • LL(1) means that for each non-terminal and token
    there is only one production
  • Can be specified via 2D tables
  • One dimension for current non-terminal to expand
  • One dimension for next token
  • A table entry contains one production

33
Predictive Parsing and Left Factoring
  • Recall the grammar
  • E → T + E | T
  • T → int | int * T | ( E )
  • Hard to predict because
  • For T two productions start with int
  • For E it is not clear how to predict
  • A grammar must be left-factored before use for
    predictive parsing

34
Left-Factoring Example
  • Recall the grammar
  • E → T + E | T
  • T → int | int * T | ( E )
  • Factor out common prefixes of productions
  • E → T X
  • X → + E | ε
  • T → ( E ) | int Y
  • Y → * T | ε

35
Left-Factoring Example
  • Left-factored grammar
  • E → T X        X → + E | ε
  • T → ( E ) | int Y        Y → * T | ε
  • Token stream is int * int

36
LL(1) Parsing Table Example
  • Left-factored grammar
  • E → T X        X → + E | ε
  • T → ( E ) | int Y        Y → * T | ε
  • LL(1) parsing table

       int      *      +      (        )      $
E      T X                    T X
X                      + E             ε      ε
T      int Y                  ( E )
Y               * T    ε               ε      ε
37
LL(1) Parsing Table Example
  • Consider the [E, int] entry
  • When current non-terminal is E and next input is
    int, use production E → T X
  • This production can generate an int in the first
    position
  • Consider the [Y, +] entry
  • When current non-terminal is Y and current token
    is +, get rid of Y
  • Y can be followed by + only in a derivation in
    which Y → ε

38
LL(1) Parsing Tables - Errors
  • Blank entries indicate error situations
  • Consider the [E, *] entry
  • There is no way to derive a string starting with
    * from non-terminal E

39
Using Parsing Tables
  • Method similar to recursive descent, except
  • For each non-terminal S
  • We look at the next token a
  • And choose the production shown at [S, a]
  • We use a stack to keep track of pending
    nonterminals
  • We reject when we encounter an error state
  • We accept when we encounter end-of-input

40
LL(1) Parsing Algorithm
  • initialize stack = <S $> and next
  • repeat
  • case stack of
  • <X, rest> : if T[X, *next] = Y1…Yn
  • then stack ← <Y1 … Yn rest>
  • else error()
  • <t, rest> : if t == *next++
  • then stack ← <rest>
  • else error()
  • until stack == < >
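The algorithm above can be sketched in Python as follows. The table is a hand-copied encoding of the LL(1) table for the left-factored grammar (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε); the dictionary layout, the use of `()` for ε and the name `ll1_parse` are assumptions of this illustration.

```python
EPS = ()  # empty body, i.e. an epsilon production

# (nonterminal, lookahead) -> body of the production to apply
TABLE = {
    ("E", "int"): ("T", "X"), ("E", "("): ("T", "X"),
    ("X", "+"): ("+", "E"), ("X", ")"): EPS, ("X", "$"): EPS,
    ("T", "int"): ("int", "Y"), ("T", "("): ("(", "E", ")"),
    ("Y", "*"): ("*", "T"), ("Y", "+"): EPS, ("Y", ")"): EPS, ("Y", "$"): EPS,
}
NONTERMINALS = {"E", "X", "T", "Y"}


def ll1_parse(tokens, start="E"):
    """Return the productions applied, in order, or raise SyntaxError."""
    tokens = list(tokens) + ["$"]      # append the end-of-input marker
    stack = ["$", start]               # top of the stack is the end of the list
    pos = 0
    used = []
    while stack:
        top = stack.pop()
        nxt = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, nxt))
            if body is None:           # blank table entry
                raise SyntaxError(f"no entry for [{top}, {nxt}]")
            used.append((top, body))
            stack.extend(reversed(body))   # leftmost body symbol ends on top
        elif top == nxt:
            pos += 1                   # terminal (or $) matches: consume it
        else:
            raise SyntaxError(f"expected {top}, got {nxt}")
    return used


for head, body in ll1_parse("int * int".split()):
    print(head, "->", " ".join(body) or "ε")
```

For int * int the productions come out in the order E → T X, T → int Y, Y → * T, T → int Y, Y → ε, X → ε, which is a leftmost derivation.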

41
LL(1) Parsing Example
  • Stack          Input          Action
  • E $            int * int $    T X
  • T X $          int * int $    int Y
  • int Y X $      int * int $    terminal
  • Y X $          * int $        * T
  • * T X $        * int $        terminal
  • T X $          int $          int Y
  • int Y X $      int $          terminal
  • Y X $          $              ε
  • X $            $              ε
  • $              $              ACCEPT

42
Constructing Parsing Tables
  • LL(1) languages are those defined by a parsing
    table for the LL(1) algorithm
  • No table entry can be multiply defined
  • We want to generate parsing tables from CFG

43
Constructing Parsing Tables
  • If A → α, where in the row of A do we place α?
  • In the column of t where t can start a string
    derived from α
  • α ⇒* t β
  • We say that t ∈ First(α)
  • In the column of t if α is ε and t can follow an A
  • S ⇒* β A t δ
  • We say t ∈ Follow(A)

44
Computing First Sets
  • Definition: First(X) = { t | X ⇒* tα } ∪ { ε | X ⇒* ε }
  • Algorithm sketch (see book for details)
  • 1. For all terminals t do First(t) ← { t }
  • 2. If X → A1 … Ak
  • If a ∈ First(A1), add a to First(X)
  • Everything in First(A1), apart from ε, is in First(X)
  • If A1 does not derive ε, stop
  • If A1 ⇒* ε then we add First(A2), and so on
  • 3. For each production X → ε, add ε to First(X)

45
First Sets - Example
  • Recall the grammar
  • E → T X        X → + E | ε
  • T → ( E ) | int Y        Y → * T | ε
  • First sets
  • First( ( ) = { ( }        First( T ) = { int, ( }
  • First( ) ) = { ) }        First( E ) = { int, ( }
  • First( int ) = { int }    First( X ) = { +, ε }
  • First( + ) = { + }        First( Y ) = { *, ε }
  • First( * ) = { * }
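The First sets for this grammar can be reproduced with a short fixed-point computation. This is a sketch under stated assumptions: the grammar is the left-factored one (E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε), empty lists stand for ε, any symbol that is not a key of the dictionary is treated as a terminal, and the name `first_sets` is illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}


def first_sets(grammar):
    """Compute First(X) for every nonterminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # First(t) = {t} for a terminal t.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    first[head] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        break              # sym cannot vanish, so stop here
                else:
                    first[head].add(EPS)   # body is empty or entirely nullable
                changed |= len(first[head]) != before
    return first


for nt, symbols in first_sets(GRAMMAR).items():
    print(f"First({nt}) =", sorted(symbols))
```

The loop repeats until no set grows, which is needed because First(E) depends on First(T), which in turn depends on First(E) through ( E ).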

46
Computing Follow Sets
  • Definition
  • Follow(B) = { t | S ⇒* β B t δ }
  • If S is the start symbol then $ ∈ Follow(S)
  • If A → α B β then First(β) - {ε} is in Follow(B)
  • If A → α B, or A → α B β and ε ∈ First(β), then
  • Follow(A) is in Follow(B)

47
Follow Sets - Example
  • Recall the grammar
  • E → T X        X → + E | ε
  • T → ( E ) | int Y        Y → * T | ε
  • Follow sets
  • Follow( + ) = { int, ( }        Follow( E ) = { ), $ }
  • Follow( ( ) = { int, ( }        Follow( X ) = { ), $ }
  • Follow( * ) = { int, ( }        Follow( T ) = { +, ), $ }
  • Follow( ) ) = { +, ), $ }       Follow( Y ) = { +, ), $ }
  • Follow( int ) = { *, +, ), $ }
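The Follow rules can be run to a fixed point in the same way. In this sketch the First sets of the nonterminals are hard-coded from the earlier example rather than recomputed, Follow is also computed for terminals as on the slide, and the helper names are assumptions for illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# First sets of the nonterminals, copied from the First sets example.
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}


def first_of_string(symbols):
    """First set of a sequence of grammar symbols."""
    out = set()
    for sym in symbols:
        f = FIRST.get(sym, {sym})          # a terminal's First set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)                           # the whole sequence can vanish
    return out


def follow_sets(grammar, start):
    symbols = set(grammar) | {s for bs in grammar.values() for b in bs for s in b}
    follow = {s: set() for s in symbols}
    follow[start].add(END)                 # $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    rest = first_of_string(body[i + 1:])
                    new = rest - {EPS}
                    if EPS in rest:        # nothing (or only nullables) after sym
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


follow = follow_sets(GRAMMAR, "E")
for sym in ["E", "X", "T", "Y", "int"]:
    print(f"Follow({sym}) =", sorted(follow[sym]))
```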

48
Constructing LL(1) Parsing Tables
  • Construct a parsing table T for CFG, G
  • For each production A → α in G do
  • For each terminal t ∈ First(α) do
  • T[A, t] = α
  • If ε ∈ First(α), for each t ∈ Follow(A) do
  • T[A, t] = α
  • If ε ∈ First(α) and $ ∈ Follow(A) do
  • T[A, $] = α
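A minimal sketch of this construction, assuming the left-factored expression grammar and with its First and Follow sets hard-coded rather than computed. Because `$` is stored inside the Follow sets, the last two rules collapse into one step. The function names are illustrative, and `()` stands for ε.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
# Sets for the nonterminals, hard-coded for this grammar.
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}
FOLLOW = {"E": {")", END}, "X": {")", END},
          "T": {"+", ")", END}, "Y": {"+", ")", END}}


def first_of_body(body):
    """First set of a production body (a tuple of symbols)."""
    out = set()
    for sym in body:
        f = FIRST.get(sym, {sym})          # a terminal's First set is itself
        out |= f - {EPS}
        if EPS not in f:
            return out
    out.add(EPS)
    return out


def build_ll1_table(grammar):
    """Map (nonterminal, terminal) to a body; raise if an entry is defined twice."""
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            first = first_of_body(body)
            columns = first - {EPS}
            if EPS in first:
                columns |= FOLLOW[head]    # includes $ when $ is in Follow(head)
            for t in columns:
                if (head, t) in table:
                    raise ValueError(f"conflict at [{head}, {t}]: not LL(1)")
                table[(head, t)] = body
    return table


table = build_ll1_table(GRAMMAR)
for (head, t), body in sorted(table.items()):
    print(f"T[{head}, {t}] = {' '.join(body) or EPS}")
```

The `ValueError` branch corresponds to a multiply defined entry, which is how a grammar that is not LL(1) shows up during construction.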

49
Constructing LL(1) Parsing Tables
  • Grammar
  • E → T X
  • X → + E | ε
  • T → ( E ) | int Y
  • Y → * T | ε

Follow Sets: Follow( X ) = { ), $ }   Follow( E ) = { ), $ }
             Follow( T ) = { +, ), $ }   Follow( Y ) = { +, ), $ }
First Sets:  First( T ) = { int, ( }   First( E ) = { int, ( }
             First( X ) = { +, ε }   First( Y ) = { *, ε }

       int      *      +      (        )      $
E      T X                    T X
X                      + E             ε      ε
T      int Y                  ( E )
Y               * T    ε               ε      ε
51
Predictive Parsing for Dangling Else Grammar
  • Dangling else grammar
  • S → i E t S | i E t S e S | a
  • E → b
  • Left factoring
  • S → i E t S S' | a
  • S' → e S | ε
  • E → b

52
Predictive Parsing for Dangling Else Grammar
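  • S → i E t S S' | a
  • S' → e S | ε
  • E → b

First(S) = { i, a }    First(E) = { b }    First(S') = { e, ε }
Follow(S) = { e, $ }   Follow(S') = { e, $ }   Follow(E) = { t }

       a        b        e                   i                  t    $
S      S → a                                 S → i E t S S'
S'                       S' → e S, S' → ε                            S' → ε
E               E → b

The entry [S', e] holds two productions, so this grammar is not LL(1).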
53
Error Handling in Syntax Analysis
  • Goals
  • Report the presence of errors clearly and
    accurately
  • Recover from each error quickly
  • To detect subsequent errors
  • Add minimal overhead to the processing of correct
    programs

54
Error Recovery Strategies
  • Panic-Mode Recovery
  • Discards input symbols one at a time
  • Synchronizing tokens are used
  • Follow set, keyword, etc
  • Phrase-Level Recovery
  • Perform local correction on the remaining inputs
  • Replace a comma by a semicolon, delete an
    extraneous semicolon
  • For the empty cells of the parsing table
    implement the error correcting routines

55
Error Recovery Strategies
  • Error Productions
  • Augment the grammar for erroneous inputs
  • Global Correction
  • Make as few changes as possible in processing an
    incorrect input string
  • Read section 4.4.5

56
Error Recovery
  • If table entry M[A, a] is empty, input symbol a is skipped
  • If the entry is synch, the nonterminal on top of the
    stack is popped
  • If the terminal on top of the stack does not match the
    input, the stack top is popped

        id          +             *             (           )          $
E       E → TE'                                 E → TE'     synch      synch
E'                  E' → +TE'                               E' → ε     E' → ε
T       T → FT'     synch                       T → FT'     synch      synch
T'                  T' → ε        T' → *FT'                 T' → ε     T' → ε
F       F → id      synch         synch         F → (E)     synch      synch
57
Error Recovery Panic Mode
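Stack          Input             Remark
E $            ) id * + id $     error, skip )
E $            id * + id $       id is in FIRST(E)
TE' $          id * + id $
FT'E' $        id * + id $
id T'E' $      id * + id $
T'E' $         * + id $
* FT'E' $      * + id $
FT'E' $        + id $            error, M[F, +] = synch
T'E' $         + id $            F has been popped
E' $           + id $
+ TE' $        + id $
TE' $          id $
FT'E' $        id $
id T'E' $      id $
T'E' $         $
E' $           $
$              $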
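The recovery rules can be sketched as a table-driven parser that records each recovery action instead of stopping. The table below is an assumed encoding of the synch table for E → T E', E' → + T E' | ε, T → F T', T' → * F T' | ε, F → ( E ) | id. One further assumption, made only so that the sketch reproduces the "skip )" step of the trace: when the start symbol is the only nonterminal left on the stack, a synch entry skips input rather than popping. All names are illustrative.

```python
SYNCH, EPS = "synch", ()

TABLE = {
    ("E", "id"): ("T", "E'"), ("E", "("): ("T", "E'"),
    ("E", ")"): SYNCH, ("E", "$"): SYNCH,
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS, ("E'", "$"): EPS,
    ("T", "id"): ("F", "T'"), ("T", "("): ("F", "T'"),
    ("T", "+"): SYNCH, ("T", ")"): SYNCH, ("T", "$"): SYNCH,
    ("T'", "+"): EPS, ("T'", "*"): ("*", "F", "T'"),
    ("T'", ")"): EPS, ("T'", "$"): EPS,
    ("F", "id"): ("id",), ("F", "("): ("(", "E", ")"),
    ("F", "+"): SYNCH, ("F", "*"): SYNCH, ("F", ")"): SYNCH, ("F", "$"): SYNCH,
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def parse_with_recovery(tokens, start="E"):
    """Parse to the end of input and return the list of recovery actions taken."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                   # top of the stack is the end of the list
    pos, errors = 0, []
    while stack:
        top, nxt = stack[-1], tokens[pos]
        if top in NONTERMINALS:
            entry = TABLE.get((top, nxt))
            if entry is None or entry == SYNCH:
                # Blank entry, or synch with only the start symbol left: skip input.
                skip = entry is None or len(stack) == 2
                if skip and nxt != "$":
                    errors.append(("skip", nxt))
                    pos += 1
                else:
                    errors.append(("pop", top))
                    stack.pop()
            else:
                stack.pop()
                stack.extend(reversed(entry))
        elif top == nxt:
            stack.pop()                    # terminal or $ matches the input
            pos += 1
        elif top == "$":
            errors.append(("skip", nxt))   # leftover input after the parse
            pos += 1
        else:
            errors.append(("pop", top))    # unmatched terminal on the stack
            stack.pop()
    return errors


print(parse_with_recovery(") id * + id".split()))
# [('skip', ')'), ('pop', 'F')]
```

Every branch either consumes an input token or shrinks the stack by an error pop, so the loop always terminates.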
58
Bottom-up Parsing
  • Bottom-up parsing is more general than top-down
    parsing
  • Efficient although difficult by hand
  • Similar ideas of top-down parsing
  • Bottom-up is the preferred method in practice
  • Reading Section 4.5

59
Bottom-up Parsing
  • Bottom-up parsers don't need left-factored
    grammars
  • Hence we can revert to the natural grammar for
    our example
  • E → T + E | T
  • T → int * T | int | (E)
  • Consider the string int * int + int

60
Bottom-up Parsing
  • Bottom-up parsing reduces a string to the start
    symbol by inverting productions
  • int * int + int        T → int
  • int * T + int          T → int * T
  • T + int                T → int
  • T + T                  E → T
  • T + E                  E → T + E
  • E

61
Observation
  • Read productions from bottom-up parse in reverse
    (i.e., from bottom to top)
  • This is a rightmost derivation!
  • int * int + int        T → int
  • int * T + int          T → int * T
  • T + int                T → int
  • T + T                  E → T
  • T + E                  E → T + E
  • E

62
Trivial Bottom-Up Parsing Algorithm
  • Let I = input string
  • repeat
  • pick a non-empty substring β of I
  • where X → β is a production
  • if no such β, backtrack
  • replace one β by X in I
  • until I = "S" (the start symbol) or
  • all possibilities are exhausted
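A direct, deliberately naive rendering of this algorithm is sketched below for the grammar E → T + E | T, T → int * T | int | (E). The recursion plays the role of backtracking, and the `seen` set, which is an addition not present on the slide, avoids re-exploring sentential forms that already failed. The function name is illustrative.

```python
PRODUCTIONS = [
    ("E", ("T", "+", "E")), ("E", ("T",)),
    ("T", ("int", "*", "T")), ("T", ("int",)), ("T", ("(", "E", ")")),
]


def reduce_to_start(form, start="E", seen=None):
    """Return the sentential forms from `form` down to (start,), or None."""
    form = tuple(form)
    seen = set() if seen is None else seen
    if form == (start,):
        return [form]
    if form in seen:                       # already explored without success
        return None
    seen.add(form)
    for head, body in PRODUCTIONS:
        n = len(body)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == body:      # found a substring matching a body
                reduced = form[:i] + (head,) + form[i + n:]
                rest = reduce_to_start(reduced, start, seen)
                if rest is not None:
                    return [form] + rest
                # otherwise backtrack and try the next match
    return None


steps = reduce_to_start("int * int + int".split())
for step in steps:
    print(" ".join(step))
```

The search tries every matching substring, so its running time grows quickly with input length; it illustrates the idea rather than a practical parser.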