Chapter 2 (part) Chapter 4 Syntax Analysis

- S. M. Farhad

Grammars

- Specify the syntax of a language
- Hierarchical structure
- Java if-else statement
- if ( expr ) stmt else stmt
- A production rule for if-else statement
- stmt → if ( expr ) stmt else stmt
- Terminals and nonterminals

Context Free Grammars

- The notation to specify syntax
- Context Free Grammar (CFG)
- Backus-Naur Form (BNF)
- A context-free grammar
- Analyze the syntax
- Also used to translate the programs
- Context-free grammar is abbreviated to "grammar"

Components of Grammars

- A set of terminal symbols
- For example tokens such as +, -, and keywords
- A set of nonterminals
- Sets of strings help define the language
- Nonterminals impose a hierarchical structure
- For example expr, stmt as follows
- stmt → if ( expr ) stmt else stmt

Components of Grammars

- A set of productions
- The head or left side
- Consists of a nonterminal
- An arrow (→), read as "can have the form"
- Body or right side
- A sequence of terminals and nonterminals
- Start symbol
- A special nonterminal symbol
- The productions for the start symbol are listed first

Example

- Arithmetic expressions built with operators such as + or *
- E → E + E | E - E | E * E | (E) | int
- int → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Derivations

- Beginning with the start symbol
- Each rewriting step replaces a nonterminal by the body of one of its productions
- Leftmost derivation
- Leftmost nonterminal is always chosen
- LL grammar (parsed left to right, producing a leftmost derivation)
- Rightmost derivation
- Rightmost nonterminal is always chosen
- LR grammar (parsed left to right, producing a rightmost derivation in reverse)

Left Most Derivation

- Given
- E → E + E | E - E | E * E | (E) | int
- String int * int + int
- E ⇒ E + E
- ⇒ E * E + E
- ⇒ int * E + E
- ⇒ int * int + E
- ⇒ int * int + int
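
The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the string being derived is int * int + int and that E is the only nonterminal; the helper names `leftmost_step` and `derive` are illustrative and not from the slides.

```python
# Sentential forms are lists of symbols; "E" is assumed to be the only nonterminal.
NONTERMINALS = {"E"}

def leftmost_step(form, body):
    """Replace the leftmost nonterminal in `form` with the production body `body`."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            return form[:i] + body + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

def derive(start, bodies):
    """Return every sentential form obtained by applying `bodies` in order."""
    forms = [[start]]
    for body in bodies:
        forms.append(leftmost_step(forms[-1], body))
    return forms

# Production bodies chosen at each step of the leftmost derivation.
STEPS = [
    ["E", "+", "E"],   # E -> E + E
    ["E", "*", "E"],   # E -> E * E
    ["int"],           # E -> int
    ["int"],           # E -> int
    ["int"],           # E -> int
]

for form in derive("E", STEPS):
    print(" ".join(form))
```

Choosing the rightmost nonterminal in `leftmost_step` instead would give a rightmost derivation of the same string.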

Right Most Derivation

- String int * int + int

Parse Tree

- String int * int + int

Ambiguity

- Grammar that produces more than one parse tree for some sentence

For string int * int + int

Reasons for Ambiguity

- Associativity and Precedence
- +, -, *, / are left associative
- *, / have higher precedence than +, -
- Use E and T for two levels of precedence
- Use F for basic units of expression

Non Ambiguous

- F → int | (E)
- T → T * F | T / F | F
- E → E + T | E - T | T
- String int * (int + int)

Ambiguity The Dangling Else

- Consider the grammar
- S → if E then S
- | if E then S else S
- | other
- This grammar is also ambiguous

Ambiguity The Dangling Else

- The expression
- if E1 then if E2 then S1 else S2
- has two parse trees

Typically we want the second form

The Dangling Else A Fix

- else matches the closest unmatched then
- We can describe this in the grammar
- S → MS /* all then are matched */
- | US /* some then are unmatched */
- MS → if E then MS else MS
- | other
- US → if E then S
- | if E then MS else US

The Dangling Else The Parse Tree

- The expression
- if E1 then if E2 then S1 else S2

CFG vs RE

- Grammars are a more powerful notation than RE
- For the RE (a | b)*abb
- A0 → aA0 | bA0 | aA1
- A1 → bA2
- A2 → bA3
- A3 → ε
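
To see that the grammar and the regular expression describe the same language, the productions can be read as transitions between the nonterminals A0 to A3. The sketch below assumes the RE is (a | b)*abb and compares it with Python's `re` module; the function names are illustrative.

```python
import re

# The regular expression from the slide, in Python syntax.
PATTERN = re.compile(r"(a|b)*abb")

# Each production Ai -> c Aj becomes a transition (Ai, c) -> Aj.
TRANSITIONS = {
    ("A0", "a"): {"A0", "A1"},   # A0 -> aA0 | aA1
    ("A0", "b"): {"A0"},         # A0 -> bA0
    ("A1", "b"): {"A2"},         # A1 -> bA2
    ("A2", "b"): {"A3"},         # A2 -> bA3
}
ACCEPTING = {"A3"}               # A3 -> ε

def grammar_accepts(s):
    """Track every nonterminal the derivation could currently be in."""
    states = {"A0"}
    for ch in s:
        states = set().union(*(TRANSITIONS.get((q, ch), set()) for q in states))
    return bool(states & ACCEPTING)

def regex_accepts(s):
    return PATTERN.fullmatch(s) is not None

for s in ["abb", "aabb", "babb", "ab", "abba", ""]:
    print(repr(s), grammar_accepts(s), regex_accepts(s))
```

Every regular expression can be turned into a grammar this way, but not the other way around, which is the sense in which grammars are more powerful.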

Why Use RE in Lexical Analysis

- Separating lexical from syntactic structure gives two manageable-sized components
- Simpler
- More concise
- Construction of the lexical analyzer becomes easier and more efficient

RE vs CFG

- REs are most useful for
- Identifiers, constants, keywords, and white space
- Grammars are most useful for describing nested structure
- Balanced parentheses, matching begin-end's, corresponding if-then-else
- Nested structure cannot be described by RE

Parsing

- Top down parsing
- Starts at the root and proceeds towards the leaves
- Easier to understand and program manually
- Bottom up parsing
- Starts at the leaves and proceeds towards the root
- More powerful, used by most parser generators

Recursive Descent Parsing

- Consider the grammar
- E → T + E | T
- T → int | int * T | ( E )
- Token stream is int * int
- Start with top-level non-terminal E
- Try the rules for E in order

Recursive Descent Parsing - Example

- Try E → T + E
- Then try a rule for T → ( E )
- But ( does not match input token int
- Try T → int. Token matches.
- But + after T does not match input token *
- Try T → int * T
- This will match but + after T will be unmatched
- Has exhausted the choices for T
- Backtrack to choice for E

Recursive Descent Parsing - Example

- Token stream is int * int
- Try E → T
- Follow same steps as before for T
- And succeed with T → int * T and T → int
- With the following parse tree
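
The backtracking walk above can be sketched with Python generators: each function yields every position at which its nonterminal could end, so a failed continuation simply asks for the next alternative. This assumes the grammar E → T + E | T and T → ( E ) | int | int * T, tried in the order the slides describe; the function names are illustrative.

```python
def parse_E(tokens, pos):
    """Yield every position where an E starting at `pos` can end."""
    # First choice: E -> T + E
    for after_t in parse_T(tokens, pos):
        if after_t < len(tokens) and tokens[after_t] == "+":
            yield from parse_E(tokens, after_t + 1)
    # Backtrack to the second choice: E -> T
    yield from parse_T(tokens, pos)

def parse_T(tokens, pos):
    """Yield every position where a T starting at `pos` can end."""
    # T -> ( E )
    if pos < len(tokens) and tokens[pos] == "(":
        for after_e in parse_E(tokens, pos + 1):
            if after_e < len(tokens) and tokens[after_e] == ")":
                yield after_e + 1
    if pos < len(tokens) and tokens[pos] == "int":
        # T -> int
        yield pos + 1
        # T -> int * T
        if pos + 1 < len(tokens) and tokens[pos + 1] == "*":
            yield from parse_T(tokens, pos + 2)

def accepts(tokens):
    """The input is accepted if some parse of E consumes every token."""
    return any(end == len(tokens) for end in parse_E(tokens, 0))

print(accepts(["int", "*", "int"]))
```

The sketch only recognizes the input; building the parse tree would mean yielding a tree alongside each end position.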

When Recursive Descent Does Not Work

- Consider the left-recursive grammar
- S → S α | β
- S calls itself without consuming any input symbol
- Gets into an infinite loop
- Recursive descent does not work in such cases

Elimination of Left Recursion

- Consider the left-recursive grammar
- S → S α | β
- S generates all strings starting with a β and followed by any number of α
- Can rewrite using right-recursion
- S → β S'
- S' → α S' | ε

More Elimination of Left- Recursion

- In general
- S → S α1 | ... | S αn | β1 | ... | βm
- All strings derived from S start with one of β1, ..., βm and continue with several instances of α1, ..., αn
- Rewrite as
- S → β1 S' | ... | βm S'
- S' → α1 S' | ... | αn S' | ε
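
The rewrite rule for immediate left recursion (S → S α1 | ... | S αn | β1 | ... | βm becomes S → βi S' and S' → αi S' | ε) is mechanical. A minimal sketch, assuming productions are lists of symbol strings and that the new nonterminal is named by appending a prime; it does not handle the indirect case covered in the book.

```python
EPSILON = "ε"

def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite head -> head α | β into head -> β head' and head' -> α head' | ε.

    `bodies` is a list of production bodies, each a list of symbols.
    Returns a dict mapping each nonterminal to its new list of bodies.
    """
    new_head = head + "'"
    alphas = [b[1:] for b in bodies if b and b[0] == head]    # left-recursive tails
    betas = [b for b in bodies if not b or b[0] != head]      # non-recursive bodies
    if not alphas:
        return {head: bodies}                                 # nothing to eliminate
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[EPSILON]],
    }

# S -> S a | b   becomes   S -> b S'   and   S' -> a S' | ε
print(eliminate_immediate_left_recursion("S", [["S", "a"], ["b"]]))
```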

General Left Recursion

- The grammar
- S → A α | δ
- A → S β
- is also left-recursive because
- S ⇒+ S β α
- This left-recursion can also be eliminated
- See book, Section 4.3 for general algorithm

Summary of Recursive Descent

- Simple and general parsing strategy
- Left-recursion must be eliminated first
- but that can be done automatically
- Unpopular because of backtracking
- Thought to be too inefficient
- In practice, backtracking is eliminated by restricting the grammar

Predictive Parsers

- Like recursive-descent but parser can predict which production to use
- By looking at the next few tokens
- No backtracking
- Predictive parsers accept LL(k) grammars
- The first L means left-to-right scan of input
- The second L means leftmost derivation
- k means predict based on k tokens of lookahead
- In practice, LL(1) is used

LL(1) Languages

- In recursive-descent, for each non-terminal and input token, there may be a choice of production
- LL(1) means that for each non-terminal and token there is only one production
- Can be specified via 2D tables
- One dimension for current non-terminal to expand
- One dimension for next token
- A table entry contains one production

Predictive Parsing and Left Factoring

- Recall the grammar
- E → T + E | T
- T → int | int * T | ( E )
- Hard to predict because
- For T two productions start with int
- For E it is not clear how to predict
- A grammar must be left-factored before use for predictive parsing

Left-Factoring Example

- Recall the grammar
- E → T + E | T
- T → int | int * T | ( E )
- Factor out common prefixes of productions
- E → T X
- X → + E | ε
- T → ( E ) | int Y
- Y → * T | ε

Left-Factoring Example

- Left-factored grammar
- E → T X
- X → + E | ε
- T → ( E ) | int Y
- Y → * T | ε
- Token stream is int * int

LL(1) Parsing Table Example

- Left-factored grammar
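- E → T X
- X → + E | ε
- T → ( E ) | int Y
- Y → * T | ε
- LL(1) parsing table

|   | int | * | + | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | T X |   |   | T X |   |   |
| X |   |   | + E |   | ε | ε |
| T | int Y |   |   | ( E ) |   |   |
| Y |   | * T | ε |   | ε | ε |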

LL(1) Parsing Table Example

- Consider the [E, int] entry
- When current non-terminal is E and next input is int, use production E → T X
- This production can generate an int in the first position
- Consider the [Y, +] entry
- When current non-terminal is Y and current token is +, get rid of Y
- Y can be followed by + only in a derivation in which Y → ε

LL(1) Parsing Tables - Errors

- Blank entries indicate error situations
- Consider the [E, *] entry
- There is no way to derive a string starting with * from non-terminal E

Using Parsing Tables

- Method similar to recursive descent, except
- For each non-terminal S
- We look at the next token a
- And choose the production shown at [S, a]
- We use a stack to keep track of pending nonterminals
- We reject when we encounter an error state
- We accept when we encounter end-of-input

LL(1) Parsing Algorithm

- initialize stack = <S $> and next
- repeat
- case stack of
- <X, rest> : if T[X, next] = Y1 ... Yn
- then stack ← <Y1 ... Yn rest>
- else error ()
- <t, rest> : if t = next
- then stack ← <rest> and advance next
- else error ()
- until stack = < >
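
The algorithm above translates almost line for line into code. The following is a minimal sketch, assuming the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε and its LL(1) table written out by hand as a dictionary; an empty list stands for an ε body, a missing key for a blank (error) entry.

```python
END = "$"
NONTERMINALS = {"E", "X", "T", "Y"}

# TABLE[(nonterminal, next token)] = body of the production to use.
TABLE = {
    ("E", "int"): ["T", "X"],
    ("E", "("): ["T", "X"],
    ("X", "+"): ["+", "E"],
    ("X", ")"): [],
    ("X", END): [],
    ("T", "int"): ["int", "Y"],
    ("T", "("): ["(", "E", ")"],
    ("Y", "*"): ["*", "T"],
    ("Y", "+"): [],
    ("Y", ")"): [],
    ("Y", END): [],
}

def ll1_parse(tokens, start="E"):
    """Return True if `tokens` can be derived from `start`."""
    stack = [END, start]               # top of the stack is the end of the list
    tokens = list(tokens) + [END]
    pos = 0
    while stack:
        top = stack.pop()
        nxt = tokens[pos]
        if top in NONTERMINALS:
            body = TABLE.get((top, nxt))
            if body is None:
                return False           # blank table entry: error
            stack.extend(reversed(body))   # leftmost body symbol ends up on top
        elif top == nxt:
            pos += 1                   # terminal (or $) matches: consume it
        else:
            return False               # terminal mismatch: error
    return pos == len(tokens)

print(ll1_parse(["int", "*", "int"]))
```

Pushing the body in reverse keeps the leftmost symbol on top, which is what makes the parse a leftmost derivation.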

LL(1) Parsing Example

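| Stack | Input | Action |
|---|---|---|
| E $ | int * int $ | T X |
| T X $ | int * int $ | int Y |
| int Y X $ | int * int $ | terminal |
| Y X $ | * int $ | * T |
| * T X $ | * int $ | terminal |
| T X $ | int $ | int Y |
| int Y X $ | int $ | terminal |
| Y X $ | $ | ε |
| X $ | $ | ε |
| $ | $ | ACCEPT |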

Constructing Parsing Tables

- LL(1) languages are those defined by a parsing table for the LL(1) algorithm
- No table entry can be multiply defined
- We want to generate parsing tables from CFG

Constructing Parsing Tables

- If A → α, where in the line of A do we place α?
- In the column of t where t can start a string derived from α
- α ⇒* t β
- We say that t ∈ First(α)
- In the column of t if α is ε and t can follow an A
- S ⇒* β A t δ
- We say t ∈ Follow(A)

Computing First Sets

- Definition First(X) = { t | X ⇒* t α } ∪ { ε | X ⇒* ε }
- Algorithm sketch (see book for details)
- 1. For all terminals t do First(t) ← { t }
- 2. If X → A1 ... Ak
- If a ∈ First(A1), add a to First(X)
- Everything in First(A1) except ε is in First(X)
- If A1 does not derive ε, stop
- If A1 ⇒* ε then we add First(A2), and so on; if every Ai derives ε, add ε to First(X)
- 3. For each production X → ε, add ε to First(X)
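
The algorithm sketch above is a fixed-point computation: keep applying the rules until no First set grows. A minimal version, assuming the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε encoded as a dictionary in which an empty list is an ε body; the encoding and function name are illustrative.

```python
EPS = "ε"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}

def first_sets(grammar):
    """Compute First(X) for every nonterminal X by iterating to a fixed point."""
    first = {nt: set() for nt in grammar}

    def first_of(sym):
        # Step 1: the First set of a terminal is the terminal itself.
        return first[sym] if sym in grammar else {sym}

    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                before = len(first[head])
                for sym in body:
                    # Step 2: add First(Ai) minus ε, stop unless Ai can vanish.
                    first[head] |= first_of(sym) - {EPS}
                    if EPS not in first_of(sym):
                        break
                else:
                    # Step 3: empty body, or every symbol in it derives ε.
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first

for nt, symbols in first_sets(GRAMMAR).items():
    print(nt, sorted(symbols))
```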

First Sets - Example

- Recall the grammar
- E → T X
- X → + E | ε
- T → ( E ) | int Y
- Y → * T | ε
- First sets
- First( ( ) = { ( }, First( T ) = { int, ( }
- First( ) ) = { ) }, First( E ) = { int, ( }
- First( int ) = { int }, First( X ) = { +, ε }
- First( + ) = { + }, First( Y ) = { *, ε }
- First( * ) = { * }

Computing Follow Sets

- Definition
- Follow(B) = { t | S ⇒* β B t δ }
- If S is the start symbol then $ ∈ Follow(S)
- If A → α B β then First(β) - { ε } is in Follow(B)
- If A → α B, or A → α B β and ε ∈ First(β)
- Follow(A) is in Follow(B)
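
The three Follow rules can be applied repeatedly until nothing changes. The sketch below assumes the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε and takes its First sets as given constants rather than recomputing them; it only computes Follow for nonterminals, which is all the table construction needs.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# First sets of the nonterminals, assumed to be already known.
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}

def first_of_seq(seq):
    """First set of a sequence of symbols (ε if the whole sequence can vanish)."""
    out = set()
    for sym in seq:
        sym_first = FIRST.get(sym, {sym})     # a terminal's First set is itself
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                    # $ ∈ Follow(start symbol)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue              # only nonterminals get Follow sets here
                    rest = first_of_seq(body[i + 1:])
                    new = rest - {EPS}        # First(β) - {ε} is in Follow(B)
                    if EPS in rest:
                        new |= follow[head]   # Follow(A) is in Follow(B)
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow

for nt, symbols in follow_sets(GRAMMAR, "E").items():
    print(nt, sorted(symbols))
```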

Follow Sets. Example

- Recall the grammar
- E → T X
- X → + E | ε
- T → ( E ) | int Y
- Y → * T | ε
- Follow sets
- Follow( + ) = { int, ( }, Follow( E ) = { ), $ }
- Follow( ( ) = { int, ( }, Follow( X ) = { ), $ }
- Follow( * ) = { int, ( }, Follow( T ) = { +, ), $ }
- Follow( ) ) = { +, ), $ }, Follow( Y ) = { +, ), $ }
- Follow( int ) = { *, +, ), $ }

Constructing LL(1) Parsing Tables

- Construct a parsing table T for CFG G
- For each production A → α in G do
- For each terminal t ∈ First(α) do
- T[A, t] = α
- If ε ∈ First(α), for each t ∈ Follow(A) do
- T[A, t] = α
- If ε ∈ First(α) and $ ∈ Follow(A) do
- T[A, $] = α
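
The construction can be sketched directly from these rules. The code below assumes the left-factored grammar E → T X, X → + E | ε, T → ( E ) | int Y, Y → * T | ε and treats its First and Follow sets as given constants; because $ is stored inside the Follow sets, the last two rules collapse into one step. A doubly defined entry is reported as an error, since that means the grammar is not LL(1).

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E": [["T", "X"]],
    "X": [["+", "E"], []],
    "T": [["(", "E", ")"], ["int", "Y"]],
    "Y": [["*", "T"], []],
}
# First and Follow sets of the nonterminals, assumed to be already known.
FIRST = {"E": {"int", "("}, "X": {"+", EPS}, "T": {"int", "("}, "Y": {"*", EPS}}
FOLLOW = {
    "E": {")", END},
    "X": {")", END},
    "T": {"+", ")", END},
    "Y": {"+", ")", END},
}

def first_of_seq(seq):
    """First set of a production body (contains ε if the body can vanish)."""
    out = set()
    for sym in seq:
        sym_first = FIRST.get(sym, {sym})
        out |= sym_first - {EPS}
        if EPS not in sym_first:
            return out
    out.add(EPS)
    return out

def build_table(grammar):
    table = {}
    for head, bodies in grammar.items():
        for body in bodies:
            first = first_of_seq(body)
            lookaheads = first - {EPS}            # every t ∈ First(α)
            if EPS in first:
                lookaheads |= FOLLOW[head]        # every t ∈ Follow(A), including $
            for t in lookaheads:
                if (head, t) in table:
                    raise ValueError(f"not LL(1): conflict at [{head}, {t}]")
                table[(head, t)] = body
    return table

for (nt, tok), body in sorted(build_table(GRAMMAR).items()):
    print(f"T[{nt}, {tok}] = {' '.join(body) or EPS}")
```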

Constructing LL(1) Parsing Tables

- Grammar
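- E → T X
- X → + E | ε
- T → ( E ) | int Y
- Y → * T | ε

Follow sets: Follow( X ) = { ), $ }, Follow( E ) = { ), $ }, Follow( T ) = { +, ), $ }, Follow( Y ) = { +, ), $ }

First sets: First( T ) = { int, ( }, First( E ) = { int, ( }, First( X ) = { +, ε }, First( Y ) = { *, ε }

|   | int | * | + | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | T X |   |   | T X |   |   |
| X |   |   | + E |   | ε | ε |
| T | int Y |   |   | ( E ) |   |   |
| Y |   | * T | ε |   | ε | ε |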

Predictive Parsing for Dangling Else Grammar

- Dangling else grammar
- S → i E t S | i E t S e S | a
- E → b
- Left factoring
- S → i E t S S' | a
- S' → e S | ε
- E → b

Predictive Parsing for Dangling Else Grammar

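- S → i E t S S' | a
- S' → e S | ε
- E → b

First(S) = { i, a }, First(E) = { b }, First(S') = { e, ε }

Follow(S) = { e, $ }, Follow(S') = { e, $ }, Follow(E) = { t }

|   | a | b | e | i | t | $ |
|---|---|---|---|---|---|---|
| S | S → a |   |   | S → i E t S S' |   |   |
| S' |   |   | S' → e S, S' → ε |   |   | S' → ε |
| E |   | E → b |   |   |   |   |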

Error Handling in Syntax Analysis

- Goals
- Report the presence of errors clearly and accurately
- Recover from each error quickly
- To detect subsequent errors
- Add minimal overhead to the processing of correct programs

Error Recovery Strategies

- Panic-Mode Recovery
- Discards input symbols one at a time until a synchronizing token is found
- Synchronizing tokens: Follow set, keywords, etc.
- Phrase-Level Recovery
- Perform local correction on the remaining input
- Replace a comma by a semicolon, delete an extraneous semicolon
- For the empty cells of the parsing table, implement error-correcting routines

Error Recovery Strategies

- Error Productions
- Augment the grammar for erroneous inputs
- Global Correction
- Make as few changes as possible in processing an incorrect input string
- Read section 4.4.5

Error Recovery

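- If table entry M[A, a] is empty, input symbol a is skipped
- If the entry is synch, then the nonterminal on top of the stack is popped
- If the terminal on top of the stack does not match the input, then the stack top is popped

|   | id | + | * | ( | ) | $ |
|---|---|---|---|---|---|---|
| E | E → TE' |   |   | E → TE' | synch | synch |
| E' |   | E' → +TE' |   |   | E' → ε | E' → ε |
| T | T → FT' | synch |   | T → FT' | synch | synch |
| T' |   | T' → ε | T' → *FT' |   | T' → ε | T' → ε |
| F | F → id | synch | synch | F → (E) | synch | synch |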

Error Recovery Panic Mode

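| Stack | Input | Remark |
|---|---|---|
| E $ | ) id * + id $ | error, skip ) |
| E $ | id * + id $ | id is in FIRST(E) |
| TE' $ | id * + id $ |   |
| FT'E' $ | id * + id $ |   |
| id T'E' $ | id * + id $ |   |
| T'E' $ | * + id $ |   |
| *FT'E' $ | * + id $ |   |
| FT'E' $ | + id $ | error, M[F, +] = synch |
| T'E' $ | + id $ | F has been popped |
| E' $ | + id $ |   |
| +TE' $ | + id $ |   |
| TE' $ | id $ |   |
| FT'E' $ | id $ |   |
| id T'E' $ | id $ |   |
| T'E' $ | $ |   |
| E' $ | $ |   |
| $ | $ |   |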

Bottom-up Parsing

- Bottom-up parsing is more general than top-down parsing
- Efficient, although difficult to do by hand
- Builds on ideas similar to top-down parsing
- Bottom-up is the preferred method in practice
- Reading Section 4.5

Bottom-up Parsing

- Bottom-up parsers don't need left-factored grammars
- Hence we can revert to the natural grammar for our example
- E → T + E | T
- T → int * T | int | (E)
- Consider the string int * int + int

Bottom-up Parsing

- Bottom-up parsing reduces a string to the start symbol by inverting productions
- int * int + int (reduce by T → int)
- int * T + int (reduce by T → int * T)
- T + int (reduce by T → int)
- T + T (reduce by E → T)
- T + E (reduce by E → T + E)
- E

Observation

- Read productions from bottom-up parse in reverse (i.e., from bottom to top)
- This is a rightmost derivation!
- int * int + int (reduce by T → int)
- int * T + int (reduce by T → int * T)
- T + int (reduce by T → int)
- T + T (reduce by E → T)
- T + E (reduce by E → T + E)
- E

Trivial Bottom-Up Parsing Algorithm

- Let I = input string
- repeat
- pick a non-empty substring β of I
- where X → β is a production
- if no such β, backtrack
- replace one β by X in I
- until I = "S" (the start symbol) or
- all possibilities are exhausted
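
The trivial algorithm can be written as a recursive search over every possible reduction. This is a minimal sketch, assuming the natural grammar E → T + E | T and T → int * T | int | (E); the `seen` set is an addition of this sketch that avoids exploring the same sentential form twice.

```python
PRODUCTIONS = [
    ("E", ["T", "+", "E"]),
    ("E", ["T"]),
    ("T", ["int", "*", "T"]),
    ("T", ["int"]),
    ("T", ["(", "E", ")"]),
]

def reduces_to_start(form, start="E", seen=None):
    """Return True if `form` can be reduced to [start] by inverting productions."""
    if seen is None:
        seen = set()
    key = tuple(form)
    if key in seen:
        return False                      # this form was already explored
    seen.add(key)
    if form == [start]:
        return True
    for head, body in PRODUCTIONS:
        n = len(body)
        for i in range(len(form) - n + 1):
            if form[i:i + n] == body:     # found a substring β with X -> β
                reduced = form[:i] + [head] + form[i + n:]
                if reduces_to_start(reduced, start, seen):
                    return True
                # otherwise fall through: backtrack and try another choice
    return False

print(reduces_to_start(["int", "*", "int", "+", "int"]))
```

The search is exponential in the worst case, which is why practical bottom-up parsers decide where to reduce with a table instead of trying every substring.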