Scanner - PowerPoint PPT Presentation

About This Presentation
Title:

Scanner

Description:

... of regular sentences a finite automaton, also called a state machine, is ... Finite automata = mechanisms to generate tokens from input stream. Just Use ... – PowerPoint PPT presentation

Slides: 56
Provided by: ccck

Transcript and Presenter's Notes

Title: Scanner


1
Scanner
2
Grammar
3
Language
4
Recursive Definition
5
Mathematical Expression
6
Structure of Expressions
7
Formal Language
8
Backus Naur Form (BNF)
1960 by J. Backus and P. Naur
9
EBNF (Extended BNF)
10
BNF → EBNF
BNF
EBNF
11
Formalism (Formal notation)
  • N. Chomsky

12
Differing structural trees for the same expression
13
Problem of Different structural trees
14
No Ambiguous Sentence
15
Context Free Language
  • Syntactic equations of the form defined in EBNF
    generate context-free languages.
  • The term "context-free" is due to Chomsky and
    stems from the fact that substitution of the
    symbol left of = by a sequence derived from the
    expression to the right of = is always permitted,
    regardless of the context in which the symbol is
    embedded within the sentence.
  • It has turned out that this restriction to
    context freedom (in the sense of Chomsky) is
    quite acceptable for programming languages, and
    that it is even desirable.
  • Context dependence in another sense, however, is
    indispensable. We will return to this topic in
    Chapter 8.

16
Regular Expression
  • A language is regular, if its syntax can be
    expressed by a single EBNF expression.
  • The requirement that a single equation suffices
    also implies that only terminal symbols occur in
    the expression.
  • Such an expression is called a regular
    expression.

17
Syntax Analysis vs. Regular Expression
  • The reason for our interest in regular languages
    lies in the fact that programs for the
    recognition of regular sentences are particularly
    simple and efficient. By "recognition" we mean
    the determination of the structure of the
    sentence, and thereby naturally the determination
    of whether the sentence is well formed, that is,
    it belongs to the language. Sentence recognition
    is called syntax analysis.

18
Regular Expression vs. State Machine
  • For the recognition of regular sentences a finite
    automaton, also called a state machine, is
    necessary and sufficient. In each step the state
    machine reads the next symbol and changes state.
    The resulting state is solely determined by the
    previous state and the symbol read. If the
    resulting state is unique, the state machine is
    deterministic, otherwise nondeterministic. If the
    state machine is formulated as a program, the
    state is represented by the current point of
    program execution.
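
A minimal sketch of such a deterministic state machine, assuming a toy regular language of identifiers (a letter followed by letters or digits). The state names, the `classify` helper and the `TRANSITIONS` table are illustrative assumptions, not taken from the slides.

```python
# Hypothetical DFA for identifiers: letter {letter | digit}.
START, IN_IDENT, REJECT = "start", "in_ident", "reject"

def classify(ch: str) -> str:
    """Map a character to the input class the automaton distinguishes."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# The next state depends only on the current state and the symbol read.
TRANSITIONS = {
    (START, "letter"): IN_IDENT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
}

def accepts(text: str) -> bool:
    """Return True if text is a well-formed identifier."""
    state = START
    for ch in text:
        # Any (state, class) pair missing from the table leads to REJECT.
        state = TRANSITIONS.get((state, classify(ch)), REJECT)
        if state == REJECT:
            return False
    return state == IN_IDENT
```

Because every (state, symbol) pair has at most one successor, this machine is deterministic in the sense described above.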

19
EBNF → Program
  • The analyzing program can be derived directly
    from the defining syntax in EBNF. For each EBNF
    construct K there exists a translation rule which
    yields a program fragment Pr(K). The translation
    rules from EBNF to program text are shown below.
    Therein sym denotes a global variable always
    representing the symbol last read from the source
    text by a call to procedure next. Procedure error
    terminates program execution, signaling that the
    symbol sequence read so far does not belong to
    the language.
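
A sketch of what such translation rules might look like when written in Python rather than the course's own notation. It assumes the conventional mapping (a terminal becomes a test followed by an advance, a sequence becomes consecutive fragments, `{...}` becomes a loop, `[...]` becomes a conditional) and an invented one-rule syntax `"x" {"," "x"} [";"]`. The names `Recognizer`, `next_sym`, `expect` and `recognizes` are hypothetical; `sym` and `error` mirror the roles described above.

```python
class Recognizer:
    """Recognizes sentences of the assumed rule: "x" {"," "x"} [";"]."""

    END = ""  # sentinel used once the input is exhausted

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0
        self.sym = self.END
        self.next_sym()  # load the first symbol

    def next_sym(self) -> None:
        """Advance sym to the next character of the source text."""
        if self.pos < len(self.text):
            self.sym = self.text[self.pos]
            self.pos += 1
        else:
            self.sym = self.END

    def error(self) -> None:
        """Stop analysis: the symbols read so far are not in the language."""
        raise ValueError(f"unexpected symbol {self.sym!r}")

    def expect(self, terminal: str) -> None:
        # Terminal "x": if sym equals "x" then advance, else report an error.
        if self.sym == terminal:
            self.next_sym()
        else:
            self.error()

    def parse(self) -> None:
        self.expect("x")            # sequence: fragments one after another
        while self.sym == ",":      # {exp}: loop while sym can start exp
            self.expect(",")
            self.expect("x")
        if self.sym == ";":         # [exp]: branch if sym can start exp
            self.expect(";")
        if self.sym != self.END:    # nothing may follow the sentence
            self.error()

def recognizes(text: str) -> bool:
    """Return True if text belongs to the assumed language."""
    try:
        Recognizer(text).parse()
        return True
    except ValueError:
        return False
```

Here the state of the recognizer is simply the current point of execution inside `parse`, so no explicit state variable is needed.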

20
Analyzing program
21
EBNF with only 1 rule
22
First()
23
Precondition
24
Lexical Analysis for Identifier
25
Lexical Analysis for Integer
26
Scanner
  • The process of syntax analysis is based on a
    procedure to obtain the next symbol. This
    procedure in turn is based on the definition of
    symbols in terms of sequences of one or more
    characters. This latter procedure is called a
    scanner, and syntax analysis on this second,
    lower level is called lexical analysis.
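
As an illustration, a scanner can be sketched as a generator that groups characters into symbols. The token kinds (`ident`, `integer`, `symbol`) and the function name `scan` are assumptions chosen for this example, not the symbol set used later in the slides.

```python
from typing import Iterator, Tuple

Token = Tuple[str, str]  # (kind, text); the kind names are illustrative

def scan(source: str) -> Iterator[Token]:
    """Yield the symbols of source one at a time, skipping white space."""
    i = 0
    n = len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha():
            # identifier: a letter followed by letters or digits
            start = i
            while i < n and source[i].isalnum():
                i += 1
            yield ("ident", source[start:i])
        elif ch.isdigit():
            # integer: one or more digits
            start = i
            while i < n and source[i].isdigit():
                i += 1
            yield ("integer", source[start:i])
        else:
            # any other character is treated as a one-character symbol
            yield ("symbol", ch)
            i += 1
```

A parser would pull symbols from `scan` one at a time instead of looking at individual characters.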

27
Lexical Analysis vs. Syntax Analysis
28
A Scanner Example
  • As an example we show a scanner for a parser of
    EBNF. Its terminal symbols and their definition
    in terms of characters are

29
Procedure GetSym() (1)
30
Procedure GetSym() (2)
31
Procedure GetSym() (3)
32
Syntax Analysis Overview
  • Goal: determine if the input token stream
    satisfies the syntax of the program
  • What do we need to do this?
  • An expressive way to describe the syntax
  • A mechanism that determines if the input token
    stream satisfies the syntax description
  • For lexical analysis:
  • Regular expressions describe tokens
  • Finite automata = mechanisms to generate tokens
    from input stream

33
Just Use Regular Expressions?
  • REs can expressively describe tokens
  • Easy to implement via DFAs
  • So just use them to describe the syntax of a
    programming language?
  • No! They don't have enough power to express any
    non-trivial syntax
  • Example: nested constructs (blocks, expressions,
    statements), e.g. detecting balanced braces
  • We need unbounded counting!
  • FSAs cannot count except in a strictly modulo
    fashion
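
A small sketch of why counting is the crux: checking balanced braces takes a counter that can grow without bound, which a machine with a fixed number of states cannot hold. The function name `braces_balanced` is an assumption made for this example.

```python
def braces_balanced(text: str) -> bool:
    """Return True if every '{' in text is closed by a later '}'."""
    depth = 0  # unbounded counter: this is what a finite automaton lacks
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # a closing brace with no matching opener
                return False
    return depth == 0
```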





34
Context-Free Grammars
  • Consist of 4 components
  • Terminal symbols: tokens or ε
  • Non-terminal symbols: syntactic variables
  • Start symbol S: special non-terminal
  • Productions of the form LHS → RHS
  • LHS: single non-terminal
  • RHS: string of terminals and non-terminals
  • Specify how non-terminals may be expanded
  • Language generated by a grammar is the set of
    strings of terminals derived from the start
    symbol by repeatedly applying the productions
  • L(G): language generated by grammar G

S → a S a
S → T
T → b T b
T → ε
35
CFG - Example
  • Grammar for balanced-parentheses language
  • S → ( S ) S
  • S → ε
  • 1 non-terminal: S
  • 2 terminals: (, )
  • Start symbol: S
  • 2 productions
  • If grammar accepts a string, there is a
    derivation of that string using the productions
  • Example: (())
  • S ⇒ (S) ε ⇒ ((S) S) ε ⇒ ((ε) ε) ε ⇒ (())

Why is the final S required?
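
One way to turn the balanced-parentheses grammar S → ( S ) S | ε into a recognizer is a recursive function with one branch per production. This is only a sketch; the function names `balanced` and `parse_s` are assumptions.

```python
def balanced(text: str) -> bool:
    """Recognize the language of S → ( S ) S | ε."""
    pos = 0

    def parse_s() -> bool:
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1                      # consume "("
            if not parse_s():             # inner S
                return False
            if pos >= len(text) or text[pos] != ")":
                return False
            pos += 1                      # consume ")"
            return parse_s()              # trailing S
        return True                       # S → ε: consume nothing

    # Accept only if S matched and the whole input was consumed.
    return parse_s() and pos == len(text)
```

The trailing call to `parse_s` corresponds to the final S in the first production; without it, a sequence of groups such as `()()` could not be derived.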
36
More on CFGs
  • Shorthand notation: vertical bar for multiple
    productions
  • S → a S a | T
  • T → b T b | ε
  • CFGs are powerful enough to express the syntax
    of most programming languages
  • Derivation: successive application of
    productions starting from S
  • Acceptance? Determine if there is a derivation
    for an input token stream

37
A Parser
[Diagram: the parser takes a context-free grammar G
and a token stream s (from the lexer) as inputs; it
answers "yes" if s is in L(G) and "no" otherwise,
and emits error messages]
Syntax analyzers (parsers): CFG acceptors which
also output the corresponding derivation when the
token stream is accepted
Various kinds: LL(k), LR(k), SLR, LALR
38
RE is a Subset of CFG
Can inductively build a grammar for each RE:
ε:        S → ε
a:        S → a
R1 R2:    S → S1 S2
R1 | R2:  S → S1 | S2
R1*:      S → S1 S | ε
where G1 is the grammar for R1, with start symbol S1,
and G2 is the grammar for R2, with start symbol S2
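
The inductive construction (ε, a single symbol, concatenation, alternation and repetition each getting their own production pattern) can be sketched as a recursive function. The nested-tuple encoding of regular expressions, the generated non-terminal names `S1`, `S2`, ... and the function name `re_to_cfg` are assumptions made for this example.

```python
from itertools import count

def re_to_cfg(regex):
    """Build productions for a regex given as nested tuples.

    Node forms (assumed encoding): ("eps",), ("sym", a),
    ("cat", r1, r2), ("alt", r1, r2), ("star", r1).
    Returns (start_symbol, productions); each production is
    (lhs, rhs) with rhs a tuple of symbols, and () standing for ε.
    """
    fresh = count(1)
    productions = []

    def build(node):
        s = f"S{next(fresh)}"  # fresh start symbol for this sub-expression
        kind = node[0]
        if kind == "eps":
            productions.append((s, ()))              # S → ε
        elif kind == "sym":
            productions.append((s, (node[1],)))      # S → a
        elif kind == "cat":
            s1, s2 = build(node[1]), build(node[2])
            productions.append((s, (s1, s2)))        # S → S1 S2
        elif kind == "alt":
            s1, s2 = build(node[1]), build(node[2])
            productions.append((s, (s1,)))           # S → S1
            productions.append((s, (s2,)))           # S → S2
        elif kind == "star":
            s1 = build(node[1])
            productions.append((s, (s1, s)))         # S → S1 S
            productions.append((s, ()))              # S → ε
        else:
            raise ValueError(f"unknown node kind: {kind}")
        return s

    start = build(regex)
    return start, productions
```

For `a*`, encoded as `("star", ("sym", "a"))`, this yields the productions S1 → S2 S1, S1 → ε and S2 → a. The sketch assumes terminal symbols never collide with the generated non-terminal names.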
39
Grammar for Sum Expression
  • Grammar
  • S → E + S | E
  • E → number | ( S )
  • Expanded
  • S → E + S
  • S → E
  • E → number
  • E → ( S )

4 productions; 2 non-terminals (S, E); 4 terminals:
"(", ")", "+", number; start symbol S
40
Constructing a Derivation
  • Start from S (the start symbol)
  • Use productions to derive a sequence of tokens
  • For arbitrary strings α, β, γ and for a
    production A → β
  • A single step of the derivation is
  • α A γ ⇒ α β γ (substitute β for A)
  • Example
  • S → E + S
  • (S + E) + E ⇒ (E + S + E) + E

41
Class Problem
  • S → E + S | E
  • E → number | ( S )
  • Derive (1 + 2 + (3 + 4)) + 5

42
Parse Tree
  • Parse tree: tree representation of the
    derivation
  • Leaves of the tree are terminals
  • Internal nodes are non-terminals
  • No information about the order of the
    derivation steps

[Figure: parse tree for (1 + 2 + (3 + 4)) + 5, with root S]
43
Parse Tree vs Abstract Syntax Tree
Parse tree: also called concrete syntax
AST discards (abstracts) unneeded information;
more compact format
[Figure: parse tree and abstract syntax tree for
(1 + 2 + (3 + 4)) + 5, side by side]
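
A sketch of a parser for the sum grammar S → E + S | E, E → number | ( S ) that returns an abstract syntax tree directly. Representing AST nodes as tuples `("+", left, right)` and numbers as plain integers is an assumption made for this example, as are the names `tokenize` and `parse_sum`.

```python
import re

def tokenize(text):
    """Split text into numbers, parentheses and '+'.

    Simplification: any other character is silently skipped.
    """
    return re.findall(r"\d+|[()+]", text)

def parse_sum(text):
    """Parse text with S → E + S | E, E → number | ( S ) into an AST."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_s():
        nonlocal pos
        left = parse_e()
        if peek() == "+":
            pos += 1
            # Right recursion: the rest of the sum becomes the right child.
            return ("+", left, parse_s())
        return left

    def parse_e():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of input")
        if tok == "(":
            pos += 1
            inner = parse_s()
            if peek() != ")":
                raise ValueError("expected ')'")
            pos += 1
            return inner  # the parentheses themselves are not kept in the AST
        if tok.isdigit():
            pos += 1
            return int(tok)
        raise ValueError(f"unexpected token {tok!r}")

    tree = parse_s()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree
```

The result contains no nodes for S, E or the parentheses, only operators and operands; that is the information an AST keeps from the concrete parse tree.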
44
Derivation Order
  • Can choose to apply productions in any order:
    select a non-terminal and substitute the RHS of
    a production
  • Two standard orders: leftmost and rightmost
  • Leftmost derivation
  • In the string, find the leftmost non-terminal and
    apply a production to it
  • E + S ⇒ 1 + S
  • Rightmost derivation
  • Same, but find the rightmost non-terminal
  • E + S ⇒ E + E + S
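
A single derivation step can be sketched as a function that finds the leftmost or rightmost non-terminal in a sentential form and replaces it with the right-hand side of one of its productions. The `GRAMMAR` dictionary (the sum grammar S → E + S | E, E → number | ( S ), with the token `number` standing for any numeral) and the function name `step` are assumptions made for this example.

```python
GRAMMAR = {
    "S": [["E", "+", "S"], ["E"]],
    "E": [["number"], ["(", "S", ")"]],
}

def step(form, rhs, leftmost=True):
    """Apply one derivation step to a sentential form (a list of symbols).

    The leftmost (or rightmost) non-terminal is replaced by rhs, which
    must be the right-hand side of one of its productions.
    """
    positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
    if not positions:
        raise ValueError("no non-terminal left to expand")
    i = positions[0] if leftmost else positions[-1]
    if rhs not in GRAMMAR[form[i]]:
        raise ValueError(f"{form[i]} has no production with RHS {rhs}")
    return form[:i] + rhs + form[i + 1:]
```

Starting from `["E", "+", "S"]`, a leftmost step with `["number"]` expands E, while a rightmost step with `["E", "+", "S"]` expands the trailing S.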

45
Leftmost/Rightmost Derivation Examples
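  • S → E + S | E
  • E → number | ( S )
  • Leftmost derive (1 + 2 + (3 + 4)) + 5

S ⇒ E + S ⇒ (S) + S ⇒ (E + S) + S ⇒ (1 + S) + S
⇒ (1 + E + S) + S ⇒ (1 + 2 + S) + S ⇒ (1 + 2 + E) + S
⇒ (1 + 2 + (S)) + S ⇒ (1 + 2 + (E + S)) + S
⇒ (1 + 2 + (3 + S)) + S ⇒ (1 + 2 + (3 + E)) + S
⇒ (1 + 2 + (3 + 4)) + S ⇒ (1 + 2 + (3 + 4)) + E
⇒ (1 + 2 + (3 + 4)) + 5
  • Now, rightmost derive the same input string

S ⇒ E + S ⇒ E + E ⇒ E + 5 ⇒ (S) + 5 ⇒ (E + S) + 5
⇒ (E + E + S) + 5 ⇒ (E + E + E) + 5 ⇒ (E + E + (S)) + 5
⇒ (E + E + (E + S)) + 5 ⇒ (E + E + (E + E)) + 5
⇒ (E + E + (E + 4)) + 5 ⇒ (E + E + (3 + 4)) + 5
⇒ (E + 2 + (3 + 4)) + 5 ⇒ (1 + 2 + (3 + 4)) + 5
Result: same parse tree; the same productions are
chosen, but in a different order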
46
Class Problem
  • S → E + S | E
  • E → number | ( S ) | -S
  • Do the rightmost derivation of
    1 + (2 + -(3 + 4)) + 5

47
Ambiguous Grammars
  • In the sum expression grammar, leftmost and
    rightmost derivations produced identical parse
    trees
  • The + operator associates to the right in the
    parse tree regardless of derivation order

[Figure: tree for (1 + 2 + (3 + 4)) + 5]
48
An Ambiguous Grammar
  • + associates to the right because of the
    right-recursive production S → E + S
  • Consider another grammar
  • S → S + S | S * S | number
  • Ambiguous grammar: different derivations produce
    different parse trees
  • More specifically, G is ambiguous if there are 2
    distinct leftmost (rightmost) derivations for
    some sentence

49
Ambiguous Grammar - Example
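S → S + S | S * S | number
Consider the expression 1 + 2 * 3
Derivation 1: S ⇒ S + S ⇒ 1 + S ⇒ 1 + S * S
⇒ 1 + 2 * S ⇒ 1 + 2 * 3
Derivation 2: S ⇒ S * S ⇒ S + S * S ⇒ 1 + S * S
⇒ 1 + 2 * S ⇒ 1 + 2 * 3
[Figure: the two resulting parse trees, one rooted
at + and one rooted at *]
Obviously not equal!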
50
Impact of Ambiguity
  • Different parse trees correspond to different
    evaluations!
  • Thus, program meaning is not defined!!

[Figure: the two parse trees for 1 + 2 * 3; one
evaluates to 9, the other to 7]
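
To make the effect concrete, the following sketch enumerates every parse tree that the ambiguous grammar S → S + S | S * S | number allows for a token list and evaluates each one. The function name `all_values` is an assumption, and the input is assumed to be a well-formed alternation of numbers and operators.

```python
def all_values(tokens):
    """Return the value of every parse tree of tokens under
    S → S + S | S * S | number."""
    if len(tokens) == 1:
        return [int(tokens[0])]          # S → number
    results = []
    # Operators sit at the odd positions; try each one as the tree root.
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        for left in all_values(tokens[:i]):
            for right in all_values(tokens[i + 1:]):
                results.append(left + right if op == "+" else left * right)
    return results
```

For the tokens `1 + 2 * 3` this produces two values, 7 for the tree 1 + (2 * 3) and 9 for the tree (1 + 2) * 3, one per parse tree.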
51
Can We Get Rid of Ambiguity?
  • Ambiguity is a function of the grammar, not the
    language!
  • A context-free language L is inherently ambiguous
    if all grammars for L are ambiguous
  • Every deterministic CFL has an unambiguous
    grammar
  • So, no deterministic CFL is inherently ambiguous
  • No inherently ambiguous programming languages
    have been invented
  • To construct a useful parser, must devise an
    unambiguous grammar

52
Eliminating Ambiguity
  • Often can eliminate ambiguity by adding
    nonterminals and allowing recursion only on right
    or left
  • S → S + T | T
  • T → T * num | num
  • T non-terminal enforces precedence
  • Left recursion: left associativity

[Figure: parse tree for 1 + 2 * 3 under this grammar;
the subtree for 2 * 3 hangs below the + node]
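
A sketch of a parser for the unambiguous grammar S → S + T | T, T → T * num | num. A plain recursive function cannot call itself first on a left-recursive rule without looping forever, so this example swaps in a common substitute: each left-recursive rule is unrolled into a loop that keeps extending the tree to the left. The tuple tree representation and the names `parse_expr`, `parse_s`, `parse_t` and `number` are assumptions.

```python
def parse_expr(tokens):
    """Parse a token list with S → S + T | T and T → T * num | num."""
    pos = 0

    def number():
        nonlocal pos
        if pos >= len(tokens) or not tokens[pos].isdigit():
            raise ValueError("expected a number")
        pos += 1
        return int(tokens[pos - 1])

    def parse_t():
        # T → T * num | num, with the left recursion unrolled into a loop
        nonlocal pos
        node = number()
        while pos < len(tokens) and tokens[pos] == "*":
            pos += 1
            node = ("*", node, number())   # earlier operands nest deeper
        return node

    def parse_s():
        # S → S + T | T, unrolled the same way
        nonlocal pos
        node = parse_t()
        while pos < len(tokens) and tokens[pos] == "+":
            pos += 1
            node = ("+", node, parse_t())
        return node

    tree = parse_s()
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree
```

Because `*` is only reachable through T, it binds tighter than `+`, and because each loop grows the tree on its left side, both operators come out left-associative.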
53
A Closer Look at Eliminating Ambiguity
  • Precedence enforced by
  • Introduce distinct non-terminals for each
    precedence level
  • Operators for a given precedence level are
    specified as RHS for the production
  • Higher precedence operators are accessed by
    referencing the next-higher precedence
    non-terminal

54
Associativity
  • An operator is either left, right or non
    associative
  • Left: a + b + c = (a + b) + c
  • Right: a ^ b ^ c = a ^ (b ^ c)
  • Non: a < b < c is illegal (thus undefined)
  • Position of the recursion relative to the
    operator dictates the associativity
  • Left (right) recursion → left (right)
    associativity
  • Non: don't be recursive, simply reference the
    next-higher precedence non-terminal on both
    sides of the operator

55
Class Problem (Tough)
S → S + S | S - S | S * S | S / S | (S) | -S | S ^ S
| number
Enforce the standard arithmetic precedence rules
and remove all ambiguity from the above grammar.
Precedence (high to low): (), unary -, ^, * /, + -
Associativity: ^ is right-associative; the rest are left