Syntax and Grammar - PowerPoint PPT Presentation

About This Presentation
Title:

Syntax and Grammar

Description:

Rule-based formalism to specify a language syntax. Context-Free Grammar (CFG): a kind of grammar. Not as complex as context-sensitive and phrase-structure grammar ... – PowerPoint PPT presentation

Slides: 54
Provided by: quanth

Transcript and Presenter's Notes

Title: Syntax and Grammar


1
Syntax and Grammar
  • Syntax (programming language sense)
  • Defines the structure of a program
  • Does not reflect the meaning (semantics) of the program
  • Grammar
  • Rule-based formalism to specify a language syntax

2
Context-Free Grammar (CFG)
  • A kind of grammar
  • Not as complex as context-sensitive and
    phrase-structure grammar
  • More powerful than regular grammar

3
Formal Definition of CFG
  • G = (V_N, V_T, S, P)
  • V_N: finite set of nonterminal symbols
  • V_T: finite set of tokens (V_T ∩ V_N = ∅)
  • S ∈ V_N: start symbol
  • P: finite set of rules (or productions) in BNF
    (Backus-Naur Form), of the form A → α where A ∈ V_N,
    α ∈ (V_T ∪ V_N)*

4
Example 1
  • G = ({exp, op}, {+, -, *, /, id}, exp, P), where P is
  • exp → exp op exp
  • exp → id
  • op → + | - | * | /
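
The four components of the definition can be written down directly as data. The following is a minimal sketch, assuming the Example 1 grammar has the operator tokens +, -, *, / and the token id; the class name `Grammar` and its `check` method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # V_N
    terminals: frozenset     # V_T
    start: str               # S
    productions: tuple       # P, as (left-hand side, right-hand side tuple) pairs

    def check(self):
        """Return True, or raise ValueError if the definition is violated."""
        if self.nonterminals & self.terminals:
            raise ValueError("V_N and V_T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for symbol in rhs:
                if symbol not in symbols:
                    raise ValueError(f"unknown symbol {symbol!r} in a rule for {lhs!r}")
        return True


# Example 1, with the operator tokens assumed to be + - * /
EXAMPLE_1 = Grammar(
    nonterminals=frozenset({"exp", "op"}),
    terminals=frozenset({"+", "-", "*", "/", "id"}),
    start="exp",
    productions=(
        ("exp", ("exp", "op", "exp")),
        ("exp", ("id",)),
        ("op", ("+",)), ("op", ("-",)), ("op", ("*",)), ("op", ("/",)),
    ),
)
print(EXAMPLE_1.check())
```

Each right-hand side is a tuple of symbols, so the empty tuple would stand for an empty right-hand side.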

5
Derivation
  • α = uXv derives β = uγv if X → γ is a
    production
  • Notation: α ⇒ β (directly derives)
  • α ⇒* β (α ⇒ ... ⇒ β, in zero or more steps)
  • α ⇒+ β (in one or more steps)
  • Derivation: S ⇒* α where α consists of tokens
    only
  • Sentential form: if S ⇒* α, then α is a sentential form
  • Sentence: if S ⇒* α is a derivation, then α is a
    sentence
  • Language: the set of all sentences that can be derived
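
A single derivation step replaces one nonterminal X in uXv by the right-hand side of one of its productions. The sketch below illustrates this on the small exp/op grammar, assuming the operators are +, -, *, /; the names `PRODUCTIONS`, `derive_step` and `is_sentence` are hypothetical.

```python
# Productions of the small expression grammar (operator set assumed)
PRODUCTIONS = {
    "exp": [["exp", "op", "exp"], ["id"]],
    "op": [["+"], ["-"], ["*"], ["/"]],
}


def derive_step(form, position, rhs):
    """Rewrite the nonterminal at `position` in `form` using right-hand side `rhs`."""
    symbol = form[position]
    if rhs not in PRODUCTIONS.get(symbol, []):
        raise ValueError(f"{symbol} -> {' '.join(rhs)} is not a production")
    return form[:position] + rhs + form[position + 1:]


def is_sentence(form):
    """A sentential form is a sentence when it contains tokens only."""
    return all(symbol not in PRODUCTIONS for symbol in form)


form = ["exp"]
history = [form]
# Always rewrite the leftmost nonterminal
for position, rhs in [(0, ["exp", "op", "exp"]), (0, ["id"]), (1, ["+"]), (2, ["id"])]:
    form = derive_step(form, position, rhs)
    history.append(form)

print(" => ".join(" ".join(f) for f in history))
```

Every intermediate list in `history` is a sentential form; only the last one is a sentence.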

6
Example 2
  • exp ⇒ exp op exp ⇒ exp op id ⇒ id op id ⇒ id + id
  • exp ⇒ exp op exp ⇒ id op exp ⇒ id + exp ⇒ id + id
  • exp ⇒ exp op exp ⇒ exp op exp op exp ⇒ id op exp
    op exp ⇒ id + exp op exp ⇒ id + exp * exp ⇒ id +
    id * exp ⇒ id + id * id

7
Example 3
  • exp ⇒ exp op exp ⇒ id op exp ⇒ id + exp ⇒ id + id
    (leftmost derivation)
  • exp ⇒ exp op exp ⇒ exp op id ⇒ exp + id ⇒ id + id
    (rightmost derivation)

8
Example 4
  • exp

[Parse tree: a single root node, exp]
9
Example 4
  • exp ⇒ exp op exp

[Parse tree: root exp with children exp, op, exp]
10
Example 4
  • exp ⇒ exp op exp ⇒ id op exp

[Parse tree: root exp with children exp, op, exp; the left exp has the child id]
11
Example 4
  • exp ⇒ exp op exp ⇒ id op exp ⇒ id + exp

[Parse tree: root exp with children exp, op, exp; the left exp has the child id and op has the child +]
12
Example 4
  • exp ⇒ exp op exp ⇒ id op exp ⇒ id + exp ⇒ id + id

[Parse tree: root exp with children exp, op, exp, whose children are id, + and id respectively]
13
Classic Expression Grammar
  • exp → exp + term | exp - term | term
  • term → term * factor | term / factor | factor
  • factor → ( exp ) | ID | INT
  • Why is this classic expression grammar better
    than the previously used one?
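
One way to see what the layered exp/term/factor rules buy is to parse with them and look at the grouping. The sketch below is a hand-written recursive-descent parser, not a method from the slides: it assumes the operators are + - * /, treats any other token as an ID or INT, and replaces the left-recursive rules with loops that build the same left-nested trees. The function name `parse_expression` is hypothetical.

```python
def parse_expression(tokens):
    """Parse a token list with the classic grammar; return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def exp():
        # exp -> exp + term | exp - term | term, written as a loop
        node = term()
        while peek() in ("+", "-"):
            operator = advance()
            node = (operator, node, term())
        return node

    def term():
        # term -> term * factor | term / factor | factor, written as a loop
        node = factor()
        while peek() in ("*", "/"):
            operator = advance()
            node = (operator, node, factor())
        return node

    def factor():
        # factor -> ( exp ) | ID | INT
        token = peek()
        if token == "(":
            advance()
            node = exp()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token is None or token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return advance()

    tree = exp()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_expression(["id", "+", "id", "*", "id"]))
print(parse_expression(["id", "-", "id", "-", "id"]))
```

In the first tree the multiplication sits below the addition (higher precedence); in the second the subtractions nest on the left (left associativity).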

14
Operator Precedence
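[Figure: a parse tree under exp → exp op exp for an expression with three id operands; one operand of the root is itself an exp op exp subtree]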
15
Operator Precedence
[Figure: another parse tree under exp → exp op exp for an expression with three id operands, grouping the two operators differently]
16
Operator Precedence
[Figure: parse tree under the classic expression grammar for an expression with three id operands, built from exp, term and factor nodes]
17
Operator Precedence
  • (id + id) * id

[Figure: parse tree under the classic expression grammar, in which the parenthesized exp appears below a factor node]
18
Operator Associativity
[Figure: a parse tree under exp → exp op exp for an expression with three id operands and two operators]
19
Operator Associativity
[Figure: another parse tree under exp → exp op exp for an expression with three id operands, grouping the two operators differently]
20
Operator Associativity
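[Figure: parse tree under the classic expression grammar for an expression with three id operands; the exp nodes nest on the left, so the operators group from left to right]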
21
Precedence and Associativity
  • When properly written, a grammar can enforce
    operator precedence and associativity as desired

22
Hands-on Exercise
  • Rewrite the grammar to fulfill the following
    requirements
  • operator takes lower precedence than
  • operator - is right-associative

23
The Big Picture Again
[Diagram: the COMPILER pipeline, from source code through the Scanner, the Parser, the optimization passes Opt1, Opt2, ..., Optn, then Instruction Selection, Register Allocation and Instruction Scheduling, to machine code]
24
Syntactic Analysis
  • Lexical Analysis was about ensuring that we
    extract a set of valid words (i.e.,
    tokens/lexemes) from the source code
  • But nothing says that the words make a coherent
    sentence (i.e., program)

25
Syntactic Analysis
  • Example
  • for while i == == == 12 + for ( abcd)
  • Lexer will produce a stream of tokens
    <TOKEN_FOR> <TOKEN_WHILE> <TOKEN_IDENT, i>
    <TOKEN_COMPARE> <TOKEN_COMPARE> <TOKEN_COMPARE>
    <TOKEN_NUMBER, 12> <TOKEN_OP, +> <TOKEN_FOR>
    <TOKEN_OPAREN> <TOKEN_ID, abcd> <TOKEN_CPAREN>
  • But clearly we do not have a valid program
  • This program is lexically correct, but
    syntactically incorrect
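
The point that a lexer happily accepts a meaningless token sequence can be reproduced with a few regular expressions. This is a minimal sketch using Python's `re` module; the input string, the `==` comparison operator, the `+` operator and the single `TOKEN_IDENT` name for all identifiers are assumptions made for the illustration, and `tokenize` is a hypothetical helper.

```python
import re

# Order matters: keywords are tried before the general identifier pattern
TOKEN_SPEC = [
    ("TOKEN_FOR", r"for\b"),
    ("TOKEN_WHILE", r"while\b"),
    ("TOKEN_COMPARE", r"=="),
    ("TOKEN_NUMBER", r"\d+"),
    ("TOKEN_IDENT", r"[A-Za-z_]\w*"),
    ("TOKEN_OP", r"[+\-*/]"),
    ("TOKEN_OPAREN", r"\("),
    ("TOKEN_CPAREN", r"\)"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (token kind, lexeme) pairs; raise on an illegal character."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, text = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise ValueError(f"illegal character {text!r}")
        tokens.append((kind, text))
    return tokens


for token in tokenize("for while i == == == 12 + for ( abcd)"):
    print(token)
```

The lexer raises no error on this input: every lexeme is a valid word. Rejecting the sequence is the parser's job.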

26
A Grammar for Expressions
  • Expr → Expr Op Expr
  • Expr → Number | Identifier
  • Identifier → Letter | Letter Identifier
  • Letter → a-z
  • Op → + | - | * | /
  • Number → Digit Number | Digit
  • Digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

27
Derivation for grammar
  • Expr ⇒ Expr Op Expr ⇒ Number Op Expr ⇒ Digit
    Number Op Expr ⇒ 3 Number Op Expr ⇒ 34 Op Expr ⇒
    34 * Expr ⇒ 34 * Identifier ⇒ 34 * Letter
    Identifier ⇒ 34 * a Identifier ⇒ 34 * a Letter ⇒
    34 * ax

28
What is Parsing?
  • What we just saw is the process of starting with
    the start symbol and, through a sequence of rule
    derivations, obtaining a string of terminal symbols
  • We could generate all correct programs (an infinite
    set, though)
  • Parsing: the other way around
  • Given a string of terminal symbols, the process of
    discovering a sequence of rule derivations that
    produces this particular string

29
What is parsing
  • When we say we can't parse a string, we mean that
    we can't find any legal way in which the string
    can be obtained from the start symbol through
    derivations
  • What we want to build is a parser: a program that
    takes in a string of tokens (terminal symbols)
    and discovers a derivation sequence, thus
    validating that the input is a syntactically
    correct program

30
Derivations as Trees
  • A convenient and natural way to represent a
    sequence of derivations is a syntactic tree or
    parse tree
  • Example: Expr ⇒ Expr Op Expr ⇒ Number Op Expr ⇒
    Digit Number Op Expr ⇒ 3 Number Op Expr ⇒ 34 Op
    Expr ⇒ 34 * Expr ⇒ 34 * Identifier ⇒ 34 * Letter
    Identifier ⇒ 34 * a Identifier ⇒ 34 * a Letter ⇒
    34 * ax

[Parse tree: the root Expr has children Expr, Op, Expr; the left Expr expands through Number and Digit nodes to 3 4, Op to *, and the right Expr through Identifier and Letter nodes to a x]
31
Derivations as Trees
  • Internally, in the parser, derivations are
    implemented as trees
  • Often, we draw trees without the full derivations
  • Example

[Abbreviated parse tree: the root Expr has children Expr, Op, Expr; the left Expr is the Number 34 and the right Expr is the Identifier ax]
32
Ambiguity
  • We call a grammar ambiguous if a string of
    terminal symbols can be reached by two different
    derivation sequences
  • In other terms, a string can have more than one
    parse tree
  • It turns out that our expression grammar is
    ambiguous!
  • Let's show that the string 3*5+8 has two parse trees
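
The ambiguity can be checked mechanically by enumerating every parse tree of a short string. The sketch below does this for the rule Expr → Expr Op Expr | Number, with each Number already reduced to a single token; the input 3*5+8 and the helper names `parse_trees` and `evaluate` are assumptions made for the illustration.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}


def parse_trees(tokens):
    """Return every parse tree of `tokens` under Expr -> Expr Op Expr | Number."""
    if len(tokens) == 1:
        return [tokens[0]] if tokens[0].isdigit() else []
    trees = []
    # Try each operator token as the Op of the root production
    for i in range(1, len(tokens) - 1):
        if tokens[i] in OPERATORS:
            for left in parse_trees(tokens[:i]):
                for right in parse_trees(tokens[i + 1:]):
                    trees.append((tokens[i], left, right))
    return trees


def evaluate(tree):
    """Compute the value that a given tree shape assigns to the expression."""
    if isinstance(tree, str):
        return int(tree)
    symbol, left, right = tree
    return OPERATORS[symbol](evaluate(left), evaluate(right))


trees = parse_trees(["3", "*", "5", "+", "8"])
for tree in trees:
    print(tree, "=", evaluate(tree))
```

The two trees evaluate to different numbers, which is why the choice of parse tree matters to the later stages of a compiler.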

33
Ambiguity
34
Problems with Ambiguity
  • The problem is that the syntax impacts meaning
    (for the later stages of the compiler)
  • For our example string, we'd like to see the left
    tree because we most likely want * to have a
    higher precedence than +
  • We don't like ambiguity because it makes
    parsers difficult to design: we don't know
    which parse tree will be discovered when there
    are multiple possibilities
  • So we often want to disambiguate grammars

35
Problems with Ambiguity
  • It turns out that it is possible to modify
    grammars to make them non-ambiguous
  • by adding non-terminals
  • by adding/rewriting production rules
  • In the case of our expression grammar, we can
    rewrite the grammar to remove ambiguity and to
    ensure that parse trees match our notion of
    operator precedence
  • We get two benefits for the price of one
  • Would work for many operators and many precedence
    relations

36
Non-Ambiguous Grammar
  • Expr → Term | Expr + Term | Expr - Term
  • Term → Term * Factor | Term / Factor | Factor
  • Factor → Number | Identifier
  • Example: 4*5+3-8*9

[Parse tree: the root Expr expands to Expr - Term; its left Expr expands to Expr + Term and covers 4*5+3, and its right Term expands to Term * Factor and covers 8*9; the leaves are the Numbers 4, 5, 3, 8 and 9]
38
In-class Exercise
  • Consider the CFG
  • S → ( L ) | a
  • L → L , S | S
  • Draw parse trees for
  • (a, a)
  • (a, ((a, a), (a, a)))

39
In-class Exercise
  • Parse tree for (a, a), using S → ( L ) | a and L → L , S | S

[Parse tree: the root S has children (, L, ); that L has children L, the comma, and S; the inner L has the child S; both S leaves have the child a]
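
The trees in this exercise can also be produced by a short program. The sketch below is a hand-written recursive-descent parser for S → ( L ) | a and L → L , S | S; it is an illustration rather than the method used in the slides, and it handles the left-recursive L rule with a loop that builds the same left-nested L nodes. The names `parse_s`, `s_rule` and `l_rule` are hypothetical.

```python
def parse_s(text):
    """Parse `text` with S -> ( L ) | a and L -> L , S | S; return a labeled tree."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def expect(symbol):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != symbol:
            found = tokens[pos] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {symbol!r}, found {found!r}")
        pos += 1

    def s_rule():
        if pos < len(tokens) and tokens[pos] == "(":
            expect("(")
            inner = l_rule()
            expect(")")
            return ("S", "(", inner, ")")
        expect("a")
        return ("S", "a")

    def l_rule():
        node = ("L", s_rule())                 # L -> S
        while pos < len(tokens) and tokens[pos] == ",":
            expect(",")
            node = ("L", node, ",", s_rule())  # L -> L , S
        return node

    tree = s_rule()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return tree


print(parse_s("(a, a)"))
```

Each tuple starts with the nonterminal's name, followed by its children in order, so the printed value can be read as the parse tree.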
40
In-class Exercise
  • Parse tree for (a, ((a, a), (a, a))), using S → ( L ) | a and L → L , S | S

[Parse tree: the root S has children (, L, ); each comma in the string corresponds to an L → L , S node, each parenthesized group to an S → ( L ) node, and each a to an S → a leaf]
41
In-class Exercise
  • Write a CFG grammar for the language of
    well-formed parenthesized expressions
  • (), (()), ()(), (()()), etc. OK
  • ()), )(, ((()), (((, etc. not OK

42
In-class Exercise
  • P → ( ) | P P | ( P )
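
The grammar P → ( ) | P P | ( P ) can be checked against the OK and not-OK strings of the exercise. The sketch below is a memoized recognizer that tries each production on every substring; it is one straightforward way to test membership, not a practical parsing algorithm, and the name `derives_from_p` is hypothetical.

```python
from functools import lru_cache


def derives_from_p(text):
    """Return True if `text` can be derived from P -> ( ) | P P | ( P )."""

    @lru_cache(maxsize=None)
    def is_p(start, end):
        length = end - start
        if length < 2 or length % 2:
            return False
        if text[start] == "(" and text[end - 1] == ")":
            # P -> ( )   or   P -> ( P )
            if length == 2 or is_p(start + 1, end - 1):
                return True
        # P -> P P: try every even split point
        return any(is_p(start, mid) and is_p(mid, end)
                   for mid in range(start + 2, end - 1, 2))

    return is_p(0, len(text))


for candidate in ["()", "(())", "()()", "(()())", "())", ")(", "((())", "((("]:
    print(candidate, derives_from_p(candidate))
```

The empty string is rejected because no production of this grammar has an empty right-hand side.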

43
In-class Exercise
  • Is the following grammar ambiguous?
  • A → A and A | not A | 0 | 1

44
In-class Exercise
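  • Is the following grammar ambiguous?
  • A → A and A | not A | 0 | 1
  • Yes: a string such as not 0 and 1 has two parse trees

[Figure: two parse trees for the same string; in one the root production is A → not A, in the other it is A → A and A]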
45
Another Example Grammar
  • ForStatement → for ( StmtCommaList ;
    ExprCommaList ; StmtCommaList )
    StmtSemicList
  • StmtCommaList → ε | Stmt | Stmt ,
    StmtCommaList
  • ExprCommaList → ε | Expr | Expr ,
    ExprCommaList
  • StmtSemicList → ε | Stmt | Stmt ;
    StmtSemicList
  • Expr ? . . .
  • Stmt ? . . .

46
Full Language Grammar Sketch
  • Program → VarDeclList FuncDeclList
  • VarDeclList → ε | VarDecl | VarDecl VarDeclList
  • VarDecl → Type IdentCommaList
  • IdentCommaList → Ident | Ident , IdentCommaList
  • Type → int | char | float
  • FuncDeclList → ε | FuncDecl | FuncDecl
    FuncDeclList
  • FuncDecl → Type Ident ( ArgList )
    VarDeclList StmtList
  • StmtList → ε | Stmt | Stmt StmtList
  • Stmt → Ident = Expr | ForStatement | ...
  • Expr ? ...
  • Ident ? ...

47
Real-world CFGs
  • Some sample grammars found on the Web
  • LISP 7 rules
  • PROLOG 19 rules
  • Java 30 rules
  • C 60 rules
  • Ada 280 rules

48
So What Now?
  • We want to write a compiler for a given language
  • We come up with a definition of the tokens
    embodied in regular expressions
  • We build a lexer (see previous lecture)
  • We come up with a definition of the syntax
    embodied in a context-free grammar
  • not ambiguous
  • enforces relevant operator precedences and
    associativity
  • Question How do we build a parser?

49
How do we build a Parser?
  • This question could keep us busy for half a semester
    in a full-fledged compiler course
  • So we're just going to see a very high-level view
    of parsing
  • If you go to graduate school you'll most likely
    have an in-depth compiler course with all the
    details

50
How do we build a Parser?
  • There are two approaches for parsing
  • Top-Down: start with the start symbol and try to
    expand it using derivation rules until you get
    the input source code
  • Bottom-Up: start with the input source code,
    consume symbols, and infer which rules could be
    used
  • Note: this does not work for all CFGs
  • CFGs must have some properties to be parsable
    with our beloved parsing algorithms

51
Writing Parsers?
  • Nowadays one doesn't really write parsers from
    scratch, but one uses a parser generator (Yacc is
    a famous one)

[Diagram: at compiler design time, a grammar specification is fed to the Parser Generator, which produces the Parser; at compile time, the Parser turns a token stream into a parse tree]
52
Sample (simplified) YACC Input
  • %token DIGIT /* Definition of token names */
  • %%
  • line : expr '\n' ;
  • expr : expr '+' term
  •      | term ;
  • term : term '*' factor
  •      | factor ;
  • factor : '(' expr ')'
  •        | DIGIT ;

53
So What Now?
  • The parser accepts syntactically correct programs
    and produces a full parse tree
  • Unfortunately, being syntactically correct is a
    necessary condition for the program to be correct
    (i.e., compilable), but is not sufficient