Syntax Analysis
1
Syntax Analysis
  • From Chapter 4, The Dragon Book, 2nd ed.

2
Content
  • 4.1 Introduction
  • 4.2 Context-Free Grammar
  • 4.3 Writing a Grammar
  • 4.4 Top-Down Parsing
  • 4.5 Bottom-Up Parsing
  • 4.6 Introduction to LR Parsing Simple LR
  • 4.7 More Powerful LR Parsers
  • 4.8 Using Ambiguous Grammars
  • 4.9 Parser Generators

3
4.1 Introduction
  • Examine the way the parser fits into a typical
    compiler.

4
4.1.1 The Role of the Parser
5
4.1.1 The Role of the Parser
  • Three general types of parsers for grammars:
  • universal,
  • e.g., the Cocke-Younger-Kasami (CYK) algorithm
    and Earley's algorithm
  • too inefficient to use in production compilers
  • top-down,
  • bottom-up
  • The most efficient top-down and bottom-up methods
    work only for subclasses of grammars, but several
    of these classes, particularly the LL and LR
    grammars, are expressive enough to describe most
    of the syntactic constructs in modern programming
    languages.
  • Parsers implemented by hand often use LL
    grammars; for example, the predictive-parsing
    approach of Sec. 2.4.2 works for LL grammars.
  • Parsers for the larger class of LR grammars are
    usually constructed using automated tools.

6
4.1.2 Representative Grammar
  • Associativity and precedence are captured in
    grammar 4.1.
  • LR grammar, suitable for bottom-up parsing
  • Non-left-recursive variant of grammar (4.1)
  • Used for top-down parsing
  • Useful for illustrating techniques for handling
    ambiguities during parsing

E → E + E | E * E | ( E ) | id        (4.3)
7
4.1.3 Syntax Error Handling
  • Common programming errors can occur at many
    different levels.
  • Lexical errors
  • misspelling of identifiers, keywords, or
    operators, and missing quotes around text
    intended as a string
  • Syntactic errors
  • misplaced semicolons or extra or missing braces
  • In C or Java, the appearance of a case statement
    without an enclosing switch
  • Semantic errors
  • type mismatches between operators and operands
  • A return statement in a Java method with result
    type void
  • Logic errors
  • Anything from incorrect reasoning on the part of
    the programmer to the use in a C program of the
    assignment operator = instead of the comparison
    operator ==.

8
4.1.3 Syntax Error Handling
  • The error handler in a parser has goals that are
    simple to state but challenging to realize:
  • Report the presence of errors clearly and
    accurately.
  • Recover from each error quickly enough to detect
    subsequent errors.
  • Add minimal overhead to the processing of correct
    programs.

9
4.1.4 Error-Recovery Strategies
  • Once an error is detected, how should the parser
    recover?
  • Although no strategy has proven itself
    universally acceptable, a few methods have broad
    applicability.
  • Panic-Mode Recovery
  • On discovering an error, the parser discards
    input symbols one at a time until one of a
    designated set of synchronizing tokens is found
    (a minimal sketch follows this list).
  • The synchronizing tokens are usually delimiters,
    such as the semicolon or }, whose role in the
    source program is clear and unambiguous.
  • Phrase-Level Recovery
  • On discovering an error, a parser may perform
    local correction on the remaining input; that is,
    it may replace a prefix of the remaining input
    by some string that allows the parser to
    continue.
  • Error Productions
  • Global Corrections
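
A minimal Python sketch of panic-mode recovery (not
from the book; the token list and synchronizing set
are illustrative):

# On an error, discard input symbols until a token from the
# designated synchronizing set appears, then resume past it.
SYNC_TOKENS = {";", "}"}   # assumed synchronizing delimiters

def panic_mode_recover(tokens, pos):
    """Return the position just past the next synchronizing
    token, or len(tokens) if none remains."""
    while pos < len(tokens) and tokens[pos] not in SYNC_TOKENS:
        pos += 1                        # discard one symbol
    return min(pos + 1, len(tokens))    # resume after it

# Example: an error at position 1 skips to just past the ';'.
print(panic_mode_recover(["x", "@", "y", ";", "z"], 1))  # 4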

10
4.2 Context-Free Grammars
  • Review the definition of a context-free grammar.
  • Introduce terminology for talking about parsing.
  • Derivation
  • Parse trees and derivations
  • Ambiguity

11
4.2.1 The Formal Definition of a Context-Free
Grammar
  • See Section 2.2
  • Example 4.5

12
4.2.2 Notational Conventions
  • Conventions 1 to 7, pp. 198–199
  • Example 4.6
  • Using these conventions, the grammar of Example
    4.5 can be rewritten as

13
4.2.3 Derivations
  • Section 2.2

E → E + E | E * E | - E | ( E ) | id        (4.7)
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)        (4.8)
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)        (4.9)
  • (4.8) is a leftmost derivation; (4.9) is a
    rightmost derivation

14
4.2.4 Parse Trees and Derivations
15
4.2.5 Ambiguity
  • Section 2.2.4
  • Example 4.11
  • Two distinct leftmost derivations for the
    sentence id + id * id

16
4.2.6 Verifying the Language Generated by a
Grammar
  • A proof that a grammar G generates a language L
    has two parts
  • show that every string generated by G is in L,
    and conversely that
  • every string in L can indeed be generated by G.
  • Example 4.12
  • Consider the grammar S → ( S ) S | ε, which
    generates all strings of balanced parentheses,
    and only such strings.
  • We shall show first that every sentence derivable
    from S is balanced, and then that every balanced
    string is derivable from S.
  • Show by induction.

17
4.2.7 Context-Free Grammars vs. Regular
Expressions
  • Grammars are a more powerful notation than
    regular expressions.
  • Every construct that can be described by a
    regular expression can be described by a grammar,
    but not vice-versa.
  • For example, the regular expression (a|b)*abb and
    the grammar
  • describe the same language, the set of strings
    of a's and b's ending in abb.
  • The language L = { aⁿbⁿ | n ≥ 1 } is a
    prototypical example of a language that can be
    described by a grammar but not by a regular
    expression.
  • finite automata cannot count

18
4.3 Writing a Grammar
  • Grammars are capable of describing most, but not
    all, of the syntax of programming languages.
  • E.g., the requirement that identifiers be
    declared before they can be used cannot be
    described by a context-free grammar
  • This section contains
  • how to divide work between a lexical analyzer and
    a parser,
  • transformations that could be applied to get a
    grammar more suitable for parsing
  • ambiguity elimination, left-recursion elimination
    and left factoring
  • PL constructs that cannot be described by any
    grammar

19
4.3.1 Lexical vs. Syntactic Analysis
  • Reasons why regular expressions are used to
    define the lexical syntax of a language:
  • Separating the syntactic structure of a language
    into lexical and non-lexical parts provides a
    convenient way of modularizing the front end of a
    compiler into two manageable-size components.
  • The lexical rules of a language are frequently
    quite simple, and to describe them we do not need
    a notation as powerful as grammars.
  • Regular expressions generally provide a more
    concise and easier-to-understand notation for
    tokens than grammars.
  • More efficient lexical analyzers can be
    constructed automatically from regular
    expressions than from arbitrary grammars.

20
4.3.2 Eliminating Ambiguity
  • Dangling-else grammar

21
4.3.2 Eliminating Ambiguity
22
4.3.2 Eliminating Ambiguity
  • Example 4.16

23
4.3.2 Elimination of Left Recursion
24
4.3.2 Elimination of Left Recursion
25
4.3.2 Elimination of Left Recursion
  • Example 4.20

S → A a | b        A → A c | S d | ε
A → A c | A a d | b d | ε
S → A a | b        A → b d A' | A'        A' → c A' | a d A' | ε
26
4.3.4 Left Factoring
stmt → if expr then stmt else stmt
     | if expr then stmt
A → αβ₁ | αβ₂   becomes   A → αA'        A' → β₁ | β₂
27
4.4 Top-Down Parsing
  • Top-down parsing can be viewed as the problem of
    constructing a parse tree for the input string,
    starting from the root and creating the nodes of
    the parse tree in preorder.
  • Example 4.27

E → T E'        E' → + T E' | ε        T → F T'
T' → * F T' | ε        F → ( E ) | id
28
4.4.1 Recursive-Descent Parsing
29
4.4.1 Recursive-Descent Parsing
  • Example 4.29
  • Grammar
  • S → c A d
  • A → a b | a
  • Input: w = cad
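
A Python sketch of the backtracking recursive-descent
parse of Example 4.29 (the parsing procedures are our
own illustration, not the book's code):

def parse(w):
    # Each procedure takes the input cursor and returns the
    # new cursor on success, or None on failure.
    def S(pos):
        # S -> c A d
        if pos < len(w) and w[pos] == "c":
            pos2 = A(pos + 1)
            if pos2 is not None and pos2 < len(w) and w[pos2] == "d":
                return pos2 + 1
        return None

    def A(pos):
        # Try A -> a b first; if 'b' is absent, backtrack and
        # settle for A -> a. (A full backtracking parser would
        # also retry A -> a if the caller later failed; the
        # book's example does not need that.)
        if pos < len(w) and w[pos] == "a":
            if pos + 1 < len(w) and w[pos + 1] == "b":
                return pos + 2
            return pos + 1
        return None

    return S(0) == len(w)   # accept only if all input is used

print(parse("cad"))    # True: A -> a b fails, backtrack to A -> a
print(parse("cd"))     # False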

30
4.4.2 FIRST and FOLLOW
  • During top-down parsing, FIRST and FOLLOW allow
    us to choose which production to apply, based on
    the next input symbol.
  • Define FIRST(α), where α is any string of grammar
    symbols,
  • to be the set of terminals that begin strings
    derived from α.
  • If α ⇒* ε, then ε is also in FIRST(α).
  • How FIRST can be used during predictive parsing:
  • Consider A → α | β, where FIRST(α) and FIRST(β)
    are disjoint sets.
  • If the next input symbol is a and a is in
    FIRST(α), then choose the production A → α.


31
4.4.2 FIRST and FOLLOW
  • Define FOLLOW(A), for nonterminal A,
  • to be the set of terminals a that can appear
    immediately to the right of A in some sentential
    form; that is,
  • the set of terminals a such that there exists a
    derivation of the form S ⇒* αAaβ for some α and
    β, as suggested in Fig. 4.14.


32
4.4.2 FIRST and FOLLOW
  • Rules for computing FIRST(X) for all grammar
    symbols X
  • If X is a terminal, then FIRST(X) X.
  • If X is a nonterminal and X? Y1Y2 ... Yk is a
    production for some k 1, then place a in
    FIRST(X) of for some i, a is in FIRST(Yi), and
    eis in all of FIRST(Y1), ... FIRST(Yi-1) that
    is, Y1Y2 ... Yi-1 ? e.
  • If eis in FIRST(Yj) for all j1, 2, ..., k, then
    add eto FIRST(X).
  • For example, everything in FIRST(Y1) is surely
    in FIRST(X). If Y1 does not derive e, then we add
    nothing more to FIRST(X), but if Y1 ? e, then we
    add FIRST(Y2), and so on.
  • If X ? eis a production, then add eto FIRST(X).
  • Compute FIRST for any string X1X2 ... Xn
  • Add all non-esymbols of FIRST(X1)
  • Add all non-esymbols of FIRST(X2), if eis in
    FIRST(X1)
  • Add all non-esymbols of FIRST(X3), if eis in
    FIRST(X2)
  • and so on
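
A Python sketch of this fixed-point computation for
grammar (4.28) (the representation, and the names E1,
T1 standing for E', T', are our own, not the book's):

EPS = "ε"
GRAMMAR = [                       # grammar (4.28)
    ("E",  ("T", "E1")),
    ("E1", ("+", "T", "E1")), ("E1", (EPS,)),
    ("T",  ("F", "T1")),
    ("T1", ("*", "F", "T1")), ("T1", (EPS,)),
    ("F",  ("(", "E", ")")),  ("F",  ("id",)),
]
NONTERMINALS = {head for head, _ in GRAMMAR}

def compute_first(grammar):
    first = {nt: set() for nt in NONTERMINALS}
    changed = True
    while changed:                # iterate to a fixed point
        changed = False
        for head, body in grammar:
            before = len(first[head])
            if body == (EPS,):
                first[head].add(EPS)      # rule 3: X -> ε
            else:
                for sym in body:          # rule 2: scan Y1 ... Yk
                    syms = first[sym] if sym in NONTERMINALS else {sym}
                    first[head] |= syms - {EPS}
                    if EPS not in syms:
                        break             # Yi cannot vanish
                else:
                    first[head].add(EPS)  # every Yi derives ε
            changed |= len(first[head]) != before
    return first

print(compute_first(GRAMMAR))
# FIRST(E) = FIRST(T) = FIRST(F) = {'(', 'id'}, as in Example 4.30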



33
4.4.2 FIRST and FOLLOW
  • To compute FOLLOW(A) for all nonterminals A,
    apply the following rules (a sketch follows
    below):
  • Place $ in FOLLOW(S), where S is the start
    symbol and $ is the input right endmarker.
  • If there is a production A → αBβ, then everything
    in FIRST(β) except ε is in FOLLOW(B).
  • If there is a production A → αB, or a production
    A → αBβ where FIRST(β) contains ε, then
    everything in FOLLOW(A) is in FOLLOW(B).
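
A sketch of FOLLOW, continuing the FIRST sketch above
(same GRAMMAR, NONTERMINALS, EPS and compute_first):

def first_of_string(symbols, first):
    """FIRST of a string X1 ... Xn, per the rules above."""
    out = set()
    for sym in symbols:
        syms = first[sym] if sym in NONTERMINALS else {sym}
        out |= syms - {EPS}
        if EPS not in syms:
            return out
    out.add(EPS)                  # the whole string can vanish
    return out

def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nt: set() for nt in NONTERMINALS}
    follow[start].add("$")        # rule 1: $ in FOLLOW(start)
    changed = True
    while changed:
        changed = False
        for head, body in grammar:
            for i, sym in enumerate(body):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of_string(body[i + 1:], first)
                before = len(follow[sym])
                follow[sym] |= tail - {EPS}   # rule 2
                if EPS in tail:               # rule 3
                    follow[sym] |= follow[head]
                changed |= len(follow[sym]) != before
    return follow

print(compute_follow(GRAMMAR, "E"))
# e.g., FOLLOW(E) = {')', '$'}, FOLLOW(T) = {'+', ')', '$'}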

34
4.4.2 FIRST and FOLLOW
  • Example 4.30
  • FIRST(F) = FIRST(T) = FIRST(E) = { (, id }
  • FIRST(E') = { +, ε }
  • FIRST(T') = { *, ε }
  • FOLLOW(E) = FOLLOW(E') = { ), $ }
  • FOLLOW(T) = FOLLOW(T') = { +, ), $ }
  • FOLLOW(F) = { +, *, ), $ }

E → T E'        E' → + T E' | ε        T → F T'
T' → * F T' | ε        F → ( E ) | id
35
4.4.3 LL(1) Grammars
  • Predictive parsers, that is, recursive-descent
    parsers needing no backtracking, can be
    constructed for a class of grammars called LL(1):
  • scanning from left to right, producing a leftmost
    derivation, and using one input symbol of
    lookahead at each step to make parsing-action
    decisions.
  • The class of LL(1) grammars is rich enough to
    cover most programming constructs, although care
    is needed in writing a suitable grammar for the
    source language. For example, no left-recursive
    or ambiguous grammar can be LL(1).

36
4.4.3 LL(1) Grammars
  • A grammar G is LL(1) if and only if whenever A →
    α | β are two distinct productions of G, the
    following conditions hold:
  • For no terminal a do both α and β derive strings
    beginning with a.
  • At most one of α and β can derive the empty
    string.
  • If β ⇒* ε, then α does not derive any string
    beginning with a terminal in FOLLOW(A). Likewise,
    if α ⇒* ε, then β does not derive any string
    beginning with a terminal in FOLLOW(A).
  • Conditions 1 and 2:
  • FIRST(α) and FIRST(β) are disjoint sets.
  • Condition 3:
  • if ε is in FIRST(β), then FIRST(α) and FOLLOW(A)
    are disjoint sets, and likewise if ε is in
    FIRST(α).



37
4.4.3 LL(1) Grammars
  • Predictive parsers can be constructed for LL(1)
    grammars since
  • the proper production to apply for a nonterminal
    can be selected by looking only at the current
    input symbol.
  • stmt → if ( expr ) stmt else stmt
  •      | while ( expr ) stmt
  •      | { stmt_list }

38
4.4.3 LL(1) Grammars
  • Algorithm 4.31: Construction of a predictive
    parsing table.
  • INPUT: Grammar G.
  • OUTPUT: Parsing table M.
  • METHOD: For each production A → α of the
    grammar, do the following (a sketch follows
    below):
  • For each terminal a in FIRST(α), add A → α to
    M[A, a].
  • If ε is in FIRST(α), then for each terminal b in
    FOLLOW(A), add A → α to M[A, b]. If ε is in
    FIRST(α) and $ is in FOLLOW(A), add A → α to
    M[A, $] as well.
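
A sketch of Algorithm 4.31, continuing the sketches
above (GRAMMAR, EPS, compute_first, compute_follow
and first_of_string):

def build_ll1_table(grammar, start):
    first = compute_first(grammar)
    follow = compute_follow(grammar, start)
    table = {}
    for head, body in grammar:
        symbols = () if body == (EPS,) else body
        fs = first_of_string(symbols, first)
        targets = fs - {EPS}
        if EPS in fs:               # ε in FIRST(α): fill on
            targets |= follow[head] # FOLLOW(A), '$' included
        for a in targets:
            if (head, a) in table:  # multiply defined entry
                raise ValueError(f"not LL(1) at M[{head}, {a}]")
            table[head, a] = (head, body)
    return table

M = build_ll1_table(GRAMMAR, "E")
print(M["E1", ")"])   # ('E1', ('ε',)): apply E' -> ε on ')'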

39
4.4.3 LL(1) Grammars
  • Example 4.30
  • FIRST(F) = FIRST(T) = FIRST(E) = { (, id }
  • FIRST(E') = { +, ε }
  • FIRST(T') = { *, ε }
  • FOLLOW(E) = FOLLOW(E') = { ), $ }
  • FOLLOW(T) = FOLLOW(T') = { +, ), $ }
  • FOLLOW(F) = { +, *, ), $ }

E → T E'        E' → + T E' | ε        T → F T'
T' → * F T' | ε        F → ( E ) | id
40
4.4.3 LL(1) Grammars
  • For every LL(1) grammar, each parsing-table entry
    uniquely identifies a production or signals an
    error.
  • Example 4.33: the dangling-else grammar
  • S → i E t S S' | a
  • S' → e S | ε
  • E → b

41
4.4.4 Nonrecursive Predictive Parsing
  • A nonrecursive predictive parser can be built by
    manipulating a stack explicitly, rather than
    implicitly via recursive calls (a sketch follows
    below).
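
A sketch of the table-driven parser, continuing the
sketches above (NONTERMINALS, EPS, and the table M
just built); the explicit stack replaces recursion:

def predictive_parse(tokens, table, start):
    stack = ["$", start]            # stack grows to the right
    tokens = tokens + ["$"]
    pos = 0
    while stack[-1] != "$":
        top, a = stack[-1], tokens[pos]
        if top == a:                # terminal on top: match it
            stack.pop(); pos += 1
        elif top in NONTERMINALS and (top, a) in table:
            _, body = table[top, a] # expand by M[top, a]
            stack.pop()
            for sym in reversed(body):
                if sym != EPS:      # push body, rightmost first
                    stack.append(sym)
        else:
            raise SyntaxError(f"unexpected {a!r} with {top!r} on stack")
    if tokens[pos] != "$":
        raise SyntaxError("input remains after stack emptied")
    return True

print(predictive_parse(["id", "+", "id", "*", "id"], M, "E"))  # True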

42
4.4.4 Nonrecursive Predictive Parsing
43
4.4.4 Nonrecursive Predictive Parsing
44
4.4.5 Error Recovery in Predictive Parsing
  • Panic Mode
  • Phrase-level Recovery

45
4.5 Bottom-Up Parsing
  • Introduce a general style of bottom-up parsing:
    shift-reduce parsing.
  • Sections 4.6 and 4.7 introduce the LR grammars,
    the largest class of grammars for which
    shift-reduce parsers can be built.

46
4.5.1 Reductions
  • Think of bottom-up parsing as the process of
    reducing a string w to the start symbol of the
    grammar.
  • At each reduction step, a specific substring
    matching the body of a production is replaced by
    the nonterminal at the head of that production.
  • The key decisions during bottom-up parsing, as
    the parse proceeds, are about
  • when to reduce and
  • what production to apply.
  • Example 4.37
  • A reduction is the reverse of a step in a
    derivation.
  • The following derivation corresponds to the parse
    in Fig. 4.25.
  • E ⇒ T ⇒ T * F ⇒ T * id ⇒ F * id ⇒ id * id

47
4.5.2 Handle Pruning
  • Bottom-up parsing during a left-to-right scan of
    the input constructs a rightmost derivation in
    reverse.
  • Informally, a handle is a substring that
    matches the body of a production, and whose
    reduction represents one step along the reverse
    of a rightmost derivation.

48
4.5.2 Handle Pruning
  • Formally, if S ⇒* αAw ⇒ αβw by rightmost
    derivations, then production A → β in the
    position following α is a handle of αβw.
  • Alternatively, a handle of a right-sentential
    form γ is a production A → β and a position in γ
    where β may be found, such that replacing β at
    that position by A produces the previous
    right-sentential form in a rightmost derivation
    of γ.
  • A rightmost derivation in reverse can be obtained
    by handle pruning.

49
4.5.3 Shift-Reduce Parsing
  • Four possible actions a shift-reduce parser can
    make (a skeleton follows below):
  • shift, reduce, accept, error
  • The use of a stack in shift-reduce parsing is
    justified by an important fact:
  • the handle will always eventually appear on top
    of the stack, never inside.
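
A minimal shift-reduce skeleton in Python showing the
four actions; the `decide` oracle and the toy grammar
S → ( S ) | x are our own illustration (the tables
that drive real deciders come in Sec. 4.6):

def shift_reduce(tokens, decide):
    stack, pos = ["$"], 0
    tokens = tokens + ["$"]
    while True:
        action = decide(stack, tokens[pos])
        if action == "shift":
            stack.append(tokens[pos]); pos += 1
        elif isinstance(action, tuple):   # ('reduce', head, n)
            _, head, n = action
            del stack[len(stack) - n:]    # handle is on top
            stack.append(head)
        elif action == "accept":
            return True
        else:
            raise SyntaxError(f"error at {tokens[pos]!r}")

def decide(stack, a):                     # oracle for S -> (S) | x
    if stack[-1] == "x":
        return ("reduce", "S", 1)
    if stack[-3:] == ["(", "S", ")"]:
        return ("reduce", "S", 3)
    if a != "$":
        return "shift"
    return "accept" if stack == ["$", "S"] else "error"

print(shift_reduce(list("((x))"), decide))  # True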

50
4.5.3 Conflicts During Shift-Reduce Parsing
  • There are context-free grammars for which
    shift-reduce parsing cannot be used.
  • Shift/reduce conflict
  • Reduce/reduce conflict
  • Example 4.38
  • The dangling-else grammar
  • stmt → if expr then stmt
  •      | if expr then stmt else stmt
  •      | other
  • Shift/reduce conflict in the configuration:
  • STACK: … if expr then stmt        INPUT: else …

51
4.5.3 Conflicts During Shift-Reduce Parsing
  • Example 4.39
  • For the statement p(i,j), which appears as the
    token stream id(id, id), after shifting the first
    three tokens onto the stack, a shift-reduce
    parser would be in the configuration:
  • STACK: … id ( id        INPUT: , id ) …
  • A reduce/reduce conflict occurs.

52
4.6 Introduction to LR Parsing Simple LR
  • The most prevalent type of bottom-up parser today
    is based on a concept called LR(k) parsing:
  • Left-to-right scanning of the input,
  • Rightmost derivation in reverse,
  • k input symbols of lookahead used in making
    parsing decisions.

53
4.6.1 Why LR Parsers?
  • A grammar for which we can construct a parsing
    table using one of the methods in Sections 4.6
    and 4.7 is said to be an LR grammar.
  • For a grammar to be LR, it is sufficient that a
    left-to-right shift-reduce parser be able to
    recognize handles of right-sentential forms when
    they appear on top of the stack.

54
4.6.1 Why LR Parsers?
  • LR parsing is attractive for a variety of
    reasons:
  • LR parsers can be constructed to recognize
    virtually all programming-language constructs for
    which CFGs can be written.
  • The LR-parsing method is the most general
    nonbacktracking shift-reduce parsing method
    known, yet it can be implemented as efficiently
    as other, more primitive shift-reduce methods.
  • An LR parser can detect a syntactic error as soon
    as it is possible to do so on a left-to-right
    scan of the input.
  • The class of grammars that can be parsed using LR
    methods is a proper superset of the class of
    grammars that can be parsed with predictive or LL
    methods.
  • The principal drawback of the LR method is that
  • it is too much work to construct an LR parser by
    hand for a typical programming-language grammar;
  • an LR-parser generator is needed
  • Yacc, Section 4.9

55
4.6.2 Items and the LR(0) Automaton
  • How does a shift-reduce parser know when to shift
    and when to reduce?
  • An LR parser makes shift-reduce decisions by
    maintaining states to keep track of where we are
    in a parse.
  • States represent sets of items.
  • An LR(0) item (item for short) of a grammar G is
    a production of G with a dot at some position of
    the body.
  • For example, production A → XYZ yields the four
    items
  • A → ·XYZ
  • A → X·YZ
  • A → XY·Z
  • A → XYZ·
  • The production A → ε generates only one item,
    A → ·.

56
4.6.2 Items and the LR(0) Automaton
  • An item indicates how much of a production we
    have seen at a given point in the parsing
    process.
  • For example, A → ·XYZ, A → X·YZ, A → XYZ·
  • The collection of sets of LR(0) items, called the
    canonical LR(0) collection, provides the basis
    for constructing a deterministic finite automaton
    that is used to make parsing decisions.
  • Such an automaton is called an LR(0) automaton.

57
Items in the shaded parts are nonkernel items; the
others are kernel items.
58
4.6.2 Items and the LR(0) Automaton
  • To construct the canonical LR(0) collection for a
    grammar, we define an augmented grammar and two
    functions, CLOSURE and GOTO.
  • Augmented grammar:
  • If G is a grammar with start symbol S, then G',
    the augmented grammar for G, is G with a new
    start symbol S' and production S' → S.
  • This new starting production indicates to the
    parser when it should stop parsing and announce
    acceptance of the input.

59
4.6.2 Items and the LR(0) Automaton
  • Closure of Item Sets
  • If I is a set of items for a grammar G, then
    CLOSURE(I) is the set of items constructed from I
    by two rules:
  • Initially, add every item in I to CLOSURE(I).
  • If A → α·Bβ is in CLOSURE(I) and B → γ is a
    production, then add the item B → ·γ to
    CLOSURE(I), if it is not already there. Apply
    this rule until no more new items can be added
    to CLOSURE(I).
  • Example 4.40: If I is the set containing the one
    item { [E' → ·E] }, then CLOSURE(I) is the set of
    items I₀ in Fig. 4.31.
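
A self-contained Python sketch of CLOSURE over LR(0)
items, with an item written as (head, body, dot) and
grammar (4.1) augmented (the encoding is our own):

GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")), ("E", ("T",)),
    ("T", ("T", "*", "F")), ("T", ("F",)),
    ("F", ("(", "E", ")")), ("F", ("id",)),
]
NONTERMINALS = {h for h, _ in GRAMMAR}

def closure(items):
    result = set(items)
    changed = True
    while changed:
        changed = False
        for head, body, dot in list(result):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for h, b in GRAMMAR:  # add B -> ·γ for B after dot
                    if h == body[dot] and (h, b, 0) not in result:
                        result.add((h, b, 0))
                        changed = True
    return frozenset(result)

I0 = closure({("E'", ("E",), 0)})   # Example 4.40: I0 of Fig. 4.31
print(len(I0))                      # 7 items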

60
4.6.2 Items and the LR(0) Automaton
  • The Function GOTO
  • GOTO(I, X) is defined to be the closure of the
    set of all items [A → αX·β] such that [A → α·Xβ]
    is in I.
  • Example 4.41: If I is the set of two items
    { [E' → E·], [E → E·+ T] }, then GOTO(I, +)
    contains the items
  • E → E +·T
  • T → ·T * F
  • T → ·F
  • F → ·( E )
  • F → ·id
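
Continuing the CLOSURE sketch: GOTO(I, X) moves the
dot over X in every item that allows it, then closes:

def goto(items, symbol):
    moved = {(h, b, d + 1) for h, b, d in items
             if d < len(b) and b[d] == symbol}
    return closure(moved)

# Example 4.41: the closure of { E -> E +·T } pulls in the
# T- and F-items listed above.
I = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
for item in sorted(goto(I, "+")):
    print(item)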

61
4.6.2 Items and the LR(0) Automaton
  • Use of the LR(0) Automaton
  • The central idea behind Simple LR, or SLR,
    parsing is the construction from the grammar of
    the LR(0) automaton.

62
4.6.3 The LR-Parsing Algorithm
63
4.6.3 The LR-Parsing Algorithm
  • Structure of the LR Parsing Table
  • The parsing table consists of two parts:
  • The ACTION function takes as arguments a state i
    and a terminal a (or $, the input endmarker). The
    value of ACTION[i, a] can have one of four forms:
  • Shift j, where j is a state.
  • Reduce A → β.
  • Accept.
  • Error.
  • We extend the GOTO function, defined on sets of
    items, to states: if GOTO(Iᵢ, A) = Iⱼ, then GOTO
    also maps state i and nonterminal A to state j.

64
4.6.3 The LR-Parsing Algorithm
  • LR-Parser Configurations
  • A configuration of an LR parser is a pair
  • ( s₀ s₁ … sₘ , aᵢ aᵢ₊₁ … aₙ $ )
  • where the first component is the stack contents,
    and the second component is the remaining input.

65
4.6.3 The LR-Parsing Algorithm
  • Algorithm 4.44: LR-parsing algorithm.
  • INPUT: An input string w and an LR-parsing table
    with functions ACTION and GOTO for a grammar G.
  • OUTPUT: If w is in L(G), the reduction steps of a
    bottom-up parse for w; otherwise, an error
    indication.
  • METHOD: Initially, the parser has s₀ on its
    stack, where s₀ is the initial state, and w$ in
    the input buffer. The parser then executes the
    driver program sketched below.
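
A sketch of the driver, assuming ACTION and GOTO are
dicts: ACTION[state, terminal] is ('shift', j),
('reduce', head, n), or ('accept',), with absent
entries meaning error; GOTO[state, nonterminal] is a
state (the encoding is our own):

def lr_parse(tokens, ACTION, GOTO):
    stack = [0]                     # state stack, s0 on top
    tokens = tokens + ["$"]
    pos = 0
    while True:
        s, a = stack[-1], tokens[pos]
        act = ACTION.get((s, a))
        if act is None:
            raise SyntaxError(f"error in state {s} on {a!r}")
        if act[0] == "shift":
            stack.append(act[1]); pos += 1
        elif act[0] == "reduce":
            _, head, n = act
            del stack[len(stack) - n:]        # pop |β| states
            stack.append(GOTO[stack[-1], head])
            print(f"reduce {n} symbols to {head}")
        else:                                 # accept
            return True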

66
4.6.3 The LR-Parsing Algorithm
  • Example 4.45

67
4.6.4 Constructing the SLR-Parsing Tables
  • Algorithm 4.46: Constructing an SLR-parsing
    table.
  • INPUT: An augmented grammar G'.
  • OUTPUT: The SLR-parsing table functions ACTION
    and GOTO for G'.
  • METHOD (a sketch follows below):
  • Construct C = {I₀, I₁, …, Iₙ}, the collection of
    sets of LR(0) items for G'.
  • State i is constructed from Iᵢ. The parsing
    actions for state i are determined as follows:
  • If [A → α·aβ] is in Iᵢ and GOTO(Iᵢ, a) = Iⱼ,
    then set ACTION[i, a] to "shift j". Here a must
    be a terminal.
  • If [A → α·] is in Iᵢ, then set ACTION[i, a] to
    "reduce A → α" for all a in FOLLOW(A); here A
    may not be S'.
  • If [S' → S·] is in Iᵢ, then set ACTION[i, $] to
    "accept".
  • If any conflicting actions result from the above
    rules, we say the grammar is not SLR(1). The
    algorithm fails to produce a parser in this case.
  • The GOTO transitions for state i are constructed
    for all nonterminals A using the rule: if
    GOTO(Iᵢ, A) = Iⱼ, then GOTO[i, A] = j.
  • All entries not defined by rules (2) and (3) are
    made "error".
  • The initial state of the parser is the one
    constructed from the set of items containing
    [S' → ·S].
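
A sketch of Algorithm 4.46, continuing the closure,
goto and lr_parse sketches above; the FOLLOW sets for
grammar (4.1) are written out by hand to keep it
short:

FOLLOW = {"E": {"+", ")", "$"}, "T": {"+", "*", ")", "$"},
          "F": {"+", "*", ")", "$"}}
SYMBOLS = {s for _, b in GRAMMAR for s in b}

def build_slr(grammar):
    states = [closure({("E'", ("E",), 0)})]   # I0 first
    ACTION, GOTO_T = {}, {}
    work = [0]
    while work:
        i = work.pop()
        for X in SYMBOLS:                     # transitions
            J = goto(states[i], X)
            if not J:
                continue
            if J not in states:
                states.append(J); work.append(len(states) - 1)
            j = states.index(J)
            if X in NONTERMINALS:
                GOTO_T[i, X] = j              # rule (3)
            else:
                ACTION[i, X] = ("shift", j)   # rule (2a)
        for head, body, dot in states[i]:     # completed items
            if dot == len(body):
                if head == "E'":
                    ACTION[i, "$"] = ("accept",)       # (2c)
                else:
                    for a in FOLLOW[head]:             # (2b)
                        if (i, a) in ACTION:
                            raise ValueError("not SLR(1)")
                        ACTION[i, a] = ("reduce", head, len(body))
    return ACTION, GOTO_T

ACTION, GOTO_T = build_slr(GRAMMAR)
print(lr_parse(["id", "+", "id", "*", "id"], ACTION, GOTO_T))  # True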

68
4.6.4 Constructing the SLR-Parsing Tables
  • Example 4.47

69
4.6.4 Constructing the SLR-Parsing Tables
  • Example 4.48
  • Every SLR(1) grammar is unambiguous, but there
    are many unambiguous grammars that are not
    SLR(1).
  • S → L = R | R
  • L → * R | id
  • R → L

Shift/reduce conflict (on input =, between shifting
= and reducing by R → L)
70
4.6.5 Viable Prefixes
  • The prefixes of right sentential forms that can
    appear on the stack of a shift-reduce parser are
    called viable prefixes.
  • They are defined as follows
  • A viable prefix is a prefix of a right sentential
    form that does not continue past the right end of
    the rightmost handle of that sentential form.

71
4.6.5 Viable Prefixes
  • SLR parsing is based on the fact that the LR(0)
    automaton recognizes viable prefixes.
  • We say item A → β₁·β₂ is valid for a viable
    prefix αβ₁ if there is a derivation S ⇒* αAw ⇒
    αβ₁β₂w by rightmost derivations.
  • The fact that A → β₁·β₂ is valid for αβ₁ tells
    us a lot about whether to shift or reduce when
    we find αβ₁ on the parsing stack.
  • If β₂ ≠ ε, it suggests that we have not yet
    shifted the handle onto the stack, so shift is
    our move.
  • If β₂ = ε, then it looks as if A → β₁ is the
    handle, and we should reduce by this production.
  • Two valid items may tell us to do different
    things for the same viable prefix.
  • Some of these conflicts can be resolved by
    looking at the next input symbol, and others by
    the methods of Sec. 4.8.
  • But we should not suppose that all parsing-action
    conflicts can be resolved if the LR method is
    applied to an arbitrary grammar.
72
4.6.5 Viable Prefixes
  • Compute the set of valid items for each viable
    prefix that can appear on the stack of an LR
    parser.
  • The set of valid items for a viable prefix γ is
    exactly the set of items reached from the initial
    state along the path labeled γ in the LR(0)
    automaton for the grammar.
  • Example 4.50 (requires the automaton of Fig.
    4.31)
  • The items valid for the viable prefix E + T * are
    in state 7.

73
4.7 More Powerful LR Parsers
  • Extend the previous LR-parsing techniques to use
    one symbol of lookahead on the input.
  • The "canonical-LR" or just "LR" method makes
    full use of the lookahead symbol(s). This method
    uses a large set of items, called the LR(1)
    items.
  • The "lookahead-LR" or "LALR" method is based on
    the LR(0) sets of items, and has many fewer
    states than typical parsers based on the LR(1)
    items.

74
4.7.1 Canonical LR(1) Items
  • An LR(1) item [A → α·β, a] is valid for a viable
    prefix γ if there is a derivation S ⇒* δAw ⇒
    δαβw by rightmost derivations, where
  • γ = δα, and
  • either a is the first symbol of w, or w is ε and
    a is $.
75
4.7.2 Constructing LR(1) Sets of Items
76
4.7.2 Constructing LR(1) Sets of Items
S' → S        S → C C        C → c C | d
77
4.7.3 Canonical LR(1) Parsing Tables
78
4.7.4 Constructing LALR Parsing Tables
  • Example 4.60

79
4.8 Using Ambiguous Grammars
80
4.9 Parser Generators
  • We shall use the LALR parser generator Yacc as
    the basis of our discussion.
  • The first version of Yacc was created by S. C.
    Johnson.
  • Yacc is available as a command on the UNIX system.

81
4.9.1 The Parser Generator Yacc
  • A Yacc source program has three parts:
  • declarations
  • translation rules
  • supporting functions

82
4.9.1 The Parser Generator Yacc
83
4.9.2 Using Yacc with Ambiguous Grammars
84
4.9.3 Creating Yacc Lexical Analyzers with Lex
  • Replace the routine yylex() in the third part of
    the Yacc specification by the statement
    #include "lex.yy.c".

85
4.9.4 Error Recovery in Yacc