CSCI 435 Compiler Design - PowerPoint PPT Presentation



1
CSCI 435 Compiler Design
  • Week 5 Class 1
  • Sections 2.2.5.5 to 2.3, skipping 2.2.5.8
  • (pp. 165-185)
  • Ray Schneider

2
Topics of the Day
  • Finish off Chapter 2
  • LR(1)
  • LALR(1) parsing
  • Yacc/Bison
  • Conclusion and Summary

3
LR(1) Parsing
  • Conflict resolution by FOLLOW sets (SLR(1))
    doesn't work as well as we might wish, since it
    replaces the look-ahead of a single item of N in
    an LR state by the FOLLOW set of N, i.e. the union
    of all the look-aheads of all alternatives of N in
    all states.
  • LR(1) is more discriminating, keeping a
    look-ahead set for each item to resolve conflicts
    when a reduce item has been reached
  • Increases the strength of the parser, but also
    the size of the parse tables
  • We will demo LR(1) using the grammar of fig. 2.95,
    which is neither LL(1) nor SLR(1):
      S → A | x b
      A → a A b | B
      B → x
4
not LL(1) ...
  • The grammar S → A | x b, A → a A b | B, B → x
    produces the language {xb} ∪ {aⁿ x bⁿ | n ≥ 0}
  • this grammar is not LL(1): x is in FIRST(B), so
    it is also in FIRST(A), and S exhibits a
    FIRST/FIRST conflict on x
  • it is not SLR(1) either, since SLR(1) bases its
    decision to reduce an item N → α• on the FOLLOW
    set of N
  • S2 contains both a shift item on b, S → x•b, and
    a reduce item B → x•; since b is in FOLLOW(B),
    this is a shift-reduce conflict
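A minimal C sketch of the FIRST/FOLLOW computation behind the two claims above, assuming a hypothetical encoding: each rule is a pair of strings (uppercase letters are non-terminals, lowercase letters are tokens) and token sets are bit masks, with T_EOF standing for $. The names `compute_first_follow`, `first_set` and `follow_set` are illustrative only, and the sketch relies on no rule of this grammar deriving ε.

```c
#include <assert.h>
#include <string.h>

/* Grammar of fig. 2.95: S -> A | x b,  A -> a A b | B,  B -> x.
   Uppercase letters are non-terminals, lowercase letters are tokens. */
static const char *const rules[][2] = {
    {"S", "A"}, {"S", "xb"}, {"A", "aAb"}, {"A", "B"}, {"B", "x"},
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* Token sets are bit masks; T_EOF stands for the end-of-file token $. */
enum { T_A = 1, T_B = 2, T_X = 4, T_EOF = 8 };

unsigned first_set[3], follow_set[3];   /* indexed by nt_index(): S, A, B */

static int is_nt(char c) { return c >= 'A' && c <= 'Z'; }

static int nt_index(char c) {
    assert(c == 'S' || c == 'A' || c == 'B');
    return c == 'S' ? 0 : c == 'A' ? 1 : 2;
}

static unsigned token_bit(char c) { return c == 'a' ? T_A : c == 'b' ? T_B : T_X; }

/* No rule here derives the empty string, so FIRST of a sequence
   is FIRST of its first symbol. */
static unsigned first_of(char c) {
    return is_nt(c) ? first_set[nt_index(c)] : token_bit(c);
}

/* Fixed-point iteration: keep applying the rules until no set grows. */
void compute_first_follow(void) {
    memset(first_set, 0, sizeof first_set);
    memset(follow_set, 0, sizeof follow_set);
    follow_set[nt_index('S')] = T_EOF;      /* $ follows the start symbol */
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int r = 0; r < NRULES; r++) {
            int lhs = nt_index(rules[r][0][0]);
            const char *body = rules[r][1];
            unsigned f = first_set[lhs] | first_of(body[0]);
            if (f != first_set[lhs]) { first_set[lhs] = f; changed = 1; }
            for (int i = 0; body[i] != '\0'; i++) {
                if (!is_nt(body[i])) continue;
                int n = nt_index(body[i]);
                /* FIRST of the next symbol, or FOLLOW(lhs) if body[i] ends the rule */
                unsigned add = body[i + 1] ? first_of(body[i + 1]) : follow_set[lhs];
                if ((follow_set[n] | add) != follow_set[n]) {
                    follow_set[n] |= add;
                    changed = 1;
                }
            }
        }
    }
}
```

For this grammar it finds x in both FIRST(A) and FIRST(S), and FOLLOW(B) = {b, $}; the b in FOLLOW(B) is what makes S2 a shift-reduce conflict for SLR(1).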

5
SLR(1) automaton for the grammar of fig. 2.95
[figure: state S2 shows a shift-reduce conflict]
6
LR(1) keeps a specific look ahead
  • The LR(1) technique does not rely on FOLLOW sets,
    but keeps a specific look ahead with each item
  • We write N → α•β {σ}, where σ is the set of
    tokens that can follow this specific item
  • When the dot reaches the end of the item,
    N → αβ• {σ}, it can be reduced only if the
    look-ahead token is in σ at that moment;
    otherwise the item is ignored
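A small C sketch of an LR(1) item carrying its own look-ahead set σ, under the hypothetical assumptions that tokens are bits in an `unsigned` mask and that productions are identified by an index. In the LR(1) automaton for the fig. 2.95 grammar, S2 holds S → x•b {$} and B → x• {$}, so the reduce applies only on $ and no longer clashes with the shift on b.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical token encoding: one bit per token. */
enum { TOK_A = 1u << 0, TOK_B = 1u << 1, TOK_X = 1u << 2, TOK_EOF = 1u << 3 };

/* An LR(1) item N -> alpha . beta {sigma}. */
typedef struct {
    int rule;            /* index of the production N -> alpha beta */
    int dot;             /* position of the dot in the right-hand side */
    int rhs_len;         /* length of alpha beta */
    unsigned lookahead;  /* sigma: the tokens that may follow this item */
} LR1Item;

/* A completed item may be reduced only if the current token is in sigma;
   otherwise the item is ignored. */
bool can_reduce(const LR1Item *it, unsigned token) {
    assert(it->dot >= 0 && it->dot <= it->rhs_len);
    return it->dot == it->rhs_len && (it->lookahead & token) != 0;
}
```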

7
Rules for determining look aheads
  • Look-ahead sets of existing items do not change
  • When a new item is created, a new look-ahead set
    must be determined
  • TWO SITUATIONS
  • When creating the initial item set S0: its items
    get the look-ahead set {$}, since the end-of-file
    token $ is the only token that can follow the
    start symbol
  • When doing ε-moves: the prediction rule creates
    new items for the alternatives of N in the
    presence of items of the form P → α•Nβ {σ}; the
    look-ahead set of each new item is FIRST(βσ).
    If FIRST(β) does not include ε, then
    FIRST(βσ) = FIRST(β); if β can produce ε, then
    FIRST(βσ) contains all the tokens in FIRST(β)
    excluding ε, plus all the tokens in σ
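The two rules can be sketched in C as an LR(1) closure over the fig. 2.95 grammar. Assumptions: rules are numbered 0 to 4 in the order S → A, S → x b, A → a A b, A → B, B → x; the FIRST sets are hard-coded rather than computed; and since no non-terminal of this grammar derives ε, `nullable()` always returns 0, although `first_beta_sigma()` still shows where σ is added when β can vanish. When the same item is predicted twice within one closure, its look-aheads are merged (a common implementation choice). All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical rule numbering for the fig. 2.95 grammar:
   0: S -> A   1: S -> x b   2: A -> a A b   3: A -> B   4: B -> x */
static const char lhs[] = "SSAAB";
static const char *const rhs[] = {"A", "xb", "aAb", "B", "x"};
enum { NRULES = 5, MAXITEMS = 32 };
enum { T_A = 1, T_B = 2, T_X = 4, T_EOF = 8 };   /* token sets as bit masks */

typedef struct { int rule, dot; unsigned la; } Item;   /* N -> alpha . beta {la} */

static int is_nt(char c) { return c >= 'A' && c <= 'Z'; }
static unsigned tok(char c) { return c == 'a' ? T_A : c == 'b' ? T_B : T_X; }

/* Hard-coded: FIRST(S) = FIRST(A) = {a, x}, FIRST(B) = {x}. */
static unsigned first_nt(char n) { return n == 'B' ? T_X : (T_A | T_X); }
/* No non-terminal of this grammar derives the empty string. */
static int nullable(char n) { (void)n; return 0; }

/* FIRST(beta sigma): FIRST(beta), plus sigma if all of beta can produce epsilon. */
static unsigned first_beta_sigma(const char *beta, unsigned sigma) {
    unsigned set = 0;
    for (; *beta != '\0'; beta++) {
        if (!is_nt(*beta)) return set | tok(*beta);
        set |= first_nt(*beta);
        if (!nullable(*beta)) return set;
    }
    return set | sigma;   /* beta is empty or nullable: sigma can follow */
}

/* Add an item, or merge its look-ahead into an existing item with the same
   rule and dot. Returns 1 if the item set changed. */
static int add_item(Item *set, int *n, Item it) {
    for (int i = 0; i < *n; i++) {
        if (set[i].rule == it.rule && set[i].dot == it.dot) {
            unsigned old = set[i].la;
            set[i].la |= it.la;
            return set[i].la != old;
        }
    }
    assert(*n < MAXITEMS);
    set[(*n)++] = it;
    return 1;
}

/* LR(1) closure: for each item P -> alpha . N beta {sigma}, predict
   N -> . gamma {FIRST(beta sigma)} for every alternative gamma of N. */
void closure(Item *set, int *n) {
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < *n; i++) {
            const char *after_dot = rhs[set[i].rule] + set[i].dot;
            if (!is_nt(*after_dot)) continue;   /* token after the dot, or dot at end */
            unsigned la = first_beta_sigma(after_dot + 1, set[i].la);
            for (int r = 0; r < NRULES; r++)
                if (lhs[r] == *after_dot)
                    changed |= add_item(set, n, (Item){r, 0, la});
        }
    }
}
```

Starting from the kernel A → a•Ab {$} (state S3), the closure adds A → •aAb {b}, A → •B {b} and B → •x {b}: β is b here, so FIRST(βσ) = {b} and the $ of the kernel item does not propagate. Starting from S → •A {$} and S → •xb {$}, every predicted item gets {$}, as the first rule requires for S0.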

8
LR(1) automaton
[figure: LR(1) automaton for the fig. 2.95 grammar]
9
What can we say about LR(1)
  • More discriminating than SLR(1)
  • So strong that any language that CAN be parsed
    from left to right with a one-token look-ahead in
    linear time can be parsed using LR(1)
  • LR(1) is the strongest possible left-to-right
    parsing method since, as Knuth demonstrated in
    1965, the set of LR items implements the best
    possible breadth-first search for handles
  • BUT LR(1) parsing tables are one or two orders of
    magnitude larger than SLR(1) tables
  • TANSTAAFL: there ain't no such thing as a free
    lunch

10
LALR(1) Parsing
  • LR(1) state diagram includes many similar states,
    i.e. with almost identical item sets, identical
    except for look-ahead sets.
  • Examples are S3/S10, S4/S9, S6/S11 and S8/S12;
    if we ignore the look-aheads, what remains is
    called the CORE of the LR(1) state
  • CORES of LR(1) states correspond to LR(0) states
  • LR(1) states are split up versions of LR(0)
    states based on the look ahead

11
Combining CORE Sets Reduces States to SLR(1)/LR(0)
  • Combining States
  • S4 and S9
  • S3 and S10
  • S6 and S11
  • S8 and S12
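A C sketch of the state combination, assuming each state stores its items in the same canonical order (so cores can be compared position by position) and a hypothetical rule numbering 0 to 4 for S → A, S → x b, A → a A b, A → B, B → x. Combining S3 and S10 keeps the shared core and unions the look-aheads, so A → a•Ab ends up with {b, $}.

```c
#include <assert.h>
#include <stdbool.h>

enum { T_A = 1, T_B = 2, T_X = 4, T_EOF = 8 };   /* token sets as bit masks */
enum { MAXITEMS = 16 };

typedef struct { int rule, dot; unsigned la; } Item;   /* N -> alpha . beta {la} */
typedef struct { int n; Item items[MAXITEMS]; } State;

/* Same core: the same (rule, dot) pairs, ignoring look-aheads.
   Assumes both states list their items in the same canonical order. */
bool same_core(const State *s, const State *t) {
    if (s->n != t->n) return false;
    for (int i = 0; i < s->n; i++)
        if (s->items[i].rule != t->items[i].rule || s->items[i].dot != t->items[i].dot)
            return false;
    return true;
}

/* LALR(1) combination: keep the core, take the union of the look-ahead sets. */
void merge_into(State *s, const State *t) {
    assert(same_core(s, t));
    for (int i = 0; i < s->n; i++)
        s->items[i].la |= t->items[i].la;
}
```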

12
Resulting LALR(1) Automaton for fig. 2.95 Grammar
[figure: LALR(1) automaton for S → A | x b, A → a A b | B, B → x,
with states S0, S1, S2, S3,10, S4,9, S5, S6,11, S7 and S8,12]
13
Summing up LALR(1)
  • Reduces the number of LR(1) states to that of
    the LR(0)/SLR(1) automaton by combining states
    with identical CORES
  • Most popular method in use today
  • Combines most of the power of LR(1) with the
    efficiency and memory requirements of LR(0)
  • State combination cannot cause shift/reduce
    conflicts (see p. 172), though it can cause
    reduce/reduce conflicts

14
Making a grammar LR(1) or not
  • One still encounters grammars that are not LR(1),
    usually because the grammar is ambiguous
  • Example: the dangling else problem
      if_statement → 'if' '(' expression ')' statement
                   | 'if' '(' expression ')' statement 'else' statement
      statement → ... | if_statement | ...
  • After 'if' '(' expression ')' statement, with
    'else' as the look-ahead, a state contains
  • item 1:
      if_statement → 'if' '(' expression ')' statement • {... 'else' ...}
  • item 2:
      if_statement → 'if' '(' expression ')' statement • 'else' statement
  • item 1 calls for a reduce and item 2 for a shift
    of 'else': thus we see a shift/reduce conflict
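C itself has this ambiguity, and a quick C sketch shows which way it is resolved: the `else` below belongs to the inner `if`, the pairing that the always-shift rule on the next slide produces. The function name `dangling_else` is hypothetical; compilers may warn about the unbraced `else`, which is the point of the example.

```c
#include <assert.h>

/* Returns which branch ran: 0 = none, 1 = inner then-branch, 2 = else-branch.
   The else is deliberately left unbraced: C pairs it with the nearest
   preceding if that has no else, i.e. with "if (b)". */
int dangling_else(int a, int b) {
    int ran = 0;
    if (a)
        if (b)
            ran = 1;
        else
            ran = 2;
    return ran;
}
```

With a = 1 and b = 0 the else-branch runs; with a = 0 nothing runs, which would not be the case if the else were paired with the outer if.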

15
Resolving shift-reduce conflicts
  • traditionally resolved similarly to conflict
    resolution in lexical analyzers: the longest
    possible sequence of grammar symbols is taken for
    reduction. This is easy to implement: in a
    shift/reduce conflict, do the shift
  • in the case of the dangling 'else' this pairs the
    'else' with the latest 'if' without an 'else', as
    stipulated in the C manual
  • another useful technique is precedence between
    tokens, which can be used only if the reduce item
    in the conflict ends in a token followed by at
    most one non-terminal
  • P → α•tβ {...}      // the shift item
  • Q → γuR• {...t...}  // the reduce item, where R is
    empty or one non-terminal
  • one of three actions
  • 1) u has higher precedence than t → reduce Q
  • 2) t has higher precedence than u → shift, continuing P
  • 3) same precedence → shift (see exercise 2.55)
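The three-way decision can be sketched in C as below, assuming precedences are plain integers where larger means binds tighter; `resolve_shift_reduce` is a hypothetical name. Case 3 follows the slide (shift). Real yacc/bison instead consult the associativity declared for the tokens when precedences are equal: `%left` reduces, `%right` shifts and `%nonassoc` reports an error.

```c
#include <assert.h>

typedef enum { SHIFT, REDUCE } Action;

/* Shift item:  P -> alpha . t beta           (t is the look-ahead token)
   Reduce item: Q -> gamma u R . {... t ...}  (u is its last token, R empty or
                                               one non-terminal)
   Larger numbers mean higher precedence (assumption). */
Action resolve_shift_reduce(int prec_u, int prec_t) {
    if (prec_u > prec_t) return REDUCE;   /* 1) u binds tighter: reduce Q */
    if (prec_t > prec_u) return SHIFT;    /* 2) t binds tighter: shift, continuing P */
    return SHIFT;                         /* 3) same precedence: shift, as on the slide */
}
```

For a - b * c, with reduce item expression → expression '-' expression • and look-ahead '*', u is '-' and t is '*'; '*' has higher precedence, so the parser shifts and the multiplication is grouped first.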

16
Resolving reduce-reduce conflicts
  • corresponds to the situation in the lexical
    analyzer where two patterns match with the same
    length, so the longest token still matches more
    than one pattern
  • Usual resolution: the textually first grammar
    rule in the parser generator input wins
  • easy to implement
  • generally satisfactory
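A C sketch of that rule, assuming the conflicting reductions are identified by the index of their production in the parser generator input (lower index = textually earlier); the name `resolve_reduce_reduce` is hypothetical.

```c
#include <assert.h>

/* Among the completed items that can be reduced on the current token,
   pick the production that comes first in the grammar input, i.e. the
   lowest rule index. Returns -1 if there is no candidate. */
int resolve_reduce_reduce(const int *rules, int n) {
    int best = -1;
    for (int i = 0; i < n; i++) {
        assert(rules[i] >= 0);
        if (best < 0 || rules[i] < best) best = rules[i];
    }
    return best;
}
```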

17
A traditional bottom up parser generator
  • yacc/bison: yacc started as a UNIX utility in the
    mid-1970s; it is an LALR(1) parser generator
  • problem: yacc generates pre-ANSI (K&R) C rather
    than ANSI C; bison rectifies this problem
  • Unlike in top-down parsing, in bottom-up parsing
    it is unsafe to execute the code associated with
    an alternative until the entire alternative has
    been recognized
  • yacc associates exactly one parameter with each
    member of the alternative, $1, $2, ..., $n,
    including terminal symbols; the parameter $$ is
    associated with the rule's non-terminal
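A C sketch of how these parameters map onto a value stack kept in parallel with the parse stack, under simplifying assumptions: values are `int`s instead of the %union type, the function names are hypothetical, and the action for expression → expression '-' term computes a number rather than building a tree. The action runs only at the reduce, when all three values of the alternative are on the stack; $1 is the deepest of the three, $3 the top, and the three are replaced by the single value $$.

```c
#include <assert.h>

/* Hypothetical semantic value stack; yacc generates its own,
   holding values of the %union type. */
enum { STACK_MAX = 64 };
static int value_stack[STACK_MAX];
static int vtop = 0;   /* number of values on the stack */

void push_value(int v) {
    assert(vtop < STACK_MAX);
    value_stack[vtop++] = v;
}

/* $k for a rule whose right-hand side has n members: the k-th of the top n values. */
static int dollar(int k, int n) { return value_stack[vtop - n + k - 1]; }

/* Reduce by expression -> expression '-' term, with the action $$ = $1 - $3.
   Each member, including the terminal '-', has one value on the stack. */
void reduce_minus(void) {
    assert(vtop >= 3);
    int result = dollar(1, 3) - dollar(3, 3);
    vtop -= 3;            /* pop $1, $2, $3 */
    push_value(result);   /* push $$ for the non-terminal */
}
```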

18
Example yacc code
/* Declarations */
%{
#include <stdio.h>
#include "tree.h"
int yylex(void);
void yyerror(const char *msg);
%}

%union {
    struct expr *expr;
    struct term *term;
}

%type <expr> expression
%type <term> term

%token IDENTIFIER

%start main

%%
/* Start of grammar proper: grammar rules */

main:
    expression      {print_expr($1); printf("\n");}
;

expression:
    expression '-' term
        {$$ = new_expr(); $$->type = '-'; $$->expr = $1; $$->term = $3;}
|   term
        {$$ = new_expr(); $$->type = 'T'; $$->term = $1;}
;

term:
    IDENTIFIER      {$$ = new_term(); $$->type = 'I';}
;

%%
/* Start of auxiliary C code */
19
Auxiliary Code for yacc parser
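#include <stdio.h>
#include "lex.h"

int main(void) {
    start_lex();
    yyparse();          /* routine generated by yacc */
    return 0;
}

int yylex(void) {
    get_next_token();
    return Token.class;
}

void yyerror(const char *msg) {   /* called by yyparse() on a syntax error */
    fprintf(stderr, "%s\n", msg);
}

Very High Level View of Analysis Techniques
(fig. 2.110)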
20
Summary
  • Lexical Analysis and Syntax Analysis
  • input characters → tokens, tokens → parse tree (AST)
  • the Abstract Syntax Tree (AST) is a version of the
    parse tree retaining only the semantically
    important nodes
  • FSAs can be used to automate the process
  • Different methods for different grammars
  • Parsing
  • two ways: top-down and bottom-up
  • Top-down (written manually or generated
    automatically)
  • a recursive descent parser works for a relatively
    small subset of grammars; generated top-down
    parsers use precomputation and generate
    unambiguous transition tables for LL(1) grammars
  • Bottom-up methods, generally automated,
    repeatedly identify a handle
  • LR(0), SLR(1), LR(1) and LALR(1) grammars were
    covered, the last being the most popular in
    current use

21
Homework for Week 7
  • Get the Lex/Flex generated code from page 95,
    figure 2.41, to run under Visual C
  • Hint: resolve undeclared-function problems by
    adding #include <appropriate library> to the
    generated C file, e.g. exit(), malloc(), realloc()
    and free() are declared in <stdlib.h>, and
    strcpy() in <string.h>
  • You can include the default main() by making the
    line read #define YY_MAIN 1
  • Then extend the default main() to call
    get_next_token() and print out appropriate results

22
References
  • Text: Modern Compiler Design