CSCI 435 Compiler Design - PowerPoint PPT Presentation

Provided by: OwenAst9

1
CSCI 435 Compiler Design
  • Week 6 Class 2
  • Section 3.2.3 to section 3.2.3.2
  • (253-260)
  • Ray Schneider

2
Topics of the Day
  • Data Flow Equations
  • Setting them up
  • Solving them

3
Data-flow Equations
  • a half-way automation of full symbolic
    interpretation
  • Stack Representation is replaced by a Collection
    of Sets
  • the Semantics of the Node is described more
    formally
  • Interpretation is replaced by a built-in and
    fixed propagation mechanism
  • two set variables are associated with each node N
    in the control flow graph (both start off empty)
  • the input set IN(N), and
  • the output set OUT(N)
  • they replace the stack representation and are
    computed by the propagation mechanism

4
Node(s) of the Control Flow Graph
  • for each Node
  • IN(N) input set
  • OUT(N) output set
  • Input / Output sets contain static information
    about the run-time situation at the node
  • Variable X is equal to 1 here
  • There has been no remote procedure call in any
    path from the routine entry to here
  • Definitions for the variable y reach here from
    nodes N1 and N2
  • Global variable line_count has been modified
    since routine entry
  • GEN(N) contains items added by the node
  • KILL(N) contains items removed by the node N

5
Interpretation mechanism is missing so ...
  • nodes that modify the stack size are not handled
    easily in setting up the data flow equations
  • ex. nodes occurring in expressions, such as a
    binary operator, which removes two entries from
    the stack and pushes one entry back on.
  • in practice this is dealt with by combining
    groups of control flow nodes such that there is
    no net stack effect
  • ex. for data-flow equation purposes this entire
    set of control flow nodes is considered a single
    node, with one IN, OUT, GEN, and KILL set.

6
Setting up the data-flow equations
$$IN(N) = \bigcup_{M \in \text{dynamic predecessors of } N} OUT(M)$$
$$OUT(N) = (IN(N) \setminus KILL(N)) \cup GEN(N)$$
  • actual data-flow equations are the same for all
    nodes
  • information at the ENTRANCE of a node N is equal
    to the union of the information at the exit of
    all dynamic predecessors to N
  • obviously true since no information is lost going
    from Node to Node
  • information at the EXIT of a node N is in
    principle equal to that at the entrance, except
    that all the information in the KILL set has been
    removed from it and all the information in the
    GEN set has been added to it. (The order of
    removing and adding is important: first the
    information being invalidated is removed, then
    the new information is added.)
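The two equations above can be written down almost literally with ordinary sets. The following is a minimal sketch, assuming the information items are plain strings; the node names, the items `d1` to `d3`, and the helper names `in_set` and `out_set` are hypothetical and chosen only for illustration.

```python
def in_set(node, preds, out):
    # IN(N): union of OUT(M) over all dynamic predecessors M of N.
    result = set()
    for m in preds.get(node, ()):
        result |= out[m]
    return result


def out_set(in_n, kill_n, gen_n):
    # OUT(N) = (IN(N) minus KILL(N)) union GEN(N).
    # The removal happens first, then the addition.
    return (in_n - kill_n) | gen_n


# Hypothetical graph fragment: nodes A and B both flow into node C.
preds = {"C": ["A", "B"]}
out = {"A": {"d1"}, "B": {"d2"}}

in_c = in_set("C", preds, out)
out_c = out_set(in_c, kill_n={"d1"}, gen_n={"d3"})
```

Because removal precedes addition, an item that appears in both KILL(N) and GEN(N) survives in OUT(N).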

7
example
  • Arrive at a node x := y with the IN set containing
    "Variable x is equal to 0 here".
  • THEN the KILL set of the node contains the item
    "Variable x is equal to * here" (where * stands
    for any value) and the GEN set contains
    "Variable x equals y here"
  • 1) all items in the IN set that are also in the
    KILL set are erased, i.e. "Variable x is equal to
    * here" subsumes "Variable x equals 0 here", so
    that item is erased
  • 2) next, the items from the GEN set are added,
  • 3) so the OUT set is "Variable x equals y here"
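The example relies on a KILL item that subsumes more specific items, so plain set difference is not enough. One possible way to model this is sketched below, assuming each item is a `(variable, value)` pair and a hypothetical wildcard `ANY` stands for "any value"; none of these names come from the slides.

```python
ANY = "*"  # hypothetical wildcard meaning "any value"


def subsumes(pattern, item):
    # A KILL pattern covers an item when the variable matches and the
    # pattern's value is either the wildcard or the same value.
    return pattern[0] == item[0] and pattern[1] in (ANY, item[1])


def transfer(in_items, kill_patterns, gen_items):
    # Step 1: erase every IN item covered by some KILL pattern.
    survivors = {
        item for item in in_items
        if not any(subsumes(k, item) for k in kill_patterns)
    }
    # Step 2: add the GEN items.
    return survivors | gen_items


# The assignment node from the example: IN says x equals 0.
out_items = transfer(
    in_items={("x", "0")},
    kill_patterns={("x", ANY)},
    gen_items={("x", "y")},
)
```

Information about other variables passes through the node untouched, since no KILL pattern covers it.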

8
Interpreting the Data-flow Equations
  • While the operators for set union ∪ and set
    difference \ are used, they really apply as
    information union and information difference
    operators
  • sometimes they behave as ordinary set union and
    difference and can be implemented with binary,
    Boolean representations (say bit sets), ex.
    "Variable V may be uninitialized here" is merged
    by set union (i.e. some predecessor node may
    leave V uninitialized), whereas "Variable V is
    guaranteed to have a value here" is merged by
    set intersection (i.e. all predecessor nodes
    must give V a value)
  • sometimes the information is more complicated,
    ex. "Variable x has a value in the range M to N",
    requiring ad hoc code to be designed and written
    that knows how to create, merge and examine such
    ranges
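The bit-set case can be sketched as follows, assuming one bit per variable in a hypothetical variable list `VARS`. A "may" property is merged with bitwise OR (union) and a "guaranteed" property with bitwise AND (intersection); the helper names are illustrative only.

```python
# Bit i stands for variable i in this hypothetical list.
VARS = ["x", "y", "z"]


def bits(*names):
    # Build a bit set containing the given variables.
    mask = 0
    for name in names:
        mask |= 1 << VARS.index(name)
    return mask


def names(mask):
    # List the variables whose bit is set.
    return [v for i, v in enumerate(VARS) if (mask >> i) & 1]


# OUT sets of two predecessor nodes, one pair per property.
may_be_uninit = [bits("x"), bits("y")]
has_value = [bits("x", "z"), bits("z")]

# "May be uninitialized": true if it holds on some incoming path.
merged_may = may_be_uninit[0] | may_be_uninit[1]
# "Guaranteed to have a value": true only if it holds on all paths.
merged_must = has_value[0] & has_value[1]
```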

9
Third Data-flow Equation
  • Zeroth Data Flow Equation
  • Defines the IN set of the first node of the
    routine as the set of information items
    established by the parameters of the routine
  • in particular each IN and INOUT parameter gives
    rise to an item 'Parameter Pi has a value here'

IN: all value parameters have values
KILL: all local information

10
Solving the Data-flow equations (Closure)
The first data-flow equation tells us how to obtain
the IN sets of all nodes when we know the OUT sets
of all nodes. The second data-flow equation tells us
how to obtain the OUT set of a node if we know
its IN set (and its GEN and KILL sets, but they
are constants).

Closure Algorithm for Solving the Data-Flow Equations
  • Data definitions
  • 1. Constant KILL and GEN sets for each node.
  • 2. Variable IN and OUT sets for each node.
  • Initializations
  • 1. The IN set of the top node is initialized with
    information established externally.
  • 2. For all other nodes N, IN(N) and OUT(N) are set
    to empty.
  • Inference rules
$$IN(N) = \bigcup_{M \in \text{dynamic predecessors of } N} OUT(M)$$
$$OUT(N) = (IN(N) \setminus KILL(N)) \cup GEN(N)$$

11
Implementation of the Closure Algorithm
  • implemented by traversing the control flow graph
    repeatedly, computing the IN and OUT sets of the
    nodes visited, until a complete traversal of the
    Control Flow Graph produces no further change
  • After that, the information is ready to be used
    for context checking and code generation.
  • NOTE: predecessors of a node are easy to find if
    the Control Flow Graph is doubly linked as shown
    earlier
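The repeated traversal can be sketched as a loop that stops after a full pass with no change. This is only an illustration under assumptions: the graph is given as a predecessor map, the items are strings, and the four-node loop with definitions `i@entry` and `i@body` is a made-up reaching-definitions example, not one from the text.

```python
def solve(nodes, preds, gen, kill, entry, entry_info):
    # Initialization: everything empty except the IN set of the top node.
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    in_sets[entry] = set(entry_info)

    changed = True
    while changed:  # stop after one complete pass without any change
        changed = False
        for n in nodes:
            # First equation: union of the predecessors' OUT sets
            # (plus the externally established items for the top node).
            new_in = set(entry_info) if n == entry else set()
            for m in preds.get(n, ()):
                new_in |= out_sets[m]
            # Second equation: remove KILL, then add GEN.
            new_out = (new_in - kill[n]) | gen[n]
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets


# Hypothetical loop: entry -> loop -> body -> loop, and loop -> exit.
nodes = ["entry", "loop", "body", "exit"]
preds = {"loop": ["entry", "body"], "body": ["loop"], "exit": ["loop"]}
gen = {"entry": {"i@entry"}, "loop": set(), "body": {"i@body"}, "exit": set()}
kill = {"entry": {"i@body"}, "loop": set(), "body": {"i@entry"}, "exit": set()}

in_sets, out_sets = solve(nodes, preds, gen, kill, "entry", set())
```

In this example both definitions of `i` reach the exit node, because the loop body may or may not have executed.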

12
Trivalent Logic for initialization of variables
Note:
  • 11 may or may not have a value
  • 10 definitely does not have a value
  • 01 definitely has a value
  • 00 an error
[Figure examples: x is guaranteed to have a value, y may or may not
have a value; x is guaranteed not to have a value, the combination
of 00 for y is an error]
13
if y > 0 then x := y else y := 0 end if
[Figure: control flow graph annotated with the states of x and y]
Note:
  • 11 may or may not have a value
  • 10 definitely does not have a value
  • 01 definitely has a value
  • 00 an error
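With this two-bit encoding, the states arriving along two paths can be merged with a bitwise OR, which is presumably why the encoding was chosen. The sketch below applies that idea to the if-statement on this slide; the initial state (x without a value, y with a value) and the helper names are assumptions made for illustration.

```python
# Two-bit states from the note above.
ERROR = 0b00
HAS_VALUE = 0b01
NO_VALUE = 0b10
MAYBE = 0b11


def merge(a, b):
    # Join of two incoming paths: bitwise OR of the states.
    return a | b


def assign(state, var):
    # After an assignment, the target variable definitely has a value.
    new_state = dict(state)
    new_state[var] = HAS_VALUE
    return new_state


# Assumed state on entry to the if-statement.
entry = {"x": NO_VALUE, "y": HAS_VALUE}

then_branch = assign(entry, "x")   # x := y
else_branch = assign(entry, "y")   # y := 0

after_if = {v: merge(then_branch[v], else_branch[v]) for v in entry}
```

After the if-statement x is in state 11, since only one branch assigns to it, while y stays in state 01.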
14
Summing Up a Little
  • Generally we visit all the nodes in Control Flow
    Graph order; this is not necessary but is
    generally logical and convenient
  • The data-flow algorithm in itself only collects
    information; it does no checking and does not
    generate error messages or warnings
  • Additional traversals are needed to use the
    information
  • ex. checking for uninitialized variables

15
Flex and Bison
  • Lex/Flex as we have seen previously is a program
    generator for lexical processing of character
    input streams
  • It accepts a high-level description for character
    string matching and produces a program which
    recognizes regular expressions
  • Lex-generated code recognizes the expressions in
    the input
  • The Lex source file associates regular
    expressions with program fragments provided by
    the user, which are executed as each expression
    appears in the input

16
General Format of Lex
ex.
    %%
    [ \t]+$   ;
    [ \t]+    printf(" ");

/* this lex input causes lex to ignore sequences
of 1 or more blanks or tabs up to the end of
line, and for blanks not followed by the end
of line it will substitute a single blank */

The general layout of a Lex source file is
    definitions
    %%
    rules
    %%
    user subroutines

User definitions and user subroutines are often
omitted
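For readers without Lex at hand, the effect described in the example (drop blanks and tabs at the end of a line, replace other runs by a single blank) can be imitated with Python's `re` module. This is only an approximation: Lex handles both rules in a single scan, whereas this sketch makes two passes, and the function name `squeeze_blanks` is made up.

```python
import re


def squeeze_blanks(text):
    # Rule 1: drop runs of blanks or tabs that reach the end of a line.
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    # Rule 2: replace every remaining run by a single blank.
    return re.sub(r"[ \t]+", " ", text)


cleaned = squeeze_blanks("a \t b  \nc\t\td\t\n")
```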
17
Uses of Lex
  • It can be used alone for simple transformations
    of files, or for analysis and statistics
    gathering at the lexical level
  • Lex generates lexical analyzers that are easy to
    interface with Yacc/Bison
  • Lex programs recognize only regular expressions
  • Yacc writes parsers that accept a large class of
    context free grammars, but requires a lower level
    analyzer to recognize the input tokens

18
Combining Lex and Yacc
  • Lex is used to partition the input stream and
    Yacc (the parser generator) assigns structure to
    the remaining pieces

[Figure: lexical rules → Lex → yylex; grammar rules → Yacc → yyparse;
Input → yylex → yyparse → Parsed Input]
Note: all Yacc variables begin with 'yy', so you
can avoid collisions with the user generated code.

19
Yacc Specifications
  • Generally the Lexical Analyzer (ex. lex.yy.c, the
    file generated by Lex) is included as part of
    the Yacc Specification File
  • The full Yacc Specification File looks like
        declarations
        %%
        rules
        %%
        programs
  • Where have we seen this before? Structure is
    similar to Lex input, but what goes in the
    sections is different.

20
Grammar Rules and Actions
Grammar Rules
The smallest legal Yacc Specification is
    %%
    rules
Grammar Rules look like
    A : BODY ;
where A is a non-terminal name, BODY is a sequence
of zero or more names and literals, and the colon
and the semicolon are Yacc punctuation.

Actions Associated with Rules
With each grammar rule, the user may associate
actions to be performed each time the rule is
recognized in the input process. An action is
specified by one or more statements enclosed in
curly braces '{' and '}'.
21
Examples
  • A : '(' B ')' { hello(1, "abc"); }
  • XXX : YYY ZZZ { printf("a message\n");
    flag = 25; }
  • To facilitate easy communication between the
    actions and the parser, Yacc uses the special '$'
    symbol. '$$' is a pseudo-variable for the left
    hand side of the grammar rule, and $1, $2, etc.
    are pseudo-variables for the elements of the rhs
  • A : B C D ; here $2 has the value returned by C,
    etc.
  • default is $$ = $1, the value of the first element
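The role of the pseudo-variables can be pictured with a value stack: on a reduction, the values of the right-hand-side elements are popped, the action computes the value of the left-hand side, and that value is pushed. The following is a simplified sketch of that idea, not Yacc's actual generated code; `reduce_rule` and the sample values are hypothetical, and the list index `vals[i - 1]` plays the part of `$i`.

```python
def reduce_rule(value_stack, rhs_len, action=None):
    # Pop the values of the right-hand-side elements ($1, $2, ...).
    start = len(value_stack) - rhs_len
    vals = value_stack[start:]
    del value_stack[start:]
    if action is not None:
        result = action(vals)      # the action computes $$
    else:
        # Default action: $$ = $1 (None for an empty right-hand side).
        result = vals[0] if vals else None
    value_stack.append(result)     # push the value of the left-hand side
    return result


# A : B C D ; with an action that returns $2, the value of C.
stack = ["value of B", "value of C", "value of D"]
picked = reduce_rule(stack, 3, action=lambda vals: vals[1])

# Same rule without an action: the default yields $1.
default_stack = ["value of B", "value of C", "value of D"]
defaulted = reduce_rule(default_stack, 3)
```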

22
How the parser works
  • Yacc turns the specification file into a C
    program which parses the input according to the
    specification given.
  • The parser that is produced consists of a Finite
    State Machine with a stack with a look ahead
    token. The current state is the one on top of
    the stack.
  • The machine has only four actions: shift, reduce,
    accept and error
  • We'll LEAVE YACC THERE. You need to read about it
    so you can use it.
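To make the four actions concrete, here is a small table-driven sketch of a state machine with a stack and one look-ahead token. The grammar (S : '(' S ')' ; S : 'x' ;) and the hand-built tables are illustrative assumptions; a real Yacc-generated parser is a C program with much larger, generated tables.

```python
# Hand-built tables for the hypothetical grammar
#   rule 0:  S : '(' S ')' ;
#   rule 1:  S : 'x' ;
# "$" marks the end of the input.
ACTION = {
    (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
    (3, ")"): ("reduce", 1), (3, "$"): ("reduce", 1),
    (4, ")"): ("shift", 5),
    (5, ")"): ("reduce", 0), (5, "$"): ("reduce", 0),
}
GOTO = {(0, "S"): 1, (2, "S"): 4}
RULES = [("S", 3), ("S", 1)]  # (left-hand side, length of right-hand side)


def parse(tokens):
    stack = [0]                     # the current state is on top
    tokens = list(tokens) + ["$"]
    pos = 0
    trace = []
    while True:
        lookahead = tokens[pos]
        kind, arg = ACTION.get((stack[-1], lookahead), ("error", None))
        trace.append(kind)
        if kind == "shift":         # push the new state, consume the token
            stack.append(arg)
            pos += 1
        elif kind == "reduce":      # pop the rhs states, then take the goto
            lhs, length = RULES[arg]
            del stack[len(stack) - length:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:                       # accept or error ends the parse
            return kind == "accept", trace


ok, trace = parse("(x)")
bad, _ = parse("(x")
```

For the input `(x)` the trace is shift, shift, reduce, shift, reduce, accept; the truncated input `(x` ends in the error action.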

23
Homework for Week 8
  • Bison Familiarization
  • Read the entire 39 pages of "A Compact Guide To
    Lex and Yacc" // you can skim through it the
    first time
  • THEN concentrate first on getting the lex example
    on page 10 running
  • THEN after you have that running go on to
    Practice, Part 1 and strive to get the primitive
    calculator running (pages 14 through 17)

24
References
  • Text: Modern Compiler Design (figures)
  • Lex: A Lexical Analyzer Generator, by M.E. Lesk
    and E. Schmidt
  • Yacc: Yet Another Compiler-Compiler, by Stephen C.
    Johnson
  • see http://dinosaur.compilertools.net/yacc/index.html
    and http://dinosaur.compilertools.net/lex/index.html