Compilers
  • Introduction
  • Basic Compiler Function
  • Lexical Analysis
  • Syntactic Analysis
  • Operator-Precedence Parsing
  • Shift Reduce Parsing
  • Recursive Descent Parsing

N.K. Srinath srinath_nk_at_yahoo.com 1
RVCE

Lexical Analysis
Lexical analysis involves scanning the program to be compiled.
Scanned items are usually recognized directly as single tokens.
Alternatively, these tokens could be defined as a part of the grammar.
Example:
<id> ::= <letter> | <id><letter> | <id><digit>
<letter> ::= A | B | C | ... | Z
<digit> ::= 0 | 1 | 2 | ... | 9
In such a case the scanner would recognize as tokens the single
characters A, B, ..., Z, 0, 1, ..., 9. The parser could then interpret
a sequence of such characters as the language construct <id>.
Features
  • The length of identifiers could be restricted.
  • The scanner generally recognizes both single-character and
    multiple-character tokens directly.
  • The scanner output consists of a sequence of tokens.
  • Each token can be considered to have a fixed-length code.
  • For a Pascal grammar, the integer code for each token is
    listed in a table.
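As a rough illustration of the scanner output described above, the following Python sketch turns a statement into a sequence of (code, specifier) pairs. The integer codes and the small token set are invented for the example (they are not the codes from the Pascal table), and `tokenize` is a hypothetical helper name.

```python
import re

# Hypothetical integer codes; not the codes from the Pascal token table.
TOKEN_CODES = {"READ": 1, "(": 2, ")": 3, ";": 4, ":=": 5, "id": 6, "int": 7}

# One token per match: a word, an integer, or an operator/delimiter.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<word>[A-Za-z][A-Za-z0-9]*)|(?P<int>\d+)|(?P<op>:=|[();]))"
)

def tokenize(source):
    """Return a list of (code, specifier) pairs for one source line."""
    source = source.rstrip()
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        pos = match.end()
        if match.group("word"):
            word = match.group("word").upper()
            if word in TOKEN_CODES:                 # keyword
                tokens.append((TOKEN_CODES[word], None))
            else:                                   # identifier: keep its name
                tokens.append((TOKEN_CODES["id"], word))
        elif match.group("int"):
            tokens.append((TOKEN_CODES["int"], int(match.group("int"))))
        else:
            tokens.append((TOKEN_CODES[match.group("op")], None))
    return tokens

print(tokenize("READ (VALUE);"))
```

Keywords and delimiters need only their code, while identifiers and integers carry a specifier (here the name or the value) alongside the code.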

Issues in the Lexical Analyzer
  • The lexical analyzer has to recognize the longest possible
    string.
  • Example: for the identifier newva the scanner sees n, ne, new,
    newv, newva and must return newva.
  • There is no end delimiter for the tokens defined.
  • Normally we don't return a comment as a token; comments are
    processed only by the lexical analyzer.
  • The symbol table holds information about tokens.
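The longest-match rule can be sketched in a few lines of Python. This is only an illustration under simple assumptions: identifiers are runs of letters and digits, and the operator set `OPERATORS` is a made-up sample, not taken from the slides.

```python
# Sample operator set (an assumption for this sketch).
OPERATORS = {":", ":=", "<", "<=", "<>"}

def longest_match(text, pos):
    """Return the longest lexeme (identifier or operator) starting at pos."""
    if text[pos].isalpha():
        end = pos
        # Keep extending: n, ne, new, newv, newva ...
        while end < len(text) and text[end].isalnum():
            end += 1
        return text[pos:end]
    best = ""
    for op in OPERATORS:
        # Prefer ":=" over ":" and "<>" over "<" when both match.
        if text.startswith(op, pos) and len(op) > len(best):
            best = op
    return best

print(longest_match("newva := 1", 0))   # identifier
print(longest_match("newva := 1", 6))   # operator
```

Because there is no end delimiter, the scanner only knows the identifier is complete when it meets a character that cannot extend it.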
  • Some scanners enter identifiers directly into a symbol table.
    The token specifier for an identifier may then be a pointer to
    the symbol table entry for that identifier.
  • The entire program is not scanned at one time.
  • The scanner usually operates as a procedure that is called by
    the parser when it needs another token.
  • The scanner is responsible for reading the lines of the source
    program and possibly for printing the source listing.
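A minimal Python sketch of a scanner that works as a procedure called on demand is given here. It assumes Pascal-style `{...}` comments that fit on one line, and the class and method names (`Scanner`, `next_token`, `listing`) are hypothetical. It reads a source line only when it has run out of tokens, keeps every line for the listing, and drops comments from the token stream.

```python
import re

class Scanner:
    def __init__(self, lines):
        self._lines = iter(lines)   # source is consumed lazily, line by line
        self._pending = []          # tokens of the current line not yet returned
        self.listing = []           # source listing, comments included

    def next_token(self):
        """Return the next token, or None at end of input."""
        while not self._pending:
            line = next(self._lines, None)
            if line is None:
                return None
            self.listing.append(line)
            # Remove single-line {...} comments before tokenizing.
            code = re.sub(r"\{.*?\}", " ", line)
            self._pending = re.findall(r"[A-Za-z]\w*|\d+|:=|\S", code)
        return self._pending.pop(0)

scanner = Scanner(["SUM := 0; {initialise}", "READ(VALUE)"])
print(scanner.next_token())   # the caller (the parser) asks for one token
print(scanner.listing)        # only the first line has been read so far
```

The caller never sees the whole program at once; each call to `next_token` advances the scan just far enough to produce one token.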
  • The scanner ignores comments, except for printing them as part
    of the output listing.
  • The scanner must take the characteristics of the language into
    account.
  • Example: FORTRAN
  • Columns 1-5: statement number
  • Column 6: continuation of line
  • Columns 7-72: program statement
  • Pascal: blanks function as delimiters for tokens
  • A statement can be continued freely
  • The end of a statement is indicated by ; (semicolon)
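The FORTRAN column layout above can be sketched as a small Python helper. `split_fortran_line` is a hypothetical name. The sketch assumes the usual fixed-form convention that any character other than a blank or 0 in column 6 marks a continuation, and it ignores comment lines.

```python
def split_fortran_line(line):
    """Split a fixed-form FORTRAN line into (label, is_continuation, statement)."""
    line = line.ljust(72)                       # pad short lines with blanks
    label = line[0:5].strip()                   # columns 1-5: statement number
    is_continuation = line[5] not in (" ", "0") # column 6: continuation mark
    statement = line[6:72].strip()              # columns 7-72: program statement
    return label, is_continuation, statement

print(split_fortran_line("   10 DO 20 I = 1, 100"))
print(split_fortran_line(" " * 5 + "1" + "     + B"))
```

A Pascal scanner needs no such column logic, since blanks delimit tokens and the semicolon ends a statement.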
  • Scanners must take into account the rules for the formation of
    tokens.
  • Example: 'READ' should not be considered a keyword, because it
    is within quotes; i.e., the contents of a quoted string should
    not be scanned for keywords or other tokens.
  • Blanks are significant within a quoted string.
  • Blanks play different roles in different languages.
  • Example 1: FORTRAN statement
  • DO 10 I = 1, 100
  • DO is a keyword, I is an identifier, and 10 is the statement
    number.
Statement: DO 10 I = 1
Here DO10I is an identifier: DO10I = 1
Note: blanks are ignored in FORTRAN statements, and hence this is an
assignment statement. In this case the scanner must look ahead to see
whether there is a comma (,) before it can decide on the proper
identification of the characters DO.
Example 2: In FORTRAN, keywords may also be used as identifiers. Words
such as IF, THEN, and ELSE might represent either keywords or variable
names.
IF (THEN .EQ. ELSE) THEN
    IF = THEN
ELSE
    THEN = IF
ENDIF
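The look-ahead needed for Example 1 can be sketched in Python as follows. This is a simplified illustration, not a full FORTRAN scanner: `classify_do_statement` is a hypothetical name, string literals are ignored, and a comma counts only when it is outside parentheses.

```python
def classify_do_statement(stmt):
    """Classify a FORTRAN statement that starts with the letters DO."""
    text = stmt.replace(" ", "").upper()    # blanks are not significant
    if not text.startswith("DO") or "=" not in text:
        return "other"
    rhs = text.split("=", 1)[1]
    depth = 0
    for ch in rhs:                          # look ahead for a top-level comma
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return "do-loop"
    return "assignment"

print(classify_do_statement("DO 10 I = 1, 100"))
print(classify_do_statement("DO 10 I = 1"))
```

Only after scanning past the equals sign can the scanner tell whether DO is a keyword or the first two letters of the identifier DO10I.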

Modeling Scanners as Finite Automata
Finite automata provide an easy way to visualize the operation of a
scanner. An algorithm to recognize a token is shown below.
get first Input_Character
if Input_Character in ['A'..'Z'] then
  begin
    while Input_Character in ['A'..'Z', '0'..'9'] do
      begin
        get next Input_Character
        if Input_Character = '_' then
          begin
            get next Input_Character
            Last_Char_Is_Underscore := true
          end {if '_'}
        else
          Last_Char_Is_Underscore := false
      end {while}
    if Last_Char_Is_Underscore then
      return (Token_Error)
    else
      return (Valid_Token)
  end {if first in ['A'..'Z']}
else
  return (Token_Error)
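The recognizer above translates almost line for line into Python. The sketch below is one possible rendering, with a hypothetical function name `scan_identifier`. It returns the lexeme instead of Valid_Token and `None` instead of Token_Error, and like the algorithm it accepts upper-case letters only.

```python
import string

LETTERS = set(string.ascii_uppercase)
ALNUM = LETTERS | set(string.digits)

def scan_identifier(text):
    """Return the identifier at the start of text, or None on a token error."""
    chars = iter(text)
    ch = next(chars, "")                 # get first input character
    if ch not in LETTERS:
        return None                      # must start with a letter
    lexeme = ""
    last_is_underscore = False
    while ch in ALNUM:
        lexeme += ch
        ch = next(chars, "")             # get next input character
        if ch == "_":
            lexeme += ch
            ch = next(chars, "")         # step past the underscore
            last_is_underscore = True
        else:
            last_is_underscore = False
    # A trailing or doubled underscore leaves the flag set.
    return None if last_is_underscore else lexeme

print(scan_identifier("NEW_VAR1 := 0"))
print(scan_identifier("A__B"))
```

Each branch of the loop corresponds to a state of the finite automaton: reading a letter or digit, or having just read an underscore.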

SYNTACTIC ANALYSIS
  • The syntax analyzer creates the syntactic structure of the
    given source program.
  • The syntax analyzer is also known as the parser.
  • The syntax of a programming language is described by a
    context-free grammar (CFG). We will use BNF (Backus-Naur Form)
    notation in the description of CFGs.
  • The syntax analyzer (parser) checks whether a given source
    program satisfies the rules implied by a context-free grammar
    or not.
  • If it does, the parser creates the parse tree of that program.
  • Otherwise the parser gives error messages.
  • A context-free grammar
  • gives a precise syntactic specification of a programming
    language.
  • The design of the grammar is an initial phase of the design of
    a compiler.
  • A grammar can be directly converted into a parser by some
    tools.
  • The parser works on a stream of tokens.
  • The smallest item is a token.
  • We categorize parsers into two groups:
  • Top-down parser: the parse tree is created top to bottom,
    starting from the root.
  • Bottom-up parser: the parse tree is created bottom to top,
    starting from the leaves.
Bottom-up: bottom-up methods begin with the terminal nodes of the tree
and attempt to combine these into successively higher-level nodes
until the root is reached.
Top-down: top-down methods begin with the rule of the grammar that
specifies the goal of the analysis (i.e., the root of the tree), and
attempt to construct the tree so that the terminal nodes match the
statement being analyzed.
  • Both top-down and bottom-up parsers scan the input from left
    to right (one symbol at a time).
  • Efficient top-down and bottom-up parsers can be implemented
    only for sub-classes of context-free grammars:
  • LL for top-down parsing
  • LR for bottom-up parsing

Context-Free Grammars
  • Inherently recursive structures of a programming
    language are defined by a context-free grammar.
  • In a context-free grammar, we have:
  • A finite set of terminals (in our case, this will be the set
    of tokens)
  • A finite set of non-terminals (syntactic variables)
  • A finite set of production rules in the following form
  • A → α
  • where A is a non-terminal and
  • α is a string of terminals and non-terminals (including the
    empty string)
  • A start symbol (one of the non-terminal symbols)
  • Example:
  • E → E + E | E - E | E * E | E / E | - E
  • E → ( E )
  • E → id

Derivations
E ⇒ E+E
E+E derives from E: we can replace E by E+E. To be able to do this,
we must have a production rule E → E+E in our grammar.
E ⇒ E+E ⇒ id+E ⇒ id+id
A sequence of replacements of non-terminal symbols is called a
derivation; the sequence above is a derivation of id+id from E.

Left-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(id+E) ⇒ -(id+id)
Right-Most Derivation
E ⇒ -E ⇒ -(E) ⇒ -(E+E) ⇒ -(E+id) ⇒ -(id+id)
We will see that top-down parsers try to find the left-most
derivation of the given source program. We will see that bottom-up
parsers try to find the right-most derivation of the given source
program in reverse order.
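The two derivations can be replayed mechanically. The Python sketch below is illustrative only: `GRAMMAR` holds a subset of the example productions, `derive` is a hypothetical helper, and the list of choices (indexes into the alternatives of E) is supplied by hand.

```python
# Alternatives of E, by index: 0: E+E   1: E*E   2: -E   3: (E)   4: id
GRAMMAR = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]],
}

def derive(start, choices, leftmost=True):
    """Apply the chosen alternatives in order, always expanding the
    left-most (or right-most) non-terminal; return every sentential form."""
    form = [start]
    steps = ["".join(form)]
    for choice in choices:
        positions = [i for i, symbol in enumerate(form) if symbol in GRAMMAR]
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = GRAMMAR[form[i]][choice]   # replace one non-terminal
        steps.append("".join(form))
    return steps

choices = [2, 3, 0, 4, 4]
print(" => ".join(derive("E", choices, leftmost=True)))
print(" => ".join(derive("E", choices, leftmost=False)))
```

The same production choices give both derivations; only the non-terminal picked for replacement at each step differs.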

Parse Tree
  • Inner nodes of a parse tree are non-terminal symbols.
  • The leaves of a parse tree are terminal symbols.
  • A parse tree can be seen as a graphical representation of a
    derivation.

[Figure: parse tree for the derivation step ⇒ -(E)]

Ambiguity
A grammar that produces more than one parse tree for a sentence is
called an ambiguous grammar.
E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
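One way to see the ambiguity concretely is to count parse trees. The Python sketch below assumes the reduced grammar E → E + E | E * E | id (a subset of the example grammar). `count_parse_trees` is a hypothetical helper that tries every operator as the root of the tree.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Number of parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of ways E can derive tokens[lo:hi].
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        for k in range(lo + 1, hi - 1):        # k is the root operator
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

For id+id*id the count is 2, one tree with + at the root and one with * at the root, matching the two derivations shown.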

Operator-Precedence Parsing
  • It is very simple.
  • It is used in languages where virtually all operators are
    used. Example: SNOBOL.
  • Three disjoint relations, <·, ≐ and ·>, are used between
    certain pairs of terminals.
  • If a <· b we say that a yields precedence to b.
  • If a ≐ b we say that a has the same precedence as b.
  • If a ·> b we say that a takes precedence over b.
OPERATOR PRECEDENCE PARSING
The bottom-up parsing technique considered here is called the
operator-precedence method. This method is based on examining pairs
of consecutive operators in the source program and making decisions
about which operation should be performed first.
Example: A + B * C - D    (1)
There are two ways of determining what precedence relation should
hold between a pair of terminals.
First method: an intuitive method based on the traditional notions of
associativity and precedence of operators. By the usual rules of
arithmetic, multiplication and division have higher precedence than
addition and subtraction. Examining the first two operators of (1),
+ and *, we find that + has lower precedence than *. This is written
as + <· * (+ has lower precedence than *).
Consider the following grammar for expressions:
E → E A E | (E) | -E | id    (2)
A → + | - | * | /
It is not an operator grammar, because the right side E A E has
adjacent non-terminals. If we substitute for A each of its
alternatives, we obtain the following operator grammar:
E → E + E | E - E | E * E | E / E | (E) | -E | id
The problem with this grammar is that it is ambiguous: it does not
indicate the precedence of the operators.
There are two ways of determining what precedence relation should
hold between a pair of terminals.
First method: an intuitive method based on the traditional notions of
associativity and precedence of operators. This approach resolves the
ambiguities of the grammar shown in (2).
Second method: an unambiguous grammar for the language is constructed
first. This grammar reflects the correct associativity and precedence
in its parse trees.
Example: for arithmetic expressions involving +, -, *, / the grammar
is
E → E + E | E - E | E * E | E / E | (E) | -E | id
Once an unambiguous grammar has been constructed, there is a
mechanical method for constructing operator-precedence relations
from it.
Example: for the expression id + id * id, the operator-precedence
relations table is as follows:
[Table: operator-precedence relations]
  • The given expression is considered as a string.
  • All the nonterminals are removed and the correct relation,
    <·, ≐ or ·>, is placed between each pair of adjacent terminals
    as per the operator-precedence relations.
  • $ is placed at the beginning and the end of the string.
  • Example: $ <· id ·> + <· id ·> * <· id ·> $
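The steps above can be sketched in Python. The relation table used here is the usual textbook table for the terminals id, +, * and $. It is supplied as an assumption, since the table itself does not appear in the text. `annotate` is a hypothetical helper name.

```python
LT, GT = "<·", "·>"   # the relation ≐ does not occur in this small table

# PRECEDENCE[(a, b)] is the relation that holds when terminal a is followed by b.
PRECEDENCE = {
    ("id", "+"): GT, ("id", "*"): GT, ("id", "$"): GT,
    ("+", "id"): LT, ("+", "+"): GT, ("+", "*"): LT, ("+", "$"): GT,
    ("*", "id"): LT, ("*", "+"): GT, ("*", "*"): GT, ("*", "$"): GT,
    ("$", "id"): LT, ("$", "+"): LT, ("$", "*"): LT,
}

def annotate(terminals):
    """Place $ at both ends and a relation between adjacent terminals."""
    symbols = ["$", *terminals, "$"]
    out = [symbols[0]]
    for left, right in zip(symbols, symbols[1:]):
        out += [PRECEDENCE[(left, right)], right]
    return " ".join(out)

print(annotate(["id", "+", "id", "*", "id"]))
```

A missing table entry raises `KeyError` in this sketch, which corresponds to a pair of terminals that may not appear next to each other.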
Precedence matrix for the Pascal grammar
For the Pascal grammar, the precedence relations for some of the
tokens are explained below.
Example: PROGRAM ≐ VAR. These two tokens have equal precedence.
BEGIN <· FOR. BEGIN has lower precedence than FOR.
Some relations do not follow the ordinary rules for comparisons.
Example: ; ·> END and END ·> ; i.e., when ; is followed by END, the ;
has higher precedence, and when END is followed by ;, END has higher
precedence.
Wherever no precedence relation exists in the table, the two tokens
cannot appear together in any legal statement. If such a combination
occurs during parsing, it should be recognized as an error.
Example: Pascal statement
BEGIN READ (VALUE);
The Pascal statement is scanned from left to right, one token at a
time. For each pair of operators, the precedence relation between
them is determined.
i    . . . BEGIN <· READ ≐ ( <· id ·> )
     The id (VALUE) is recognized as <N1>.
ii   . . . BEGIN <· READ ≐ ( <N1> ≐ ) ·> ;
     READ ( <N1> ) is recognized as <N2>.
iii  . . . BEGIN <N2> ;
  • Example: show a step-by-step parse of the assignment
  • VARIANCE := SUMSQ DIV 100 - MEAN * MEAN
  • i . . . id1 ≐ := <· id2 ·> DIV int - id3 * id4
  • The left-to-right scan is continued in each step only far
    enough to determine the next portion of the statement to be
    recognized, which is the first portion delimited by <· and ·>.
  • Once this portion has been determined, it is interpreted as a
    nonterminal according to some rule of the grammar.
The parse tree is constructed from the terminal nodes up towards the
root, hence the term bottom-up parsing.
The id SUMSQ (id2) is interpreted as the single nonterminal <N1>,
which is an operand of DIV. That is, <N1> in the tree corresponds to
two nonterminals, <factor> and <term>, of the Pascal grammar.
ii   . . . id1 ≐ := <· <N1> DIV <· int ·> - id3 * id4
iii  . . . id1 ≐ := <· <N1> DIV <N2> ·> - id3 * id4
iv   . . . id1 ≐ := <· <N3> - <· id3 ·> * id4
v    . . . id1 ≐ := <· <N3> - <· <N4> * <· id4 ·> ;
vi   . . . id1 ≐ := <· <N3> - <· <N4> * <N5> ·> ;
vii  . . . id1 ≐ := <· <N3> - <N6> ·> ;
viii . . . ; <· id1 ≐ := <N7> ·> ;
ix   . . . <N8>
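The scan-and-reduce procedure traced above can be sketched in Python for the smaller expression id + id * id. This is a simplified illustration, not the Pascal parser: the relation table is the usual textbook one for id, +, * and $ (an assumption), nonterminals are not kept on the stack, and `operator_precedence_parse` is a hypothetical name. It returns the terminals of each recognized portion in the order they are reduced.

```python
LT, EQ, GT = "<·", "≐", "·>"

# Assumed textbook relation table for id, +, * and the end marker $.
PRECEDENCE = {
    ("id", "+"): GT, ("id", "*"): GT, ("id", "$"): GT,
    ("+", "id"): LT, ("+", "+"): GT, ("+", "*"): LT, ("+", "$"): GT,
    ("*", "id"): LT, ("*", "+"): GT, ("*", "*"): GT, ("*", "$"): GT,
    ("$", "id"): LT, ("$", "+"): LT, ("$", "*"): LT,
}

def operator_precedence_parse(terminals):
    """Return the recognized portions (handles) in the order they are reduced."""
    stack = ["$"]                      # terminals only; nonterminals are ignored
    tokens = [*terminals, "$"]
    reductions = []
    pos = 0
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "$" and look == "$":
            return reductions          # whole input reduced
        relation = PRECEDENCE.get((top, look))
        if relation is None:           # no relation in the table: illegal pair
            raise SyntaxError(f"no precedence relation between {top} and {look}")
        if relation in (LT, EQ):
            stack.append(look)         # keep scanning to the right
            pos += 1
        else:
            # ·> found: pop back to the matching <· to get the handle.
            handle = [stack.pop()]
            while PRECEDENCE.get((stack[-1], handle[0])) != LT:
                handle.insert(0, stack.pop())
            reductions.append(" ".join(handle))

print(operator_precedence_parse(["id", "+", "id", "*", "id"]))
```

In the result, * is reduced before +, which is how the precedence relations impose the usual order of evaluation without any explicit grammar rules in the parser.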