CSC 8310 Linguistics of Programming Languages - PowerPoint PPT Presentation

Provided by: vijayg2
1
CSC 8310 Linguistics of Programming Languages
  • Fall 2009
  • Instructor: Vijay Gehlot
  • Week 2

2
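Phases/Components of Compiler
  • Source
  • 1. Scanner (Lexical Analyzer), produces Tokens
  • 2. Parser (Syntax Analyzer), produces AST
  • 3. Semantic Analyzer, produces Annotated AST
  • 4-5. Intermediate Code Generator and Optimizer, produce Intermediate Code
  • 6-7. Code Generator and Optimizer, produce Machine Code
  • Target
  • Other components: Symbol Table and Error Handler
  • 1-3: Analysis Phase; 4-7: Synthesis Phase; 1-5: Front End; 6-7: Back End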
3
1. Scanner (Lexical Analyzer)
  • Remove white spaces and comments
  • Tokenize (group into meaningful pieces)
  • Handle lexical errors (this is the first stage of
    compilation that can report a compilation error)
  • Tokens may have values, e.g.
  • Token: id (identifier)
  • Token value: x
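The scanner's three jobs listed above (skip white space and comments, group characters into tokens, report lexical errors) can be sketched in a few lines of Java. This is only an illustration, not the course's scanner: the class name ToyScanner, the token kinds NUM, ID and OP, and the choice of // as the comment syntax are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical toy scanner; token kinds and comment syntax are illustrative only. */
final class ToyScanner {
    // One alternative per token kind. White space and // comments are matched
    // by the SKIP group so that they can be discarded.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<SKIP>\\s+|//[^\\n]*)|(?<NUM>\\d+)|(?<ID>[A-Za-z_]\\w*)|(?<OP>[-+*/=()])");

    /** Returns the tokens of source as strings of the form KIND(value). */
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        int pos = 0;
        while (pos < source.length()) {
            m.region(pos, source.length());
            if (!m.lookingAt()) {
                // No token kind matches here: this is a lexical error.
                throw new IllegalArgumentException("lexical error at offset " + pos);
            }
            if (m.group("NUM") != null) {
                tokens.add("NUM(" + m.group() + ")");
            } else if (m.group("ID") != null) {
                tokens.add("ID(" + m.group() + ")");
            } else if (m.group("OP") != null) {
                tokens.add("OP(" + m.group() + ")");
            }
            // A SKIP match (white space or comment) produces no token.
            pos = m.end();
        }
        return tokens;
    }
}
```

For example, ToyScanner.scan("x = 42 // set x") yields the three tokens ID(x), OP(=) and NUM(42); the token kind plays the role of the token and the text in parentheses is its value.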

4
2. Parser (Syntax Analyzer)
  • Checks whether sequence of tokens conforms to the
    (syntactic) rules of the language.
  • E.g. ab each token is valid but the sequence is
    not
  • Handle syntax errors
  • Uses a Context Free Grammar (CFG), defined later

5
3. Semantic Analyzer (Type checker)
  • Type checking
  • Other semantic checking
  • Unbound/undeclared variable
  • Multiply defined names
  • Uninitialized variable

6
4-5. Intermediate Code Generator and Optimizer
  • Many forms
  • Machine independent optimization
  • Algebraic simplification
  • Optional component

7
6-7. Code Generator and Optimizer
  • Target specific architecture
  • Machine dependent optimization
  • Specialized instructions
  • Optimization optional
  • Examples: code and optimization (Java/C)

8
Languages
  • Any language has
  • Syntax
  • Described using a formal notation: Context Free
    Grammar (CFG)
  • Specialized notation called Regular Expressions
    can be used for lexical syntax, i.e., tokens
  • Semantics
  • Many different ways
  • Operational, Axiomatic, Denotational

9
Context Free Grammar (CFG)
  • Derivations
  • Parse trees
  • Abstract syntax trees (AST)
  • Ambiguity (Syntactic)
  • BNF (Backus Naur Form), EBNF (Extended Backus
    Naur Form), Syntax diagrams

10
Definition of Context Free Grammar
  • Definition: A CFG consists of
  • A finite set of non-terminal symbols (N)
  • A finite set of terminal symbols (tokens) (T)
  • A distinguished start symbol S from N
  • A finite collection of rules (or productions) of
    the form
  • X → A1 A2 ... An, where
  • X is from N,
  • each Ai is from N or T, and n ≥ 0;
  • if n = 0, then write X → ε

11
One Step Derivation
  • Definition: Given a sequence of symbols
  • A1 A2 ... Ai ... Ak
  • if Ai → B1 B2 ... Bj is a production,
  • then we can obtain
  • A1 A2 ... Ai-1 B1 B2 ... Bj Ai+1 ... Ak
  • in one step.
  • We denote it as
  • A1 A2 ... Ak ⇒ A1 A2 ... Ai-1 B1 B2 ... Bj Ai+1 ... Ak

12
Derivation Sequence
  • Definition:
  • Let S be the start symbol.
  • Let t1 t2 ... ti be a sequence of terminal symbols
    (tokens, a program).
  • A derivation sequence for t1 t2 ... ti is a sequence
    of one-step derivations that starts with S and
    ends with t1 t2 ... ti.

13
Definition of Parsing
  • Definition: A sentence (sequence of terminal
    symbols) is syntactically valid if there is a
    derivation sequence for it.
  • Typically there is a choice of which nonterminal to expand
  • Two canonical ways
  • Leftmost: expand the leftmost nonterminal at each
    step
  • Rightmost: expand the rightmost nonterminal at each
    step
  • These correspond to top-down and bottom-up parsing, respectively
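A leftmost derivation can be traced mechanically. The sketch below assumes a small example grammar that is not from the slides (E → E + T | T, T → id) and a hypothetical helper leftmostStep that performs one derivation step by replacing the leftmost nonterminal. For brevity it does not check that the supplied right-hand side really is a production of the grammar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for tracing leftmost derivations in a toy grammar. */
final class Derivation {
    // Nonterminals of the assumed example grammar: E -> E + T | T,  T -> id
    static final List<String> NONTERMINALS = Arrays.asList("E", "T");

    /** One step: replace the leftmost nonterminal (which must be lhs) by rhs. */
    static List<String> leftmostStep(List<String> form, String lhs, List<String> rhs) {
        for (int i = 0; i < form.size(); i++) {
            if (NONTERMINALS.contains(form.get(i))) {
                if (!form.get(i).equals(lhs)) {
                    throw new IllegalArgumentException(
                        "leftmost nonterminal is " + form.get(i) + ", not " + lhs);
                }
                List<String> next = new ArrayList<>(form.subList(0, i));
                next.addAll(rhs);
                next.addAll(form.subList(i + 1, form.size()));
                return next;
            }
        }
        throw new IllegalArgumentException("no nonterminal left to expand");
    }

    /** E => E + T => T + T => id + T => id + id */
    static List<String> deriveIdPlusId() {
        List<String> form = Arrays.asList("E");
        form = leftmostStep(form, "E", Arrays.asList("E", "+", "T"));
        form = leftmostStep(form, "E", Arrays.asList("T"));
        form = leftmostStep(form, "T", Arrays.asList("id"));
        form = leftmostStep(form, "T", Arrays.asList("id"));
        return form;
    }
}
```

Since the sequence starts with the start symbol E and ends with the terminals id + id, that sentence is syntactically valid under the assumed grammar.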

14
Types of Parsers
  • Top down parser
  • Mimics leftmost derivation
  • Bottom up parser
  • Mimics rightmost derivation in reverse
  • Top down parsers cannot handle left recursion.
  • These are deterministic parsers.
  • General parsing is expensive and not practical
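To see why left recursion is a problem for a top-down parser, consider a left-recursive rule such as expr : expr '+' term | term. A recursive-descent method for expr would call itself before consuming any token and so would never terminate. A common workaround is to rewrite the rule with repetition, expr : term ('+' term)*. The sketch below assumes that rewritten rule, a term that is just a number, and input that has already been split into tokens; the class name SumParser is hypothetical.

```java
/** Hypothetical recursive-descent parser for sums of numbers. */
final class SumParser {
    private final String[] tokens;
    private int pos = 0;

    SumParser(String... tokens) {
        this.tokens = tokens;
    }

    // expr : term ('+' term)* ;
    // (stands in for the left-recursive  expr : expr '+' term | term ;)
    int expr() {
        int value = term();
        while (pos < tokens.length && tokens[pos].equals("+")) {
            pos++;              // consume '+'
            value += term();    // accumulating left to right keeps '+' left-associative
        }
        return value;
    }

    // term : NUMBER ;
    int term() {
        if (pos >= tokens.length || !tokens[pos].matches("\\d+")) {
            throw new IllegalStateException("syntax error at token " + pos);
        }
        return Integer.parseInt(tokens[pos++]);
    }

    /** Parses a whole token sequence and returns the value of the sum. */
    static int parse(String... tokens) {
        SumParser p = new SumParser(tokens);
        int value = p.expr();
        if (p.pos != tokens.length) {
            throw new IllegalStateException("syntax error at token " + p.pos);
        }
        return value;
    }
}
```

Each grammar rule becomes one method, which is the same shape of code that a recursive-descent parser generator produces.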

15
Parse Trees
  • Definition: A parse tree is a tree such that
  • all interior nodes are labeled from (N)
  • the root is labeled with S (start symbol)
  • all leaves are labeled from (T)
  • if an interior node X has children
  • A1 A2 ... Ai ... Ak
  • where X is a nonterminal and the Ai are terminals or
    nonterminals, then
  • X → A1 A2 ... Ai ... Ak
  • must be a production.

16
Parse Trees (cont.)
  • Definition: t1 t2 ... tn is valid if there is a
    parse tree whose leaves spell t1 t2 ... tn when
    read left to right.
  • A parse tree can be constructed from a derivation or directly.

17
Ambiguity
  • Definition: A CFG is ambiguous if there is at
    least one sentence that has more than
    one leftmost (or rightmost) derivation, or equivalently
    more than one parse tree.
  • Should be avoided
  • No general algorithm for detecting ambiguity
  • Typically has to do with grouping
  • Can rewrite or redefine syntax
  • Pros/Cons

18
AST
  • Definition: An Abstract Syntax Tree (AST) is a tree
    in which interior nodes are operations and
    children are operands.
  • E.g. while (condition) body: a while node whose
    children are condition and body
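An AST can be represented directly as objects, with one class per kind of operation. The sketch below uses a hypothetical subtraction-only expression language (the names Expr, Num, Sub and AstDemo are assumptions for illustration) and builds the two trees that an ambiguous grammar such as E → E - E | num would allow for 1 - 2 - 3. It shows why the grouping chosen by the parser matters.

```java
/** Hypothetical AST for a tiny expression language with subtraction only. */
interface Expr {
    int eval();
}

/** Leaf node: an integer literal. */
final class Num implements Expr {
    final int value;

    Num(int value) {
        this.value = value;
    }

    public int eval() {
        return value;
    }
}

/** Interior node: the operation is subtraction, the children are its operands. */
final class Sub implements Expr {
    final Expr left;
    final Expr right;

    Sub(Expr left, Expr right) {
        this.left = left;
        this.right = right;
    }

    public int eval() {
        return left.eval() - right.eval();
    }
}

final class AstDemo {
    /** Tree for (1 - 2) - 3, the usual left-associative reading. */
    static Expr leftGrouped() {
        return new Sub(new Sub(new Num(1), new Num(2)), new Num(3));
    }

    /** Tree for 1 - (2 - 3), the other tree an ambiguous grammar permits. */
    static Expr rightGrouped() {
        return new Sub(new Num(1), new Sub(new Num(2), new Num(3)));
    }
}
```

Evaluating leftGrouped() gives -4 while rightGrouped() gives 2, so the same token sequence has two different meanings depending on the tree.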

19
Other Approaches for Describing Syntax
  • BNF
  • EBNF
  • Syntax Diagrams
  • All are equivalent to CFG
  • Some actual examples
  • http://java.sun.com/docs/books/jls/second_edition/html/grammars.doc.html#44271
  • http://www.scheme.com/tspl2d/grammar.html#g2488
  • http://www.schemers.org/Documents/Standards/R5RS/HTML/r5rs-Z-H-10.html#%_chap_7

20
ANTLR Tool
  • Parsers can be automatically generated from a CFG
    description
  • ANTLR is one such tool that generates a
    (recursive descent) parser (in Java by default)
  • Other tools: Yacc, Bison, SableCC, JavaCC,
    MLYacc, etc.

Grammar file → ANTLR Tool → Java Code
21
ANTLR Tool
  • Allows EBNF.
  • Is in the LL category and hence cannot handle
    left-recursive grammar rules
  • Has a GUI-based grammar development environment
    called ANTLRWorks
  • Includes automatic transformation of
    left-recursive rules

22
ANTLR Grammar File Format
  • Simplified version:
      grammar name;
      /* Comment: Lexical Rules (Tokens).
         Token names must begin with an uppercase letter */
      RULE1 : ... ;
      RULE2 : ... ;
      ...
      /* Comment: Syntax Rules.
         Non-terminals must begin with a lowercase letter */
      rule1 : ... | ... | ... ;
      rule2 : ... | ... | ... ;
      ...

23
ANTLR Output
  • From a grammar named T in file T.g it generates
  • TLexer.java
  • TParser.java
  • T.tokens
  • ANTLR generates a method for each rule in a
    grammar.
  • The methods are wrapped in a Java class
    definition (TParser.java).
  • ANTLR provides named actions so you can insert
    fields and instance methods into the generated
    class definition. E.g.,
      grammar T;
      @header { import java.util.*; }
      @members {
        int n;
        public void foo() { ... }
      }

24
ANTLR Output
  • Compile the generated files. Make sure the ANTLR
    jar file is on the classpath.
  • To use:
      import org.antlr.runtime.*;

      public class Test {
        public static void main(String[] args) throws Exception {
          // Read the program text from standard input
          ANTLRInputStream input = new ANTLRInputStream(System.in);
          // Scanner: turns characters into tokens
          TLexer lexer = new TLexer(input);
          CommonTokenStream tokens = new CommonTokenStream(lexer);
          // Parser: checks the token sequence against the grammar
          TParser parser = new TParser(tokens);
          parser.rule1(); // invoke the method associated with the start symbol
        }
      }