A (Long) Introduction to AntLR

Slides: 38

Provided by: Han8150

1
A (Long) Introduction to AntLR

Slides adapted from:
AntLR Reference Manual by Terence Parr
antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf
AntLR Tutorial by Ashley J.S. Mills
http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlrhome.html
An Introduction to AntLR by Terence Parr
http://www.cs.usfca.edu/parrt/course/652/lectures/antlr.html
An AntLR Tutorial by Scott Stanchfield
javadude.com/articles/antlrtut/

2
AntLR

ANother Tool for Language Recognition
(or anti-LR??)
an LL(k) parser and translator generator tool
which can create
lexers
parsers
abstract syntax trees (ASTs)
in which you describe the language grammatically
and in return receive a program that can
recognize and translate that language

3
Tasks Divided

Lexical Analysis (scanning)
Syntactic Analysis (parsing)
Tree Generation
Code Generation

4
Lexer

A source file is streamed to the lexer character
by character through some kind of input
interface.
The lexer groups characters into tokens that are
meaningful to the parser.
A token may be a
keyword
identifier
symbol
operator
The lexer also removes comments and whitespace
from the program, which are meaningless to the
parser.
The result is a stream of tokens, which the
parser receives one by one.
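
To make the lexer's job concrete, here is a minimal hand-written sketch in Java. It is not code generated by AntLR; the class name SketchLexer, the token types, and the tokenize method are all assumptions made for illustration. It groups digits and letters into tokens and silently drops whitespace (comment handling is left out to keep it short).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer; none of these names come from AntLR.
class SketchLexer {
    enum Type { INT, ID, PLUS, MINUS, STAR, LPAREN, RPAREN }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
        @Override public String toString() { return type + "(" + text + ")"; }
    }

    // Turns a character sequence into a list of tokens.
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (Character.isWhitespace(c)) {
                pos++;                                   // meaningless to the parser: skip it
            } else if (Character.isDigit(c)) {
                int start = pos;                         // group consecutive digits into one INT
                while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
                tokens.add(new Token(Type.INT, input.substring(start, pos)));
            } else if (Character.isLetter(c)) {
                int start = pos;                         // letter followed by letters or digits
                while (pos < input.length() && Character.isLetterOrDigit(input.charAt(pos))) pos++;
                tokens.add(new Token(Type.ID, input.substring(start, pos)));
            } else {
                tokens.add(new Token(operatorType(c), String.valueOf(c)));
                pos++;
            }
        }
        return tokens;
    }

    // Single-character operators and symbols, routed by a switch on the character.
    private static Type operatorType(char c) {
        switch (c) {
            case '+': return Type.PLUS;
            case '-': return Type.MINUS;
            case '*': return Type.STAR;
            case '(': return Type.LPAREN;
            case ')': return Type.RPAREN;
            default: throw new IllegalArgumentException("unexpected character: " + c);
        }
    }
}
```

Calling SketchLexer.tokenize("x1 + 42") yields three tokens: an ID, a PLUS, and an INT. The spaces never reach the caller.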

5
Parser

The parser organizes the tokens into the allowed
sequences defined by the grammar of the language.
If the parser encounters a sequence of tokens
that matches none of the allowed sequences, it
issues an error.
A design choice is whether to try to recover from
the error by making assumptions.
Parsers may either do syntax-directed translation
on the fly,
or convert the sequences of tokens into an
Abstract Syntax Tree (AST).
An AST is a structure which
keeps information in an easily traversable form
(such as an operator at a node, operands at the
children of the node)
ignores form-dependent superficial details
More on ASTs later...
The parser also builds one or more symbol tables,
which contain information about the tokens it
encounters.

6
What does a grammar file look like?

It is composed of rules
ANTLR accepts three types of grammar
specifications
parsers
lexers
tree-parsers (also called tree-walkers)
ANTLR uses LL(k) analysis for all three
So the grammar specifications are similar, and
the generated lexers and parsers behave similarly

7
Sample File

taken from AntLR tutorial of Ashley J.S Mills

8
Sample File Divided (1/3)

A grammar file may contain an arbitrary number of
parsers, lexers, and tree-parsers
A separate class file will be generated for each,
e.g., YourLexerClass.class, YourParserClass.class,
YourTreeParserClass.class
Header
holds a preamble that is placed at the top of
each of these classes
an import, maybe?

9
Sample File Divided (2/3)

Options
file-wide
options {
  charVocabulary = '\0'..'\377'; // defines the alphabet (used by complement and wildcard)
  k = 2; // means two characters of lookahead
}
Class specific
{ ... header for parser class only ... }
class MyParser extends Parser;
options { ...parser options... }
{ parser class members }
parser rules

10
Sample File Divided (3/3)

Rules in EBNF notation

taken from AntLR tutorial of Ashley J.S Mills

You simply list a set of lexical rules that match
tokens. The tool automatically generates code to
map the next input character(s) to a rule likely
to match: a big "switch" that routes
recognition flow to the appropriate rule.

11
Symbols in AntLR

taken from AntLR reference manual

12
Lexer

taken from AntLR tutorial of Ashley J.S Mills

With one restriction
Rules defined within a lexer grammar must have a
name beginning with an uppercase letter

13
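Lexer Rules

You can define operators like
BECOMES  : ":=" ;
COLON    : ':' ;
SEMI     : ';' ;
EQUALS   : '=' ;
LBRACKET : '[' ;
RBRACKET : ']' ;
LPAREN   : '(' ;
RPAREN   : ')' ;
LT       : '<' ;
LTE      : "<=" ;
PLUS     : '+' ;
MINUS    : '-' ;
TIMES    : '*' ;
DIV      : '/' ;
And then you can define a token class such as
OPS : (PLUS | MINUS | TIMES | DIV) ;

14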
Actions

Blocks of source code (expressed in the target
language) enclosed in curly braces
Executed
after the preceding production element has been
recognized
before the recognition of the following element
Typically used to generate output, construct
trees, or modify a symbol table
An action's position dictates when it is executed
relative to the surrounding grammar elements.
If it is the first element of a production, it is
executed before any other element in that
production, but only if that production is
predicted by the lookahead
rule_name
  : ( {init-action}:
      {action of 1st production} production_1
    | {action of 2nd production} production_2
    )?
  ;

15
Tip: Skipping Tokens

Whitespace carries no meaning in the grammar
WS
  : ( ' ' | '\n' | '\t' )
    { $setType(Token.SKIP); }  // <- action
  ;
The action means: do not pass this token to the
parser. Recognize it and then throw it away.
The same goes for comments.

16
Tip: Newline Stuff

The line number of the input is used for
reporting errors
It must be incremented by hand when the lexer
encounters a newline
WS
  : ( ' ' | '\t' | '\f'
      // handle newlines
    | ( "\r\n"  // DOS/Windows
      | '\r'    // Macintosh
      | '\n'    // Unix
      )
      // increment the line count
      { newline(); }  // <- action executed only in this case
    )
    { $setType(Token.SKIP); }
  ;

17
Parser

class ExprParser extends Parser;
expr
  : mexpr ((PLUS|MINUS) mexpr)*
  ;
mexpr
  : atom (STAR atom)*
  ;
atom
  : INT
  | LPAREN expr RPAREN
  ;
Rules defined within a parser grammar must have a
name beginning with a lowercase letter

18
Tip: Keywords and Literals (1/2)

Many languages have a general "identifier"
lexical rule, and keywords that are special cases
of the identifier pattern
A typical identifier token may be defined as
ID : LETTER (LETTER | DIGIT)* ;
So how can AntLR tell that a keyword such as "if"
is not an identifier?
You put fixed keywords into a literals table,
which is checked after each token is matched
Any double-quoted string used in a parser is
automatically entered into the literals table of
the associated lexer.
subprogramBody
  : (basicDecl)*
    (procedureDecl)*
    "begin"
    (statement)*
    "end" IDENT
  ;

19
Tip: Keywords and Literals (2/2)

Option testLiterals
By default, ANTLR will generate code in all lexer
rules to test each token against the literals
table
However, you may suppress this code generation in
the lexer by using a grammar option
class L extends Lexer;
options { testLiterals=false; }
...
If you turn this option off for a lexer, you may
re-enable it for specific rules
ID options { testLiterals=true; }
  : LETTER (LETTER | DIGIT)*
  ;
20
Tip: Token Object Creation

You will sometimes want to access information
about the token being matched
Label a rule reference to obtain a Token object
representing the text, token type, line number,
etc. matched for that rule reference
Lexer rule
INT : ('0'..'9')+ ;
Lexer rule that labels its reference to INT
INDEX
  : '[' i:INT ']'
    { System.out.println(i.getText()); }
  ;

21
Tip: Syntactic / Semantic Predicates

There are other situations where you have to turn
on and off certain rules
depending on prior context or semantic
information
Use predicates to decide

22
Syntactic Predicates

ANTLR (tree) parsers usually use only a single
symbol of lookahead, which is normally not a
problem as intermediate forms are explicitly
designed to be easy to walk
However, there is occasionally the need to
distinguish between similar tree structures
Syntactic predicates can be used to overcome the
limitations of fixed lookahead
For example, distinguishing between the unary and
binary minus operator
expr
  : ( #(MINUS expr expr) )=> #( MINUS expr expr )
  | #( MINUS expr )
  ...
  ;
The order of the alternatives is very important,
as the second alternative is a "subset" of the
first alternative
Syntactic predicates are a form of selective
backtracking and, therefore, actions are turned
off while evaluating a syntactic predicate so
that actions do not have to be undone

23
Semantic Predicates

Semantic predicates
at the start of an alternative decide whether
or not that alternative is matched
in the middle of a production throw an exception
when they evaluate to false
stat
  : {isTypeName(LT(1))}? ID ID ";"  // declaration "type varName;"
  | ID "=" expr ";"                 // assignment
  ;
decl
  : "var" ID ":" t:ID
    { isTypeName(t.getText()) }?    // used to throw an exception
  ;

24
Example: Keeping State Information

Context-sensitive recognition example
If you are matching tokens that separate rows of
data such as "----", you probably only want to
match this if the "begin table" sequence has been
found
BEGIN_TABLE
  : '[' {this.inTable=true;}   // enter table context
  ;
ROW_SEP
  : {this.inTable}? "----"     // semantic predicate
  ;
END_TABLE
  : ']' {this.inTable=false;}  // exit table context
  ;
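
The same idea can be sketched by hand in Java. This is an illustrative analogue, not generated code: the class TableLexerSketch, its tokenize method, the use of '[' and ']' as table delimiters, and the CHAR(...) fallback token are assumptions. The boolean field plays the role of the lexer state, and the check on it plays the role of the semantic predicate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context-sensitive scanner; the token names mirror the grammar rules.
class TableLexerSketch {
    private boolean inTable = false;   // state shared between the "rules"

    List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '[') {
                inTable = true;                        // enter table context
                out.add("BEGIN_TABLE");
                pos++;
            } else if (c == ']') {
                inTable = false;                       // exit table context
                out.add("END_TABLE");
                pos++;
            } else if (inTable && input.startsWith("----", pos)) {
                out.add("ROW_SEP");                    // only matched inside a table
                pos += 4;
            } else {
                out.add("CHAR(" + c + ")");            // anything else, one character at a time
                pos++;
            }
        }
        return out;
    }
}
```

Inside brackets, "----" is reported as a single ROW_SEP; outside of a table the same four characters come out as four ordinary characters.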

25
The Java Code

The code to invoke the parser
import java.io.*;
class Main {
  public static void main(String[] args) {
    try {
      // use DataInputStream to grab bytes
      MyLexer lexer = new MyLexer(new DataInputStream(System.in));
      MyParser parser = new MyParser(lexer);
      int x = parser.expr();
      System.out.println(x);
    } catch(Exception e) {
      System.err.println("exception: "+e);
    }
  }
}

26
Running AntLR

In Linux
runantlr <antlr_file>.g
javac *.java
java Main
In Windows
Eclipse has a very easy-to-use plugin for AntLR
See http://antlreclipse.sourceforge.net/ for very
detailed instructions
The plugin will run AntLR on the grammar file

27
Expression Evaluation 1: Syntax-Directed Translation

To evaluate the expressions on the fly as the
tokens come in, add actions to the parser
class ExprParser extends Parser;
expr returns [int value=0]
{int x;}
  : value=mexpr
    ( PLUS x=mexpr  {value += x;}
    | MINUS x=mexpr {value -= x;}
    )*
  ;
mexpr returns [int value=0]
{int x;}
  : value=atom ( STAR x=atom {value *= x;} )*
  ;
atom returns [int value=0]
  : i:INT {value=Integer.parseInt(i.getText());}
  | LPAREN value=expr RPAREN
  ;
28
Expression Evaluation 2: via AST Intermediate Form

A more powerful strategy than syntax-directed
translation is to build an AST:
an intermediate representation that holds all or
most of the input symbols and encodes, in the
structure of the data, the relationships between
those tokens
For this kind of tree, you will use a tree walker
to compute the same values as before, but using a
different strategy
The utility of ASTs becomes clear when you must
do multiple walks over the tree to figure out
what to compute, or do tree rewrites that morph
the tree towards another language.

29
Abstract Syntax Trees

Abstract Syntax Tree: like a parse tree, without
unnecessary information
Two-dimensional trees that can encode the
structure of the input as well as the input
symbols
Either
homogeneous: all objects of the same type, e.g.,
CommonAST in ANTLR
or heterogeneous: multiple types such as
PlusNode, MultNode...
An AST for (3+4) might be represented as a PLUS
root node with the children 3 and 4
No parentheses are included in the tree!
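
A heterogeneous AST can be sketched in a few lines of Java. The node names PlusNode and MultNode come from the slide; everything else (the wrapper class HeteroAstSketch, the ExprNode interface, IntNode, and the eval and toLisp methods) is assumed for illustration and is not part of AntLR.

```java
// Hypothetical heterogeneous AST: one class per node kind, each knowing how to evaluate itself.
class HeteroAstSketch {
    interface ExprNode {
        int eval();
        String toLisp();
    }

    static final class IntNode implements ExprNode {
        private final int value;
        IntNode(int value) { this.value = value; }
        public int eval() { return value; }
        public String toLisp() { return Integer.toString(value); }
    }

    static final class PlusNode implements ExprNode {
        private final ExprNode left, right;
        PlusNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() + right.eval(); }
        public String toLisp() { return "(+ " + left.toLisp() + " " + right.toLisp() + ")"; }
    }

    static final class MultNode implements ExprNode {
        private final ExprNode left, right;
        MultNode(ExprNode left, ExprNode right) { this.left = left; this.right = right; }
        public int eval() { return left.eval() * right.eval(); }
        public String toLisp() { return "(* " + left.toLisp() + " " + right.toLisp() + ")"; }
    }
}
```

The tree for (3+4) is new PlusNode(new IntNode(3), new IntNode(4)). The parentheses of the input are gone, since the grouping is carried by the shape of the tree.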

30
AST Construction

To get ANTLR to generate a useful AST
turn on the buildAST option
add a few suffix operators
class ExprParser extends Parser;
options { buildAST=true; }
expr  : mexpr ((PLUS^|MINUS^) mexpr)* ;
mexpr : atom (STAR^ atom)* ;
atom  : INT | LPAREN! expr RPAREN! ;
No changes in the Lexer.

31
AST Operators

AST root operator: ^
Normally AntLR makes the first token it
encounters the root of the tree
We usually want to control this, e.g., for
operators
A token suffixed with the root operator ^
becomes the root of the current tree
expr : mexpr ((PLUS^|MINUS^) mexpr)* ;
AST exclude operator: !
Tokens / rule references suffixed with the
exclude operator ! are not included in the AST
e.g., for parentheses
atom : INT | LPAREN! expr RPAREN! ;

32
AST Parsing and Evaluation

Rule format is like #(A B C)
which means "match a node of type A, and then
descend into its list of children and match B and
C".
This notation can be nested arbitrarily, using
#(...) for child trees
e.g., #(A B #(C D) )
class ExprTreeParser extends TreeParser;
expr returns [int r=0]
{ int a,b; }
  : #(PLUS a=expr b=expr)  {r = a+b;}
  | #(MINUS a=expr b=expr) {r = a-b;}
  | #(STAR a=expr b=expr)  {r = a*b;}
  | i:INT {r = (int)Integer.parseInt(i.getText());}
  ;
Important: sufficient matches are not exact
matches. As long as the tree satisfies the
pattern, a match is reported, regardless of how
much is left unparsed
For example, the pattern #( A B ) matches the
tree #( A #(B C) D ).
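
What the tree walker does can be sketched by hand in Java over a homogeneous tree, where every node has the same class and is told apart only by its type field. The class TreeWalkSketch, the Node class, and the token type constants are assumptions made for illustration; AntLR's own CommonAST is not used here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical homogeneous tree and a walker that mirrors the four tree-grammar alternatives.
class TreeWalkSketch {
    static final int PLUS = 1, MINUS = 2, STAR = 3, INT = 4;

    static final class Node {
        final int type;
        final String text;
        final List<Node> children;
        Node(int type, String text, Node... children) {
            this.type = type;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    // One case per alternative: match the root type, then descend into the children.
    static int expr(Node t) {
        switch (t.type) {
            case PLUS:  return expr(t.children.get(0)) + expr(t.children.get(1));
            case MINUS: return expr(t.children.get(0)) - expr(t.children.get(1));
            case STAR:  return expr(t.children.get(0)) * expr(t.children.get(1));
            case INT:   return Integer.parseInt(t.text);
            default:    throw new IllegalArgumentException("unexpected node type " + t.type);
        }
    }
}
```

For the tree of 3+4*5, a PLUS root whose second child is a STAR subtree, TreeWalkSketch.expr evaluates the STAR subtree first simply because it sits deeper in the tree.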

33
in Java

The code to launch the parser and the tree
walker
import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;
class Calc {
  public static void main(String[] args) {
    try {
      CalcLexer lexer = new CalcLexer(new DataInputStream(System.in));
      CalcParser parser = new CalcParser(lexer);
      parser.expr(); // Parse the input expression
      CommonAST t = (CommonAST)parser.getAST();
      // Print the resulting tree out in LISP notation
      System.out.println(t.toStringList());
      CalcTreeWalker walker = new CalcTreeWalker();
      // Traverse the tree created by the parser
      int r = walker.expr(t);
      System.out.println("value is "+r);
    } catch(Exception e) {
      System.err.println("exception: "+e);
    }
  }
}

34
AST Construction by Hand

In some cases, you may want to transform a tree
yourself, e.g., optimization of addition with zero
class CalcTreeWalker extends TreeParser;
options { buildAST = true; } // "transform" mode
expr
  :! #(PLUS left:expr right:expr) // '!' turns off auto transform
     {
       if ( #right.getType()==INT &&
            Integer.parseInt(#right.getText())==0 ) // x+0 = x
       {
         #expr = #left;
       }
       else if ( #left.getType()==INT &&
                 Integer.parseInt(#left.getText())==0 ) // 0+x = x
       {
         #expr = #right;
       }
       else // x+y
       {
         #expr = #(PLUS, left, right);
       }
     }
  | ...
  ;
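
The same rewrite can be sketched as a plain Java method, which may help in seeing what the tree transformer is asked to do. The names SimplifySketch, Node, simplify, isZero, and toLisp are assumptions made for illustration, and the sketch only knows PLUS and INT nodes.

```java
// Hypothetical tree rewrite: x+0 => x and 0+x => x, applied bottom-up.
class SimplifySketch {
    static final int PLUS = 1, INT = 4;

    static final class Node {
        final int type;
        final String text;
        final Node left, right;   // both null for INT leaves
        Node(int type, String text, Node left, Node right) {
            this.type = type;
            this.text = text;
            this.left = left;
            this.right = right;
        }
    }

    static Node leaf(int value) { return new Node(INT, Integer.toString(value), null, null); }
    static Node plus(Node l, Node r) { return new Node(PLUS, "+", l, r); }

    static Node simplify(Node t) {
        if (t.type != PLUS) return t;          // leaves are returned unchanged
        Node left = simplify(t.left);          // transform the children first
        Node right = simplify(t.right);
        if (isZero(right)) return left;        // x+0 = x
        if (isZero(left)) return right;        // 0+x = x
        return plus(left, right);              // x+y: rebuild the PLUS node
    }

    private static boolean isZero(Node n) {
        return n.type == INT && Integer.parseInt(n.text) == 0;
    }

    // LISP-style printout, used here only to inspect results.
    static String toLisp(Node n) {
        if (n.type == INT) return n.text;
        return "(+ " + toLisp(n.left) + " " + toLisp(n.right) + ")";
    }
}
```

Because the children are simplified before the parent is inspected, a nested tree such as 0+(5+0) collapses all the way down to the single leaf 5.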

35
in Java

The code to launch the parser and tree transformer
is
import java.io.*;
import antlr.CommonAST;
import antlr.collections.AST;
class Calc {
  public static void main(String[] args) {
    try {
      CalcLexer lexer = new CalcLexer(new DataInputStream(System.in));
      CalcParser parser = new CalcParser(lexer);
      parser.expr(); // Parse the input expression
      CommonAST t = (CommonAST)parser.getAST();
      // Print the resulting tree out in LISP notation
      System.out.println(t.toLispString());
      CalcTreeWalker walker = new CalcTreeWalker();
      // Traverse the tree created by the parser
      walker.expr(t);
      // Get the result tree from the walker
      t = (CommonAST)walker.getAST();
      System.out.println(t.toLispString());
    } catch(Exception e) {
      System.err.println("exception: "+e);
    }
  }
}

36
Left Recursion Solved

E → E + T | T, written in AntLR as
expr : expr PLUS term | term ;
The generated code calls expr() endlessly
without consuming any input
expr() {
  expr();
  match(PLUS);
  term();
}
Eliminate left recursion by
E → T E'
E' → + T E' | ε
which results in
expr : term (PLUS term)* ;
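
A hand-written Java sketch of the rewritten rule shows why the loop form terminates and still groups to the left. The class LeftRecursionSketch, its parse helper, and the use of plain strings as tokens are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for: expr : term (PLUS term)* ;  It builds a LISP-style tree string.
class LeftRecursionSketch {
    private final List<String> tokens;
    private int pos = 0;

    private LeftRecursionSketch(List<String> tokens) { this.tokens = tokens; }

    static String parse(String... tokens) {
        return new LeftRecursionSketch(Arrays.asList(tokens)).expr();
    }

    // A term is consumed before anything else, so every step makes progress.
    private String expr() {
        String tree = term();
        while (pos < tokens.size() && tokens.get(pos).equals("+")) {
            pos++;                                        // match(PLUS)
            tree = "(+ " + tree + " " + term() + ")";     // fold to the left
        }
        return tree;
    }

    private String term() {
        if (pos >= tokens.size() || tokens.get(pos).equals("+")) {
            throw new IllegalStateException("expected a term at token " + pos);
        }
        return tokens.get(pos++);
    }
}
```

Each pass through the loop wraps the tree built so far as the left operand, so a + b + c still comes out as (a + b) + c even though the grammar is no longer left-recursive.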

37
Links

AntLR Reference Manual by Terence Parr
antlr.org/share/1084743321127/ANTLR_Reference_Manual.pdf
AntLR Tutorial by Ashley J.S. Mills
http://supportweb.cs.bham.ac.uk/docs/tutorials/docsystem/build/tutorials/antlr/antlrhome.html
An Introduction to AntLR by Terence Parr
http://www.cs.usfca.edu/parrt/course/652/lectures/antlr.html
An AntLR Tutorial by Scott Stanchfield
javadude.com/articles/antlrtut/