Using JavaCC - PowerPoint PPT Presentation

Provided by: Shmue2


1
Using JavaCC
  • CMSC 431

2
Automating Lexical Analysis: Overall Picture
[Diagram; surviving label: Tokens]
3
Building Faster Scanners from the DFA
  • Table-driven recognizers waste a lot of effort:
  • Read (and classify) the next character
  • Find the next state
  • Assign to the state variable
  • Branch back to the top
  • We can do better:
  • Encode states and actions in the code
  • Do transition tests locally
  • Generate ugly, spaghetti-like code
  • (that is OK, since it is automatically generated
    code)
  • Takes (many) fewer operations per input character

    state ← s0
    string ← ε
    char ← get_next_char()
    while (char ≠ eof)
        state ← δ(state, char)
        string ← string + char
        char ← get_next_char()
    if (state in Final) then report acceptance
    else report failure
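The contrast between the two approaches can be sketched in Java for a toy language, unsigned integers ([0-9]+). The DFA, its state numbering, and the class and method names are assumptions for illustration; a real generator would emit something larger and less readable.

```java
// Two recognizers for unsigned integers, [0-9]+.
// The DFA, its state numbering, and all names are illustrative assumptions.
final class IntRecognizers {
    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Character classes: 0 = digit, 1 = anything else.
    private static int classify(char c) {
        return isDigit(c) ? 0 : 1;
    }

    // States: 0 = start, 1 = one or more digits seen (final), 2 = error.
    private static final int[][] DELTA = {
        {1, 2}, // from state 0
        {1, 2}, // from state 1
        {2, 2}, // from state 2: the error state absorbs everything
    };
    private static final boolean[] FINAL = {false, true, false};

    // Table-driven: classify, look up the next state, assign, branch back.
    static boolean tableDriven(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            state = DELTA[state][classify(input.charAt(i))];
        }
        return FINAL[state];
    }

    // Direct-coded: each state is a piece of code and each transition is a
    // local test, so there is no table lookup and no state variable.
    static boolean directCoded(String input) {
        int n = input.length();
        // State 0: the first character must be a digit.
        if (n == 0 || !isDigit(input.charAt(0))) return false;
        // State 1: stay here while digits continue; anything else is an error.
        for (int i = 1; i < n; i++) {
            if (!isDigit(input.charAt(i))) return false;
        }
        return true; // input ended in state 1, which is final
    }
}
```

Both methods accept the same strings; the direct-coded one also stops at the first bad character instead of reading to the end.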
4
Inside a lexical analyzer generator
  • How does a lexical analyzer generator work?
  • It gets input from the user, who defines the tokens in
    a form equivalent to a regular grammar (for example, regular expressions)
  • Turns the regular grammar into an NFA
  • Converts the NFA into a DFA
  • Generates code that simulates the DFA
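The NFA-to-DFA step is usually done with the subset construction. The Java sketch below runs it on the Thompson NFA for (a|b)*abb, the classic example used in Aho, Sethi, and Ullman. The NFA encoding (one map per state, with '\0' standing for ε) and all names are assumptions; minimization and code generation are not shown.

```java
import java.util.*;

// Subset construction sketch: builds a DFA whose states are sets of NFA states.
// The encoding ('\0' marks epsilon moves) and names are illustrative assumptions.
final class NfaToDfa {
    static final char EPS = '\0';

    // NFA: edges.get(s) maps a label to the set of target states.
    final List<Map<Character, Set<Integer>>> edges = new ArrayList<>();
    final Set<Integer> nfaFinal = new HashSet<>();
    // DFA: dfaStates.get(i) is the NFA subset represented by DFA state i.
    final List<Set<Integer>> dfaStates = new ArrayList<>();
    final List<Map<Character, Integer>> dfaTrans = new ArrayList<>();

    NfaToDfa(int nfaStates) {
        for (int i = 0; i < nfaStates; i++) edges.add(new HashMap<>());
    }

    void edge(int from, char label, int to) {
        edges.get(from).computeIfAbsent(label, k -> new HashSet<>()).add(to);
    }

    // Epsilon-closure: everything reachable using only epsilon moves.
    Set<Integer> closure(Set<Integer> start) {
        Deque<Integer> work = new ArrayDeque<>(start);
        Set<Integer> result = new TreeSet<>(start);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : edges.get(s).getOrDefault(EPS, Set.of())) {
                if (result.add(t)) work.push(t);
            }
        }
        return result;
    }

    // States reachable on one move labelled c (before taking the closure).
    Set<Integer> move(Set<Integer> states, char c) {
        Set<Integer> result = new TreeSet<>();
        for (int s : states) result.addAll(edges.get(s).getOrDefault(c, Set.of()));
        return result;
    }

    void build(int nfaStart, Set<Character> alphabet) {
        Map<Set<Integer>, Integer> ids = new HashMap<>();
        Set<Integer> start = closure(Set.of(nfaStart));
        ids.put(start, 0);
        dfaStates.add(start);
        dfaTrans.add(new HashMap<>());
        // The unprocessed tail of dfaStates acts as the worklist.
        for (int i = 0; i < dfaStates.size(); i++) {
            for (char c : alphabet) {
                Set<Integer> target = closure(move(dfaStates.get(i), c));
                if (target.isEmpty()) continue; // implicit error state
                Integer id = ids.get(target);
                if (id == null) {
                    id = dfaStates.size();
                    ids.put(target, id);
                    dfaStates.add(target);
                    dfaTrans.add(new HashMap<>());
                }
                dfaTrans.get(i).put(c, id);
            }
        }
    }

    boolean accepts(String input) {
        int state = 0;
        for (char c : input.toCharArray()) {
            Integer next = dfaTrans.get(state).get(c);
            if (next == null) return false;
            state = next;
        }
        for (int s : dfaStates.get(state)) if (nfaFinal.contains(s)) return true;
        return false;
    }

    // Thompson NFA for (a|b)*abb with states 0..10; state 10 accepts.
    static NfaToDfa abbExample() {
        NfaToDfa n = new NfaToDfa(11);
        int[][] eps = {{0, 1}, {0, 7}, {1, 2}, {1, 4}, {3, 6}, {5, 6}, {6, 1}, {6, 7}};
        for (int[] e : eps) n.edge(e[0], EPS, e[1]);
        n.edge(2, 'a', 3);
        n.edge(4, 'b', 5);
        n.edge(7, 'a', 8);
        n.edge(8, 'b', 9);
        n.edge(9, 'b', 10);
        n.nfaFinal.add(10);
        n.build(0, Set.of('a', 'b'));
        return n;
    }
}
```

For this NFA the construction yields five DFA states, the same count the textbook derives by hand.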

5
Flow for Using JavaCC
[Figure: JavaCC workflow, extracted from
http://www.cs.unb.ca/profs/nickerson/courses/cs4905/Labs/L1_2006.pdf]
6
Structure of a JavaCC File
  • A JavaCC file has three parts:
  • Options
  • Class declaration
  • Specification for lexical analysis (tokens) and
    specification for syntax analysis
  • As a first JavaCC example, let's recognize two
    kinds of tokens: '+' and numerals.
  • Write the specification in an editor and save it
    as numeral.jj.

7
Using JavaCC for lexical analysis
  • JavaCC is a top-down parser generator.
  • Some parser generators (such as yacc, bison, and
    JavaCUP) need a separate lexical-analyzer
    generator.
  • With JavaCC, you specify the tokens in the same
    grammar file as the parser.

8
Example File
    /* main class definition */
    PARSER_BEGIN(Numeral)
    public class Numeral {
        public static void main(String[] args)
                throws ParseException, TokenMgrError {
            Numeral numeral = new Numeral(System.in);
            // read tokens until the end of the input
            while (numeral.getNextToken().kind != EOF);
        }
    }
    PARSER_END(Numeral)

    /* token definitions */
    TOKEN : {
        <ADD: "+">
      | <NUMERAL: (["0"-"9"])+>
    }
9
Options
  • The options portion is optional and is omitted in
    the previous example.
  • STATIC is a boolean option whose default value is
    true. If true, all methods and class variables
    are specified as static in the generated parser
    and token manager.
  • This allows only one parser object to be present,
    but it improves the performance of the parser.
  • To perform multiple parses during one run of your
    Java program, you will have to call the ReInit()
    method to reinitialize your parser if it is
    static.
  • If the parser is non-static, you may use the
    "new" operator to construct as many parsers as
    you wish. These can all be used simultaneously
    from different threads.
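The trade-off behind STATIC can be illustrated in plain Java. The two hypothetical classes below are not JavaCC output; they only mimic the pattern. Shared static state forces a reset (reInit, a stand-in for ReInit()) before each new parse, while per-object state lets several independent instances coexist, each of which could be driven by its own thread.

```java
// Illustration only: hypothetical classes mimicking the STATIC trade-off.
// Neither class is generated by JavaCC.
final class StaticTokenizer {
    // STATIC = true style: one shared input and position for the whole program.
    private static String input = "";
    private static int pos = 0;

    // Stand-in for ReInit(): the only way to start a new parse.
    static void reInit(String newInput) {
        input = newInput;
        pos = 0;
    }

    // Returns the next character, or -1 at the end of the input.
    static int next() {
        return pos < input.length() ? input.charAt(pos++) : -1;
    }
}

final class InstanceTokenizer {
    // STATIC = false style: each object owns its own state.
    private final String input;
    private int pos = 0;

    InstanceTokenizer(String input) {
        this.input = input;
    }

    int next() {
        return pos < input.length() ? input.charAt(pos++) : -1;
    }
}
```

Two InstanceTokenizer objects can be interleaved freely; with StaticTokenizer, starting a second input discards the first.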

10
Start
11
Compilation
12
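JavaCC specification of a lexer
Note the need for ( ): in JavaCC a repetition operator such as + applies to a
parenthesized expression, so a run of digits is written (["0"-"9"])+.
Defining Whitespace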
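To make the lexer specification concrete, here is a hand-written Java approximation of what the token manager for numeral.jj does once a whitespace rule such as SKIP : { " " | "\t" | "\n" | "\r" } is added: skip blanks and line breaks, then match ADD or the longest run of digits for NUMERAL. This is a sketch under stated assumptions. JavaCC's generated NumeralTokenManager is DFA code and signals failures with TokenMgrError, whereas this version throws IllegalArgumentException, and the class, enum, and record names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer mimicking numeral.jj plus a whitespace SKIP rule.
final class NumeralLexer {
    enum Kind { ADD, NUMERAL, EOF }

    record Token(Kind kind, String image) {}

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (true) {
            // SKIP: blanks, tabs, and line terminators
            while (i < src.length() && " \t\r\n".indexOf(src.charAt(i)) >= 0) i++;
            if (i == src.length()) {
                tokens.add(new Token(Kind.EOF, ""));
                return tokens;
            }
            char c = src.charAt(i);
            if (c == '+') {
                tokens.add(new Token(Kind.ADD, "+"));
                i++;
            } else if (c >= '0' && c <= '9') {
                // (["0"-"9"])+ : take the longest run of digits
                int start = i;
                while (i < src.length() && src.charAt(i) >= '0' && src.charAt(i) <= '9') i++;
                tokens.add(new Token(Kind.NUMERAL, src.substring(start, i)));
            } else {
                // JavaCC would throw TokenMgrError here.
                throw new IllegalArgumentException("lexical error at offset " + i + ": '" + c + "'");
            }
        }
    }
}
```

For the input "12 + 345\n" this yields NUMERAL("12"), ADD, NUMERAL("345"), EOF; without the whitespace rule the blank would be a lexical error.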
13
A Full Example
  • See the sample file

14
Dealing with errors
  • Error reporting: 123e+q
  • The scanner could treat it as an invalid token (a
    lexical error), or
  • return a sequence of valid tokens,
  • 123, e, +, q,
  • and let the parser deal with the error.
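The two policies can be compared with a small Java scanner in which numbers may carry an exponent (as in 123e+5). The token set and the "back up to the last accepting position" strategy are assumptions for illustration. With backUp set, the scanner returns 123, e, +, q and leaves the problem to the parser; without it, the scanner reports a malformed number.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: numbers with an optional exponent, identifiers, and '+'.
// The token set and both policies are illustrative assumptions.
final class ExponentScanner {
    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Scans a number starting at a digit. Returns {end of the longest valid
    // number, position where scanning stopped}.
    static int[] scanNumber(String s, int i) {
        int n = s.length();
        while (i < n && isDigit(s.charAt(i))) i++;
        int lastAccept = i; // the digits alone form a complete number
        int j = i;
        if (j < n && s.charAt(j) == 'e') {
            j++;
            if (j < n && (s.charAt(j) == '+' || s.charAt(j) == '-')) j++;
            if (j < n && isDigit(s.charAt(j))) {
                while (j < n && isDigit(s.charAt(j))) j++;
                lastAccept = j; // a complete exponent, e.g. 123e+5
            }
        }
        return new int[] {lastAccept, j};
    }

    static List<String> tokenize(String s, boolean backUp) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (isDigit(c)) {
                int[] r = scanNumber(s, i);
                if (r[1] > r[0] && !backUp) {
                    throw new IllegalArgumentException("malformed number: " + s.substring(i, r[1]));
                }
                out.add(s.substring(i, r[0]));
                i = r[0]; // resume right after the last accepting position
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < s.length() && Character.isLetter(s.charAt(i))) i++;
                out.add(s.substring(start, i));
            } else if (c == '+') {
                out.add("+");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return out;
    }
}
```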

15
Lexical error correction?
  • Sometimes interaction between the scanner and the
    parser can help,
  • especially in a top-down (predictive) parse.
  • When the parser calls the scanner, it can pass the
    set of allowable tokens as an argument.
  • Suppose the scanner sees calss in a context where
    only a top-level definition is allowed: it could then
    guess that the keyword class was intended.
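One way such a correction could work is sketched below in Java, assuming the parser passes the scanner the set of keywords allowed next. The slide does not say how a correction is chosen; the rule used here (accept the unique allowed keyword within one edit, counting an adjacent transposition as one edit, i.e. optimal string alignment distance) and the names are assumptions.

```java
import java.util.Set;

// Hypothetical keyword corrector driven by the parser's set of allowed tokens.
final class KeywordCorrector {
    // Optimal string alignment distance: insertions, deletions, substitutions,
    // and adjacent transpositions each cost one edit.
    static int osaDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Returns the lexeme if it is allowed; otherwise the unique allowed keyword
    // one edit away, or null when there is none or more than one.
    static String correct(String lexeme, Set<String> allowed) {
        if (allowed.contains(lexeme)) return lexeme;
        String best = null;
        for (String kw : allowed) {
            if (osaDistance(lexeme, kw) == 1) {
                if (best != null) return null; // ambiguous
                best = kw;
            }
        }
        return best;
    }
}
```

With allowed = {class, interface, enum}, correct("calss", allowed) returns "class"; returning null on ambiguity leaves the decision to ordinary error reporting.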

16
Same symbol, different meaning.
  • How can the scanner distinguish between binary
    minus and unary minus?
  • e.g., x = -a vs. x = 3 - a
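One common heuristic is to look at the previous token: a minus is binary if the previous token can end an operand (an identifier, a number, or a closing parenthesis) and unary otherwise, so x = -a is unary and x = 3 - a is binary. This is an assumption, not something the slide prescribes; many front ends instead emit a single MINUS token and let the parser decide from the grammar. The Java sketch works on single characters and its names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classifier: decides unary vs. binary minus from the previous token.
final class MinusClassifier {
    static List<String> classify(String expr) {
        List<String> kinds = new ArrayList<>();
        boolean prevEndsOperand = false; // at the start, a '-' must be unary
        for (char c : expr.toCharArray()) {
            if (Character.isWhitespace(c)) continue;
            if (c == '-') {
                kinds.add(prevEndsOperand ? "BINARY" : "UNARY");
                prevEndsOperand = false; // an operand must follow either kind
            } else {
                // letters, digits, and ')' can end an operand; '=', '(', etc. cannot
                prevEndsOperand = Character.isLetterOrDigit(c) || c == ')';
            }
        }
        return kinds;
    }
}
```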

17
Scanner troublemakers
  • Unclosed strings
  • Unclosed comments
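Both problems are usually detected only at the end of the line or of the input, far from where the construct began, so it helps to report the starting position. The Java sketch assumes C-style /* */ comments and double-quoted strings that cannot span lines and have no escape sequences; the class and exception names are hypothetical.

```java
// Hypothetical helpers for skipping comments and strings, reporting where an
// unclosed one began.
final class Skipper {
    static final class LexicalError extends RuntimeException {
        LexicalError(String what, int start) {
            super("unclosed " + what + " starting at offset " + start);
        }
    }

    // i points at the '/' of "/*"; returns the index just past the closing "*/".
    static int skipBlockComment(String src, int i) {
        int start = i;
        i += 2; // step over "/*"
        while (i + 1 < src.length()) {
            if (src.charAt(i) == '*' && src.charAt(i + 1) == '/') return i + 2;
            i++;
        }
        throw new LexicalError("comment", start); // reached end of input
    }

    // i points at the opening '"'; returns the index just past the closing '"'.
    static int skipString(String src, int i) {
        int start = i++;
        while (i < src.length() && src.charAt(i) != '\n') {
            if (src.charAt(i) == '"') return i + 1;
            i++;
        }
        throw new LexicalError("string", start); // reached end of line or input
    }
}
```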

18
JavaCC as a Parsing Tool
19
JavaCC Overview
  • Generates a top-down parser.
  • Could be used to generate a parser for Prolog,
    whose grammar is LL.
  • Generates the parser in Java,
  • so it can be integrated with any Java-based
    Prolog compiler or interpreter to continue our
    example.
  • The token specification and the grammar
    specification are in the same file, which makes
    debugging easier.
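A top-down parser of the kind JavaCC generates has one method per nonterminal and picks each branch by looking at the next token. The Java sketch hand-codes such a parser for an assumed grammar Start -> NUMERAL ( "+" NUMERAL )* EOF that sums the numerals; tokens are plain strings, and the class name and the "<EOF>" sentinel are hypothetical.

```java
import java.util.List;

// Hypothetical recursive-descent parser for Start -> NUMERAL ( "+" NUMERAL )* EOF.
final class AdderParser {
    private final List<String> tokens;
    private int pos = 0;

    AdderParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // The next token, or the sentinel "<EOF>" when the input is exhausted.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private String expectNumeral() {
        String t = peek();
        if (t.isEmpty() || !t.chars().allMatch(c -> c >= '0' && c <= '9')) {
            throw new IllegalStateException("expected NUMERAL but found " + t);
        }
        pos++;
        return t;
    }

    // Start: returns the sum of the numerals.
    long start() {
        long sum = Long.parseLong(expectNumeral());
        while (peek().equals("+")) { // one token of lookahead selects the loop
            pos++;
            sum += Long.parseLong(expectNumeral());
        }
        if (!peek().equals("<EOF>")) {
            throw new IllegalStateException("expected EOF but found " + peek());
        }
        return sum;
    }
}
```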

20
Types of Productions in JavaCC
  • There are four kinds of productions.
  • Javacode
  • For something that is not context-free or is
    difficult to write a grammar for,
  • e.g., recognizing matching braces and error
    processing.
  • Regular Expressions
  • Used to describe the tokens (terminals) of the
    grammar.
  • BNF
  • Standard way of specifying the productions of the
    grammar.
  • Token Manager Declarations
  • The declarations and statements are written into
    the generated Token Manager (lexer) and are
    accessible from within lexical actions.
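A Javacode production is ordinary Java code that the parser calls like a nonterminal. The sketch below mirrors the kind of brace-skipping routine often written that way for error recovery, since skipping arbitrary tokens up to a matching brace is awkward to express as grammar rules. Plain strings stand in for JavaCC Token objects, and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical imperative brace skipper, in the style of a Javacode production.
final class BraceSkipper {
    // Called just after an opening "{" has been consumed, with i at the next
    // token; returns the index just past the matching "}".
    static int skipToMatchingBrace(List<String> tokens, int i) {
        int nesting = 1;
        while (i < tokens.size()) {
            String t = tokens.get(i++);
            if (t.equals("{")) {
                nesting++;
            } else if (t.equals("}") && --nesting == 0) {
                return i;
            }
        }
        throw new IllegalStateException("no matching '}' before end of input");
    }
}
```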

21
JavaCC Lookahead Mechanism
  • Lookahead is the exploration of tokens further
    ahead in the input stream.
  • Backtracking is avoided because of its
    performance cost.
  • By default, JavaCC uses one token of lookahead; a
    larger amount can be specified.
  • Two types of lookahead mechanisms
  • Syntactic
  • The parser checks whether the upcoming tokens
    match a specified expansion.
  • Semantic
  • Any arbitrary Boolean expression can be
    specified as a lookahead condition.
  • e.g., A -> a B c and B -> b (c)?; the valid strings
    are abc and abcc.
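The problem with this grammar can be reproduced in a hand-coded Java parser over single-character tokens. With a one-token decision, B enters the optional (c)? whenever the next token is c, so abc is rejected. Checking two tokens fixes it, in the spirit of a JavaCC semantic lookahead such as LOOKAHEAD({ getToken(1).kind == C && getToken(2).kind == C }), where C is assumed to be the kind of the token "c". The class, method names, and the '$' end marker are hypothetical.

```java
// Hypothetical recursive-descent parser for A -> a B c, B -> b (c)?.
final class LookaheadDemo {
    private final String in;
    private final boolean greedy; // true: one-token decision; false: two-token check
    private int pos = 0;

    private LookaheadDemo(String in, boolean greedy) {
        this.in = in;
        this.greedy = greedy;
    }

    // The k-th upcoming token (k = 1 is the next one), or '$' at the end.
    private char token(int k) {
        int i = pos + k - 1;
        return i < in.length() ? in.charAt(i) : '$';
    }

    private void match(char expected) {
        if (token(1) != expected) {
            throw new IllegalStateException("expected " + expected + " at " + pos);
        }
        pos++;
    }

    private void a() {
        match('a');
        b();
        match('c');
    }

    private void b() {
        match('b');
        boolean takeOptionalC = greedy
                ? token(1) == 'c'                      // any 'c' enters (c)?
                : token(1) == 'c' && token(2) == 'c';  // leave one 'c' for A
        if (takeOptionalC) match('c');
    }

    static boolean accepts(String in, boolean greedy) {
        LookaheadDemo p = new LookaheadDemo(in, greedy);
        try {
            p.a();
            return p.pos == in.length();
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

With greedy set, accepts("abc", true) is false while accepts("abcc", true) is true; with the two-token check both strings are accepted.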

22
References
  • Aho, Sethi, and Ullman, Compilers: Principles,
    Techniques, and Tools
  • http://www.cc.gatech.edu/classes/AY2002/cs2130_spring/
  • http://www.rose-hulman.edu/Class/se/csse404/class-notes/day07-javaCC.ppt
  • http://students.csci.unt.edu/pgupta/2
  • http://www.cs.utsa.edu/danlo/teaching/cs4713/lecture/node14.html