LANGUAGE TRANSLATORS: WEEK 14

Provided by: Computing115

Transcript and Presenter's Notes

1
LANGUAGE TRANSLATORS WEEK 14

LECTURE
REGULAR EXPRESSIONS
FINITE STATE MACHINES
LEXICAL ANALYSERS
TUTORIAL
CAPTURING LANGUAGES USING REGULAR EXPRESSIONS

2
LEXICAL ANALYSIS

Is the first step in the translation/compilation
process:
input language ==> output language
It means grouping the raw characters of the input
into TOKENS.

3
LEXICAL ANALYSIS PHASE

The language of TOKENS (e.g. identifiers) is always
a regular language.
REGULAR EXPRESSIONS generate regular languages
(as do regular grammars). The tokens of
languages are often specified by regular
expressions.
Finite State Machines recognise regular languages.

4
REGULAR EXPRESSIONS

One-line method of specifying a language
Equivalent to Type 3 (regular) grammars
Used to parameterize UNIX/LINUX file processing
commands

5
REGULAR EXPRESSIONS - DEFINITION

EXAMPLE          DEFINITION
a | b            means choice
[a b c], [abc]   is shorthand for multiple choice
                 (a | b | c)
e                e means the empty word
(abc)*           means repetition 0, 1 or more times
(abcd)+          means repetition 1 or more times

6
REGULAR EXPRESSIONS - EXAMPLES

[a-zA-Z][a-zA-Z0-9]*
defines the language of IDENTIFIERS in some
programming languages
(xyz)* defines the language
e, xyz, xyzxyz, xyzxyzxyz, ..
[abcd]+ defines the language
a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca,
..
Putting choice and repetition together produces
complicated regular languages
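
The three example expressions can be tried out directly. The sketch below assumes they read [a-zA-Z][a-zA-Z0-9]*, (xyz)* and [abcd]+ in java.util.regex syntax; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexExamples {
    // A letter followed by any number of letters or digits.
    static final Pattern IDENTIFIER = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");
    // Zero or more repetitions of the word xyz (includes the empty word).
    static final Pattern XYZ_STAR = Pattern.compile("(xyz)*");
    // One or more characters drawn from a, b, c, d.
    static final Pattern ABCD_PLUS = Pattern.compile("[abcd]+");

    // True when the WHOLE string belongs to the language of the pattern.
    static boolean inLanguage(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inLanguage(IDENTIFIER, "count1")); // true
        System.out.println(inLanguage(IDENTIFIER, "1count")); // false
        System.out.println(inLanguage(XYZ_STAR, ""));         // true
        System.out.println(inLanguage(XYZ_STAR, "xyzxy"));    // false
        System.out.println(inLanguage(ABCD_PLUS, "bad"));     // true
        System.out.println(inLanguage(ABCD_PLUS, ""));        // false
    }
}
```

Note that matches() tests the whole string, which is what membership in a language means; find() would only look for a matching substring.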

7
Finite State Machines

Can be defined by annotated nodes and arcs.
Regular expressions can be translated into FSMs,
but ERROR STATES must be added to the FSMs.
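
As an illustration of an FSM with an explicit error state, here is a minimal sketch of a deterministic machine for the identifier pattern [a-zA-Z][a-zA-Z0-9]*. The state names and the class are hypothetical, chosen only for this example.

```java
public class IdentifierFsm {
    // START: nothing read yet; IN_ID: accepting; ERROR: dead state.
    enum State { START, IN_ID, ERROR }

    static boolean isLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // One arc of the machine: current state plus input character gives next state.
    static State step(State s, char c) {
        switch (s) {
            case START:
                return isLetter(c) ? State.IN_ID : State.ERROR;
            case IN_ID:
                return (isLetter(c) || isDigit(c)) ? State.IN_ID : State.ERROR;
            default:
                return State.ERROR; // no arc leaves the error state
        }
    }

    // Runs the machine over the input, one step per character.
    static boolean accepts(String input) {
        State s = State.START;
        for (int i = 0; i < input.length(); i++) {
            s = step(s, input.charAt(i));
        }
        return s == State.IN_ID;
    }

    public static void main(String[] args) {
        System.out.println(accepts("x1"));  // true
        System.out.println(accepts("1x"));  // false
    }
}
```

Every character that has no arc in the regular expression's machine is routed to ERROR, so the transition function is total.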

8
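Regular Expression => NDFSM

[Figure: an example regular expression over a and b drawn as an NDFSM, arcs labelled a and b]
then NDFSM => FSM..
[Figure: the resulting deterministic FSM, arcs labelled a and b]
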
9
Example

Specify a language of alphabet {w,x,y,z} with
the only restrictions being that
1. no strings contain both x and y, and
2. if there is a y and w in a string, then the
first w ALWAYS occurs before the first y
SOLUTION
1. Write down examples and counter-examples
2. Decide on any ambiguities
3. Use Case Analysis to sub-divide the problem:
language = (a) strings of w,x,z UNION
(b) strings of w,y,z with restriction 2.
- Part (a): [w x z]*
- Part (b): can assume y is always in a string:
[y z]* | z* w [w z]* y [w y z]*
- Put together, the answer is:
[w x z]* | [y z]* | z* w [w z]* y [w y z]*
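
One way to gain confidence in an answer like this is to compare it against the two restrictions by brute force. The sketch below assumes the answer reads [wxz]*|[yz]*|z*w[wz]*y[wyz]* in java.util.regex syntax; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class WxyzLanguageCheck {
    // Assumed reading of the slide's answer, written without spaces.
    static final Pattern ANSWER =
        Pattern.compile("[wxz]*|[yz]*|z*w[wz]*y[wyz]*");

    // Direct encoding of restrictions 1 and 2.
    static boolean satisfiesRestrictions(String s) {
        boolean hasX = s.indexOf('x') >= 0;
        int firstW = s.indexOf('w');
        int firstY = s.indexOf('y');
        if (hasX && firstY >= 0) return false;                         // restriction 1
        if (firstW >= 0 && firstY >= 0 && firstW > firstY) return false; // restriction 2
        return true;
    }

    // Counts the strings extending prefix, up to length maxLen, on which
    // the regular expression and the restrictions disagree.
    static int countDisagreements(String prefix, int maxLen) {
        boolean agree =
            ANSWER.matcher(prefix).matches() == satisfiesRestrictions(prefix);
        int bad = agree ? 0 : 1;
        if (prefix.length() < maxLen) {
            for (char c : "wxyz".toCharArray()) {
                bad += countDisagreements(prefix + c, maxLen);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Checks every string over {w,x,y,z} of length 0..6.
        System.out.println(countDisagreements("", 6)); // 0
    }
}
```

A bounded check like this is not a proof, but a counter-example of short length is usually where a wrong expression shows up first.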

10
A LEXICAL ANALYSER - GENERATOR (e.g. LEX, JLEX) -
how they work

INPUT REGULAR EXPRESSIONS
TRANSLATE REGULAR EXPRESSION INTO
NON-DETERMINISTIC FSM
TRANSLATE NON-DETERMINISTIC FSM INTO
DETERMINISTIC FSM (which is easily described as a
simple program)
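
The second translation step is commonly done with the subset construction, in which each deterministic state is a set of non-deterministic states. The sketch below is an illustration under stated assumptions, not how LEX or JLEX are actually implemented: the NDFSM is a hand-written one (no empty-word moves) for strings over a, b that end in ab, and all names are made up.

```java
import java.util.*;

public class SubsetConstruction {
    static final char[] ALPHABET = {'a', 'b'};
    // NFA[state][symbol index] lists the possible next states.
    static final int[][][] NFA = {
        { {0, 1}, {0} },  // state 0: a -> {0,1}, b -> {0}
        { {},     {2} },  // state 1: b -> {2}
        { {},     {}  },  // state 2: accepting, no outgoing arcs
    };
    static final int NFA_START = 0;
    static final int NFA_ACCEPT = 2;

    // Builds the deterministic machine: keys are sets of NFA states.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> buildDfa() {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> start = new TreeSet<>(Arrays.asList(NFA_START));
        dfa.put(start, new HashMap<>());
        work.add(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.remove();
            for (int k = 0; k < ALPHABET.length; k++) {
                // Union of the moves of every NFA state in the current set.
                Set<Integer> next = new TreeSet<>();
                for (int s : current) {
                    for (int t : NFA[s][k]) next.add(t);
                }
                dfa.get(current).put(ALPHABET[k], next);
                if (!dfa.containsKey(next)) {
                    dfa.put(next, new HashMap<>());
                    work.add(next);
                }
            }
        }
        return dfa;
    }

    // Runs the deterministic machine; one table lookup per input character.
    static boolean accepts(Map<Set<Integer>, Map<Character, Set<Integer>>> dfa,
                           String input) {
        Set<Integer> state = new TreeSet<>(Arrays.asList(NFA_START));
        for (char c : input.toCharArray()) {
            Map<Character, Set<Integer>> row = dfa.get(state);
            if (row == null || !row.containsKey(c)) return false; // not in alphabet
            state = row.get(c);
        }
        return state.contains(NFA_ACCEPT);
    }

    public static void main(String[] args) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = buildDfa();
        System.out.println(dfa.keySet());          // [[0], [0, 1], [0, 2]]
        System.out.println(accepts(dfa, "bbaab")); // true
        System.out.println(accepts(dfa, "aba"));   // false
    }
}
```

If some set of NFA states had no move on a symbol, the empty set would appear as a deterministic state; it plays the role of the error state.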

11
EXAMPLE INPUT TO A LEXICAL ANALYSER - GENERATOR

";" { return new Symbol(sym.SEMI); }
"+" { return new Symbol(sym.PLUS); }
"*" { return new Symbol(sym.TIMES); }
"(" { return new Symbol(sym.LPAREN); }
")" { return new Symbol(sym.RPAREN); }
[0-9]+ { return new Symbol(sym.NUMBER, new
         Integer(yytext())); }
[ \t\r\n\f] { /* ignore white space. */ }
. { System.err.println("Illegal character: "
    + yytext()); }
example: if the string (231+3)*3 was input to the
generated lexical analyser, the output would be
LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN
TIMES (NUMBER,3)
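
To see the same token stream without running a generator, the rules can be approximated with java.util.regex. This is only a sketch: it is not what JLEX generates, the class name is hypothetical, and tokens are returned as plain strings rather than Symbol objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TinyLexer {
    // One named group per rule, in the same order as the rules above.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<SEMI>;)|(?<PLUS>\\+)|(?<TIMES>\\*)|(?<LPAREN>\\()|(?<RPAREN>\\))"
        + "|(?<NUMBER>[0-9]+)|(?<WS>[ \\t\\r\\n\\f]+)|(?<BAD>.)");

    private static final String[] SIMPLE =
        {"SEMI", "PLUS", "TIMES", "LPAREN", "RPAREN"};

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            if (m.group("WS") != null) {
                continue; // ignore white space
            }
            if (m.group("BAD") != null) {
                System.err.println("Illegal character: " + m.group());
                continue;
            }
            if (m.group("NUMBER") != null) {
                out.add("(NUMBER," + m.group() + ")");
                continue;
            }
            for (String name : SIMPLE) {
                if (m.group(name) != null) out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokenize("(231+3)*3")));
        // LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN TIMES (NUMBER,3)
    }
}
```

The catch-all rule comes last, as in the generator input, so it only fires when no earlier rule matches.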

12
Simple Lexical Analyser

// Assumes the CUP runtime class Symbol and a CUP-generated
// class sym holding the token constants.
import java_cup.runtime.Symbol;

public class scanner {
  /* single lookahead character */
  protected static int next_char;

  /* advance input by one character */
  protected static void advance()
    throws java.io.IOException
  { next_char = System.in.read(); }

  /* initialize the scanner by reading the first character */
  public static void init()
    throws java.io.IOException
  { advance(); }

  /* recognize and return the next complete token */
  public static Symbol next_token()
    throws java.io.IOException
  {
    for (;;)
      switch (next_char) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
          /* parse a decimal integer */
          int i_val = 0;
          do {
            i_val = i_val * 10 + (next_char - '0');
            advance();
          } while (next_char >= '0' && next_char <= '9');
          return new Symbol(sym.INT, new Integer(i_val));
        case 'p': advance(); return new Symbol(sym.PRINT);
        case 'r': advance(); return new Symbol(sym.REPEAT);
        case 'u': advance(); return new Symbol(sym.UNTIL);
        case '=': advance(); return new Symbol(sym.ASSIGNS);
        case ';': advance(); return new Symbol(sym.SEMI);
        case '+': advance(); return new Symbol(sym.PLUS);
        case '-': advance(); return new Symbol(sym.MINUS);
        case '(': advance(); return new Symbol(sym.LPAREN);
        case ')': advance(); return new Symbol(sym.RPAREN);
        case 'x': advance(); return new Symbol(sym.ID, "x");
        case 'y': advance(); return new Symbol(sym.ID, "y");
        case 'z': advance(); return new Symbol(sym.ID, "z");
        case -1: return new Symbol(sym.EOF);
        default: /* skip any other character */
          advance(); break;
      }
  }
}

13
nb

Regular expressions simplify pattern-matching
code
Discover the elegance of regular expressions in
text-processing scenarios that involve pattern
matching
By Jeff Friesen, JavaWorld.com, 02/07/03
Text processing frequently requires code to match
text against patterns. That capability makes
possible text searches, email header validation,
custom text creation from generic text (e.g.,
"Dear Mr. Smith" instead of "Dear Customer"), and
so on. Java supports pattern matching via its
character and assorted string classes. Because
that low-level support commonly leads to complex
pattern-matching code, Java also offers regular
expressions to help you write simpler code.
Regular expressions often confuse newcomers.
However, this article dispels much of that
confusion. After introducing regular expression
terminology, the java.util.regex package's
classes, and a program that demonstrates regular
expression constructs, I explore many of the
regular expression constructs that the Pattern
class supports. I also examine the methods
comprising Pattern and other java.util.regex
classes. A practical application of regular
expressions concludes my discussion.
See http://www.javaworld.com/javaworld/jw-02-2003/jw-0207-java101.html

14
Summary

Regular expressions are a quick and easy way to
specify simple forms of language. They can be
easily translated into FSMs (which have nice
properties, e.g. a deterministic FSM runs in time
linear in the length of its input).
There are tools (e.g. JLEX) which take regular
expressions as input and output a lexical analyser
that recognises the language they define.
