LANGUAGE TRANSLATORS: WEEK 14 - PowerPoint PPT Presentation

Provided by: Computing115

1
LANGUAGE TRANSLATORS WEEK 14
  • LECTURE
  • REGULAR EXPRESSIONS
  • FINITE STATE MACHINES
  • LEXICAL ANALYSERS
  • TUTORIAL
  • CAPTURING LANGUAGES USING REGULAR EXPRESSIONS

2
LEXICAL ANALYSIS
  • Is the first step in the translation/compilation
    process
  • input language -> output language
  • means grouping the raw characters of the input
    into TOKENS.

3
LEXICAL ANALYSIS PHASE
  • The language of each kind of TOKEN (e.g.
    identifiers) is, in practice, a regular language.
  • REGULAR EXPRESSIONS generate regular languages
    (as do regular grammars), so the tokens of
    programming languages are usually specified by
    regular expressions.
  • Finite State Machines recognise regular languages.
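As a sketch of the idea that each kind of token is described by its own regular expression, the Java class below (hypothetical name `TokenClasses`) classifies a complete lexeme with `java.util.regex`. The four token classes and their patterns are illustrative assumptions, loosely based on the keywords and symbols of the simple scanner later in this lecture; a real lexical analyser must also split the input into lexemes first.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Sketch: each token class is specified by its own regular expression.
// The class names and patterns are illustrative assumptions.
class TokenClasses {
    // LinkedHashMap keeps insertion order, so KEYWORD is tried before IDENTIFIER
    static final Map<String, Pattern> CLASSES = new LinkedHashMap<>();

    static {
        CLASSES.put("KEYWORD", Pattern.compile("print|repeat|until"));
        CLASSES.put("IDENTIFIER", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        CLASSES.put("NUMBER", Pattern.compile("[0-9]+"));
        CLASSES.put("OPERATOR", Pattern.compile("[-+*=;()]"));
    }

    // Return the first token class whose regex matches the WHOLE lexeme, or "ERROR".
    static String classify(String lexeme) {
        for (Map.Entry<String, Pattern> entry : CLASSES.entrySet()) {
            if (entry.getValue().matcher(lexeme).matches()) {
                return entry.getKey();
            }
        }
        return "ERROR";
    }

    public static void main(String[] args) {
        for (String s : new String[] {"repeat", "x1", "231", "+", "9x"}) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Because "repeat" also matches the identifier pattern, the order in which classes are tried decides the result; lexical analyser generators settle such ties by rule order in a similar way.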

4
REGULAR EXPRESSIONS
  • A one-line method of specifying a language
  • equivalent in power to Type 3 (regular) grammars
  • used to parameterise UNIX/Linux file-processing
    commands (e.g. grep, sed)
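A minimal sketch of how a filter such as grep is parameterised by a regular expression: the hypothetical method `MiniGrep.grep` keeps each line in which the pattern matches somewhere (`Matcher.find`). It only imitates grep's basic behaviour and is not how grep is implemented; the file names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// A grep-like filter: keep the lines that contain a match for a regex.
class MiniGrep {
    static List<String> grep(String regex, List<String> lines) {
        Pattern p = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // find() succeeds if the pattern matches anywhere in the line,
            // which is how grep treats an unanchored pattern
            if (p.matcher(line).find()) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> files = List.of("scanner.java", "parser.cup", "sym.java", "notes.txt");
        // roughly the effect of: ls | grep '\.java$'
        System.out.println(grep("\\.java$", files)); // [scanner.java, sym.java]
    }
}
```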

5
REGULAR EXPRESSIONS - DEFINITION
  • EXAMPLE DEFINITION
  • a | b means choice
  • [abc] is shorthand for the multiple choice
    a | b | c
  • ε means the empty word
  • (abc)* means repetition 0, 1
    or more times
  • (abcd)+ means repetition 1 or
    more times
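The same operators exist in `java.util.regex`, which gives one convenient way to experiment with them. In this sketch (hypothetical class `RegexOperators`), `Pattern.matches` asks whether the whole word belongs to the language. Java has no symbol for ε, so the empty string "" stands in for the empty word.

```java
import java.util.regex.Pattern;

// The slide's operators written in java.util.regex syntax.
class RegexOperators {
    // true if the whole word belongs to the language of the regex
    static boolean in(String regex, String word) {
        return Pattern.matches(regex, word);
    }

    public static void main(String[] args) {
        System.out.println(in("a|b", "b"));            // true: choice
        System.out.println(in("[abc]", "c"));          // true: multiple choice a|b|c
        System.out.println(in("(abc)*", ""));          // true: 0 repetitions give the empty word
        System.out.println(in("(abc)*", "abcabc"));    // true: 2 repetitions
        System.out.println(in("(abcd)+", ""));         // false: + needs at least 1 repetition
        System.out.println(in("(abcd)+", "abcdabcd")); // true: 2 repetitions
    }
}
```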

6
REGULAR EXPRESSIONS - EXAMPLES
  • [a-zA-Z][a-zA-Z0-9]*
    defines the language of IDENTIFIERS in some
    programming languages
  • (xyz)* defines the language
    ε, xyz, xyzxyz, xyzxyzxyz, ...
  • [abcd]+ defines the language
    a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca,
    ...
  • Putting choice and repetition together produces
    complicated regular languages
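One way to see which language an expression defines is to list its shortest members. The sketch below (hypothetical class `LanguageEnumerator`) generates every string over a small alphabet up to a length bound, shortest first, and keeps the ones the expression matches. It is brute force, so alphabets and bounds must stay tiny; the two-character alphabet "a1" used for the identifier pattern is an assumption chosen to keep the output short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Brute-force listing of a regular language's members up to a length bound.
class LanguageEnumerator {
    static List<String> members(String regex, String alphabet, int maxLen) {
        Pattern p = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        List<String> level = new ArrayList<>();
        level.add("");                                  // the only string of length 0
        for (int len = 0; len <= maxLen; len++) {
            List<String> next = new ArrayList<>();
            for (String s : level) {
                if (p.matcher(s).matches()) result.add(s);
                for (char c : alphabet.toCharArray()) next.add(s + c); // extend by one symbol
            }
            level = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(members("[a-zA-Z][a-zA-Z0-9]*", "a1", 2)); // [a, aa, a1]
        System.out.println(members("(xyz)*", "xyz", 6));
        System.out.println(members("[abcd]+", "abcd", 2));
    }
}
```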

7
Finite State Machines
  • Can be defined by nodes (states) and arcs
    (transitions) annotated with input symbols.
  • Regular expressions can be translated into FSMs,
    but ERROR STATES must be added to the FSMs
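A deterministic FSM turns easily into a table-driven program. This sketch (hypothetical class `IdentifierDfa`) encodes an FSM for identifiers of the form [a-zA-Z][a-zA-Z0-9]*; every arc missing from the diagram is sent to an explicit ERROR state that traps, which is the job of the added error states. Grouping characters into LETTER, DIGIT and OTHER is a design choice made here to keep the table small.

```java
// Table-driven DFA for identifiers [a-zA-Z][a-zA-Z0-9]*, with an explicit ERROR state.
class IdentifierDfa {
    static final int START = 0, IN_ID = 1, ERROR = 2;   // states
    static final int LETTER = 0, DIGIT = 1, OTHER = 2;  // input symbol classes

    // TRANSITION[state][symbolClass] gives the next state;
    // every arc absent from the diagram leads to ERROR, which never leaves.
    static final int[][] TRANSITION = {
        /* START */ {IN_ID, ERROR, ERROR},
        /* IN_ID */ {IN_ID, IN_ID, ERROR},
        /* ERROR */ {ERROR, ERROR, ERROR},
    };

    static int classOf(char c) {
        if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) return LETTER;
        if (c >= '0' && c <= '9') return DIGIT;
        return OTHER;
    }

    // One table lookup per character, so the run time is linear in the input length.
    static boolean accepts(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = TRANSITION[state][classOf(input.charAt(i))];
        }
        return state == IN_ID;                           // the only accepting state
    }

    public static void main(String[] args) {
        for (String s : new String[] {"x", "count2", "2count", "a-b", ""}) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```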

8
Regular Expression -> NDFSM
  • a | b
  • ab
  • a*
  • then NDFSM -> FSM ..
[Diagrams: an NDFSM fragment for each expression, with arcs labelled a and b]
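The NDFSM -> FSM step is usually done by the subset construction: each state of the deterministic machine stands for the set of NDFSM states the machine could currently be in. The sketch below (hypothetical class `SubsetConstruction`) applies it to a small hand-built NDFSM for (a|b)*ab. For brevity it assumes an NDFSM with no ε-moves; machines built directly from regular expressions usually contain ε-moves, which would need an extra ε-closure step. An empty set, if reached, acts as the error state.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Subset construction for a hand-built NDFSM (no epsilon moves) recognising (a|b)*ab:
//   0 --a--> 0,  0 --a--> 1,  0 --b--> 0,  1 --b--> 2;  state 2 accepts.
class SubsetConstruction {
    static final char[] ALPHABET = {'a', 'b'};
    static final Map<Integer, Map<Character, Set<Integer>>> NFA = new HashMap<>();
    static final Set<Integer> NFA_ACCEPTING = Set.of(2);

    static void arc(int from, char symbol, int to) {
        NFA.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(symbol, k -> new TreeSet<>())
           .add(to);
    }

    static {
        arc(0, 'a', 0);
        arc(0, 'a', 1);
        arc(0, 'b', 0);
        arc(1, 'b', 2);
    }

    // Each DFA state is the set of NFA states the machine could be in.
    static Map<Set<Integer>, Map<Character, Set<Integer>>> determinise(Set<Integer> start) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            Set<Integer> current = work.pop();
            if (dfa.containsKey(current)) continue;       // already built
            Map<Character, Set<Integer>> row = new HashMap<>();
            for (char c : ALPHABET) {
                Set<Integer> target = new TreeSet<>();    // union of the NFA moves on c
                for (int s : current) {
                    target.addAll(NFA.getOrDefault(s, Map.of()).getOrDefault(c, Set.of()));
                }
                row.put(c, target);                       // an empty set is the error state
                if (!dfa.containsKey(target)) work.push(target);
            }
            dfa.put(current, row);
        }
        return dfa;
    }

    static boolean accepts(Map<Set<Integer>, Map<Character, Set<Integer>>> dfa,
                           Set<Integer> start, String input) {
        Set<Integer> state = start;
        for (char c : input.toCharArray()) {
            Map<Character, Set<Integer>> row = dfa.get(state);
            if (row == null || !row.containsKey(c)) return false; // symbol outside the alphabet
            state = row.get(c);
        }
        for (int s : state) {
            if (NFA_ACCEPTING.contains(s)) return true;  // contains an accepting NFA state
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Integer> start = Set.of(0);
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = determinise(start);
        System.out.println("DFA states: " + dfa.keySet());
        for (String w : new String[] {"ab", "aab", "bab", "ba", ""}) {
            System.out.println("\"" + w + "\" -> " + accepts(dfa, start, w));
        }
    }
}
```

For this NDFSM the construction yields three deterministic states, {0}, {0, 1} and {0, 2}; in general the number of subsets can grow exponentially, although it rarely does for token definitions.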
9
Example
  • Specify a language over the alphabet {w, x, y, z}
    with the only restrictions being that
  • 1. no strings contain both x and y, and
  • 2. If there is a y and w in a string, then the
    first w ALWAYS occurs before the first y
  • SOLUTION
  • 1. Write down examples and counter-examples
  • 2. Decide on any ambiguities
  • 3. Use case analysis to sub-divide the problem:
    the language is (a) strings of w, x, z UNION
    (b) strings of w, y, z satisfying restriction 2.
  • - Part (a): [wxz]*
  • - Part (b) can assume y is always in a string
    (strings without y are already covered by part (a)):
    [yz]* | z*w[wz]*y[wyz]*
  • - Put together, the answer is
    [wxz]* | [yz]* | z*w[wz]*y[wyz]*
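A brute-force check gives some confidence in an answer like this. The sketch (hypothetical class `RestrictionCheck`) codes the two restrictions directly and compares them with the expression [wxz]* | [yz]* | z*w[wz]*y[wyz]* on every string over {w, x, y, z} up to a length bound. It also shows why the final group must not contain x: with [wxyz]* there, the string wyx would match even though it contains both x and y. Agreement up to a bound is evidence, not a proof.

```java
import java.util.regex.Pattern;

// Compares the regex answer with a direct coding of the two restrictions.
class RestrictionCheck {
    static final Pattern ANSWER = Pattern.compile("[wxz]*|[yz]*|z*w[wz]*y[wyz]*");

    static boolean satisfiesRestrictions(String s) {
        boolean hasX = s.indexOf('x') >= 0;
        boolean hasY = s.indexOf('y') >= 0;
        boolean hasW = s.indexOf('w') >= 0;
        if (hasX && hasY) return false;                                      // restriction 1
        if (hasY && hasW && s.indexOf('w') > s.indexOf('y')) return false;   // restriction 2
        return true;
    }

    // Returns the first string (depth-first) on which the two disagree, or null.
    static String firstDisagreement(int maxLen) {
        return search("", maxLen);
    }

    private static String search(String prefix, int remaining) {
        if (ANSWER.matcher(prefix).matches() != satisfiesRestrictions(prefix)) return prefix;
        if (remaining == 0) return null;
        for (char c : "wxyz".toCharArray()) {
            String bad = search(prefix + c, remaining - 1);
            if (bad != null) return bad;
        }
        return null;
    }

    public static void main(String[] args) {
        String bad = firstDisagreement(7);   // 21845 strings of length 0..7
        System.out.println(bad == null ? "regex agrees with the restrictions"
                                       : "counter-example: " + bad);
    }
}
```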

10
LEXICAL ANALYSER GENERATORS (e.g. LEX, JLEX) -
how they work
  • INPUT REGULAR EXPRESSIONS
  • TRANSLATE REGULAR EXPRESSION INTO
    NON-DETERMINISTIC FSM
  • TRANSLATE NON-DETERMINISTIC FSM INTO
    DETERMINISTIC FSM (which is easily described as a
    simple program)

11
EXAMPLE INPUT TO A LEXICAL ANALYSER GENERATOR
  • ";"  { return new Symbol(sym.SEMI); }
  • "+"  { return new Symbol(sym.PLUS); }
  • "*"  { return new Symbol(sym.TIMES); }
  • "("  { return new Symbol(sym.LPAREN); }
  • ")"  { return new Symbol(sym.RPAREN); }
  • [0-9]+  { return new Symbol(sym.NUMBER,
    Integer.valueOf(yytext())); }
  • [ \t\r\n\f]  { /* ignore white space. */ }
  • .  { System.err.println("Illegal character: "
    + yytext()); }
  • example: if the string (231+3)*3 was input to the
    generated lexical analyser, the output would be
    LPAREN (NUMBER,231) PLUS (NUMBER,3) RPAREN
    TIMES (NUMBER,3)
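To see what the generated analyser does with these rules, here is a sketch (hypothetical class `RegexLexer`) that imitates them with `java.util.regex` instead of generating an FSM. Each rule becomes a named group, tried in rule order at the current position; tokens come back as strings in the format of the example output rather than as CUP `Symbol` objects. One difference from lex-style tools: Java takes the first alternative that matches, whereas the generators prefer the longest match. With these rules the result is the same, because every multi-character match comes from a rule listed before the single-character catch-all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Imitates the JLex rules with java.util.regex: one named group per rule, in rule order.
class RegexLexer {
    static final String[] RULE_NAMES =
        {"SEMI", "PLUS", "TIMES", "LPAREN", "RPAREN", "NUMBER", "WS", "ILLEGAL"};

    // DOTALL lets the final catch-all "." match any character, like a default rule.
    static final Pattern RULES = Pattern.compile(
        "(?<SEMI>;)|(?<PLUS>\\+)|(?<TIMES>\\*)|(?<LPAREN>\\()|(?<RPAREN>\\))"
        + "|(?<NUMBER>[0-9]+)|(?<WS>[ \\t\\r\\n\\f])|(?<ILLEGAL>.)",
        Pattern.DOTALL);

    static List<String> lex(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) break;            // unreachable: ILLEGAL matches any character
            String rule = null;
            for (String name : RULE_NAMES) {      // which rule's group took part in the match?
                if (m.group(name) != null) { rule = name; break; }
            }
            String text = m.group();
            switch (rule) {
                case "WS": break;                 // ignore white space
                case "ILLEGAL": System.err.println("Illegal character: " + text); break;
                case "NUMBER": tokens.add("(NUMBER," + Integer.parseInt(text) + ")"); break;
                default: tokens.add(rule);        // SEMI, PLUS, TIMES, LPAREN, RPAREN
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", lex("(231+3)*3")));
    }
}
```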

12
Simple Lexical Analyser
import java_cup.runtime.Symbol;  // Symbol is from the CUP runtime; sym is the
                                 // constants class generated by CUP

public class scanner {
  /* single lookahead character */
  protected static int next_char;

  /* advance the input by one character */
  protected static void advance() throws java.io.IOException {
    next_char = System.in.read();
  }

  /* initialise the scanner by reading the first character */
  public static void init() throws java.io.IOException {
    advance();
  }

  /* recognise and return the next complete token */
  public static Symbol next_token() throws java.io.IOException {
    for (;;)
      switch (next_char) {
        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9': {
          /* parse a decimal integer */
          int i_val = 0;
          do {
            i_val = i_val * 10 + (next_char - '0');
            advance();
          } while (next_char >= '0' && next_char <= '9');
          return new Symbol(sym.INT, Integer.valueOf(i_val));
        }
        case 'p': advance(); return new Symbol(sym.PRINT);
        case 'r': advance(); return new Symbol(sym.REPEAT);
        case 'u': advance(); return new Symbol(sym.UNTIL);
        case '=': advance(); return new Symbol(sym.ASSIGNS); /* '=' assumed: lost in the transcript */
        case ';': advance(); return new Symbol(sym.SEMI);
        case '+': advance(); return new Symbol(sym.PLUS);
        case '-': advance(); return new Symbol(sym.MINUS);
        case '(': advance(); return new Symbol(sym.LPAREN);
        case ')': advance(); return new Symbol(sym.RPAREN);
        case 'x': advance(); return new Symbol(sym.ID, "x");
        case 'y': advance(); return new Symbol(sym.ID, "y");
        case 'z': advance(); return new Symbol(sym.ID, "z");
        case -1:  return new Symbol(sym.EOF);
        default:  advance(); break;  /* skip any other character, e.g. white space */
      }
  }
}
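This scanner needs CUP's `Symbol` class and the generated `sym` constants. For experimenting without CUP, here is a self-contained variant (hypothetical class `StringScanner`) that uses the same switch-on-lookahead technique but reads from a String and returns token names as strings. As in the original cases, keywords are single letters (p, r, u), so "print" is written p; the character for ASSIGNS is assumed to be '=', since it did not survive in the transcript.

```java
import java.util.ArrayList;
import java.util.List;

// Same technique as the CUP scanner, but reading from a String and returning token names.
class StringScanner {
    private final String input;
    private int pos = 0;
    private int nextChar;                    // lookahead character, -1 at end of input

    StringScanner(String input) {
        this.input = input;
        advance();                           // plays the role of init()
    }

    private void advance() {
        nextChar = pos < input.length() ? input.charAt(pos++) : -1;
    }

    String nextToken() {
        for (;;) {
            switch (nextChar) {
                case '0': case '1': case '2': case '3': case '4':
                case '5': case '6': case '7': case '8': case '9': {
                    int value = 0;           // parse a decimal integer
                    do {
                        value = value * 10 + (nextChar - '0');
                        advance();
                    } while (nextChar >= '0' && nextChar <= '9');
                    return "INT(" + value + ")";
                }
                case 'p': advance(); return "PRINT";
                case 'r': advance(); return "REPEAT";
                case 'u': advance(); return "UNTIL";
                case '=': advance(); return "ASSIGNS";   // assumed assignment character
                case ';': advance(); return "SEMI";
                case '+': advance(); return "PLUS";
                case '-': advance(); return "MINUS";
                case '(': advance(); return "LPAREN";
                case ')': advance(); return "RPAREN";
                case 'x': case 'y': case 'z': {
                    String name = String.valueOf((char) nextChar);
                    advance();
                    return "ID(" + name + ")";
                }
                case -1: return "EOF";
                default: advance(); break;   // skip anything else, e.g. white space
            }
        }
    }

    // Collect all tokens up to and including EOF.
    static List<String> tokenize(String text) {
        StringScanner s = new StringScanner(text);
        List<String> out = new ArrayList<>();
        String t;
        do {
            t = s.nextToken();
            out.add(t);
        } while (!t.equals("EOF"));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x=12;p x"));
    }
}
```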

13
nb
  • Regular expressions simplify pattern-matching
    code
  • Discover the elegance of regular expressions in
    text-processing scenarios that involve pattern
    matching
  • By Jeff Friesen, JavaWorld.com, 02/07/03
  • Text processing frequently requires code to match
    text against patterns. That capability makes
    possible text searches, email header validation,
    custom text creation from generic text (e.g.,
    "Dear Mr. Smith" instead of "Dear Customer"), and
    so on. Java supports pattern matching via its
    character and assorted string classes. Because
    that low-level support commonly leads to complex
    pattern-matching code, Java also offers regular
    expressions to help you write simpler code.
  • Regular expressions often confuse newcomers.
    However, this article dispels much of that
    confusion. After introducing regular expression
    terminology, the java.util.regex package's
    classes, and a program that demonstrates regular
    expression constructs, I explore many of the
    regular expression constructs that the Pattern
    class supports. I also examine the methods
    comprising Pattern and other java.util.regex
    classes. A practical application of regular
    expressions concludes my discussion.
  • See http://www.javaworld.com/javaworld/jw-02-2003/jw-0207-java101.html
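A short sketch of two of the uses the article mentions, written with `java.util.regex` (hypothetical class `RegexDemo`): creating custom text from generic text with `String.replaceAll`, and searching text with `Matcher.find`. The "From:" header check is a deliberately simplified assumption, nowhere near a full e-mail address validator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexDemo {
    // Custom text from generic text: swap the generic greeting for a personal one.
    static String personalise(String letter, String name) {
        // quoteReplacement stops '$' or '\' in the name being read as group references
        return letter.replaceAll("Dear Customer", Matcher.quoteReplacement("Dear " + name));
    }

    // Text search: collect every run of decimal digits, in order of appearance.
    static List<String> numbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile("[0-9]+").matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Deliberately simplified "From:" header check (an assumption, far from a full validator).
    static boolean looksLikeFromHeader(String line) {
        return Pattern.matches("From: [^@\\s]+@[^@\\s]+\\.[a-zA-Z]+", line);
    }

    public static void main(String[] args) {
        System.out.println(personalise("Dear Customer, thank you for your order.", "Mr. Smith"));
        System.out.println(numbers("(231+3)*3"));
        System.out.println(looksLikeFromHeader("From: someone@example.com"));
    }
}
```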

14
Summary
  • Regular expressions are a quick and easy way to
    specify simple forms of language. They can be
    translated mechanically into FSMs, which have useful
    properties: e.g. a deterministic FSM makes one
    transition per input character, so its running time
    is linear in the length of the input.
  • Tools such as JLex take regular expressions as
    input and output a lexical analyser which
    recognises the language they define.