Lexical Analysis - PowerPoint PPT Presentation

Provided by: doron1
1
Lexical Analysis
  • Dragon Book chapter 3

2
Compiler structure
Source program
Lexical analyzer
Syntax analyzer
Error handling
Semantic analyzer
Symbol table
Intermediate code generator
Code optimizer
Code generator
Target program
3
Compiler structure
Source program
Lexical analyzer
token
Get next token
Syntax analyzer
Error handling
Symbol table
4
Tokens in programming languages
5
Tokens may be difficult to recognize
  • Fortran: DO 5 I = 1.25 versus DO 5 I = 1,25 (spaces do not count).
  • PL/I: IF THEN THEN THEN = ELSE; ELSE ELSE = THEN (no reserved keywords).
  • PL/I: PR1(2, 7, 18, D3, 175.14)3 (procedure call or array reference).

6
Strings, languages.
  • A string is a sequence of characters over some alphabet,
    e.g., 0100110 over {0, 1}.
  • In computers, the alphabet is usually ASCII or EBCDIC.
  • Length of a string: its number of characters.
  • Empty string: ε (size 0).
  • Concatenation: putting one string after another.
    X = dog, Y = house, XY = doghouse (also written X.Y).
  • Prefix: ban is a prefix of banana. Suffix: ana is a
    suffix of banana.

7
Language: a set of strings
  • The alphabet is a language: L = {A, B, …, Z, a, b, …, z}.
  • Constant languages: X = {ab, ba}, Y = {a}.
  • Concatenation: X.Y = {aba, baa}.
    Y.X = {aab, aba}.
  • Union: X∪Y = X+Y = X|Y = {ab, ba, a}.
  • Exponentiation: X^3 = X.X.X
  • Star: X* is zero or more occurrences of X. L* is all
    words with letters from L.
  • L+ is all words with one or more letters from L.
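
The operations on this slide can be tried out directly with sets of strings. The following is a minimal sketch, not part of the original slides; the function names `concat`, `power`, and `star_up_to` are made up for illustration, and since X* is infinite the last function only builds a finite approximation.

```python
from itertools import product

def concat(X, Y):
    """Concatenation X.Y: every word of X followed by every word of Y."""
    return {x + y for x, y in product(X, Y)}

def power(X, n):
    """Exponentiation X^n: X concatenated with itself n times (X^0 = {""})."""
    result = {""}  # the language containing only the empty string
    for _ in range(n):
        result = concat(result, X)
    return result

def star_up_to(X, max_power):
    """Finite approximation of X*: the union of X^0 .. X^max_power."""
    words = set()
    for i in range(max_power + 1):
        words |= power(X, i)
    return words

X = {"ab", "ba"}
Y = {"a"}
print(sorted(concat(X, Y)))  # ['aba', 'baa']
print(sorted(concat(Y, X)))  # ['aab', 'aba']
print(sorted(X | Y))         # ['a', 'ab', 'ba']
```

Python's built-in set union `|` plays the role of the union operator here, and the empty string `""` stands in for ε.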

8
Regular expressions
  • X|Y = X∪Y = {s | s∈X or s∈Y}.
  • X.Y = {x.y | x∈X and y∈Y}.
  • X* = ∪_{i=0..∞} X^i.
  • X+ = ∪_{i=1..∞} X^i.

9
Examples
  • a|b = {a, b}.
  • (a|b).(a|b) = {aa, ab, ba, bb}.
  • a* = {ε, a, aa, aaa, …}.
  • (a|b)* = {ε, a, b, ab, ba, aa, aba, …}.

10
Defining tokens
  • digit → 0-9
  • digits → digit+
  • fraction → . digits | ε
  • exponent → E ( + | - | ε ) digits | ε
  • const → digits fraction exponent
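
As a rough illustration, these definitions can be transcribed into Python regular expressions. This sketch assumes that digits means one or more digit, that fraction and exponent are optional, and that the exponent sign may be +, -, or absent; the names `CONST` and `is_const` are invented for the example.

```python
import re

# Each definition is built from the previous ones, as on the slide.
DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"                  # one or more digits
FRACTION = rf"(?:\.{DIGITS})?"         # ". digits" or nothing
EXPONENT = rf"(?:E[+-]?{DIGITS})?"     # "E", optional sign, digits, or nothing
CONST = re.compile(rf"{DIGITS}{FRACTION}{EXPONENT}")

def is_const(text):
    """True when the whole text is a numeric constant under these definitions."""
    return CONST.fullmatch(text) is not None

for sample in ["12", "175.14", "6.02E23", "1E-9", "1.", ".5"]:
    print(sample, is_const(sample))
```

Under these assumptions the first four samples are constants and the last two are not, because a fraction needs digits on both sides of the point.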

11
Not everything is regular!
  • All the words of the form w c w, where w is a
    word and c a letter.
  • The syntax of a program, e.g., the recursive
    definition of if-then-else: stmt → if expr then
    stmt else stmt.

12
Reading the input
If a>8 then goto nextloop else begin while z>8 do
Token starts here
Last character read
  • Lookahead is sometimes needed, for example to
    identify the variable done, which begins with the keyword do.
  • May need to unread a character.

13
Returning token attributes.
  • if xyz > 11 then
  • <if, keyword>
  • <id, value = xyz>
  • <op, value = >>
  • <const, value = 11>
  • <then, keyword>
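
A small tokenizer makes the idea of token attributes concrete. The sketch below is an illustration only: the token classes, the keyword set, and the use of `>` as the comparison operator are assumptions made for this example, not definitions taken from the slides.

```python
import re

KEYWORDS = {"if", "then", "else"}  # assumed keyword set for this example

# (token class, pattern) pairs; "skip" matches whitespace that is thrown away.
TOKEN_SPEC = [
    ("const", r"[0-9]+"),
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("op", r"[<>=]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, attribute) pairs for the input text."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue
        if kind == "id" and lexeme in KEYWORDS:
            tokens.append((lexeme, "keyword"))  # keywords carry no value
        else:
            tokens.append((kind, lexeme))       # other tokens carry their lexeme
    return tokens

print(tokenize("if xyz > 11 then"))
# [('if', 'keyword'), ('id', 'xyz'), ('op', '>'), ('const', '11'), ('then', 'keyword')]
```

Keywords are recognized here by first matching an identifier and then looking it up in a table, which is one common design; the automaton on a later slide recognizes them with dedicated paths instead.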

14
Finite Automata
Includes:
  • States: {s1, s2, …, s5}.
  • Initial states: {s1}.
  • Accepting states: {s3, s5}.
  • Alphabet: {a, b, c}.
  • Transitions: {(s1, a, s2), (s2, a, s3), …}.
[Figure: automaton diagram with states s1 to s5 and edges labeled a, b, and c]
Deterministic?
15
Automaton. What is the language?
[Figure: automaton diagram]
Formally: An input is a word over the alphabet Σ.
A run over a word is an alternating sequence of
states and letters, starting from the initial state.
An accepting run ends with an accepting state.
16
Example
[Figure: automaton diagram]
Input: aabbb. Run: s0 a s0 a s0 b s1 b s1 b s1. Accepts.
Input: aba. Run: s0 a s0 b s1 a s0. Does not accept.
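
The two runs above can be reproduced with a few lines of code. In this sketch the transition table is read off the runs themselves (s0 stays in s0 on a and moves to s1 on b; s1 stays in s1 on b and returns to s0 on a), and s1 is assumed to be the only accepting state because the run ending in s1 accepts while the run ending in s0 does not.

```python
# Transition function of the deterministic automaton, inferred from the two runs.
DELTA = {
    ("s0", "a"): "s0",
    ("s0", "b"): "s1",
    ("s1", "a"): "s0",
    ("s1", "b"): "s1",
}
INITIAL = "s0"
ACCEPTING = {"s1"}  # assumption: s1 is the only accepting state

def run(word):
    """Return the sequence of states visited while reading the word."""
    states = [INITIAL]
    for letter in word:
        states.append(DELTA[(states[-1], letter)])
    return states

def accepts(word):
    """A word is accepted when its run ends in an accepting state."""
    return run(word)[-1] in ACCEPTING

print(run("aabbb"), accepts("aabbb"))  # ['s0', 's0', 's0', 's1', 's1', 's1'] True
print(run("aba"), accepts("aba"))      # ['s0', 's0', 's1', 's0'] False
```

With this table the automaton accepts exactly the words that end in b.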
17
Automaton. What is the language?
[Figure: automaton with states s0 and s1 and edges labeled a and b]
18
Automaton. What is the language?
[Figure: automaton with states s0 and s1 and edges labeled a and b]
19
Identifying tokens
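[Figure: automaton with paths spelling the keywords I F, T H E N, and E L S E, and a path labeled letter followed by a loop labeled letter|digit for identifiers]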
20
Nondeterministic automata
  • Allows more than a single transition from a state
    with the same label.
  • There does not have to be a transition from every
    state with every label.
  • Allows multiple initial states.
  • Allows ε transitions.
21
Nondeterministic runs
  • Input: 0100
  • Run 1: s0 0 s0 1 s0 0 s0 0 s0. Does not accept.
  • Run 2: s0 0 s0 1 s1 0 s2 0 s3. Accepts.
  • Accepts when there exists an accepting run.
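
Checking whether some accepting run exists does not require trying the runs one by one; it is enough to track the set of states that any run could be in. The sketch below uses only the transitions that appear in Run 1 and Run 2 and assumes s3 is the accepting state; the full automaton on the slide may have more transitions.

```python
# Transitions seen in the two runs: note that (s0, "1") has two possible targets.
NFA_DELTA = {
    ("s0", "0"): {"s0"},
    ("s0", "1"): {"s0", "s1"},
    ("s1", "0"): {"s2"},
    ("s2", "0"): {"s3"},
}
NFA_INITIAL = {"s0"}
NFA_ACCEPTING = {"s3"}  # assumption: Run 2 accepts because it ends in s3

def nfa_accepts(word):
    """Accept when at least one run over the word ends in an accepting state."""
    current = set(NFA_INITIAL)
    for letter in word:
        # Follow every possible transition from every state we might be in.
        current = {t for s in current for t in NFA_DELTA.get((s, letter), set())}
    return bool(current & NFA_ACCEPTING)

print(nfa_accepts("0100"))  # True
print(nfa_accepts("0000"))  # False
```

A missing table entry simply contributes no successor states, which matches the rule that a nondeterministic automaton need not have a transition for every state and label.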

22
Determinizing Automata
  • Each state of D is a set of the states of N.
  • S –a→ T when T = {t | s ∈ S and s –a→ t}.
  • The initial state of D includes all the initial states of N.
  • Accepting states in D include at least one accepting state of N.
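
The construction described above can be sketched as a worklist algorithm that only builds the subsets actually reachable from the initial one. The function name `determinize` and the example automaton are illustrative assumptions; ε transitions are not handled in this version.

```python
from collections import deque

def determinize(delta, initial, accepting, alphabet):
    """Subset construction for an NFA without epsilon transitions.

    delta maps (state, letter) to a set of successor states.
    Returns (states, d_delta, start, d_accepting) where every
    state of the result is a frozenset of NFA states.
    """
    start = frozenset(initial)
    d_delta = {}
    d_accepting = set()
    seen = {start}
    queue = deque([start])
    while queue:
        S = queue.popleft()
        if S & accepting:  # contains at least one accepting NFA state
            d_accepting.add(S)
        for a in alphabet:
            T = frozenset(t for s in S for t in delta.get((s, a), ()))
            d_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return seen, d_delta, start, d_accepting

# Hypothetical example NFA over {0, 1} with a nondeterministic choice in s0.
example_delta = {
    ("s0", "0"): {"s0"},
    ("s0", "1"): {"s0", "s1"},
    ("s1", "0"): {"s2"},
    ("s2", "0"): {"s3"},
}
states, d_delta, start, d_accepting = determinize(example_delta, {"s0"}, {"s3"}, "01")
print(len(states), len(d_accepting))  # 4 1
```

In the worst case the result can have exponentially many states, but building only reachable subsets often keeps it small, as in this example.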
23
Determinization
[Figure: nondeterministic automaton with states s0, s1, s2, s3 and edges labeled 0, 1, and 0,1]
24
Determinization
[Figure: determinized automaton with edges labeled 0 and 1]
25
Translating regular expressions into automata
[Figure: constructions that combine automata for L1 and L2, using ε transitions, into automata for L1∪L2, L1.L2, and L*]
26
Automatic translation
  • (ab).(a.b)(a?b)(a?b)(ab).(ab)

[Figure: automaton produced by the translation, with ε transitions and edges labeled a and b]
27
Determinization with ε transitions.
[Figure: automaton with states s0 to s11, ε transitions, and edges labeled a and b]
Add to each set the states reachable using ε
transitions.
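
Adding the states reachable by ε transitions is usually called taking the ε-closure of a set. The following sketch shows one way to compute it; the ε edges in the example are hypothetical and are not read off the slide's figure.

```python
def epsilon_closure(states, eps):
    """Return the given states plus everything reachable through epsilon edges.

    eps maps a state to the set of states reachable by one epsilon transition.
    """
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

# Hypothetical epsilon edges, for illustration only.
EPS = {
    "s0": {"s1", "s2"},
    "s3": {"s5"},
    "s4": {"s5"},
    "s5": {"s6"},
}
print(sorted(epsilon_closure({"s0"}, EPS)))  # ['s0', 's1', 's2']
print(sorted(epsilon_closure({"s3"}, EPS)))  # ['s3', 's5', 's6']
```

During determinization the closure is applied to the initial set and again after every step on a letter, so each state of the deterministic automaton is already closed under ε transitions.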
28
Minimization
  • Group all the states together.
  • Separate states according to available exit transitions.
  • Separate a set into two if from some of its states
    one can reach another set and from others one
    cannot. Repeat until no set can be separated.
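
The refinement loop can be sketched as follows. Note one deliberate difference from the steps above: this version uses the common textbook starting point of splitting accepting from non-accepting states, which the slides do not spell out, and then repeatedly separates states whose transitions lead to different groups. The automaton in the example is hypothetical.

```python
def block_index(partition, state):
    """Index of the block containing the state, or None for a missing transition."""
    for i, block in enumerate(partition):
        if state in block:
            return i
    return None

def minimize_partition(states, alphabet, delta, accepting):
    """Return the blocks of equivalent states of a deterministic automaton."""
    states, accepting = set(states), set(accepting)
    # Assumed initial split: accepting versus non-accepting states.
    partition = [b for b in (states & accepting, states - accepting) if b]
    while True:
        refined = []
        for block in partition:
            groups = {}
            for s in block:
                # States stay together only if every letter leads to the same block.
                signature = tuple(
                    block_index(partition, delta.get((s, a))) for a in alphabet
                )
                groups.setdefault(signature, set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # nothing was separated
            return refined
        partition = refined

# Hypothetical deterministic automaton with five states over {a, b}.
example_delta = {
    ("p0", "a"): "p1", ("p0", "b"): "p2",
    ("p1", "a"): "p1", ("p1", "b"): "p3",
    ("p2", "a"): "p1", ("p2", "b"): "p2",
    ("p3", "a"): "p1", ("p3", "b"): "p4",
    ("p4", "a"): "p1", ("p4", "b"): "p2",
}
blocks = minimize_partition(["p0", "p1", "p2", "p3", "p4"], "ab", example_delta, {"p4"})
print(sorted(sorted(b) for b in blocks))
# [['p0', 'p2'], ['p1'], ['p3'], ['p4']]
```

In this example p0 and p2 end up in the same block, so the minimized automaton has four states instead of five.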
29
Minimization
Group all the states together: {p0, p1, p2, p3, p4}.
30
Minimization
  • Separate states according to available exit
    transitions.

31
Minimization
  • Separate a set into two if from some of its
    states one can reach another set and from others
    one cannot. Repeat until no set can be separated.
32
Can minimize now
[Figure: automaton with an ε transition and edges labeled a and b]
33
Lex
  • Declarations
  • Translation rules
  • Auxiliary procedures

34
Lex behavior
Lex source program lex.l → Lex program → lex.yy.c
lex.yy.c → C compiler → a.out
Input stream → a.out → Output tokens
35
Lex behavior
  • Translates the definitions into an automaton.
  • The automaton looks for the longest matching
    string.
  • It then either returns some value to the reading program
    (the parser) or looks for the next token.
  • Lookahead operator x/y: allows the token x only
    if y follows it (but y is not part of the token).
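
The longest-match rule can be imitated outside Lex by trying every rule at the current position and keeping the longest result. This is a sketch of the idea in Python, not of how Lex is implemented (Lex compiles all rules into a single automaton); the rule set and the name `next_token` are assumptions for the example. Ties go to the rule listed first, which is also how Lex resolves them.

```python
import re

# Rules in priority order: earlier rules win when two matches have equal length.
RULES = [
    ("if", re.compile(r"if")),
    ("id", re.compile(r"[a-z][a-z0-9]*")),
    ("num", re.compile(r"[0-9]+")),
    ("ws", re.compile(r"\s+")),
]

def next_token(text, pos):
    """Return (rule name, lexeme, new position) for the longest match at pos."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # Strictly longer matches replace the current best; ties keep the earlier rule.
        if m and (best is None or m.end() > best[1].end()):
            best = (name, m)
    if best is None:
        raise ValueError(f"no rule matches at position {pos}")
    name, m = best
    return name, m.group(), m.end()

print(next_token("iffy 1", 0))  # ('id', 'iffy', 4)
print(next_token("if x", 0))    # ('if', 'if', 2)
```

The first call shows why the longest match matters: stopping at the keyword if would split the identifier iffy in two.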

36
Lex Project
  • Project collection date: Feb 11th.
  • Work in pairs (singles).
  • Use lex to take a text and check whether the
    number of open parentheses of any kind is equal
    to the number of closed parentheses.
  • Exception: inside quotes. \" is not a closing
    quote.