How to recognize tokens specified by regular expressions?

About This Presentation

Slides: 10

Provided by: xyu

Learn more at: http://www.cs.fsu.edu

Transcript and Presenter's Notes

How to recognize tokens specified by regular
expressions?
A recognizer for a language is a program that
takes a string x as input and answers yes if x
is a sentence of the language and no otherwise.
A regular expression can be compiled into a
recognizer by constructing a finite automaton,
which can be deterministic or non-deterministic.
A non-deterministic finite automaton (NFA) is a
mathematical model that consists of five parts
(a 5-tuple):
  a set of states Q
  a set of input symbols (the input alphabet)
  a transition function that maps state-symbol
  pairs to sets of states
  a state q0 that is distinguished as the start
  (initial) state
  a set of states F distinguished as accepting
  (final) states
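
As an illustration of the 5-tuple, here is a minimal Python sketch. The class name `NFA`, its field names, and the use of the empty string to label empty-string transitions are assumptions made for this example; the slides do not prescribe a representation.

```python
from dataclasses import dataclass

EPSILON = ""  # assumption: "" stands for the empty-string label


@dataclass
class NFA:
    states: frozenset     # Q
    alphabet: frozenset   # set of input symbols
    delta: dict           # (state, symbol) -> set of states
    start: int            # q0
    accepting: frozenset  # F

    def move(self, state, symbol):
        """Return the set of states reachable from `state` on `symbol`."""
        return frozenset(self.delta.get((state, symbol), ()))


# Hypothetical two-state NFA: from state 0, the symbol "a" leads to 0 or 1.
example = NFA(
    states=frozenset({0, 1}),
    alphabet=frozenset({"a"}),
    delta={(0, "a"): {0, 1}},
    start=0,
    accepting=frozenset({1}),
)
print(sorted(example.move(0, "a")))  # [0, 1]
```

Because the transition function returns a set, a missing entry is simply the empty set, which matches the "-" cells of a transition table.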
An NFA example

An NFA is non-deterministic in that (1) the same
character can label two or more transitions out
of one state, and (2) the empty string can label
transitions.
An NFA accepts an input string x if and only if
there is some path in the transition graph from
the start state to some accepting state whose
edge labels spell out x.
For example, here is an NFA that recognizes the
language (a|b)*abb.
An NFA can easily be implemented using a
transition table.
State   a        b
0       {0, 1}   {0}
1       -        {2}
2       -        {3}
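
The transition table above can be written down directly as a nested dictionary. This is only a sketch: the names `TABLE` and `move` are chosen for the example, and a missing key plays the role of the "-" entries.

```python
# Transition table from the slide: rows are states, columns are input symbols.
# A missing entry corresponds to "-" (no transition).
TABLE = {
    0: {"a": {0, 1}, "b": {0}},
    1: {"b": {2}},
    2: {"b": {3}},
}


def move(states, symbol):
    """Union of the table entries for every state in `states`."""
    result = set()
    for s in states:
        result |= TABLE.get(s, {}).get(symbol, set())
    return result


print(sorted(move({0}, "a")))     # [0, 1]
print(sorted(move({0, 1}, "b")))  # [0, 2]
```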

[Figure: transition graph of the example NFA, with states 0, 1, 2, 3 and edges labeled a and b]

The algorithm that recognizes the language
accepted by an NFA.
Input: an NFA (transition table) and a string x
(terminated by eof).
Output: yes if accepted, no otherwise.
    S := e-closure({s0})
    a := nextchar
    while a != eof do begin
        S := e-closure(move(S, a))
        a := nextchar
    end
    if (intersect(S, F) != empty) then return yes
    else return no
Note: e-closure(S) is the set of states that can
be reached from states in S through transitions
labeled by the empty string, including the
states in S themselves. move(S, a) is the set of
states reachable from some state in S on a
transition labeled a.
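
The simulation loop can be sketched in Python as follows. Several things here are assumptions made for the example rather than taken from the slides: the function names, the use of `""` as the empty-string label, the end of the Python string standing in for eof, and state 3 being the accepting state of the example NFA.

```python
EPSILON = ""  # assumption: "" labels empty-string transitions


def e_closure(states, delta):
    """States reachable from `states` using only empty-string transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol, delta):
    """States reachable from `states` on one transition labeled `symbol`."""
    result = set()
    for s in states:
        result |= set(delta.get((s, symbol), ()))
    return result


def accepts(delta, start, accepting, x):
    """Run the set-of-states simulation; reaching the end of x acts as eof."""
    current = e_closure({start}, delta)
    for a in x:
        current = e_closure(move(current, a, delta), delta)
    return bool(current & accepting)


# Example NFA from the transition table, assuming state 3 is accepting.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
print(accepts(DELTA, 0, {3}, "aabb"))  # True
print(accepts(DELTA, 0, {3}, "abab"))  # False
```

Each input character triggers one `move` and one `e_closure` over a set of at most |S| states, which is where the per-character cost of the NFA approach comes from.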

[Figure: NFA fragments labeled N(s) and N(t), with an edge labeled a]

Using an NFA, we can recognize a token in
O(|S|^2 |X|) time, where |S| is the number of
NFA states and |X| is the length of the input.
We can improve the time complexity by using a
deterministic finite automaton (DFA) instead of
an NFA.
An NFA is deterministic (a DFA) if
it has no transitions on the empty string, and
for each state S and each input symbol a, there
is at most one edge labeled a leaving S.
What is the time complexity to recognize a token
when a DFA is used?
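
One way to explore this question is to write the DFA recognizer out. The sketch below assumes a hand-built DFA for (a|b)*abb whose states 0 to 3 stand for the NFA state sets {0}, {0,1}, {0,2}, {0,3}; the table `DFA` and the function `dfa_accepts` are names invented for the example.

```python
# Hypothetical DFA for (a|b)*abb: (state, symbol) -> single next state.
DFA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}


def dfa_accepts(x, start=0, accepting=frozenset({3})):
    """One table lookup per input character."""
    state = start
    for ch in x:
        state = DFA.get((state, ch))
        if state is None:  # no edge for this symbol: reject
            return False
    return state in accepting


print(dfa_accepts("aabb"))  # True
print(dfa_accepts("abab"))  # False
```

The loop body does a constant amount of work per character, so the running time grows with the length of the input only and not with the number of states.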

Algorithm to convert an NFA to a DFA that accepts
the same language (algorithm 3.2, page 118)
    initially e-closure(s0) is the only state in
    Dstates and it is unmarked
    while there is an unmarked state T in Dstates do
    begin
        mark T
        for each input symbol a do begin
            U := e-closure(move(T, a))
            if (U is not in Dstates) then
                add U as an unmarked state to
                Dstates
            Dtran[T, a] := U
        end
    end
Initial state: e-closure(s0). Final state: ?
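
The conversion can be sketched in Python as below. This is an illustration under stated assumptions, not the textbook's code: the function names, the `""` label for empty-string transitions, and the choice of state 3 as the accepting state of the example NFA are all made up for the example. For the final states, the sketch follows the usual convention that a DFA state is accepting when it contains at least one accepting NFA state.

```python
from collections import deque

EPSILON = ""  # assumption: "" labels empty-string transitions


def e_closure(states, delta):
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPSILON), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure


def move(states, symbol, delta):
    result = set()
    for s in states:
        result |= set(delta.get((s, symbol), ()))
    return result


def subset_construction(delta, start, accepting, alphabet):
    d_start = frozenset(e_closure({start}, delta))
    dstates = {d_start}
    unmarked = deque([d_start])   # states not yet processed
    dtran = {}
    while unmarked:
        T = unmarked.popleft()    # taking T off the queue marks it
        for a in alphabet:
            U = frozenset(e_closure(move(T, a, delta), delta))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    # Usual convention: accepting if it contains an accepting NFA state.
    d_accepting = {T for T in dstates if T & accepting}
    return d_start, dstates, dtran, d_accepting


# Example NFA from the transition table, assuming state 3 is accepting.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
d_start, dstates, dtran, d_accepting = subset_construction(DELTA, 0, {3}, "ab")
print(sorted(sorted(T) for T in dstates))  # [[0], [0, 1], [0, 2], [0, 3]]
```

Note that when `move(T, a)` is empty, this sketch adds the empty set as an ordinary (dead) DFA state; that does not happen for the example NFA, because state 0 loops on both symbols.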