91.304 Foundations of Theoretical Computer Science

1
91.304 Foundations of (Theoretical) Computer
Science

Chapter 1 Lecture Notes
David Martin
dm_at_cs.uml.edu

This work is licensed under the Creative Commons
Attribution-ShareAlike License. To view a copy of
this license, visit
http://creativecommons.org/licenses/by-sa/2.0/
or send a letter to Creative Commons, 559 Nathan
Abbott Way, Stanford, California 94305, USA.
2
Chapter 1 Regular Languages

Simple model of computation
Input a string, and either accept or reject it
Models a very simple type of function, a
predicate on strings: f : Σ* → {0,1}
See example of a state-transition diagram

3
Syntax of DFA

A deterministic finite automaton (DFA) is a
5-tuple (Q, Σ, δ, q0, F) such that
Q is a finite set of states
Σ (sigma) is an alphabet
δ : Q × Σ → Q (delta) is the transition function
q0 ∈ Q (q naught) is the start state
F ⊆ Q is the set of accepting states
Usually these names are used, but others are
possible as long as the role is clear

4
DFA syntax

It is deterministic because for every input
(q,c), the next state is a uniquely determined
member of Q
because the codomain of δ is Q
Fix the previous example to fit these constraints
The same example DFA, specified formally

5
DFA computation

This definition is different from but equivalent
to the one in the text
Let M = (Q, Σ, δ, q0, F) be a DFA. We define the
extended transition function δ* : Q × Σ* → Q
inductively as follows. For all q ∈ Q,
δ*(q, ε) = q. If w ∈ Σ* and c ∈ Σ, let
δ*(q, wc) = δ(δ*(q, w), c)
According to this definition, δ*(q, x) is the
state of the machine after starting in state q
and reading the entire string x
See example
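
The inductive definition of the extended transition function translates almost line for line into code. The following is a minimal Python sketch, not part of the original notes. The two-state machine EVEN_ONES (strings with an even number of 1s) and the names delta_star and accepts are illustrative assumptions.

```python
def delta_star(delta, q, x):
    """Extended transition function: state reached from q after reading all of x."""
    if x == "":                        # base case: the empty string leaves q unchanged
        return q
    w, c = x[:-1], x[-1]               # inductive case: x = wc
    return delta[(delta_star(delta, q, w), c)]

# Hypothetical example DFA over {0,1}: accepts strings with an even number of 1s.
EVEN_ONES = {
    "delta": {("even", "0"): "even", ("even", "1"): "odd",
              ("odd", "0"): "odd", ("odd", "1"): "even"},
    "start": "even",
    "accepting": {"even"},
}

def accepts(M, x):
    """x is in L(M) iff the extended transition function ends in an accepting state."""
    return delta_star(M["delta"], M["start"], x) in M["accepting"]
```

The transition table is an ordinary dictionary keyed by (state, character) pairs, which mirrors the view of a DFA's program as a lookup table.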

6
Language recognized by DFA

The language recognized by the DFA M is written
L(M) and defined as L(M) = { x ∈ Σ* | δ*(q0, x) ∈ F }
Think of L() as an operator that turns a program
into the language it specifies
We will use L() for other types of machines and
grammars too

7
Example

Let L2 = { x ∈ {0,1}* | the binary number x is a
multiple of 2 } and build a DFA M2 such that
L(M2) = L2
Remember this means L(M2) ⊆ L2 and L2 ⊆ L(M2)

8
Definition of regular languages

A language L is regular if there exists a DFA M
such that L = L(M)
The class of regular languages over the alphabet
Σ is called REG and defined
REG = { L ⊆ Σ* | L is regular }
    = { L(M) | M is a DFA over Σ }
Now we know 4 classes of languages: ∅, FIN, REG,
and ALL

9
Problems

For all k ≥ 1, let A_k = { 0^(kn) | n ≥ 0 }. Prove that
(∀ k ≥ 1) A_k ∈ REG
Solution is a scheme, not a single DFA
(Harder) Build a DFA for L3 = { x ∈ {0,1}* | the binary
number x is a multiple of 3 }
Build a DFA for L4 = { x ∈ {a,b}* | x contains an odd
# of b's and an even # of a's }
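
One possible approach to the multiple-of-3 problem, sketched in Python: use three states that record the value read so far modulo 3. Appending a bit c to a binary number doubles it and adds c, so the remainder updates as (2q + c) mod 3. The function names are illustrative, and treating the empty string as the number 0 (and allowing leading zeros) is an assumption the notes do not settle.

```python
def delta3(q, c):
    """Transition function on states {0, 1, 2}: q is the value read so far mod 3."""
    return (2 * q + int(c)) % 3

def in_L3(x):
    """Run the three-state DFA on the bit string x; state 0 is start and accepting."""
    q = 0
    for c in x:
        q = delta3(q, c)
    return q == 0
```

Usage: in_L3("110") is True because 110 is binary for 6, and in_L3("111") is False because 111 is binary for 7.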

10
Measuring DFA complexity

Suppose
you have a DFA with states named 00000000 ..
11111111 (2^8 = 256 unique states)
an LCD attached to the thing showing the current
state name
Σ = { c } (for clock pulse)
δ(q, c) = (q + 1) & 0xFF
This is a simple counter machine: feed it clocks
and it counts upwards
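
As a quick illustration, the counter machine can be written out in Python. This is only a sketch: the helper run and the choice to print the state as an 8-bit string (standing in for the LCD) are assumptions.

```python
def delta(q, c):
    """One clock pulse: increment the 8-bit state, wrapping from 255 back to 0."""
    assert c == "c"                   # the alphabet has a single character
    return (q + 1) & 0xFF

def run(q, pulses):
    """Feed the machine the given number of clock pulses and show the final state name."""
    for _ in range(pulses):
        q = delta(q, "c")
    return format(q, "08b")
```

The one-line formula stands for a table with 256 rows, one per (state, c) pair.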

11
Measuring DFA complexity

Time complexity
A DFA always takes one transition per input
character
So time complexity is not useful here
Program complexity
A DFA's program is (mostly) its δ
The model specifies no particular programming
language for δ; it's just a table mapping
(state, input) pairs to (state) outputs
Though it can sometimes be specified concisely,
as in δ(q, c) = (q + 1) & 0xFF
Reprogram the clock for any permutation of {0,1}^8
and δ's table remains just as big

12
Measuring DFA complexity

Space complexity: the amount of memory used
But a DFA has no extra memory; it only remembers
what state it is in
Can't look back or forward
So a DFA always uses the same amount of memory,
namely the amount of memory required to remember
what state it's in
Needs to remember current element of Q
Can write down that number in ⌈log2 |Q|⌉ bits

13
DFAs as real computers

Consider a 256 MB computer that takes a finite
input and produces a finite output
Inputs: clock pulses, interrupts, hard drive,
keyboard, mouse, network, etc.
Outputs: video, hard drive, network, etc.
Can code everything in binary
But DFA only accepts or rejects input

14
Recognition model for functions

Can still sort of be modeled by a DFA
PC = { x#y | x,y ∈ {0,1}* and the input x
produces the output y }
Note: the # character is just a separator
DFA plays the role of equipment verifier
Verifying correctness seems easier than computing
the output, but at least it's related

15
Are DFAs reasonable?

One issue is that the programs don't seem to
reflect much about the problem being solved
If you can figure out how many bits of memory are
needed for the solution, then you can always
build a DFA based on that knowledge, though it
could be tedious and really large
Programs that use the same amount of memory have
the same program complexity, so DFAs don't help us
see the difference between programs very easily
Neural nets??

16
Are DFAs reasonable?

Similarly: an 8-bit counter is structurally very
different from a 9-bit counter
More memory needed ⇒ totally different δ program
needed
Not very modular!

17
Are DFAs reasonable?

Another issue is that DFAs prefer the beginning
of their inputs to the end of their inputs
L5 = { x ∈ {0,1}* | the fifth digit from the left
of x is 0 }
L6 = { x ∈ {0,1}* | the fifth digit from the right
of x is 0 }
DFAs know where the input begins but not where it
ends

18
Is REG reasonable?

We should be able to combine computations as
subroutines in simple ways
logical OR (A ∪ B)
logical AND (A ∩ B)
concatenation (A · B) and star (A*)
hard to prove!! motivation for NFA
complement (A^c)
reversal (A^R)
All above are easy to do as logic circuits
Will discuss further as closure under language
operations

19
Nondeterministic Finite Automata

Will relax two of these DFA rules
Each (state, char) input must produce exactly one
(state) output
Must consume one character in order to advance
state
Example: L6 = Σ* bob Σ*
See M6
The NFA accepts the input if there exists any way
of reading the input that winds up in an
accepting state at the end of the string
Otherwise it rejects the input

20
NFAs

Thus the NFA rejects the input if there doesn't
exist any way of reading the input that winds up
in an accepting state at the end of the string
In other words every way of reading the input
leads to a nonaccepting state
Example: M7
L7 = ?
[State-transition diagram of M7: states 1, 2, 3; arc labels a, b, c, ε, ε]
21
Ways to think of NFAs

NFAs want to accept inputs and will always take
the most advantageous alternative(s)
Because they will accept if there exists any way
to get to an accepting state at the end of the
string
The quickest way there may be just one of many
ways, but it doesn't matter
http://www.chompchomp.com/frag05/frag05.01.a.htm

22
Ways to think of NFAs
[Diagram: arcs labeled a, a, a]

fork() model
Input string is in a variable
fork() at every nondeterministic choice point
subprocess 1 (parent) follows first transition
subprocess 2 (child) follows second
subprocess 3 (child) follows third (if any), etc.
A process that can't follow any transition calls
exit() -- and gives up its ability to accept
A process that makes it through the whole string
and is in an accepting state prints out ACCEPT
A single ACCEPT is enough

23
Syntax of DFA (repeat)

A deterministic finite automaton (DFA) is a
5-tuple (Q, Σ, δ, q0, F) such that
Q is a finite set of states
Σ is an alphabet
δ : Q × Σ → Q is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accepting states
Usually these names are used, but others are
possible as long as the role is clear

24
Syntax of NFA

A nondeterministic finite automaton (NFA) is a
5-tuple (Q, Σ, δ, q0, F) such that
Q is a finite set of states
Σ is an alphabet
δ : Q × (Σ ∪ {ε}) → P(Q) is the transition function
q0 ∈ Q is the start state
F ⊆ Q is the set of accepting states
Usually these names are used, but others are
possible as long as the role is clear

25
Syntax of NFA

Definition: Σε = Σ ∪ {ε}
We'll use this frequently enough
Differences on state-transition diagram
δ(1,a) = {1} (not δ(1,a) = 1)
δ(1,ε) = {1, 2}
δ(3,c) = {2, 3}
δ(2,a) = ∅
δ(3,ε) = {3}

Example M8
[State-transition diagram of M8: states 1, 2, 3; arc labels a, b, c, c, ε, ε]
26
NFA computation

This next definition is different from but
equivalent to the one in the text
Book's definition may be easier to understand at
first, but that makes its version of Theorem 1.39
(subset construction) harder
Goal: a function δ* : Q × Σ* → P(Q) where δ*(q,x) is
the set of all states reachable in the machine
after starting in state q and reading the entire
string x
Then for an NFA M, we will define something like
L(M) = { x ∈ Σ* | δ*(q0,x) contains some
accepting state }

27
NFA computation

Let M = (Q, Σ, δ, q0, F) be an NFA. We define some
auxiliary functions
E : Q → P(Q) by ("ε-closure")
E(q) = { p ∈ Q | p is reachable from q by
following a chain of 0 or more ε
transitions }
Although E takes elements of Q as input, we'll
also use it as a function that takes subsets of Q
as input (that is, elements of P(Q)). So
E : P(Q) → P(Q) by
E(S) = ∪ { E(q) | q ∈ S }
In other words, given a set as input, just
process each element independently...
28
NFA computation

Thus E(q) is the set of all states you can get to
from q without reading any input
In M8, E(3) = ?   E({2,1}) = ?
We define a simple extension of δ that takes a
set of states as input
δ : Q × Σε → P(Q) (this comes with the NFA)
Δ : P(Q) × Σε → P(Q) defined by
Δ(S, c) = ∪ { δ(q, c) | q ∈ S }
Again, given a set as input, just process each
element independently...
29
NFA computation

We have a function E() that follows ε-transitions
and a function Δ that behaves like δ but takes
sets as input
δ* : Q × Σ* → P(Q) is defined inductively: For all
q ∈ Q, δ*(q, ε) = E({q})
If w ∈ Σ* and c ∈ Σ, let
δ*(q, wc) = E(Δ(δ*(q, w), c))
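
The three pieces just defined (the ε-closure E, the set-lifted transition function, and the extended transition function on strings) can be sketched in Python as follows. The NFA used here is a hypothetical three-state machine, not M8 from the slides, and the names step, eps_closure, delta_star and accepts are illustrative.

```python
EPS = ""  # stands for the label ε

# Hypothetical NFA; a missing table entry means the empty set of next states.
DELTA = {
    (1, "a"): {1, 2},
    (1, EPS): {3},
    (2, "b"): {3},
    (3, "a"): {3},
}
START, ACCEPTING = 1, {3}

def step(S, c):
    """Set-lifted transition function: union of delta(q, c) over all q in S."""
    out = set()
    for q in S:
        out |= DELTA.get((q, c), set())
    return out

def eps_closure(S):
    """E(S): every state reachable from S by zero or more ε transitions."""
    closure, frontier = set(S), set(S)
    while frontier:
        frontier = step(frontier, EPS) - closure
        closure |= frontier
    return closure

def delta_star(q, x):
    """All states reachable from q after reading the entire string x."""
    S = eps_closure({q})               # base case: the empty string
    for c in x:                        # inductive case, one character at a time
        S = eps_closure(step(S, c))
    return S

def accepts(x):
    """x is accepted iff some reachable state is accepting."""
    return bool(delta_star(START, x) & ACCEPTING)
```

The loop in delta_star is the iterative form of the induction on x = wc: each pass applies the set-lifted transition function and then closes under ε.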

30
NFA computation

Finally, we define
L(M) = { x ∈ Σ* | δ*(q0,x) contains some accepting state }
     = { x ∈ Σ* | δ*(q0,x) ∩ F ≠ ∅ }
δ*(1,ac) = E(Δ(δ*(1,a),c))
δ*(1,a) = E(Δ(δ*(1,ε),a))
δ*(1,ε) = ?
δ*(1,ac) = ?
31
Question

"How do I know when to follow ε transitions and
when not to?"
If you're talking about δ, then don't--it's the
program itself. δ can express that "there is an
ε transition here" but you never go any further
than that one hop.
If you're talking about δ*, then do--because it
includes E() as part of its definition, which is
there precisely in order to follow ε transitions

32
NFAs are good at union (or)

L2 = { x ∈ {0,1}* | the binary number x is a multiple
of 2 }
L3 = { x ∈ {0,1}* | the binary number x is a multiple
of 3 }
Let A = L2 ∪ L3
NFA for A using guess-and-verify strategy
Preview of Theorem 1.45

33
The Subset Construction

Theorem 1.39: For every NFA M1 there exists a DFA
M2 such that L(M1) = L(M2)
Proof idea: Well, how does fork() work on a
uniprocessor machine?

34
The Subset Construction

Proof: Let M1 = (Q1, Σ, δ1, init1, F1) be the NFA and
define the DFA M2 = (Q2, Σ, δ2, init2, F2) as follows
Q2 = P(Q1).
Each state of the DFA records the set of states
that the NFA can simultaneously be in
Can compare DFA states for equality but also look
"inside" the state name to find a set of NFA
state names
Define δ2 : Q2 × Σ → Q2, that is,
δ2 : P(Q1) × Σ → P(Q1), by
δ2(S,a) = E1(Δ1(S,a)): go to whatever states
are reachable from the states in S and reading
the character a

Remember, in an NFA:
δ1 : Q1 × Σε → P(Q1) (from def)
Δ1 : P(Q1) × Σε → P(Q1) (extend to sets)
E1 : P(Q1) → P(Q1) (ε-closure)
35
The Subset Construction

init2 = E1(init1)
F2 = { q ∈ Q2 | q ∩ F1 ≠ ∅ }, in other words
F2 = { S ⊆ Q1 | S ∩ F1 ≠ ∅ }
The effect is that the DFA knows all states that
are reachable in the NFA after reading the string
so far. If any one of them is accepting, then
the current DFA state is accepting too, otherwise
it's not.
If you believe this then that's all it takes to
see that the construction is correct. So,
convince yourself with an example. QED
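
One way to convince yourself is to run the construction. The Python sketch below builds the DFA for a hypothetical three-state NFA (not M8) and offers both machines for comparison. It only generates DFA states reachable from init2, which is a practical shortcut rather than the full power-set construction; all names are illustrative.

```python
EPS = ""  # stands for the label ε

# Hypothetical NFA M1; a missing table entry means the empty set.
SIGMA = ("a", "b")
DELTA1 = {(1, "a"): {1, 2}, (1, EPS): {3}, (2, "b"): {3}, (3, "a"): {3}}
INIT1, F1 = 1, {3}

def step1(S, c):
    """delta1 lifted to sets of states."""
    out = set()
    for q in S:
        out |= DELTA1.get((q, c), set())
    return out

def E1(S):
    """ε-closure of a set of NFA states, frozen so it can name a DFA state."""
    closure, frontier = set(S), set(S)
    while frontier:
        frontier = step1(frontier, EPS) - closure
        closure |= frontier
    return frozenset(closure)

def subset_construction():
    """Return (Q2, delta2, init2, F2), restricted to states reachable from init2."""
    init2 = E1({INIT1})
    delta2, todo, seen = {}, [init2], {init2}
    while todo:
        S = todo.pop()
        for a in SIGMA:
            T = E1(step1(S, a))        # delta2(S, a) = E1(lifted delta1(S, a))
            delta2[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    F2 = {S for S in seen if S & F1}   # accepting iff S contains an NFA accepting state
    return seen, delta2, init2, F2

def dfa_accepts(x):
    """Run the constructed DFA M2 on x."""
    _, delta2, q, F2 = subset_construction()
    for c in x:
        q = delta2[(q, c)]
    return q in F2

def nfa_accepts(x):
    """Run the NFA M1 directly by tracking its set of current states."""
    S = E1({INIT1})
    for c in x:
        S = E1(step1(S, c))
    return bool(S & F1)
```

Each DFA state is a frozenset of NFA states, so it can be compared for equality and also opened up to see which NFA states it contains.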

36
Subset construction example

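Q2 = { ∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3} }
(On board)
init2 = {1,2,3}
F2 = { {3}, {1,3}, {2,3}, {1,2,3} }

Example M8 (think of this as M1 in the
construction)
[State-transition diagram of M8: states 1, 2, 3; arc labels a, b, c, c, ε, ε]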
37
Be methodical

Need to compute δ2({1,2,3},c) =
E1(Δ1({1,2,3},c))
By definition, Δ1({1,2,3},c) = δ1(1,c) ∪ δ1(2,c)
∪ δ1(3,c)
= {2,3}
Then take E1({2,3}) = {2,3}
Save intermediate results for reuse
It's OK to eliminate unreachable states in
practice, even though that's not what the
construction really does

38
Subset construction conclusion

Adding nondeterminism makes programs shorter but
not able to do new things
Remember regular languages are defined to be
those "recognized by a DFA"
We now have a result that says that every
language that is recognized by an NFA is regular
too
So if you are asked to show that a language is
regular, you can exhibit a DFA or NFA for it and
rely on the subset construction theorem
Sometimes questions are specifically about DFAs
or NFAs, though... pay attention to the precise
wording

39
More NFA examples

Write an NFA for {ab, abc} with 3 states
NFA and DFA for {ε} over Σ = {0,1}
Rule: ε ∈ L(M) ⇔ ?
NFA and DFA for ∅ over Σ = {0,1}

40
Closure properties

The presence or absence of closure properties
says something about how well a set tolerates an
operation
Definition. Let S ⊆ U be a set in some universe
U and ∘ be an operation on elements of U. We say
that S is closed under ∘ if applying ∘ to
element(s) of S produces another element of S.
For example, if ∘ is a binary operation U × U → U,
then we're saying that (∀ x ∈ S and y ∈ S) x ∘ y ∈ S

41
Closure properties illustrated
[Diagram: set S drawn inside the universe U]
Applying the operation to elements of S never
takes you outside of S. S is closed with respect
to ∘. This example shows unary operations
42
Closure properties

Having a closure property usually means there is
some type of "natural fit" between the operation
and the set
Examples
N is closed under + and × but not - and ÷
Z is closed under + and - and × and unary -
(negation) but not ÷ or √
Q-{0} is closed under × and ÷ but not + or -

43
More examples

L1 = { x ∈ {0,1}* | |x| is a multiple of 3 }
is closed under string reversal and concatenation
L3 = { x ∈ {0,1}* | the binary number x is a multiple
of 3 }
is also closed under string reversal and
concatenation, harder to see though
L4 = { x ∈ {a,b}* | x contains an odd # of b's and an
even # of a's }
is closed under string reversal
is not closed under string concatenation

44
Closure higher abstraction

We will usually be concerned with closure of
language classes under language operations
Previous examples were closure of sets containing
non-set elements under various familiar
operations
We consider DFAs and NFAs to be programs and we
want assurance that their outputs can be combined
in desired ways just by manipulating their
programs (like using one as a subroutine for the
other)
Representative question is REG closed under
(language) concatenation?

45
The regular operations

The regular operations on languages are
∪ (union)
· (concatenation)
* (Kleene star)
The name "regular operations" is not that
important
Too bad we use the word "regular" for so much
REG is closed under these regular operations
That's why they're called "regular" operations
This does not mean that each regular language is
closed under each of these operations!

46
The regular operations

REG is closed under union: Theorem 1.25 (using
DFAs), Theorem 1.45 (using NFAs)
REG is closed under concatenation: Theorem 1.47
(NFAs)
REG is closed under *: Theorem 1.49 (NFAs)
Study these constructions!!
REG is also closed under complement and reversal
(not in book)

47
Regular expressions

You are probably familiar with these
Example: "int .*\(.*\)" is a (flex format)
regular expression that appears to match C
function prototypes that return ints
In our treatment, a regular expression is a
program that generates a language of matching
strings when you "run it"
We will use a very compact definition that
simplifies things later

48
Regular expressions

Definition. Let Σ be an alphabet not containing
any of the special characters in this list: ε ∅
) ( ∪ · * We define the syntax of the
(programming) language REX(Σ), abbreviated as
REX, inductively
Base cases
For all a ∈ Σ, a ∈ REX. In other words, each single
character from Σ is a regular expression all by
itself.
ε ∈ REX. In other words, the literal symbol ε is a
regular expression. In this context it is not
the empty string but rather the single-character
name for the empty string.
∅ ∈ REX. Similarly, the literal symbol ∅ is a
regular expression.

49
Regular expressions

Definition continued
Induction cases
For all r1, r2 ∈ REX, ( r1 ∪ r2 ) ∈ REX also
For all r1, r2 ∈ REX, ( r1 · r2 ) ∈ REX also

(The parentheses and the operator are literal
symbols; r1 and r2 are variables)
50
Regular expressions

Definition continued
Induction cases continued
For all r ∈ REX, ( r* ) ∈ REX also
Examples over Σ = {0,1}
ε and 0 and 1 and ∅
(((1·0)·(ε ∪ ∅))*)
εε is not a regular expression
Remember, in the context of regular expressions,
ε and ∅ are ordinary characters

51
Semantics of regular expressions

Definition. We define the meaning of the
language REX(?) inductively using the L()
operator so that L(r) denotes the language
generated by r as follows
Base cases
For all a ∈ Σ, L(a) = { a }. A single-character
regular expression generates the corresponding
single-character string.
L(ε) = { ε }. The symbol for the empty string
actually generates the empty string.
L(∅) = ∅. The symbol for the empty language
actually generates the empty language.

52
Regular expressions

Definition continued
Induction cases
For all r1, r2 ∈ REX, L( (r1 ∪ r2) ) = L(r1) ∪
L(r2)
For all r1, r2 ∈ REX, L( (r1 · r2) ) = L(r1) ·
L(r2)
For all r ∈ REX, L( (r*) ) = (L(r))*
No other string is in REX(Σ)
Example
L( (((1·0)·(ε ∪ ∅))*) ) includes
ε, 10, 1010, 101010, 10101010, ...
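
The semantic definition can be run directly if a regular expression is represented as a syntax tree. The Python sketch below uses nested tuples for the six syntax cases and computes the members of L(r) up to a length bound, since L(r) may be infinite. The tuple encoding, the function name lang and the bound max_len are all assumptions made for illustration.

```python
def lang(r, max_len):
    """Strings of length <= max_len in L(r); r is a nested-tuple syntax tree."""
    kind = r[0]
    if kind == "sym":                  # L(a) = {a}
        return {r[1]} if len(r[1]) <= max_len else set()
    if kind == "eps":                  # L(ε) = {ε}
        return {""}
    if kind == "empty":                # L(∅) = ∅
        return set()
    if kind == "union":                # L((r1 ∪ r2)) = L(r1) ∪ L(r2)
        return lang(r[1], max_len) | lang(r[2], max_len)
    if kind == "cat":                  # L((r1 · r2)) = L(r1) · L(r2)
        return {u + v for u in lang(r[1], max_len) for v in lang(r[2], max_len)
                if len(u + v) <= max_len}
    if kind == "star":                 # L((r*)) = (L(r))*
        base = lang(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:                # keep appending until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(kind)

# Syntax tree for the example expression (((1·0)·(ε ∪ ∅))*)
EXAMPLE = ("star", ("cat", ("cat", ("sym", "1"), ("sym", "0")),
                    ("union", ("eps",), ("empty",))))
```

For instance, lang(EXAMPLE, 6) evaluates to the set containing the empty string, 10, 1010 and 101010.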

53
Orientation

We used highly flexible mathematical notation and
state-transition diagrams to specify DFAs and
NFAs
Now we have a precise programming language REX
that generates languages
REX is designed to close the simplest languages
under ∪, ·, *

54
Abbreviations

Instead of parentheses, we use precedence to
indicate grouping when possible.
* (highest)
·
∪ (lowest)
Instead of ·, we just write elements next to
each other
Example: (((1·0)·(ε ∪ ∅))*) can be written as
(10(ε ∪ ∅))* but there is no further abbreviation
(Not in text) If r ∈ REX(Σ), instead of writing
rr*, we write r+

55
Abbreviations

Instead of writing a union of all characters from
Σ together to mean "any character", we just write
Σ
In a flex/grep regular expression this would be
called "."
Instead of writing L(r) when r is a regular
expression, we consider r alone to simultaneously
mean both the expression r and the language it
generates, relying on context to disambiguate

56
Abbreviations

Caution: regular expressions are strings
(programs). They are equal only when they
contain exactly the same sequence of characters.
(((1·0)·(ε ∪ ∅))*) can be abbreviated (10(ε ∪ ∅))*
however (((1·0)·(ε ∪ ∅))*) ≠ (10(ε ∪ ∅))* as strings
but (((1·0)·(ε ∪ ∅))*) = (10(ε ∪ ∅))* when they are
considered to be the generated languages
more accurately then, L( (((1·0)·(ε ∪ ∅))*) )
= L( (10(ε ∪ ∅))* )
= L( (10)* )

57
Facts

REX(Σ) is itself a language over an alphabet Γ
that is
Γ = Σ ∪ { ), (, ∪, ·, *, ε, ∅ }
For every Σ, |REX(Σ)| = ∞
∅, (∅*), ((∅*)*), ...
even without knowing Σ there are infinitely many
elements in REX(Σ)
Question: Can we find a DFA or NFA M with L(M) =
REX(Σ)?

58
Examples

Find a regular expression for { w ∈ {0,1}* | w ≠
10 }
Find a regular expression for { x ∈ {0,1}* | the
6th digit counting from the rightmost
character of x is 1 }
Find a regular expression for L3 = { x ∈ {0,1}* | the
binary number x is a multiple of 3 }

59
The DFA for L3
[State-transition diagram: DFA for L3 with states 0, 1, 2 and arcs labeled 0 and 1]
(0 1* 0)*
Regular expression: (0 ∪ 1 _____________ 1)*
60
Regular expression for L3

(0 ∪ 1 (0 1* 0)* 1)*
L3 is closed under concatenation, because of the
overall form ( )*
Now suppose x ∈ L3. Is x^R ∈ L3?
Yes: see this by reversing the regular
expression and observing that the same regular
expression results
So L3 is also closed under reversal
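
Both claims can be spot-checked with Python's re module. The sketch assumes the expression for L3 is (0 ∪ 1(01*0)*1)*, written with | for union as Python's syntax requires; the function name is illustrative. A finite check like this is evidence, not a proof.

```python
import re

# Python spelling of (0 ∪ 1(01*0)*1)*; | plays the role of ∪.
L3_REGEX = re.compile(r"(0|1(01*0)*1)*")

def matches_L3(x):
    """True iff the whole bit string x is generated by the expression."""
    return L3_REGEX.fullmatch(x) is not None
```

Comparing matches_L3(format(n, "b")) with n % 3 == 0 over a range of n tests the expression itself, and applying matches_L3 to reversed multiples of 3 tests closure under reversal.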

61
Regular expressions generate regular languages

Lemma 1.55 For every regular expression r, L(r)
is a regular language.
Proof by induction on regular expressions.
We used induction to create all of the regular
expressions and then to define their languages,
so we can use induction to visit each one and
prove a property about it

62
L(REX) ⊆ REG

Base cases
For every a ∈ Σ, L(a) = { a } is obviously
regular
L(ε) = { ε } ∈ REG also
L(∅) = ∅ ∈ REG

[Diagram: automaton with a single arc labeled a]
63
L(REX) ⊆ REG

Induction cases
Suppose the induction hypothesis holds for r1 and
r2. Namely, L(r1) ∈ REG and L(r2) ∈ REG. We
want to show that L( (r1 ∪ r2) ) ∈ REG also. But
look: by definition, L( (r1 ∪ r2) ) = L(r1) ∪
L(r2)
Since both of these languages are regular, we
can apply Theorem 1.45 (closure of REG under ∪)
to conclude that their union is regular.

64
L(REX) ⊆ REG

Induction cases
Now suppose L(r1) ∈ REG and L(r2) ∈ REG. By
definition, L( (r1 · r2) ) = L(r1) · L(r2)
By Theorem 1.47, this concatenation is regular
too.
Finally, suppose L(r) ∈ REG. Then by
definition, L( (r*) ) = (L(r))*
By Theorem 1.49, this language is also regular.
QED

65
On to REG ⊆ L(REX)

Now we'll show that each regular language (one
accepted by an automaton) also can be described
by a regular expression
Hence REG = L(REX)
In other words, regular expressions are
equivalent in power to finite automata
This equivalence is called Kleene's Theorem (1.54
in book)

66
Converting DFAs to REX

Lemma 1.60 in textbook
This approach uses yet another form of finite
automaton called a GNFA (generalized NFA)
The technique is easier to understand by working
an example than by studying the proof

67
Syntax of GNFA

A generalized NFA is a 5-tuple (Q, Σ, δ, qs, qa) such
that
Q is a finite set of states
Σ is an alphabet
δ : (Q-{qa}) × (Q-{qs}) → REX(Σ) is the transition
function
qs ∈ Q is the start state
qa ∈ Q is the (one) accepting state

68
GNFA syntax summary

Arcs are labeled with regular expressions
Meaning is that "input matching the label moves
from old state to new state" -- just like NFA,
but not just a single character at a time
Start state has no incoming transitions, accept
has no outgoing
Every pair of states (except start & accept) has
two arcs between them
Every state has a self-loop (except start &
accept)

69
Construction strategy

Will convert a DFA into a GNFA then iteratively
shrink the GNFA until we end up with a diagram
like this:
[Diagram: a single arc from qs to qa labeled "giant regular expression"]
meaning that exactly that input that matches the
giant regular expression is in the language
70
Converting DFA to GNFA
[Diagram: a DFA with states 0, 1, 2 and arcs labeled 0 and 1]
Adding new start state qs is straightforward. Then
make each DFA accepting state have an ε
transition to the single accepting state qa
[Diagram: the resulting GNFA, with states qs, 0, 1, 2, qa and two added arcs labeled ε]
71
Interpreting arcs

δ : (Q-{qa}) × (Q-{qs}) → REX(Σ)
In this diagram,
δ(0,1) = 1    δ(2,0) = ∅    δ(2,qa) = ∅
δ(1,1) = ∅    δ(2,2) = 1    δ(0,qa) = ε

[Diagram: the GNFA with states qs, 0, 1, 2, qa]
72
Eliminating a GNFA state

We arbitrarily choose an interior state (not qs
or qa) to rip out of the machine

Question: how is the ability of state i to get to
state j affected when we remove rip? Only the
solid and labeled states and transitions are
relevant to that question
[Diagram: arc i → j labeled R4; arc i → rip labeled R1; self-loop on rip labeled R2; arc rip → j labeled R3]
73
Eliminating a GNFA state

We produce a new GNFA that omits rip
Its i-to-j label will compensate for the missing
state
We will do this for every (i,j) ∈
(Q-{qa}) × (Q-{qs})
So we have to rewrite every label in order to
eliminate this one state
New label for i-to-j is
R4 ∪ (R1 · (R2)* · R3)
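
The relabeling rule is mechanical enough to automate. The Python sketch below stores GNFA labels as strings in Python's re syntax, uses None for the label ∅ and the empty string for ε, and rips out the interior states one at a time. The starting machine is assumed to be the three-state DFA for multiples of 3 (states 0, 1, 2, with 0 both start and accepting); the helper names and the state names "qs" and "qa" are illustrative.

```python
import re

def union(a, b):
    """a ∪ b, where None stands for ∅."""
    if a is None:
        return b
    if b is None:
        return a
    return f"(?:{a}|{b})"

def concat(a, b):
    """a · b; concatenating with ∅ gives ∅."""
    return None if a is None or b is None else a + b

def star(a):
    """a*; both ∅* and ε* are ε, written as the empty string."""
    return "" if not a else f"(?:{a})*"

def rip_out(labels, states, rip, qs="qs", qa="qa"):
    """Remove rip; every remaining label i->j becomes R4 ∪ (R1 (R2)* R3)."""
    rest = [q for q in states if q != rip]
    new = {}
    for i in rest:
        for j in rest:
            if i == qa or j == qs:     # outside the domain of the transition function
                continue
            r1, r2 = labels.get((i, rip)), labels.get((rip, rip))
            r3, r4 = labels.get((rip, j)), labels.get((i, j))
            new[(i, j)] = union(r4, concat(concat(r1, star(r2)), r3))
    return new, rest

def mod3_regex():
    """Convert the assumed multiple-of-3 DFA to a regular expression."""
    states = ["qs", 0, 1, 2, "qa"]
    labels = {("qs", 0): "", (0, "qa"): "",
              (0, 0): "0", (0, 1): "1", (1, 0): "1",
              (1, 2): "0", (2, 1): "0", (2, 2): "1"}
    for rip in (2, 1, 0):              # any elimination order works
        labels, states = rip_out(labels, states, rip)
    return labels[("qs", "qa")]
```

The result of mod3_regex() can be passed to re.fullmatch to test bit strings; wrapping every union and star in a non-capturing group keeps precedence correct at the cost of extra parentheses.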

[Diagram repeated: states i, j, rip with arc labels R1, R2, R3, R4]
74
Don't overlook

The case (i,i) ∈ (Q-{qa}) × (Q-{qs})
New label for i-to-i is still
R4 ∪ (R1 · (R2)* · R3)
Example proceeds on whiteboard, or see textbook
for a different one

[Diagram: self-loop on i labeled R4; arc i → rip labeled R1; self-loop on rip labeled R2; arc rip → i labeled R3]
75
g/re/p

What does grep do?
(int|float)_rec.*emp becomes
(Σ*)(int ∪ float)_rec(Σ*)emp(Σ*)
What does it mean?
How does it work?
Regular expression → NFA → DFA → state reduction
Then run DFA against each line of input, printing
out the lines that it accepts
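
The line-filtering behavior can be imitated in a few lines of Python. This is a sketch of the behavior only: Python's re module is a backtracking matcher and does not build the NFA → DFA pipeline described above. The function name grep and the sample lines in the usage note are made up for illustration.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern, in order."""
    compiled = re.compile(pattern)
    # search() looks for the pattern anywhere in the line, which corresponds
    # to padding the expression with Σ* on both sides.
    return [line for line in lines if compiled.search(line)]
```

Usage: grep(r"(int|float)_rec.*emp", ["x int_rec_emp y", "int_rec"]) keeps only the first line.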

76
State machines

Very common programming technique
while (true) {
    switch (state) {
    case NEW_CONNECTION:
        process_login();
        state = RECEIVE_CMD;
        break;
    case RECEIVE_CMD:
        if (process_cmd() == CMD_QUIT)
            state = SHUTDOWN;
        break;
    case SHUTDOWN:
        ...
    }
}

77
This course so far

1.1 Introduction to languages & DFAs
1.2 NFAs and DFAs recognize the same class of
languages
1.3 REX generates the same class of languages
Three different programming "languages" specified
in different levels of formality that solve the
same types of computational problems
Four, if you count GNFAs
Five, if you count UFAs

78
Strategies

If you're investigating a property of regular
languages, then as soon as you know L 2 REG, you
know there are DFAs, NFAs, Regexes that describe
it. Use whatever representation is convenient
But sometimes you're investigating the properties
of the programs themselves: changing states,
adding a * to a regex, etc. Then the knowledge
that other representations exist might be
relevant and might not

79
All finite languages are regular

Theorem (not in book): FIN ⊆ REG
Proof: Suppose L ∈ FIN.
Then either L = ∅, or L = { s1, s2, …, sn } where
n ∈ N and each si ∈ Σ*.
A regular expression describing L is, therefore,
either ∅ or
s1 ∪ s2 ∪ … ∪ sn. QED
Note that this proof does not work for n = ∞

80
Picture so far
[Venn diagram: FIN inside REG inside ALL; each point is a language]
REG = L(DFA) = L(NFA) = L(REX)
= L(UFA) = L(GNFA) ⊋ FIN
L(DFA) means "the class of languages generated by DFAs"
Is there a language out here (in ALL but outside REG)?
81
1.4 Nonregular languages

For each possible language L,
∅ ⊆ L. So ∅ is the smallest language. And ∅ is
regular
L ⊆ Σ*. So Σ* is the largest language. And Σ* is
regular
Yet there are languages in between these two
extremes that are not regular

82
A nonregular language

B = { 0^n 1^n | n ≥ 0 }
= { ε, 01, 0011, 000111, … }
is not regular
Why?
Q: how many bits of memory would a DFA need in
order to recognize B?
A: there appears to be no single number of bits
that's big enough to work for every element of B
Remember, the DFA needs to reject all strings
that are not in B

83
Other examples

C = { w ∈ {0,1}* | n0(w) = n1(w) }
Needs to count a potentially unbounded number of
'0's... so nonregular
D = { w ∈ {0,1}* | n01(w) = n10(w) }
Needs to count a potentially unbounded number of
'01' substrings... so ??
Need a technique for establishing nonregularity
that is more formal and... less intuitive?

84
Proving nonregularity

To prove that a language is
nonregular, you have to show that no DFA
whatsoever recognizes the language
Not just the DFA that is your best effort at
recognizing the language
The pumping lemma can be used to do that
The pumping lemma says that every regular
language satisfies the "regular pumping property"
(RPP)
Given this, if we can show that a language like B
doesn't satisfy the RPP, then it's not regular

85
Pumping lemma, informally

Roughly: "if a regular language contains any
'long' strings, then it contains infinitely many
strings"
Start with a regular language and suppose that
some DFA M = (Q, Σ, δ, q0, F) for it has |Q| = 10 states.
What if M accepts some particular string s where
s = c1c2…c15 so that |s| = 15?

[Diagram: the path of states M follows on s, starting at q0]
86
Pigeonhole principle

With 15 input characters, the machine will visit
at most 16 states
But there are only 10 states in this machine
So clearly it will visit at least one of its
states more than once
Let rpt be our name for the first state that is
visited multiple times on that particular input s
Let acc be our name for the accepting state that
s leads to, namely, δ*(q0,s) = acc
Let y be our name for the leftmost substring of s
for which δ*(rpt, y) = rpt
Since there are no ε transitions in a DFA, a
state being "visited multiple times" means that
it read at least one character. Therefore, |y| >
0

87
[Diagram: the sequence of states that M visits after reading the characters of s, with annotations "> 0" and "≤ 10"]
After reading c1…c10 (first 10 chars of s), M
must have already been to state rpt and returned
to it at least once... because there are only 10
states in M. Of course the repetition could have
been encountered earlier than 10 characters too...
88
[Diagram repeated: the sequence of states that M visits on s, with annotations "> 0" and "≤ 10"]
Assigning new names to the pieces of s...
89
[Diagram repeated: the sequence of states that M visits on s, with the pieces of s named x, y, z]
Assigning new names to the pieces of s... So s =
xyz as shown above. With these names, the other
constraints can be written |y| > 0 and |xy| ≤ 10
90
M accepts other strings too

Consider the string xz

91
M accepts other strings too

Consider the string xz
δ*(q0,x) = rpt
δ*(rpt,z) = acc (from previous slide)
So xz ∈ L(M) too

92
M accepts other strings too

Consider the string xyyz
δ*(q0,xy) = rpt (from 2 slides ago)
δ*(rpt,y) = rpt (from same previous result)
δ*(rpt,z) = acc (from same previous result)
So xyyz ∈ L(M) also
Apparently we can repeat y as many times as we
want

93
p-regular-pumpable strings

Definition (not in textbook): A string s is said
to be p-regular-pumpable in a language L ⊆ Σ* if
there exist x,y,z ∈ Σ* such that
1. s = xyz ("x,y,z are a decomposition of s")
2. |y| > 0
3. |xy| ≤ p
4. For all i ≥ 0,
x y^i z ∈ L ("the y part of s can be pumped
to produce other strings in the language")
It follows that s must be a member of L for it to
be p-pumpable
The 15-character string s in the previous example
was 10-pumpable in L(M)
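
For small cases the definition can be explored by brute force. The Python sketch below lists the decompositions of s that satisfy conditions 1-3 and survive pumping for i = 0..max_i. Checking finitely many i is only a stand-in for "for all i ≥ 0", so a non-empty answer is evidence of pumpability rather than proof, while an empty answer does show that s is not p-pumpable. The function name, the max_i parameter and the membership test in_ten_star (for the language (10)*) are illustrative assumptions.

```python
def pumpable_decompositions(s, p, in_L, max_i=5):
    """Decompositions (x, y, z) of s with |y| > 0 and |xy| <= p such that
    x y^i z is in L for every i in 0..max_i."""
    found = []
    for xy_len in range(1, min(p, len(s)) + 1):    # condition 3: |xy| <= p
        for x_len in range(xy_len):                # condition 2: |y| > 0
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_L(x + y * i + z) for i in range(max_i + 1)):
                found.append((x, y, z))
    return found

def in_ten_star(w):
    """Membership test for the language (10)*."""
    return w == "10" * (len(w) // 2)
```

With s = "1010" and p = 2, the only surviving decomposition is x = ε, y = 10, z = 10.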

94
p-regular-pumpable languages

Definition: A language L is p-regular-pumpable if
for every s ∈ L such that |s| ≥ p, the string s
is p-pumpable in L
in other words, "every long enough string in L is
pumpable"
Our previous example language was
10-regular-pumpable

95
RPP(p) and RPP

Definition: RPP(p) is the class of languages that
are p-regular-pumpable. In other words,
RPP(p) = { L ⊆ Σ* | L is p-regular-pumpable }
Definition: RPP is the class of languages that are
p-regular-pumpable for some p. In other
words, RPP = ∪ { RPP(p) | p ≥ 0 }
Lots of notation and apparent complexity, but the
idea is simple: RPP is the class of languages in
which every long string is pumpable

96
Pumping lemma

Theorem 1.70 (rephrased): If L ⊆ Σ* is recognized
by a p-state DFA, then L ∈ RPP(p)
Proof: Just like our example, but use p instead of
the constant 10 (number of states)
Corollaries
REG ⊆ RPP
L ∉ RPP ⇒ L ∉ REG
(Primary application of Pumping Lemma)
97
Proving a language nonregular

First unravel these definitions, but it amounts
to proving that L is not a member of RPP. Then
it follows that L isn't regular
Proving that L isn't in RPP allows you to
concentrate on the language rather than
considering all possible proposed programs that
might recognize it

98
Unraveling RPP a direct rephrasing

Rephrasing: L is a member of RPP if
There exists p ≥ 0 such that
For every s ∈ L satisfying |s| ≥ p,
There exist x,y,z ∈ Σ* such that
1. s = xyz
2. |y| > 0
3. |xy| ≤ p
4. For all i ≥ 0,
x y^i z ∈ L

(∃ p) (∀ s) (∃ x,y,z) (∀ i) !!! Pretty complicated
99
Question from last time

(Question) Didn't you earlier say "regular
languages are closed under concatenation"?
(Answer) No, I wrote that REG is closed under
concatenation
Subtle but important distinction. REG (the class
of all regular languages) is closed under
language concatenation
If A,B ∈ REG then A·B ∈ REG
That does not mean that each regular language is
itself closed under string concatenation
{10, 1} ∈ REG but 101 ∉ {10, 1}

100
Nonregularity proof by contradiction

Claim: Let B = { 0^n 1^n | n ≥ 0 }. Then B is not
regular
Proof: We show that B is not a member of RPP by
contradiction.
So assume that B ∈ RPP (and hope to reach a
contradiction soon). Then there exists p ≥ 0
associated with the definition of RPP.
We let s = 0^p 1^p. (Not the exact same variable
as in the RPP property, but an example of one
such possible setting of it.) Now we know that s
∈ B because it has the right form.

101
Proof continued

Now |s| = 2p ≥ p. By assumption that B ∈ RPP,
there exist x,y,z such that
1. s = xyz (= 0^p 1^p, remember)
2. |y| > 0
3. |xy| ≤ p
4. For all i ≥ 0,
x y^i z ∈ B
Part (3) implies that xy ∈ 0* because the first
p-many characters of s = xyz are all 0
So y consists solely of '0' characters
... at least one of them, according to (2)

102
Proof continued

But consider
s = xyz = xy^1z = 0^p 1^p (where we started)
y consists of one or more '0' characters
so xy^2z contains more '0' characters than '1'
characters. In other words,
xy^2z = 0^(p+|y|) 1^p
so xy^2z ∉ B = { 0^n 1^n | n ≥ 0 }.
This contradicts part (4)!!
Since the contradiction followed merely from the
assumption that B ∈ RPP (and right and meet and
true reasoning about which we have no doubt),
that assumption must be wrong. QED
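
The key step of the argument (every legal decomposition of s = 0^p 1^p pumps out of B at i = 2) can be checked exhaustively for small p. The Python sketch below does so; it illustrates the proof for particular values of p and does not replace the general argument. The function names are illustrative.

```python
def in_B(w):
    """Membership test for B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def pumping_up_always_fails(p):
    """True iff every (x, y, z) with xyz = 0^p 1^p, |y| > 0 and |xy| <= p
    has x y^2 z outside B."""
    s = "0" * p + "1" * p
    for xy_len in range(1, p + 1):         # condition 3: |xy| <= p
        for x_len in range(xy_len):        # condition 2: |y| > 0
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if in_B(x + y * 2 + z):
                return False
    return True
```

Because |xy| ≤ p forces y to lie inside the leading block of 0s, doubling y always unbalances the string, which is what the loop confirms case by case.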

103
Observations

We needed (and got) a contradiction that was a
necessary consequence of the assumption that B ∈
RPP and then relied on the Theorem 1.70
corollaries
RPP mainly concerns strings that are longer than
p
So you should concentrate on strings longer than
p...
even though p is a variable. But clearly
|0^p 1^p| > p
In our example we didn't "do" much after our
initial choice of s and thinking about the
implications: we found a contradiction right away
Many other choices of s would work, but many
don't, and even some that do work require more
complex arguments, for example,
s = 0^(⌊p/2⌋+1) 1^(⌊p/2⌋+1)
Choosing s wisely is usually the most important
thing

104
Picture so far
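[Venn diagram of nested language classes, FIN ⊆ REG ⊆ RPP ⊆ ALL. Each point is a language in this Venn diagram. Labeled points: {0101, ε} in FIN; 0*(101)* in REG; B = { 0^n 1^n | n ≥ 0 } in ALL but outside RPP. For the region inside RPP but outside REG: "We'll see an example later".]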
105
More on contradictions

Consider this shortcut attempt to prove that B =
{ 0^n 1^n | n ≥ 0 } is not regular
Proof: Suppose B ∈ RPP. By RPP,
There exists p ≥ 0 such that
For every s ∈ B satisfying |s| ≥ p,
There exist x,y,z ∈ Σ* such that
1. s = xyz
2. |y| > 0
3. |xy| ≤ p
4. For all i ≥ 0, x y^i z ∈ B
So let s = (1010)^p. Then s ∉ B, which is
inconsistent with the RPP statement.
Contradiction??

NO. The RPP statement only makes a promise about
strings s ∈ B, so a string outside B contradicts
nothing.
106
Simplifying RPP proofs

I find it easier to forget about contradiction
proofs and instead prove directly that a language
is not in RPP
So we need a direct, formal version of the
statement that L ∉ RPP

107
Unraveling RPP (repeat)

Rephrasing: L is a member of RPP if
There exists p ≥ 0 such that
For every s ∈ L satisfying |s| ≥ p,
There exist x,y,z ∈ Σ* such that
1. s = xyz
2. |y| > 0
3. |xy| ≤ p
4. For all i ≥ 0, x y^i z ∈ L

(∃ p) (∀ s) (∃ x,y,z) (∀ i) !!! Pretty complicated
108
Unraveling non-RPP

Rephrasing: L is not in RPP if
For every p ≥ 0
There exists some s ∈ L satisfying |s| ≥ p such
that
For every x,y,z ∈ Σ* satisfying 1-3
1. s = xyz,
2. |y| > 0, and
3. |xy| ≤ p
There exists some i ≥ 0 for which
x y^i z ∉ L

(∀ p) (∃ s) (∀ x,y,z) (∃ i) Still complicated
but you don't have to use contradiction now
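The inner part of the non-RPP statement (for every legal x,y,z there is some i with x y^i z ∉ L) can be written as a bounded check. This Python sketch is a rough illustration under assumptions: the names `defeats`, `in_B` and `in_01_star` are hypothetical, and the search over i is cut off at `max_i`, so a result of False only means that no small i was found.

```python
def legal_splits(s, p):
    """Yield every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def defeats(in_L, s, p, max_i=3):
    """True if s is in L, |s| >= p, and every legal split of s has
    some i in 0..max_i with x y^i z outside L."""
    if not in_L(s) or len(s) < p:
        return False
    return all(
        any(not in_L(x + y * i + z) for i in range(max_i + 1))
        for x, y, z in legal_splits(s, p)
    )

def in_B(w):            # B = { 0^n 1^n | n >= 0 }, not regular
    return w == "0" * (len(w) // 2) + "1" * (len(w) // 2)

def in_01_star(w):      # (01)*, a regular language, for contrast
    return w == "01" * (len(w) // 2)

p = 4
b_result = defeats(in_B, "0" * p + "1" * p, p)    # s = 0^p 1^p
r_result = defeats(in_01_star, "01" * p, p)       # s = (01)^p
print(b_result, r_result)
```

For the regular language (01)* the split x = ε, y = 01 can be pumped forever, so this choice of s does not defeat p = 4. That is consistent with REG ⊆ RPP.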
109
A direct proof of nonregularity

Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … } ('a' is just
some character). Then D is not regular.
Proof idea The pumping lemma says there's a
fixed-size loop in any DFA that accepts long
strings. You can repeat the characters in that
loop as many times as you want to get longer
strings that the machine accepts. Each time you
add a repetition you grow the pumped string by a
constant length.
But the spacing between strings in D above keeps
changing; it's never constant. So D doesn't have
the pumping property.

110
A direct proof of nonregularity

Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }. Then D is
not in RPP and thus not regular.
Proof: Let p ≥ 0 and set s = a^((p+1)^2). Then s ∈ D and
|s| > p (so such an s certainly exists).
Now let x,y,z ∈ Σ* be any strings satisfying
1. xyz = s = a^((p+1)^2)
2. |y| > 0, and
3. |xy| ≤ p
Our goal is to produce some i such that xy^iz ∉ D

111
Direct proof continued

(We'll actually show that xy^0z ∉ D)
Observe that y = a^j for some 1 ≤ j ≤ p, so
xy^0z = a^((p+1)^2 - j) and |xy^0z| < (p+1)^2
Since j ≤ p we know that -j ≥ -p and thus
|xy^0z| = (p+1)^2 - j
        ≥ (p+1)^2 - p
        = p^2 + p + 1
        > p^2
In other words, xy^0z has > p^2 characters and <
(p+1)^2 characters. So |xy^0z| is not a perfect
square and thus xy^0z ∉ D. QED
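The inequality chain p^2 < (p+1)^2 - j < (p+1)^2 for 1 ≤ j ≤ p is easy to spot-check numerically. This is a minimal Python sketch with hypothetical helper names. It verifies a finite range only; the algebra above covers every p.

```python
from math import isqrt

def is_square(k):
    """True if k is a perfect square."""
    return isqrt(k) ** 2 == k

def pumped_down_lengths(p):
    """Lengths |x y^0 z| = (p+1)^2 - j for every possible 1 <= j <= p."""
    return [(p + 1) ** 2 - j for j in range(1, p + 1)]

# Each length lies strictly between two consecutive squares,
# so it cannot itself be a square.
all_between = all(
    p * p < k < (p + 1) ** 2 and not is_square(k)
    for p in range(1, 200)
    for k in pumped_down_lengths(p)
)
print(all_between)
```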

112
Direct or contradiction proof?

Both work fine... it's your choice
But you must clearly state what you are doing
If proof by contradiction, say so
If direct proof, say so

113
Game theory formulation

The direct proof technique can be formulated as a
two-player game
You are the player who wants to establish that L
is not pumpable
Your opponent wants to make it difficult for you
to succeed
Both of you have to play by the rules

114
Game theory continued

The game has just four steps.
1. Your opponent picks p ≥ 0
2. You pick s ∈ L such that |s| ≥ p
3. Your opponent chooses x,y,z ∈ Σ* such that s = xyz,
|y| > 0, and |xy| ≤ p
4. You produce some i ≥ 0 such that xy^iz ∉ L

115
Game theory continued

If you are able to succeed through step 4, then
you have won only one round of the game
Like winning one round of Tic-tac-toe
Do example for a member of D
To show that a language is not in RPP you must
show that you can always win, regardless of your
opponent's legal moves
Realize that the opponent is free to choose the
most inconvenient or difficult p and x,y,z
imaginable that are consistent with the rules

116
Game theory continued

So you have to present a strategy for always
winning and convincingly argue that it will
always win
So your choices in steps 2 and 4 have to depend on
the opponent's choices in steps 1 and 3
And you don't know what the opponent will choose
So your choices need to be framed in terms of the
variables p, x, y, z
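The game can be played out mechanically for small p. This Python sketch is an illustration under assumptions: all function names are hypothetical, the language is D = { a^(n^2) | n ≥ 0 } from the earlier direct proof, and your strategy is the one used there (s = a^((p+1)^2) in step 2, i = 0 in step 4). The opponent's moves are enumerated exhaustively, which is only possible for a bounded range of p.

```python
from math import isqrt

def in_D(w):
    """Membership in D = { a^(n^2) | n >= 0 }."""
    return set(w) <= {"a"} and isqrt(len(w)) ** 2 == len(w)

def your_s(p):
    """Step 2: your string, framed in terms of the opponent's p."""
    return "a" * (p + 1) ** 2

def your_i(p, x, y, z):
    """Step 4: your exponent; this strategy always pumps down."""
    return 0

def opponent_moves(s, p):
    """Step 3: every legal (x, y, z) with s = xyz, |y| > 0, |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def strategy_wins_for(p):
    """True if you win every round once the opponent has picked p in step 1."""
    s = your_s(p)
    if not in_D(s) or len(s) < p:
        return False            # step 2 would be an illegal move
    return all(
        not in_D(x + y * your_i(p, x, y, z) + z)
        for x, y, z in opponent_moves(s, p)
    )

always = all(strategy_wins_for(p) for p in range(0, 30))
print(always)
```

Winning for p < 30 is evidence that the strategy works. The written argument in terms of the variables p, x, y, z is what shows it always wins.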

117
Game theory continued

Ultimately it is not very different from the
direct proof
But it states clearly what choices you may make
and what you may not, a common cause of errors
in proofs
Repeat previous proof in this framework

118
A direct proof of nonregularity

Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }. Then D is
not in RPP and thus not regular.
Proof: Let p ≥ 0 (Step 1, opponent's choice)
and set s = a^((p+1)^2). Then s ∈ D and
|s| > p (so such an s certainly exists).
(Step 2, your choice and reasoning)
Now let x,y,z ∈ Σ* be any strings satisfying
(Step 3, opponent's choice)
1. xyz = s = a^((p+1)^2)
2. |y| > 0, and
3. |xy| ≤ p
Our goal is to produce some i such that xy^iz ∉ D

119
Direct proof continued

(We'll actually show that xy^0z ∉ D) (Step 4, your choice)
Step 4, your reasoning:
Observe that y = a^j for some 1 ≤ j ≤ p, so
xy^0z = a^((p+1)^2 - j) and |xy^0z| < (p+1)^2
Since j ≤ p we know that -j ≥ -p and thus
|xy^0z| = (p+1)^2 - j
        ≥ (p+1)^2 - p
        = p^2 + p + 1
        > p^2
In other words, xy^0z has > p^2 characters and <
(p+1)^2 characters. So |xy^0z| is not a perfect
square and thus xy^0z ∉ D. QED

120
Unraveling RPP (repeat)

Rephrasing: L is a member of RPP if
There exists p ≥ 0 such that
For every s ∈ L satisfying |s| ≥ p,
There exist x,y,z ∈ Σ* such that
1. s = xyz
2. |y| > 0
3. |xy| ≤ p
4. For all i ≥ 0, x y^i z ∈ L
Theorem: REG ⊆ RPP

121
Structural facts about RPP

1. If L ∈ RPP(p) (meaning "strings in L with length
≥ p are pumpable") and q > p then L ∈ RPP(q)
2. If L ∉ RPP(q) and q > p then L ∉ RPP(p)
(contrapositive of 1)
Thus if you have a proof that establishes L ∉
RPP(q) only when q ≥ 5, that's good enough; it
follows that L is not regular
Relevant for the "C is not regular" problem

122
Structural facts about RPP

If L ∈ FIN and the longest string in L has length
n, then
L ∈ RPP(n+1)
L ∉ RPP(q) for all q < n+1
Note RPP is a class of languages that's only
interesting because of its relation to REG. It
is not a reasonable proposal for a computation
model!
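For a finite language both facts can be checked exactly, because pumping any split up to i = n+1 already produces a string longer than every member of L. This is a minimal Python sketch. The function names and the sample language are made up for illustration.

```python
def legal_splits(s, p):
    """Yield every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def finite_language_in_RPP(L, p):
    """Decide 'L in RPP(p)' for a finite set of strings L."""
    n = max(map(len, L), default=0)

    def pumpable(s):
        # Checking i = 0..n+1 is enough: x y^(n+1) z has length > n,
        # so it can never be in L.
        return any(
            all(x + y * i + z in L for i in range(n + 2))
            for x, y, z in legal_splits(s, p)
        )

    # Only strings of length >= p have to be pumpable.
    return all(pumpable(s) for s in L if len(s) >= p)

L = {"", "ab", "abba"}    # sample finite language, longest string has n = 4
results = {p: finite_language_in_RPP(L, p) for p in range(0, 7)}
print(results)
```

For p = n+1 = 5 the condition holds vacuously, since no string in L is that long. For every smaller p the string of length n has to be pumpable and is not.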

123
Unraveling non-RPP (repeat)

L is not in RPP if
For every p ≥ 0 (opponent's choice)
There exists some s ∈ L satisfying |s| ≥ p such
that (your choice)
For every x,y,z ∈ Σ* satisfying 1-3 (opponent's choice)
1. s = xyz,
2. |y| > 0, and
3. |xy| ≤ p
There exists some i ≥ 0 for which (your choice)
x y^i z ∉ L

124
Another example

Let C = { 0^m 1^n | m ≠ n }. Is C regular? Try to
prove it isn't
Set s = 0^p 1^(2p). If opponent chooses x = ε, y = 0^p,
z = 1^(2p), then we can set i = 2 and win because
xy^2z = 0^(2p) 1^(2p) ∉ C.
What if opponent chooses a shorter y?
Looks like it's relatively easy to be a member of
C and hard to not be a member of C
Can force opponent to choose y ∈ 0*
So try to arrange it so that no matter what |y|
is, some number of repetitions of it will match
the target number of '1's
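One standard way to realize this hint, which is not necessarily the one intended in the lecture, is to pick s = 0^p 1^(p + p!). Every possible |y| = k with 1 ≤ k ≤ p divides p!, so pumping with i = 1 + p!/k adds exactly p! zeros and makes the counts equal. The Python sketch below checks that choice for small p. The function names are hypothetical, and the factorial choice is an assumption brought in from outside the slides.

```python
from math import factorial

def in_C(w):
    """Membership in C = { 0^m 1^n | m != n }."""
    m = len(w) - len(w.lstrip("0"))      # number of leading zeros
    n = len(w) - m
    return w == "0" * m + "1" * n and m != n

def legal_splits(s, p):
    """Yield every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def your_s(p):
    """Your string: p zeros followed by p + p! ones."""
    return "0" * p + "1" * (p + factorial(p))

def your_i(p, y):
    """Your exponent: chosen so that exactly p! extra zeros are added."""
    return 1 + factorial(p) // len(y)

def strategy_wins_for(p):
    s = your_s(p)
    if not in_C(s) or len(s) < p:
        return False
    return all(
        not in_C(x + y * your_i(p, y) + z)
        for x, y, z in legal_splits(s, p)
    )

wins = all(strategy_wins_for(p) for p in range(1, 7))
print(wins)
```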

125
Direct proof?

Hmmm

126
Using closure properties

Can simplify argument a great deal
Fact: If L is not regular then L^c is not regular
either.
Proof: If L is not regular but L^c were regular,
then (L^c)^c would also be regular because REG is
closed under complement. But (L^c)^c = L. QED
Recall the languages B = { 0^m 1^n | m = n } and
C = { 0^m 1^n | m ≠ n }. C is similar to B^c...

127
Using closure properties

Start over
B = { 0^m 1^n | m = n } (known nonreg)
C = { 0^m 1^n | m ≠ n } (suspected nonreg)
Certainly B ⊆ C^c
If m = n then it's true that (not m ≠ n)
But B ≠ C^c
Find example x ∈ C^c - B...
On the other hand, B = 0*1* ∩ C^c
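These three relations between B = { 0^m 1^n | m = n } and C = { 0^m 1^n | m ≠ n } can be checked on all binary strings up to a small length. This Python sketch is a bounded illustration with hypothetical names. It uses the standard `re` and `itertools` modules and also prints one shortest member of C^c - B.

```python
import re
from itertools import product

def in_B(w):
    """Membership in B = { 0^m 1^n | m = n }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_C(w):
    """Membership in C = { 0^m 1^n | m != n }."""
    m = len(w) - len(w.lstrip("0"))
    n = len(w) - m
    return w == "0" * m + "1" * n and m != n

def in_0star1star(w):
    return re.fullmatch(r"0*1*", w) is not None

# All binary strings of length 0..8.
strings = ["".join(t) for n in range(9) for t in product("01", repeat=n)]

B = {w for w in strings if in_B(w)}
C_complement = {w for w in strings if not in_C(w)}

subset_holds = B <= C_complement                     # B is a subset of C^c
witness = min(C_complement - B, key=len)             # a shortest x in C^c - B
identity_holds = B == {w for w in C_complement if in_0star1star(w)}
print(subset_holds, witness, identity_holds)
```

The witness is a string that is not of the form 0^m 1^n at all. Such strings are exactly what intersecting with 0*1* removes.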

128
Using closure properties

Fact: If L1 ∩ L2 ∉ REG and L1 ∈ REG, then L2 ∉ REG
Proof: Suppose (a) L1 ∩ L2 ∉ REG and L1 ∈ REG and
(b) L2 ∈ REG. Since REG is closed under ∩ we know
that L1 ∩ L2 ∈ REG, but that contradicts assumption
(a). Thus (a) and (b) can't both be true. QED

129
Topics for Exam 1

Basic objects
The main hierarchy alphabets, strings,
languages, classes
Functions
Relations
Sets and operations on sets
∪, ∩, complement, ×, P(S), A - B, |S|
⊆, ∈
{ element | predicate(element) }
Propositional and predicate logic
∀ and ∃

130
Topics for Exam 1

Strings
ε versus ∅
Operations on strings concatenation,
exponentiation, reversal
Languages
Operations concatenation, exponentiation,
reversal, ∪, ∩, *, complement, everything
applicable to sets, ε versus ∅
Language classes
FIN, REG, ALL

131
Topics for Exam 1

REG and its many formulations
DFA, NFA, GNFA, UFA, REX
Syntax and semantics of each model
L() as program-to-language operator
Conversions between models
Subset construction for NFA, UFA
DFA → GNFA → REX
REX → NFA

132
Topics for Exam 1

Closure properties of language classes
REG as a reasonable model of computation
Arguments for, against
Homework problems through homework 3
Lectures and reading up through section 1.3
(excluding nonregularity)

133
Exam 1

You may bring and consult a single-sided,
handwritten sheet of notes, which you must turn
in with the exam (and will get back later)

134
Applying these closure properties

B = { 0^m 1^n | m = n } and C = { 0^m 1^n | m ≠ n }
0*1* ∩ C^c = B
Here 0*1* is obviously regular and B is known to be
nonregular; therefore C^c is nonregular
Thus C is nonregular too

135
Another closure properties attempt

B = { 0^m 1^n | m = n } = { 0^n 1^n | n ≥ 0 }
(known nonreg)
BB = { 0^n 1^n 0^m 1^m | n,m ≥ 0 }
Want to show that BB ∉ REG
We know that REG is closed under language
concatenation. What does that say about whether
BB is regular or not?
Is the class of non-regular languages (REG^c)
closed under language concatenation too?

136
No

Let Σ = {a} and D = { a^(n^2) | n ≥ 2 }
Then D^c = { a^k | k ≤ 1 or k is not a square }
= { ε, a^1, a^2, a^3, a^5, a^6, a^7, a^8, a^10, … }
We previously proved that D ∉ REG
Thus D^c ∉ REG (by "fact" we proved)
But D^c D^c = a* ∈ REG !!!
Thus REG^c is not closed under language
concatenation
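The claim D^c D^c = a* for D = { a^(n^2) | n ≥ 2 } comes down to a statement about lengths: every k ≥ 0 is a sum of two numbers, neither of which is a square ≥ 4. This is a minimal Python sketch of that check over a finite range. The function names are hypothetical.

```python
from math import isqrt

def in_D_length(k):
    """a^k is in D = { a^(n^2) | n >= 2 } iff k is a perfect square >= 4."""
    return k >= 4 and isqrt(k) ** 2 == k

def in_DcDc_length(k):
    """a^k is in D^c D^c iff k = i + (k - i) with both parts outside D."""
    return any(
        not in_D_length(i) and not in_D_length(k - i)
        for i in range(k + 1)
    )

covers_all = all(in_DcDc_length(k) for k in range(2000))
print(covers_all)
```

The reason it works in general: if k is not a square ≥ 4, take a^k = ε · a^k. If k = n^2 with n ≥ 2, take a^k = a^1 · a^(n^2 - 1), and n^2 - 1 is never a square for n ≥ 2.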

137
Back to problem

B = { 0^m 1^n | m = n } (known nonreg)
BB = { 0^n 1^n 0^m 1^m | n,m ≥ 0 }
Want to show that BB ∉ REG
But there's no general result for that
When applying a closure property, you have to
make sure it's true!
Nonetheless, it is true that BB ∉ REG
Because (BB) ∩ 0*1* = B
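The identity (BB) ∩ 0*1* = B, with B = { 0^n 1^n | n ≥ 0 }, can be spot-checked on a bounded piece of B. This Python sketch is an illustration with assumed names, and the bound N is arbitrary. It builds BB from the members of B with n ≤ N and keeps only the strings of the form 0*1*.

```python
import re

N = 6   # arbitrary bound for the illustration

# B restricted to n <= N, and all concatenations of two such strings.
B = {"0" * n + "1" * n for n in range(N + 1)}
BB = {u + v for u in B for v in B}

# Keep only the members of BB that match 0*1*.
BB_in_0star1star = {w for w in BB if re.fullmatch(r"0*1*", w)}

identity_holds = BB_in_0star1star == B
print(identity_holds)
```

A string 0^n 1^n 0^m 1^m has the form 0*1* only when n = 0 or m = 0, and in that case it is just a member of B. Since 0*1* is regular and B is not, the intersection fact proved earlier gives BB ∉ REG.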

138
Chapter 1 closing considerations

We don't and won't have many results about the
class REGc
Being nonregular says that the language lacks a
certain type of structure: it's more complicated
than a DFA can handle
All real computers are finite devices and all
finite languages are regular
Yet the programming models are brittle: the
program has to change for larger and larger
inputs
We've seen some easy-to-specify languages that
aren't regular
So REG is not a good general-purpose programming
model...?