# 91.304 Foundations of Theoretical Computer Science - PowerPoint PPT Presentation

Provided by: csU72

1
91.304 Foundations of (Theoretical) Computer
Science
• Chapter 1 Lecture Notes
• David Martin
• dm_at_cs.uml.edu

enses/by-sa/2.0/ or send a letter to Creative
Commons, 559 Nathan Abbott Way, Stanford,
California 94305, USA.
2
Chapter 1 Regular Languages
• Simple model of computation
• Input a string, and either accept or reject it
• Models a very simple type of function, a
predicate on strings $f : \Sigma^* \to \{0,1\}$
• See example of a state-transition diagram

3
Syntax of DFA
• A deterministic finite automaton (DFA) is a
5-tuple $(Q, \Sigma, \delta, q_0, F)$ such that
• $Q$ is a finite set of states
• $\Sigma$ (sigma) is an alphabet
• $\delta : Q \times \Sigma \to Q$ (delta) is the transition function
• $q_0 \in Q$ (q naught) is the start state
• $F \subseteq Q$ is the set of accepting states
• Usually these names are used, but others are
possible as long as the role is clear

4
DFA syntax
• It is deterministic because for every input
(q,c), the next state is a uniquely determined
member of Q
• because the codomain of $\delta$ is $Q$
• Fix the previous example to fit these constraints
• The same example DFA, specified formally

5
DFA computation
• This definition is different from but equivalent
to the one in the text
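• Let $M = (Q, \Sigma, \delta, q_0, F)$ be a DFA. We define the
extended transition function $\hat\delta : Q \times \Sigma^* \to Q$
inductively as follows. For all $q \in Q$,
$\hat\delta(q, \varepsilon) = q$. If $w \in \Sigma^*$ and $c \in \Sigma$, let
$\hat\delta(q, wc) = \delta(\hat\delta(q, w), c)$
• According to this definition, $\hat\delta(q, x)$ is the
state of the machine after starting in state $q$
and reading the entire string $x$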
• See example
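
The inductive definition of the extended transition function can be sketched in a few lines of Python. This is only an illustration: the two-state machine, its state names (`last0`, `last1`), and the helper names `delta_hat` and `accepts` are assumptions made for the example, not part of the lecture. The loop unrolls the rule "extended(q, wc) = delta(extended(q, w), c)" one character at a time.

```python
def delta_hat(delta, q, x):
    """Extended transition function: the state reached from q after reading all of x."""
    for c in x:
        # One application of the inductive step: follow delta on the next character.
        q = delta[(q, c)]
    return q

# Hypothetical two-state DFA (an assumption for illustration):
# the state name records the last bit that was read.
DELTA = {
    ("last0", "0"): "last0", ("last0", "1"): "last1",
    ("last1", "0"): "last0", ("last1", "1"): "last1",
}
Q0 = "last0"      # start state
F = {"last0"}     # accepting states

def accepts(x):
    """x is accepted when the extended transition function lands in F."""
    return delta_hat(DELTA, Q0, x) in F
```

Reading the empty string leaves the machine where it started, which is the base case of the definition.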

6
Language recognized by DFA
• The language recognized by the DFA $M$ is written
$L(M)$ and defined as $L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0, x) \in F\}$
• Think of L() as an operator that turns a program
into the language it specifies
• We will use L() for other types of machines and
grammars too

7
Example
• Let $L_2 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a
multiple of 2$\}$ and build a DFA $M_2$ such that
$L(M_2) = L_2$
• Remember this means $L(M_2) \subseteq L_2$ and $L_2 \subseteq L(M_2)$

8
Definition of regular languages
• A language $L$ is regular if there exists a DFA $M$
such that $L = L(M)$
• The class of regular languages over the alphabet
$\Sigma$ is called REG and defined REG $= \{L \subseteq \Sigma^* \mid L$
is regular$\} = \{L(M) \mid M$ is a DFA
over $\Sigma\}$
• Now we know 4 classes of languages: $\emptyset$, FIN, REG,
and ALL

9
Problems
• For all $k \ge 1$, let $A_k = \{0^{kn} \mid n \ge 0\}$. Prove that
$(\forall k \ge 1)\ A_k \in$ REG
• Solution is a scheme, not a single DFA
• (Harder) Build a DFA for $L_3 = \{x \in \{0,1\}^* \mid$ the binary
number $x$ is a multiple of 3$\}$
• Build a DFA for $L_4 = \{x \in \{a,b\}^* \mid x$ contains an odd
# of bs and an even # of as$\}$

10
Measuring DFA complexity
• Suppose
• you have a DFA with states named 00000000 ..
11111111 ($2^8 = 256$ unique states)
• an LCD attached to the thing showing the current
state name
• $\Sigma = \{c\}$ (for clock pulse)
• $\delta(q, c) = (q + 1)$ & 0xFF
• This is a simple counter machine: feed it clocks
and it counts upwards

11
Measuring DFA complexity
• Time complexity
• A DFA always takes one transition per input
character
• So time complexity is not useful here
• Program complexity
• A DFA's program is (mostly) its $\delta$
• The model specifies no particular programming
language for $\delta$: it's just a table mapping
(state, input) pairs to (state) outputs
• Though it can sometimes be specified concisely,
as in $\delta(q, c) = (q + 1)$ & 0xFF
• Reprogram the clock for any permutation of $\{0,1\}^8$
and $\delta$'s table remains just as big

12
Measuring DFA complexity
• Space complexity: the amount of memory used
• But a DFA has no extra memory; it only remembers
what state it is in
• Can't look back or forward
• So a DFA always uses the same amount of memory,
namely the amount of memory required to remember
what state it's in
• Needs to remember current element of $Q$
• Can write down that number in $\log_2 |Q|$ bits
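
As a rough sketch of the last two slides, the counter's transition function and the "bits needed to remember the state" count can be written out directly. The function names `counter_delta` and `state_bits` are assumptions for this example; the rounding up to a whole number of bits is also an assumption, since the slide only says $\log_2 |Q|$.

```python
def counter_delta(q, c):
    """Transition function of the 8-bit counter: every clock pulse advances the count."""
    return (q + 1) & 0xFF

def state_bits(num_states):
    """Smallest whole number of bits that can distinguish num_states states."""
    # (n - 1).bit_length() equals ceil(log2(n)) for every n >= 1.
    return (num_states - 1).bit_length()

# Feed the counter 257 clock pulses starting from state 0: it wraps around once.
q = 0
for _ in range(257):
    q = counter_delta(q, "c")
```

With 256 states the machine needs 8 bits; going to 512 states needs 9 bits and, in the DFA model, a completely different transition table.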

13
DFAs as real computers
• Consider a 256 MB computer that takes a finite
input and produces a finite output
• Inputs clock pulses, interrupts, hard drive,
keyboard, mouse, network, etc.
• Outputs video, hard drive, network, etc.
• Can code everything in binary
• But DFA only accepts or rejects input

14
Recognition model for functions
• Can still sort of be modeled by a DFA
• PC $= \{x \# y \mid x, y \in \{0,1\}^*$ and the input $x$
produces the output $y\}$
• Note: the # character is just a separator
• DFA plays the role of equipment verifier
• Verifying correctness seems easier than computing
the output, but at least it's related

15
Are DFAs reasonable?
• One issue is that the programs don't seem to
reflect much about the problem being solved
• If you can figure out how many bits of memory are
needed for the solution, then you can always
build a DFA based on that knowledge; it could be
tedious and really large
• Programs that use the same amount of memory show no
difference in program complexity, so DFAs don't help us see the
difference between programs very easily
• Neural nets??

16
Are DFAs reasonable?
• Similarly: an 8-bit counter is structurally very
different than a 9-bit counter
• More memory needed $\Rightarrow$ totally different $\delta$ program
needed
• Not very modular!

17
Are DFAs reasonable?
• Another issue is that DFAs prefer the beginning
of their inputs to the end of their inputs
• $L_5 = \{x \in \{0,1\}^* \mid$ the fifth digit from the left
of $x$ is 0$\}$
• $L_6 = \{x \in \{0,1\}^* \mid$ the fifth digit from the right
of $x$ is 0$\}$
• DFAs know where the input begins but not where it
ends

18
Is REG reasonable?
• We should be able to combine computations as
subroutines in simple ways
• logical OR ($A \cup B$)
• logical AND ($A \cap B$)
• concatenation ($A \cdot B$) and star ($A^*$)
• hard to prove!! motivation for NFA
• complement ($A^c$)
• reversal ($A^R$)
• All above are easy to do as logic circuits
• Will discuss further as closure under language
operations

19
Nondeterministic Finite Automata
• Will relax two of these DFA rules
• Each (state, char) input must produce exactly one
(state) output
• Must consume one character in order to advance
state
• Example: $L_6 = \Sigma^*\,\text{bob}\,\Sigma^*$
• See M6
• The NFA accepts the input if there exists any way
of reading the input that winds up in an
accepting state at the end of the string
• Otherwise it rejects the input

20
NFAs
• Thus the NFA rejects the input if there doesn't
exist any way of reading the input that winds up
in an accepting state at the end of the string
• In other words, every way of reading the input fails to wind up in an accepting state
• Example $M_7$
• $L_7 = $ ?

[State-transition diagram of $M_7$: states 1, 2, 3; arcs labeled a, b, c, $\varepsilon$, $\varepsilon$]
21
Ways to think of NFAs
• NFAs want to accept inputs and will always take
• Because they will accept if there exists any way
to get to an accepting state at the end of the
string
• The quickest way there may be just one of many
ways, but it doesn't matter
• http://www.chompchomp.com/frag05/frag05.01.a.htm

22
Ways to think of NFAs
[Diagram: three arcs, each labeled a]
• fork() model
• Input string is in a variable
• fork() at every nondeterministic choice point
• subprocess 1 (parent) follows first transition
• subprocess 2 (child) follows second
• subprocess 3 (child) follows third (if any), etc.
• A process that can't follow any transition calls
exit() -- and gives up its ability to accept
• A process that makes it through the whole string
and is in an accepting state prints out ACCEPT
• A single ACCEPT is enough

23
Syntax of DFA (repeat)
• A deterministic finite automaton (DFA) is a
5-tuple $(Q, \Sigma, \delta, q_0, F)$ such that
• $Q$ is a finite set of states
• $\Sigma$ is an alphabet
• $\delta : Q \times \Sigma \to Q$ is the transition
function
• $q_0 \in Q$ is the start state
• $F \subseteq Q$ is the set of accepting states
• Usually these names are used, but others are
possible as long as the role is clear

24
Syntax of NFA
• A nondeterministic finite automaton (NFA) is a
5-tuple $(Q, \Sigma, \delta, q_0, F)$ such that
• $Q$ is a finite set of states
• $\Sigma$ is an alphabet
• $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to \mathcal{P}(Q)$ is the transition function
• $q_0 \in Q$ is the start state
• $F \subseteq Q$ is the set of accepting states
• Usually these names are used, but others are
possible as long as the role is clear

25
Syntax of NFA
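• Definition: $\Sigma_\varepsilon = \Sigma \cup \{\varepsilon\}$
• We'll use this frequently enough
• Differences on state-transition diagram
• $\delta(1,a) = \{1\}$ (not $\delta(1,a) = 1$)
• $\delta(1,\varepsilon) = \{1, 2\}$
• $\delta(3,c) = \{2, 3\}$
• $\delta(2,a) = \emptyset$
• $\delta(3,\varepsilon) = \{3\}$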

[State-transition diagram of Example $M_8$: states 1, 2, 3; arcs labeled a, b, c, c, $\varepsilon$, $\varepsilon$]
26
NFA computation
• This next definition is different from but
equivalent to the one in the text
• Book's definition may be easier to understand at
first, but that makes its version of Theorem 1.39
(subset construction) harder
• Goal: a function $\hat\delta : Q \times \Sigma^* \to \mathcal{P}(Q)$ where $\hat\delta(q, x)$ is
the set of all states reachable in the machine
after starting in state $q$ and reading the entire
string $x$
• Then for an NFA $M$, we will define something like
$L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0, x)$ contains some
accepting state$\}$

27
NFA computation
• Let $M = (Q, \Sigma, \delta, q_0, F)$ be an NFA. We define some
auxiliary functions
• $E : Q \to \mathcal{P}(Q)$ by ("$\varepsilon$-closure")
• $E(q) = \{p \in Q \mid p$ is reachable from $q$ by
following a chain of 0 or more $\varepsilon$
transitions$\}$
• Although $E$ takes elements of $Q$ as input, we'll
also use it as a function that takes subsets of $Q$
as input (that is, elements of $\mathcal{P}(Q)$). So $E : \mathcal{P}(Q) \to \mathcal{P}(Q)$ by

$E(S) = \bigcup_{q \in S} E(q)$

In other words, given a set as input, just
process each element independently...
28
NFA computation
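• Thus $E(q)$ is the set of all states you can get to
from $q$ without reading any input
• In $M_8$, $E(3) = $ ? $E(\{2,1\}) = $ ?
• We define a simple extension of $\delta$ that takes a
set of states as input
• $\delta : Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ (this comes with the NFA)
• $\delta : \mathcal{P}(Q) \times \Sigma_\varepsilon \to \mathcal{P}(Q)$ defined by

$\delta(S, c) = \bigcup_{q \in S} \delta(q, c)$

Again, given a set as input, just process each
element independently...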
29
NFA computation
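• We have a function $E()$ that follows $\varepsilon$-transitions
and a set-valued extension of $\delta$ that behaves like $\delta$ but takes
sets as input
• $\hat\delta : Q \times \Sigma^* \to \mathcal{P}(Q)$ is defined inductively: for all $q \in Q$, $\hat\delta(q, \varepsilon) = E(\{q\})$
• If $w \in \Sigma^*$ and $c \in \Sigma$, let
• $\hat\delta(q, wc) = E(\delta(\hat\delta(q, w), c))$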
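
The three pieces of the NFA definition (the closure $E$, the extension of the transition function to sets, and the inductive extension to strings) can be sketched as follows. The NFA used here is a hypothetical one invented for the example (it is not $M_8$), the empty Python string stands in for the $\varepsilon$ label, and the names `eclose`, `step`, and `delta_hat` are assumptions.

```python
EPS = ""  # stands in for the epsilon label on arcs (an encoding choice for this sketch)

def eclose(delta, states):
    """E(S): all states reachable from S by following 0 or more epsilon arcs."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def step(delta, states, c):
    """The transition function extended to sets: process each element independently."""
    out = set()
    for q in states:
        out |= delta.get((q, c), set())
    return out

def delta_hat(delta, q, x):
    """Set of states reachable from q after reading the entire string x."""
    current = eclose(delta, {q})                         # base case: E({q})
    for c in x:
        current = eclose(delta, step(delta, current, c)) # inductive case
    return current

# Hypothetical NFA over {a, b}: accepts strings ending in "ab"; state 3 has an epsilon arc to 4.
NFA = {(1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}, (3, EPS): {4}}
F = {4}

def accepts(x):
    return bool(delta_hat(NFA, 1, x) & F)
```

Missing table entries are treated as the empty set, which matches the idea that a branch with no available transition simply dies.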

30
NFA computation
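• Finally, we define $L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0, x)$ contains some accepting state$\}$, that is, $L(M) = \{x \in \Sigma^* \mid \hat\delta(q_0, x) \cap F \ne \emptyset\}$
• $\hat\delta(1, ac) = E(\delta(\hat\delta(1, a), c))$
• $\hat\delta(1, a) = E(\delta(\hat\delta(1, \varepsilon), a))$
• $\hat\delta(1, \varepsilon) = $ ?
• $\hat\delta(1, ac) = $ ?
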
31
Question
• "How do I know when to follow $\varepsilon$ transitions and
when not to?"
• If you're talking about $\delta$, then don't--it's the
program itself. $\delta$ can express that "there is an
$\varepsilon$ transition here" but you never go any further
than that one hop.
• If you're talking about $\hat\delta$, then do--because it
includes $E()$ as part of its definition, which is
there precisely in order to follow $\varepsilon$ transitions

32
NFAs are good at union (or)
• $L_2 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a multiple
of 2$\}$
• $L_3 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a multiple
of 3$\}$
• Let $A = L_2 \cup L_3$
• NFA for $A$ using guess-and-verify strategy
• Preview of Theorem 1.45

33
The Subset Construction
• Theorem 1.39: For every NFA $M_1$ there exists a DFA
$M_2$ such that $L(M_1) = L(M_2)$
• Proof idea: Well, how does fork() work on a
uniprocessor machine?

34
The Subset Construction
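• Proof: Let $M_1 = (Q_1, \Sigma, \delta_1, init_1, F_1)$ be the NFA and
define the DFA $M_2 = (Q_2, \Sigma, \delta_2, init_2, F_2)$ as follows
• $Q_2 = \mathcal{P}(Q_1)$.
• Each state of the DFA records the set of states
that the NFA can simultaneously be in
• Can compare DFA states for equality but also look
"inside" the state name to find a set of NFA
state names
• Define $\delta_2 : Q_2 \times \Sigma \to Q_2$, that is, $\delta_2 : \mathcal{P}(Q_1) \times \Sigma \to \mathcal{P}(Q_1)$, by
• $\delta_2(S, a) = E_1(\delta_1(S, a))$: go to whatever states
are reachable from the states in $S$ and reading
the character $a$

Remember, in an NFA: $\delta_1 : Q_1 \times \Sigma_\varepsilon \to \mathcal{P}(Q_1)$ (from the definition); $\delta_1 : \mathcal{P}(Q_1) \times \Sigma_\varepsilon \to \mathcal{P}(Q_1)$ (extended to sets); $E_1 : \mathcal{P}(Q_1) \to \mathcal{P}(Q_1)$ ($\varepsilon$-closure)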
35
The Subset Construction
• $init_2 = E_1(init_1)$
• $F_2 = \{q \in Q_2 \mid q \cap F_1 \ne \emptyset\}$, in other words $F_2 = \{S \subseteq Q_1 \mid S \cap F_1 \ne \emptyset\}$
• The effect is that the DFA knows all states that
are reachable in the NFA after reading the string
so far. If any one of them is accepting, then
the current DFA state is accepting too, otherwise
it's not.
• If you believe this then that's all it takes to
see that the construction is correct. So,
convince yourself with an example. QED
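
One way to convince yourself is to run the construction mechanically. The sketch below is a minimal illustration under stated assumptions: it builds only the DFA states reachable from the start set (the construction as stated uses all of $\mathcal{P}(Q_1)$), it uses a hypothetical NFA rather than $M_8$, it encodes $\varepsilon$ as the empty Python string, and all function names are invented for the example.

```python
EPS = ""  # encoding of the epsilon label (an assumption of this sketch)

def eclose(delta, states):
    """Epsilon-closure of a set of NFA states."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def step(delta, states, c):
    """NFA transition function extended to sets of states."""
    out = set()
    for q in states:
        out |= delta.get((q, c), set())
    return out

def subset_construction(delta, alphabet, init, accepting):
    """Return (states, dfa_delta, start, dfa_accepting) for the reachable part of the DFA."""
    start = frozenset(eclose(delta, {init}))
    dfa_delta, todo, seen = {}, [start], {start}
    while todo:
        S = todo.pop()
        for a in alphabet:
            T = frozenset(eclose(delta, step(delta, S, a)))  # E(delta(S, a))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                todo.append(T)
    # A DFA state is accepting when it contains at least one accepting NFA state.
    dfa_accepting = {S for S in seen if S & accepting}
    return seen, dfa_delta, start, dfa_accepting

def dfa_accepts(dfa_delta, start, dfa_accepting, x):
    S = start
    for c in x:
        S = dfa_delta[(S, c)]
    return S in dfa_accepting

# Hypothetical NFA over {a, b} accepting strings that end in "ab".
NFA = {(1, "a"): {1, 2}, (1, "b"): {1}, (2, "b"): {3}, (3, EPS): {4}}
STATES, DFA_DELTA, START, DFA_ACCEPTING = subset_construction(NFA, "ab", 1, {4})
```

Each DFA state is a `frozenset`, so it can be compared for equality and also opened up to see which NFA states it records.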

36
Subset construction example
• $Q_2 = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$
• (On board)
• $init_2 = \{1,2,3\}$
• $F_2 = \{\{3\}, \{1,3\}, \{2,3\}, \{1,2,3\}\}$

[State-transition diagram of Example $M_8$ (think of this as $M_1$ in the construction): states 1, 2, 3; arcs labeled a, b, c, c, $\varepsilon$, $\varepsilon$]
37
Be methodical
• Need to compute $\delta_2(\{1,2,3\}, c) = E_1(\delta_1(\{1,2,3\}, c))$
• By definition, $\delta_1(\{1,2,3\}, c) = \delta_1(1,c) \cup \delta_1(2,c) \cup \delta_1(3,c)$

• $= \{2,3\}$
• Then take $E_1(\{2,3\}) = \{2,3\}$
• Save intermediate results for reuse
• It's OK to eliminate unreachable states in
practice, even though that's not what the
construction really does

38
Subset construction conclusion
• Adding nondeterminism makes programs shorter but
not able to do new things
• Remember regular languages are defined to be
those "recognized by a DFA"
• We now have a result that says that every
language that is recognized by an NFA is regular
too
• So if you are asked to show that a language is
regular, you can exhibit a DFA or NFA for it and
rely on the subset construction theorem
• Sometimes questions are specifically about DFAs
or NFAs, though... pay attention to the precise
wording

39
More NFA examples
• Write an NFA for $\{ab, abc\}$ with 3 states
• NFA and DFA for $\{\varepsilon\}$ over $\Sigma = \{0,1\}$
• Rule: $\varepsilon \in L(M) \iff$ ?
• NFA and DFA for $\emptyset$ over $\Sigma = \{0,1\}$

40
Closure properties
• The presence or absence of closure properties
says something about how well a set tolerates an
operation
• Definition. Let $S \subseteq U$ be a set in some universe
$U$ and $\odot$ be an operation on elements of $U$. We say
that $S$ is closed under $\odot$ if applying $\odot$ to
element(s) of $S$ produces another element of $S$.
• For example, if $\odot$ is a binary operation $\odot : U \times U \to U$,
then we're saying that $(\forall x \in S$ and $y \in S)\ x \odot y \in S$

41
Closure properties illustrated
[Diagram: set $S$ drawn inside universe $U$]
Applying the operation to elements of $S$ never
takes you outside of $S$. $S$ is closed with respect
to the operation. This example shows unary operations
42
Closure properties
• Having a closure property usually means there is
some type of "natural fit" between the operation
and the set
• Examples
• N is closed under and and but not - and
• Z is closed under and - and and unary -
(negation) but not or
• $\mathbb{Q} - \{0\}$ is closed under $\times$ and $\div$ but not $+$ or $-$

43
More examples
• $L_1 = \{x \in \{0,1\}^* \mid |x|$ is a multiple of 3$\}$
• is closed under string reversal and concatenation
• $L_3 = \{x \in \{0,1\}^* \mid$ the binary number $x$ is a multiple
of 3$\}$
• is also closed under string reversal and
concatenation, harder to see though
• $L_4 = \{x \in \{a,b\}^* \mid x$ contains an odd # of bs and an
even # of as$\}$
• is closed under string reversal
• is not closed under string concatenation

44
Closure higher abstraction
• We will usually be concerned with closure of
language classes under language operations
• Previous examples were closure of sets containing
non-set elements under various familiar
operations
• We consider DFAs and NFAs to be programs and we
want assurance that their outputs can be combined
in desired ways just by manipulating their
programs (like using one as a subroutine for the
other)
• Representative question is REG closed under
(language) concatenation?

45
The regular operations
• The regular operations on languages are
• $\cup$ (union)
• $\cdot$ (concatenation)
• $*$ (Kleene star)
• The name "regular operations" is not that
important
• Too bad we use the word "regular" for so much
• REG is closed under these regular operations
• That's why they're called "regular" operations
• This does not mean that each regular language is
closed under each of these operations!

46
The regular operations
• REG is closed under union: Theorem 1.25 (using
DFAs), Theorem 1.45 (using NFAs)
• REG is closed under concatenation: Theorem 1.47
(NFAs)
• REG is closed under $*$: Theorem 1.49 (NFAs)
• Study these constructions!!
• REG is also closed under complement and reversal
(not in book)

47
Regular expressions
• You are probably familiar with these
• Example: "int .*\(.*\)" is a (flex format)
regular expression that appears to match C
function prototypes that return ints
• In our treatment, a regular expression is a
program that generates a language of matching
strings when you "run it"
• We will use a very compact definition that
simplifies things later

48
Regular expressions
• Definition. Let $\Sigma$ be an alphabet not containing
any of the special characters in this list: $\varepsilon$ $\emptyset$
) ( $\cup$ $\cdot$ $*$. We define the syntax of the
(programming) language REX($\Sigma$), abbreviated as
REX, inductively
• Base cases
• For all $a \in \Sigma$, $a \in$ REX. In other words, each single
character from $\Sigma$ is a regular expression all by
itself.
• $\varepsilon \in$ REX. In other words, the literal symbol $\varepsilon$ is a
regular expression. In this context it is not
the empty string but rather the single-character
name for the empty string.
• $\emptyset \in$ REX. Similarly, the literal symbol $\emptyset$ is a
regular expression.

49
Regular expressions
• Definition continued
• Induction cases
• For all $r_1, r_2 \in$ REX, $( r_1 \cup r_2 ) \in$ REX
also
• For all $r_1, r_2 \in$ REX, $( r_1 \cdot r_2 ) \in$ REX also

literal symbols
variables
50
Regular expressions
• Definition continued
• Induction cases continued
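• For all $r \in$ REX, $( r^* ) \in$ REX also
• Examples over $\Sigma = \{0,1\}$
• $\varepsilon$ and 0 and 1 and $\emptyset$
• $(((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*)$
• $\varepsilon\emptyset$ is not a regular expression
• Remember, in the context of regular expressions,
$\varepsilon$ and $\emptyset$ are ordinary characters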

51
Semantics of regular expressions
• Definition. We define the meaning of the
language REX(?) inductively using the L()
operator so that L(r) denotes the language
generated by r as follows
• Base cases
• For all $a \in \Sigma$, $L(a) = \{a\}$. A single-character
regular expression generates the corresponding
single-character string.
• $L(\varepsilon) = \{\varepsilon\}$. The symbol for the empty string
actually generates the empty string.
• $L(\emptyset) = \emptyset$. The symbol for the empty language
actually generates the empty language.

52
Regular expressions
• Definition continued
• Induction cases
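• For all $r_1, r_2 \in$ REX, $L( (r_1 \cup r_2) ) = L(r_1) \cup L(r_2)$
• For all $r_1, r_2 \in$ REX, $L( (r_1 \cdot r_2) ) = L(r_1) \cdot L(r_2)$
• For all $r \in$ REX, $L( (r^*) ) = (L(r))^*$
• No other string is in REX($\Sigma$)
• Example
• $L( (((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*) )$ includes
• $\varepsilon$, 10, 1010, 101010, 10101010, ...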
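
The inductive semantics can be turned into a small evaluator that lists every string of $L(r)$ up to a length bound. This is a sketch under assumptions: regular expressions are represented as nested Python tuples (a representation invented for this example), and the sample expression is one reading of the slide's example, namely the star of (1 concatenated with 0) concatenated with (empty string union empty language).

```python
def lang(r, n):
    """Return all strings of length <= n in L(r), following the inductive definition."""
    kind = r[0]
    if kind == "sym":                       # L(a) = {a}
        return {r[1]} if n >= 1 else set()
    if kind == "eps":                       # L(epsilon) = {empty string}
        return {""}
    if kind == "empty":                     # L(empty-set symbol) = empty language
        return set()
    if kind == "union":
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":
        return {u + v for u in lang(r[1], n) for v in lang(r[2], n) if len(u + v) <= n}
    if kind == "star":
        base = lang(r[1], n) - {""}         # empty pieces add nothing new under star
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base if len(u + v) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

ONE, ZERO = ("sym", "1"), ("sym", "0")
EXAMPLE = ("star", ("cat", ("cat", ONE, ZERO), ("union", ("eps",), ("empty",))))
```

The length bound is what keeps the star case finite; without it the language of the example is infinite.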

53
Orientation
• We used highly flexible mathematical notation and
state-transition diagrams to specify DFAs and
NFAs
• Now we have a precise programming language REX
that generates languages
• REX is designed to close the simplest languages
under $\cup$, $\cdot$, $*$

54
Abbreviations
• Instead of parentheses, we use precedence to
indicate grouping when possible.
• $*$ (highest)
• $\cup$ (lowest)
• Instead of $\cdot$, we just write elements next to
each other
• Example: $(((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*)$ can be written as
$(10(\varepsilon \cup \emptyset))^*$ but there is no further abbreviation
• (Not in text) If $r \in$ REX($\Sigma$), instead of writing
$rr^*$, we write $r^+$

55
Abbreviations
• Instead of writing a union of all characters from
$\Sigma$ together to mean "any character", we just write
$\Sigma$
• In a flex/grep regular expression this would be
called "."
• Instead of writing L(r) when r is a regular
expression, we consider r alone to simultaneously
mean both the expression r and the language it
generates, relying on context to disambiguate

56
Abbreviations
• Caution regular expressions are strings
(programs). They are equal only when they
contain exactly the same sequence of characters.
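• $(((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*)$ can be abbreviated $(10(\varepsilon \cup \emptyset))^*$
• however $(((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*) \ne (10(\varepsilon \cup \emptyset))^*$ as strings
• but $(((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*) = (10(\varepsilon \cup \emptyset))^*$ when they are
considered to be the generated languages
• more accurately then, $L( (((1 \cdot 0) \cdot (\varepsilon \cup \emptyset))^*) ) = L( (10(\varepsilon \cup \emptyset))^* )$
• $= L( (10)^* )$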

57
Facts
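• REX($\Sigma$) is itself a language over an alphabet $\Gamma$
that is
• $\Gamma = \Sigma \cup \{\, )\,,\ (\,,\ \cup,\ \cdot,\ *,\ \varepsilon,\ \emptyset \,\}$
• For every $\Sigma$, $|$REX($\Sigma$)$| = \infty$
• $\emptyset$, $(\emptyset^*)$, $((\emptyset^*)^*)$, ...
• even without knowing $\Sigma$ there are infinitely many
elements in REX($\Sigma$)
• Question: Can we find a DFA or NFA $M$ with $L(M) = $
REX($\Sigma$)?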

58
Examples
• Find a regular expression for $\{w \in \{0,1\}^* \mid w \ne 10\}$
• Find a regular expression for $\{x \in \{0,1\}^* \mid$ the
6th digit counting from the rightmost
character of $x$ is 1$\}$
• Find a regular expression for $L_3 = \{x \in \{0,1\}^* \mid$ the
binary number $x$ is a multiple of 3$\}$

59
The DFA for L3
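[State-transition diagram of the DFA for $L_3$: states 0, 1, 2; arcs labeled 0 and 1; the loop through states 1 and 2 is annotated $(0 1^* 0)^*$]
Regular expression: $(0 \cup 1\ \_\_\_\_\_\_\ 1)^*$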
60
Regular expression for L3
• $(0 \cup 1 (0 1^* 0)^* 1)^*$
• $L_3$ is closed under concatenation, because of the
overall form $(\ )^*$
• Now suppose $x \in L_3$. Is $x^R \in L_3$?
• Yes: see this by reversing the regular
expression and observing that the same regular
expression results
• So $L_3$ is also closed under reversal
61
Regular expressions generate regular languages
• Lemma 1.55 For every regular expression r, L(r)
is a regular language.
• Proof by induction on regular expressions.
• We used induction to create all of the regular
expressions and then to define their languages,
so we can use induction to visit each one and

62
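$L(\text{REX}) \subseteq$ REG
• Base cases
• For every $a \in \Sigma$, $L(a) = \{a\}$ is obviously
regular
• $L(\varepsilon) = \{\varepsilon\} \in$ REG also
• $L(\emptyset) = \emptyset \in$ REG

[Diagram: arc labeled a]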
63
$L(\text{REX}) \subseteq$ REG
• Induction cases
• Suppose the induction hypothesis holds for $r_1$ and
$r_2$. Namely, $L(r_1) \in$ REG and $L(r_2) \in$ REG. We
want to show that $L( (r_1 \cup r_2) ) \in$ REG also. But
look: by definition, $L( (r_1 \cup r_2) ) = L(r_1) \cup L(r_2)$
• Since both of these languages are regular, we
can apply Theorem 1.45 (closure of REG under $\cup$)
to conclude that their union is regular.

64
$L(\text{REX}) \subseteq$ REG
• Induction cases
• Now suppose $L(r_1) \in$ REG and $L(r_2) \in$ REG. By
definition, $L( (r_1 \cdot r_2) ) = L(r_1) \cdot L(r_2)$
• By Theorem 1.47, this concatenation is regular
too.
• Finally, suppose $L(r) \in$ REG. Then by
definition, $L( (r^*) ) = (L(r))^*$
• By Theorem 1.49, this language is also regular.
QED

65
On to REG $\subseteq L(\text{REX})$
• Now we'll show that each regular language (one
accepted by an automaton) also can be described
by a regular expression
• Hence REG $= L(\text{REX})$
• In other words, regular expressions are
equivalent in power to finite automata
• This equivalence is called Kleene's Theorem (1.54
in book)

66
Converting DFAs to REX
• Lemma 1.60 in textbook
• This approach uses yet another form of finite
automaton called a GNFA (generalized NFA)
• The technique is easier to understand by working
an example than by studying the proof

67
Syntax of GNFA
• A generalized NFA is a 5-tuple $(Q, \Sigma, \delta, q_s, q_a)$ such
that
• $Q$ is a finite set of states
• $\Sigma$ is an alphabet
• $\delta : (Q - \{q_a\}) \times (Q - \{q_s\}) \to$ REX($\Sigma$) is the transition
function
• $q_s \in Q$ is the start state
• $q_a \in Q$ is the (one) accepting state

68
GNFA syntax summary
• Arcs are labeled with regular expressions
• Meaning is that "input matching the label moves
from old state to new state" -- just like NFA,
but not just a single character at a time
• Start state has no incoming transitions, accept
has no outgoing
• Every pair of states (except start and accept) has
two arcs between them
• Every state has a self-loop (except start and
accept)

69
Construction strategy
• Will convert a DFA into a GNFA then iteratively
shrink the GNFA until we end up with a diagram
like this, meaning that exactly that input
that matches the giant regular expression is in
the language

[Diagram: $q_s \to q_a$ with a single arc labeled "giant regular expression"]
70
Converting DFA to GNFA
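[Diagram: the DFA for $L_3$ (states 0, 1, 2; arcs labeled 0 and 1) and the GNFA obtained from it, with new start state $q_s$, new accepting state $q_a$, and added arcs labeled $\varepsilon$]
Adding new start state $q_s$ is straightforward. Then
make each DFA accepting state have an $\varepsilon$
transition to the single accepting state $q_a$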
71
Interpreting arcs
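• $\delta : (Q - \{q_a\}) \times (Q - \{q_s\}) \to$ REX($\Sigma$). In this diagram,
• $\delta(0,1) = 1$, $\delta(2,0) = \emptyset$, $\delta(2,q_a) = \emptyset$
• $\delta(1,1) = \emptyset$, $\delta(2,2) = 1$, $\delta(0,q_a) = \varepsilon$

[Diagram: the GNFA for $L_3$ with states $q_s$, 0, 1, 2, $q_a$; arcs labeled 0, 1, and $\varepsilon$]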
72
Eliminating a GNFA state
• We arbitrarily choose an interior state (not qs
or qa) to rip out of the machine

Question: how is the ability of state i to get to
state j affected when we remove rip? Only the
solid and labeled states and transitions are
relevant to that question
[Diagram: arc i → j labeled R4; arc i → rip labeled R1; self-loop on rip labeled R2; arc rip → j labeled R3]
73
Eliminating a GNFA state
• We produce a new GNFA that omits rip
• Its i-to-j label will compensate for the missing
state
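• We will do this for every $(i,j) \in (Q - \{q_a\}) \times (Q - \{q_s\})$
• So we have to rewrite every label in order to
eliminate this one state
• New label for i-to-j is
• $R_4 \cup (R_1 (R_2)^* R_3)$

[Diagram: arc i → j labeled R4; arc i → rip labeled R1; self-loop on rip labeled R2; arc rip → j labeled R3]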
74
Don't overlook
• The case $(i,i) \in (Q - \{q_a\}) \times (Q - \{q_s\})$
• New label for i-to-i is still
• $R_4 \cup (R_1 (R_2)^* R_3)$
• Example proceeds on whiteboard, or see textbook
for a different one
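
For readers without the whiteboard, here is a minimal sketch of the elimination step applied to the GNFA for the multiples-of-3 DFA. Everything about the encoding is an assumption made for illustration: labels are strings in Python `re` syntax, a missing dictionary entry plays the role of the empty-language label, the empty Python string plays the role of the empty-string label, and the state names and helper functions are invented here.

```python
import re
from itertools import product

def union(r, s):
    """Label for 'r or s'; None stands for the empty-language label."""
    if r is None:
        return s
    if s is None:
        return r
    return f"(?:{r}|{s})"

def concat(*parts):
    """Concatenation; anything concatenated with the empty language is empty."""
    if any(p is None for p in parts):
        return None
    return "".join(parts)

def star(r):
    """Star of a label; the star of the empty language or empty string is the empty string."""
    return "" if r is None or r == "" else f"(?:{r})*"

def rip_state(labels, states, rip):
    """Remove `rip`, giving each i-to-j arc the label R4 union R1 (R2)* R3."""
    rest = [q for q in states if q != rip]
    r2 = labels.get((rip, rip))
    new = {}
    for i in rest:
        for j in rest:
            r1, r3, r4 = labels.get((i, rip)), labels.get((rip, j)), labels.get((i, j))
            label = union(r4, concat(r1, star(r2), r3))
            if label is not None:
                new[(i, j)] = label
    return new, rest

# GNFA built from the 3-state DFA for multiples of 3 (state = remainder mod 3).
states = ["s", "0", "1", "2", "a"]
labels = {("s", "0"): "", ("0", "a"): "",
          ("0", "0"): "0", ("0", "1"): "1",
          ("1", "0"): "1", ("1", "2"): "0",
          ("2", "1"): "0", ("2", "2"): "1"}
for rip in ["2", "1", "0"]:
    labels, states = rip_state(labels, states, rip)
GIANT = labels[("s", "a")]  # the single remaining arc from start to accept

def agrees_with_arithmetic(max_len):
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            expected = x == "" or int(x, 2) % 3 == 0
            if (re.fullmatch(GIANT, x) is not None) != expected:
                return False
    return True
```

The order in which states are ripped out changes what the final expression looks like but not the language it generates.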

[Diagram: self-loop on i labeled R4; arc i → rip labeled R1; self-loop on rip labeled R2; arc rip → i labeled R3]
75
g/re/p
• What does grep do?
• (int|float)_rec.*emp becomes
• $(\Sigma^*)$(int $\cup$ float)_rec$(\Sigma^*)$emp$(\Sigma^*)$
• What does it mean?
• How does it work?
• Regular expression $\to$ NFA $\to$ DFA $\to$ state reduction
• Then run DFA against each line of input, printing
out the lines that it accepts

76
State machines
• Very common programming technique
• while (true)
• switch (state)
• case NEW_CONNECTION:
• break;
• if (process_cmd() == CMD_QUIT)
• state = SHUTDOWN;
• break;
• case SHUTDOWN:

77
This course so far
• 1.1 Introduction to languages DFAs
• 1.2 NFAs and DFAs recognize the same class of
languages
• 1.3 REX generates the same class of languages
• Three different programming "languages" specified
in different levels of formality that solve the
same types of computational problems
• Four, if you count GNFAs
• Five, if you count UFAs

78
Strategies
• If you're investigating a property of regular
languages, then as soon as you know L 2 REG, you
know there are DFAs, NFAs, Regexes that describe
it. Use whatever representation is convenient
• But sometimes you're investigating the properties
of the programs themselves: changing states,
adding a * to a regex, etc. Then the knowledge
that other representations exist might be
relevant and might not

79
All finite languages are regular
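• Theorem (not in book): FIN $\subseteq$ REG
• Proof: Suppose $L \in$ FIN.
• Then either $L = \emptyset$, or $L = \{s_1, s_2, \ldots, s_n\}$ where
$n \in \mathbb{N}$ and each $s_i \in \Sigma^*$.
• A regular expression describing $L$ is, therefore,
either $\emptyset$ or
• $s_1 \cup s_2 \cup \cdots \cup s_n$ QED
• Note that this proof does not work for $n = \infty$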

80
Picture so far
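[Venn diagram: FIN inside REG inside ALL; each point is a language. The region of ALL outside REG is marked "is there a language out here?"]
REG $= L(\text{DFA}) = L(\text{NFA}) = L(\text{REX}) = L(\text{UFA}) = L(\text{GNFA}) \supsetneq$ FIN
$L(\text{DFA})$: "the class of languages generated by DFAs"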
81
1.4 Nonregular languages
• For each possible language L,
• $\emptyset \subseteq L$. So $\emptyset$ is the smallest language. And $\emptyset$ is
regular
• $L \subseteq \Sigma^*$. So $\Sigma^*$ is the largest language. And $\Sigma^*$ is
regular
• Yet there are languages in between these two
extremes that are not regular

82
A nonregular language
• $B = \{0^n 1^n \mid n \ge 0\}$
• $= \{\varepsilon, 01, 0011, 000111, \ldots\}$
• is not regular
• Why?
• Q how many bits of memory would a DFA need in
order to recognize B?
• A there appears to be no single number of bits
that's big enough to work for every element of B
• Remember, the DFA needs to reject all strings
that are not in B

83
Other examples
• $C = \{w \in \{0,1\}^* \mid n_0(w) = n_1(w)\}$
• Needs to count a potentially unbounded number of
'0's... so nonregular
• $D = \{w \in \{0,1\}^* \mid n_{01}(w) = n_{10}(w)\}$
• Needs to count a potentially unbounded number of
'01' substrings... so ??
• Need a technique for establishing nonregularity
that is more formal and... less intuitive?

84
Proving nonregularity
• To prove that a language is
nonregular, you have to show that no DFA
whatsoever recognizes the language
• Not just the DFA that is your best effort at
recognizing the language
• The pumping lemma can be used to do that
• The pumping lemma says that every regular
language satisfies the "regular pumping property"
(RPP)
• Given this, if we can show that a language like B
doesn't satisfy the RPP, then it's not regular

85
Pumping lemma, informally
• Roughly "if a regular language contains any
'long' strings, then it contains infinitely many
strings"
• Suppose some DFA $M = (Q, \Sigma, \delta, q_0, F)$ for it has $|Q| = 10$ states.
• What if $M$ accepts some particular string $s$ where
$s = c_1 c_2 \cdots c_{15}$ so that $|s| = 15$?

q0
86
Pigeonhole principle
• With 15 input characters, the machine will visit
at most 16 states
• But there are only 10 states in this machine
• So clearly it will visit at least one of its
states more than once
• Let rpt be our name for the first state that is
visited multiple times on that particular input s
• Let acc be our name for the accepting state that
$s$ leads to, namely, $\hat\delta(q_0, s) = acc$
• Let $y$ be our name for the leftmost substring of $s$
for which $\hat\delta(rpt, y) = rpt$
• Since there are no $\varepsilon$ transitions in a DFA, a
state being "visited multiple times" means that
it read at least one character. Therefore, $|y| > 0$

87
[Diagram: sequence of states that $M$ visits after reading each character of $s$, with a segment of length $> 0$ and a prefix of length $\le 10$ marked]
After reading $c_1 \cdots c_{10}$ (first 10 chars of $s$), $M$
must have already been to state rpt and returned
to it at least once... because there are only 10
states in $M$. Of course the repetition could have
been encountered earlier than 10 characters too...
88
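[Diagram: sequence of states that $M$ visits after reading each character of $s$, with a segment of length $> 0$ and a prefix of length $\le 10$ marked]
Assigning new names to the pieces of $s$...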
89
[Diagram: sequence of states that $M$ visits after reading each character of $s$, with the pieces x, y, z marked]
Assigning new names to the pieces of $s$... So $s = xyz$
as shown above. With these names, the other
constraints can be written $|y| > 0$ and $|xy| \le 10$
90
M accepts other strings too
• Consider the string xz

91
M accepts other strings too
• Consider the string xz
• $\hat\delta(q_0, x) = rpt$
• $\hat\delta(rpt, z) = acc$ (from previous slide)
• So $xz \in L(M)$ too

92
M accepts other strings too
• Consider the string xyyz
• $\hat\delta(q_0, xy) = rpt$ (from 2 slides ago)
• $\hat\delta(rpt, y) = rpt$ (from same previous result)
• $\hat\delta(rpt, z) = acc$ (from same previous result)
• So $xyyz \in L(M)$ also
• Apparently we can repeat y as many times as we
want

93
p-regular-pumpable strings
• Definition (not in textbook): A string $s$ is said
to be p-regular-pumpable in a language $L \subseteq \Sigma^*$ if
there exist $x, y, z \in \Sigma^*$ such that
• $s = xyz$ ("x,y,z are a decomposition of s")
• $|y| > 0$
• $|xy| \le p$
• For all $i \ge 0$,
• $x y^i z \in L$ ("the y part of s can be pumped
to produce other strings in the language")
• It follows that $s$ must be a member of $L$ for it to
be p-pumpable
• The 15-character string $s$ in the previous example
was 10-pumpable in $L(M)$
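
The definition can be explored with a brute-force search over decompositions. The sketch below is an illustration under assumptions: the function name is invented, the language is supplied as a membership predicate, and because "for all i" cannot be checked exhaustively, the search only tests i from 0 up to a small bound. A decomposition that survives the bounded test is evidence of pumpability, not a proof; a string with no surviving decomposition is definitely not p-regular-pumpable.

```python
import re

def pumpable_decompositions(s, p, in_language, max_i=5):
    """Yield (x, y, z) with s = xyz, |y| > 0, |xy| <= p, and x y^i z in L for i = 0..max_i."""
    for end_xy in range(1, min(p, len(s)) + 1):      # |xy| <= p
        for start_y in range(end_xy):                # |y| = end_xy - start_y > 0
            x, y, z = s[:start_y], s[start_y:end_xy], s[end_xy:]
            if all(in_language(x + y * i + z) for i in range(max_i + 1)):
                yield x, y, z

def in_ten_star(w):
    """Membership in the language of (10)*, used here as a sample regular language."""
    return re.fullmatch(r"(10)*", w) is not None

FOUND = list(pumpable_decompositions("1010", 2, in_ten_star))
```

For the string 1010 with p = 2, the only decomposition that survives is x empty, y = 10, z = 10: pumping y produces 10, 1010, 101010, and so on.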

94
p-regular-pumpable languages
• Definition: A language $L$ is p-regular-pumpable if
• for every $s \in L$ such that $|s| \ge p$, the string $s$
is p-pumpable in $L$
• in other words, "every long enough string in L is
pumpable"
• Our previous example language was
10-regular-pumpable

95
RPP(p) and RPP
• Definition: RPP(p) is the class of languages that
are p-regular-pumpable. In other words, RPP(p) $= \{L \subseteq \Sigma^* \mid L$ is p-regular-pumpable$\}$
• Definition: RPP is the class of languages that are
p-regular-pumpable for some p. In other
words, RPP $= \bigcup_{p \ge 0}$ RPP(p)
• Lots of notation and apparent complexity, but the
idea is simple: RPP is the class of languages in
which every long string is pumpable

96
Pumping lemma
• Theorem 1.70 (rephrased): If $L \subseteq \Sigma^*$ is recognized
by a p-state DFA, then $L \in$ RPP(p)
• Proof: Just like our example, but use p instead of
the constant 10 (number of states)
• Corollaries
• REG $\subseteq$ RPP

Primary application of Pumping Lemma
97
Proving a language nonregular
• First unravel these definitions, but it amounts
to proving that L is not a member of RPP. Then
it follows that L isn't regular
• Proving that L isn't in RPP allows you to
concentrate on the language rather than
considering all possible proposed programs that
might recognize it

98
Unraveling RPP a direct rephrasing
• Rephrasing: $L$ is a member of RPP if
• There exists $p \ge 0$ such that
• For every $s \in L$ satisfying $|s| \ge p$,
• There exist $x, y, z \in \Sigma^*$ such that
• $s = xyz$
• $|y| > 0$
• $|xy| \le p$
• For all $i \ge 0$,
• $x y^i z \in L$

$(\exists p)\ (\forall s)\ (\exists x,y,z)\ (\forall i)$ !!! Pretty complicated
99
Question from last time
• (Question) Didn't you earlier say "regular
languages are closed under concatenation"?
• (Answer) No, I wrote that REG is closed under
concatenation
• Subtle but important distinction. REG (the class
of all regular languages) is closed under
language concatenation
• If $A, B \in$ REG then $A \cdot B \in$ REG
• That does not mean that each regular language is
itself closed under string concatenation
• $\{10, 1\} \in$ REG but $101 \notin \{10, 1\}$

100
• Claim: Let $B = \{0^n 1^n \mid n \ge 0\}$. Then $B$ is not
regular
• Proof: We show that $B$ is not a member of RPP by contradiction
• So assume that $B \in$ RPP (and hope to reach a
contradiction soon). Then there exists $p \ge 0$
associated with the definition in RPP.
• We let $s = 0^p 1^p$. (Not the exact same variable
as in the RPP property, but an example of one
such possible setting of it.) Now we know that $s \in B$
because it has the right form.

101
Proof continued
• Now $|s| = 2p \ge p$. By assumption that $B \in$ RPP,
there exist $x, y, z$ such that
• (1) $s = xyz$ ($= 0^p 1^p$, remember)
• (2) $|y| > 0$
• (3) $|xy| \le p$
• (4) For all $i \ge 0$,
• $x y^i z \in B$
• Part (3) implies that $xy \in 0^*$ because the first
p-many characters of $s = xyz$ are all 0
• So $y$ consists solely of '0' characters
• ... at least one of them, according to (2)

102
Proof continued
• But consider
• $s = xyz = xy^1z = 0^p 1^p$ (where we started)
• $y$ consists of one or more '0' characters
• so $xy^2z$ contains more '0' characters than '1'
characters. In other words,
• $xy^2z = 0^{p+|y|} 1^p$
• so $xy^2z \notin B = \{0^n 1^n \mid n \ge 0\}$.
• Since the contradiction followed merely from the
assumption that $B \in$ RPP (and right and meet and
true reasoning about which we have no doubt),
that assumption must be wrong QED
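
The heart of the proof, that every allowed decomposition of the chosen string breaks when y is doubled, can be checked mechanically for small p. This is a sketch with invented helper names, and it only samples finitely many values of p, so it illustrates the argument rather than replacing it.

```python
def in_B(w):
    """Membership in B: some number of 0s followed by the same number of 1s."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def decompositions(s, p):
    """Yield every (x, y, z) with s = xyz, |y| > 0, and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def pumping_up_always_breaks(p):
    """True when, for s = 0^p 1^p, every allowed decomposition has x y^2 z outside B."""
    s = "0" * p + "1" * p
    return all(not in_B(x + y * 2 + z) for x, y, z in decompositions(s, p))

RESULTS = [pumping_up_always_breaks(p) for p in range(1, 9)]
```

The constraint that xy has length at most p is what forces y to lie entirely inside the block of 0s, which is why doubling y always unbalances the string.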

103
Observations
• We needed (and got) a contradiction that was a
necessary consequence of the assumption that B 2
RPP and then relied on the Theorem 1.70
corollaries
• RPP mainly concerns strings that are longer than
p
• So you should concentrate on strings longer than
p...
• even though p is a variable. But clearly
|0^p 1^p| > p
• In our example we didn't "do" much after our
initial choice of s and thinking about the
implications: we found a contradiction right away
• Many other choices of s would work, but many
don't, and even some that do work require more
complex arguments; for example,
s = 0^(⌊p/2⌋+1) 1^(⌊p/2⌋+1)
• Choosing s wisely is usually the most important
thing

104
Picture so far
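[Figure: Venn diagram in which each point is a language, with nested regions FIN ⊆ REG ⊆ RPP ⊆ ALL. Example labels: { 0101, ε } in FIN; 0*(101)* in REG; "We'll see an example later" for RPP outside REG; B = { 0^n 1^n | n ≥ 0 } in ALL outside RPP.]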
105
• Consider this shortcut attempt to prove that
B = { 0^n 1^n | n ≥ 0 } is not regular
• Proof: Suppose B ∈ RPP. By RPP,
• There exists p ≥ 0 such that
• For every s ∈ B satisfying |s| ≥ p,
• There exist x, y, z ∈ Σ* such that
• (1) s = xyz
• (2) |y| > 0
• (3) |xy| ≤ p
• (4) For all i ≥ 0,
• x y^i z ∈ B
• So let s = (1010)^p. Then s ∉ B, which is
inconsistent with the RPP statement.

NO
106
Simplifying RPP proofs
• We can skip the contradiction proofs and instead
prove directly that a language is not in RPP
• So we need a direct, formal version of the
statement that L ∉ RPP

107
Unraveling RPP (repeat)
• Rephrasing: L is a member of RPP if
• There exists p ≥ 0 such that
• For every s ∈ L satisfying |s| ≥ p,
• There exist x, y, z ∈ Σ* such that
• (1) s = xyz
• (2) |y| > 0
• (3) |xy| ≤ p
• (4) For all i ≥ 0,
• x y^i z ∈ L

(∃ p) (∀ s) (∃ x,y,z) (∀ i) !!! Pretty complicated
108
Unraveling non-RPP
• Rephrasing: L is not in RPP if
• For every p ≥ 0
• There exists some s ∈ L satisfying |s| ≥ p such
that
• For every x, y, z ∈ Σ* satisfying 1-3
• (1) s = xyz,
• (2) |y| > 0, and
• (3) |xy| ≤ p
• There exists some i ≥ 0 for which
• x y^i z ∉ L

(∀ p) (∃ s) (∀ x,y,z) (∃ i) Still complicated,
but you don't have to use contradiction now
109
A direct proof of nonregularity
• Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }
('a' is just some character). Then D is not regular.
• Proof idea: The pumping lemma says there's a
fixed-size loop in any DFA that accepts long
strings. You can repeat the characters in that
loop as many times as you want to get longer
strings that the machine accepts. Each time you
add a repetition you grow the pumped string by a
constant length.
• But the spacing between strings in D above keeps
changing; it's never constant. So D doesn't have
the pumping property.
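The changing spacing is easy to see numerically. A tiny sketch (the variable names `lengths` and `gaps` are illustrative) listing the lengths of the first few strings in D and the gaps between consecutive ones:

```python
# Lengths of the strings a^(n^2) in D for n = 0..6.
lengths = [n * n for n in range(7)]

# Gap between consecutive lengths: (n+1)^2 - n^2 = 2n + 1, which grows with n.
gaps = [b - a for a, b in zip(lengths, lengths[1:])]

print(lengths)  # [0, 1, 4, 9, 16, 25, 36]
print(gaps)     # [1, 3, 5, 7, 9, 11]
```

Pumping adds the same amount |y| with every repetition, so the pumped lengths form an arithmetic progression, and an arithmetic progression cannot stay inside a set whose gaps grow without bound.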

110
A direct proof of nonregularity
• Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }.
Then D is not in RPP and thus not regular.
• Proof: Let p ≥ 0 and set s = a^((p+1)^2). Then s ∈ D
and |s| > p (so such an s certainly exists).
• Now let x, y, z ∈ Σ* be any strings satisfying
• xyz = s = a^((p+1)^2)
• |y| > 0, and
• |xy| ≤ p
• Our goal is to produce some i such that x y^i z ∉ D

111
Direct proof continued
• (We'll actually show that x y^0 z ∉ D)
• Observe that y = a^j for some 1 ≤ j ≤ p, so
• |x y^0 z| = |a^((p+1)^2 − j)| < (p+1)^2
• Since j ≤ p we know that −j ≥ −p and thus
• |x y^0 z| = (p+1)^2 − j
• ≥ (p+1)^2 − p
• = p^2 + p + 1
• > p^2
• In other words, x y^0 z has > p^2 characters and <
(p+1)^2 characters. So |x y^0 z| is not a perfect
square and thus x y^0 z ∉ D. QED
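The inequality chain can be spot-checked numerically. The sketch below assumes the same setup as the proof (s has length (p+1)^2 and the removed block y has length j with 1 ≤ j ≤ p); the helper names `is_square` and `pumping_down_escapes_D` are illustrative.

```python
from math import isqrt


def is_square(k):
    """True if the non-negative integer k is a perfect square."""
    return isqrt(k) ** 2 == k


def pumping_down_escapes_D(p):
    """For |s| = (p+1)^2, check every possible |y| = j with 1 <= j <= p."""
    for j in range(1, p + 1):
        length = (p + 1) ** 2 - j  # this is |x y^0 z|
        # Strictly between two consecutive squares ...
        if not (p * p < length < (p + 1) ** 2):
            return False
        # ... and therefore not a square itself.
        if is_square(length):
            return False
    return True


print(all(pumping_down_escapes_D(p) for p in range(200)))
```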

112
• Both work fine... it's your choice
• But you must clearly state what you are doing
• If proof by contradiction, say so
• If direct proof, say so

113
Game theory formulation
• The direct proof technique can be formulated as a
two-player game
• You are the player who wants to establish that L
is not pumpable
• Your opponent wants to make it difficult for you
to succeed
• Both of you have to play by the rules

114
Game theory continued
• The game has just four steps.
• Step 1: Your opponent picks p ≥ 0
• Step 2: You pick s ∈ L such that |s| ≥ p
• Step 3: Your opponent chooses x, y, z ∈ Σ* such that
s = xyz, |y| > 0, and |xy| ≤ p
• Step 4: You produce some i ≥ 0 such that x y^i z ∉ L

115
Game theory continued
• If you are able to succeed through step 4, then
you have won only one round of the game
• Like winning one round of Tic-tac-toe
• Do example for a member of D
• To show that a language is not in RPP you must
show that you can always win, regardless of your
opponent's legal moves
• Realize that the opponent is free to choose the
most inconvenient or difficult p and x,y,z
imaginable that are consistent with the rules

116
Game theory continued
• So you have to present a strategy for always
winning and convincingly argue that it will
always win
• So your choices in steps 2 & 4 have to depend on
the opponent's choices in steps 1 & 3
• And you don't know what the opponent will choose
• So your choices need to be framed in terms of the
variables p, x, y, z

117
Game theory continued
• Ultimately it is not very different from the
direct proof
• But it states clearly what choices you may make
and what you may not: a common cause of errors
in proofs
• Repeat previous proof in this framework

118
A direct proof of nonregularity
• Let D = { a^(n^2) | n ≥ 0 } = { ε, a^1, a^4, a^9, … }.
Then D is not in RPP and thus not regular.
• Proof: Let p ≥ 0 [Step 1, opponent's choice] and set
s = a^((p+1)^2). Then s ∈ D and |s| > p (so such an s
certainly exists). [Step 2, your choice and reasoning]
• Now let x, y, z ∈ Σ* be any strings satisfying
[Step 3, opponent's choice]
• xyz = s = a^((p+1)^2)
• |y| > 0, and
• |xy| ≤ p
• Our goal is to produce some i such that x y^i z ∉ D
119
Direct proof continued
• (We'll actually show that x y^0 z ∉ D)
• Observe that y = a^j for some 1 ≤ j ≤ p, so
• |x y^0 z| = |a^((p+1)^2 − j)| < (p+1)^2
• Since j ≤ p we know that −j ≥ −p and thus
• |x y^0 z| = (p+1)^2 − j
• ≥ (p+1)^2 − p
• = p^2 + p + 1
• > p^2
• In other words, x y^0 z has > p^2 characters and <
(p+1)^2 characters. So |x y^0 z| is not a perfect
square and thus x y^0 z ∉ D. QED
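The game framing translates naturally into code: your two moves are functions of the opponent's choices, and the opponent is simulated by trying every legal move. The sketch below plays the game for D = { a^(n^2) | n ≥ 0 } with the strategy from the proof. All function names are illustrative, and checking finitely many p is evidence for the strategy, not a proof of it.

```python
from math import isqrt


def in_D(w):
    """Membership in D: only 'a' characters and a perfect-square length."""
    return set(w) <= {"a"} and isqrt(len(w)) ** 2 == len(w)


def pick_s(p):
    """Step 2 (your move): depends only on the opponent's p."""
    return "a" * (p + 1) ** 2


def pick_i(p, x, y, z):
    """Step 4 (your move): pumping down works for every legal split."""
    return 0


def opponent_moves(s, p):
    """Step 3: every legal (x, y, z) with s = xyz, |y| > 0, |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]


def you_win_for(p):
    """Given the step 1 value p, play against every possible opponent reply."""
    s = pick_s(p)
    assert in_D(s) and len(s) >= p  # your step 2 move must itself be legal
    return all(not in_D(x + y * pick_i(p, x, y, z) + z)
               for x, y, z in opponent_moves(s, p))


print(all(you_win_for(p) for p in range(12)))
```

Note that `pick_s` and `pick_i` never look at anything except the opponent's earlier choices, which is exactly the discipline the game formulation is meant to enforce.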

120
Unraveling RPP (repeat)
• Rephrasing: L is a member of RPP if
• There exists p ≥ 0 such that
• For every s ∈ L satisfying |s| ≥ p,
• There exist x, y, z ∈ Σ* such that
• (1) s = xyz
• (2) |y| > 0
• (3) |xy| ≤ p
• (4) For all i ≥ 0,
• x y^i z ∈ L
• Theorem: REG ⊆ RPP

121
• (1) If L ∈ RPP(p) (meaning "strings in L with length
≥ p are pumpable") and q > p then L ∈ RPP(q)
• (2) If L ∉ RPP(q) and q > p then L ∉ RPP(p)
(contrapositive of 1)
• Thus if you have a proof that establishes L ∉
RPP(q) only when q ≥ 5, that's good enough: it
follows that L is not regular
• Relevant for the "C is not regular" problem

122
• If L ∈ FIN and the longest string in L has length
n, then
• L ∈ RPP(n+1)
• L ∉ RPP(q) for all q < n+1
• Note: RPP is a class of languages that's only
interesting because of its relation to REG. It
is not a reasonable proposal for a computation
model!
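For a finite language the pumping property with a given constant can actually be decided by brute force, which makes the two claims about RPP(n+1) and RPP(q) checkable. The sketch below relies on one observation: if the longest string has length n, then x y^(n+1) z has length at least n+1 and so is never in L, so testing i = 0..n+1 is enough. The names `splits` and `in_rpp` are illustrative.

```python
def splits(s, p):
    """Every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            yield s[:x_len], s[x_len:xy_len], s[xy_len:]


def in_rpp(L, p):
    """Decide whether the finite language L (a set of strings) is in RPP(p)."""
    n = max((len(w) for w in L), default=-1)
    for s in L:
        if len(s) < p:
            continue  # the property only constrains strings of length >= p
        # s needs at least one split whose pumped strings all stay in L.
        pumpable = any(
            all(x + y * i + z in L for i in range(n + 2))
            for x, y, z in splits(s, p)
        )
        if not pumpable:
            return False
    return True


L = {"", "01", "0110"}           # longest string has length n = 4
print(in_rpp(L, 5))              # True: no string of length >= 5, so vacuous
print([in_rpp(L, q) for q in range(5)])  # [False, False, False, False, False]
```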

123
Unraveling non-RPP (repeat)
• L is not in RPP if
• For every p ≥ 0 (opponent's choice)
• There exists some s ∈ L satisfying |s| ≥ p such
that (your choice)
• For every x, y, z ∈ Σ* satisfying 1-3 (opponent's)
• (1) s = xyz,
• (2) |y| > 0, and
• (3) |xy| ≤ p
• There exists some i ≥ 0 for which (yours)
• x y^i z ∉ L
124
Another example
• Let C = { 0^m 1^n | m ≠ n }. Is C regular? Try to
prove it isn't
• Set s = 0^p 1^(2p). If opponent chooses x = ε, y = 0^p,
z = 1^(2p), then we can set i = 2 and win because
x y^2 z = 0^(2p) 1^(2p) ∉ C.
• What if opponent chooses a shorter y?
• Looks like it's relatively easy to be a member of
C and hard to not be a member of C
• Can force opponent to choose y ∈ 0*
• So try to arrange it so that no matter what y
is, some number of repetitions of it will match
the target number of '1's

125
Direct proof?
• Hmmm
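One well-known way to make a direct argument go through (a standard textbook trick, not necessarily the approach this lecture has in mind) is to choose s = 0^p 1^(p + p!). Every legal y is 0^k with 1 ≤ k ≤ p, and k divides p!, so pumping with i = 1 + p!/k produces exactly p + p! zeros, matching the number of 1s. The sketch below checks this strategy against every opponent move for small p; all names are illustrative.

```python
from math import factorial


def in_C(w):
    """Membership in C = { 0^m 1^n | m != n }."""
    m = len(w) - len(w.lstrip("0"))  # number of leading 0s
    n = len(w) - m
    return w == "0" * m + "1" * n and m != n


def pick_s(p):
    """Your step 2 move: p zeros followed by p + p! ones."""
    return "0" * p + "1" * (p + factorial(p))


def pick_i(p, y):
    """Your step 4 move: |y| divides p!, so this is an exact integer."""
    return 1 + factorial(p) // len(y)


def strategy_wins(p):
    """Try every legal split of pick_s(p) and check the pumped string leaves C."""
    s = pick_s(p)
    assert in_C(s) and len(s) >= p
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if in_C(x + y * pick_i(p, y) + z):
                return False
    return True


print(all(strategy_wins(p) for p in range(6)))
```

The strings grow like p!, so this is only practical for very small p, but it illustrates the idea of making some repetition count of y hit the target number of '1's no matter what |y| is.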

126
Using closure properties
• Can simplify argument a great deal
• Fact: If L is not regular then L^c is not regular
either.
• Proof: If L is not regular but L^c were regular,
then (L^c)^c would also be regular because REG is
closed under complement. But (L^c)^c = L. QED
• Recall the languages B = { 0^m 1^n | m = n } and
C = { 0^m 1^n | m ≠ n }. C is similar to B^c...

127
Using closure properties
• Start over
• B = { 0^m 1^n | m = n } (known nonreg),
C = { 0^m 1^n | m ≠ n } (suspected nonreg)
• Certainly B ⊆ C^c
• If m = n then it's true that (not m ≠ n)
• But B ≠ C^c
• Find example x ∈ C^c − B...
• On the other hand, B = 0*1* ∩ C^c
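Both observations (B is strictly smaller than the complement of C, yet B equals 0*1* intersected with that complement) can be checked exhaustively on short strings. This is a bounded sanity check over all binary strings up to a chosen length N, not a proof; the helper names are illustrative.

```python
import re
from itertools import product


def in_B(w):
    """B = { 0^m 1^n | m = n }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def in_C(w):
    """C = { 0^m 1^n | m != n }."""
    m = len(w) - len(w.lstrip("0"))
    n = len(w) - m
    return w == "0" * m + "1" * n and m != n


def in_0s1s(w):
    """Membership in the regular language 0*1*."""
    return re.fullmatch(r"0*1*", w) is not None


def words(max_len):
    """All binary strings of length 0..max_len, shortest first."""
    for k in range(max_len + 1):
        for t in product("01", repeat=k):
            yield "".join(t)


N = 10
# Strings in the complement of C that are not in B.
witnesses = [w for w in words(N) if not in_C(w) and not in_B(w)]
print(witnesses[0])  # 10

# B agrees with 0*1* intersected with the complement of C on every string tested.
print(all(in_B(w) == (in_0s1s(w) and not in_C(w)) for w in words(N)))  # True
```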

128
Using closure properties
• Fact: If L1 ∩ L2 ∉ REG and L1 ∈ REG, then L2 ∉ REG
• Proof: Suppose (a) L1 ∩ L2 ∉ REG and L1 ∈ REG and
(b) L2 ∈ REG. Since REG is closed under ∩ we know
that L1 ∩ L2 ∈ REG, but that contradicts assumption
(a). Thus (a) and (b) can't both be true. QED

129
Topics for Exam 1
• Basic objects
• The main hierarchy: alphabets, strings,
languages, classes
• Functions
• Relations
• Sets and operations on sets
• ∪, ∩, complement, ×, P(S), A−B, |S|
• ⊆, ∈
• { element | predicate(element) }
• Propositional and predicate logic
• ∀ and ∃

130
Topics for Exam 1
• Strings
• ε versus ∅
• Operations on strings: concatenation,
exponentiation, reversal
• Languages
• Operations: concatenation, exponentiation,
reversal, ∪, ∩, *, complement, everything
applicable to sets, {ε} versus ∅
• Language classes
• FIN, REG, ALL

131
Topics for Exam 1
• REG and its many formulations
• DFA, NFA, GNFA, UFA, REX
• Syntax and semantics of each model
• L() as program-to-language operator
• Conversions between models
• Subset construction for NFA, UFA
• DFA → GNFA → REX
• REX → NFA

132
Topics for Exam 1
• Closure properties of language classes
• REG as a reasonable model of computation
• Arguments for, against
• Homework problems through homework 3
• Lectures and reading up through section 1.3
(excluding nonregularity)

133
Exam 1
• You may bring and consult a single-sided,
handwritten sheet of notes, which you must turn
in with the exam (and will get back later)

134
Applying these closure properties
• B = { 0^m 1^n | m = n }, C = { 0^m 1^n | m ≠ n }
• 0*1* ∩ C^c = B
• 0*1* is obviously regular
• B is known to be nonregular
• C^c is therefore nonregular
• Thus C is nonregular too
135
Another closure properties attempt
• B = { 0^m 1^n | m = n } = { 0^n 1^n | n ≥ 0 }
(known nonreg)
• BB = { 0^n 1^n 0^m 1^m | n, m ≥ 0 }
• Want to show that BB ∉ REG
• We know that REG is closed under language
concatenation. What does that say about whether
BB is regular or not?
• Is the class of non-regular languages (REGc)
closed under language concatenation too?

136
No
• Let Σ = { a } and D = { a^(n^2) | n ≥ 2 }
• Then D^c = { a^k | k ≤ 1 or k is not a square }
= { ε, a^1, a^2, a^3, a^5, a^6, a^7, a^8, a^10, … }
• We previously proved that D ∉ REG
• Thus D^c ∉ REG (by "fact" we proved)
• But D^c D^c = a* ∈ REG !!!
• Thus REGc is not closed under language
concatenation
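Over a one-letter alphabet a string is determined by its length, so the claim that concatenating the complement of D with itself gives all of a* reduces to arithmetic: every k ≥ 0 must be a sum of two lengths that are not squares ≥ 4. The sketch below checks this for a bounded range of k (evidence, not a proof); the names `in_D` and `in_DcDc` are illustrative and take lengths rather than strings.

```python
from math import isqrt


def in_D(k):
    """a^k is in D = { a^(n^2) | n >= 2 } iff k is a perfect square >= 4."""
    return k >= 4 and isqrt(k) ** 2 == k


def in_DcDc(k):
    """a^k is in the concatenation iff k = i + j with neither a^i nor a^j in D."""
    return any(not in_D(i) and not in_D(k - i) for i in range(k + 1))


print(all(in_DcDc(k) for k in range(1000)))
```

The general reason is simple: a non-square k splits as 0 + k, and a square k = n^2 with n ≥ 2 splits as 1 + (n^2 − 1), where n^2 − 1 lies strictly between (n−1)^2 and n^2.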

137
Back to problem
• B = { 0^m 1^n | m = n } (known nonreg),
BB = { 0^n 1^n 0^m 1^m | n, m ≥ 0 }
• Want to show that BB ∉ REG
• But there's no general result for that
• When applying a closure property, you have to
make sure it's true!
• Nonetheless, it is true that BB ∉ REG
• Because (BB) ∩ 0*1* = B
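The identity used here can be checked exhaustively on short strings. A bounded sketch with illustrative helper names: membership in BB is tested by trying every split point, and over the alphabet {0,1} a string is in 0*1* exactly when it never has a 1 followed by a 0.

```python
from itertools import product


def in_B(w):
    """B = { 0^n 1^n | n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n


def in_BB(w):
    """w is in BB iff it splits into two pieces that are both in B."""
    return any(in_B(w[:k]) and in_B(w[k:]) for k in range(len(w) + 1))


def in_0s1s(w):
    """For binary strings, membership in 0*1* means '10' never occurs."""
    return "10" not in w


def words(max_len):
    """All binary strings of length 0..max_len."""
    for k in range(max_len + 1):
        for t in product("01", repeat=k):
            yield "".join(t)


print(in_BB("0101"), in_B("0101"))  # True False
print(all((in_BB(w) and in_0s1s(w)) == in_B(w) for w in words(12)))  # True
```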

138
Chapter 1 closing considerations
• We don't and won't have many results about the
class REGc
• Being nonregular says that the language lacks a
certain type of structure: it's more complicated
than a DFA can handle
• All real computers are finite devices and all
finite languages are regular
• Yet the programming models are brittle: the
program has to change for larger and larger
inputs
• We've seen some easy-to-specify languages that
aren't regular
• So REG is not a good general-purpose programming
model...?