
91.304 Foundations of (Theoretical) Computer Science

- Chapter 1 Lecture Notes
- David Martin
- dm_at_cs.uml.edu

This work is licensed under the Creative Commons Attribution-ShareAlike License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/2.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

Chapter 1 Regular Languages

- Simple model of computation
- Input a string, and either accept or reject it
- Models a very simple type of function, a predicate on strings: f : Σ* → {0,1}
- See example of a state-transition diagram

Syntax of DFA

- A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F) such that
- Q is a finite set of states
- Σ (sigma) is an alphabet
- δ : Q × Σ → Q (delta) is the transition function
- q0 ∈ Q (q naught) is the start state
- F ⊆ Q is the set of accepting states
- Usually these names are used, but others are possible as long as the role is clear

DFA syntax

- It is deterministic because for every input (q,c), the next state is a uniquely determined member of Q
- because the codomain of δ is Q
- Fix the previous example to fit these constraints
- The same example DFA, specified formally

DFA computation

- This definition is different from but equivalent to the one in the text
- Let M = (Q, Σ, δ, q0, F) be a DFA. We define the extended transition function δ* : Q × Σ* → Q inductively as follows. For all q ∈ Q, δ*(q, ε) = q. If w ∈ Σ* and c ∈ Σ, let δ*(q, wc) = δ(δ*(q, w), c)
- According to this definition, δ*(q, x) is the state of the machine after starting in state q and reading the entire string x
- See example

Language recognized by DFA

- The language recognized by the DFA M is written L(M) and defined as L(M) = { x ∈ Σ* | δ*(q0, x) ∈ F }
- Think of L() as an operator that turns a program into the language it specifies
- We will use L() for other types of machines and grammars too

Example

- Let L2 = { x ∈ {0,1}* | the binary number x is a multiple of 2 } and build a DFA M2 such that L(M2) = L2
- Remember this means L(M2) ⊆ L2 and L2 ⊆ L(M2)
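The definitions of the extended transition function and of L(M) translate almost line for line into a short program. The sketch below is only an illustration: the tuple layout, the state names "even" and "odd", and the choice to treat the empty string as the number 0 are assumptions made here, not part of the notes.

```python
def delta_star(delta, q, x):
    """Extended transition function: the state reached from q after reading x."""
    for c in x:               # delta*(q, wc) = delta(delta*(q, w), c)
        q = delta[(q, c)]
    return q                  # delta*(q, empty string) = q


def accepts(dfa, x):
    """x is in L(M) exactly when delta*(q0, x) is an accepting state."""
    Q, sigma, delta, q0, F = dfa
    return delta_star(delta, q0, x) in F


# Assumed M2: the state remembers whether the last bit read was 0 ("even") or 1 ("odd").
M2 = (
    {"even", "odd"},                                  # Q
    {"0", "1"},                                       # Sigma
    {("even", "0"): "even", ("even", "1"): "odd",
     ("odd", "0"): "even", ("odd", "1"): "odd"},      # delta, written as a table
    "even",                                           # q0 (empty string counts as 0)
    {"even"},                                         # F
)

print(accepts(M2, "110"))   # 6 is a multiple of 2
print(accepts(M2, "101"))   # 5 is not
```

Checking accepts(M2, x) against n % 2 == 0 for many n tests both inclusions, L(M2) ⊆ L2 and L2 ⊆ L(M2), on the sampled strings; it is evidence, not a proof.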

Definition of regular languages

- A language L is regular if there exists a DFA M such that L = L(M)
- The class of regular languages over the alphabet Σ is called REG and defined REG = { L ⊆ Σ* | L is regular } = { L(M) | M is a DFA over Σ }
- Now we know 4 classes of languages: ∅, FIN, REG, and ALL

Problems

- For all k ≥ 1, let A_k = { 0^(kn) | n ≥ 0 }. Prove that (∀ k ≥ 1) A_k ∈ REG
- Solution is a scheme, not a single DFA
- (Harder) Build a DFA for L3 = { x ∈ {0,1}* | the binary number x is a multiple of 3 }
- Build a DFA for L4 = { x ∈ {a,b}* | x contains an odd # of b's and an even # of a's }

Measuring DFA complexity

- Suppose
- you have a DFA with states named 00000000 .. 11111111 (2^8 = 256 unique states)
- an LCD attached to the thing showing the current state name
- Σ = { c } (for clock pulse)
- δ(q, c) = (q + 1) & 0xFF
- This is a simple counter machine: feed it clocks and it counts upwards

Measuring DFA complexity

- Time complexity
- A DFA always takes one transition per input character
- So time complexity is not useful here
- Program complexity
- A DFA's program is (mostly) its δ
- The model specifies no particular programming language for δ: it's just a table mapping (state, input) pairs to (state) outputs
- Though it can sometimes be specified concisely, as in δ(q, c) = (q + 1) & 0xFF
- Reprogram the clock for any permutation of {0,1}^8 and δ's table remains just as big
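To make the "concise formula versus table" contrast concrete, here is a small sketch of the counter machine. The names STATES, delta, run, and TABLE are introduced only for this illustration, and state names are kept as 8-character bit strings as on the LCD.

```python
STATES = [format(q, "08b") for q in range(256)]   # "00000000" .. "11111111"


def delta(q, c):
    """Transition on one clock pulse: increment, wrapping around at 256."""
    if c != "c":
        raise ValueError("the only input symbol is the clock pulse 'c'")
    return format((int(q, 2) + 1) & 0xFF, "08b")


def run(q, pulses):
    """Feed a string of clock pulses to the counter, starting in state q."""
    for c in pulses:
        q = delta(q, c)
    return q


# The same program as an explicit table: one row per (state, input) pair.
TABLE = {(q, "c"): delta(q, "c") for q in STATES}

print(run("00000000", "ccc"))   # three pulses
print(len(TABLE))               # number of table rows
```

The table has one row per state however the successor of each state is chosen, which is the sense in which reprogramming the clock leaves the table just as big.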

Measuring DFA complexity

- Space complexity: the amount of memory used
- But a DFA has no extra memory; it only remembers what state it is in
- Can't look back or forward
- So a DFA always uses the same amount of memory, namely the amount of memory required to remember what state it's in
- Needs to remember current element of Q
- Can write down that number in log2 |Q| bits

DFAs as real computers

- Consider a 256 MB computer that takes a finite input and produces a finite output
- Inputs: clock pulses, interrupts, hard drive, keyboard, mouse, network, etc.
- Outputs: video, hard drive, network, etc.
- Can code everything in binary
- But DFA only accepts or rejects input

Recognition model for functions

- Can still sort of be modeled by a DFA
- PC = { x#y | x,y ∈ {0,1}* and the input x produces the output y }
- Note: the # character is just a separator
- DFA plays the role of equipment verifier
- Verifying correctness seems easier than computing the output, but at least it's related

Are DFAs reasonable?

- One issue is that the programs don't seem to reflect much about the problem being solved
- If you can figure out how many bits of memory are needed for the solution, then you can always build a DFA based on that knowledge (could be tedious and really large)
- No difference in program complexity between programs that use the same amount of memory means DFAs don't help us see the difference between programs very easily
- Neural nets??

Are DFAs reasonable?

- Similarly: an 8-bit counter is structurally very different than a 9-bit counter
- More memory needed ⇒ totally different δ program needed
- Not very modular!

Are DFAs reasonable?

- Another issue is that DFAs prefer the beginning of their inputs to the end of their inputs
- L5 = { x ∈ {0,1}* | the fifth digit from the left of x is 0 }
- L6 = { x ∈ {0,1}* | the fifth digit from the right of x is 0 }
- DFAs know where the input begins but not where it ends

Is REG reasonable?

- We should be able to combine computations as subroutines in simple ways
- logical OR (A ∪ B)
- logical AND (A ∩ B)
- concatenation (A ∘ B) and star (A*)
- hard to prove!! motivation for NFA
- complement (A^c)
- reversal (A^R)
- All above are easy to do as logic circuits
- Will discuss further as closure under language operations

Nondeterministic Finite Automata

- Will relax two of these DFA rules
- Each (state, char) input must produce exactly one (state) output
- Must consume one character in order to advance state
- Example: L6 = Σ* bob Σ*
- See M6
- The NFA accepts the input if there exists any way of reading the input that winds up in an accepting state at the end of the string
- Otherwise it rejects the input

NFAs

- Thus the NFA rejects the input if there doesn't exist any way of reading the input that winds up in an accepting state at the end of the string
- In other words, every way of reading the input leads to a nonaccepting state
- Example M7
- L7 ?

[Figure: state-transition diagram of M7, with states 1, 2, 3 and arcs labeled a, b, c, ε, ε]

Ways to think of NFAs

- NFAs want to accept inputs and will always take the most advantageous alternative(s)
- Because they will accept if there exists any way to get to an accepting state at the end of the string
- The quickest way there may be just one of many ways, but it doesn't matter
- http://www.chompchomp.com/frag05/frag05.01.a.htm

Ways to think of NFAs

[Figure: diagram with three arcs, each labeled a]

- fork() model
- Input string is in a variable
- fork() at every nondeterministic choice point
- subprocess 1 (parent) follows first transition
- subprocess 2 (child) follows second
- subprocess 3 (child) follows third (if any), etc.
- A process that can't follow any transition calls exit() -- and gives up its ability to accept
- A process that makes it through the whole string and is in an accepting state prints out ACCEPT
- A single ACCEPT is enough

Syntax of DFA (repeat)

- A deterministic finite automaton (DFA) is a 5-tuple (Q, Σ, δ, q0, F) such that
- Q is a finite set of states
- Σ is an alphabet
- δ : Q × Σ → Q is the transition function
- q0 ∈ Q is the start state
- F ⊆ Q is the set of accepting states
- Usually these names are used, but others are possible as long as the role is clear

Syntax of NFA

- A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, δ, q0, F) such that
- Q is a finite set of states
- Σ is an alphabet
- δ : Q × (Σ ∪ {ε}) → P(Q) is the transition function
- q0 ∈ Q is the start state
- F ⊆ Q is the set of accepting states
- Usually these names are used, but others are possible as long as the role is clear

Syntax of NFA

- Definition: Σε = Σ ∪ {ε}
- We'll use this frequently enough
- Differences on state-transition diagram
- δ(1,a) = {1} (not δ(1,a) = 1)
- δ(1,ε) = {1, 2}
- δ(3,c) = {2, 3}
- δ(2,a) = ∅
- δ(3,ε) = {3}

[Figure: Example M8, a state-transition diagram with states 1, 2, 3 and arcs labeled a, b, c, c, ε, ε]

NFA computation

- This next definition is different from but equivalent to the one in the text
- Book's definition may be easier to understand at first, but that makes its version of Theorem 1.39 (subset construction) harder
- Goal: a function δ* : Q × Σ* → P(Q) where δ*(q,x) is the set of all states reachable in the machine after starting in state q and reading the entire string x
- Then for an NFA M, we will define something like L(M) = { x ∈ Σ* | δ*(q0,x) contains some accepting state }

NFA computation

- Let M = (Q, Σ, δ, q0, F) be an NFA. We define some auxiliary functions
- E : Q → P(Q) by ("ε-closure")
- E(q) = { p ∈ Q | p is reachable from q by following a chain of 0 or more ε transitions }
- Although E takes elements of Q as input, we'll also use it as a function that takes subsets of Q as input (that is, elements of P(Q)). So E : P(Q) → P(Q) by E(S) = ⋃ { E(q) | q ∈ S }

In other words, given a set as input, just process each element independently...

NFA computation

- Thus E(q) is the set of all states you can get to from q without reading any input
- In M8, E(3) = ?   E({2,1}) = ?
- We define a simple extension of δ that takes a set of states as input
- δ : Q × Σε → P(Q) (this comes with the NFA)
- δ : P(Q) × Σε → P(Q) defined by δ(S, c) = ⋃ { δ(q, c) | q ∈ S }

Again, given a set as input, just process each element independently...

NFA computation

- We have a function E() that follows ε-transitions and a version of δ that behaves like δ but takes sets as input
- δ* : Q × Σ* → P(Q) is defined inductively. For all q ∈ Q, δ*(q, ε) = E({q})
- If w ∈ Σ* and c ∈ Σ, let
- δ*(q, wc) = E(δ(δ*(q, w), c))

NFA computation

- Finally, we define L(M) = { x ∈ Σ* | δ*(q0,x) contains some accepting state } = { x ∈ Σ* | δ*(q0,x) ∩ F ≠ ∅ }
- δ*(1,ac) = E(δ(δ*(1,a),c))
- δ*(1,a) = E(δ(δ*(1,ε),a))
- δ*(1,ε) = ?
- δ*(1,ac) = ?
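The three pieces of this definition (the ε-closure E, δ extended to sets, and the inductive δ*) can be sketched directly. Everything concrete below is an assumption for illustration: ε is encoded as the empty string, missing table entries mean ∅, and the two-state NFA N (for a*b*) is a made-up machine, not M8.

```python
EPS = ""   # stands for the label ε on a transition


def E(delta, S):
    """ε-closure of a set S: states reachable by 0 or more ε transitions."""
    closure, todo = set(S), list(S)
    while todo:
        q = todo.pop()
        for p in delta.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                todo.append(p)
    return closure


def delta_set(delta, S, c):
    """δ extended to sets: process each element independently (one hop only)."""
    out = set()
    for q in S:
        out |= delta.get((q, c), set())
    return out


def delta_star(delta, q, x):
    S = E(delta, {q})                          # δ*(q, ε) = E({q})
    for c in x:                                # δ*(q, wc) = E(δ(δ*(q, w), c))
        S = E(delta, delta_set(delta, S, c))
    return S


def accepts(nfa, x):
    Q, sigma, delta, q0, F = nfa
    return bool(delta_star(delta, q0, x) & F)  # δ*(q0, x) ∩ F ≠ ∅


# Hypothetical NFA for a*b*: loop on a in state 1, ε to state 2, loop on b there.
N = ({1, 2}, {"a", "b"},
     {(1, "a"): {1}, (1, EPS): {2}, (2, "b"): {2}},
     1, {2})

print(delta_star(N[2], 1, ""))
print(accepts(N, "aab"), accepts(N, "ba"))
```

Note that delta_set never follows ε transitions on its own; only E does, which mirrors the distinction between δ and δ*.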

Question

- "How do I know when to follow ε transitions and when not to?"
- If you're talking about δ, then don't -- it's the program itself. δ can express that "there is an ε transition here" but you never go any further than that one hop.
- If you're talking about δ*, then do -- because it includes E() as part of its definition, which is there precisely in order to follow ε transitions

NFAs are good at union (or)

- L2 = { x ∈ {0,1}* | the binary number x is a multiple of 2 }
- L3 = { x ∈ {0,1}* | the binary number x is a multiple of 3 }
- Let A = L2 ∪ L3
- NFA for A using guess-and-verify strategy
- Preview of Theorem 1.45

The Subset Construction

- Theorem 1.39: For every NFA M1 there exists a DFA M2 such that L(M1) = L(M2)
- Proof idea: Well, how does fork() work on a uniprocessor machine?

The Subset Construction

- Proof: Let M1 = (Q1, Σ, δ1, init1, F1) be the NFA and define the DFA M2 = (Q2, Σ, δ2, init2, F2) as follows
- Q2 = P(Q1).
- Each state of the DFA records the set of states that the NFA can simultaneously be in
- Can compare DFA states for equality but also look "inside" the state name to find a set of NFA state names
- Define δ2 : Q2 × Σ → Q2 (that is, δ2 : P(Q1) × Σ → P(Q1)) by
- δ2(S,a) = E1(δ1(S,a)). Go to whatever states are reachable from the states in S and reading the character a

Remember, in an NFA:
- δ1 : Q1 × Σε → P(Q1) (from the definition)
- δ1 : P(Q1) × Σε → P(Q1) (extended to sets)
- E1 : P(Q1) → P(Q1) (ε-closure)

The Subset Construction

- init2 = E1(init1)
- F2 = { q ∈ Q2 | q ∩ F1 ≠ ∅ }, in other words F2 = { S ⊆ Q1 | S ∩ F1 ≠ ∅ }
- The effect is that the DFA knows all states that are reachable in the NFA after reading the string so far. If any one of them is accepting, then the current DFA state is accepting too, otherwise it's not.
- If you believe this then that's all it takes to see that the construction is correct. So, convince yourself with an example. QED
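One way to convince yourself is to run the construction mechanically. The sketch below takes a shortcut that the construction itself does not: it only builds the subsets reachable from init2 rather than all of P(Q1). The encoding (ε as the empty string, frozensets as DFA state names) and the example NFA for a*b* are assumptions made for this illustration.

```python
def eps_closure(delta1, S):
    """E1(S): everything reachable from S by 0 or more ε transitions."""
    closure, todo = set(S), list(S)
    while todo:
        q = todo.pop()
        for p in delta1.get((q, ""), set()):
            if p not in closure:
                closure.add(p)
                todo.append(p)
    return frozenset(closure)


def subset_construction(nfa):
    Q1, sigma, delta1, init1, F1 = nfa
    init2 = eps_closure(delta1, {init1})            # init2 = E1(init1)
    Q2, delta2, todo = {init2}, {}, [init2]
    while todo:
        S = todo.pop()
        for a in sigma:
            moved = set()
            for q in S:                             # δ1 extended to the set S
                moved |= delta1.get((q, a), set())
            T = eps_closure(delta1, moved)          # δ2(S, a) = E1(δ1(S, a))
            delta2[(S, a)] = T
            if T not in Q2:
                Q2.add(T)
                todo.append(T)
    F2 = {S for S in Q2 if S & F1}                  # S ∩ F1 ≠ ∅
    return Q2, sigma, delta2, init2, F2


def dfa_accepts(dfa, x):
    Q, sigma, delta, q, F = dfa
    for c in x:
        q = delta[(q, c)]
    return q in F


# Hypothetical NFA for a*b* (ε written as the empty string).
N = ({1, 2}, {"a", "b"},
     {(1, "a"): {1}, (1, ""): {2}, (2, "b"): {2}},
     1, {2})
D = subset_construction(N)
print(len(D[0]), dfa_accepts(D, "aabb"), dfa_accepts(D, "aba"))
```

Because each DFA state is a frozenset, you can compare states for equality and also look "inside" a state name to see the NFA states it stands for.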

Subset construction example

- Q2 = { ∅, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3} }
- (On board)
- init2 = {1,2,3}
- F2 = { {3}, {1,3}, {2,3}, {1,2,3} }

[Figure: Example M8 (think of this as M1 in the construction), with states 1, 2, 3 and arcs labeled a, b, c, c, ε, ε]

Be methodical

- Need to compute δ2({1,2,3},c) = E1(δ1({1,2,3},c))
- By definition, δ1({1,2,3},c) = δ1(1,c) ∪ δ1(2,c) ∪ δ1(3,c) = {2,3}
- Then take E1({2,3}) = {2,3}
- Save intermediate results for reuse
- It's OK to eliminate unreachable states in practice, even though that's not what the construction really does

Subset construction conclusion

- Adding nondeterminism makes programs shorter but not able to do new things
- Remember, regular languages are defined to be those "recognized by a DFA"
- We now have a result that says that every language that is recognized by an NFA is regular too
- So if you are asked to show that a language is regular, you can exhibit a DFA or NFA for it and rely on the subset construction theorem
- Sometimes questions are specifically about DFAs or NFAs, though... pay attention to the precise wording

More NFA examples

- Write an NFA for {ab, abc} with 3 states
- NFA and DFA for {ε} over Σ = {0,1}
- Rule ? 2 L(M) , ?
- NFA and DFA for ∅ over Σ = {0,1}

Closure properties

- The presence or absence of closure properties says something about how well a set tolerates an operation
- Definition. Let S ⊆ U be a set in some universe U and ⊙ be an operation on elements of U. We say that S is closed under ⊙ if applying ⊙ to element(s) of S produces another element of S.
- For example, if ⊙ is a binary operation ⊙ : U × U → U, then we're saying that (∀ x ∈ S and y ∈ S) x ⊙ y ∈ S

Closure properties illustrated

[Figure: the set S drawn inside the universe U]

Applying the operation to elements of S never takes you outside of S. S is closed with respect to the operation. This example shows unary operations.

Closure properties

- Having a closure property usually means there is some type of "natural fit" between the operation and the set
- Examples
- N is closed under + and × but not − and ÷
- Z is closed under + and − and × and unary − (negation) but not ÷ or √
- Q−{0} is closed under × and ÷ but not + or −

More examples

- L1 = { x ∈ {0,1}* | |x| is a multiple of 3 }
- is closed under string reversal and concatenation
- L3 = { x ∈ {0,1}* | the binary number x is a multiple of 3 }
- is also closed under string reversal and concatenation, harder to see though
- L4 = { x ∈ {a,b}* | x contains an odd # of b's and an even # of a's }
- is closed under string reversal
- is not closed under string concatenation

Closure: higher abstraction

- We will usually be concerned with closure of language classes under language operations
- Previous examples were closure of sets containing non-set elements under various familiar operations
- We consider DFAs and NFAs to be programs and we want assurance that their outputs can be combined in desired ways just by manipulating their programs (like using one as a subroutine for the other)
- Representative question: is REG closed under (language) concatenation?

The regular operations

- The regular operations on languages are
- ∪ (union)
- ∘ (concatenation)
- * (Kleene star)
- The name "regular operations" is not that important
- Too bad we use the word "regular" for so much
- REG is closed under these regular operations
- That's why they're called "regular" operations
- This does not mean that each regular language is closed under each of these operations!

The regular operations

- REG is closed under union: Theorem 1.25 (using DFAs), Theorem 1.45 (using NFAs)
- REG is closed under concatenation: Theorem 1.47 (NFAs)
- REG is closed under *: Theorem 1.49 (NFAs)
- Study these constructions!!
- REG is also closed under complement and reversal (not in book)

Regular expressions

- You are probably familiar with these
- Example: "int .*\(.*\)" is a (flex format) regular expression that appears to match C function prototypes that return ints
- In our treatment, a regular expression is a program that generates a language of matching strings when you "run it"
- We will use a very compact definition that simplifies things later

Regular expressions

- Definition. Let Σ be an alphabet not containing any of the special characters in this list: ε ∅ ) ( ∪ ∘ *. We define the syntax of the (programming) language REX(Σ), abbreviated as REX, inductively
- Base cases
- For all a ∈ Σ, a ∈ REX. In other words, each single character from Σ is a regular expression all by itself.
- ε ∈ REX. In other words, the literal symbol ε is a regular expression. In this context it is not the empty string but rather the single-character name for the empty string.
- ∅ ∈ REX. Similarly, the literal symbol ∅ is a regular expression.

Regular expressions

- Definition continued
- Induction cases
- For all r1, r2 ∈ REX, ( r1 ∪ r2 ) ∈ REX also
- For all r1, r2 ∈ REX, ( r1 ∘ r2 ) ∈ REX also

Here the parentheses and the operators ∪ and ∘ are literal symbols; r1 and r2 are variables.

Regular expressions

- Definition continued
- Induction cases continued
- For all r ∈ REX, ( r* ) ∈ REX also
- Examples over Σ = {0,1}
- ε and 0 and 1 and ∅
- (((1∘0)∘(ε∪∅))*)
- εε is not a regular expression
- Remember, in the context of regular expressions, ε and ∅ are ordinary characters

Semantics of regular expressions

- Definition. We define the meaning of the language REX(Σ) inductively using the L() operator so that L(r) denotes the language generated by r as follows
- Base cases
- For all a ∈ Σ, L(a) = { a }. A single-character regular expression generates the corresponding single-character string.
- L(ε) = { ε }. The symbol for the empty string actually generates the empty string.
- L(∅) = ∅. The symbol for the empty language actually generates the empty language.

Regular expressions

- Definition continued
- Induction cases
- For all r1, r2 ∈ REX, L( (r1 ∪ r2) ) = L(r1) ∪ L(r2)
- For all r1, r2 ∈ REX, L( (r1 ∘ r2) ) = L(r1) ∘ L(r2)
- For all r ∈ REX, L( ( r* ) ) = (L(r))*
- No other string is in REX(Σ)
- Example
- L( (((1∘0)∘(ε∪∅))*) ) includes
- ε, 10, 1010, 101010, 10101010, ...
Orientation

- We used highly flexible mathematical notation and state-transition diagrams to specify DFAs and NFAs
- Now we have a precise programming language REX that generates languages
- REX is designed to close the simplest languages under ∪, ∘, *

Abbreviations

- Instead of parentheses, we use precedence to indicate grouping when possible.
- * (highest)
- ∘
- ∪ (lowest)
- Instead of ∘, we just write elements next to each other
- Example: (((1∘0)∘(ε∪∅))*) can be written as (10(ε∪∅))* but there is no further abbreviation
- (Not in text) If r ∈ REX(Σ), instead of writing rr*, we write r+

Abbreviations

- Instead of writing a union of all characters from Σ together to mean "any character", we just write Σ
- In a flex/grep regular expression this would be called "."
- Instead of writing L(r) when r is a regular expression, we consider r alone to simultaneously mean both the expression r and the language it generates, relying on context to disambiguate

Abbreviations

- Caution: regular expressions are strings (programs). They are equal only when they contain exactly the same sequence of characters.
- (((1∘0)∘(ε∪∅))*) can be abbreviated (10(ε∪∅))*
- however (((1∘0)∘(ε∪∅))*) ≠ (10(ε∪∅))* as strings
- but (((1∘0)∘(ε∪∅))*) = (10(ε∪∅))* when they are considered to be the generated languages
- more accurately then, L( (((1∘0)∘(ε∪∅))*) ) = L( (10(ε∪∅))* ) = L( (10)* )

Facts

- REX(Σ) is itself a language over an alphabet Γ that is
- Γ = Σ ∪ { ), (, ∘, ∪, *, ε, ∅ }
- For every Σ, |REX(Σ)| = ∞
- ∅, (∅*), ((∅*)*), ...
- even without knowing Σ there are infinitely many elements in REX(Σ)
- Question: Can we find a DFA or NFA M with L(M) = REX(Σ)?

Examples

- Find a regular expression for { w ∈ {0,1}* | w ≠ 10 }
- Find a regular expression for { x ∈ {0,1}* | the 6th digit counting from the rightmost character of x is 1 }
- Find a regular expression for L3 = { x ∈ {0,1}* | the binary number x is a multiple of 3 }

The DFA for L3

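[Figure: the DFA for L3, with states 0, 1, 2 and arcs labeled 0 and 1; the part of the diagram involving states 1 and 2 is annotated (0 1* 0)*]

Regular expression: (0 ∪ 1 _____________ 1)*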

Regular expression for L3

- (0 ∪ 1 (0 1* 0)* 1)*
- L3 is closed under concatenation, because of the overall form ( )*
- Now suppose x ∈ L3. Is x^R ∈ L3?
- Yes: see this by reversing the regular expression and observing that the same regular expression results
- So L3 is also closed under reversal
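The regular expression for L3 can be spot-checked with a regex library. This sketch uses Python's re module, which writes union as | and concatenation by juxtaposition; the name in_L3 is introduced here, and the empty string is treated as a member because the outer star generates it.

```python
import re

# (0 ∪ 1 (0 1* 0)* 1)* transcribed into Python's regex syntax
L3_REGEX = re.compile(r"(0|1(01*0)*1)*")


def in_L3(x):
    """True when the whole string x matches the expression."""
    return L3_REGEX.fullmatch(x) is not None


# Compare against arithmetic on a range of binary numbers.
agree = all(in_L3(format(n, "b")) == (n % 3 == 0) for n in range(200))
# Reversal: membership should not change when a string is reversed.
reversal = all(in_L3(format(n, "08b")) == in_L3(format(n, "08b")[::-1])
               for n in range(256))
print(agree, reversal)
```

Both checks only sample finitely many strings, so they support the claims on the slide rather than prove them.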

Regular expressions generate regular languages

- Lemma 1.55: For every regular expression r, L(r) is a regular language.
- Proof by induction on regular expressions.
- We used induction to create all of the regular expressions and then to define their languages, so we can use induction to visit each one and prove a property about it

L(REX) ⊆ REG

- Base cases
- For every a ∈ Σ, L(a) = { a } is obviously regular
- L(ε) = { ε } ∈ REG also
- L(∅) = ∅ ∈ REG

[Figure: automaton with a single arc labeled a]

L(REX) ⊆ REG

- Induction cases
- Suppose the induction hypothesis holds for r1 and r2. Namely, L(r1) ∈ REG and L(r2) ∈ REG. We want to show that L( (r1 ∪ r2) ) ∈ REG also. But look: by definition, L( (r1 ∪ r2) ) = L(r1) ∪ L(r2)
- Since both of these languages are regular, we can apply Theorem 1.45 (closure of REG under ∪) to conclude that their union is regular.

L(REX) ⊆ REG

- Induction cases
- Now suppose L(r1) ∈ REG and L(r2) ∈ REG. By definition, L( (r1 ∘ r2) ) = L(r1) ∘ L(r2)
- By Theorem 1.47, this concatenation is regular too.
- Finally, suppose L(r) ∈ REG. Then by definition, L( (r*) ) = (L(r))*
- By Theorem 1.49, this language is also regular. QED

On to REG ⊆ L(REX)

- Now we'll show that each regular language (one accepted by an automaton) also can be described by a regular expression
- Hence REG = L(REX)
- In other words, regular expressions are equivalent in power to finite automata
- This equivalence is called Kleene's Theorem (1.54 in book)

Converting DFAs to REX

- Lemma 1.60 in textbook
- This approach uses yet another form of finite automaton called a GNFA (generalized NFA)
- The technique is easier to understand by working an example than by studying the proof

Syntax of GNFA

- A generalized NFA is a 5-tuple (Q, Σ, δ, qs, qa) such that
- Q is a finite set of states
- Σ is an alphabet
- δ : (Q−{qa}) × (Q−{qs}) → REX(Σ) is the transition function
- qs ∈ Q is the start state
- qa ∈ Q is the (one) accepting state

GNFA syntax summary

- Arcs are labeled with regular expressions
- Meaning is that "input matching the label moves from old state to new state" -- just like NFA, but not just a single character at a time
- Start state has no incoming transitions, accept has no outgoing
- Every pair of states (except start & accept) has two arcs between them
- Every state has a self-loop (except start & accept)

Construction strategy

- Will convert a DFA into a GNFA then iteratively shrink the GNFA until we end up with a diagram like this, meaning that exactly that input that matches the giant regular expression is in the language

[Figure: qs → qa, with the single arc labeled "giant regular expression"]

Converting DFA to GNFA

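[Figure: the DFA for L3, with states 0, 1, 2 and arcs labeled 0 and 1]

Adding new start state qs is straightforward. Then make each DFA accepting state have an ε transition to the single accepting state qa.

[Figure: the resulting GNFA, with the added states qs and qa and two arcs labeled ε]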

Interpreting arcs

- δ : (Q−{qa}) × (Q−{qs}) → REX(Σ). In this diagram,
- δ(0,1) = 1    δ(2,0) = ∅    δ(2,qa) = ∅
- δ(1,1) = ∅    δ(2,2) = 1    δ(0,qa) = ε

[Figure: the GNFA from the previous slide, with states qs, 0, 1, 2, qa]

Eliminating a GNFA state

- We arbitrarily choose an interior state (not qs or qa) to rip out of the machine

Question: how is the ability of state i to get to state j affected when we remove rip? Only the solid and labeled states and transitions are relevant to that question.

[Figure: states i, rip, and j, with arcs i → rip labeled R1, rip → rip labeled R2, rip → j labeled R3, and i → j labeled R4]

Eliminating a GNFA state

- We produce a new GNFA that omits rip
- Its i-to-j label will compensate for the missing state
- We will do this for every (i,j) ∈ (Q−{qa}) × (Q−{qs})
- So we have to rewrite every label in order to eliminate this one state
- New label for i-to-j is
- R4 ∪ (R1 (R2)* R3)

[Figure: states i, rip, and j, with arcs labeled R1, R2, R3, and R4]

Don't overlook

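- The case (i,i) ∈ (Q−{qa}) × (Q−{qs})
- New label for i-to-i is still
- R4 ∪ (R1 (R2)* R3)
- Example proceeds on whiteboard, or see textbook for a different one

[Figure: states i and rip, with arcs i → rip labeled R1, rip → rip labeled R2, rip → i labeled R3, and the self-loop on i labeled R4]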
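The label-rewriting step R4 ∪ (R1 (R2)* R3) is mechanical enough to sketch in code. The version below makes several assumptions of its own: labels are Python regex strings (| for ∪, juxtaposition for ∘), None stands for ∅, the empty string stands for ε, input characters are assumed to need no escaping, and the helper names are invented here. It is run on the multiple-of-3 DFA described in these notes.

```python
import re


def star(r):
    """(r)*, using ∅* = ε* = ε."""
    return "" if r in (None, "") else f"({r})*"


def concat(*parts):
    """Concatenation; anything concatenated with ∅ is ∅."""
    return None if any(p is None for p in parts) else "".join(parts)


def union(r, s):
    """Union; ∅ is the identity."""
    if r is None:
        return s
    if s is None:
        return r
    return f"({r}|{s})"


def rip(states, label, victim):
    """Remove one interior state, rewriting every remaining i-to-j label."""
    rest = [q for q in states if q != victim]
    loop = star(label.get((victim, victim)))                 # (R2)*
    new = {}
    for i in rest:
        if i == "qa":
            continue
        for j in rest:
            if j == "qs":
                continue
            detour = concat(label.get((i, victim)), loop, label.get((victim, j)))
            new[(i, j)] = union(label.get((i, j)), detour)   # R4 ∪ R1 (R2)* R3
    return rest, new


def dfa_to_regex(states, delta, q0, F):
    label = {("qs", q0): ""}                  # ε arc from the new start state
    for q in F:
        label[(q, "qa")] = ""                 # ε arcs into the new accept state
    for (q, c), target in delta.items():
        label[(q, target)] = union(label.get((q, target)), c)
    gnfa_states = ["qs", *states, "qa"]
    for victim in states:
        gnfa_states, label = rip(gnfa_states, label, victim)
    return label[("qs", "qa")]


# The DFA for L3: the state is the value of the bits read so far, modulo 3.
L3_DELTA = {(q, c): (2 * q + int(c)) % 3 for q in (0, 1, 2) for c in "01"}
regex = dfa_to_regex([0, 1, 2], L3_DELTA, 0, {0})
print(regex)
```

Ripping states in a different order generally yields a different-looking expression for the same language, which is why this sketch checks the result by matching strings rather than by comparing text.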

g/re/p

- What does grep do?
- (int|float)_rec.*emp becomes
- (Σ*)(int ∪ float)_rec(Σ*)emp(Σ*)
- What does it mean?
- How does it work?
- Regular expression → NFA → DFA → state reduction
- Then run DFA against each line of input, printing out the lines that it accepts

State machines

- Very common programming technique

    while (true) {
        switch (state) {
        case NEW_CONNECTION:
            process_login();
            state = RECEIVE_CMD;
            break;
        case RECEIVE_CMD:
            if (process_cmd() == CMD_QUIT)
                state = SHUTDOWN;
            break;
        case SHUTDOWN:
            ...
        }
    }

This course so far

- 1.1 Introduction to languages & DFAs
- 1.2 NFAs and DFAs recognize the same class of languages
- 1.3 REX generates the same class of languages
- Three different programming "languages" specified in different levels of formality that solve the same types of computational problems
- Four, if you count GNFAs
- Five, if you count UFAs

Strategies

- If you're investigating a property of regular languages, then as soon as you know L ∈ REG, you know there are DFAs, NFAs, Regexes that describe it. Use whatever representation is convenient
- But sometimes you're investigating the properties of the programs themselves: changing states, adding a * to a regex, etc. Then the knowledge that other representations exist might be relevant and might not

All finite languages are regular

- Theorem (not in book): FIN ⊆ REG
- Proof: Suppose L ∈ FIN.
- Then either L = ∅, or L = { s1, s2, ..., sn } where n ∈ N and each si ∈ Σ*.
- A regular expression describing L is, therefore, either ∅ or
- s1 ∪ s2 ∪ ... ∪ sn QED
- Note that this proof does not work for n = ∞
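The proof is constructive: it tells you how to write the expression down. A minimal sketch, assuming Python's re syntax (where | plays the role of ∪) and handling ∅ as a special case because re has no symbol for the empty language; the function names are made up for this example.

```python
import re


def regex_for_finite_language(strings):
    """s1 ∪ s2 ∪ ... ∪ sn as a Python regex, or None when the language is ∅."""
    if not strings:
        return None
    # re.escape keeps characters such as '(' or '*' from being read as operators
    return "|".join(re.escape(s) for s in sorted(strings))


def member(strings, x):
    """Decide x ∈ L by matching against the constructed expression."""
    pattern = regex_for_finite_language(strings)
    return pattern is not None and re.fullmatch(pattern, x) is not None


print(regex_for_finite_language({"ab", "abc"}))
print(member({"ab", "abc"}, "abc"), member({"ab", "abc"}, "a"))
```

The construction needs the whole list of strings up front, which is one way to see why it says nothing about infinite languages.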

Picture so far

[Figure: Venn diagram with FIN inside REG inside ALL. Each point is a language in this Venn diagram.]

REG = L(DFA) = L(NFA) = L(REX) = L(UFA) = L(GNFA) ⊃ FIN

L(DFA) is "the class of languages generated by DFAs"

In the region of ALL outside REG: is there a language out here?

1.4 Nonregular languages

- For each possible language L,
- ∅ ⊆ L. So ∅ is the smallest language. And ∅ is regular
- L ⊆ Σ*. So Σ* is the largest language. And Σ* is regular
- Yet there are languages in between these two extremes that are not regular

A nonregular language

- B = { 0^n 1^n | n ≥ 0 }
- = { ε, 01, 0011, 000111, ... }
- is not regular
- Why?
- Q: how many bits of memory would a DFA need in order to recognize B?
- A: there appears to be no single number of bits that's big enough to work for every element of B
- Remember, the DFA needs to reject all strings that are not in B

Other examples

- C = { w ∈ {0,1}* | n0(w) = n1(w) }
- Needs to count a potentially unbounded number of '0's... so nonregular
- D = { w ∈ {0,1}* | n01(w) = n10(w) }
- Needs to count a potentially unbounded number of '01' substrings... so ??
- Need a technique for establishing nonregularity that is more formal and... less intuitive?

Proving nonregularity

- To prove that a language is nonregular, you have to show that no DFA whatsoever recognizes the language
- Not just the DFA that is your best effort at recognizing the language
- The pumping lemma can be used to do that
- The pumping lemma says that every regular language satisfies the "regular pumping property" (RPP)
- Given this, if we can show that a language like B doesn't satisfy the RPP, then it's not regular

Pumping lemma, informally

- Roughly: "if a regular language contains any 'long' strings, then it contains infinitely many strings"
- Start with a regular language and suppose that some DFA M = (Q, Σ, δ, q0, F) for it has |Q| = 10 states.
- What if M accepts some particular string s where s = c1c2...c15 so that |s| = 15?

[Figure: M starting in state q0]

Pigeonhole principle

- With 15 input characters, the machine will visit at most 16 states
- But there are only 10 states in this machine
- So clearly it will visit at least one of its states more than once
- Let rpt be our name for the first state that is visited multiple times on that particular input s
- Let acc be our name for the accepting state that s leads to, namely, δ*(q0,s) = acc
- Let y be our name for the leftmost substring of s for which δ*(rpt, y) = rpt
- Since there are no ε transitions in a DFA, a state being "visited multiple times" means that it read at least one character. Therefore, |y| > 0

[Figure: the sequence of states that M visits after reading the characters of s, with the repeated portion marked "> 0" and the prefix up to the repetition marked "≤ 10"]

After reading c1...c10 (first 10 chars of s), M must have already been to state rpt and returned to it at least once... because there are only 10 states in M. Of course the repetition could have been encountered earlier than 10 characters too...

[Figure: the same sequence of states, now with the pieces of s named x, y, and z]

Assigning new names to the pieces of s... So s = xyz as shown above. With these names, the other constraints can be written |y| > 0 and |xy| ≤ 10

M accepts other strings too

- Consider the string xz
- δ*(q0,x) = rpt
- δ*(rpt,z) = acc (from previous slide)
- So xz ∈ L(M) too

M accepts other strings too

- Consider the string xyyz
- δ*(q0,xy) = rpt (from 2 slides ago)
- δ*(rpt,y) = rpt (from same previous result)
- δ*(rpt,z) = acc (from same previous result)
- So xyyz ∈ L(M) also
- Apparently we can repeat y as many times as we want
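The pigeonhole argument is effectively an algorithm: run the DFA, remember where each state was first visited, and cut the input at the first repeat. The sketch below assumes a much smaller machine than the 10-state example, namely the 3-state multiple-of-3 DFA from earlier in these notes, and the function names are introduced here only for illustration.

```python
def pump_decomposition(delta, q0, s):
    """Split s into x, y, z where y is the first loop in the run on s.

    Returns None when no state repeats, which can only happen if s has
    fewer characters than the machine has states.
    """
    first_seen = {q0: 0}            # state -> characters read at its first visit
    q = q0
    for k, c in enumerate(s, start=1):
        q = delta[(q, c)]
        if q in first_seen:         # q is "rpt": the first state visited twice
            start = first_seen[q]
            return s[:start], s[start:k], s[k:]
        first_seen[q] = k
    return None


def accepts(delta, q0, F, s):
    q = q0
    for c in s:
        q = delta[(q, c)]
    return q in F


# Assumed example machine: state = value of the bits read so far, modulo 3.
DELTA = {(q, c): (2 * q + int(c)) % 3 for q in (0, 1, 2) for c in "01"}

x, y, z = pump_decomposition(DELTA, 0, "1001")      # 1001 is 9 in binary
print((x, y, z))
print([accepts(DELTA, 0, {0}, x + y * i + z) for i in range(5)])
```

Here |y| > 0 because at least one character is read between the two visits, and |xy| is at most the number of states because a repeat must occur by then.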

p-regular-pumpable strings

- Definition (not in textbook): A string s is said to be p-regular-pumpable in a language L ⊆ Σ* if there exist x,y,z ∈ Σ* such that
- (1) s = xyz ("x,y,z are a decomposition of s")
- (2) |y| > 0
- (3) |xy| ≤ p
- (4) For all i ≥ 0, x y^i z ∈ L ("the y part of s can be pumped to produce other strings in the language")
- It follows that s must be a member of L for it to be p-pumpable
- The 15-character string s in the previous example was 10-pumpable in L(M)

p-regular-pumpable languages

- Definition: A language L is p-regular-pumpable if
- for every s ∈ L such that |s| ≥ p, the string s is p-pumpable in L
- in other words, "every long enough string in L is pumpable"
- Our previous example language was 10-regular-pumpable (the same argument applies to every accepted string of length 10 or more)

RPP(p) and RPP

- Definition: RPP(p) is the class of languages that are p-regular-pumpable. In other words, RPP(p) = { L ⊆ Σ* | L is p-regular-pumpable }
- Definition: RPP is the class of languages that are p-regular-pumpable for some p. In other words, RPP = ⋃ { RPP(p) | p ≥ 0 }
- Lots of notation and apparent complexity, but the idea is simple: RPP is the class of languages in which every long string is pumpable

Pumping lemma

- Theorem 1.70 (rephrased): If L ⊆ Σ* is recognized by a p-state DFA, then L ∈ RPP(p)
- Proof: Just like our example, but use p instead of the constant 10 (number of states)
- Corollaries
- REG ⊆ RPP

Primary application of Pumping Lemma

Proving a language nonregular

- First unravel these definitions, but it amounts to proving that L is not a member of RPP. Then it follows that L isn't regular
- Proving that L isn't in RPP allows you to concentrate on the language rather than considering all possible proposed programs that might recognize it

Unraveling RPP: a direct rephrasing

- Rephrasing: L is a member of RPP if
- There exists p ≥ 0 such that
- For every s ∈ L satisfying |s| ≥ p,
- There exist x,y,z ∈ Σ* such that
- (1) s = xyz
- (2) |y| > 0
- (3) |xy| ≤ p
- (4) For all i ≥ 0, x y^i z ∈ L

(∃ p) (∀ s) (∃ x,y,z) (∀ i) !!! Pretty complicated

Question from last time

- (Question) Didn't you earlier say "regular languages are closed under concatenation"?
- (Answer) No, I wrote that REG is closed under concatenation
- Subtle but important distinction. REG (the class of all regular languages) is closed under language concatenation
- If A,B ∈ REG then A∘B ∈ REG
- That does not mean that each regular language is itself closed under string concatenation
- {10, 1} ∈ REG but 101 ∉ {10, 1}

Nonregularity proof by contradiction

- Claim: Let B = { 0^n 1^n | n ≥ 0 }. Then B is not regular
- Proof: We show that B is not a member of RPP by contradiction.
- So assume that B ∈ RPP (and hope to reach a contradiction soon). Then there exists p ≥ 0 associated with the definition in RPP.
- We let s = 0^p 1^p. (Not the exact same variable as in the RPP property, but an example of one such possible setting of it.) Now we know that s ∈ B because it has the right form.

Proof continued

- Now |s| = 2p ≥ p. By assumption that B ∈ RPP, there exist x,y,z such that
- (1) s = xyz (= 0^p 1^p, remember)
- (2) |y| > 0
- (3) |xy| ≤ p
- (4) For all i ≥ 0, x y^i z ∈ B
- Part (3) implies that xy ∈ 0* because the first p-many characters of s = xyz are all 0
- So y consists solely of '0' characters
- ... at least one of them, according to (2)

Proof continued

- But consider
- $s = xyz = xy^1z = 0^p 1^p$ (where we started)
- y consists of one or more '0' characters
- so $xy^2z$ contains more '0' characters than '1' characters. In other words,
- $xy^2z = 0^{p+|y|} 1^p$
- so $xy^2z \notin B = \{0^n 1^n \mid n \ge 0\}$.
- This contradicts part (4)!!
- Since the contradiction followed merely from the assumption that $B \in \mathrm{RPP}$ (and right and meet and true reasoning about which we have no doubt), that assumption must be wrong. QED
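The key step of the proof (every legal split of $0^p 1^p$ puts y inside the block of 0s, so pumping once more breaks the balance) can be sanity-checked by brute force for small p. This is only a sketch: the function names are made up for illustration, and a finite check is evidence for the argument, not a replacement for it.

```python
def in_B(w):
    """Membership test for B = { 0^n 1^n : n >= 0 }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def legal_splits(s, p):
    """Yield every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def pumping_up_always_leaves_B(p):
    """True if x y^2 z is outside B for every legal split of s = 0^p 1^p."""
    s = "0" * p + "1" * p
    return all(not in_B(x + y * 2 + z) for x, y, z in legal_splits(s, p))

checked = all(pumping_up_always_leaves_B(p) for p in range(8))
```

For p = 0 there are no legal splits at all, so the check passes vacuously.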

Observations

- We needed (and got) a contradiction that was a necessary consequence of the assumption that $B \in \mathrm{RPP}$ and then relied on the Theorem 1.70 corollaries
- RPP mainly concerns strings that are longer than p
- So you should concentrate on strings longer than p...
- even though p is a variable. But clearly $|0^p 1^p| > p$
- In our example we didn't "do" much after our initial choice of s; thinking about the implications, we found a contradiction right away
- Many other choices of s would work, but many don't, and even some that do work require more complex arguments, for example, $s = 0^{\lfloor p/2 \rfloor + 1}\, 1^{\lfloor p/2 \rfloor + 1}$
- Choosing s wisely is usually the most important thing

Picture so far

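[Venn diagram of nested language classes: FIN inside REG inside RPP inside ALL. Each point is a language in this Venn diagram.]

- In FIN: $\{0101, \varepsilon\}$
- In REG but not FIN: $0^*(101)^*$
- In RPP but not REG: we'll see an example later
- In ALL but not RPP: $B = \{0^n 1^n \mid n \ge 0\}$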

More on contradictions

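- Consider this shortcut attempt to prove that $B = \{0^n 1^n \mid n \ge 0\}$ is not regular
- Proof: Suppose $B \in \mathrm{RPP}$. By RPP,
- There exists $p \ge 0$ such that
- For every $s \in B$ satisfying $|s| \ge p$,
- There exist $x, y, z \in \Sigma^*$ such that
- (1) $s = xyz$
- (2) $|y| > 0$
- (3) $|xy| \le p$
- (4) For all $i \ge 0$, $x y^i z \in B$
- So let $s = (1010)^p$. Then $s \notin B$, which is inconsistent with the RPP statement.

Contradiction??

NO. The RPP statement only promises something about strings $s \in B$ with $|s| \ge p$, so exhibiting a string $s \notin B$ contradicts nothing.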

Simplifying RPP proofs

- I find it easier to forget about contradiction proofs and instead prove directly that a language is not in RPP
- So we need a direct, formal version of the statement that $L \notin \mathrm{RPP}$

Unraveling RPP (repeat)

- Rephrasing: L is a member of RPP if
- There exists $p \ge 0$ such that
- For every $s \in L$ satisfying $|s| \ge p$,
- There exist $x, y, z \in \Sigma^*$ such that
- (1) $s = xyz$
- (2) $|y| > 0$
- (3) $|xy| \le p$
- (4) For all $i \ge 0$, $x y^i z \in L$

$(\exists p)\,(\forall s)\,(\exists x,y,z)\,(\forall i)$ !!! Pretty complicated

Unraveling non-RPP

- Rephrasing: L is not in RPP if
- For every $p \ge 0$
- There exists some $s \in L$ satisfying $|s| \ge p$ such that
- For every $x, y, z \in \Sigma^*$ satisfying 1-3:
- (1) $s = xyz$,
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- There exists some $i \ge 0$ for which $x y^i z \notin L$

$(\forall p)\,(\exists s)\,(\forall x,y,z)\,(\exists i)$ Still complicated, but you don't have to use contradiction now
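Written side by side, the two statements are exact logical negations of each other: each quantifier flips and the innermost claim is negated. The display below is a sketch of that bookkeeping. Bundling conditions (1)-(3) into a single conjunction is a presentational choice made here, not notation from the slides.

```latex
\begin{align*}
L \in \mathrm{RPP} &\iff
  (\exists p \ge 0)\,(\forall s \in L,\ |s| \ge p)\,(\exists x,y,z \in \Sigma^*)\,
  \bigl[\, s = xyz \ \wedge\ |y| > 0 \ \wedge\ |xy| \le p \ \wedge\
  (\forall i \ge 0)\ x y^i z \in L \,\bigr] \\[4pt]
L \notin \mathrm{RPP} &\iff
  (\forall p \ge 0)\,(\exists s \in L,\ |s| \ge p)\,(\forall x,y,z \in \Sigma^*)\,
  \bigl[\, \bigl( s = xyz \ \wedge\ |y| > 0 \ \wedge\ |xy| \le p \bigr)
  \ \Rightarrow\ (\exists i \ge 0)\ x y^i z \notin L \,\bigr]
\end{align*}
```

The implication in the second line comes from negating the conjunction: "not (A and for all i, P)" is the same as "A implies there exists i with not P".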

A direct proof of nonregularity

- Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$ ('a' is just some character). Then D is not regular.
- Proof idea: The pumping lemma says there's a fixed-size loop in any DFA that accepts long strings. You can repeat the characters in that loop as many times as you want to get longer strings that the machine accepts. Each time you add a repetition you grow the pumped string by a constant length.
- But the spacing between strings in D above keeps changing; it's never constant. So D doesn't have the pumping property.

A direct proof of nonregularity

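- Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$. Then D is not in RPP and thus not regular.
- Proof: Let $p \ge 0$ and set $s = a^{(p+1)^2}$. Then $s \in D$ and $|s| > p$ (so such an s certainly exists).
- Now let $x, y, z \in \Sigma^*$ be any strings satisfying
- (1) $xyz = s = a^{(p+1)^2}$
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- Our goal is to produce some i such that $x y^i z \notin D$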

Direct proof continued

- (We'll actually show that $x y^0 z \notin D$)
- Observe that $y = a^j$ for some $1 \le j \le p$, so
- $|x y^0 z| = |a^{(p+1)^2 - j}| < (p+1)^2$
- Since $j \le p$ we know that $-j \ge -p$ and thus
- $|x y^0 z| = (p+1)^2 - j$
- $\ge (p+1)^2 - p$
- $= p^2 + p + 1$
- $> p^2$
- In other words, $x y^0 z$ has $> p^2$ characters and $< (p+1)^2$ characters. So $|x y^0 z|$ is not a perfect square and thus $x y^0 z \notin D$. QED
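The arithmetic at the heart of this proof is easy to spot-check numerically. The sketch below works only with lengths, since over a one-letter alphabet a string is determined by its length. The function names are illustrative, and the range of p tested is an arbitrary choice.

```python
from math import isqrt

def is_square(k):
    """True if k is a perfect square (0, 1, 4, 9, ...)."""
    return k >= 0 and isqrt(k) ** 2 == k

def pumped_down_lengths(p):
    """Lengths |x y^0 z| = (p+1)^2 - j for every possible |y| = j, 1 <= j <= p."""
    return [(p + 1) ** 2 - j for j in range(1, p + 1)]

def squeezed_between_squares(p):
    """Check p^2 < |x y^0 z| < (p+1)^2 for every legal |y|, so the length is no square."""
    return all(p ** 2 < n < (p + 1) ** 2 and not is_square(n)
               for n in pumped_down_lengths(p))

checked = all(squeezed_between_squares(p) for p in range(200))
```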

Direct or contradiction proof?

- Both work fine... it's your choice
- But you must clearly state what you are doing
- If proof by contradiction, say so
- If direct proof, say so

Game theory formulation

- The direct proof technique can be formulated as a two-player game
- You are the player who wants to establish that L is not pumpable
- Your opponent wants to make it difficult for you to succeed
- Both of you have to play by the rules

Game theory continued

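- The game has just four steps.
- Step 1: Your opponent picks $p \ge 0$
- Step 2: You pick $s \in L$ such that $|s| \ge p$
- Step 3: Your opponent chooses $x, y, z \in \Sigma^*$ such that $s = xyz$, $|y| > 0$, and $|xy| \le p$
- Step 4: You produce some $i \ge 0$ such that $x y^i z \notin L$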

Game theory continued

- If you are able to succeed through step 4, then you have won only one round of the game
- Like winning one round of Tic-tac-toe
- Do example for a member of D
- To show that a language is not in RPP you must show that you can always win, regardless of your opponent's legal moves
- Realize that the opponent is free to choose the most inconvenient or difficult p and x, y, z imaginable that are consistent with the rules

Game theory continued

- So you have to present a strategy for always winning and convincingly argue that it will always win
- So your choices in steps 2 and 4 have to depend on the opponent's choices in steps 1 and 3
- And you don't know what the opponent will choose
- So your choices need to be framed in terms of the variables p, x, y, z

Game theory continued

- Ultimately it is not very different from the direct proof
- But it states clearly what choices you may make and what you may not (a common cause of errors in proofs)
- Repeat previous proof in this framework
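The four-step game can be turned into a small test harness: a strategy is a pair of functions (your step 2 and step 4 choices), and the harness plays the opponent by trying every legal move. This is a sketch under stated assumptions. The names `strategy_wins`, `choose_s` and `choose_i` are hypothetical, and the harness only covers p up to `max_p`, so passing it is a sanity check on a strategy rather than a proof.

```python
from math import isqrt

def in_D(w):
    """Membership test for D = { a^(n^2) : n >= 0 }."""
    return set(w) <= {"a"} and isqrt(len(w)) ** 2 == len(w)

def opponent_moves(s, p):
    """Step 3: every legal (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def strategy_wins(in_L, choose_s, choose_i, max_p):
    """Play every round for p = 0..max_p; True only if you win all of them."""
    for p in range(max_p + 1):                  # step 1: opponent picks p
        s = choose_s(p)                         # step 2: you pick s
        if not (in_L(s) and len(s) >= p):
            return False                        # your step 2 move was illegal
        for x, y, z in opponent_moves(s, p):    # step 3: every opponent reply
            i = choose_i(p, x, y, z)            # step 4: you pick i
            if in_L(x + y * i + z):
                return False                    # the opponent wins this round
    return True

# The strategy from the direct proof: s = a^((p+1)^2) and always i = 0.
good = strategy_wins(in_D, lambda p: "a" * (p + 1) ** 2,
                     lambda p, x, y, z: 0, max_p=12)

# A careless strategy: always answering i = 1 just reproduces s, which is in D.
bad = strategy_wins(in_D, lambda p: "a" * (p + 1) ** 2,
                    lambda p, x, y, z: 1, max_p=12)
```

Note that `choose_s` sees only p and `choose_i` sees only p, x, y, z, which mirrors the rule that your choices must be framed in terms of the opponent's.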

A direct proof of nonregularity

- Let $D = \{a^{n^2} \mid n \ge 0\} = \{\varepsilon, a^1, a^4, a^9, \ldots\}$. Then D is not in RPP and thus not regular.
- Proof: [Step 1, opponent's choice] Let $p \ge 0$.
- [Step 2, your choice and reasoning] Set $s = a^{(p+1)^2}$. Then $s \in D$ and $|s| > p$ (so such an s certainly exists).
- [Step 3, opponent's choice] Now let $x, y, z \in \Sigma^*$ be any strings satisfying
- (1) $xyz = s = a^{(p+1)^2}$
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- Our goal is to produce some i such that $x y^i z \notin D$

Direct proof continued

- [Step 4, your choice] (We'll actually show that $x y^0 z \notin D$)
- [Step 4, your reasoning] Observe that $y = a^j$ for some $1 \le j \le p$, so
- $|x y^0 z| = |a^{(p+1)^2 - j}| < (p+1)^2$
- Since $j \le p$ we know that $-j \ge -p$ and thus
- $|x y^0 z| = (p+1)^2 - j$
- $\ge (p+1)^2 - p$
- $= p^2 + p + 1$
- $> p^2$
- In other words, $x y^0 z$ has $> p^2$ characters and $< (p+1)^2$ characters. So $|x y^0 z|$ is not a perfect square and thus $x y^0 z \notin D$. QED

Unraveling RPP (repeat)

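- Rephrasing: L is a member of RPP if
- There exists $p \ge 0$ such that
- For every $s \in L$ satisfying $|s| \ge p$,
- There exist $x, y, z \in \Sigma^*$ such that
- (1) $s = xyz$
- (2) $|y| > 0$
- (3) $|xy| \le p$
- (4) For all $i \ge 0$, $x y^i z \in L$
- Theorem: $\mathrm{REG} \subseteq \mathrm{RPP}$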

Structural facts about RPP

- (1) If $L \in \mathrm{RPP}(p)$ (meaning "strings in L with length $\ge p$ are pumpable") and $q > p$ then $L \in \mathrm{RPP}(q)$
- (2) If $L \notin \mathrm{RPP}(q)$ and $q > p$ then $L \notin \mathrm{RPP}(p)$ (contrapositive of 1)
- Thus if you have a proof that establishes $L \notin \mathrm{RPP}(q)$ only when $q \ge 5$, that's good enough: it follows that L is not regular
- Relevant for the "C is not regular" problem

Structural facts about RPP

- If $L \in \mathrm{FIN}$ and the longest string in L has length n, then
- $L \in \mathrm{RPP}(n+1)$
- $L \notin \mathrm{RPP}(q)$ for all $q < n+1$
- Note: RPP is a class of languages that's only interesting because of its relation to REG. It is not a reasonable proposal for a computation model!
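For a finite language both claims can be checked mechanically, because "for all i" only needs to be tested up to the point where the pumped string is longer than anything in L. The sketch below assumes RPP(p) means the pumping conditions hold with pumping length exactly p (including $|xy| \le p$). The function names are illustrative, and the example language $\{0101, \varepsilon\}$ is the finite one from the earlier picture.

```python
def splits(s, p):
    """Every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def in_rpp_p(L, p):
    """Decide whether the finite language L (a set of strings) is in RPP(p)."""
    n = max(map(len, L), default=0)   # length of the longest string in L

    def pumpable(s):
        # x y^(n+2) z is already longer than n, so testing i = 0..n+2
        # is enough to settle "for all i" when L is finite.
        return any(all(x + y * i + z in L for i in range(n + 3))
                   for x, y, z in splits(s, p))

    return all(pumpable(s) for s in L if len(s) >= p)

L = {"0101", ""}                      # longest string has length n = 4
results = {p: in_rpp_p(L, p) for p in range(7)}
```

With n = 4, the dictionary `results` is false for p = 0..4 and true from p = 5 on: RPP(n+1) holds only because no string of L is long enough to be tested.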

Unraveling non-RPP (repeat)

- L is not in RPP if
- For every $p \ge 0$ (opponent's choice)
- There exists some $s \in L$ satisfying $|s| \ge p$ such that (your choice)
- For every $x, y, z \in \Sigma^*$ satisfying 1-3: (opponent's)
- (1) $s = xyz$,
- (2) $|y| > 0$, and
- (3) $|xy| \le p$
- There exists some $i \ge 0$ for which $x y^i z \notin L$ (yours)

Another example

- Let $C = \{0^m 1^n \mid m \ne n\}$. Is C regular? Try to prove it isn't
- Set $s = 0^p 1^{2p}$. If opponent chooses $x = \varepsilon$, $y = 0^p$, $z = 1^{2p}$, then we can set $i = 2$ and win because $x y^2 z = 0^{2p} 1^{2p} \notin C$.
- What if opponent chooses a shorter y?
- Looks like it's relatively easy to be a member of C and hard to not be a member of C
- Can force opponent to choose $y \in 0^*$
- So try to arrange it so that no matter what $|y|$ is, some number of repetitions of it will match the target number of '1's

Direct proof?

- Hmmm
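One standard way to carry out the idea of making every possible $|y|$ "match the target number of 1s" uses factorials. This particular choice is not taken from these slides and is offered only as a sketch: pick $s = 0^p 1^{p + p!}$, and when the opponent's y has length j (with $1 \le j \le p$, so j divides $p!$), answer $i = 1 + p!/j$. Pumping then adds exactly $p!$ zeros, giving $0^{p+p!} 1^{p+p!} \notin C$. The code below checks this for small p. All function names are illustrative.

```python
from math import factorial

def in_C(w):
    """Membership test for C = { 0^m 1^n : m != n }."""
    m = len(w) - len(w.lstrip("0"))      # number of leading 0s
    rest = w[m:]
    return set(rest) <= {"1"} and m != len(rest)

def legal_splits(s, p):
    """Every (x, y, z) with s = xyz, |y| > 0 and |xy| <= p."""
    for end_xy in range(1, min(p, len(s)) + 1):
        for start_y in range(end_xy):
            yield s[:start_y], s[start_y:end_xy], s[end_xy:]

def choose_s(p):
    """Step 2 (assumed strategy): s = 0^p 1^(p + p!)."""
    return "0" * p + "1" * (p + factorial(p))

def choose_i(p, y):
    """Step 4 (assumed strategy): i = 1 + p!/|y|; |y| divides p! since 1 <= |y| <= p."""
    return 1 + factorial(p) // len(y)

def wins_for(p):
    s = choose_s(p)
    if not (in_C(s) and len(s) >= p):
        return False
    return all(not in_C(x + y * choose_i(p, y) + z)
               for x, y, z in legal_splits(s, p))

checked = all(wins_for(p) for p in range(7))
```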

Using closure properties

- Can simplify argument a great deal
- Fact: If L is not regular then $L^c$ is not regular either.
- Proof: If L is not regular but $L^c$ were regular, then $(L^c)^c$ would also be regular because REG is closed under complement. But $(L^c)^c = L$. QED
- Recall the languages $B = \{0^m 1^n \mid m = n\}$ and $C = \{0^m 1^n \mid m \ne n\}$. C is similar to $B^c$...

Using closure properties

- Start over
- $B = \{0^m 1^n \mid m = n\}$ (known nonreg), $C = \{0^m 1^n \mid m \ne n\}$ (suspected nonreg)
- Certainly $B \subseteq C^c$
- If $m = n$ then it's true that (not $m \ne n$)
- But $B \ne C^c$
- Find example $x \in C^c - B$...
- On the other hand, $B = 0^*1^* \cap C^c$
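These set relations can be checked by enumerating all binary strings up to a small length. The sketch below assumes the alphabet is {0, 1} and uses a length bound of 10, both illustrative choices. It also finds a shortest example $x \in C^c - B$.

```python
import re
from itertools import product

def in_B(w):
    """Membership test for B = { 0^m 1^n : m = n }."""
    n = len(w) // 2
    return w == "0" * n + "1" * n

def in_C(w):
    """Membership test for C = { 0^m 1^n : m != n }."""
    m = len(w) - len(w.lstrip("0"))
    rest = w[m:]
    return set(rest) <= {"1"} and m != len(rest)

def words(max_len):
    """All strings over {0, 1} of length at most max_len."""
    for k in range(max_len + 1):
        for letters in product("01", repeat=k):
            yield "".join(letters)

ZEROS_THEN_ONES = re.compile(r"0*1*")

universe = list(words(10))
B = {w for w in universe if in_B(w)}
C_complement = {w for w in universe if not in_C(w)}
zeros_then_ones = {w for w in universe if ZEROS_THEN_ONES.fullmatch(w)}

subset_holds = B <= C_complement                    # B is a subset of C^c
witness = min(C_complement - B, key=len)            # some x in C^c - B
identity_holds = (zeros_then_ones & C_complement) == B
```

The witness is a string that is not of the form $0^m 1^n$ at all, which is why intersecting with $0^*1^*$ repairs the mismatch.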

Using closure properties

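- Fact: If $L_1 \cap L_2 \notin \mathrm{REG}$ and $L_1 \in \mathrm{REG}$, then $L_2 \notin \mathrm{REG}$
- Proof: Suppose (a) $L_1 \cap L_2 \notin \mathrm{REG}$ and $L_1 \in \mathrm{REG}$, and (b) $L_2 \in \mathrm{REG}$. Since REG is closed under $\cap$ we know that $L_1 \cap L_2 \in \mathrm{REG}$, but that contradicts assumption (a). Thus (a) and (b) can't both be true. QED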

Topics for Exam 1

- Basic objects
- The main hierarchy: alphabets, strings, languages, classes
- Functions
- Relations
- Sets and operations on sets
- $\cup$, $\cap$, complement, $\times$, $P(S)$, $A - B$, $|S|$
- $\subseteq$, $\in$
- $\{\, \text{element} \mid \text{predicate(element)} \,\}$
- Propositional and predicate logic
- $\forall$ and $\exists$

Topics for Exam 1

- Strings
- $\varepsilon$ versus $\emptyset$
- Operations on strings: concatenation, exponentiation, reversal
- Languages
- Operations: concatenation, exponentiation, reversal, $\cup$, $\cap$, $*$, complement, everything applicable to sets, $\varepsilon$ versus $\emptyset$
- Language classes
- FIN, REG, ALL

Topics for Exam 1

- REG and its many formulations
- DFA, NFA, GNFA, UFA, REX
- Syntax and semantics of each model
- L() as program-to-language operator
- Conversions between models
- Subset construction for NFA, UFA
- DFA $\to$ GNFA $\to$ REX
- REX $\to$ NFA

Topics for Exam 1

- Closure properties of language classes
- REG as a reasonable model of computation
- Arguments for, against
- Homework problems through homework 3
- Lectures and reading up through section 1.3 (excluding nonregularity)

Exam 1

- You may bring and consult a single-sided, handwritten sheet of notes, which you must turn in with the exam (and will get back later)

Applying these closure properties

- $B = \{0^m 1^n \mid m = n\}$, $C = \{0^m 1^n \mid m \ne n\}$
- $0^*1^* \cap C^c = B$, where $0^*1^*$ is obviously regular and B is known to be nonregular; $C^c$ is therefore nonregular
- Thus C is nonregular too

Another closure properties attempt

- $B = \{0^m 1^n \mid m = n\} = \{0^n 1^n \mid n \ge 0\}$ (known nonreg)
- $BB = \{0^n 1^n 0^m 1^m \mid n, m \ge 0\}$
- Want to show that $BB \notin \mathrm{REG}$
- We know that REG is closed under language concatenation. What does that say about whether BB is regular or not?
- Is the class of non-regular languages ($\mathrm{REG}^c$) closed under language concatenation too?

No

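- Let $\Sigma = \{a\}$ and $D = \{a^{n^2} \mid n \ge 2\}$
- Then $D^c = \{a^k \mid k \le 1 \text{ or } k \text{ is not a square}\} = \{\varepsilon, a^1, a^2, a^3, a^5, a^6, a^7, a^8, a^{10}, \ldots\}$
- We previously proved that $D \notin \mathrm{REG}$
- Thus $D^c \notin \mathrm{REG}$ (by "fact" we proved)
- But $D^c D^c = a^* \in \mathrm{REG}$ !!!
- Thus $\mathrm{REG}^c$ is not closed under language concatenation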
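The claim $D^c D^c = a^*$ says that every length k can be written as a sum of two lengths that both lie outside D. For a square $k = n^2$ with $n \ge 2$, the split $1 + (n^2 - 1)$ works because $n^2 - 1$ is never a square. Every other k can use $k + 0$. The sketch below checks this numerically, representing each string $a^k$ by its length k. The bound 2000 and the function names are arbitrary illustrative choices.

```python
from math import isqrt

def in_D(k):
    """a^k is in D = { a^(n^2) : n >= 2 } exactly when k is a square and k >= 4."""
    return k >= 4 and isqrt(k) ** 2 == k

def in_DcDc(k):
    """a^k is in D^c D^c iff k = i + j with a^i and a^j both outside D."""
    return any(not in_D(i) and not in_D(k - i) for i in range(k + 1))

N = 2000
covers_everything = all(in_DcDc(k) for k in range(N + 1))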

Back to problem

- $B = \{0^m 1^n \mid m = n\}$ (known nonreg), $BB = \{0^n 1^n 0^m 1^m \mid n, m \ge 0\}$
- Want to show that $BB \notin \mathrm{REG}$
- But there's no general result for that
- When applying a closure property, you have to make sure it's true!
- Nonetheless, it is true that $BB \notin \mathrm{REG}$
- Because $(BB) \cap 0^*1^* = B$

Chapter 1 closing considerations

- We don't and won't have many results about the class $\mathrm{REG}^c$
- Being nonregular says that the language lacks a certain type of structure: it's more complicated than a DFA can handle
- All real computers are finite devices and all finite languages are regular
- Yet the programming models are brittle: the program has to change for larger and larger inputs
- We've seen some easy-to-specify languages that aren't regular
- So REG is not a good general-purpose programming model...?