CSE 3813 Introduction to Formal Languages and Automata - PowerPoint PPT Presentation

Provided by: genebo

1
CSE 3813 Introduction to Formal Languages and
Automata
  • Chapter 8
  • Properties of Context-free Languages
  • These class notes are based on material from our
    textbook, An Introduction to Formal Languages and
    Automata, 4th ed., by Peter Linz, published by
    Jones and Bartlett Publishers, Inc., Sudbury, MA,
    2006. They are intended for classroom use only
    and are not a substitute for reading the textbook.

2
The pumping lemma for context-free languages
  • Suppose you have a CFG G in which the variable
    A is used in two different rules, to derive two
    different strings, e.g.,
  • (1) $S \to vAz$
  • (2) $A \to wAy$
  • (3) $A \to x$
  • We can use these rules, applying rule 2
    recursively, to generate the following string:
  • $S \Rightarrow vAz \Rightarrow vwAyz \Rightarrow vwwAyyz \Rightarrow vwwwAyyyz \Rightarrow \cdots
    \Rightarrow vw^nxy^nz$.

3
The pumping lemma for CFLs
  • Of course, we can apply rule 3 at any point along
    the way to bring the process to a halt. Thus,
    the following strings are all legitimate strings
    in the language
  • vwxyz, vwwxyyz, vwwwxyyyz, etc.
  • In fact, with rules 2 and 3 in the grammar,
    there is no way to prevent the language from
    containing an infinite number of strings of the
    form $vw^nxy^nz$.
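A minimal sketch of the derivation above, assuming the five pieces v, w, x, y, z are lowercase terminal strings (so they never contain the variable names S or A). The function name and the sample terminals are hypothetical, chosen only for illustration.

```python
def pumped_string(v: str, w: str, x: str, y: str, z: str, n: int) -> str:
    """Derive v w^n x y^n z: rule (1) once, rule (2) n times, then rule (3)."""
    form = "S"
    form = form.replace("S", v + "A" + z)        # (1) S -> vAz
    for _ in range(n):
        form = form.replace("A", w + "A" + y)    # (2) A -> wAy
    return form.replace("A", x)                  # (3) A -> x

# Hypothetical terminal strings, chosen only for illustration.
for n in range(4):
    print(pumped_string("a", "b", "c", "d", "e", n))
# prints: ace, abcde, abbcdde, abbbcddde (one per line)
```

Every value of n gives a different string, which is why the language must be infinite once rules 2 and 3 are present.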

4
The pumping lemma for CFLs
  • Remember the definition of Chomsky Normal Form
    grammars: a CFG is in Chomsky Normal Form (CNF) if
    every production is of one of these two types:
  • $A \to BC$
  • $A \to a$
  • Remember also that we can put any CFG
    into CNF (omitting the null string, if it belongs
    to the original language).
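The two production types can be checked mechanically. The sketch below assumes a grammar is given as a set of variable names plus a list of (head, body) pairs, with each body a tuple of symbols; this representation and the name is_cnf are illustrative assumptions, not notation from the textbook.

```python
def is_cnf(variables, productions):
    """Return True if every production has the form A -> BC or A -> a.

    variables   -- set of variable names
    productions -- iterable of (head, body) pairs, body a tuple of symbols
    """
    for head, body in productions:
        if head not in variables:
            return False
        two_variables = len(body) == 2 and all(s in variables for s in body)
        one_terminal = len(body) == 1 and body[0] not in variables
        if not (two_variables or one_terminal):
            return False
    return True

variables = {"S", "A", "B", "X"}
# A CNF grammar for a^n b^n (n >= 1): S -> AB | AX, X -> SB, A -> a, B -> b
cnf_rules = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B")),
             ("A", ("a",)), ("B", ("b",))]
# The more familiar grammar S -> aSb | ab is not in CNF.
non_cnf_rules = [("S", ("a", "S", "b")), ("S", ("a", "b"))]
print(is_cnf(variables, cnf_rules))      # True
print(is_cnf(variables, non_cnf_rules))  # False
```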

5
The pumping lemma for CFLs
  • If a grammar is in CNF, then its derivation tree
    will be binary; that is, every node will have at
    most two children. Why? There are only 3
    possibilities:
  • (1) The node represents the first type of rule
    above, in which a single variable produces two
    variables.
  • (2) The node represents the second type of rule
    above, in which a single variable produces a
    single terminal.
  • (3) The node is a terminal node and so has no
    children.

6
The pumping lemma for CFLs
  • A path in a binary tree is either empty, or
    consists of a node, one of its descendants, and
    all of the nodes in between.
  • The length of a path is the number of nodes it
    contains. (For this class we will use this
    definition; however, most of the time length and
    height are measured in the number of edges, not
    the number of nodes.)
  • The height of a binary tree is the length of its
    longest path.

7
The pumping lemma for CFLs
  • You could create a very tall binary tree by
    having all branches be unary.
  • You can create the shortest possible binary
    tree by having all of its branches be binary,
    except possibly for some or all of the branches
    at the bottom level of the tree.

8
The pumping lemma for CFLs
  • What is the smallest height possible in a
    binary tree of 7 nodes? How many leaf nodes does
    it have?

  • height 3
  • num. leaves 4

9
The pumping lemma for CFLs
  • What is the smallest height possible in a binary
    tree of 15 nodes? How many leaf nodes does it
    have?

  • height 4
  • num. leaves 8

10
The pumping lemma for CFLs
  • What is the smallest height possible in a
    binary tree of 31 nodes? How many leaf nodes
    does it have?

  • height 5
  • num. leaves 16

11
The pumping lemma for CFLs
  • What is the smallest height possible in a binary
    tree of $2^n - 1$ nodes? How many leaf nodes does
    it have?
  • height $n$
  • num. leaves $2^{n-1}$
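The answers on the last few slides can be reproduced with a short calculation. This is a sketch under the class convention that height is counted in nodes; the function names are hypothetical.

```python
def min_height(num_nodes: int) -> int:
    """Smallest height (counted in nodes) of a binary tree with num_nodes >= 1 nodes."""
    # A tree of height h holds at most 2**h - 1 nodes, so we need
    # 2**h >= num_nodes + 1; for positive integers that is bit_length().
    return num_nodes.bit_length()

def leaves_when_full(height: int) -> int:
    """Number of leaves in a completely filled binary tree of the given height."""
    return 2 ** (height - 1)

for nodes in (7, 15, 31):
    h = min_height(nodes)
    print(nodes, h, leaves_when_full(h))
# prints: 7 3 4, then 15 4 8, then 31 5 16
```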

12
The pumping lemma for CFLs
  • Note the pattern here:
  • In a completely filled binary tree with $2^n - 1$
    nodes, half of the nodes (rounding up) will be
    leaves. That is, $2^n / 2$ nodes will be leaf
    nodes. And we can rewrite $2^n / 2$ as $2^{n-1}$.
  • This leads us to the following lemma.

13
The pumping lemma for CFLs
  • Lemma
  • For any $h \ge 1$, a binary tree which has more than
    $2^{h-1}$ leaf nodes must have a height greater than
    $h$.
  • Example
  • If a binary tree has 17 leaf nodes, can it have
    a height of 5?
  • No. A complete binary tree of height 5 has only
    16 leaf nodes. A binary tree with 17 leaves must
    have a height greater than 5.
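The lemma follows from a simple recurrence: the root has at most two subtrees, each one level shorter. The sketch below checks that recurrence against the closed form $2^{h-1}$ for small heights; it is an illustration, not a proof, and the function name is hypothetical.

```python
def max_leaves(height: int) -> int:
    """Most leaves a binary tree of the given height (counted in nodes) can have."""
    if height == 1:
        return 1            # a single node is its own leaf
    # The root has at most two subtrees, each of height at most height - 1.
    return 2 * max_leaves(height - 1)

for h in range(1, 11):
    assert max_leaves(h) == 2 ** (h - 1)

print(max_leaves(5))   # 16, so 17 leaves force a height greater than 5
```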

14
The pumping lemma for CFLs
  • Here is the point of all this:
  • If the derivation tree for a given string in the
    language has height $h$, then its longest path
    contains $h - 1$ variable nodes followed by one
    terminal leaf. If the grammar has fewer than
    $h - 1$ variables, then at least one variable
    must recur on that path in the derivation of
    this string.

15
The pumping lemma for CFLs
  • For a variable to recur farther down in the same
    path, it must be either
  • self-recursive (e.g., $A \to aA$)
  • or
  • path-recursive (e.g., $A \to aB$ and $B \to bA$)
  • In either case, this variable may be pumped an
    unrestricted number of times.

16
Theorem 8.1
  • Let L be a CFL. Then there is an integer m so
    that for any $w \in L$ satisfying $|w| \ge m$, there are
    strings u, v, x, y, and z satisfying
  • $w = uvxyz$
  • $|vy| > 0$
  • $|vxy| \le m$
  • for any $i \ge 0$, $uv^ixy^iz \in L$

17
The pumping lemma for CFLs
  • We can use the pumping lemma for context-free
    languages to prove that a given language is
    not context-free.
  • We do this by assuming that the language is
    context-free; this means that there must be an m
    satisfying the conditions given above.
  • If we find that this causes a contradiction,
    then we know the language can't be a CFL.

18
Proof
  • Given the language $L = \{a^i b^i c^i : i \ge 1\}$, assume
    that L is context-free.
  • Let $w = a^m b^m c^m$, so that $|w| \ge m$.
  • According to Theorem 8.1, $|vy| > 0$. Thus, v
    and y together must contain at least one type of
    symbol.
  • According to Theorem 8.1, $|vxy| \le m$. Since the
    a's and the c's in w are separated by m b's, the
    string vxy can contain at most two distinct types
    of symbols.

19
Proof
  • The string vxy can't contain all three symbols,
    a, b, and c. (Why? Because $|vxy| \le m$.)
  • The string $uv^2xy^2z$ contains additional
    occurrences of the symbols in v and y, but no
    additional occurrences of the symbol that is
    missing from vxy.
  • Therefore, $uv^2xy^2z$ cannot contain equal numbers
    of all three symbols.
  • But the pumping lemma says that $uv^2xy^2z$ must be
    a legitimate string in L. This is a
    contradiction.
  • Consequently, L cannot be a context-free
    language.
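For a small, fixed m the argument can be checked by brute force: try every split w = uvxyz that meets the conditions of Theorem 8.1 and test whether pumping stays in L. The sketch below is only an illustration for particular values of m, not a replacement for the proof; the function names and the max_i cutoff are assumptions made for this example.

```python
import re

def in_L(s: str) -> bool:
    """Membership in L = { a^i b^i c^i : i >= 1 }."""
    match = re.fullmatch(r"(a+)(b+)(c+)", s)
    return bool(match) and len(match[1]) == len(match[2]) == len(match[3])

def pumpable_split(w: str, m: int, member, max_i: int = 3):
    """Return a split (u, v, x, y, z) with |vy| > 0 and |vxy| <= m whose pumped
    strings u v^i x y^i z are members for i = 0..max_i, or None if there is none."""
    n = len(w)
    for start in range(n):                                   # vxy = w[start:end]
        for end in range(start + 1, min(start + m, n) + 1):
            for p in range(start, end + 1):                  # v = w[start:p]
                for q in range(p, end + 1):                  # x = w[p:q], y = w[q:end]
                    u, v, x, y, z = w[:start], w[start:p], w[p:q], w[q:end], w[end:]
                    if not v and not y:
                        continue                             # need |vy| > 0
                    if all(member(u + v * i + x + y * i + z) for i in range(max_i + 1)):
                        return (u, v, x, y, z)
    return None

m = 4
w = "a" * m + "b" * m + "c" * m
print(pumpable_split(w, m, in_L))   # None: no legal split survives pumping
```

A result of None is conclusive for that particular w and m, because the lemma demands membership for every i. A split that is returned has only been checked up to max_i.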

20
Example
  • Given the language $L = \{a^i b^i c^i : i \ge 1\}$, how
    would you try to process this language using a
    push-down automaton?
  • We can ensure that we have an equal number of a's
    and b's by pushing the a's onto the stack one at
    a time, then popping them off and matching them
    up with the b's one by one.

21
Example
  • However, once we have done that, we don't have
    anything left to match the c's with, so we can't
    guarantee that we have the same number of c's as
    a's and b's.
  • We can't solve this problem by pushing a's or
    b's back onto the stack.
  • This is due to the limitations of the type of
    memory we have in a PDA.
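The stack behaviour described above can be sketched with an ordinary Python list standing in for the PDA stack. This is a simplified illustration, not a full PDA simulation, and the function name is hypothetical.

```python
def match_as_with_bs(s: str):
    """Push every leading a, then pop one a for each following b.

    Returns (stack, unread): the stack contents afterwards and the unread input.
    """
    stack, pos = [], 0
    while pos < len(s) and s[pos] == "a":
        stack.append("a")
        pos += 1
    while pos < len(s) and s[pos] == "b" and stack:
        stack.pop()
        pos += 1
    return stack, s[pos:]

print(match_as_with_bs("aaabbbccc"))  # ([], 'ccc')
print(match_as_with_bs("aaabbbcc"))   # ([], 'cc')
```

In both runs the stack is empty by the time the c's are read, so nothing remains that records how many a's and b's there were.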

22
Pumping lemma (again)
  • The pumping lemma for regular languages states:
    every sufficiently long string in a regular
    language contains a short substring that can be
    pumped.
  • The pumping lemma for context-free languages
    states: every sufficiently long string in a
    context-free language contains two short (and
    close-together) substrings that can be pumped
    (the same number of times).

23
Formal statement (again)
Let L be a context-free language. Then there
exists some positive integer m such that any
string $w \in L$ of length $|w| \ge m$ can be decomposed
into substrings u, v, x, y, z such that
$w = uvxyz$, $|vxy| \le m$, $|v| > 0$ or $|y| > 0$, and
$uv^kxy^kz \in L$ for all $k \ge 0$.
24
Informal statement
Every context-free language has a pumping
length such that every string in the language
that is longer than this can be pumped to yield
another string in the language. The string can
be divided into five parts such that the second
and fourth parts can be repeated together, or
pumped, any number of times, and the resulting
string remains in the language.
25
What is m?
In the pumping lemma for regular languages, the
pumping length m reflects the number of states
of the finite automaton. In the pumping lemma
for context-free languages, what does m reflect?
Roughly, it is the length of the longest string
that can be generated by a parse tree in which
the same nonterminal never occurs twice on the
same path through the tree.
26
In a sufficiently large parse tree, some
nonterminal must repeat along some path from the
root. This follows from the pigeonhole principle.
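[Figure: parse tree rooted at S in which the nonterminal A occurs twice on one path; the leaves, read left to right, spell out u v x y z.]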
27
Proof Idea
  • The repetition of some nonterminal along a path
    through the parse tree allows us to replace the
    subtree under the last occurrence of the
    nonterminal with the subtree under an earlier
    occurrence of the nonterminal and still get a
    valid parse tree.
  • This corresponds to pumping v and y.
  • Note that the parse tree of the previous slide
    corresponds to the following derivation:
  • $S \Rightarrow^* uAz \Rightarrow^* uvAyz \Rightarrow^* uvxyz$

28
Important to remember
You can use a pumping lemma to prove that a
language is not context-free (or regular). You
cannot use a pumping lemma to prove that a
language is context-free (or regular).
29
Exercise
The language $L = \{ww : w \in \{a, b\}^*\}$ is not
context-free. Pick a string in L. Try $a^m b^m a^m b^m$.
Then note that you must consider three cases.
Since $|vxy| \le m$, it must be the case that vxy is a substring of
the prefix $a^m b^m$, or the middle $b^m a^m$, or the
suffix $a^m b^m$. Intuitively, why can't a PDA accept
this language, although it can accept the
language $\{ww^R : w \in \{a, b\}^*\}$?
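One way to build intuition for the question above is to compare last-in-first-out memory with first-in-first-out memory. The sketch below makes two simplifying assumptions: the midpoint is computed from the length (a real PDA would guess it nondeterministically), and Python's list and collections.deque stand in for a stack and a queue.

```python
from collections import deque

def is_w_wr(s: str) -> bool:
    """Check s = w w^R with a stack: popping returns the first half reversed."""
    if len(s) % 2:
        return False
    half = len(s) // 2
    stack = list(s[:half])
    return all(stack.pop() == ch for ch in s[half:])

def is_w_w(s: str) -> bool:
    """Check s = w w with a queue: the first half must come back in original order."""
    if len(s) % 2:
        return False
    half = len(s) // 2
    queue = deque(s[:half])
    return all(queue.popleft() == ch for ch in s[half:])

print(is_w_wr("abba"), is_w_wr("abab"))  # True False
print(is_w_w("abab"), is_w_w("abba"))    # True False
```

Matching the second copy of w needs the symbols in the order they were stored, which a stack cannot provide without destroying what lies on top.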
30
Definition 8.1 Linear Languages
A context-free language L is said to be linear if
there exists a linear context-free grammar G such
that $L = L(G)$. (Remember that a linear grammar
has at most one variable on the right side of
each production rule.)
31
Theorem 8.2 Pumping Lemma for Linear Languages
Let L be an infinite linear language. Then there
exists some positive integer m such that any
$w \in L$ with $|w| \ge m$ can be decomposed as $w = uvxyz$
with $|uvyz| \le m$ and $|vy| \ge 1$, such that
$uv^ixy^iz \in L$ for all $i = 0, 1, 2, \ldots$
32
Pumping Lemma for Linear Languages
Note that the conclusion of this theorem is
different from that of Theorem 8.1, since in Theorem 8.1 we
have $|vxy| \le m$ and in Theorem 8.2 we
have $|uvyz| \le m$. This implies that the strings v
and y to be pumped must now be within m symbols
of the left and right ends of w, respectively.
The middle string x can be of arbitrary
length. Theorem 8.2 helps establish the fact that
the family of linear languages is a proper subset
of the family of context-free languages.
33
Closure properties for context-free languages
The family of context-free languages is closed
under the operations of
  • Union
  • Concatenation
  • Kleene closure
but not under the operations of
  • Intersection
  • Complementation
34
Definition
  • A context-free grammar (CFG) is a 4-tuple
  • $G = (V, T, S, P)$ where V and T are disjoint
    sets, $S \in V$, and P is a finite set of rules of
    the form $A \to x$, where $A \in V$ and $x \in (V \cup T)^*$.
  • V: non-terminals or variables
  • T: terminals
  • S: start symbol
  • P: productions or grammar rules

35
Closure properties of CFLs
  • CFLs are closed under Union, Concatenation and
    Kleene closure.
  • Proof by construction
  • Let
  • $G_1 = (V_1, T_1, S_1, P_1)$ and
  • $G_2 = (V_2, T_2, S_2, P_2)$
  • with
  • $L_1 = L(G_1)$ and
  • $L_2 = L(G_2)$

36
Union
  • We create grammar $G_u = (V_u, T_1 \cup T_2, S_u, P_u)$
    generating
  • $L_1 \cup L_2$
  • 1. Rename the elements of $V_2$ if necessary so
    that $V_1 \cap V_2 = \emptyset$.
  • 2. Create a new start symbol $S_u$, not already in
    $V_1$ or $V_2$.
  • 3. Set $V_u = V_1 \cup V_2 \cup \{S_u\}$
  • 4. Set $P_u = P_1 \cup P_2 \cup \{S_u \to S_1 \mid S_2\}$
  • Construction completed.

37
Concatenation
  • We create grammar $G_c = (V_c, T_1 \cup T_2, S_c, P_c)$
    generating $L_1L_2$
  • 1. Rename the elements of $V_2$ if necessary so
    that $V_1 \cap V_2 = \emptyset$.
  • 2. Create a new start symbol $S_c$, not already in
    $V_1$ or $V_2$.
  • 3. Set $V_c = V_1 \cup V_2 \cup \{S_c\}$
  • 4. Set $P_c = P_1 \cup P_2 \cup \{S_c \to S_1S_2\}$
  • Construction completed.
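The union and concatenation constructions can be sketched in code. Everything here is an illustrative assumption rather than textbook notation: the Grammar class, the "_1"/"_2" renaming suffixes, the start-symbol names "S_u" and "S_c", and the small language() helper, which only works for grammars without λ-productions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    variables: frozenset    # V
    terminals: frozenset    # T
    start: str              # S
    productions: frozenset  # P, as (head, body) pairs; body is a tuple of symbols

def rename(g: Grammar, suffix: str) -> Grammar:
    """Step 1: tag every variable so that two grammars share no variable names."""
    def tag(sym):
        return sym + suffix if sym in g.variables else sym
    return Grammar(
        frozenset(tag(v) for v in g.variables),
        g.terminals,
        tag(g.start),
        frozenset((tag(h), tuple(tag(s) for s in body)) for h, body in g.productions),
    )

def combine(g1, g2, new_start, start_bodies):
    g1, g2 = rename(g1, "_1"), rename(g2, "_2")
    assert new_start not in g1.variables | g2.variables        # step 2
    return Grammar(
        g1.variables | g2.variables | {new_start},               # step 3
        g1.terminals | g2.terminals,
        new_start,
        g1.productions | g2.productions                          # step 4
        | {(new_start, body) for body in start_bodies(g1.start, g2.start)},
    )

def union(g1, g2):
    return combine(g1, g2, "S_u", lambda s1, s2: [(s1,), (s2,)])   # S_u -> S_1 | S_2

def concatenation(g1, g2):
    return combine(g1, g2, "S_c", lambda s1, s2: [(s1, s2)])       # S_c -> S_1 S_2

def language(g: Grammar, max_len: int) -> set:
    """Strings of length <= max_len, found by leftmost derivation.
    Assumes no λ-productions, so sentential forms never shrink."""
    seen, frontier, words = set(), [(g.start,)], set()
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in g.variables), None)
        if idx is None:
            words.add("".join(form))
            continue
        for head, body in g.productions:
            if head == form[idx]:
                frontier.append(form[:idx] + body + form[idx + 1:])
    return words

# Example grammars: g1 generates a^n b^n, g2 generates c^k (n, k >= 1).
g1 = Grammar(frozenset({"S"}), frozenset("ab"), "S",
             frozenset({("S", ("a", "S", "b")), ("S", ("a", "b"))}))
g2 = Grammar(frozenset({"S"}), frozenset("c"), "S",
             frozenset({("S", ("c", "S")), ("S", ("c",))}))
print(sorted(language(union(g1, g2), 4)))          # ['aabb', 'ab', 'c', 'cc', 'ccc', 'cccc']
print(sorted(language(concatenation(g1, g2), 4)))  # ['abc', 'abcc']
```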

38
Closure under Kleene star
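  • Let $G_1$ be any context-free grammar with the
    starting symbol S, where S does not appear on
    the right side of any rule. Adding the rules
  • $S \to \lambda$ and
  • $S \to SS$
  • creates a new context-free grammar $G_2$ such that
    $L(G_2)$ is the result of applying the Kleene star
    operator to $L(G_1)$. (If S does appear on a right
    side, introduce a new start symbol first.)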

39
Kleene Closure
  • We create grammar $G = (V, T_1, S, P)$ generating
    $L_1^*$
  • 1. Create a new start symbol S, not already in
    $V_1$.
  • 2. Set $V = V_1 \cup \{S\}$
  • 3. Set $P = P_1 \cup \{S \to S_1S \mid \lambda\}$
  • Construction completed. (See text for
    justification.)
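A sketch of the Kleene closure construction, assuming a grammar is a plain tuple (variables, terminals, start, productions) with each production a (head, body) pair and the empty tuple standing for λ. The name "S_star" and the words_up_to() helper are hypothetical, and the helper's pruning bound relies on the original grammar having no λ-productions.

```python
def kleene_star(grammar, new_start="S_star"):
    variables, terminals, start, productions = grammar
    assert new_start not in variables | terminals            # step 1
    return (
        variables | {new_start},                             # step 2
        terminals,
        new_start,
        productions | {(new_start, (start, new_start)),      # step 3: S -> S1 S
                       (new_start, ())},                     #         S -> λ
    )

def words_up_to(grammar, max_len):
    """Strings of length <= max_len via leftmost derivation. Forms longer than
    max_len + 1 are pruned, which is safe when only the new start symbol can
    vanish and it occurs at most once in any sentential form."""
    variables, _, start, productions = grammar
    seen, frontier, words = set(), [(start,)], set()
    while frontier:
        form = frontier.pop()
        if form in seen or len(form) > max_len + 1:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:
            if len(form) <= max_len:
                words.add("".join(form))
            continue
        for head, body in productions:
            if head == form[idx]:
                frontier.append(form[:idx] + body + form[idx + 1:])
    return words

# Example: G1 generates a^n b^n (n >= 1) with rules S -> aSb | ab.
g1 = (frozenset({"S"}), frozenset({"a", "b"}), "S",
      frozenset({("S", ("a", "S", "b")), ("S", ("a", "b"))}))
print(sorted(words_up_to(kleene_star(g1), 4)))   # ['', 'aabb', 'ab', 'abab']
```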

40
Not closed under intersection
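  • The context-free languages are not closed under
    Intersection. For example, $\{a^n b^n c^k\}$ and
    $\{a^k b^n c^n\}$ are both context-free, but their
    intersection is $\{a^n b^n c^n\}$, which is not.
    However, the intersection of a
    context-free language with a regular language is
    always a context-free language.
  • The context-free languages are not closed under
    Complementation.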
41
Corollary
  • Are Regular Languages context free?
  • Yes.
  • Why?
  • We can express any regular language in the form
    of a CFG.
  • The regular languages are a proper subset of the CFLs.

42
Are Regular Languages context free?
  • Proof
  • According to your textbook, the set of regular
    languages is the smallest set that contains the
    languages $\emptyset$, $\{\lambda\}$, and $\{a\}$ (for every $a \in \Sigma$) and
    is closed under the operations of union,
    concatenation, and Kleene closure. We just demonstrated
    that the operations of union, concatenation, and
    Kleene closure on CFLs produce CFLs, so all we need to
    do is show that the languages $\emptyset$, $\{\lambda\}$, and $\{a\}$
    have CFGs.

43
Are Regular Languages context free?
  • The empty language can be written
  • $S \to S$
  • The language consisting of the null string can be
    written
  • $S \to \lambda$
  • The language consisting of a single character a can
    be written
  • $S \to a$
  • QED

44
Decision properties of context-free languages
Can decide:
  • Membership
  • Emptiness
  • Infiniteness
But there is no algorithm for deciding whether two CFGs
generate the same language!
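Membership is decidable because, for a grammar in Chomsky Normal Form, a table-filling procedure can test every way of splitting the string. The sketch below uses the CYK algorithm, one standard choice and not necessarily the one presented in the course. The dictionary encoding of the rules and the example grammar are assumptions made for illustration.

```python
def cyk(word, unit_rules, pair_rules, start="S"):
    """Decide whether a CNF grammar derives word.

    unit_rules -- {terminal: set of variables A with A -> terminal}
    pair_rules -- {(B, C): set of variables A with A -> BC}
    """
    n = len(word)
    if n == 0:
        return False   # a CNF grammar as defined earlier never derives the null string
    # table[i][length] = set of variables that derive word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][1] = set(unit_rules.get(ch, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for left in table[i][split]:
                    for right in table[i + split][length - split]:
                        table[i][length] |= pair_rules.get((left, right), set())
    return start in table[0][n]

# CNF grammar for a^n b^n (n >= 1): S -> AB | AX, X -> SB, A -> a, B -> b
unit_rules = {"a": {"A"}, "b": {"B"}}
pair_rules = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
print(cyk("aabb", unit_rules, pair_rules))  # True
print(cyk("abab", unit_rules, pair_rules))  # False
```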