Title: COGN1001: Introduction to Cognitive Science Topics in Computer Science Formal Languages and Models of Computation
- Qiang HUO
- Department of Computer Science
- The University of Hong Kong
- (E-mail qhuo_at_cs.hku.hk)
Outline
- What is a Formal Language?
- Phrase-Structure Grammars
- Finite State Automata
- Formal languages and Models of Computation
Natural Language vs. Formal Language
- Natural language
  - the written and/or spoken languages of the world, such as Chinese, English, Japanese, German, French, Spanish, etc.
  - Syntax
  - Semantics
- Formal language
  - a language specified by a well-defined set of rules of syntax.
- The study of formal languages is important to computer science.
  - For example, we need to understand which statements are acceptable in the C programming language. Checking this is the task of a compiler for a programming language.
Formal Language
- We will describe the sentences of a formal language using a grammar.
  - How can we determine whether a combination of words is a valid sentence in a formal language?
  - How can we generate the valid sentences of a formal language?
- We will only be interested in the syntax, not the semantics (meaning), of a language.
- a sentence is made up of a noun-phrase followed by a verb-phrase
- a noun-phrase is made up of an article followed by an adjective followed by a noun, or
- a noun-phrase is made up of an article followed by a noun
- a verb-phrase is made up of a verb followed by an adverb, or
- a verb-phrase is made up of a verb
- an article is a, or
- an article is the
- an adjective is large, or
- an adjective is hungry
- a noun is rabbit, or
- a noun is mathematician
- a verb is eats, or
- a verb is hops
- an adverb is quickly, or
- an adverb is wildly.

If we define a subset of English using the list of rules shown here, which describe how a valid sentence can be produced, what does the language look like?
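The rules above are small enough to expand mechanically. The Python sketch below stores them in a dictionary (`RULES` and `expand` are illustrative names, not part of the course material) and lists every sentence the rules can produce; because no rule refers back to its own category, the list is finite.

```python
from itertools import product

# Each category maps to its alternatives; each alternative is a sequence of symbols.
RULES = {
    "sentence": [["noun-phrase", "verb-phrase"]],
    "noun-phrase": [["article", "adjective", "noun"], ["article", "noun"]],
    "verb-phrase": [["verb", "adverb"], ["verb"]],
    "article": [["a"], ["the"]],
    "adjective": [["large"], ["hungry"]],
    "noun": [["rabbit"], ["mathematician"]],
    "verb": [["eats"], ["hops"]],
    "adverb": [["quickly"], ["wildly"]],
}

def expand(symbol):
    """Return every word sequence that `symbol` can be replaced by."""
    if symbol not in RULES:              # a word such as "rabbit" cannot be replaced further
        return [[symbol]]
    results = []
    for alternative in RULES[symbol]:
        # Expand each symbol of the alternative, then join the pieces left to right.
        for parts in product(*(expand(s) for s in alternative)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("sentence")]
print(len(sentences))    # 72
print(sentences[0])      # a large rabbit eats quickly
```

With 12 possible noun-phrases and 6 possible verb-phrases, this subset of English contains 72 sentences.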
Example: a Subset of English
- From the previous rules we can form valid sentences using a series of replacements until no more rules can be applied.
- For instance, the valid sentence "the large rabbit hops quickly" can be obtained by the following sequence of replacements:
  - sentence
  - noun-phrase verb-phrase
  - article adjective noun verb-phrase
  - article adjective noun verb adverb
  - the adjective noun verb adverb
  - the large noun verb adverb
  - the large rabbit verb adverb
  - the large rabbit hops adverb
  - the large rabbit hops quickly
- Some other valid sentences:
  - a hungry mathematician eats wildly
  - the rabbit eats quickly
- An invalid sentence: the quickly eats mathematician
Some Terminology
- A vocabulary (or alphabet) V is a finite, nonempty set of elements called symbols.
- A word (or sentence) over V is a string of finite length of elements of V.
- The empty string or null string, denoted by λ, is the string containing no symbols.
- The set of all words (or sentences) over V is denoted by V*.
- A language over V is a subset of V*.
- Example: in English,
  - the alphabet V consists of English letters and other symbols;
  - a word (or sentence) over V is a finite string of symbols;
  - the meaningful words (or sentences) of English form a subset of V*.
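Since V* is infinite, a program can only list a finite slice of it. The sketch below (the helper `words_up_to` and the alphabet {0, 1} are illustrative choices) lists every word of length at most n, including the empty string λ, and then picks out one language over V as a subset.

```python
from itertools import product

def words_up_to(alphabet, max_len):
    """List every word over `alphabet` of length 0..max_len (a finite part of V*)."""
    words = []
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):   # n = 0 yields the empty word
            words.append("".join(letters))
    return words

V = {"0", "1"}
print(words_up_to(V, 2))    # ['', '0', '1', '00', '01', '10', '11']

# A language over V is just a subset of V*; here, the words that end in 1.
ends_in_one = [w for w in words_up_to(V, 2) if w.endswith("1")]
print(ends_in_one)          # ['1', '01', '11']
```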
How to specify a language?
- List all the words (or sentences) in the language, or
- give some criteria that a word (or sentence) must satisfy to be in the language, or
- specify the language through the use of a grammar, such as the set of rules we gave in the previous example of a subset of English.
What is a Phrase-Structure Grammar?
- A phrase-structure grammar is G = (V, T, S, P), where
  - V is a vocabulary,
  - T is a subset of V consisting of terminal symbols (i.e., the elements of V that cannot be replaced by other symbols),
  - the elements of N = V − T are called nonterminal symbols (i.e., the elements of V that can be replaced by other symbols),
  - S is a start symbol from V (i.e., the element of V that we always begin with), and
  - P is a set of productions.
- We denote by w0 → w1 the production that specifies that w0 can be replaced by w1.
- Every production in P must contain at least one nonterminal on its left side.
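The conditions in this definition can be checked mechanically. The sketch below is an illustration under simplifying assumptions: every symbol is a single character, each production is a pair of strings, and an empty right side stands for λ. The function name `is_valid_grammar` is hypothetical.

```python
def is_valid_grammar(V, T, S, P):
    """Check the conditions in the definition of G = (V, T, S, P)."""
    N = V - T                                  # nonterminal symbols
    if not T <= V or S not in V:
        return False
    for left, right in P:
        if any(symbol not in V for symbol in left + right):
            return False                       # every symbol must belong to V
        if not any(symbol in N for symbol in left):
            return False                       # at least one nonterminal on the left side
    return True

V = set("abABS")
T = set("ab")
P = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]
print(is_valid_grammar(V, T, "S", P))                   # True
print(is_valid_grammar(V, T, "S", P + [("ab", "A")]))   # False: "ab" has no nonterminal
```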
Example: a Phrase-Structure Grammar
- G = (V, T, S, P), where
  - V = {a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly, sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb}
  - T = {a, the, large, hungry, rabbit, mathematician, eats, hops, quickly, wildly}
  - V − T = {sentence, noun-phrase, verb-phrase, article, adjective, noun, verb, adverb}
  - S = sentence
  - P is the set of production rules:
  - sentence → noun-phrase verb-phrase,
  - noun-phrase → article adjective noun,
  - noun-phrase → article noun,
  - verb-phrase → verb adverb,
  - verb-phrase → verb,
  - article → a,
  - article → the,
  - adjective → large,
  - adjective → hungry,
  - noun → rabbit,
  - noun → mathematician,
  - verb → eats,
  - verb → hops,
  - adverb → quickly,
  - adverb → wildly
Some Terminology
- Let G = (V, T, S, P) be a phrase-structure grammar.
- Let w0 = l z0 r and w1 = l z1 r (the concatenations of l, z0, r and of l, z1, r) be strings over V.
- If z0 → z1 is a production of G, we say that w1 is directly derivable from w0, and we write w0 ⇒ w1.
- Example:
  - the adjective noun verb adverb ⇒ the large noun verb adverb, because adjective → large is a production.
- If w0, w1, …, wn, n ≥ 0, are strings over V such that w0 ⇒ w1, w1 ⇒ w2, …, wn−1 ⇒ wn, then
  - we say that wn is derivable from w0, and
  - we write w0 ⇒* wn.
- The sequence of steps used to obtain wn from w0 is called a derivation.
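Direct derivability can be tested by trying every split w0 = l z0 r. In the sketch below (function names are hypothetical), strings are tuples of symbols so that multi-letter symbols such as `adjective` stay intact, and a derivation is checked one step at a time; only two of the English productions are included, for brevity.

```python
def directly_derives(w0, w1, productions):
    """True if w1 results from w0 by replacing one occurrence of some z0 with z1."""
    for z0, z1 in productions:
        for i in range(len(w0) - len(z0) + 1):          # try every split w0 = l z0 r
            if w0[i:i + len(z0)] == z0 and w0[:i] + z1 + w0[i + len(z0):] == w1:
                return True
    return False

def is_derivation(steps, productions):
    """True if each string in `steps` is directly derivable from the previous one."""
    return all(directly_derives(a, b, productions) for a, b in zip(steps, steps[1:]))

P = [(("adjective",), ("large",)), (("noun",), ("rabbit",))]
w0 = ("the", "adjective", "noun", "verb", "adverb")
w1 = ("the", "large", "noun", "verb", "adverb")
w2 = ("the", "large", "rabbit", "verb", "adverb")
print(directly_derives(w0, w1, P))   # True
print(directly_derives(w0, w2, P))   # False: two steps are needed
print(is_derivation([w0, w1, w2], P))  # True, so w0 =>* w2
```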
- Example: sentence ⇒* the large rabbit hops quickly, via the following derivation:
  - sentence ⇒ noun-phrase verb-phrase,
  - noun-phrase verb-phrase ⇒ article adjective noun verb-phrase,
  - article adjective noun verb-phrase ⇒ article adjective noun verb adverb,
  - article adjective noun verb adverb ⇒ the adjective noun verb adverb,
  - the adjective noun verb adverb ⇒ the large noun verb adverb,
  - the large noun verb adverb ⇒ the large rabbit verb adverb,
  - the large rabbit verb adverb ⇒ the large rabbit hops adverb,
  - the large rabbit hops adverb ⇒ the large rabbit hops quickly.
What is the language generated by a Phrase-Structure Grammar?
- Let G = (V, T, S, P) be a phrase-structure grammar.
- The language generated by G (or the language of G), denoted by L(G), is the set of all strings of terminals that are derivable from the start symbol S:
  - $L(G) = \{w \in T^* \mid S \Rightarrow^* w\}$
- Example: Suppose G = (V, T, S, P), where V = {a, b, A, B, S}, T = {a, b}, S is the start symbol, and P = {S → ABa, A → BB, B → ab, AB → b}.
  - All the sentences (words) generated by this grammar are abababa and ba, since
    - S ⇒ ABa ⇒ BBBa ⇒ abBBa ⇒ ababBa ⇒ abababa, and
    - S ⇒ ABa ⇒ ba (using AB → b).
- Example: Let G be the grammar with V = {S, 0, 1}, T = {0, 1}, start symbol S, and productions P = {S → 11S, S → 0}.
  - $L(G) = \{(11)^n 0 \mid n = 0, 1, 2, \ldots\}$.
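Both answers can be checked by searching over sentential forms. The sketch below (the function `generate` and the bound `max_len` are assumptions of this illustration) applies every production at every position, breadth first, and collects the forms that contain only terminals. Because L(G) may be infinite, forms longer than `max_len` are discarded; within that bound the search space is finite, so the search always stops. Symbols are single characters, with upper-case letters as nonterminals.

```python
from collections import deque

def generate(productions, start, terminals, max_len):
    """Terminal strings derivable from `start` without any form longer than max_len."""
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            language.add(form)            # only terminals left: a word of L(G)
            continue
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:                # apply the production at every position
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return language

P1 = [("S", "ABa"), ("A", "BB"), ("B", "ab"), ("AB", "b")]
print(sorted(generate(P1, "S", "ab", 10)))   # ['abababa', 'ba']

P2 = [("S", "11S"), ("S", "0")]
print(sorted(generate(P2, "S", "01", 7)))    # ['0', '110', '11110', '1111110']
```

For the second grammar the bound truncates an infinite language, so only the words (11)^n 0 with n ≤ 3 appear.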
How to construct a grammar that generates a given language?
- Example: Find a phrase-structure grammar to generate the set $\{0^n1^n \mid n = 0, 1, 2, \ldots\}$.
- Solution: G = (V, T, S, P), where
  - V = {S, 0, 1},
  - T = {0, 1},
  - S is the start symbol, and
  - P = {S → 0S1, S → λ}.
How to construct a grammar that generates a given language? (continued)
- Example: Find a phrase-structure grammar to generate the set $\{0^m1^n \mid m, n = 0, 1, 2, \ldots\}$.
- Solution 1: G1 = (V, T, S, P), where V = {S, 0, 1}, T = {0, 1}, S is the start symbol, and P = {S → 0S, S → S1, S → λ}.
- Solution 2: G2 = (V, T, S, P), where V = {S, A, 0, 1}, T = {0, 1}, S is the start symbol, and P = {S → 0S, S → 1A, S → 1, A → 1A, A → 1, S → λ}.
- Two different grammars can generate the same language!
How to construct a grammar that generates a given language? (continued)
- There are many techniques from the theory of computation that can be used to systematically construct a grammar for a given formal language, but
- this is beyond the scope of this course.
Types of Phrase-Structure Grammars (1)
- Phrase-structure grammars can be classified according to the types of productions that are allowed.
- The following classification scheme was introduced by Noam Chomsky:
  - A type 0 grammar has no restrictions on its productions.
  - A type 1, or context-sensitive, grammar can have productions only of the form
    - w1 → w2, where l(w1) ≤ l(w2) and l(w) denotes the length of the string w, or of the form
    - w1 → λ.
  - A type 2, or context-free, grammar can have productions only of the form
    - A → w2, where A is a single nonterminal symbol.
Types of Phrase-Structure Grammars (2)
- A type 3, or regular, grammar can have productions only of the form
  - A → aB,
  - A → a,
  - S → λ,
  - where A and B are nonterminal symbols, S is the start symbol, and a is a terminal symbol.
- A language generated by a
  - type 1 grammar is called a context-sensitive language,
  - type 2 grammar is called a context-free language,
  - type 3 grammar is called a regular language.
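The classification depends only on the shape of each production, so it can be read off mechanically. The sketch below follows the definitions on these slides exactly (other texts state the type 1 condition slightly differently); symbols are single characters, an empty right-hand side stands for λ, and the function name `chomsky_type` is hypothetical. It tries the types from most to least restrictive and returns the first one that every production satisfies.

```python
def chomsky_type(productions, nonterminals, start):
    """Return 3, 2, 1 or 0: the most restrictive type that every production satisfies."""
    def type3(lhs, rhs):
        if len(lhs) != 1 or lhs not in nonterminals:
            return False
        if rhs == "":
            return lhs == start                          # S -> lambda
        if len(rhs) == 1:
            return rhs not in nonterminals               # A -> a
        return (len(rhs) == 2 and rhs[0] not in nonterminals
                and rhs[1] in nonterminals)              # A -> aB

    def type2(lhs, rhs):
        return len(lhs) == 1 and lhs in nonterminals     # A -> w2

    def type1(lhs, rhs):
        return len(lhs) <= len(rhs) or rhs == ""         # l(w1) <= l(w2), or w1 -> lambda

    for number, rule in ((3, type3), (2, type2), (1, type1)):
        if all(rule(lhs, rhs) for lhs, rhs in productions):
            return number
    return 0

regular = [("S", "0S"), ("S", "1A"), ("S", "1"), ("A", "1A"), ("A", "1"), ("S", "")]
context_free = [("S", "0S1"), ("S", "")]
context_sensitive = [("S", "0SAB"), ("S", ""), ("BA", "AB"), ("0A", "01"),
                     ("1A", "11"), ("1B", "12"), ("2B", "22")]
print(chomsky_type(regular, {"S", "A"}, "S"))                 # 3
print(chomsky_type(context_free, {"S"}, "S"))                 # 2
print(chomsky_type(context_sensitive, {"S", "A", "B"}, "S"))  # 1
```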
Examples
- $\{0^m1^n \mid m, n = 0, 1, 2, \ldots\}$ is a regular language, since it can be generated by the regular grammar G with
  - P = {S → 0S, S → 1A, S → 1, A → 1A, A → 1, S → λ}.
- $\{0^n1^n \mid n = 0, 1, 2, \ldots\}$ is a context-free language, since it can be generated by the context-free grammar G with
  - P = {S → 0S1, S → λ}.
- $\{0^n1^n2^n \mid n = 0, 1, 2, \ldots\}$ is a context-sensitive language, since it can be generated by the type 1 grammar G = (V, T, S, P) with V = {0, 1, 2, S, A, B}, T = {0, 1, 2}, start symbol S, and productions
  - P = {S → 0SAB, S → λ, BA → AB, 0A → 01, 1A → 11, 1B → 12, 2B → 22},
  - but not by any type 2 grammar.
Example: Backus-Naur Form
- What is the Backus-Naur form of the grammar for the subset of English described before?
  - <sentence> ::= <noun phrase> <verb phrase>
  - <noun phrase> ::= <article> <adjective> <noun> | <article> <noun>
  - <verb phrase> ::= <verb> <adverb> | <verb>
  - <article> ::= a | the
  - <adjective> ::= large | hungry
  - <noun> ::= rabbit | mathematician
  - <verb> ::= eats | hops
  - <adverb> ::= quickly | wildly
What is Backus-Naur Form (BNF)?
- There is another notation, called Backus-Naur form, that is used to specify a type 2 (context-free) grammar:
  - all productions having the same nonterminal as their left-hand side are combined, with the different right-hand sides of these productions each separated by a bar (|),
  - nonterminal symbols are enclosed in angle brackets (< >), and
  - the symbol → is replaced by ::=.
- Example: The Backus-Naur form for a grammar that produces signed integers is as follows:
  - <signed integer> ::= <sign> <integer>
  - <sign> ::= + | -
  - <integer> ::= <digit> | <digit> <integer>
  - <digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
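A BNF grammar maps naturally onto a recursive-descent recognizer with one function per nonterminal. The sketch below is one way to do this for the signed-integer grammar (the function names are illustrative); each inner function returns the position just after the text it matched, or `None` if it cannot match.

```python
def is_signed_integer(s):
    """Recognize <signed integer> ::= <sign> <integer> by recursive descent."""
    def sign(i):
        # <sign> ::= + | -
        return i + 1 if i < len(s) and s[i] in "+-" else None

    def digit(i):
        # <digit> ::= 0 | 1 | ... | 9
        return i + 1 if i < len(s) and s[i] in "0123456789" else None

    def integer(i):
        # <integer> ::= <digit> | <digit> <integer>; prefer the longer alternative.
        j = digit(i)
        if j is None:
            return None
        k = integer(j)
        return k if k is not None else j

    j = sign(0)
    if j is None:
        return False
    return integer(j) == len(s)     # the whole input must be consumed

print(is_signed_integer("+109"))    # True
print(is_signed_integer("12"))      # False: no sign
```

Preferring the longer alternative for <integer> is safe here because a signed integer must end right after its last digit.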
What is a Derivation (or Parse) Tree?
- A derivation in the language generated by a context-free grammar can be represented graphically using an ordered rooted tree, called a derivation (or parse) tree, in which
  - the root represents the start symbol,
  - internal vertices represent nonterminals,
  - leaves represent terminals, and
  - the children of a vertex are the symbols on the right-hand side of a production, in order from left to right, where the symbol represented by the parent is on the left-hand side.
Example
- Construct a derivation tree for the derivation of the sentence "the hungry rabbit eats quickly" in the subset of English discussed previously.
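One possible answer, written as a data structure rather than a drawing: each vertex is a pair (label, children), and terminals are leaves with no children. The names `tree`, `leaves`, and `show` are illustrative. Reading the leaves from left to right recovers the sentence, which is a quick check that the tree is consistent with the derivation.

```python
# A vertex is (label, [children]); a leaf (terminal) has an empty child list.
tree = ("sentence", [
    ("noun-phrase", [
        ("article", [("the", [])]),
        ("adjective", [("hungry", [])]),
        ("noun", [("rabbit", [])]),
    ]),
    ("verb-phrase", [
        ("verb", [("eats", [])]),
        ("adverb", [("quickly", [])]),
    ]),
])

def leaves(node):
    """Return the leaves from left to right: the sentence the tree derives."""
    label, children = node
    if not children:
        return [label]
    return [word for child in children for word in leaves(child)]

def show(node, depth=0):
    """Print one vertex per line, with children indented under their parent."""
    label, children = node
    print("  " * depth + label)
    for child in children:
        show(child, depth + 1)

show(tree)
print(" ".join(leaves(tree)))   # the hungry rabbit eats quickly
```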
How to determine whether a string is in the language generated by a context-free grammar?
- Top-down parsing
  - begins with the start symbol and proceeds by successively applying productions to see whether the given string can be derived.
- Bottom-up parsing
  - works backward, starting from the given string and applying productions in reverse until the start symbol is reached.
- Example: Determine whether the word cbab belongs to L(G), where G = (V, T, S, P) with
  - V = {a, b, c, A, B, C, S},
  - T = {a, b, c},
  - S is the start symbol, and the productions are
    - S → AB
    - A → Ca
    - B → Ba
    - B → Cb
    - B → b
    - C → cb
    - C → b
- Top-down parsing:
  - S ⇒ AB
  - S ⇒ AB ⇒ CaB
  - S ⇒ AB ⇒ CaB ⇒ cbaB
  - S ⇒ AB ⇒ CaB ⇒ cbaB ⇒ cbab
- Bottom-up parsing:
  - Cab ⇒ cbab
  - Ab ⇒ Cab ⇒ cbab
  - AB ⇒ Ab ⇒ Cab ⇒ cbab
  - S ⇒ AB ⇒ Ab ⇒ Cab ⇒ cbab
- Either way, S ⇒* cbab, so cbab belongs to L(G).
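The top-down search above can be automated with backtracking: always expand the leftmost nonterminal, try each of its productions in turn, and abandon a branch as soon as it cannot match. The sketch below (the function `derives` is hypothetical) assumes, as holds for this G, that there are no λ-productions and no cycles of unit productions, so a sentential form longer than the target can never shrink back to it. That pruning also stops the left-recursive production B → Ba from looping forever.

```python
def derives(form, target, productions, nonterminals):
    """Backtracking top-down search for a leftmost derivation form =>* target."""
    if len(form) > len(target):
        return False                   # forms never shrink under this grammar
    # Match the terminals in front of the leftmost nonterminal against the target.
    i = 0
    while i < len(form) and form[i] not in nonterminals:
        if form[i] != target[i]:
            return False
        i += 1
    if i == len(form):                 # no nonterminal left
        return form == target
    # Try every production for the leftmost nonterminal; backtrack on failure.
    for lhs, rhs in productions:
        if lhs == form[i] and derives(form[:i] + rhs + form[i + 1:], target,
                                      productions, nonterminals):
            return True
    return False

P = [("S", "AB"), ("A", "Ca"), ("B", "Ba"), ("B", "Cb"), ("B", "b"),
     ("C", "cb"), ("C", "b")]
print(derives("S", "cbab", P, "ABCS"))   # True
print(derives("S", "abab", P, "ABCS"))   # False: every word of L(G) starts with b or c
```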
Finite-State Machines with No Output
- Finite-state machines with no output are also called finite-state automata.
- Finite-state automata do not generate output, but they have a set of special states, called final states.
- A finite-state automaton is often used for language recognition.
- This application plays a fundamental role in the design and construction of compilers for programming languages.
What is a Deterministic Finite-State Automaton?
- A finite-state automaton M = (S, I, f, s0, F) consists of
  - a finite set S of states,
  - a finite input alphabet I,
  - a transition function f that assigns a next state to every pair of state and input,
  - an initial state s0, and
  - a subset F of S consisting of final states.
How to represent a Finite-State Automaton?
- We can represent a finite-state automaton using
either a state table or a state diagram. Final
states are indicated in the state diagram by
using double circles.
- What is the state table of the above finite-state
automaton?
What is the language recognized by a given Finite-State Automaton?
- An input string is recognized or accepted by an automaton M if the string takes the automaton from its initial state s0 to one of its final states.
- The language recognized by an automaton M, denoted by L(M), is the set of all strings that are recognized by M.
- The language recognized by the above finite-state automaton M is $L(M) = \{0^n,\ 0^n10x \mid n = 0, 1, 2, \ldots \text{ and } x \text{ is any string}\}$.
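Running a deterministic automaton takes one table lookup per input symbol. The figure is not reproduced here, so the transition table below is a hypothetical reconstruction that recognizes the language stated above, not necessarily the exact machine in the figure: s2 is a dead state, and s3 accepts everything once "10" has been read after the leading 0s. The dictionary `f` doubles as the state table.

```python
def dfa_accepts(transitions, start, finals, string):
    """Run a deterministic FSA: exactly one next state for each (state, input) pair."""
    state = start
    for symbol in string:
        state = transitions[(state, symbol)]
    return state in finals

# Hypothetical state table for {0^n, 0^n 1 0 x}.
f = {
    ("s0", "0"): "s0", ("s0", "1"): "s1",
    ("s1", "0"): "s3", ("s1", "1"): "s2",
    ("s2", "0"): "s2", ("s2", "1"): "s2",
    ("s3", "0"): "s3", ("s3", "1"): "s3",
}
F = {"s0", "s3"}
for w in ["", "000", "0010", "10111", "1", "011"]:
    print(repr(w), dfa_accepts(f, "s0", F, w))
```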
Deterministic vs. Nondeterministic Finite-State Automata
- The finite-state automata discussed so far are deterministic, since for each pair of state and input value there is a unique next state given by the transition function.
- There is another important type of finite-state automaton in which there may be several possible next states for each pair of state and input value.
  - Such machines are called nondeterministic.
- Nondeterministic finite-state automata are important in determining which languages can be recognized by a finite-state automaton.
What is a Nondeterministic Finite-State Automaton?
- A nondeterministic finite-state automaton M = (S, I, f, s0, F) consists of
  - a finite set S of states,
  - a finite input alphabet I,
  - a transition function f that assigns a set of states to each pair of state and input,
  - an initial state s0, and
  - a subset F of S consisting of final states.
How to represent a Nondeterministic Finite-State Automaton?
- Using a state table: for each pair of state and input value, we give a list of possible next states.
- Using a state diagram: include an edge from each state to all possible next states, labelling edges with the input(s) that lead to this transition.
What is the language recognized by a given Nondeterministic Finite-State Automaton?
- What does it mean for a nondeterministic finite-state automaton to recognize a string x = x1x2…xk?
  - x1 takes the start state s0 to a set S1 of states.
  - x2 takes each of the states in S1 to a set of states; let S2 be the union of these sets.
  - Continue this process, including at each stage all states that can be obtained using
    - a state obtained at the previous stage and
    - the current input symbol.
- The string x is recognized or accepted if there is a final state in the set of all states that can be obtained from s0 using x.
- The language recognized by a nondeterministic finite-state automaton is the set of all strings recognized by this automaton.
Example
- Determine the language recognized by the nondeterministic finite-state automaton M shown in the following figure.
- Solution: $L(M) = \{0^n,\ 0^n01,\ 0^n11 \mid n = 0, 1, 2, \ldots\}$.
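The stage-by-stage procedure for recognizing a string translates directly into code: keep the set of states reached so far, and replace it at each input symbol by the union of the next-state sets. Since the figure is not reproduced, the four-state NFA below is a hypothetical machine built to recognize the language in the solution; missing table entries mean "no next state".

```python
def nfa_accepts(transitions, start, finals, string):
    """Accept if the set of states reachable from `start` via `string` has a final state."""
    current = {start}
    for symbol in string:
        # Union of the next-state sets of every state obtained at the previous stage.
        current = set().union(*(transitions.get((s, symbol), set()) for s in current))
    return bool(current & finals)

# Hypothetical NFA for {0^n, 0^n 01, 0^n 11 | n >= 0}.
f = {
    ("s0", "0"): {"s0", "s1"},
    ("s0", "1"): {"s2"},
    ("s1", "1"): {"s3"},
    ("s2", "1"): {"s3"},
}
F = {"s0", "s3"}
for w in ["", "000", "0001", "011", "1", "010"]:
    print(repr(w), nfa_accepts(f, "s0", F, w))
```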
An Important Fact
- Theorem:
  - If the language L is recognized by a nondeterministic finite-state automaton M0, then L is also recognized by a deterministic finite-state automaton M1.
- Two finite-state automata are called equivalent if they recognize the same language.
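A standard way to prove this theorem is the subset construction: each state of M1 is a set of states of M0, and M1's transition on a symbol goes to the union of the corresponding next-state sets (the empty set acts as a dead state). The sketch below applies it to a hypothetical NFA for $\{0^n, 0^n01, 0^n11\}$ and compares the two machines on every string of length at most 6; this comparison is a sanity check, not a proof of equivalence.

```python
from itertools import product

def subset_construction(nfa, start, finals, alphabet):
    """Build a DFA whose states are frozensets of NFA states."""
    dfa_start = frozenset({start})
    dfa, todo, seen = {}, [dfa_start], {dfa_start}
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # All NFA states reachable from any state in `current` on this symbol.
            nxt = frozenset().union(*(nfa.get((s, symbol), set()) for s in current))
            dfa[(current, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    dfa_finals = {q for q in seen if q & finals}   # contains some NFA final state
    return dfa, dfa_start, dfa_finals

def run_dfa(dfa, start, finals, string):
    state = start
    for symbol in string:
        state = dfa[(state, symbol)]
    return state in finals

def run_nfa(nfa, start, finals, string):
    current = {start}
    for symbol in string:
        current = set().union(*(nfa.get((s, symbol), set()) for s in current))
    return bool(current & finals)

# Hypothetical NFA for {0^n, 0^n 01, 0^n 11 | n >= 0}.
nfa = {("s0", "0"): {"s0", "s1"}, ("s0", "1"): {"s2"},
       ("s1", "1"): {"s3"}, ("s2", "1"): {"s3"}}
nfa_finals = {"s0", "s3"}
dfa, d0, dfa_finals = subset_construction(nfa, "s0", nfa_finals, "01")

words = ["".join(p) for n in range(7) for p in product("01", repeat=n)]
print(all(run_dfa(dfa, d0, dfa_finals, w) == run_nfa(nfa, "s0", nfa_finals, w)
          for w in words))   # True
```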
Build an FSA from a Regular Grammar
- Suppose that G = (V, T, S, P) is a regular grammar generating the set L(G), where each production is of the form
  - S → λ, A → a, or A → aB, with a a terminal symbol and A, B nonterminal symbols.
- We can build a nondeterministic finite-state machine M = (S, I, f, s0, F) that recognizes L(G).
- M = (S, I, f, s0, F), where
  - S contains a state sA for each nonterminal symbol A of G, and an additional final state sF;
  - the start state s0 is the state formed from the start symbol S;
  - a transition from sA to sF on input a is included if A → a is a production;
  - a transition from sA to sB on input a is included if A → aB is a production;
  - s0 will also be a final state if S → λ is a production.
- It can be shown that L(M) = L(G).
Example
- Construct a nondeterministic finite-state automaton that recognizes the language generated by the regular grammar G = (V, T, S, P), where
  - V = {0, 1, A, S},
  - T = {0, 1}, and
  - the productions in P are
    - S → 1A, S → 0, S → λ,
    - A → 0A, A → 1A, and
    - A → 1.
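The construction can be carried out by a short program. In the sketch below (function names are hypothetical), productions are pairs (A, rhs) with rhs equal to "" for λ, a single terminal, or a terminal followed by a nonterminal; state "s" + A plays the role of sA, and "sF" is the extra final state, a naming that assumes no nonterminal is called F. The resulting NFA accepts λ, 0, and the strings that begin and end with 1 and have length at least 2.

```python
def grammar_to_nfa(productions, start):
    """Build the NFA of the construction above from a regular grammar."""
    transitions = {}
    finals = {"sF"}
    for lhs, rhs in productions:
        if rhs == "":                         # S -> lambda: s0 is also a final state
            finals.add("s" + lhs)
        elif len(rhs) == 1:                   # A -> a: edge sA --a--> sF
            transitions.setdefault(("s" + lhs, rhs), set()).add("sF")
        else:                                 # A -> aB: edge sA --a--> sB
            transitions.setdefault(("s" + lhs, rhs[0]), set()).add("s" + rhs[1])
    return transitions, "s" + start, finals

def nfa_accepts(transitions, start, finals, string):
    """Accept if some final state is reachable from the start state using `string`."""
    current = {start}
    for symbol in string:
        current = set().union(*(transitions.get((s, symbol), set()) for s in current))
    return bool(current & finals)

P = [("S", "1A"), ("S", "0"), ("S", ""), ("A", "0A"), ("A", "1A"), ("A", "1")]
f, s0, F = grammar_to_nfa(P, "S")
for w in ["", "0", "11", "1001", "1", "10", "01"]:
    print(repr(w), nfa_accepts(f, s0, F, w))
```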
Construct a Regular Grammar from an FSA
- Suppose that M = (S, I, f, s0, F) is a finite-state machine with the property that s0 is never the next state for a transition.
- A regular grammar G = (V, T, S, P) can be defined as follows:
  - V is formed by assigning a symbol to each state of S and each input symbol in I;
  - T is formed from the input symbols in I;
  - S is the symbol formed from the start state s0;
  - the set P of productions is formed from the transitions in M:
    - As → a is included if the state s goes to a final state under input a, where As is the nonterminal symbol formed from s;
    - As → aAt is included if the state s goes to t under input a;
    - S → λ is included if and only if λ ∈ L(M).
- It can be shown that L(G) = L(M).
Example
- Find a regular grammar that generates the language recognized by the finite-state automaton shown in the following figure.
- Solution: G = (V, T, S, P), where
  - V = {S, A, B, 0, 1}, where the symbols S, A, and B correspond to the states s0, s1, and s2, respectively,
  - T = {0, 1},
  - S is the start symbol, and
  - the productions are
    - S → 0A, S → 1B, S → 1, S → λ,
    - A → 0A, A → 1B, A → 1,
    - B → 0A, B → 1B, B → 1.
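The construction is equally mechanical in this direction. The figure is not reproduced here, so the transition table below is reconstructed from the solution's productions: s0 and s2 are final (because of S → λ and the productions ending in → 1), and s0 has no incoming transitions, as the construction requires. The function name `fsa_to_grammar` is hypothetical, and an empty right-hand side stands for λ.

```python
def fsa_to_grammar(transitions, start, finals, name):
    """Productions of the regular grammar obtained from a deterministic FSA.

    `transitions` maps (state, input) to the next state; `name` maps each state
    to its nonterminal symbol.
    """
    productions = set()
    for (s, a), t in transitions.items():
        productions.add((name[s], a + name[t]))   # A_s -> a A_t
        if t in finals:
            productions.add((name[s], a))         # A_s -> a, because t is final
    if start in finals:                           # lambda is in L(M)
        productions.add((name[start], ""))
    return productions

# Transitions reconstructed from the solution (s0, s1, s2 correspond to S, A, B).
f = {("s0", "0"): "s1", ("s0", "1"): "s2",
     ("s1", "0"): "s1", ("s1", "1"): "s2",
     ("s2", "0"): "s1", ("s2", "1"): "s2"}
P = fsa_to_grammar(f, "s0", {"s0", "s2"}, {"s0": "S", "s1": "A", "s2": "B"})
for lhs, rhs in sorted(P):
    print(lhs, "->", rhs or "lambda")
```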
More Powerful Types of Machines (1)
- The main limitation of finite-state automata is their finite amount of memory. This prevents them from recognizing languages that are not regular, such as $\{0^n1^n \mid n = 0, 1, 2, \ldots\}$.
- A more powerful model of computation, called a pushdown automaton, can be used to recognize the above language.
- Theorem: A set is recognized by a pushdown automaton if and only if it is the language generated by a context-free grammar.
- However, there are sets that cannot be expressed as the language generated by a context-free grammar. One such set is $\{0^n1^n2^n \mid n = 0, 1, 2, \ldots\}$.
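The extra power of a pushdown automaton comes from its stack, which can grow without bound. The sketch below is a plain program that uses a stack in the same spirit (it is not a formal definition of a pushdown automaton, and the name `accepts_0n1n` is illustrative): it pushes a marker for each 0 and pops one for each 1, so it effectively counts the 0s, which no finite set of states can do for arbitrarily large n.

```python
def accepts_0n1n(string):
    """Stack-based recognizer for {0^n 1^n | n >= 0}."""
    stack = []
    seen_one = False
    for symbol in string:
        if symbol == "0":
            if seen_one:
                return False        # a 0 after a 1 is never allowed
            stack.append("0")       # remember one more unmatched 0
        elif symbol == "1":
            seen_one = True
            if not stack:
                return False        # more 1s than 0s so far
            stack.pop()             # match this 1 with an earlier 0
        else:
            return False            # symbol outside the alphabet {0, 1}
    return not stack                # every 0 was matched by a 1

for w in ["", "01", "0011", "001", "0101"]:
    print(repr(w), accepts_0n1n(w))
```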
More Powerful Types of Machines (2)
- There is an even more powerful machine than the pushdown automaton, called a linear bounded automaton, which
  - can recognize context-sensitive languages such as the set $\{0^n1^n2^n \mid n = 0, 1, 2, \ldots\}$, but
  - cannot recognize all the languages generated by phrase-structure grammars.
- The most general model of a computing machine is the so-called Turing machine, which can
  - recognize all languages generated by phrase-structure grammars, and
  - model all the computations that can be performed on a computing machine.
Future Scientists vs. Engineers
- Scientists try to understand what is.
- Engineers try to create what has never been!
- The really great engineers have a strong background in science, so that they thoroughly understand what is.
- These special people also have the imagination to create what has never been, and this is what really sets them apart!
- The methodology of engineering research:
  - there exists some phenomenon of nature for which a model should be found;
  - mathematical analysis is just a tool that helps one to find this model;
  - the results of any analysis should be confirmed by experiments.
- Future: what you make it to be!
Reference
- Sections 11.1, 11.3, and 11.4 of the following book:
  - Kenneth H. Rosen, Discrete Mathematics and Its Applications, Fifth Edition, McGraw-Hill International Editions, 2004, or
  - the relevant sections of earlier editions of the above book.