Title: Introduction to Computational Natural Language Learning. Linguistics 79400 (Under: Topics in Natural Language Processing), Computer Science 83000 (Under: Topics in Artificial Intelligence). The Graduate School of the City University of New York, Fall 2001.
Slide 1: Introduction to Computational Natural Language Learning
Linguistics 79400 (Under: Topics in Natural Language Processing)
Computer Science 83000 (Under: Topics in Artificial Intelligence)
The Graduate School of the City University of New York, Fall 2001
- William Gregory Sakas
- Hunter College, Department of Computer Science
- Graduate Center, PhD Programs in Computer Science and Linguistics, The City University of New York
Slide 2: Meeting 1 (Overview). Today's agenda:
- Why computationally model language learning?
- Linguistics, state space search, and definitions
- Early (classic) computational approaches
- Gold: language can't be learned theorem
- Angluin: oh yes it can
- Artificial Neural Networks: an introduction
- Tlearn software demonstration (if time)
Slide 3: Explicitness of the computational model can ground linguistic theories.
- "...it may be necessary to find out how language learning could work in order for the developmental data to tell us how it does work." (Pinker, 1979)
- Can natural language grammar be modeled by X? Only if X is both descriptively adequate (predicts perceived linguistic phenomena) and explanatorily adequate (explains how the phenomena come to be). (Bertolo, MIT Encyclopedia of Cognitive Science)
- If a computational model demonstrates that some formally defined class of models cannot be learned, X had better fall outside of that class, regardless of its descriptive adequacy.
Slide 4: Generative Linguistics
- phrase structure rule (PS) grammar: a formalism based on rewrite rules which are recursively applied to yield the structure of an utterance.
- transformational grammar: sentences have (at least) two phrase structures, an original or base-generated structure and the final or surface structure. A transformation is a mapping from one phrase structure to another.
- principles and parameters: all languages share the same principles, with a finite number of sharply delineated differences or parameters.
- NON-generative linguistics: see Elman, "Language as a dynamical system."
Slide 5: Syntax acquisition can be viewed as a state space search.
- nodes represent grammars, including a start state and a target state.
- arcs represent a possible change from one hypothesized grammar to another.
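The state-space picture can be made concrete as a toy graph search. The grammar labels G0 through G3 and the arcs between them are hypothetical, chosen only to illustrate nodes, arcs, a start state, and a target state:

```python
from collections import deque

# Hypothetical hypothesis space: nodes are grammars, arcs are permitted
# single-step changes from one hypothesized grammar to another.
arcs = {
    "G0": ["G1", "G2"],   # G0 is the learner's start state
    "G1": ["G3"],
    "G2": ["G3"],
    "G3": [],             # G3 is the target state
}

def path_to_target(start, target):
    """Breadth-first search for a sequence of grammar changes."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in arcs[path[-1]]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

print(path_to_target("G0", "G3"))  # ['G0', 'G1', 'G3']
```

A real acquisition model would of course not search exhaustively; the sketch only shows the nodes-and-arcs framing of the slide.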
Slide 6: Gold's grammar enumeration learner (1967)
[Figure: the learner's conjecture-update rule], where s is a function that returns the next sentence from the input sample being fed to the learner, and L(Gi) is the language generated by grammar Gi.
Two points:
- The learner is error-driven.
- Error-driven learners converge on the target in the limit.
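The error-driven enumeration learner can be sketched as follows. The three nested toy languages are illustrative stand-ins; in Gold's setting the learner enumerates every grammar of the class:

```python
# Toy "grammars", each identified with the language it generates.
# These three nested languages are illustrative, not from the slides.
languages = [
    {"a"},                # L(G1)
    {"a", "ab"},          # L(G2)
    {"a", "ab", "abb"},   # L(G3)  <- target
]

def enumeration_learner(sample):
    """Error-driven: keep conjecturing Gi until a sentence s(t) falls
    outside L(Gi), then advance to the next grammar that accepts it.
    (With nested languages, later grammars remain consistent with all
    earlier input, so past sentences need not be re-checked.)"""
    i = 0
    for sentence in sample:        # sentence = s(t)
        while sentence not in languages[i]:
            i += 1                 # error: move on in the enumeration
    return i

sample = ["a", "ab", "abb", "a", "ab"]   # a fair sample from L(G3)
print(enumeration_learner(sample))       # 2, i.e. the learner ends at G3
```

Note the error-driven character: the conjecture changes only when the current grammar fails on an input sentence, which is exactly why the learner converges in the limit once it reaches the target.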
Slide 7: Learnability: under what conditions is learning possible? Feasibility: is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?
A class of grammars H is learnable iff there exists a learner such that for every G ∈ H and every (fair) input sample generable by G, the learner converges on G.
Slide 8: An early learnability result (Gold, 1967). Exposed to input strings of an arbitrary target language Ltarg = L(Gtarg), where Gtarg ∈ H, it is impossible to guarantee that a learner can converge on Gtarg if H is any class in the Chomsky hierarchy. Moreover, no learner is uniformly faster than one that executes simple error-driven enumeration of languages.
H, the hypothesis space, is the set of grammars that may be hypothesized by the learner.
Slide 9: The Overgeneralization Hazard
Slide 10: If H contains an infinite language L(Gk) together with an infinite set of finite languages included in it, then H is unlearnable.
[Figure: infinitely many finite languages L(Gi) nested inside the infinite L(Gk).]
H = Lreg meets this condition, so Lreg is unlearnable; and since Lreg ⊂ Lcf ⊂ Lcs ⊂ Lre, no class of languages in the Chomsky hierarchy is learnable.
Slide 11: Gold's enumeration learner is as fast as any other learner.
Assume there exists a rival learner that converges earlier than the enumeration learner. The rival arrives at the target at time i, the enumerator at time j (i < j). At time j, the enumeration learner had to be conjecturing SOME grammar consistent with the input up to that point. If the target had happened to be that grammar, the enumerator would have been correct and the rival incorrect. Thus, for every language that the rival converges on faster than the enumerator, there is a language for which the reverse is true.
Slide 12: Corollary: Language just can't be learned! ;-)
Slide 13: The class of human languages must intersect the Chomsky hierarchy in such a way that it does not coincide with any class that properly includes any class in the hierarchy.
Slide 14: Angluin's theorem (1980). A class of grammars H is learnable iff for every language Li = L(Gi), Gi ∈ H, there exists a finite subset D ⊆ Li such that no other language L(G′), G′ ∈ H, includes D and is included in Li.
[Figure: D inside L(G′) inside L(Gi); if such an L(G′) can be generated by a grammar in H, H is not learnable!]
Slide 15: Artificial Neural Networks: a brief introduction.
[Figure: three architectures: a) fully recurrent, b) feedforward, c) multi-component.]
Slide 16: [Figure: a unit with a bias node, input activations, and a threshold node. If the inputs are great enough, the unit fires; that is to say, a positive activation occurs at the threshold node.]
How can we implement the AND function?
Slide 17: How can we implement the AND function?
First we must decide on a representation: possible inputs {1, 0}, possible outputs {1, 0}. We want an artificial neuron to implement this function.
Boolean AND:
unit inputs | unit output
1 1 | 1
0 1 | 0
1 0 | 0
0 0 | 0
Slide 18: [Figure: the unit with input weights 1 and 1 and a bias weight of -1; net = Σ activations arriving at the threshold node.]
unit inputs | unit output (raw net)
1 1 | 1
0 1 | 0
1 0 | 0
0 0 | -1
Oooops: with the raw net as output, the (0, 0) case gives -1 rather than 0.
Slide 19: [Figure: the same unit on inputs 0 and 0: net = -1, and f(net) = 0, as desired.]
STEP activation function: f(x) = 1 if x > 0; f(x) = 0 otherwise.
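Slides 18 and 19 together give a working AND unit: input weights 1 and 1, a bias weight of -1, and the step activation applied to the net input. A minimal sketch:

```python
def step(x):
    """STEP activation: the unit fires only for positive net input."""
    return 1 if x > 0 else 0

def and_unit(x1, x2):
    # weights 1 and 1 on the inputs; the bias node outputs a constant 1
    # and connects with weight -1
    net = 1 * x1 + 1 * x2 + (-1) * 1
    return step(net)

for x1, x2 in [(1, 1), (0, 1), (1, 0), (0, 0)]:
    print(x1, x2, "->", and_unit(x1, x2))
# 1 1 -> 1; every other input pair -> 0
```

The step function repairs the slide-18 table: the raw net values 1, 0, 0, -1 become 1, 0, 0, 0, matching Boolean AND.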
Slide 20: A worked example with a logistic activation function.
[Figure: a small network; the quantities recoverable from its labels are reconstructed below.]
a7 = 1, w79 = .75, so a7(w79) = 1(.75) = .75
a8 = .3, w89 = 1.6667, so a8(w89) = .3(1.6667) = .5
net9 = Σj aj(wj9) = .3(1.6667) + 1(.75) = 1.25
f(net9) = 1 / (1 + e^(-net9)) = .777
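The slide-20 arithmetic can be reproduced with a logistic-activation unit; the node indices and weight values follow the slide's labels:

```python
import math

def logistic(net):
    """Logistic (sigmoid) activation: 1 / (1 + e^(-net))."""
    return 1 / (1 + math.exp(-net))

# Activations and weights feeding node 9, as on the slide:
a7, w79 = 1.0, 0.75
a8, w89 = 0.3, 1.6667

net9 = a7 * w79 + a8 * w89      # 0.75 + 0.5 = 1.25
a9 = logistic(net9)
print(round(net9, 4), round(a9, 3))  # 1.25 0.777
```

This confirms the slide's value: a net input of 1.25 passed through the logistic function gives an activation of about .777 at node 9.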