1
Language Learning Week 11
Pieter Adriaans (pietera@science.uva.nl)
Sophia Katrenko (katrenko@science.uva.nl)
2
Contents Week 11
  • Learning Human Languages
  • Learning context-free grammars
  • Emile

3
GI Research Questions
  • Research Question: What is the complexity of human language?
  • Research Question: Can we make a formal model of the language development of young children that allows us to understand
  • why the process is efficient?
  • why the process is discontinuous?
  • Underlying Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
  • Research Question (semantic learning): Can we construct ontologies for specific domains from (scientific) text?

4
Chomsky Hierarchy and the complexity of Human
Language
5
Complexity of Natural Language: Zipf distribution
[Figure: rank-frequency plot with a structured high-frequency core and a heavy low-frequency tail]
6
Observations
  • Word frequencies in human utterances are dominated by power laws (see the sketch after this list)
  • High-frequency core
  • Low-frequency heavy tail
  • Open versus closed word classes (function words)
  • Natural language is open. Grammar is elastic. The occurrence of new words is a natural phenomenon. Syntactic/semantic bootstrapping must play an important role in language learning.
  • Bootstrapping will be important for ontology learning as well as for child language acquisition
  • A better understanding of NL distributions is necessary
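To make the power-law claim concrete, here is a minimal sketch (not from the slides) that estimates the Zipf exponent from the rank-frequency relation of a word count; the toy corpus and the least-squares fit are illustrative assumptions.

```python
from collections import Counter
import math

def zipf_exponent(text: str) -> float:
    """Estimate s in freq(rank) ~ C / rank**s by a least-squares fit
    of log-frequency against log-rank."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # roughly 1 for natural-language corpora

# Toy usage; a real corpus (e.g. the King James text used later) gives a cleaner fit.
print(zipf_exponent("the cat sat on the mat and the dog sat on the cat"))
```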

7
Learn NL from text: probabilistic versus recursion-theoretic approach
  • 1967, Gold: no superfinite class of languages (which includes the regular languages and everything above them in the Chomsky hierarchy) can be identified in the limit from positive data alone.
  • 1969, Horning: probabilistic context-free grammars can be learned from positive data. Given a text T and two grammars G1 and G2 we are able to approximate max(P(G1|T), P(G2|T)) (see the sketch after this list).
  • ICGI, after 1990: empirical approach. Just build algorithms and try them. Approximate NL from below: Finite → Regular → Context-free → Context-sensitive
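Horning's observation is that once grammars carry a prior and sentences a likelihood, positive data alone suffice to compare candidate grammars. The sketch below is my illustration of the comparison P(G|T) ∝ P(T|G)·P(G); the grammar names and likelihood values are made-up toy numbers, not Horning's algorithm.

```python
def posterior(prior, likelihood):
    """P(G|T) is proportional to P(T|G) * P(G), normalised over the candidates."""
    unnorm = {g: prior[g] * likelihood[g] for g in prior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Hypothetical numbers: two candidate grammars with equal priors and
# likelihoods that would come from scoring the observed text T under each PCFG.
prior = {"G1": 0.5, "G2": 0.5}
likelihood = {"G1": 1e-12, "G2": 4e-12}   # P(T|G), toy values
post = posterior(prior, likelihood)
print(post, max(post, key=post.get))      # picks the grammar with the higher P(G|T)
```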

8
Situation < 2004
  • GI seems to be hard
  • No identification in the limit
  • Ill-understood power laws dominate (word) frequencies in human communication
  • Machine learning algorithms have difficulties in these domains
  • PAC learning does not converge on these domains
  • Nowhere near learning natural languages
  • We were running out of ideas

9
Situation < 2004: Learning regular languages
  • Reasonable success in learning regular languages of moderate complexity (Evidence-Based State Merging, Blue-Fringe)
  • Transparent representation: Deterministic Finite Automata (DFA); a small sketch of the usual starting representation follows below
  • DEMO
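State-merging learners start from a prefix-tree acceptor built from the positive sample and then merge compatible states. The sketch below is my own minimal illustration of that starting representation, not the DEMO from the lecture.

```python
def build_pta(positive_samples):
    """Build a prefix-tree acceptor (PTA): a tree-shaped DFA that accepts exactly
    the positive sample strings. State-merging algorithms such as Blue-Fringe
    generalise from it by merging compatible states."""
    delta = {}          # (state, symbol) -> state
    accepting = set()
    next_state = 1      # state 0 is the start state
    for word in positive_samples:
        state = 0
        for symbol in word:
            if (state, symbol) not in delta:
                delta[(state, symbol)] = next_state
                next_state += 1
            state = delta[(state, symbol)]
        accepting.add(state)
    return delta, accepting

def accepts(delta, accepting, word):
    state = 0
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting

delta, accepting = build_pta(["ab", "abab", "ababab"])
print(accepts(delta, accepting, "abab"))   # True
print(accepts(delta, accepting, "aba"))    # False
```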

10
Situation < 2004: Learning context-free languages
  • A number of approaches: learning probabilistic CFGs, the Inside-Outside algorithm, EMILE, ABL.
  • No transparent representation: Push-Down Automata (PDA) are not really helpful for modelling the learning process.
  • No adequate convergence on interesting real-life corpora.
  • Problem of sparse data sets.
  • Complexity issues ill-understood.

11
Emile: natural language allows bootstrapping
  • Lewis Carroll's famous poem 'Jabberwocky' starts with:
  • 'Twas brillig, and the slithy toves
  • Did gyre and gimble in the wabe;
  • All mimsy were the borogoves,
  • And the mome raths outgrabe.

12
Emile: Characteristic Expressions and Contexts
  • An expression of a type T is characteristic for T
    if it only appears with contexts of type T
  • Similarly, a context of a type T is
    characteristic for T if it only appears with
    expressions of type T.
  • Let G be a grammar (context-free or otherwise) of
    a language L. G has context separability if each
    type of G has a characteristic context, and
    expression separability if each type of G has a
    characteristic expression.
  • Natural languages seem to be context- and
    expression-separable.
  • This is nothing but stating that languages can
    define their own concepts internally (...is a
    noun, ...is a verb).

13
Emile: Natural languages are shallow
  • A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C).
  • This seems to hold for natural languages → large dictionaries, low thickness

14
Regular versus context-free: merging versus clustering
[Diagram comparing state merging (regular languages) with context/expression clustering (context-free languages); the figure's symbols were lost in transcription]
15
The EMILE learning algorithm
  • One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under the universal distribution m.
  • General idea: a sentence αβγ is split into an expression β and a context (α, γ); in categorial notation the expression gets the type α\S/γ (sentence = expression plus context).

16
Grammar Formalisms: Context-free
  • Context-free grammar (see the generation sketch below):
    Sentence → Name Verb
    Sentence → Name T_Verb Name
    Name → Mary | John
    Verb → walks
    T_Verb → loves
  • Sentences: John loves Mary; Mary walks
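As a quick check of what this toy grammar generates, here is a minimal sketch (mine, not part of the slides) that encodes the productions as a Python dictionary and enumerates all derivable sentences.

```python
# The toy context-free grammar from the slide, as a dict of productions.
GRAMMAR = {
    "Sentence": [["Name", "Verb"], ["Name", "T_Verb", "Name"]],
    "Name": [["Mary"], ["John"]],
    "Verb": [["walks"]],
    "T_Verb": [["loves"]],
}

def generate(symbol):
    """Yield every terminal string derivable from `symbol`
    (the grammar is finite, so plain recursion terminates)."""
    if symbol not in GRAMMAR:            # terminal symbol
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        expansions = [[]]
        for part in rhs:
            expansions = [prefix + suffix
                          for prefix in expansions
                          for suffix in generate(part)]
        yield from expansions

for sentence in generate("Sentence"):
    print(" ".join(sentence))
# Mary walks, John walks, Mary loves Mary, Mary loves John, John loves Mary, ...
```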

17
Grammar Formalisms: Categorial grammars
  • Categorial grammar (lexicalistic):
    loves → (Name \ Sentence) / Name
    walks, runs → Name \ Sentence
    Mary, John → Name
  • Parsing as deduction (cancellation rules): α, α\β ⊢ β and β/α, α ⊢ β (see the recogniser sketch below)

Derivation of "John loves Mary":
    John : Name    loves : (Name\Sentence)/Name    Mary : Name
    loves Mary : Name\Sentence
    John loves Mary : Sentence
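A minimal recogniser makes the "parsing as deduction" idea concrete. The sketch below is my illustration (not the lecture's code): complex types are encoded as tuples, the two cancellation rules are applied, and a CYK-style chart checks whether a word sequence reduces to Sentence.

```python
# Complex types as tuples: ("\\", A, B) stands for A\B, ("/", B, A) stands for B/A.
LEFT, RIGHT = "\\", "/"

LEXICON = {
    "John": ["Name"],
    "Mary": ["Name"],
    "loves": [(RIGHT, (LEFT, "Name", "Sentence"), "Name")],   # (Name\Sentence)/Name
    "walks": [(LEFT, "Name", "Sentence")],                    # Name\Sentence
}

def reduce_pair(x, y):
    """The two cancellation rules: A, A\\B => B and B/A, A => B."""
    results = []
    if isinstance(y, tuple) and y[0] == LEFT and y[1] == x:
        results.append(y[2])
    if isinstance(x, tuple) and x[0] == RIGHT and x[2] == y:
        results.append(x[1])
    return results

def parses_as(words, goal="Sentence"):
    """CYK-style check whether the word sequence reduces to `goal`."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for x in chart[(i, k)]:
                    for y in chart[(k, i + span)]:
                        cell.update(reduce_pair(x, y))
            chart[(i, i + span)] = cell
    return goal in chart[(0, n)]

print(parses_as("John loves Mary".split()))  # True
print(parses_as("Mary walks".split()))       # True
```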
18
Categorial Grammar: Propositional calculus without structural rules
  • Interchange: from x, A, y, B, z ⊢ C infer x, B, y, A, z ⊢ C
  • Contraction: from x, A, A, y ⊢ C infer x, A, y ⊢ C
  • Thinning: from x, y ⊢ C infer x, A, y ⊢ C
  • Logic: A, (A → B) ⊢ B and (A → B), A ⊢ B
  • Grammar: A • (A \ B) → B and (A / B) • A → B

19
Categorial Grammar Formalism: Algebraic specification
  • M is a multiplicative system
  • A • B = { x·y ∈ M | x ∈ A, y ∈ B }
  • C / B = { x ∈ M | for all y ∈ B: x·y ∈ C }
  • A \ C = { y ∈ M | for all x ∈ A: x·y ∈ C }

20
Categorial Grammar Formalism: Algebraic specification as database operations
  • Name = {John, Mary}
  • Verb = {walks, runs}
  • S = Name • Verb = {John, Mary} • {walks, runs} = {John walks, John runs, Mary walks, Mary runs}

21
Categorial Grammar Formalism: Algebraic specification as database operations
  • Name \ S = {John, Mary} \ {John walks, John runs, Mary walks, Mary runs} = {walks, runs}
  • S / Verb = {John walks, John runs, Mary walks, Mary runs} / {walks, runs} = {John, Mary} (see the set-based sketch below)
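The residuals can be read literally as operations on sets of strings. The sketch below is my illustration, assuming space-separated strings and helper names of my own choosing.

```python
def product(A, B):
    """A • B: all concatenations 'x y' with x in A and y in B."""
    return {f"{x} {y}" for x in A for y in B}

def left_residual(A, C):
    """A \\ C: all y such that 'x y' lies in C for every x in A."""
    candidates = {s.split(" ", 1)[1] for s in C if " " in s}
    return {y for y in candidates if all(f"{x} {y}" in C for x in A)}

def right_residual(C, B):
    """C / B: all x such that 'x y' lies in C for every y in B."""
    candidates = {s.rsplit(" ", 1)[0] for s in C if " " in s}
    return {x for x in candidates if all(f"{x} {y}" in C for y in B)}

Name = {"John", "Mary"}
Verb = {"walks", "runs"}
S = product(Name, Verb)
print(S)                        # {'John walks', 'John runs', 'Mary walks', 'Mary runs'}
print(left_residual(Name, S))   # {'walks', 'runs'}
print(right_residual(S, Verb))  # {'John', 'Mary'}
```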

22
EMILE 3.0 stages: Take sample
Sample: John loves Mary; Mary walks
23
EMILE 3.0 stages: First-order explosion (see the sketch below)
Sample: John loves Mary; Mary walks
S/(loves Mary) → John
S/Mary → John loves
S → John loves Mary
John\S/Mary → loves
John\S → loves Mary
(John loves)\S → Mary
S/walks → Mary
S → Mary walks
Mary\S → walks
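The first-order explosion can be generated mechanically: every way of cutting a sentence into a left context, an expression, and a right context yields one (context, expression) pair. This sketch is my reading of the slide, with the context written as "left (.) right".

```python
def explode(sentence):
    """Return all (context, expression) pairs obtained by splitting the sentence."""
    words = sentence.split()
    splits = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            left, expr, right = words[:i], words[i:j], words[j:]
            context = f"{' '.join(left)} (.) {' '.join(right)}".strip()
            splits.append((context, " ".join(expr)))
    return splits

for context, expr in explode("John loves Mary") + explode("Mary walks"):
    print(f"{context:25s} <- {expr}")
# e.g. "(.) Mary" <- "John loves"   corresponds to   S/Mary → John loves
```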
24
EMILE 3.0 stages: First-order explosion
Sample: John loves Mary; Mary walks
25
EMILE 3.0 stages: Complete first-order explosion
Sample: John loves Mary; Mary walks
26
EMILE 3.0 stages: Clustering
Sample: John loves Mary; Mary walks
27
EMILE 3.0 stages: Clustering
Sample: John loves Mary; Mary walks
28
EMILE 3.0 stages: Clusters → non-terminal names
Sample: John loves Mary; Mary walks
Cluster labels: A, B, C, D, E
29
EMILE 3.0 stages: Proto-rules
S/(loves Mary) → A
S/Mary → B
S → C
John\S/Mary → D
John\S → E
(John loves)\S → A
S/walks → A
Mary\S → E

A → John
B → John loves
C → John loves Mary
D → loves
E → loves Mary
A → Mary
C → Mary walks
E → walks
30
Emile 3.0 stages: generalise into context-free rules
John\S/Mary → D
_____________________________
S → John D Mary

A → John  (characteristic expression)
A → Mary  (characteristic expression)
_____________________________
S → A D A

Grammar:
S → A D A | B A | C | A E
A → John | Mary
B → A D
C → A D A | A E
D → loves
E → A D | walks
31
Theorem (Adriaans 92)
  • If a language L has a context-free grammar G, is shallow, is sampled according to the Universal Distribution, and a membership-check function is available, then it can be learned efficiently from text
  • Assumptions: natural language is shallow; the distribution of sentences in a text is simple

32
EMILE 3.0 (1992): Problems, not very practical
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction

33
EMILE 3.0 (1992): Problems
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, not learning from plain text: speakers do not give negative examples

34
EMILE 3.0 (1992): Problems
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, not learning from plain text: speakers do not give negative examples
  • Polynomial, but very complex due to overlapping clusters

35
EMILE 3.0 (1992): Only of theoretical value
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, not learning from plain text: speakers do not give negative examples
  • Polynomial, but very complex due to overlapping clusters
  • Batch-oriented, not incremental

36
EMILE 4.1 (2000), Vervoort
  • Unsupervised
  • Two-dimensional clustering: random search for maximised blocks in the matrix
  • Incremental; thresholds for the filling degree of blocks
  • Simple (but sloppy) rule induction using characteristic expressions

37
Clustering (2-dimensional); see the block-finding sketch after the examples
  • John makes tea
  • John likes tea
  • John likes eating
  • (schema: each sentence αβγ splits into an expression β and its context α…γ)
  • John makes coffee
  • John likes coffee
  • John is eating
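A minimal sketch of the two-dimensional context/expression matrix built from these example sentences. The exhaustive grouping below is my simplification; EMILE 4.1 uses a randomised search for maximised blocks rather than this brute-force pass.

```python
from collections import defaultdict

SENTENCES = ["John makes tea", "John likes tea", "John likes eating",
             "John makes coffee", "John likes coffee", "John is eating"]

# Collect every (context, expression) pair in the sample, where a context is the
# pair (left part, right part) surrounding the expression.
ctx_of = defaultdict(set)            # expression -> set of contexts it occurs in
for s in SENTENCES:
    words = s.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            context = (" ".join(words[:i]), " ".join(words[j:]))
            ctx_of[" ".join(words[i:j])].add(context)

# Group expressions occurring with exactly the same contexts: each such group is a
# filled block in the context/expression matrix and suggests a candidate type.
blocks = defaultdict(list)
for expr, contexts in ctx_of.items():
    blocks[frozenset(contexts)].append(expr)

for contexts, exprs in blocks.items():
    if len(exprs) > 1 and len(contexts) > 1:
        print(sorted(exprs), "share", sorted(contexts))
# ['coffee', 'tea'] share [('John likes', ''), ('John makes', '')]
```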

38
Emile 4.1: Clustering sparse matrices of contexts and expressions
[Figure: sparse contexts × expressions matrix, with one characteristic expression (column) and one characteristic context (row) highlighted]
39
Emile is guaranteed to find types with the right settings
  • Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least |c_ch| and |e_ch|, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<maxC and T<maxE be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<maxC and expressions from T<maxE, then EMILE will find type T. (Vervoort 2000)

40
Original grammar (a sampling sketch follows the rules)
  • S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
  • NP → NP_a | NP_p
  • VP_a → V_t NP | V_t NP P NP_p
  • NP_a → John | Mary | the man | the child
  • NP_p → the car | the city | the house | the shop
  • P → with | near | in | from
  • V_i → appears | is | seems | looks
  • V_s → thinks | hopes | tells | says
  • V_t → knows | likes | misses | sees
  • ADV → large | small | ugly | beautiful
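The next slide shows the grammar EMILE learned from 100,000 example sentences, presumably generated from this grammar. Here is a small sketch (mine) that samples sentences from it; choosing alternatives uniformly at random is an assumption, as the slides do not state the sampling distribution.

```python
import random

# The original grammar from the slide; each non-terminal maps to its alternatives.
RULES = {
    "S": [["NP", "V_i", "ADV"], ["NP_a", "VP_a"], ["NP_a", "V_s", "that", "S"]],
    "NP": [["NP_a"], ["NP_p"]],
    "VP_a": [["V_t", "NP"], ["V_t", "NP", "P", "NP_p"]],
    "NP_a": [["John"], ["Mary"], ["the", "man"], ["the", "child"]],
    "NP_p": [["the", "car"], ["the", "city"], ["the", "house"], ["the", "shop"]],
    "P": [["with"], ["near"], ["in"], ["from"]],
    "V_i": [["appears"], ["is"], ["seems"], ["looks"]],
    "V_s": [["thinks"], ["hopes"], ["tells"], ["says"]],
    "V_t": [["knows"], ["likes"], ["misses"], ["sees"]],
    "ADV": [["large"], ["small"], ["ugly"], ["beautiful"]],
}

def sample(symbol="S"):
    """Expand `symbol` top-down, picking alternatives uniformly at random.
    The recursion through 'V_s that S' terminates with probability 1."""
    if symbol not in RULES:
        return [symbol]
    words = []
    for part in random.choice(RULES[symbol]):
        words.extend(sample(part))
    return words

for _ in range(5):
    print(" ".join(sample()))
# e.g. "the child thinks that Mary likes the car"
```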

41
Learned grammar after 100,000 examples (the numbers are EMILE's non-terminal names)
  • 0 → 17 6
  • 0 → 17 22 17 6
  • 0 → 17 22 17 22 17 22 17 6
  • 6 → misses 17 | likes 17 | knows 17 | sees 17
  • 6 → 22 17 6
  • 6 → appears 34 | looks 34 | is 34 | seems 34
  • 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
  • 17 → the child | Mary | the city | the man | John | the car | the house | the shop
  • 22 → tells that | thinks that | hopes that | says that
  • 22 → 22 17 22
  • 34 → small | beautiful | large | ugly

42
Bible books
  • King James Version
  • 31,102 verses in 82,935 lines
  • 4.8 MB of English text
  • 001001 In the beginning God created the heaven and the earth.
  • 66 experiments with increasing sample size
  • Initially: Book of Genesis, Book of Exodus, ...
  • Full run: 40 minutes, 500 MB on an Ultra-2 SPARC

43
Bible books
44
GI on the Bible
  • 0 → Thou shalt not 582
  • 0 → Neither shalt thou 582
  • 582 → eat it
  • 582 → kill .
  • 582 → commit adultery .
  • 582 → steal .
  • 582 → bear false witness against thy neighbour .
  • 582 → abhor an Edomite

45
Knowledge base in Bible
  • Dictionary Type 76
  • Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah,
    Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah,
    Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon,
    Kohath, Merari, Aaron, Amram, Mushi, Shimei,
    Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan,
    Zophah, Elpaal, Jehieli
  • Dictionary Type 362
  • plague, leprosy
  • Dictionary Type 414
  • Simeon, Judah, Dan, Naphtali, Gad, Asher,
    Issachar, Zebulun, Benjamin, Gershom
  • Dictionary Type 812
  • two, three, four
  • Dictionary Type 1056
  • priests, Levites, porters, singers, Nethinims
  • Dictionary Type 978
  • afraid, glad, smitten, subdued
  • Dictionary Type 2465
  • holy, rich, weak, prudent
  • Dictionary Type 3086
  • Egypt, Moab, Dumah, Tyre, Damascus
  • Dictionary Type 4082
  • heaven, Jerusalem

46
Evaluation
  • Works efficiently on large corpora
  • learns (partial) grammars
  • unsupervised
  • - EMILE 4.1 needs a lot of input.
  • - Convergence to meaningful syntactic type rarely
    observed.
  • - Types seem to be semantic rather than
    syntactic.
  • Why?
  • Hypothesis distribution in real life text is
    semantic, not syntactic.
  • But, most of all Sparse data!!!

47
Contents Week 11
  • Learning Human Languages
  • Learning context-free grammars
  • Emile