Title: Human Computer Studies
1. Human Computer Studies
2004: a good year for Computational Grammar Induction
January 2005, Pieter Adriaans
Universiteit van Amsterdam, pietera@science.uva.nl
http://turing.wins.uva.nl/pietera/ALS/
2. GI Research Questions
- Research Question: What is the complexity of human language?
- Research Question: Can we make a formal model of the language development of young children that allows us to understand
  - why the process is efficient?
  - why the process is discontinuous?
- Underlying Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- Research Question: Semantic learning, e.g. can we construct ontologies for specific domains from (scientific) text?
3. Chomsky Hierarchy and the Complexity of Human Language
4. Complexity of Natural Language: Zipf distribution
[Figure: Zipf rank-frequency plot with a structured high-frequency core and a heavy low-frequency tail]
5. Observations
- Word frequencies in human utterances are dominated by power laws (see the sketch below)
  - high-frequency core
  - low-frequency heavy tail
  - open versus closed word classes (function words)
- Natural language is open; grammar is elastic. The occurrence of new words is a natural phenomenon. Syntactic/semantic bootstrapping must play an important role in language learning.
- Bootstrapping will be important for ontology learning as well as for child language acquisition.
- A better understanding of NL distributions is necessary.
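The power-law claim is easy to check on any corpus. A minimal sketch (not from the talk; it assumes a plain-text file corpus.txt) prints rank-frequency pairs; under a Zipf distribution, rank times frequency is roughly constant:

from collections import Counter

def rank_frequency(text):
    """Return (rank, frequency) pairs for the words in `text`."""
    counts = Counter(text.lower().split())
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

text = open("corpus.txt").read()   # any plain-text corpus (assumption)
for rank, freq in rank_frequency(text)[:20]:
    # Under Zipf's law, rank * freq stays roughly constant across ranks.
    print(rank, freq, rank * freq)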
6. Learn NL from text: probabilistic versus recursion-theoretic approach
- 1967, Gold: Any language class more complex than the super-finite sets (including the regular languages and everything above them in the Chomsky hierarchy) cannot be learned from positive data.
- 1969, Horning: Probabilistic context-free grammars can be learned from positive data. Given a text T and two grammars G1 and G2, we are able to approximate max(P(G1|T), P(G2|T)) (see the sketch below).
- ICGI, > 1990: empirical approach. Just build algorithms and try them. Approximate NL from below: Finite → Regular → Context-free → Context-sensitive.
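The Horning-style comparison can be pictured with a small Bayesian sketch. The toy "grammars" g1 and g2 below are hypothetical stand-ins for real PCFGs (which would compute P(sentence|G) with a parser, e.g. the inside algorithm); the point is only that posteriors P(G|T) ∝ P(G) · P(T|G) can be compared on a text T:

import math

def g1(sentence):
    # Hypothetical grammar preferring short sentences.
    return 0.5 ** len(sentence.split())

def g2(sentence):
    # Hypothetical grammar: uniform over a 1000-sentence toy language.
    return 1.0 / 1000

def log_posterior(grammar, prior, text):
    """log P(G|T) up to a constant: log P(G) + sum over sentences of log P(s|G)."""
    return math.log(prior) + sum(math.log(grammar(s)) for s in text)

text = ["john likes tea", "john makes coffee"]
scores = {"G1": log_posterior(g1, 0.5, text), "G2": log_posterior(g2, 0.5, text)}
print(max(scores, key=scores.get), scores)   # pick the grammar maximizing P(G|T)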
7. Situation < 2004
- GI seems to be hard
- No identification in the limit
- Ill-understood power laws dominate (word) frequencies in human communication
- Machine learning algorithms have difficulties in these domains
- PAC learning does not converge on these domains
- Nowhere near learning natural languages
- We were running out of ideas
8. Situation < 2004: Learning Regular Languages
- Reasonable success in learning regular languages of moderate complexity (Evidence-Driven State Merging, Blue-Fringe)
- Transparent representation: Deterministic Finite Automata (DFA)
- DEMO (a minimal sketch of the starting point follows below)
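As a rough illustration of that transparent representation (a sketch of my own, not the actual demo): state-merging learners such as Blue-Fringe start from the prefix-tree acceptor of the positive sample, a DFA whose states are sample prefixes, and then fold it by merging states that the evidence cannot distinguish.

def build_pta(positives):
    """Prefix-tree acceptor: states are prefixes of the sample strings."""
    trans, accepting = {}, set()
    for word in positives:
        state = ""
        for ch in word:
            trans[(state, ch)] = state + ch
            state = state + ch
        accepting.add(state)
    return trans, accepting

def accepts(trans, accepting, word):
    """Run the DFA on `word`."""
    state = ""
    for ch in word:
        if (state, ch) not in trans:
            return False
        state = trans[(state, ch)]
    return state in accepting

trans, accepting = build_pta(["ab", "abab", "ababab"])
print(accepts(trans, accepting, "abab"))   # True
print(accepts(trans, accepting, "aba"))    # False
# A state-merging learner would now fold this tree into a 2-state DFA
# for (ab)+, merging states that no negative example distinguishes.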
9. Situation < 2004: Learning Context-free Languages
- A number of approaches: learning probabilistic CFGs, the Inside-Outside algorithm, EMILE, ABL.
- No transparent representation: Push-Down Automata (PDA) are not really helpful for modelling the learning process.
- No adequate convergence on interesting real-life corpora.
- Problem of sparse data sets.
- Complexity issues ill-understood.
10. EMILE: natural language allows bootstrapping
- Lewis Carroll's famous poem 'Jabberwocky' starts with:
  - 'Twas brillig, and the slithy toves
  - Did gyre and gimble in the wabe:
  - All mimsy were the borogoves,
  - And the mome raths outgrabe.
11. EMILE: Characteristic Expressions and Contexts
- An expression of a type T is characteristic for T if it only appears with contexts of type T.
- Similarly, a context of a type T is characteristic for T if it only appears with expressions of type T.
- Let G be a grammar (context-free or otherwise) of a language L. G has context separability if each type of G has a characteristic context, and expression separability if each type of G has a characteristic expression.
- Natural languages seem to be context- and expression-separable.
- This is nothing but stating that languages can define their own concepts internally ("... is a noun", "... is a verb").
12. EMILE: Natural languages are shallow
- A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C).
- This seems to hold for natural languages: large dictionaries, low thickness.
13. Regular versus context-free: merging-clustering
[Diagram: state merging in an automaton for regular languages versus two-dimensional clustering of contexts and expressions for context-free languages]
14. The EMILE learning algorithm
- One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under a distribution m.
- General idea: every sentence splits into an expression and a surrounding context (see the sketch below).
[Diagram: a sentence decomposed into an expression and its left/right context]
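A minimal sketch of that split (illustrative names, not EMILE's actual code): enumerate every (context, expression) decomposition of a tokenized sentence.

def split_pairs(sentence):
    """All (context, expression) pairs of a tokenized sentence:
    the expression is a contiguous slice, the context the remainder."""
    toks = sentence.split()
    pairs = []
    for i in range(len(toks)):
        for j in range(i + 1, len(toks) + 1):
            expression = " ".join(toks[i:j])
            context = (" ".join(toks[:i]), " ".join(toks[j:]))
            pairs.append((context, expression))
    return pairs

for ctx, expr in split_pairs("John likes tea"):
    print(ctx, "->", expr)
# e.g. (('John', 'tea'), 'likes'): the context "John ... tea" around "likes"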
15. EMILE 4.1 (2000), Vervoort
- Unsupervised
- Two-dimensional clustering: random search for maximized blocks in the matrix
- Incremental thresholds for the filling degree of blocks
- Simple (but sloppy) rule induction using characteristic expressions
16. Clustering (2-dimensional)
- John makes tea
- John likes tea
- John likes eating
- John makes coffee
- John likes coffee
- John is eating
(A sketch of the clustering on exactly this corpus follows below.)
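A simplified sketch of the two-dimensional clustering on this corpus (single-word expressions only; EMILE 4.1 itself uses a randomized search for maximized blocks): expressions that share several contexts surface as a filled block, i.e. a candidate type.

from collections import defaultdict

sentences = ["John makes tea", "John likes tea", "John likes eating",
             "John makes coffee", "John likes coffee", "John is eating"]

# Occurrence matrix: which single-word expression appears in which context?
matrix = defaultdict(set)          # context -> set of expressions
for s in sentences:
    toks = s.split()
    for i, expr in enumerate(toks):
        ctx = (" ".join(toks[:i]), " ".join(toks[i + 1:]))
        matrix[ctx].add(expr)

# Expressions sharing several contexts form a candidate type (a filled block).
shared = defaultdict(set)          # frozenset of expressions -> their contexts
for ctx, exprs in matrix.items():
    if len(exprs) > 1:
        shared[frozenset(exprs)].add(ctx)

for exprs, ctxs in shared.items():
    print(sorted(exprs), "share contexts", sorted(ctxs))
# e.g. ['likes', 'makes'] share ('John', 'tea') and ('John', 'coffee'):
# a verb-like type emerges purely from the distribution.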
17. EMILE 4.1: Clustering Sparse Matrices of Contexts and Expressions
[Figure: sparse matrix of expressions against contexts; filled blocks mark the characteristic expressions and characteristic contexts of a type]
18. EMILE guaranteed to find types with the right settings
- Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least |c_ch| and |e_ch|, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<max,C and T<max,E be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<max,C and expressions from T<max,E, then EMILE will find type T. (Vervoort 2000)
19. Original grammar
- S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
- NP → NP_a | NP_p
- VP_a → V_t NP | V_t NP P NP_p
- NP_a → John | Mary | the man | the child
- NP_p → the car | the city | the house | the shop
- P → with | near | in | from
- V_i → appears | is | seems | looks
- V_s → thinks | hopes | tells | says
- V_t → knows | likes | misses | sees
- ADV → large | small | ugly | beautiful
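The talk does not say how the training examples were generated; assuming uniform choice among alternatives and a depth cap on the recursive S rule, a sampler for the grammar above might look like this:

import random

# The slide's grammar, with uniform choice between alternatives (an assumption).
grammar = {
    "S":    [["NP", "V_i", "ADV"], ["NP_a", "VP_a"], ["NP_a", "V_s", "that", "S"]],
    "NP":   [["NP_a"], ["NP_p"]],
    "VP_a": [["V_t", "NP"], ["V_t", "NP", "P", "NP_p"]],
    "NP_a": [["John"], ["Mary"], ["the man"], ["the child"]],
    "NP_p": [["the car"], ["the city"], ["the house"], ["the shop"]],
    "P":    [["with"], ["near"], ["in"], ["from"]],
    "V_i":  [["appears"], ["is"], ["seems"], ["looks"]],
    "V_s":  [["thinks"], ["hopes"], ["tells"], ["says"]],
    "V_t":  [["knows"], ["likes"], ["misses"], ["sees"]],
    "ADV":  [["large"], ["small"], ["ugly"], ["beautiful"]],
}

def sample(symbol="S", depth=0):
    """Expand `symbol` by uniform random choice; terminals are returned as-is."""
    if symbol not in grammar:
        return symbol
    # Cap recursion through S -> NP_a V_s that S at depth 4 (an assumption).
    options = grammar[symbol][:2] if (symbol == "S" and depth > 4) else grammar[symbol]
    return " ".join(sample(sym, depth + 1) for sym in random.choice(options))

for _ in range(5):
    print(sample())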
20. Learned grammar after 100,000 examples
- [0] → [17] [6]
- [0] → [17] [22] [17] [6]
- [0] → [17] [22] [17] [22] [17] [22] [17] [6]
- [6] → misses [17] | likes [17] | knows [17] | sees [17]
- [6] → [22] [17] [6]
- [6] → appears [34] | looks [34] | is [34] | seems [34]
- [6] → [6] near [17] | [6] from [17] | [6] in [17] | [6] with [17]
- [17] → the child | Mary | the city | the man | John | the car | the house | the shop
- [22] → tells that | thinks that | hopes that | says that
- [22] → [22] [17] [22]
- [34] → small | beautiful | large | ugly
21. Bible books
- King James version
- 31,102 verses on 82,935 lines
- 4.8 MB of English text
- 001001 In the beginning God created the heaven and the earth.
- 66 experiments with increasing sample size
- Initially the Book of Genesis, then the Book of Exodus, ...
- Full run: 40 minutes, 500 MB on an Ultra-2 SPARC
22. Bible books
23. GI on the Bible
- 0 → Thou shalt not [582]
- 0 → Neither shalt thou [582]
- [582] → eat it
- [582] → kill .
- [582] → commit adultery .
- [582] → steal .
- [582] → bear false witness against thy neighbour .
- [582] → abhor an Edomite
24. Knowledge base in the Bible
- Dictionary Type 76: Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah, Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah, Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon, Kohath, Merari, Aaron, Amram, Mushi, Shimei, Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan, Zophah, Elpaal, Jehieli
- Dictionary Type 362: plague, leprosy
- Dictionary Type 414: Simeon, Judah, Dan, Naphtali, Gad, Asher, Issachar, Zebulun, Benjamin, Gershom
- Dictionary Type 812: two, three, four
- Dictionary Type 1056: priests, Levites, porters, singers, Nethinims
- Dictionary Type 978: afraid, glad, smitten, subdued
- Dictionary Type 2465: holy, rich, weak, prudent
- Dictionary Type 3086: Egypt, Moab, Dumah, Tyre, Damascus
- Dictionary Type 4082: heaven, Jerusalem
25. Evaluation
- (+) Works efficiently on large corpora
- (+) Learns (partial) grammars
- (+) Unsupervised
- (-) EMILE 4.1 needs a lot of input.
- (-) Convergence to meaningful syntactic types is rarely observed.
- (-) Types seem to be semantic rather than syntactic.
- Why? Hypothesis: the distribution in real-life text is semantic, not syntactic.
- But, most of all: sparse data!!!
26. 2004: Omphalos Competition (Starkie & van Zaanen)
- Unsupervised learning of context-free grammars
- Deliberately constructed to be beyond the current state of the art
- A theoretical brute-force learner constructs all possible CFGs consistent with a given set of positive examples O.
- Complexity measure for CFGs.
- There are only 2^(Σ_i(2|O_i|-2)+1) · (Σ_i(2|O_i|-2)+1)^|T(O)| of these grammars, where T(O) is the set of terminals!!
27. 2004: Omphalos Competition (Starkie & van Zaanen)
- Let Σ be an alphabet and Σ* the set of all strings over Σ.
- L(G) = S ⊆ Σ* is the language generated by a grammar G.
- C_G ⊆ S is a characteristic sample for G.
[Diagram: nested sets Σ* (infinite) ⊃ S (infinite) ⊃ the characteristic sample C_G and the finite Omphalos sample O, with |O| < 20% of |C_G|]
28. Bad news for distributional analysis (EMILE, ABL, Inside-Outside)
[Diagram: two push-down automata. The first accepts w = a^n e b^n (sample strings: aeb, aaebb, aaaebbb) with transitions a,S → push X; a,X → push X; e,X → no-op; b,X → pop X; λ,X → pop. The second accepts w = {a,c}^n e {b,d}^n (sample strings: aeb, ceb, aed, ced, aaebb, aaebd, caebb, caebd, acebb, aaaebbb), adding c,S → push X; c,X → push X; d,X → pop X. A recognizer sketch for the first follows below.]
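A minimal simulation of the first automaton as reconstructed above (state names are illustrative):

def accepts_aneb_n(w):
    """Simulate the slide's PDA for a^n e b^n: push X on each a,
    switch on e (no-op), pop X on each b, accept on empty stack."""
    stack, state = [], "reading_a"
    for ch in w:
        if state == "reading_a" and ch == "a":
            stack.append("X")                  # a,S / a,X -> push X
        elif state == "reading_a" and ch == "e":
            state = "reading_b"                # e,X -> no-op
        elif state == "reading_b" and ch == "b" and stack:
            stack.pop()                        # b,X -> pop X
        else:
            return False
    return state == "reading_b" and not stack  # accept on empty stack

print([w for w in ["aeb", "aaebb", "aaaebbb", "aabb", "aebb"]
       if accepts_aneb_n(w)])                  # ['aeb', 'aaebb', 'aaaebbb']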
29. Bad news for distributional analysis (EMILE, ABL, Inside-Outside)
[Examples of nested strings such as a aeb b, a aeb d, c aeb b, c aeb d, a ceb b, a ceb d, c ceb b, c ceb d, a aaebb b, ...]
We need large corpora to make distributional analysis work. The Omphalos samples are way too small!!
30. Omphalos won by Alexander Clark: some good ideas!
- Approach: exploit useful properties that randomly generated grammars are likely to have.
- Identifying constituents: measure the local mutual information between the symbol before and the symbol after a candidate (Clark, 2001); see the sketch below. More reliable than other information-theoretic constituent boundary tests (Lamb, 1961; Brill et al., 1990).
- Under benign distributions, candidates that cross constituent boundaries will have zero mutual information; structures that do not cross constituent boundaries will have non-zero mutual information.
- Analysis of cliques of strings that might be constituents (much like clusters in EMILE).
- Most hard problems in Omphalos are still open!!
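A sketch of that flanking-symbol statistic (my reconstruction of the idea, not Clark's code): collect the symbol before and after each occurrence of a candidate and compute the mutual information between the two. As the previous slide stresses, reliable use needs far larger corpora than this toy example.

import math
from collections import Counter

def flank_mi(sentences, candidate):
    """Mutual information between the symbols immediately before and
    after each occurrence of `candidate` ('#' marks sentence edges)."""
    cand, pairs = candidate.split(), []
    for s in sentences:
        toks = ["#"] + s.split() + ["#"]
        for i in range(1, len(toks) - len(cand)):
            if toks[i:i + len(cand)] == cand:
                pairs.append((toks[i - 1], toks[i + len(cand)]))
    if not pairs:
        return 0.0
    n = len(pairs)
    pxy = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

sentences = ["the dog sleeps in the house", "a dog sleeps in a house",
             "the cat sleeps in the house", "a cat sleeps in a house"]
print(flank_mi(sentences, "dog sleeps in"))  # flanks co-vary: MI = log 2 > 0
print(flank_mi(sentences, "sleeps in the")) # flanks independent: MI = 0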
31. But is Omphalos the right challenge? What about NL?
[Chart: grammars plotted by log of the number of terminals (x-axis) against the complexity of the grammar, |P|/|N| (y-axis). Natural languages lie in the shallow region, where larger samples are needed; the Omphalos problems lie in the harder-to-learn region.]
32. ADIOS (Automatic DIstillation Of Structure), Solan et al. 2004
- Representation of a corpus (of sentences) as paths over a graph whose vertices are lexical elements (words)
- Motif Extraction (MEX) procedure for establishing new vertices, thus progressively redefining the graph in an unsupervised fashion
- Recursive generalization
- Zach Solan, David Horn, Eytan Ruppin (Tel Aviv University), Shimon Edelman (Cornell)
- http://www.tau.ac.il/zsolan
33. The Model (Solan et al. 2004)
- Graph representation with words as vertices and sentences as paths (see the sketch below).
- Is that a dog?
- Is that a cat?
- Where is the dog?
- And is that a horse?
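A minimal sketch of this graph representation on the slide's four sentences (the BEGIN/END vertices are an assumption):

from collections import defaultdict

sentences = ["is that a dog", "is that a cat",
             "where is the dog", "and is that a horse"]

# Words become vertices; each sentence is a path from BEGIN to END.
edges = defaultdict(list)            # vertex -> list of (next vertex, path id)
for pid, s in enumerate(sentences):
    path = ["BEGIN"] + s.split() + ["END"]
    for a, b in zip(path, path[1:]):
        edges[a].append((b, pid))

for vertex, out in edges.items():
    print(vertex, "->", out)
# Shared sub-paths such as "is that a" are traversed by several sentence
# paths; this overlap is what the MEX procedure later exploits.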
34. The MEX (motif extraction) procedure (Solan et al. 2004)
35. Generalization (Solan et al. 2004)
36. From MEX to ADIOS (Solan et al. 2004)
- Apply MEX to a search path consisting of a given data path.
- On the same search path, within a given window size, allow for the occurrence of an equivalence class, i.e. define a generalized search path of the type e1 → e2 → ... → E → ... → ek. Apply MEX to this window.
- Choose patterns P, including equivalence classes E, according to the MEX ranking. Add nodes.
- Repeat the above for all search paths.
- Repeat the procedure to obtain higher-level generalizations.
- Express the structures as syntactic trees.
(A much-simplified sketch of the MEX ranking follows below.)
37. First pattern formation
- Higher hierarchies: patterns (P) constructed from other Ps, equivalence classes (E) and terminals (T)
- Trees are to be read from top to bottom and from left to right
- Final stage: root pattern
- CFG = context-free grammar
38. Solan et al. 2004
- The ADIOS algorithm has been evaluated on artificial grammars containing thousands of rules, on natural languages as diverse as English and Chinese, on regulatory and coding regions in DNA sequences, and on functionally relevant structures in protein data.
- The complexity of ADIOS on large NL corpora seems to be linear in the size of the corpus.
- Allows mildly context-sensitive learning.
- This is the first time an unsupervised algorithm has been shown capable of learning complex syntax and of scoring well on standard language proficiency tests!! (Training set: 300,000 sentences from CHILDES; ADIOS scored at intermediate level (58%) on the Göteborg/ESL test.)
39. ADIOS learning from ATIS-CFG (4,592 rules), using different numbers of learners and different window lengths L
40. Where does ADIOS fit in?
[Chart: the same axes as slide 31 — log of the number of terminals against grammar complexity |P|/|N| — with ADIOS placed between the natural languages and the Omphalos problems.]
41. GI Research Questions
- Research Question: What is the complexity of human language?
- Research Question: Can we make a formal model of the language development of young children that allows us to understand
  - why the process is efficient?
  - why the process is discontinuous?
- Underlying Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
- Research Question: Semantic learning, e.g. can we construct ontologies for specific domains from (scientific) text?
42. Conclusions & Further work
- We are starting to crack the code of unsupervised learning of human languages.
- ADIOS is the first algorithm capable of learning complex syntax and of scoring well on standard language proficiency tests.
- We have better statistical techniques to separate constituents from non-constituents.
- Good ideas: pseudo-graph representation, MEX, sliding windows.
- To be done:
  - Can MEX help us in DFA induction?
  - Better understanding of the complexity issues. When does MEX collapse?
  - Better understanding of semantic learning
  - Incremental learning with background knowledge
  - Use GI to learn ontologies