1
Human Computer Studies
2004: a good year for Computational Grammar Induction
January 2005, Pieter Adriaans
Universiteit van Amsterdam, pietera@science.uva.nl
http://turing.wins.uva.nl/pietera/ALS/
2
GI Research Questions
  • Research Question: What is the complexity of
    human language?
  • Research Question: Can we make a formal model of
    the language development of young children that
    allows us to understand
  • Why the process is efficient?
  • Why the process is discontinuous?
  • Underlying Research Question: Can we learn
    natural language efficiently from text? How much
    text is needed? How much processing is needed?
  • Research Question: Semantic learning, e.g. can we
    construct ontologies for specific domains from
    (scientific) text?

3
Chomsky Hierarchy and the complexity of Human
Language
4
Complexity of Natural Language: Zipf distribution
[Figure: Zipf plot of word frequencies, with a structured high-frequency
core and a heavy low-frequency tail]
5
Observations
  • Word frequencies in human utterances are dominated
    by power laws (see the sketch below)
  • High-frequency core
  • Low-frequency heavy tail
  • Open versus closed word classes (function words)
  • Natural language is open. Grammar is elastic.
    The occurrence of new words is a natural phenomenon.
    Syntactic/semantic bootstrapping must play an
    important role in language learning.
  • Bootstrapping will be important for ontology
    learning as well as child language acquisition
  • A better understanding of NL distributions is
    necessary
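The power-law claim is easy to inspect on any corpus. Below is a minimal
Python sketch, not part of the original deck; the file name corpus.txt is
a placeholder for a real corpus.

# Minimal sketch: inspect the Zipf (power-law) shape of word frequencies.
from collections import Counter

text = open("corpus.txt", encoding="utf-8").read()  # placeholder file name
freqs = sorted(Counter(text.lower().split()).values(), reverse=True)

# Zipf's law predicts rank * frequency is roughly constant, i.e. a small
# high-frequency core and a heavy tail of rare words.
for rank in (1, 10, 100, 1000, 10000):
    if rank <= len(freqs):
        print(rank, freqs[rank - 1], rank * freqs[rank - 1])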

6
Learn NL from text: probabilistic versus
recursion-theoretic approach
  • 1967 Gold: Any language class more complex than
    the super-finite sets (including regular and up the
    Chomsky hierarchy) cannot be learned from
    positive data.
  • 1969 Horning: Probabilistic context-free
    grammars can be learned from positive data. Given
    a text T and two grammars G1 and G2 we are able
    to approximate max(P(G1|T), P(G2|T)) (see the toy
    sketch below)
  • ICGI > 1990: the empirical approach. Just build
    algorithms and try them. Approximate NL from
    below: Finite → Regular → Context-free →
    Context-sensitive
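Horning's comparison can be illustrated with a toy Bayesian model-selection
sketch, not from the deck: real PCFG likelihoods require the inside
algorithm, so simple unigram models stand in for G1 and G2 here.

# Toy sketch of Horning's idea: compare two candidate grammars G1, G2 on a
# text T via Bayes, P(G|T) proportional to P(T|G) * P(G).
import math

def log_likelihood(model, text):
    # P(T|G) for a unigram stand-in: product of per-word probabilities.
    return sum(math.log(model.get(w, 1e-9)) for w in text.split())

T = "john likes tea john makes coffee"
G1 = {"john": 0.4, "likes": 0.2, "tea": 0.2, "makes": 0.1, "coffee": 0.1}
G2 = {w: 1 / 6 for w in "john likes tea makes coffee eating".split()}
prior = {"G1": 0.5, "G2": 0.5}

post1 = log_likelihood(G1, T) + math.log(prior["G1"])
post2 = log_likelihood(G2, T) + math.log(prior["G2"])
print("prefer G1" if post1 > post2 else "prefer G2")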

7
Situation < 2004
  • GI seems to be hard
  • No identification in the limit
  • Ill-understood power laws dominate (word)
    frequencies in human communication
  • Machine learning algorithms have difficulties in
    these domains
  • PAC learning does not converge on these domains
  • Nowhere near learning natural languages
  • We were running out of ideas

8
Situation < 2004: Learning Regular Languages
  • Reasonable success in learning regular languages
    of moderate complexity (Evidence-Based State
    Merging, Blue-Fringe); a toy sketch follows below
  • Transparent representation: Deterministic Finite
    Automata (DFA)
  • DEMO
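A toy illustration of the state-merging idea (a naive greedy merger, not
EBSM or Blue-Fringe; the sample strings are invented). It builds a
prefix-tree acceptor from positive strings and merges state pairs whenever
no negative string becomes accepted by the quotient automaton.

from itertools import combinations

positives = ["a", "aa", "aaa", "ab", "aab"]
negatives = ["b", "ba", "bb"]

# Prefix-tree acceptor: one state per prefix of a positive string.
prefixes = {""} | {p[:i] for p in positives for i in range(1, len(p) + 1)}
accepting = set(positives)

def block(part, q):
    return next(b for b in part if q in b)

def quotient_accepts(part, s):
    # Simulate the merged automaton nondeterministically over blocks
    # (a simplification of the deterministic folding used in practice).
    current = {block(part, "")}
    for ch in s:
        nxt = set()
        for b in current:
            for q in b:
                if q + ch in prefixes:
                    nxt.add(block(part, q + ch))
        current = nxt
    return any(q in accepting for b in current for q in b)

def consistent(part):
    return not any(quotient_accepts(part, n) for n in negatives)

# Start with each state in its own block, then merge greedily; real EBSM
# ranks merges by evidence instead of taking the first consistent one.
partition = [frozenset([q]) for q in sorted(prefixes)]
merged = True
while merged:
    merged = False
    for b1, b2 in combinations(partition, 2):
        trial = [b for b in partition if b not in (b1, b2)] + [b1 | b2]
        if consistent(trial):
            partition, merged = trial, True
            break

print([set(b) for b in partition])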

9
Situation < 2004: Learning Context-free Languages
  • A number of approaches: learning probabilistic
    CFGs, the Inside-Outside algorithm, EMILE, ABL.
  • No transparent representation: Push-Down Automata
    (PDA) are not really helpful to model the
    learning process.
  • No adequate convergence on interesting real-life
    corpora
  • Problem of sparse data sets.
  • Complexity issues ill-understood.

10
Emile: natural language allows bootstrapping
  • Lewis Carroll's famous poem 'Jabberwocky' starts
    with:
  • 'Twas brillig, and the slithy toves
  • Did gyre and gimble in the wabe
  • All mimsy were the borogoves
  • And the mome raths outgrabe.

11
Emile: Characteristic Expressions and Contexts
  • An expression of a type T is characteristic for T
    if it only appears in contexts of type T.
  • Similarly, a context of a type T is
    characteristic for T if it only appears with
    expressions of type T.
  • Let G be a grammar (context-free or otherwise) of
    a language L. G has context separability if each
    type of G has a characteristic context, and
    expression separability if each type of G has a
    characteristic expression.
  • Natural languages seem to be context- and
    expression-separable (see the sketch below).
  • This is nothing but stating that languages can
    define their own concepts internally ("... is a
    noun", "... is a verb").
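The separability definitions can be made concrete with a small check over
type-labelled (context, expression) pairs; the toy sample below is
invented for illustration, not taken from the deck.

# Minimal sketch: test which contexts/expressions are characteristic.
# A context is characteristic for a type if it occurs only with
# expressions of that one type, and vice versa.
from collections import defaultdict

# (context, expression, type) triples: a hand-labelled toy sample
pairs = [
    (("John", "tea"), "likes", "V_t"),
    (("John", "tea"), "makes", "V_t"),
    (("John", "beautiful"), "is", "V_i"),
    (("John", "beautiful"), "seems", "V_i"),
]

types_per_context = defaultdict(set)
types_per_expression = defaultdict(set)
for ctx, expr, t in pairs:
    types_per_context[ctx].add(t)
    types_per_expression[expr].add(t)

# Characteristic iff it maps to exactly one type.
characteristic_ctx = [c for c, ts in types_per_context.items() if len(ts) == 1]
characteristic_expr = [e for e, ts in types_per_expression.items() if len(ts) == 1]
print(characteristic_ctx, characteristic_expr)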

12
Emile: Natural languages are shallow
  • A class of languages C is shallow if for each
    language L it is possible to find a context- and
    expression-separable grammar G, and a set of
    sentences S inducing characteristic contexts and
    expressions for all the types of G, such that the
    size of S and the length of the sentences in S
    are logarithmic in the descriptive length of
    L (relative to C).
  • Seems to hold for natural languages → large
    dictionaries, low thickness

13
Regular versus context-free: merging - clustering
[Diagram contrasting state merging (regular languages) with
two-dimensional clustering (context-free languages)]
14
The EMILE learning algorithm
  • One can prove that, using clustering techniques,
    shallow CFGs can be learned efficiently from
    positive examples drawn under a distribution m.
  • General idea: every sentence splits into an
    expression and a surrounding context
    (sentence = context \ expression / context)
[diagram omitted]
15
EMILE 4.1 (2000) Vervoort
  • Unsupervised
  • Two-dimensional clustering: random search for
    maximal blocks in the matrix
  • Incremental thresholds for the filling degree of
    blocks
  • Simple (but sloppy) rule induction using
    characteristic expressions

16
Clustering (2-dimensional)
  • John makes tea
  • John likes tea
  • John likes eating
  • (each sentence is split as context \ expression /
    context; see the sketch below)
  • John makes coffee
  • John likes coffee
  • John is eating
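A minimal sketch of this two-dimensional clustering step on the six
sentences above; a crude shared-context test stands in for EMILE's
maximal-block search.

from itertools import combinations

sentences = [
    "John makes tea", "John likes tea", "John likes eating",
    "John makes coffee", "John likes coffee", "John is eating",
]

# Split each 3-word sentence around its middle word (the expression).
occurs = {}  # (context, expression) -> seen
for s in sentences:
    left, expr, right = s.split()
    occurs[((left, right), expr)] = True

contexts = {c for c, _ in occurs}
expressions = {e for _, e in occurs}

def ctxs(e):
    # All contexts in which expression e occurs.
    return {c for c in contexts if (c, e) in occurs}

# Expressions sharing contexts are candidates for the same type/block.
for e1, e2 in combinations(sorted(expressions), 2):
    shared = ctxs(e1) & ctxs(e2)
    if shared:
        print(e1, e2, "share contexts", shared)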

17
Emile 4.1: Clustering Sparse Matrices of Contexts
and Expressions
[Figure: sparse context-by-expression matrix, with a characteristic
expression marked among the expressions and a characteristic context
marked among the contexts]
18
Emile: guaranteed to find types with the right settings
  • Let T be a type with a characteristic context c_ch
    and a characteristic expression e_ch. Suppose that
    the maximum lengths for primary contexts and
    expressions are set to at least |c_ch| and |e_ch|,
    and suppose that the total_support,
    expression_support and context_support
    settings are all set to 100%. Let T_<maxC and
    T_<maxE be the sets of contexts and expressions of
    T that are small enough to be used as primary
    contexts and expressions. If EMILE is given a
    sample containing all combinations of contexts
    from T_<maxC and expressions from T_<maxE, then
    EMILE will find type T. (Vervoort 2000)

19
Original grammar (a sampler sketch follows the rules)
  • S → NP V_i ADV
      | NP_a VP_a
      | NP_a V_s that S
  • NP → NP_a | NP_p
  • VP_a → V_t NP
        | V_t NP P NP_p
  • NP_a → John | Mary | the man | the child
  • NP_p → the car | the city | the house | the shop
  • P → with | near | in | from
  • V_i → appears | is | seems | looks
  • V_s → thinks | hopes | tells | says
  • V_t → knows | likes | misses | sees
  • ADV → large | small | ugly | beautiful
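For illustration, a small sampler (not part of the deck) that draws
training sentences from this grammar, roughly how a 100,000-example corpus
could be produced; the depth bound on the recursive S-rule is an added
assumption to keep sentences finite.

import random

grammar = {
    "S": [["NP", "V_i", "ADV"], ["NP_a", "VP_a"], ["NP_a", "V_s", "that", "S"]],
    "NP": [["NP_a"], ["NP_p"]],
    "VP_a": [["V_t", "NP"], ["V_t", "NP", "P", "NP_p"]],
    "NP_a": [["John"], ["Mary"], ["the man"], ["the child"]],
    "NP_p": [["the car"], ["the city"], ["the house"], ["the shop"]],
    "P": [["with"], ["near"], ["in"], ["from"]],
    "V_i": [["appears"], ["is"], ["seems"], ["looks"]],
    "V_s": [["thinks"], ["hopes"], ["tells"], ["says"]],
    "V_t": [["knows"], ["likes"], ["misses"], ["sees"]],
    "ADV": [["large"], ["small"], ["ugly"], ["beautiful"]],
}

def generate(sym="S", depth=0):
    if sym not in grammar:          # terminal
        return [sym]
    # cut off the recursive S-rule once the derivation gets deep
    rules = grammar[sym][:2] if (sym == "S" and depth > 2) else grammar[sym]
    return [w for s in random.choice(rules) for w in generate(s, depth + 1)]

for _ in range(5):
    print(" ".join(generate()))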

20
Learned grammar after 100,000 examples
  • 0 → 17 6
  • 0 → 17 22 17 6
  • 0 → 17 22 17 22 17 22 17 6
  • 6 → misses 17 | likes 17 | knows 17 | sees 17
  • 6 → 22 17 6
  • 6 → appears 34 | looks 34 | is 34 | seems 34
  • 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
  • 17 → the child | Mary | the city | the man |
    John | the car | the house | the shop
  • 22 → tells that | thinks that | hopes that |
    says that
  • 22 → 22 17 22
  • 34 → small | beautiful | large | ugly

21
Bible books
  • King James version
  • 31,102 verses on 82,935 lines
  • 4.8 MB of English text
  • 001001 In the beginning God created the heaven
    and the earth.
  • 66 experiments with increasing sample size
  • Initially: Book of Genesis, Book of Exodus, ...
  • Full run: 40 minutes, 500 MB on an Ultra-2 SPARC

22
Bible books
[chart omitted]
23
GI on the Bible
  • 0 → Thou shalt not 582
  • 0 → Neither shalt thou 582
  • 582 → eat it
  • 582 → kill .
  • 582 → commit adultery .
  • 582 → steal .
  • 582 → bear false witness against thy neighbour
    .
  • 582 → abhor an Edomite

24
Knowledge base in the Bible
  • Dictionary Type 76
  • Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah,
    Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah,
    Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon,
    Kohath, Merari, Aaron, Amram, Mushi, Shimei,
    Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan,
    Zophah, Elpaal, Jehieli
  • Dictionary Type 362
  • plague, leprosy
  • Dictionary Type 414
  • Simeon, Judah, Dan, Naphtali, Gad, Asher,
    Issachar, Zebulun, Benjamin, Gershom
  • Dictionary Type 812
  • two, three, four
  • Dictionary Type 1056
  • priests, Levites, porters, singers, Nethinims
  • Dictionary Type 978
  • afraid, glad, smitten, subdued
  • Dictionary Type 2465
  • holy, rich, weak, prudent
  • Dictionary Type 3086
  • Egypt, Moab, Dumah, Tyre, Damascus
  • Dictionary Type 4082
  • heaven, Jerusalem

25
Evaluation
  • + Works efficiently on large corpora
  • + Learns (partial) grammars
  • + Unsupervised
  • - EMILE 4.1 needs a lot of input.
  • - Convergence to a meaningful syntactic type is
    rarely observed.
  • - Types seem to be semantic rather than
    syntactic.
  • Why?
  • Hypothesis: the distribution in real-life text is
    semantic, not syntactic.
  • But, most of all: sparse data!

26
2004 Omphalos Competition (Starkie & van Zaanen)
  • Unsupervised learning of context-free grammars
  • Deliberately constructed to be beyond the current
    state of the art
  • A theoretical brute-force learner constructs
    all possible CFGs consistent with a given set of
    positive examples O.
  • Complexity measure for CFGs.
  • There are only 2^(Σ_i(2|O_i| - 2) + 1) ·
    (Σ_i(2|O_i| - 2) + 1)^T(O) of these grammars,
    where T(O) is the number of terminals!!

27
2004 Omphalos Competition (Starkie & van Zaanen)
Let Σ be an alphabet and Σ* the set of all
strings over Σ. L(G) = S ⊆ Σ* is the language
generated by a grammar G. C_G ⊆ S is a
characteristic sample for G.
[Figure: nested sets Σ* (infinite) ⊃ S (infinite) ⊃ O (finite Omphalos
sample), with the characteristic sample C_G; |O| < 20% |C_G|]
28
Bad news for distributional analysis (EMILE, ABL,
Inside-Outside)
[Figure: two pushdown automata; a recognizer sketch follows below.
PDA 1 accepts w ∈ a^n e b^n (a,S → push X; a,X → push X; b,X → pop X;
e,X → no-op; ε,X → pop), e.g. aeb, aaebb, aaaebbb.
PDA 2 adds c,S → push X; c,X → push X; d,X → pop X and accepts
w ∈ {a,c}^n e {b,d}^n, e.g. aeb, ceb, aed, ced, aaebb, aaebd, caebb,
caebd, acebb, aaaebbb.]
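To make the example concrete, here is a direct recognizer for the first
language a^n e b^n (a plain loop-and-stack check, not a simulation of the
PDA diagram). Membership is decided by matching counts, not by local word
distributions, which is exactly what defeats purely distributional analysis.

def accepts(w: str) -> bool:
    stack = []
    i = 0
    while i < len(w) and w[i] == "a":   # a: push X
        stack.append("X")
        i += 1
    if i >= len(w) or w[i] != "e":      # e: no-op, marks the middle
        return False
    i += 1
    while i < len(w) and w[i] == "b":   # b: pop X
        if not stack:
            return False
        stack.pop()
        i += 1
    return i == len(w) and not stack    # all input read, all X matched

print([w for w in ["aeb", "aaebb", "aaaebbb", "aebb", "ab"] if accepts(w)])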
29
Bad news for distributional analysis (EMILE, ABL,
Inside-Outside)
[Examples of nested strings: a aeb b, a aeb d, c aeb b, c aeb d,
a ceb b, a ceb d, c ceb b, c ceb d, a aeb b, a aaebb b]
We need large corpora to make distributional
analysis work. The Omphalos samples are way too
small!
30
Omphalos won by Alexander Clark: some good ideas!
  • Approach:
  • Exploit useful properties that randomly generated
    grammars are likely to have
  • Identifying constituents: measure local mutual
    information between the symbol before and the
    symbol after (Clark, 2001); more reliable than
    other information-theoretic constituent boundary
    tests (Lamb, 1961; Brill et al., 1990). A sketch
    follows below.
  • Under benign distributions, spans that cross
    constituent boundaries will have zero mutual
    information, while structures that do not cross
    constituent boundaries will have non-zero mutual
    information.
  • Analysis of cliques of strings that might be
    constituents (much like clusters in EMILE).
  • Most hard problems in Omphalos are still open!
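A rough sketch of the mutual-information boundary test on a toy corpus;
the corpus, span lengths, and sentence markers are illustrative
assumptions, not Clark's actual setup.

# For a candidate span, estimate pointwise mutual information
# PMI(b, a) = log P(b, a) / (P(b) P(a)) between the symbol b just before
# the span and the symbol a just after it.
import math
from collections import Counter

corpus = [("<s> " + s + " </s>").split() for s in [
    "is that a dog", "is that a cat", "where is the dog", "where is the cat",
]]

# Collect (before, after) pairs around every candidate span of length 1..3.
pairs, befores, afters = Counter(), Counter(), Counter()
for sent in corpus:
    for i in range(1, len(sent) - 1):
        for j in range(i + 1, min(i + 4, len(sent))):
            pairs[(sent[i - 1], sent[j])] += 1
            befores[sent[i - 1]] += 1
            afters[sent[j]] += 1

n = sum(pairs.values())

def pmi(b, a):
    return math.log((pairs[(b, a)] / n) / ((befores[b] / n) * (afters[a] / n)))

# Score the span "a dog" in "is that a dog": before="that", after="</s>".
print(round(pmi("that", "</s>"), 3))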

31
But is Omphalos the right challenge? What about
NL?
[Chart: complexity of the grammar (P/N) versus log # of terminals, with
regions labelled Natural Languages, Shallow languages (need for larger
samples), and Omphalos (harder to learn)]
32
ADIOS (Automatic DIstillation Of Structure), Solan
et al. 2004
  • Representation of a corpus (of sentences) as
    paths over a graph whose vertices are lexical
    elements (words)
  • Motif Extraction (MEX) procedure for establishing
    new vertices, thus progressively redefining the
    graph in an unsupervised fashion
  • Recursive generalization
  • Zach Solan, David Horn, Eytan Ruppin (Tel Aviv
    University) & Shimon Edelman (Cornell)
  • http://www.tau.ac.il/zsolan

33
The Model (Solan et al. 2004)
  • Graph representation with words as vertices and
    sentences as paths (a sketch follows the example
    sentences below).

Is that a dog?
Is that a cat?
Where is the dog?
And is that a horse?
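A minimal sketch of this graph representation using the four sentences
above; the BEGIN/END markers are an added assumption.

# Vertices are words; each sentence is stored as a path of edges,
# tagged with the id of the path that traverses them.
from collections import defaultdict

sentences = [
    "is that a dog", "is that a cat", "where is the dog", "and is that a horse",
]

edges = defaultdict(list)   # word -> list of (next word, path id)
paths = []
for pid, s in enumerate(sentences):
    words = ["BEGIN"] + s.split() + ["END"]
    paths.append(words)
    for u, v in zip(words, words[1:]):
        edges[u].append((v, pid))

# Several paths sharing the sub-path is -> that -> a make that segment
# a candidate pattern for the MEX procedure.
print(edges["that"])   # [('a', 0), ('a', 1), ('a', 3)]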
34
The MEX (motif extraction) procedure (Solan et
al. 2004)
35
Generalization (Solan et al. 2004)
36
From MEX to ADIOS (Solan et al. 2004)
  • Apply MEX to a search path consisting of a given
    data path.
  • On the same search path, within a given window
    size, allow for the occurrence of an equivalence
    class, i.e. define a generalized search path of
    the type e1 → e2 → … → E → … → ek. Apply MEX to
    this window.
  • Choose patterns P, including equivalence classes
    E, according to MEX ranking. Add nodes.
  • Repeat the above for all search paths.
  • Repeat the procedure to obtain higher-level
    generalizations.
  • Express structures in syntactic trees. (A toy
    version of this loop follows below.)
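A runnable toy of this loop, not the real algorithm: a most-frequent-bigram
score stands in for MEX ranking and the generalization step is omitted, so
only the find-motif / rewrite / repeat control flow follows the slide.

from collections import Counter

def best_motif(paths, min_count=2):
    # Crude stand-in for MEX: the most frequent bigram, if frequent enough.
    bigrams = Counter(b for p in paths for b in zip(p, p[1:]))
    if not bigrams:
        return None
    motif, count = bigrams.most_common(1)[0]
    return motif if count >= min_count else None

def rewrite(paths, motif):
    # Collapse every occurrence of the motif into a single new node.
    new = []
    for p in paths:
        q, i = [], 0
        while i < len(p):
            if tuple(p[i:i + 2]) == motif:
                q.append("_".join(motif)); i += 2
            else:
                q.append(p[i]); i += 1
        new.append(q)
    return new

paths = [s.split() for s in ["is that a dog", "is that a cat", "where is the dog"]]
while (m := best_motif(paths)) is not None:   # repeat until fixed point
    paths = rewrite(paths, m)
print(paths)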

37
First pattern formation
Higher hierarchies: patterns (P) constructed of
other Ps, equivalence classes (E) and terminals
(T)
Trees are to be read from top to bottom and from
left to right
Final stage: root pattern
CFG = context-free grammar
38
Solan et al. 2004
  • The ADIOS algorithm has been evaluated using
    artificial grammars containing thousands of
    rules, natural languages as diverse as English
    and Chinese, regulatory and coding regions in DNA
    sequences, and functionally relevant structures in
    protein data.
  • The complexity of ADIOS on large NL corpora seems
    to be linear in the size of the corpus.
  • Allows mildly context-sensitive learning
  • This is the first time an unsupervised algorithm
    has been shown capable of learning complex syntax
    and of scoring well in standard language
    proficiency tests! (Training set: 300,000
    sentences from CHILDES; ADIOS scored at
    intermediate level (58%) on the Göteborg ESL
    test.)

39
ADIOS learning from ATIS-CFG (4,592 rules) using
different numbers of learners and different
window lengths L
40
Where does ADIOS fit in?
[Chart: the same complexity of the grammar (P/N) versus # of terminals
plot, now with ADIOS marked alongside Natural Languages, Shallow
languages (need for larger samples), and Omphalos (harder to learn)]
41
GI Research Questions
  • Research Question: What is the complexity of
    human language?
  • Research Question: Can we make a formal model of
    the language development of young children that
    allows us to understand
  • Why the process is efficient?
  • Why the process is discontinuous?
  • Underlying Research Question: Can we learn
    natural language efficiently from text? How much
    text is needed? How much processing is needed?
  • Research Question: Semantic learning, e.g. can we
    construct ontologies for specific domains from
    (scientific) text?

42
Conclusions & Further work
  • We are starting to crack the code of unsupervised
    learning of human languages
  • ADIOS is the first algorithm capable of learning
    complex syntax and of scoring well in standard
    language proficiency tests
  • We have better statistical techniques to separate
    constituents from non-constituents.
  • Good ideas: pseudograph representation, MEX,
    sliding windows.
  • To be done:
  • Can MEX help us in DFA induction?
  • Better understanding of the complexity issues:
    when does MEX collapse?
  • Better understanding of semantic learning
  • Incremental learning with background knowledge
  • Use GI to learn ontologies