1
Language Learning Week 11
Pieter Adriaans (pietera@science.uva.nl)
Sophia Katrenko (katrenko@science.uva.nl)
2
Contents Week 11
  • Learning Human Languages
  • Learning context-free grammars
  • Emile

3
GI Research Questions
  • Research Question: What is the complexity of human language?
  • Research Question: Can we make a formal model of the language development of young children that allows us to understand
  • why the process is efficient?
  • why the process is discontinuous?
  • Underlying Research Question: Can we learn natural language efficiently from text? How much text is needed? How much processing is needed?
  • Research Question (semantic learning): Can we construct ontologies for specific domains from (scientific) text?

4
Chomsky Hierarchy and the complexity of Human
Language
5
Complexity of Natural Language: Zipf distribution
[Figure: rank-frequency plot with a structured high-frequency core and a heavy low-frequency tail]
6
Observations
  • Word frequencies in human utterances are dominated by power laws (see the sketch after this list)
  • High-frequency core
  • Low-frequency heavy tail
  • Open versus closed word classes (function words)
  • Natural language is open. Grammar is elastic. The occurrence of new words is a natural phenomenon. Syntactic/semantic bootstrapping must play an important role in language learning.
  • Bootstrapping will be important for ontology learning as well as for child language acquisition
  • A better understanding of NL distributions is necessary
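To make the power-law claim concrete, here is a minimal sketch (not from the slides) that estimates the Zipf exponent from the rank-frequency relation of a word count; the toy corpus and the least-squares fit are illustrative assumptions.

```python
from collections import Counter
import math

def zipf_exponent(text: str) -> float:
    """Estimate s in freq(rank) ~ C / rank**s by a least-squares fit
    of log-frequency against log-rank."""
    freqs = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return -slope  # roughly 1 for natural-language corpora

# Toy usage; a real corpus (e.g. the King James text used later) gives a cleaner fit.
print(zipf_exponent("the cat sat on the mat and the dog sat on the cat"))
```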

7
Learn NL from text: probabilistic versus recursion-theoretic approach
  • 1967, Gold: no superfinite class of languages (which includes the regular languages and everything above them in the Chomsky hierarchy) can be identified in the limit from positive data alone.
  • 1969, Horning: probabilistic context-free grammars can be learned from positive data. Given a text T and two grammars G1 and G2 we are able to approximate max(P(G1|T), P(G2|T)) (see the sketch after this list).
  • ICGI, after 1990: empirical approach. Just build algorithms and try them. Approximate NL from below: Finite → Regular → Context-free → Context-sensitive
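Horning's observation is that once grammars carry a prior and sentences a likelihood, positive data alone suffice to compare candidate grammars. The sketch below is my illustration of the comparison P(G|T) ∝ P(T|G)·P(G); the grammar names and likelihood values are made-up toy numbers, not Horning's algorithm.

```python
def posterior(prior, likelihood):
    """P(G|T) is proportional to P(T|G) * P(G), normalised over the candidates."""
    unnorm = {g: prior[g] * likelihood[g] for g in prior}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

# Hypothetical numbers: two candidate grammars with equal priors and
# likelihoods that would come from scoring the observed text T under each PCFG.
prior = {"G1": 0.5, "G2": 0.5}
likelihood = {"G1": 1e-12, "G2": 4e-12}   # P(T|G), toy values
post = posterior(prior, likelihood)
print(post, max(post, key=post.get))      # picks the grammar with the higher P(G|T)
```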

8
Situation < 2004
  • GI seems to be hard
  • No identification in the limit
  • Ill-understood power laws dominate (word) frequencies in human communication
  • Machine learning algorithms have difficulties in these domains
  • PAC learning does not converge on these domains
  • Nowhere near learning natural languages
  • We were running out of ideas

9
Situation < 2004: Learning regular languages
  • Reasonable success in learning regular languages of moderate complexity (Evidence-Based State Merging, Blue-Fringe)
  • Transparent representation: Deterministic Finite Automata (DFA); a small sketch of the usual starting representation follows below
  • DEMO
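State-merging learners start from a prefix-tree acceptor built from the positive sample and then merge compatible states. The sketch below is my own minimal illustration of that starting representation, not the DEMO from the lecture.

```python
def build_pta(positive_samples):
    """Build a prefix-tree acceptor (PTA): a tree-shaped DFA that accepts exactly
    the positive sample strings. State-merging algorithms such as Blue-Fringe
    generalise from it by merging compatible states."""
    delta = {}          # (state, symbol) -> state
    accepting = set()
    next_state = 1      # state 0 is the start state
    for word in positive_samples:
        state = 0
        for symbol in word:
            if (state, symbol) not in delta:
                delta[(state, symbol)] = next_state
                next_state += 1
            state = delta[(state, symbol)]
        accepting.add(state)
    return delta, accepting

def accepts(delta, accepting, word):
    state = 0
    for symbol in word:
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting

delta, accepting = build_pta(["ab", "abab", "ababab"])
print(accepts(delta, accepting, "abab"))   # True
print(accepts(delta, accepting, "aba"))    # False
```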

10
Situation < 2004: Learning context-free languages
  • A number of approaches: learning probabilistic CFGs, the Inside-Outside algorithm, EMILE, ABL.
  • No transparent representation: Push-Down Automata (PDA) are not really helpful for modelling the learning process.
  • No adequate convergence on interesting real-life corpora.
  • Problem of sparse data sets.
  • Complexity issues ill-understood.

11
Emile: natural language allows bootstrapping
  • Lewis Carroll's famous poem 'Jabberwocky' starts with:
  • 'Twas brillig, and the slithy toves
  • Did gyre and gimble in the wabe;
  • All mimsy were the borogoves,
  • And the mome raths outgrabe.

12
Emile: Characteristic Expressions and Contexts
  • An expression of a type T is characteristic for T
    if it only appears with contexts of type T
  • Similarly, a context of a type T is
    characteristic for T if it only appears with
    expressions of type T.
  • Let G be a grammar (context-free or otherwise) of
    a language L. G has context separability if each
    type of G has a characteristic context, and
    expression separability if each type of G has a
    characteristic expression.
  • Natural languages seem to be context- and
    expression-separable.
  • This is nothing but stating that languages can
    define their own concepts internally (...is a
    noun, ...is a verb).

13
Emile: Natural languages are shallow
  • A class of languages C is shallow if for each language L it is possible to find a context- and expression-separable grammar G, and a set of sentences S inducing characteristic contexts and expressions for all the types of G, such that the size of S and the length of the sentences of S are logarithmic in the descriptive length of L (relative to C).
  • This seems to hold for natural languages → large dictionaries, low thickness

14
Regular versus context-free: merging versus clustering
[Diagram comparing state merging (regular languages) with context/expression clustering (context-free languages); the figure's symbols were lost in transcription]
15
The EMILE learning algorithm
  • One can prove that, using clustering techniques, shallow CFGs can be learned efficiently from positive examples drawn under the universal distribution m.
  • General idea: a sentence αβγ is split into an expression β and a context (α, γ); in categorial notation the expression gets the type α\S/γ (sentence = expression plus context).

16
Grammar Formalisms: Context-free
  • Context-free grammar (see the generation sketch below):
    Sentence → Name Verb
    Sentence → Name T_Verb Name
    Name → Mary | John
    Verb → walks
    T_Verb → loves
  • Sentences: John loves Mary; Mary walks
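As a quick check of what this toy grammar generates, here is a minimal sketch (mine, not part of the slides) that encodes the productions as a Python dictionary and enumerates all derivable sentences.

```python
# The toy context-free grammar from the slide, as a dict of productions.
GRAMMAR = {
    "Sentence": [["Name", "Verb"], ["Name", "T_Verb", "Name"]],
    "Name": [["Mary"], ["John"]],
    "Verb": [["walks"]],
    "T_Verb": [["loves"]],
}

def generate(symbol):
    """Yield every terminal string derivable from `symbol`
    (the grammar is finite, so plain recursion terminates)."""
    if symbol not in GRAMMAR:            # terminal symbol
        yield [symbol]
        return
    for rhs in GRAMMAR[symbol]:
        expansions = [[]]
        for part in rhs:
            expansions = [prefix + suffix
                          for prefix in expansions
                          for suffix in generate(part)]
        yield from expansions

for sentence in generate("Sentence"):
    print(" ".join(sentence))
# Mary walks, John walks, Mary loves Mary, Mary loves John, John loves Mary, ...
```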

17
Grammar Formalisms: Categorial grammars
  • Categorial grammar (lexicalistic):
    loves → (Name \ Sentence) / Name
    walks, runs → Name \ Sentence
    Mary, John → Name
  • Parsing as deduction (cancellation rules): α, α\β ⊢ β and β/α, α ⊢ β (see the recogniser sketch below)

Derivation of "John loves Mary":
    John : Name    loves : (Name\Sentence)/Name    Mary : Name
    loves Mary : Name\Sentence
    John loves Mary : Sentence
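A minimal recogniser makes the "parsing as deduction" idea concrete. The sketch below is my illustration (not the lecture's code): complex types are encoded as tuples, the two cancellation rules are applied, and a CYK-style chart checks whether a word sequence reduces to Sentence.

```python
# Complex types as tuples: ("\\", A, B) stands for A\B, ("/", B, A) stands for B/A.
LEFT, RIGHT = "\\", "/"

LEXICON = {
    "John": ["Name"],
    "Mary": ["Name"],
    "loves": [(RIGHT, (LEFT, "Name", "Sentence"), "Name")],   # (Name\Sentence)/Name
    "walks": [(LEFT, "Name", "Sentence")],                    # Name\Sentence
}

def reduce_pair(x, y):
    """The two cancellation rules: A, A\\B => B and B/A, A => B."""
    results = []
    if isinstance(y, tuple) and y[0] == LEFT and y[1] == x:
        results.append(y[2])
    if isinstance(x, tuple) and x[0] == RIGHT and x[2] == y:
        results.append(x[1])
    return results

def parses_as(words, goal="Sentence"):
    """CYK-style check whether the word sequence reduces to `goal`."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICON[w]) for i, w in enumerate(words)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            cell = set()
            for k in range(i + 1, i + span):
                for x in chart[(i, k)]:
                    for y in chart[(k, i + span)]:
                        cell.update(reduce_pair(x, y))
            chart[(i, i + span)] = cell
    return goal in chart[(0, n)]

print(parses_as("John loves Mary".split()))  # True
print(parses_as("Mary walks".split()))       # True
```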
18
Categorial Grammar: Propositional calculus without structural rules
  • Interchange: from x, A, y, B, z ⊢ C infer x, B, y, A, z ⊢ C
  • Contraction: from x, A, A, y ⊢ C infer x, A, y ⊢ C
  • Thinning: from x, y ⊢ C infer x, A, y ⊢ C
  • Logic: A, (A → B) ⊢ B and (A → B), A ⊢ B
  • Grammar: A • (A \ B) → B and (A / B) • A → B

19
Categorial Grammar Formalism: Algebraic specification
  • M is a multiplicative system
  • A • B = { x·y ∈ M | x ∈ A, y ∈ B }
  • C / B = { x ∈ M | for all y ∈ B: x·y ∈ C }
  • A \ C = { y ∈ M | for all x ∈ A: x·y ∈ C }

20
Categorial Grammar Formalism: Algebraic specification as database operations
  • Name = {John, Mary}
  • Verb = {walks, runs}
  • S = Name • Verb = {John, Mary} • {walks, runs} = {John walks, John runs, Mary walks, Mary runs}

21
Categorial Grammar Formalism: Algebraic specification as database operations
  • Name \ S = {John, Mary} \ {John walks, John runs, Mary walks, Mary runs} = {walks, runs}
  • S / Verb = {John walks, John runs, Mary walks, Mary runs} / {walks, runs} = {John, Mary} (see the set-based sketch below)
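The residuals can be read literally as operations on sets of strings. The sketch below is my illustration, assuming space-separated strings and helper names of my own choosing.

```python
def product(A, B):
    """A • B: all concatenations 'x y' with x in A and y in B."""
    return {f"{x} {y}" for x in A for y in B}

def left_residual(A, C):
    """A \\ C: all y such that 'x y' lies in C for every x in A."""
    candidates = {s.split(" ", 1)[1] for s in C if " " in s}
    return {y for y in candidates if all(f"{x} {y}" in C for x in A)}

def right_residual(C, B):
    """C / B: all x such that 'x y' lies in C for every y in B."""
    candidates = {s.rsplit(" ", 1)[0] for s in C if " " in s}
    return {x for x in candidates if all(f"{x} {y}" in C for y in B)}

Name = {"John", "Mary"}
Verb = {"walks", "runs"}
S = product(Name, Verb)
print(S)                        # {'John walks', 'John runs', 'Mary walks', 'Mary runs'}
print(left_residual(Name, S))   # {'walks', 'runs'}
print(right_residual(S, Verb))  # {'John', 'Mary'}
```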

22
EMILE 3.0 stages: Take sample
Sample: John loves Mary; Mary walks
23
EMILE 3.0 stages: First-order explosion (see the sketch below)
Sample: John loves Mary; Mary walks
S/(loves Mary) → John
S/Mary → John loves
S → John loves Mary
John\S/Mary → loves
John\S → loves Mary
(John loves)\S → Mary
S/walks → Mary
S → Mary walks
Mary\S → walks
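The first-order explosion can be generated mechanically: every way of cutting a sentence into a left context, an expression, and a right context yields one (context, expression) pair. This sketch is my reading of the slide, with the context written as "left (.) right".

```python
def explode(sentence):
    """Return all (context, expression) pairs obtained by splitting the sentence."""
    words = sentence.split()
    splits = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            left, expr, right = words[:i], words[i:j], words[j:]
            context = f"{' '.join(left)} (.) {' '.join(right)}".strip()
            splits.append((context, " ".join(expr)))
    return splits

for context, expr in explode("John loves Mary") + explode("Mary walks"):
    print(f"{context:25s} <- {expr}")
# e.g. "(.) Mary" <- "John loves"   corresponds to   S/Mary → John loves
```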
24
EMILE 3.0 stages: First-order explosion
Sample: John loves Mary; Mary walks
25
EMILE 3.0 stages: Complete first-order explosion
Sample: John loves Mary; Mary walks
26
EMILE 3.0 stages: Clustering
Sample: John loves Mary; Mary walks
27
EMILE 3.0 stages: Clustering
Sample: John loves Mary; Mary walks
28
EMILE 3.0 stages: Clusters → non-terminal names
Sample: John loves Mary; Mary walks
Cluster labels: A, B, C, D, E
29
EMILE 3.0 stages: Proto-rules
S/(loves Mary) → A
S/Mary → B
S → C
John\S/Mary → D
John\S → E
(John loves)\S → A
S/walks → A
Mary\S → E

A → John
B → John loves
C → John loves Mary
D → loves
E → loves Mary
A → Mary
C → Mary walks
E → walks
30
Emile 3.0 stages: generalise into context-free rules
John\S/Mary → D
_____________________________
S → John D Mary

A → John  (characteristic expression)
A → Mary  (characteristic expression)
_____________________________
S → A D A

Grammar:
S → A D A | B A | C | A E
A → John | Mary
B → A D
C → A D A | A E
D → loves
E → A D | walks
31
Theorem (Adriaans 92)
  • If a language L has a context-free grammar G, is shallow, is sampled according to the Universal Distribution, and a membership-check function is available, then it can be learned efficiently from text
  • Assumptions: natural language is shallow; the distribution of sentences in a text is simple

32
EMILE 3.0 (1992): Problems, not very practical
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction

33
EMILE 3.0 (1992): Problems
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, not learning from plain text: speakers do not give negative examples

34
EMILE 3.0 (1992): Problems
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, not learning from plain text: speakers do not give negative examples
  • Polynomial, but very complex due to overlapping clusters

35
EMILE 3.0 (1992): Only of theoretical value
  • Take sample: positive examples
  • First-order explosion: deduction
  • Complete first-order explosion: positive and negative examples
  • Clustering: deduction
  • Non-terminal names: deduction
  • Proto-rules: induction
  • Context-free rules: induction
  • Supervised, not learning from plain text: speakers do not give negative examples
  • Polynomial, but very complex due to overlapping clusters
  • Batch-oriented, not incremental

36
EMILE 4.1 (2000), Vervoort
  • Unsupervised
  • Two-dimensional clustering: random search for maximised blocks in the matrix
  • Incremental; thresholds for the filling degree of blocks
  • Simple (but sloppy) rule induction using characteristic expressions

37
Clustering (2-dimensional); see the block-finding sketch after the examples
  • John makes tea
  • John likes tea
  • John likes eating
  • (schema: each sentence αβγ splits into an expression β and its context α…γ)
  • John makes coffee
  • John likes coffee
  • John is eating
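A minimal sketch of the two-dimensional context/expression matrix built from these example sentences. The exhaustive grouping below is my simplification; EMILE 4.1 uses a randomised search for maximised blocks rather than this brute-force pass.

```python
from collections import defaultdict

SENTENCES = ["John makes tea", "John likes tea", "John likes eating",
             "John makes coffee", "John likes coffee", "John is eating"]

# Collect every (context, expression) pair in the sample, where a context is the
# pair (left part, right part) surrounding the expression.
ctx_of = defaultdict(set)            # expression -> set of contexts it occurs in
for s in SENTENCES:
    words = s.split()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            context = (" ".join(words[:i]), " ".join(words[j:]))
            ctx_of[" ".join(words[i:j])].add(context)

# Group expressions occurring with exactly the same contexts: each such group is a
# filled block in the context/expression matrix and suggests a candidate type.
blocks = defaultdict(list)
for expr, contexts in ctx_of.items():
    blocks[frozenset(contexts)].append(expr)

for contexts, exprs in blocks.items():
    if len(exprs) > 1 and len(contexts) > 1:
        print(sorted(exprs), "share", sorted(contexts))
# ['coffee', 'tea'] share [('John likes', ''), ('John makes', '')]
```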

38
Emile 4.1: Clustering sparse matrices of contexts and expressions
[Figure: sparse contexts × expressions matrix, with one characteristic expression (column) and one characteristic context (row) highlighted]
39
Emile is guaranteed to find types with the right settings
  • Let T be a type with a characteristic context c_ch and a characteristic expression e_ch. Suppose that the maximum lengths for primary contexts and expressions are set to at least |c_ch| and |e_ch|, and suppose that the total_support, expression_support and context_support settings are all set to 100%. Let T<maxC and T<maxE be the sets of contexts and expressions of T that are small enough to be used as primary contexts and expressions. If EMILE is given a sample containing all combinations of contexts from T<maxC and expressions from T<maxE, then EMILE will find type T. (Vervoort 2000)

40
Original grammar (a sampling sketch follows the rules)
  • S → NP V_i ADV | NP_a VP_a | NP_a V_s that S
  • NP → NP_a | NP_p
  • VP_a → V_t NP | V_t NP P NP_p
  • NP_a → John | Mary | the man | the child
  • NP_p → the car | the city | the house | the shop
  • P → with | near | in | from
  • V_i → appears | is | seems | looks
  • V_s → thinks | hopes | tells | says
  • V_t → knows | likes | misses | sees
  • ADV → large | small | ugly | beautiful
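The next slide shows the grammar EMILE learned from 100,000 example sentences, presumably generated from this grammar. Here is a small sketch (mine) that samples sentences from it; choosing alternatives uniformly at random is an assumption, as the slides do not state the sampling distribution.

```python
import random

# The original grammar from the slide; each non-terminal maps to its alternatives.
RULES = {
    "S": [["NP", "V_i", "ADV"], ["NP_a", "VP_a"], ["NP_a", "V_s", "that", "S"]],
    "NP": [["NP_a"], ["NP_p"]],
    "VP_a": [["V_t", "NP"], ["V_t", "NP", "P", "NP_p"]],
    "NP_a": [["John"], ["Mary"], ["the", "man"], ["the", "child"]],
    "NP_p": [["the", "car"], ["the", "city"], ["the", "house"], ["the", "shop"]],
    "P": [["with"], ["near"], ["in"], ["from"]],
    "V_i": [["appears"], ["is"], ["seems"], ["looks"]],
    "V_s": [["thinks"], ["hopes"], ["tells"], ["says"]],
    "V_t": [["knows"], ["likes"], ["misses"], ["sees"]],
    "ADV": [["large"], ["small"], ["ugly"], ["beautiful"]],
}

def sample(symbol="S"):
    """Expand `symbol` top-down, picking alternatives uniformly at random.
    The recursion through 'V_s that S' terminates with probability 1."""
    if symbol not in RULES:
        return [symbol]
    words = []
    for part in random.choice(RULES[symbol]):
        words.extend(sample(part))
    return words

for _ in range(5):
    print(" ".join(sample()))
# e.g. "the child thinks that Mary likes the car"
```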

41
Learned grammar after 100,000 examples (the numbers are EMILE's non-terminal names)
  • 0 → 17 6
  • 0 → 17 22 17 6
  • 0 → 17 22 17 22 17 22 17 6
  • 6 → misses 17 | likes 17 | knows 17 | sees 17
  • 6 → 22 17 6
  • 6 → appears 34 | looks 34 | is 34 | seems 34
  • 6 → 6 near 17 | 6 from 17 | 6 in 17 | 6 with 17
  • 17 → the child | Mary | the city | the man | John | the car | the house | the shop
  • 22 → tells that | thinks that | hopes that | says that
  • 22 → 22 17 22
  • 34 → small | beautiful | large | ugly

42
Bible books
  • King James Version
  • 31,102 verses in 82,935 lines
  • 4.8 MB of English text
  • 001001 In the beginning God created the heaven and the earth.
  • 66 experiments with increasing sample size
  • Initially: Book of Genesis, Book of Exodus, ...
  • Full run: 40 minutes, 500 MB on an Ultra-2 SPARC

43
Bible books
44
GI on the Bible
  • 0 → Thou shalt not 582
  • 0 → Neither shalt thou 582
  • 582 → eat it
  • 582 → kill .
  • 582 → commit adultery .
  • 582 → steal .
  • 582 → bear false witness against thy neighbour .
  • 582 → abhor an Edomite

45
Knowledge base in Bible
  • Dictionary Type 76
  • Esau, Isaac, Abraham, Rachel, Leah, Levi, Judah,
    Naphtali, Asher, Benjamin, Eliphaz, Reuel, Anah,
    Shobal, Ezer, Dishan, Pharez, Manasseh, Gershon,
    Kohath, Merari, Aaron, Amram, Mushi, Shimei,
    Mahli, Joel, Shemaiah, Shem, Ham, Salma, Laadan,
    Zophah, Elpaal, Jehieli
  • Dictionary Type 362
  • plague, leprosy
  • Dictionary Type 414
  • Simeon, Judah, Dan, Naphtali, Gad, Asher,
    Issachar, Zebulun, Benjamin, Gershom
  • Dictionary Type 812
  • two, three, four
  • Dictionary Type 1056
  • priests, Levites, porters, singers, Nethinims
  • Dictionary Type 978
  • afraid, glad, smitten, subdued
  • Dictionary Type 2465
  • holy, rich, weak, prudent
  • Dictionary Type 3086
  • Egypt, Moab, Dumah, Tyre, Damascus
  • Dictionary Type 4082
  • heaven, Jerusalem

46
Evaluation
  • Works efficiently on large corpora
  • learns (partial) grammars
  • unsupervised
  • - EMILE 4.1 needs a lot of input.
  • - Convergence to meaningful syntactic type rarely
    observed.
  • - Types seem to be semantic rather than
    syntactic.
  • Why?
  • Hypothesis distribution in real life text is
    semantic, not syntactic.
  • But, most of all Sparse data!!!

47
Contents Week 11
  • Learning Human Languages
  • Learning context-free grammars
  • Emile