Next slide is the Neutral Avenue System Diagram with a Morphology Learning box added - PowerPoint PPT Presentation

About This Presentation
Slides: 36
Provided by: cmon4
Learn more at: http://www.cs.cmu.edu

Transcript and Presenter's Notes



1
Next slide is the Neutral Avenue System Diagram
with a Morphology Learning box added
2
Avenue Overview
[Diagram: Avenue system overview. Stage labels: Elicitation, Rule Learning, Run-Time System, Rule Refinement, Morphology. Component labels: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module (appears twice), Handcrafted rules, Transfer Rules, Lexical Resources, Morphology Analyzer, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module. One element is marked "Do NOT Use".]
3
The next slide is for Ari. It has her sections highlighted, but it also has the extra box that I added for Morphology Learning.
4
Rule Refinement
[Avenue system diagram, same labels as on slide 2, with Ari's sections highlighted]
5
Here is where Christian's presentation begins.
6
Avenue Overview
[Avenue system diagram, same labels as on slide 2]
7
The Challenge of Morphology
  • Mapudungun (Indigenous Language of Chile and
    Argentina, 1 Million Speakers)

Allkütulekefun
8
The Challenge of Morphology
  • Mapudungun

Allkütu  -le  -ke  -fu  -n
9
The Challenge of Morphology
  • Mapudungun

Allkütu  -le     -ke        -fu    -n
Listen   -prog.  -habitual  -past  -indic.1sg
10
The Challenge of Morphology
  • Mapudungun

Allkütu  -le     -ke        -fu    -n
Listen   -prog.  -habitual  -past  -indic.1sg
Translation so far: I
11
The Challenge of Morphology
  • Mapudungun

Allkütu  -le     -ke        -fu    -n
Listen   -prog.  -habitual  -past  -indic.1sg
Translation so far: I used to
12
The Challenge of Morphology
  • Mapudungun

Allkütu  -le     -ke        -fu    -n
Listen   -prog.  -habitual  -past  -indic.1sg
Translation: I used to listen
13
The Challenge of Morphology
  • Mapudungun
  • Tasks for Morphology
    • Segment Words
    • Map Morphemes onto Features

Allkütu  -le     -ke        -fu    -n
Listen   -prog.  -habitual  -past  -indic.1sg
Translation: I used to listen
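The two tasks can be made concrete on the example word. This is a minimal sketch, not a learned model: it hard-codes a hypothetical lexicon (`STEMS`, `SUFFIXES`) copied from the slide's gloss, segments by stripping known suffixes from the right until a known stem remains, and then maps each morpheme onto its feature label. The slides' goal is to learn exactly this information without supervision.

```python
# Hypothetical hand-written lexicon taken from the Allkütulekefun gloss.
# The approach on these slides aims to learn this, not to supply it by hand.
STEMS = {"Allkütu": "listen"}
SUFFIXES = {"le": "prog.", "ke": "habitual", "fu": "past", "n": "indic.1sg"}


def segment(word):
    """Task 1: split a word into stem + suffixes by stripping known suffixes
    from the right (longest first) until a known stem is left.
    Returns None if no segmentation ends in a known stem."""
    if word in STEMS:
        return [word]
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            rest = segment(word[: -len(suffix)])
            if rest is not None:
                return rest + [suffix]
    return None


def analyze(word):
    """Task 2: map each morpheme of the segmentation onto its feature label."""
    morphemes = segment(word)
    if morphemes is None:
        return None
    stem, *suffixes = morphemes
    return [(stem, STEMS[stem])] + [("-" + s, SUFFIXES[s]) for s in suffixes]


if __name__ == "__main__":
    print(analyze("Allkütulekefun"))
```

The recursion backtracks if an early suffix choice leaves no known stem, which matters once the suffix inventory contains overlapping strings.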
14
The Challenge of Morphology
  • Tasks for Morphology
    • Segment Words
    • Map Morphemes onto Features
  • Learn these tasks
    • unsupervised
    • from data
    • for any language

15
Our Approach
  • Leverage the Natural Structure of Morphology
    • Paradigm
      • Set of affixes that interchangeably attach to a set of stems

16
Our Approach
  • Leverage the Natural Structure of Morphology
    • Paradigm
      • Set of affixes that interchangeably attach to a set of stems

Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s: blame, solve
17
Our Approach
  • Leverage the Natural Structure of Morphology
    • Paradigm
      • Set of affixes that interchangeably attach to a set of stems

Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s.d: blame
  Ø.s: blame, solve
18
Our Approach
  • Leverage the Natural Structure of Morphology
    • Paradigm
      • Set of affixes that interchangeably attach to a set of stems

Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s.d: blame
  Ø.s: blame, solve
19
Our Approach
  • Leverage the Natural Structure of Morphology
    • Paradigm
      • Set of affixes that interchangeably attach to a set of stems

Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s.d: blame
  Ø.s: blame, solve
  s: blame, roam, solve
20
Our Approach
  • Leverage the Natural Structure of Morphology
    • Paradigm
      • Set of affixes that interchangeably attach to a set of stems

Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s.d: blame
  Ø.s: blame, solve
  s: blame, roam, solve
21
Our Approach
Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s.d: blame
  Ø.s: blame, solve
  e.es: blam, solv
  s: blame, roam, solve
22
Our Approach
Example Vocabulary: blame, blamed, blames, roamed, roaming, roams, solve, solves, solving

Suffix set: stems
  Ø.s.d: blame
  Ø.s: blame, solve
  e.es: blam, solv
  s: blame, roam, solve
23
Suffix set: stems
  Ø.s.d: blame
  e.es.ed: blam
  me.mes.med: bla
  e.es: blam, solv
  Ø.s: blame, solve
  me.mes: bla
  e.ed: blam
  Ø.d: blame
  me.med: bla
  s.d: blame
  es.ed: blam
  mes.med: bla
  Ø: blame, blames, blamed, roams, roamed, roaming, solve, solves, solving
  e: blam, solv
  me: bla
  s: blame, roam, solve
  es: blam, solv
  mes: bla
  ed: blam, roam
  d: blame, roame
  med: bla, roa
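The suffix sets and stem lists in this lattice can be reproduced mechanically from the example vocabulary. A minimal sketch, assuming every split of a word into a non-empty prefix and a (possibly empty) remainder counts as a candidate stem + suffix; the helper names `suffixes_by_stem`, `stems_for`, and `label` are hypothetical. A suffix set's stems are those that combine with every suffix in the set, which is why stem lists shrink as suffixes are added.

```python
from collections import defaultdict

# Example vocabulary from the slides.
VOCABULARY = ["blame", "blamed", "blames", "roamed", "roaming",
              "roams", "solve", "solves", "solving"]


def suffixes_by_stem(vocabulary):
    """Map each candidate stem (non-empty word prefix) to the set of candidate
    suffixes that complete it to a vocabulary word; "" stands for Ø."""
    table = defaultdict(set)
    for word in vocabulary:
        for i in range(1, len(word) + 1):
            table[word[:i]].add(word[i:])
    return table


def stems_for(suffixes, table):
    """Candidate stems that attach to every suffix in `suffixes`."""
    wanted = set(suffixes)
    return {stem for stem, found in table.items() if wanted <= found}


def label(suffixes):
    """Render a suffix list in the slides' dotted notation, e.g. Ø.s.d."""
    return ".".join(s if s else "Ø" for s in suffixes)


if __name__ == "__main__":
    table = suffixes_by_stem(VOCABULARY)
    for scheme in (["", "s", "d"], ["", "s"], ["e", "es"], ["s"], ["d"]):
        print(f"{label(scheme)}: {', '.join(sorted(stems_for(scheme, table)))}")
```

Running it prints the same stem lists as the corresponding rows (for example `e.es: blam, solv` and `d: blame, roame`). The exhaustive split also yields sets the slide does not show, such as `ing` with stems roam and solv.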
24
  • Spanish Newswire Corpus
    • 40,011 Tokens
    • 6,975 Types

Suffixes       Stem Type Count  Stems
a.as.o.os.tro  1                cas
a.as.o.os      43               african, cas, jurídic, l, ...
a.as.os        50               afectad, cas, jurídic, l, ...
a.as.o         59               cas, citad, jurídic, l, ...
a.o.os         105              impuest, indonesi, italian, jurídic, ...
as.o.os        54               cas, implicad, jurídic, l, ...
a.as           199              huelg, incluid, industri, inundad, ...
a.os           134              impedid, impuest, indonesi, inundad, ...
as.os          68               cas, implicad, inundad, jurídic, ...
a.o            214              id, indi, indonesi, inmediat, ...
as.o           85               intern, jurídic, just, l, ...
a.tro          2                cas, cen
o.os           268              human, implicad, indici, indocumentad, ...
a              1237             huelg, ib, id, iglesi, ...
as             404              huelg, huelguist, incluid, industri, ...
os             534              humorístic, human, hígad, impedid, ...
o              1139             hub, hug, human, huyend, ...
tro            16               catas, ce, cen, cua, ...
25
[Same table as slide 24, now with the column labels Suffixes, Stem Type Count, and Stems, and the top row (a.as.o.os.tro) marked "Level 5: 5 suffixes"]
26
Adjective Inflection Class
From the spurious suffix tro
[Same table as slide 24, without the three rows containing the spurious suffix tro (a.as.o.os.tro, a.tro, tro)]
27
Basic Search Procedure
[Same table as slide 24, without the three rows containing tro (a.as.o.os.tro, a.tro, tro)]
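The slides do not spell out the search criterion, so the following is only one plausible reading, sketched under explicit assumptions: start from a single-suffix set, repeatedly move to the parent set (one more suffix) that keeps the most stems, and stop when the best parent would keep fewer than a threshold `min_stems` (a hypothetical parameter, set to 10 here). The counts are copied from the Spanish table on slide 24; suffix sets that table does not list are treated as unavailable.

```python
# Stem type counts per suffix set, copied from the Spanish newswire table.
# Suffix sets the slide does not list (e.g. as.tro) are treated as absent.
COUNTS = {
    "a.as.o.os.tro": 1, "a.as.o.os": 43, "a.as.os": 50, "a.as.o": 59,
    "a.o.os": 105, "as.o.os": 54, "a.as": 199, "a.os": 134, "as.os": 68,
    "a.o": 214, "as.o": 85, "a.tro": 2, "o.os": 268,
    "a": 1237, "as": 404, "os": 534, "o": 1139, "tro": 16,
}
SCHEMES = {frozenset(name.split(".")): n for name, n in COUNTS.items()}
ALL_SUFFIXES = frozenset().union(*SCHEMES)


def climb(seed, min_stems=10):
    """Greedy bottom-up search (hypothetical criterion): from the set {seed},
    move to the listed parent set (one extra suffix) with the most stems;
    stop when no parent keeps at least min_stems stems."""
    current = frozenset([seed])
    while True:
        parents = [current | {s} for s in ALL_SUFFIXES - current]
        parents = [p for p in parents if p in SCHEMES]
        if not parents:
            return current
        best = max(parents, key=lambda p: SCHEMES[p])
        if SCHEMES[best] < min_stems:
            return current
        current = best


def label(scheme):
    """Dotted notation with suffixes in alphabetical order."""
    return ".".join(sorted(scheme))


if __name__ == "__main__":
    for seed in ("a", "as", "o", "os", "tro"):
        end = climb(seed)
        print(seed, "->", label(end), SCHEMES[end])
```

Under these assumptions, starting from any of a, as, o, or os ends at a.as.o.os with 43 stems: the final step to a.as.o.os.tro is rejected because it keeps only 1 stem (cas), and starting from tro stops immediately.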
28
Examples and Evaluation of Automatically Selected Suffix Sets
Ø.ba.n.ndo
a.aba.ado.ados.ar.ará.arán
a.aciones.ación.adas.ado.ar
a.ada.adas.ado.ar.ará
a.adas.ado.an.ar
a.ado.ados.ar.ó
a.ado.an.arse.ó
a.ado.aron.arse.ó
aba.ada.ado.ar.o.os
aciones.ación.ado.ados
aciones.ado.ados.ará
ación.ado.an.e
ada.adas.ado.ados.aron.ó
ada.ado.ados.ar.o
ado.adores.o
ado.ados.arse.e
ado.ar.aron.arse.ará
do.dos.ndo.r.ron
e.ida.ido
emos.ido.ía.ían
ida.ido.idos.ir.ió
ido.iendo.ir
ido.ir.ro
Global Suffix Evaluation: Precision 0.506, Recall 0.517, F1 0.511
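Assuming the standard definition of F1 as the harmonic mean of precision P and recall R, the three reported figures are consistent with each other:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.506 \times 0.517}{0.506 + 0.517}
    = \frac{0.523204}{1.023}
    \approx 0.511
```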
29
Next Steps for Morphology Induction
  • Improve the Quality of Induced Paradigms
    • Current Work
  • Convert Paradigms into a Segmenter
    • Soon
  • Learn Mappings from Morphemes to Features
    • Future Goal
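Converting paradigms into a segmenter is listed as upcoming work, so the following is only an illustration of one simple way a single induced paradigm could segment words, not the planned method. The function `segment` and its longest-suffix-first rule are hypothetical. The suffixes and the four stems come from the a.as.o.os row of the Spanish table; the rest of that row's stems are elided on the slide and are not filled in here.

```python
# Suffixes and the four listed stems of the a.as.o.os row of the Spanish table.
# The row's remaining stems are elided ("...") on the slide and left out here.
PARADIGM_SUFFIXES = {"a", "as", "o", "os"}
PARADIGM_STEMS = {"african", "cas", "jurídic", "l"}


def segment(word, suffixes=PARADIGM_SUFFIXES, stems=PARADIGM_STEMS):
    """Hypothetical paradigm-based segmenter: return (stem, suffix) if the word
    is a known stem plus one of the paradigm's suffixes (longest suffix tried
    first); otherwise return (word, "") to signal that no analysis was found."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        stem = word[: len(word) - len(suffix)]
        if word.endswith(suffix) and stem in stems:
            return stem, suffix
    return word, ""


if __name__ == "__main__":
    for w in ("jurídicas", "casa", "los", "africanos", "huelga"):
        print(w, "->", segment(w))
```

Words outside the paradigm's stem list (here huelga, whose stem belongs to other rows) come back unsegmented, which is why a practical segmenter would need to combine many paradigms.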

30
Avenue Overview
[Avenue system diagram, same labels as on slide 2]
31
Mapudungun
  • Indigenous Language of Chile and Argentina
  • 1 Million Mapuche Speakers

32
Collaboration
Eliseo Cañulef, Rosendo Huisca, Hugo Carrasco, Hector Painequeo, Flor Caniupil, Luis Caniupil Huaiquiñir, Marcela Collio Calfunao, Cristian Carrillan Anton, Salvador Cañulef
  • Mapuche Language Experts
  • Universidad de la Frontera (UFRO)
    • Instituto de Estudios Indígenas (IEI), Institute for Indigenous Studies
  • Chilean Funding
    • Chilean Ministry of Education (Mineduc)
      • Bilingual and Multicultural Education Program

Carolina Huenchullan Arrúe, Claudio Millacura Salas
33
Accomplishments
  • Corpora Collection
    • Spoken Corpus
      • Collected by Luis Caniupil Huaiquiñir
      • Medical Domain
      • 3 of 4 Mapudungun Dialects
        • 120 hours of Nguluche
        • 30 hours of Lafkenche
        • 20 hours of Pwenche
      • Transcribed in Mapudungun
      • Translated into Spanish
    • Written Corpus
      • 200,000 words
      • Bilingual Mapudungun-Spanish
      • Historical and newspaper text

nmlch-nmjm1_x_0405_nmjm_00
M: <SPA>no pütokovilu kay ko
C: no, si me lo tomaba con agua [no, I did take it with water]
M: chumgechi pütokoki femuechi pütokon pu <Noise>
C: como se debe tomar, me lo tomé pués [the way it should be taken, that is how I took it]
nmlch-nmjm1_x_0406_nmlch_00
M: Chengewerkelafuymiürke
C: Ya no estabas como gente entonces! [So you were no longer like a person then!]
34
Accomplishments
  • Developed at UFRO
    • Bilingual Dictionary with Examples
      • 1,926 entries
    • Spelling-Corrected Mapudungun Word List
      • 117,003 fully-inflected word forms
    • Segmented Word List
      • 15,120 forms
      • Stems translated into Spanish

35
Accomplishments
  • Developed at LTI using Mapudungun language resources from UFRO
    • Spelling Checker
      • Integrated into OpenOffice
    • Hand-built Morphological Analyzer
    • Prototype Machine Translation Systems
      • Rule-Based
      • Example-Based
    • LenguasAmerindias.org