Title: Next slide is the Neutral Avenue System Diagram with a Morphology Learning box added
1. Next slide is the Neutral Avenue System Diagram with a Morphology Learning box added
2. Avenue Overview
[Diagram: Avenue system. Sections: Elicitation, Rule Learning, Run-Time System, Rule Refinement, Morphology. Boxes: Elicitation Tool, Elicitation Corpus, Word-Aligned Parallel Corpus, Learning Module (appears twice), Handcrafted rules, Transfer Rules, Run Time Transfer System, Lattice, Translation Correction Tool, Rule Refinement Module, Morphology Analyzer, Lexical Resources. Overlaid note: Do NOT Use.]
3. The next slide is for Ari. It has her sections highlighted and also includes the extra box that I added for Morphology Learning.
4. Rule Refinement
[Diagram: the Avenue system diagram from slide 2, with the same sections and boxes, including the overlaid note "Do NOT Use".]
5. Here is where Christian's presentation begins
6. Avenue Overview
[Diagram: the Avenue system diagram, repeated from slide 2.]
7. The Challenge of Morphology
- Mapudungun (Indigenous Language of Chile and Argentina, 1 Million Speakers)
Allkütulekefun
8. The Challenge of Morphology
Allkütu -le -ke -fu -n
9. The Challenge of Morphology
Allkütu -le -ke -fu -n
Listen -prog. -habitual -past -indic.1sg
10. The Challenge of Morphology
Allkütu -le -ke -fu -n
Listen -prog. -habitual -past -indic.1sg
I
11. The Challenge of Morphology
Allkütu -le -ke -fu -n
Listen -prog. -habitual -past -indic.1sg
I used to
12. The Challenge of Morphology
Allkütu -le -ke -fu -n
Listen -prog. -habitual -past -indic.1sg
I used to listen
13. The Challenge of Morphology
- Tasks for Morphology
- Segment Words
- Map Morphemes onto Features
Allkütu -le -ke -fu -n
Listen -prog. -habitual -past -indic.1sg
I used to listen
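The two tasks can be illustrated on the example word. This is only a sketch, not the learned system the talk aims for: it assumes the stem is already known and uses a hand-written suffix table (`SUFFIX_FEATURES`, a hypothetical name) holding just the four suffixes shown on the slide.

```python
# Hand-written table for illustration only; the goal of the talk is to learn it.
SUFFIX_FEATURES = {
    "le": "prog.",
    "ke": "habitual",
    "fu": "past",
    "n": "indic.1sg",
}


def segment(word, stem, suffixes):
    """Split `word` into `stem` plus a sequence of known suffixes.

    Returns the list of morphemes, or None if no full segmentation exists.
    """
    if not word.startswith(stem):
        return None

    def parse(rest):
        # Try each known suffix at the front of the remaining string.
        if not rest:
            return []
        for suffix in suffixes:
            if rest.startswith(suffix):
                tail = parse(rest[len(suffix):])
                if tail is not None:
                    return [suffix] + tail
        return None

    tail = parse(word[len(stem):])
    return None if tail is None else [stem] + tail


# Task 1: segment the word.
morphemes = segment("Allkütulekefun", "Allkütu", SUFFIX_FEATURES)
# Task 2: map each suffix onto its feature.
features = [SUFFIX_FEATURES[m] for m in morphemes[1:]]

print(morphemes)  # ['Allkütu', 'le', 'ke', 'fu', 'n']
print(features)   # ['prog.', 'habitual', 'past', 'indic.1sg']
```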
14. The Challenge of Morphology
- Tasks for Morphology
- Segment Words
- Map Morphemes onto Features
- Learn these tasks
- unsupervised
- from data
- for any language
15. Our Approach
- Leverage the Natural Structure of Morphology
- Paradigm
- Set of affixes that interchangeably attach to a set of stems
16. Our Approach
- Leverage the Natural Structure of Morphology
- Paradigm
- Set of affixes that interchangeably attach to a set of stems
Example Vocabulary: blame blamed blames roamed roaming roams solve solves solving
Ø.s blame solve
17. Our Approach
- Leverage the Natural Structure of Morphology
- Paradigm
- Set of affixes that interchangeably attach to a set of stems
Example Vocabulary: blame blamed blames roamed roaming roams solve solves solving
Ø.s.d blame
Ø.s blame solve
19. Our Approach
- Leverage the Natural Structure of Morphology
- Paradigm
- Set of affixes that interchangeably attach to a set of stems
Example Vocabulary: blame blamed blames roamed roaming roams solve solves solving
Ø.s.d blame
Ø.s blame solve
s blame roam solve
21. Our Approach
Example Vocabulary: blame blamed blames roamed roaming roams solve solves solving
Ø.s.d blame
Ø.s blame solve
e.es blam solv
s blame roam solve
23.
Ø.s.d blame
e.es.ed blam
me.mes.med bla
e.es blam solv
Ø.s blame solve
me.mes bla
e.ed blam
Ø.d blame
me.med bla
s.d blame
es.ed blam
mes.med bla
Ø blame blames blamed roams roamed roaming solve solves solving
e blam solv
me bla
s blame roam solve
es blam solv
mes bla
ed blam roam
d blame roame
med bla roa
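The suffix-set/stem pairs above can be reproduced mechanically from the example vocabulary. The sketch below is an illustration, not necessarily how the system is implemented: it assumes every split of a word into a non-empty stem and a (possibly empty) suffix is a candidate, and `build_schemes` and `max_level` are hypothetical names. It therefore also generates pairs the slide does not show, such as very short stems with long suffixes.

```python
from collections import defaultdict
from itertools import combinations

VOCABULARY = ["blame", "blamed", "blames", "roamed", "roaming",
              "roams", "solve", "solves", "solving"]


def build_schemes(vocabulary, max_level=3):
    """Map each suffix set (up to max_level suffixes) to the stems that take all of them."""
    # Collect, for every candidate stem, the suffixes it occurs with.
    # The empty suffix is written "Ø", as on the slides.
    suffixes_of = defaultdict(set)
    for word in vocabulary:
        for i in range(1, len(word) + 1):
            suffixes_of[word[:i]].add(word[i:] or "Ø")

    # A stem belongs to every subset of its suffixes.
    schemes = defaultdict(set)
    for stem, suffixes in suffixes_of.items():
        for level in range(1, max_level + 1):
            for combo in combinations(sorted(suffixes), level):
                schemes[frozenset(combo)].add(stem)
    return schemes


schemes = build_schemes(VOCABULARY)
for name in ["Ø.s", "Ø.s.d", "e.es", "s"]:
    stems = schemes[frozenset(name.split("."))]
    print(name, " ".join(sorted(stems)))
# Ø.s blame solve
# Ø.s.d blame
# e.es blam solv
# s blame roam solve
```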
24.
- Spanish Newswire Corpus
- 40,011 Tokens
- 6,975 Types
a.as.o.os.tro  1     cas
a.as.o.os      43    african, cas, jurídic, l, ...
a.as.os        50    afectad, cas, jurídic, l, ...
a.as.o         59    cas, citad, jurídic, l, ...
a.o.os         105   impuest, indonesi, italian, jurídic, ...
as.o.os        54    cas, implicad, jurídic, l, ...
a.as           199   huelg, incluid, industri, inundad, ...
a.os           134   impedid, impuest, indonesi, inundad, ...
as.os          68    cas, implicad, inundad, jurídic, ...
a.o            214   id, indi, indonesi, inmediat, ...
as.o           85    intern, jurídic, just, l, ...
a.tro          2     cas, cen
o.os           268   human, implicad, indici, indocumentad, ...
a              1237  huelg, ib, id, iglesi, ...
as             404   huelg, huelguist, incluid, industri, ...
os             534   humorístic, human, hígad, impedid, ...
o              1139  hub, hug, human, huyend, ...
tro            16    catas, ce, cen, cua, ...
25.
Level 5: 5 suffixes
Suffixes       Stem Type Count  Stems
a.as.o.os.tro  1                cas
a.as.o.os      43               african, cas, jurídic, l, ...
a.as.os        50               afectad, cas, jurídic, l, ...
a.as.o         59               cas, citad, jurídic, l, ...
a.o.os         105              impuest, indonesi, italian, jurídic, ...
as.o.os        54               cas, implicad, jurídic, l, ...
a.as           199              huelg, incluid, industri, inundad, ...
a.os           134              impedid, impuest, indonesi, inundad, ...
as.os          68               cas, implicad, inundad, jurídic, ...
a.o            214              id, indi, indonesi, inmediat, ...
as.o           85               intern, jurídic, just, l, ...
a.tro          2                cas, cen
o.os           268              human, implicad, indici, indocumentad, ...
a              1237             huelg, ib, id, iglesi, ...
as             404              huelg, huelguist, incluid, industri, ...
os             534              humorístic, human, hígad, impedid, ...
o              1139             hub, hug, human, huyend, ...
tro            16               catas, ce, cen, cua, ...
26. Adjective Inflection Class
From the spurious suffix tro
a.as.o.os      43    african, cas, jurídic, l, ...
a.as.os        50    afectad, cas, jurídic, l, ...
a.as.o         59    cas, citad, jurídic, l, ...
a.o.os         105   impuest, indonesi, italian, jurídic, ...
as.o.os        54    cas, implicad, jurídic, l, ...
a.as           199   huelg, incluid, industri, inundad, ...
a.os           134   impedid, impuest, indonesi, inundad, ...
as.os          68    cas, implicad, inundad, jurídic, ...
a.o            214   id, indi, indonesi, inmediat, ...
as.o           85    intern, jurídic, just, l, ...
o.os           268   human, implicad, indici, indocumentad, ...
a              1237  huelg, ib, id, iglesi, ...
as             404   huelg, huelguist, incluid, industri, ...
os             534   humorístic, human, hígad, impedid, ...
o              1139  hub, hug, human, huyend, ...
27. Basic Search Procedure
a.as.o.os      43    african, cas, jurídic, l, ...
a.as.os        50    afectad, cas, jurídic, l, ...
a.as.o         59    cas, citad, jurídic, l, ...
a.o.os         105   impuest, indonesi, italian, jurídic, ...
as.o.os        54    cas, implicad, jurídic, l, ...
a.as           199   huelg, incluid, industri, inundad, ...
a.os           134   impedid, impuest, indonesi, inundad, ...
as.os          68    cas, implicad, inundad, jurídic, ...
a.o            214   id, indi, indonesi, inmediat, ...
as.o           85    intern, jurídic, just, l, ...
o.os           268   human, implicad, indici, indocumentad, ...
a              1237  huelg, ib, id, iglesi, ...
as             404   huelg, huelguist, incluid, industri, ...
os             534   humorístic, human, hígad, impedid, ...
o              1139  hub, hug, human, huyend, ...
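The slide does not spell out the search procedure, so the following is only one plausible reading, stated as an assumption: a greedy upward climb that starts from a single suffix, repeatedly adds the suffix that keeps the most stems, and stops when too few stems survive. The stem type counts are copied from the Spanish newswire table on slides 24 and 25; `climb` and the `min_ratio` value of 0.1 are hypothetical choices made for this sketch.

```python
# Stem type counts copied from the Spanish newswire table on the slides.
TABLE = """
a.as.o.os.tro 1
a.as.o.os 43
a.as.os 50
a.as.o 59
a.o.os 105
as.o.os 54
a.as 199
a.os 134
as.os 68
a.o 214
as.o 85
a.tro 2
o.os 268
a 1237
as 404
os 534
o 1139
tro 16
"""


def parse_counts(text):
    """Turn 'a.as 199' lines into {frozenset({'a', 'as'}): 199}."""
    counts = {}
    for line in text.splitlines():
        if line.strip():
            name, count = line.split()
            counts[frozenset(name.split("."))] = int(count)
    return counts


def climb(seed, counts, min_ratio):
    """Greedily add one suffix at a time while enough stems survive (assumed rule)."""
    current = seed
    while True:
        # Parents: suffix sets with exactly one more suffix than the current set.
        parents = [s for s in counts if len(s) == len(current) + 1 and current < s]
        if not parents:
            return current
        best = max(parents, key=counts.get)
        if counts[best] / counts[current] < min_ratio:
            return current
        current = best


COUNTS = parse_counts(TABLE)
result = climb(frozenset({"a"}), COUNTS, min_ratio=0.1)
print(".".join(sorted(result)))  # a.as.o.os
```

Under these assumptions the climb from a passes through a.o and a.o.os, reaches a.as.o.os (43 stems), and refuses the step to a.as.o.os.tro because only 1 of the 43 stems survives.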
28. Examples and Evaluation of Automatically Selected Suffix Sets
Ø.ba.n.ndo                   ada.adas.ado.ados.aron.ó
a.aba.ado.ados.ar.ará.arán   ada.ado.ados.ar.o
a.aciones.ación.adas.ado.ar  ado.adores.o
a.ada.adas.ado.ar.ará        ado.ados.arse.e
a.adas.ado.an.ar             ado.ar.aron.arse.ará
a.ado.ados.ar.ó              do.dos.ndo.r.ron
a.ado.an.arse.ó              e.ida.ido
a.ado.aron.arse.ó            emos.ido.ía.ían
aba.ada.ado.ar.o.os          ida.ido.idos.ir.ió
aciones.ación.ado.ados       ido.iendo.ir
aciones.ado.ados.ará         ido.ir.ro
ación.ado.an.e
Global Suffix Evaluation: Precision 0.506, Recall 0.517, F1 0.511
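The reported F1 is consistent with the standard definition of F1 as the harmonic mean of precision and recall. A quick check, using only the two figures from the slide:

```python
precision = 0.506
recall = 0.517

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.511
```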
29. Next Steps for Morphology Induction
- Improve the Quality of Induced Paradigms
- Current Work
- Convert Paradigms into a Segmenter
- Soon
- Learn Mappings from Morphemes to Features
- Future Goal
30. Avenue Overview
[Diagram: the Avenue system diagram, repeated from slide 2.]
31. Mapudungun
- Indigenous Language of Chile and Argentina
- 1 Million Mapuche Speakers
32. Collaboration
Eliseo Cañulef, Rosendo Huisca, Hugo Carrasco, Hector Painequeo, Flor Caniupil, Luis Caniupil Huaiquiñir, Marcela Collio Calfunao, Cristian Carrillan Anton, Salvador Cañulef
- Mapuche Language Experts
- Universidad de la Frontera (UFRO)
- Instituto de Estudios Indígenas (IEI)
- Institute for Indigenous Studies
- Chilean Funding
- Chilean Ministry of Education (Mineduc)
- Bilingual and Multicultural Education Program
Carolina Huenchullan Arrúe, Claudio Millacura Salas
33. Accomplishments
- Corpora Collection
- Spoken Corpus
- Collected by Luis Caniupil Huaiquiñir
- Medical Domain
- 3 of 4 Mapudungun Dialects
- 120 hours of Nguluche
- 30 hours of Lafkenche
- 20 hours of Pwenche
- Transcribed in Mapudungun
- Translated into Spanish
- Written Corpus
- 200,000 words
- Bilingual Mapudungun-Spanish
- Historical and newspaper text
nmlch-nmjm1_x_0405_nmjm_00
M: <SPA>no pütokovilu kay ko
C: no, si me lo tomaba con agua
M: chumgechi pütokoki femuechi pütokon pu <Noise>
C: como se debe tomar, me lo tomé pués
nmlch-nmjm1_x_0406_nmlch_00
M: Chengewerkelafuymiürke
C: Ya no estabas como gente entonces!
34. Accomplishments
- Developed at UFRO
- Bilingual Dictionary with Examples
- 1,926 entries
- Spelling Corrected Mapudungun Word List
- 117,003 fully-inflected word forms
- Segmented Word List
- 15,120 forms
- Stems translated into Spanish
35. Accomplishments
- Developed at LTI using Mapudungun language resources from UFRO
- Spelling Checker
- Integrated into OpenOffice
- Hand-built Morphological Analyzer
- Prototype Machine Translation Systems
- Rule-Based
- Example-Based
- LenguasAmerindias.org