Title: AVENUE: Machine Translation for Resource-Poor Languages, NSF ITR 2001-2005
1 AVENUE: Machine Translation for Resource-Poor Languages
NSF ITR 2001-2005
2 Project Members: Automated Rule Learning
- Faculty
  - Jaime Carbonell
  - Ralf Brown
  - Alon Lavie
  - Lori Levin
- Coordinator of Latin American Projects
  - Rodolfo Vega
- Graduate Students
  - Ariadna Font Llitjos
  - Katharina Probst
  - Christian Monson
  - Erik Peterson
3 Resource-Poor Languages
- Not enough linguists to write a human-engineered system.
- Not enough corpora to build a corpus-based system.
- No standard orthography.
- May be spoken by hundreds of thousands of people (Mapudungun, Chile) or by only a few elderly people (Siona, Colombia).
4 AVENUE Languages
- AVENUE is currently working with:
  - Mapudungun (Chile)
  - Inupiaq (Alaska)
  - Aymara, Quechua and Aguaruna (Peru)
  - Siona (Colombia)
5 Mapudungun for the Mapuche
- Chile: official language Spanish; population 15 million
- Mapuche people: 1/2 million; language Mapudungun
6 Where can AVENUE make a difference for indigenous communities?
- By contributing to the development of indigenous people at the local and national level
7 There are two possible ways to do this
- A traditional way, from experts on development
  - Outcome: to translate government policy documents on health care, law, agriculture, etc.
- An alternative way, from local experts, grounded in the community's experience and needs
  - Outcome: to contribute to language education in the form of literacy and second language acquisition
8 Inter- and multicultural bilingual education
- An educational strategy contributing to the development of the indigenous culture beyond the point of subsistence.
- Helping individuals and their communities to achieve excellence in a multicultural national and global context.
- Increasing the use of information and communication technologies in a life-long learning environment.
9 In exchange for the language data, we agree to contribute to the creation of the following products
- Plug-in orthographic corrector for word processors
- Electronic dictionary
- Web-based translator
- Intelligent tutor for literacy and second language acquisition
10 Our last meeting in Temuco, May 2002
11 Automatic Learning of a Transfer-based MTS
[Diagram. Components: elicitation corpus, SVS algorithm, tentative transfer rules, transfer module, rule refinement module, morphology learning, morphological analyzer. Data: SL sentences, (tentative) TL sentences. Names shown on the diagram: Katharina Probst, Erik Peterson, Ariadna Font, Christian Monson.]
12 Morphology Analyzer for Rule-Based Machine Translation
13 Morphology Analyzer for Rule-Based Machine Translation
14 Example and Motivation
15 Results
- Language: English
- Corpus: Brown Corpus
- Set accuracy: 88.3
- Example clusters
  - NULL/s: navigator, discourse, peptide, ...
  - NULL/s: smith, china, cook, ...
  - NULL/ed: slim, reappeared, munch, ...
  - NULL/ing: reappear, respond, grunt, ...
  - NULL/ly: peaceful, remote, superb, ...
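The clusters above pair a set of suffixes (NULL meaning "no suffix") with the stems that take them. The slides do not describe the learning algorithm, so the following is only a minimal sketch of the underlying idea, assuming a hand-supplied list of candidate suffixes; the function name `suffix_clusters` and the toy word list are hypothetical.

```python
from collections import defaultdict

def suffix_clusters(words, suffixes):
    """Group stems by the exact set of candidate suffixes they occur with."""
    vocab = set(words)
    stem_suffixes = defaultdict(set)
    for word in vocab:
        # NULL records that the bare form is itself an attested word.
        stem_suffixes[word].add("NULL")
        for suf in suffixes:
            if word.endswith(suf) and len(word) > len(suf):
                stem_suffixes[word[: -len(suf)]].add(suf)
    clusters = defaultdict(list)
    for stem, sufs in stem_suffixes.items():
        if len(sufs) > 1:  # keep only stems seen with at least two endings
            clusters[frozenset(sufs)].append(stem)
    return {key: sorted(stems) for key, stems in clusters.items()}

# Toy vocabulary (an assumption, not the Brown Corpus).
words = ["navigator", "navigators", "smith", "smiths",
         "reappear", "reappeared", "respond", "responding",
         "peaceful", "peacefully"]
clusters = suffix_clusters(words, ["s", "ed", "ing", "ly"])
for suffix_set, stems in clusters.items():
    print(sorted(suffix_set), stems)
```

A real system would have to discover the candidate suffixes itself and cope with noise; this sketch only shows how stems fall into clusters once suffix candidates are given.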
16 Future Directions
- More languages
  - Spanish
  - Mapudungun
- More types of morphology
  - Prefixes
  - Infixes
- Employ a human informant
  - A small amount of knowledge might help a lot
17 AVENUE Transfer Engine
- Written specifically for automatically learned rules
- Integrated with the rule learner
- Can also be augmented with hand-written rules
- Currently researching constructions
  - Constructions are non-compositional structures
  - Many translation problems are associated with constructions
18 Translation Example
- Transfer English output
  - Will the president resign?
- During translation
  - The question particle is deleted
  - The auxiliary "will" is reordered before the subject
  - "the" is added before "president"
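The three operations listed above can be illustrated on a glossed token list. This is a minimal sketch, not the AVENUE transfer engine: the source-side word order, the `"Q"` gloss for the question particle, and the function name `transfer_question` are all assumptions made for illustration.

```python
def transfer_question(gloss):
    """Apply the three listed operations to a glossed source token list."""
    # 1. Delete the question particle (glossed here as "Q", an assumption).
    tokens = [t for t in gloss if t != "Q"]
    # 2. Reorder the auxiliary "will" before the subject.
    aux = tokens.pop(tokens.index("will"))
    tokens.insert(0, aux)
    # 3. Add "the" before "president".
    tokens.insert(tokens.index("president"), "the")
    return " ".join(tokens).capitalize() + "?"

# Hypothetical source order: subject, auxiliary, verb, question particle.
output = transfer_question(["president", "will", "resign", "Q"])
print(output)
```

In the actual engine these operations would follow from a transfer rule rather than being hard-coded per sentence.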
19 New approach to MT
- Fully automatic (no human intervention)
- Very little electronic data available: elicitation corpus
- Machine learning techniques
  - Seeded version space algorithm to automatically learn transfer rules
- Interactive and automatic refinement of transfer rules
20 Elicitation Tool
21 Rule Learning Overview
- Goal: learn transfer rules for a language pair where one language is resource-rich and the other is resource-poor
- Learning proceeds in three steps
  - Flat Seed Generation: informed guessing of transfer rules
  - Compositionality: adding structure to rules, using previously learned rules
  - Seeded Version Space Learning: generalizing rules so that they cover more unseen examples
22 Flat Seed Generation - Example
English: The highly qualified applicant did not accept the offer.
German: Der äußerst qualifizierte Bewerber nahm das Angebot nicht an.
Word alignments: ((1,1),(2,2),(3,3),(4,4),(6,8),(7,5),(7,9),(8,6),(9,7))

S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(alignments: (x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
 constraints: ((x1 def) = +) ((x4 agr) = 3-sing) ((x5 tense) = past) ...
              ((y1 def) = +) ((y3 case) = nom) ((y4 agr) = 3-sing) ... )
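The flat seed above is essentially the two part-of-speech sequences plus the word alignments written out as a rule. A minimal sketch of that assembly step follows, assuming the POS tags and alignments are already available; the function name `flat_seed` and the `::` and `->` separators are notational assumptions, and feature constraints are left out.

```python
def flat_seed(src_pos, tgt_pos, alignments, top="S"):
    """Build a flat transfer-rule string from POS sequences and word alignments."""
    align = "".join(f"(x{i}::y{j})" for i, j in alignments)
    return (f"{top}::{top} [{' '.join(src_pos)}] -> [{' '.join(tgt_pos)}] "
            f"(alignments {align})")

# POS sequences and alignments taken from the example sentence pair.
src = "det adv adj n aux neg v det n".split()
tgt = "det adv adj n v det n neg vpart".split()
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]
rule = flat_seed(src, tgt, pairs)
print(rule)
```

The seed is "flat" because both sides are plain POS sequences with no internal constituents; structure is only added in the next step.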
23 Compositionality - Example
S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(alignments: (x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
 constraints: ((x1 def) = +) ((x4 agr) = 3-sing) ((x5 tense) = past) ...
              ((y1 def) = +) ((y3 case) = nom) ((y4 agr) = 3-sing) ... )

NP::NP [det ADJP n] -> [det ADJP n]
((x1::y1) ((y3 agr) = 3-sing) ((x3 agr) = 3-sing) ... )

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past) ... ((y1 def) = +) ((y1 case) = nom) ... )
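The step from the flat seed to the structured rule amounts to replacing the first four constituents on each side with a single NP and renumbering the alignments. The sketch below shows only that index bookkeeping, assuming the replaced span is aligned as a unit; the function name `compose` is hypothetical and constraints are ignored.

```python
def compose(src, tgt, alignments, src_span, tgt_span, label):
    """Replace one span per side (1-based, inclusive) with a single constituent."""
    def collapse(seq, span):
        lo, hi = span
        return seq[: lo - 1] + [label] + seq[hi:]

    def remap(idx, span):
        lo, hi = span
        if idx < lo:
            return idx
        if idx <= hi:
            return lo            # anything inside the span maps to the new node
        return idx - (hi - lo)   # later positions shift left

    new_align = []
    for i, j in alignments:
        pair = (remap(i, src_span), remap(j, tgt_span))
        if pair not in new_align:  # the span's internal links collapse into one
            new_align.append(pair)
    return collapse(src, src_span), collapse(tgt, tgt_span), new_align

src = "det adv adj n aux neg v det n".split()
tgt = "det adv adj n v det n neg vpart".split()
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]
new_src, new_tgt, new_pairs = compose(src, tgt, pairs, (1, 4), (1, 4), "NP")
print(new_src, new_tgt, new_pairs)
```

With the example's spans, the renumbered alignments come out as (1,1), (3,5), (4,2), (4,6), (5,3), (6,4), matching the structured rule.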
24 Seeded Version Space Learning - Example
S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past) ... ((y1 def) = +) ((y1 case) = nom)
              ((y1 agr) = 3-sing) ((y3 agr) = 3-sing) ((y4 agr) = 3-sing) )

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past) ((y1 def) = +) ((y1 case) = nom)
              ((y4 agr) = (y3 agr)) )

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past) ((y1 def) = +) ((y1 case) = nom)
              ((y1 agr) = 3-plu) ((y3 agr) = 3-plu) ((y4 agr) = 3-plu) )
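In the example above, a rule with singular agreement values and one with plural values are generalized into a rule that only requires y3 and y4 to agree. The sketch below merges one such pair of constraint sets. It is a simplified illustration, not the seeded version space algorithm itself: the function name `generalize` is hypothetical, linking only adjacent constituents is an assumption chosen so that the result matches the slide, and the `"+"` value for `def` is assumed because it is not legible in the transcript.

```python
def generalize(rule_a, rule_b):
    """Merge two constraint sets; each maps (variable, feature) -> value."""
    merged, differing = [], []
    for key in rule_a:
        if key in rule_b:
            if rule_a[key] == rule_b[key]:
                merged.append((key, rule_a[key]))  # same value in both: keep
            else:
                differing.append(key)              # values differ: drop the value
    # Assumption: only adjacent constituents (e.g. y3, y4) get linked.
    for (v1, f1), (v2, f2) in zip(differing, differing[1:]):
        adjacent = v1[0] == v2[0] and int(v2[1:]) - int(v1[1:]) == 1
        covary = (rule_a[(v1, f1)] == rule_a[(v2, f2)]
                  and rule_b[(v1, f1)] == rule_b[(v2, f2)])
        if f1 == f2 and adjacent and covary:
            merged.append(((v2, f2), (v1, f1)))    # agreement constraint
    return merged

def constraints(agr):
    # "+" for def is an assumed value (missing in the transcript).
    return {("x2", "tense"): "past", ("y1", "def"): "+", ("y1", "case"): "nom",
            ("y1", "agr"): agr, ("y3", "agr"): agr, ("y4", "agr"): agr}

merged = generalize(constraints("3-sing"), constraints("3-plu"))
for constraint in merged:
    print(constraint)
```

The last constraint produced corresponds to `((y4 agr) = (y3 agr))`: the specific number values are replaced by a requirement that determiner and noun agree.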
25 Remaining Research Issues
- Improvement of existing algorithms
- Reversal of translation direction
- Learning with less information on the resource-poor language
- Learning from an unstructured corpus
26 Interactive and Automatic Rule Refinement
- 1. Given an MTS, translate sentences and present them to users for minimal correction (interface design, MT error classification)
- 2. Determine blame assignment
- 3. Structure learning, as opposed to binary feedback, to automatically refine the existing rules
27 Interactive Learning
- Translation Correction Tool, web application
- Bilingual informants (no knowledge of linguistics assumed)
- User-friendly and intuitive interface
- Can naïve users reliably pinpoint the source of errors? Is MT error classification realistic?
- Need for user studies
  - Spanish - English
  - English - Spanish
  - English - Chinese
30 Structure Learning
Learn a mapping between incorrect structures and correct structures
- Incorrect: She saw high woman
- Correct: She saw the tall woman
- Given user feedback (correction and error classification) and blame assignment, modify the appropriate transfer rule(s) to obtain the correct translation
- Need to evaluate based on cross-validation: number of sentences it can translate correctly (elicitation corpus)
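A first step toward mapping an incorrect structure to a correct one is locating where the user's correction differs from the MT output. The slides do not say how this is computed; the sketch below uses a token-level diff from Python's standard `difflib` as a stand-in, and the function name `correction_edits` is hypothetical.

```python
from difflib import SequenceMatcher

def correction_edits(mt_output, corrected):
    """List the token-level edits that turn the MT output into the correction."""
    a, b = mt_output.split(), corrected.split()
    ops = SequenceMatcher(a=a, b=b, autojunk=False).get_opcodes()
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in ops if tag != "equal"]

edits = correction_edits("She saw high woman", "She saw the tall woman")
print(edits)
```

A plain diff reports one replaced span ("high" becoming "the tall") and does not separate the missing determiner from the wrong lexical choice; that distinction is what the user's error classification would have to supply.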
31 A simple example
- Spanish SLS: Ella vio a la mujer alta
- English TLS: She saw high woman
- Corrected TLS: She saw the tall woman
- MT error classification: missing determiner, wrong lexical selection
- Blame assignment: the NP rule that generated the direct object; selectional restrictions
- Rule refinement
  - The noun phrase (NP) rule that generated the error, NP -> Adj N, needs to be refined into 2 different cases
  - NP -> Det Adj Nsg (the tall woman)
  - NP -> (Det) Adj Npl ((the)? tall women)
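The refinement above splits one rule into a singular case with an obligatory determiner and a plural case with an optional one. A minimal sketch of that split follows, assuming rules are stored as simple records; the `Rule` class, its `number` field, and the `refine` function are hypothetical and do not reflect the AVENUE rule format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: str
    rhs: tuple            # constituent labels; "(X)" marks an optional one
    number: str = "any"   # hypothetical number feature on the head noun

def refine(rule, new_label, position):
    """Split a rule into a singular case (label required) and a plural case (optional)."""
    rhs = list(rule.rhs)
    required = rhs[:position] + [new_label] + rhs[position:]
    optional = rhs[:position] + [f"({new_label})"] + rhs[position:]
    return [Rule(rule.lhs, tuple(required), "sg"),
            Rule(rule.lhs, tuple(optional), "pl")]

refined = refine(Rule("NP", ("Adj", "N")), "Det", 0)
for r in refined:
    print(r.lhs, "->", " ".join(r.rhs), f"[{r.number}]")
```

Deciding that the split should be on number, and where the new constituent goes, is the part that blame assignment and structure learning would have to work out.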
32 Remaining research issues
- Refine MT error classification
- Blame assignment
- Structure learning algorithm
- Expand elicitation corpus with more verb subcategorization patterns