AVENUE: Machine Translation for Resource-Poor Languages, NSF ITR 2001-2005

1
AVENUE: Machine Translation for Resource-Poor
Languages
NSF ITR 2001-2005
2
Project Members: Automated Rule Learning
  • Faculty
    • Jaime Carbonell
    • Ralf Brown
    • Alon Lavie
    • Lori Levin
  • Coordinator of Latin American Projects
    • Rodolfo Vega
  • Graduate Students
    • Ariadna Font Llitjos
    • Katharina Probst
    • Christian Monson
    • Erik Peterson

3
Resource-Poor Languages
  • Not enough linguists to write a human-engineered
    system.
  • Not enough corpora to build a corpus-based
    system.
  • No standard orthography.
  • May be spoken by hundreds of thousands of people
    (Mapudungun, Chile) or by only a few elderly
    people (Siona, Colombia).

4
AVENUE languages
  • AVENUE is currently working with:
    • Mapudungun (Chile)
    • Inupiaq (Alaska)
    • Aymara, Quechua and Aguaruna (Peru)
    • Siona (Colombia)

5
Mapudungun for the Mapuche
  • Chile: official language Spanish, population 15
    million
  • Mapuche people: 1/2 million, language Mapudungun
6
Where can AVENUE make a difference for indigenous
communities?
  • To contribute to the development of the
    indigenous people at the local and national level

7
There are two possible ways to do this:
  • A traditional way, from experts on development
    • Outcome: to translate government policy
      documents on health care, law, agriculture, etc.
  • An alternative way, from local experts, grounded
    in the community's experience and needs
    • Outcome: to contribute to language education in
      the form of literacy and second language
      acquisition

8
Inter- and multi-cultural bilingual education
  • An educational strategy contributing to the
    development of the indigenous culture beyond the
    point of subsistence.
  • Helping each individual and their communities to
    achieve excellence in a multicultural national
    and global context.
  • Increasing the use of information and
    communication technologies, in a life-long
    learning environment.

9
In exchange for the language data, we agree to
contribute to the creation of the following
products:
  • Plug-in orthographic corrector for word
    processors
  • Electronic dictionary
  • Web based translator
  • Intelligent tutor for literacy and second
    language acquisition

10
Our last meeting in Temuco, May 2002
11
Automatic Learning of a Transfer-based MTS
(Architecture diagram. Labeled components: elicitation
corpus, SVS algorithm, tentative transfer rules,
transfer module, rule refinement module, SL sentences,
(tentative) TL sentences, morphology learning,
morphological analyzer. Names on the diagram:
Katharina Probst, Erik Peterson, Ariadna Font,
Christian Monson.)
12
Morphology Analyzer for Rule Based Machine
Translation
13
Morphology Analyzer for Rule Based Machine
Translation
14
Example and Motivation
15
Results
  • Language: English
  • Corpus: Brown Corpus
  • Set accuracy: 88.3
  • Example clusters:
    • NULL.s: navigator, discourse, peptide, ...
    • NULL.s: smith, china, cook, ...
    • NULL.ed: slim, reappeared, munch, ...
    • NULL.ing: reappear, respond, grunt, ...
    • NULL.ly: peaceful, remote, superb, ...

16
Future Directions
  • More languages
    • Spanish
    • Mapudungun
  • More types of morphology
    • Prefixes
    • Infixes
  • Employ a human informant
    • A small amount of knowledge might help a lot

17
AVENUE Transfer Engine
  • Written specifically for automatically learned
    rules
  • Integrated with rule learner
  • Can also be augmented with hand-written rules
  • Currently researching constructions
  • Constructions are non-compositional structures
  • Many translation problems associated with
    constructions

18
Translation Example
  • Transfer English output: Will the president resign?
  • During translation:
    • The question particle is deleted
    • The auxiliary "will" is reordered before the subject
    • "the" is added before "president"

19
New approach to MT
  • Fully automatic (no human intervention)
  • Very little electronic data available:
    elicitation corpus
  • Machine learning techniques
  • Seeded version space algorithm to automatically
    learn transfer rules
  • Interactive and Automatic refinement of Transfer
    rules

20
Elicitation Tool
21
Rule Learning Overview
  • Goal: learn transfer rules for a language pair
    where one language is resource-rich and the other
    is resource-poor
  • Learning proceeds in three steps:
    • 1. Flat Seed Generation: informed guessing of
      transfer rules
    • 2. Compositionality: adding structure to rules,
      using previously learned rules
    • 3. Seeded Version Space Learning: generalizing
      rules so that they scale to more unseen examples

22
Flat Seed Generation - Example
The highly qualified applicant did not accept the offer.
Der äußerst qualifizierte Bewerber nahm das Angebot nicht an.
((1,1),(2,2),(3,3),(4,4),(6,8),(7,5),(7,9),(8,6),(9,7))

S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(alignments: (x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
 constraints: ((x1 def) = +) ((x4 agr) = 3-sing) ((x5 tense) = past) ...
              ((y1 def) = +) ((y3 case) = nom) ((y4 agr) = 3-sing) ...)
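
The flat seed step can be illustrated with a minimal Python sketch, assuming the part-of-speech sequences and the word alignments are already available. The names `flat_seed` and `format_rule` and the dictionary layout are hypothetical and not taken from the AVENUE code. Feature constraints are left out because they would come from morphological analysis that the slide does not describe.

```python
def flat_seed(sl_tags, tl_tags, word_alignments):
    """Build a flat seed rule from POS sequences and 1-based word alignments.

    Hypothetical helper: the rule is a plain dict, not the AVENUE rule format.
    """
    for i, j in word_alignments:
        if not (1 <= i <= len(sl_tags) and 1 <= j <= len(tl_tags)):
            raise ValueError(f"alignment ({i},{j}) is out of range")
    return {
        "type": ("S", "S"),
        "sl": list(sl_tags),
        "tl": list(tl_tags),
        "alignments": sorted(set(word_alignments)),
    }


def format_rule(rule):
    """Render the rule in a notation close to the one on the slide."""
    head = "{}::{} [{}] -> [{}]".format(
        rule["type"][0], rule["type"][1],
        " ".join(rule["sl"]), " ".join(rule["tl"]),
    )
    aligns = "".join(f"(x{i}::y{j})" for i, j in rule["alignments"])
    return f"{head}\n(alignments: {aligns})"


# POS sequences and word alignments from the slide's English-German example.
sl = "det adv adj n aux neg v det n".split()
tl = "det adv adj n v det n neg vpart".split()
pairs = [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)]

seed = flat_seed(sl, tl, pairs)
print(format_rule(seed))
```

Word 5 ("did") has no alignment, so no pair starts with x5, while word 7 ("accept") maps to both "nahm" (y5) and "an" (y9).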
23
Compositionality - Example
S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(alignments: (x1::y1)(x2::y2)(x3::y3)(x4::y4)(x6::y8)(x7::y5)(x7::y9)(x8::y6)(x9::y7)
 constraints: ((x1 def) = +) ((x4 agr) = 3-sing) ((x5 tense) = past) ...
              ((y1 def) = +) ((y3 case) = nom) ((y4 agr) = 3-sing) ...)

NP::NP [det ADJP n] -> [det ADJP n]
((x1::y1) ((y3 agr) = 3-sing) ((x3 agr) = 3-sing) ...)

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past) ...
              ((y1 def) = +) ((y1 case) = nom) ...)
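
The index renumbering in this step can be checked with a small Python sketch. It assumes a previously learned NP rule covers words 1 to 4 on both sides and collapses that span directly into a single NP, which simplifies the slide (there the NP rule is itself built over an ADJP). The function name `collapse_span` and the dict layout are hypothetical.

```python
def collapse_span(rule, sl_span, tl_span, label):
    """Replace 1-based inclusive spans on both sides with one constituent.

    Hypothetical helper: alignments inside the spans merge into a single
    constituent-level alignment, and later indices shift left.
    """
    (s_lo, s_hi), (t_lo, t_hi) = sl_span, tl_span

    def renumber(idx, lo, hi):
        if idx < lo:
            return idx
        if idx <= hi:
            return lo
        return idx - (hi - lo)

    new_aligns = set()
    for i, j in rule["alignments"]:
        inside_sl = s_lo <= i <= s_hi
        inside_tl = t_lo <= j <= t_hi
        if inside_sl != inside_tl:
            raise ValueError("alignment crosses the constituent boundary")
        new_aligns.add((renumber(i, s_lo, s_hi), renumber(j, t_lo, t_hi)))

    return {
        "sl": rule["sl"][:s_lo - 1] + [label] + rule["sl"][s_hi:],
        "tl": rule["tl"][:t_lo - 1] + [label] + rule["tl"][t_hi:],
        "alignments": sorted(new_aligns),
    }


flat = {
    "sl": "det adv adj n aux neg v det n".split(),
    "tl": "det adv adj n v det n neg vpart".split(),
    "alignments": [(1, 1), (2, 2), (3, 3), (4, 4),
                   (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)],
}

structured = collapse_span(flat, (1, 4), (1, 4), "NP")
print(structured["sl"], structured["tl"], structured["alignments"])
```

The resulting alignments are (1,1), (3,5), (4,2), (4,6), (5,3), (6,4), which agree with the compositional S rule on the slide.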
24
Seeded Version Space Learning - Example
S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past) ...
              ((y1 def) = +) ((y1 case) = nom)
              ((y1 agr) = 3-sing) ((y3 agr) = 3-sing)
              ((y4 agr) = 3-sing))

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past)
              ((y1 def) = +) ((y1 case) = nom)
              ((y4 agr) = (y3 agr)))

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(alignments: (x1::y1)(x3::y5)(x4::y2)(x4::y6)(x5::y3)(x6::y4)
 constraints: ((x2 tense) = past)
              ((y1 def) = +) ((y1 case) = nom)
              ((y1 agr) = 3-plu) ((y3 agr) = 3-plu)
              ((y4 agr) = 3-plu))
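
One way to picture the generalization from the 3-sing and 3-plu rules to an agreement constraint is the Python sketch below. It is a simplified illustration, not the seeded version space algorithm itself: it keeps the constraints both rules share and turns value constraints that differ in the same way into agreement links. The function name `generalize` and the dict encoding are hypothetical, and the definiteness constraint is left out because its value is not legible on the slide.

```python
from collections import defaultdict


def generalize(constraints_a, constraints_b):
    """Merge two constraint sets over the same rule structure.

    Returns (shared, agreements): shared value constraints, plus agreement
    links between variables whose values differ identically in both rules.
    """
    shared = {k: v for k, v in constraints_a.items()
              if constraints_b.get(k) == v}
    differing = [k for k in constraints_a
                 if k in constraints_b and constraints_a[k] != constraints_b[k]]

    groups = defaultdict(list)
    for var, feat in differing:
        key = (feat, constraints_a[(var, feat)], constraints_b[(var, feat)])
        groups[key].append(var)

    agreements = []
    for (feat, _, _), variables in sorted(groups.items()):
        variables.sort()
        # Chain the variables pairwise: y3 = y1, y4 = y3, ...
        for left, right in zip(variables, variables[1:]):
            agreements.append(((right, feat), (left, feat)))
    return shared, agreements


singular = {("x2", "tense"): "past", ("y1", "case"): "nom",
            ("y1", "agr"): "3-sing", ("y3", "agr"): "3-sing",
            ("y4", "agr"): "3-sing"}
plural = {("x2", "tense"): "past", ("y1", "case"): "nom",
          ("y1", "agr"): "3-plu", ("y3", "agr"): "3-plu",
          ("y4", "agr"): "3-plu"}

shared, agreements = generalize(singular, plural)
print(shared)
print(agreements)
```

The sketch links every variable that carried the differing value, so it also ties y1 to y3, whereas the generalized rule on the slide keeps only (y4 agr) = (y3 agr).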
25
Remaining Research Issues
  • Improvement of existing algorithms
  • Reversal of translation direction
  • Learning with less information on the
    resource-poor language
  • Learning from an unstructured corpus

26
Interactive and Automatic rule refinement
  • 1. Given an MTS, translate sentences and present
    them to the users for minimal correction
    (interface design, MT error classification)
  • 2. Determine blame assignment
  • 3. Structure learning, as opposed to binary
    feedback, to automatically refine the existing
    rules

27
Interactive Learning
  • Translation Correction Tool, a web application
  • Bilingual informants (no knowledge of linguistics
    assumed)
  • User-friendly and intuitive interface
  • Can naïve users reliably pinpoint the source of
    errors? Is MT error classification realistic?
  • Need for user studies:
    • Spanish - English
    • English - Spanish
    • English - Chinese

30
Structure learning
Learn the mapping between incorrect structures
and correct structures:
She saw high woman -> She saw the tall woman
  • Given user feedback (correction + error
    classification) and blame assignment, modify the
    appropriate transfer rule(s) to obtain the correct
    translation
  • Need to evaluate based on cross-validation,
    number of sentences it can translate correctly
    (elicitation corpus)

31
A simple example
  • Spanish SLS: Ella vio a la mujer alta
  • English TLS: She saw high woman
  • Corrected TLS: She saw the tall woman
  • MT error classification: missing determiner,
    wrong lexical selection
  • Blame assignment: the NP rule that generated the
    direct object, plus selectional restrictions
  • Rule refinement:
    • The Noun Phrase (NP) rule that generated the
      error,
      NP -> Adj N
    • needs to be refined into 2 different cases:
      NP -> Det Adj N_sg (the tall woman)
      NP -> (Det) Adj N_pl ((the)? tall women)
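
The two refined cases can be written down as data and exercised with a short Python sketch. The rule encoding, the `optional` and `n_number` fields, and the function names are hypothetical; the split itself is hard-coded to this one example rather than learned.

```python
def refine_np_rule(rule):
    """Split NP -> Adj N into a singular case (Det required) and a
    plural case (Det optional), as in the slide's example."""
    if (rule["lhs"], rule["rhs"]) != ("NP", ["Adj", "N"]):
        raise ValueError("this sketch only handles NP -> Adj N")
    singular = {"lhs": "NP", "rhs": ["Det", "Adj", "N"],
                "optional": set(), "n_number": "sg"}
    plural = {"lhs": "NP", "rhs": ["Det", "Adj", "N"],
              "optional": {"Det"}, "n_number": "pl"}
    return [singular, plural]


def realize(rules, words, number):
    """Fill the rule matching the noun's number; optional slots with no
    word are skipped, missing required slots raise an error."""
    rule = next(r for r in rules if r["n_number"] == number)
    out = []
    for cat in rule["rhs"]:
        if cat in words:
            out.append(words[cat])
        elif cat not in rule["optional"]:
            raise ValueError(f"missing required {cat}")
    return " ".join(out)


refined = refine_np_rule({"lhs": "NP", "rhs": ["Adj", "N"]})
print(realize(refined, {"Det": "the", "Adj": "tall", "N": "woman"}, "sg"))
print(realize(refined, {"Adj": "tall", "N": "women"}, "pl"))
```

With the singular rule, leaving out the determiner is an error, which is the behavior the correction "She saw the tall woman" calls for.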

32
Remaining research issues
  • Refine MT error classification
  • Blame assignment
  • Structure Learning algorithm
  • Expand elicitation corpus with more verb
    subcategorization patterns