What's so hard about translation? - PowerPoint PPT Presentation

About This Presentation
Slides: 25
Provided by: umiac7
Transcript and Presenter's Notes


1
What's so hard about translation?
  • Ed Kenschaft
  • University of Maryland
  • UMIACS, CLIP Lab

2
Translation Needs (1)
  • Assimilation
  • News monitoring
  • Intercepts, noisy documents
  • High recall, low precision

3
Translation Needs (2)
  • Dissemination
  • UN, EU
  • Commercial documentation
  • Bible translation
  • High recall and precision

4
Translation Needs (3)
  • Emergency
  • Military
  • Medical
  • Disaster relief
  • High precision, moderate recall

5
SIL International
  • Faith-based Christian organization
  • Partner with speakers of languages that have
    never been written down
  • Purposes
  • preserve the language and culture
  • document the language for study
  • translate the Bible and community development
    materials
  • Documented 1400 languages in 70 countries

6
Challenges 1
  • Ultra-low-density languages
  • mostly unwritten
  • no large (or small) parallel corpora
  • no Bible for bootstrapping

7
Challenges 2
  • Untrained translators
  • 6th-grade education
  • One trained linguist per 10 languages

8
Challenges 3
  • Exceedingly rich domain of discourse
  • approximates all of natural language
  • Genres
  • historical narrative
  • dialog
  • poetry
  • personal letters
  • Topics
  • business, politics, sex, relationships, diet …
  • no controlled vocabulary

9
Challenges 4
  • Demand for 100% accuracy/fluency
  • Life-changing lessons
  • Easy to misinterpret

10
Challenges 5
  • Nearly endless variety of target languages
  • 6800 languages
  • 1400 written, 5400 unwritten
  • half will survive next century
  • 2000-3000 remaining

11
Linguistic Variation
  • Phonological variation
  • Morphological variation
  • three-boys-shot-arrows-at-the-gazelle
  • Syntactic variation
  • grammatical markers (e.g. dual, causative)
  • discourse markers (e.g. topic/focus)
  • honorifics

12
Cultural Variation
  • Cleanse me with hyssop, and I will be
    clean; wash me, and I will be whiter than
    snow. (Psalm 51:7, NIV)
  • What is hyssop?
  • What is snow?
  • What does it mean to be white?

13
Cultural Variation
  • Cleanse me with a plant indigenous to the lands
    of the ancient Near East, used in Jewish
    religious ceremonies, and I will be whiter than
    the precipitation that falls like rain when the
    weather is very cold, which indicates a state of
    moral purity.

14
Intelligibility vs. Fidelity (1)
  • Moses had horns.

15
Intelligibility vs. Fidelity (2)
  • Where there is no vision, the people perish.
    (Proverbs 29:18a, KJ21)

16
Intelligibility vs. Fidelity (2)
  • Where there is no vision, the people perish.
    (Proverbs 29:18a, KJ21)
  • When people do not accept divine guidance, they
    run wild. (Proverbs 29:18a, NLT)

17
Waste of Time?
  • Can a computer solve all these problems?
  • Not on your life
  • Can a computer replace a translator?
  • Limited domains only
  • What can it do?
  • Word-processing
  • Data storage and analysis
  • First draft?

18
General Approach
  • CAT vs. MT
  • Linguistically informed systems
  • Supervised learning
  • Exploit all available resources
  • SL resources
  • Existing TL data

19
Data Representation
  • Text encoding
  • Unicode
  • Fonts
  • Graphite
  • Interlinear text
  • LinguaLinks, Toolbox, FieldWorks
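Text encoding is a practical problem for newly written languages: the same letter with a diacritic or tone mark can be stored either as one precomposed code point or as a base letter plus a combining mark, and strings that look identical on screen will then fail to compare equal. A minimal sketch using Python's standard `unicodedata` module, assuming NFC is the chosen storage form (the slide does not say which form any of the listed tools use):

```python
import unicodedata

def normalize_text(text: str, form: str = "NFC") -> str:
    """Return text in a single Unicode normalization form (NFC by default)."""
    return unicodedata.normalize(form, text)

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # U+0065 LATIN SMALL LETTER E + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                  # False: different code points
print(normalize_text(precomposed) == normalize_text(decomposed))  # True after NFC
print(len(decomposed), len(normalize_text(decomposed)))           # 2 1
```

Some letter-plus-mark combinations used in field orthographies have no precomposed code point, so they remain multi-code-point sequences even under NFC. Displaying them correctly is then a rendering problem, which is where smart-font technology such as Graphite comes in.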

20
Elicitation and Analysis
  • Elicit syntactic and morphological data
  • AVENUE, EXPEDITION
  • Elicit word lists for language survey
  • WordSurv
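Survey word lists are typically compared gloss by gloss to estimate how lexically similar two speech varieties are. The sketch below is a simplified stand-in, not WordSurv's procedure: WordSurv records the surveyor's own similarity judgments, whereas here a length-normalized edit distance with a hypothetical 0.5 threshold plays that role, and the word forms are invented placeholders rather than survey data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def judged_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for a surveyor's judgment: distance relative to the longer form."""
    longest = max(len(a), len(b)) or 1
    return levenshtein(a, b) / longest <= threshold

def lexical_similarity(list_a: dict, list_b: dict, threshold: float = 0.5) -> float:
    """Percentage of glosses present in both lists whose forms are judged similar."""
    shared = [gloss for gloss in list_a if gloss in list_b]
    if not shared:
        return 0.0
    hits = sum(judged_similar(list_a[g], list_b[g], threshold) for g in shared)
    return 100.0 * hits / len(shared)

# Hypothetical word lists keyed by gloss (invented forms, not real survey data).
village_a = {"eye": "kato", "water": "lumi", "fire": "nira", "stone": "pesa"}
village_b = {"eye": "kata", "water": "sowe", "fire": "nila", "stone": "besa"}
print(lexical_similarity(village_a, village_b))   # 75.0
```

Three of the four glosses differ by a single segment and count as similar; "lumi"/"sowe" share nothing and do not. A real survey would replace `judged_similar` with the linguist's recorded judgments.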

21
SL Resources
  • Related language adaptation
  • CARLA
  • Projection across word alignment
  • GIZA, Multi-Align, Parser Projection
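Projection across word alignment transfers annotation from a resource-rich source language to the target: each target word inherits the label of the source word or words it is aligned to. A minimal sketch, assuming the alignment is already available as (source index, target index) pairs of the kind an aligner produces; here the alignment is written by hand, the tag names are illustrative, and majority vote is just one way to resolve many-to-one links.

```python
from collections import Counter, defaultdict

def project_tags(src_tags, alignment, tgt_len):
    """Project one tag per source token onto target tokens via word-alignment links.

    alignment: iterable of (src_index, tgt_index) pairs.
    Returns one tag per target token, or None where the target token is unaligned.
    """
    votes = defaultdict(Counter)
    for s, t in alignment:
        votes[t][src_tags[s]] += 1          # each link casts one vote for its source tag
    projected = []
    for t in range(tgt_len):
        if votes[t]:
            # Majority tag; most_common breaks ties in first-seen order.
            projected.append(votes[t].most_common(1)[0][0])
        else:
            projected.append(None)
    return projected

# Tagged English source; a hypothetical 4-token target sentence whose last token is unaligned.
src_tags = ["DET", "NOUN", "VERB", "NOUN"]   # the boys shot arrows
alignment = [(1, 0), (2, 1), (3, 2)]
print(project_tags(src_tags, alignment, 4))  # ['NOUN', 'VERB', 'NOUN', None]
```

Projected labels are noisy (alignment errors, words with no counterpart), so in practice they are usually filtered or used as training data for a target-side model rather than trusted directly.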

22
NLG
  • Rich interlingua
  • TBTA (Tod Allman)
  • Statistical fluency enhancement
  • (Sebastian Varges)
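One common way to make statistical fluency enhancement concrete is to let the generator over-generate candidate wordings and rank them with an n-gram language model trained on existing target-language text. The sketch below shows that generic idea only, not the specific system credited on the slide; the bigram model, add-one smoothing, and the tiny English stand-in corpus are all illustrative assumptions.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigram contexts and bigrams, padding each sentence with <s> and </s>."""
    contexts, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        contexts.update(tokens[:-1])               # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
        vocab.update(tokens[1:])                   # tokens that can follow a context
    return contexts, bigrams, len(vocab)

def log_prob(sentence, model):
    """Add-one smoothed bigram log-probability of a whitespace-tokenized sentence."""
    contexts, bigrams, v = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
               for a, b in zip(tokens, tokens[1:]))

def most_fluent(candidates, model):
    """Pick the candidate wording that the language model scores highest."""
    return max(candidates, key=lambda c: log_prob(c, model))

# Toy target-language corpus and generator output (English stand-ins, purely illustrative).
corpus = ["the boys shot arrows", "the boys ran", "three boys shot arrows at the gazelle"]
model = train_bigram(corpus)
candidates = ["boys the arrows shot", "the boys shot arrows"]
print(most_fluent(candidates, model))   # the boys shot arrows
```

Because raw log-probability falls with length, this kind of ranking works best when candidates are word-order or word-choice variants of similar length; otherwise a length normalization is needed.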

23
Evaluation
  • Need for automation
  • Multiplying documents
  • Shortage of experts
  • BLEU
  • How well does it work?
  • What does it mean?
  • METEOR
  • Stresses recall
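The questions on this slide (how well BLEU works, what a score means, why METEOR stresses recall) are easier to see on a toy case. The sketch below implements single-reference sentence-level BLEU (clipped n-gram precision, geometric mean up to 4-grams, brevity penalty) and only the recall-weighted harmonic mean from the original METEOR formulation, with exact word matches and no fragmentation penalty, so it simplifies both metrics. The example sentences are invented.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference sentence BLEU: clipped n-gram precisions, geometric mean, brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0:      # also covers total == 0; one zero precision zeroes the mean
            return 0.0
        log_precisions.append(math.log(clipped / total))
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)

def meteor_fmean(candidate, reference):
    """Recall-weighted harmonic mean 10PR / (R + 9P) over exact unigram matches."""
    cand, ref = candidate.split(), reference.split()
    matches = sum((Counter(cand) & Counter(ref)).values())
    if matches == 0:
        return 0.0
    p, r = matches / len(cand), matches / len(ref)
    return 10 * p * r / (r + 9 * p)

reference = "the boys shot arrows at the gazelle"
short = "the boys shot arrows"
longer = "the boys shot the arrows at a gazelle"
for cand in (short, longer):
    print(f"{bleu(cand, reference):.3f}  {meteor_fmean(cand, reference):.3f}")
```

The short, exact candidate loses only to BLEU's brevity penalty (about 0.47) and to low recall in the METEOR mean (about 0.60). The longer candidate recovers every reference word but shares no 4-gram with the reference, so sentence-level BLEU gives it 0 while the recall-weighted mean gives about 0.99. Real evaluations use corpus-level BLEU, multiple references, and smoothing, and full METEOR adds stem and synonym matching plus a fragmentation penalty.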

24
The Limits of NLP
  • Who knows?