LING 138/238 SYMBSYS 138: Intro to Computer Speech and Language Processing

About This Presentation
Slides: 49
Provided by: DanJur6
Learn more at: https://web.stanford.edu

Transcript and Presenter's Notes


1
LING 138/238 SYMBSYS 138: Intro to Computer Speech
and Language Processing
  • Lecture 11: Machine Translation (I)
  • November 2, 2004
  • Dan Jurafsky

Thanks to Bonnie Dorr for some of these slides!!
2
Outline for MT Week
  • Intro and a little history
  • Language Similarities and Divergences
  • Four main MT Approaches
      • Transfer
      • Interlingua
      • Direct
      • Statistical
  • Evaluation

3
What is MT?
  • Translating a text from one language to another
    automatically.

4
Machine Translation
  • Dai-yu alone on bed top think-of-with-gratitude
    Bao-chai again listen to window outside bamboo
    tip plantain leaf of on-top rain sound sigh drop
    clear cold penetrate curtain not feeling again
    fall down tears come
  • As she lay there alone, Dai-yu's thoughts turned
    to Bao-chai... Then she listened to the insistent
    rustle of the rain on the bamboos and plantains
    outside her window. The coldness penetrated the
    curtains of her bed. Almost without noticing it
    she had begun to cry.

5
Machine Translation
  • The Story of the Stone
  • The Dream of the Red Chamber (Cao Xueqin 1792)
  • Issues
      • Breaking up into words
      • Breaking up into sentences
      • Zero-anaphora
      • Penetrate → penetrated
      • Bamboo tip plantain leaf → bamboos and
        plantains
      • Curtain → curtains of her bed
      • Rain sound sigh drop → insistent rustle of the
        rain

6
What is MT not good for?
  • Really hard stuff
      • Literature
      • Natural spoken speech (meetings, court reporting)
  • Really important stuff
      • Medical translation in hospitals, 911

7
What is MT good for?
  • Tasks for which a rough translation is fine
  • Web pages, email
  • Tasks for which MT can be post-edited
  • MT as first pass
  • Computer-aided human translation
  • Tasks in sublanguage domains where high-quality
    MT is possible

8
Sublanguage domain
  • Weather forecasting
      • "Cloudy with a chance of showers today and
        Thursday"
      • "Low tonight 4"
  • Can be modeled completely enough to use raw MT
    output
  • Word classes and semantic features like MONTH,
    PLACE, DIRECTION, TIME POINT

9
MT History
  • 1946: Booth and Weaver discuss MT at the Rockefeller
    Foundation in New York
  • 1947-48: idea of dictionary-based direct
    translation
  • 1949: Weaver memorandum popularized the idea
  • 1952: all 18 MT researchers in the world meet at MIT
  • 1954: IBM/Georgetown demo of Russian-English MT
  • 1955-65: lots of labs take up MT

10
History of MT Pessimism
  • 1959/1960: Bar-Hillel, "Report on the state of MT
    in US and GB"
      • Argued that FAHQT (fully automatic high-quality
        translation) is too hard (semantic ambiguity, etc.)
      • Should work on semi-automatic instead of
        automatic
      • His argument: "Little John was looking for his toy
        box. Finally, he found it. The box was in the
        pen. John was very happy."
      • Only human knowledge lets us know that
        playpens are bigger than boxes, but writing
        pens are smaller
      • His claim: we would have to encode all of human
        knowledge

11
History of MT Pessimism
  • The ALPAC report
      • Headed by John R. Pierce of Bell Labs
  • Conclusions
      • Supply of human translators exceeds demand
      • All the Soviet literature is already being
        translated
      • MT has been a failure: all current MT work had to
        be post-edited
      • Sponsored evaluations which showed that
        intelligibility and informativeness were worse
        than for human translations
  • Results
      • MT research suffered
      • Funding loss
      • Number of research labs declined
      • Association for Machine Translation and
        Computational Linguistics dropped MT from its
        name

12
History of MT
  • 1976: Meteo, weather forecasts from English to
    French
  • Systran (Babelfish) has been used for 40 years
  • 1970s
      • European focus in MT; mainly ignored in US
  • 1980s
      • Ideas of using AI techniques in MT (KBMT, CMU)
  • 1990s
      • Commercial MT systems
      • Statistical MT
      • Speech-to-speech translation

13
Language Similarities and Divergences
  • Some aspects of human language are universal or
    near-universal, others diverge greatly.
  • Typology: the study of systematic
    cross-linguistic similarities and differences
  • What are the dimensions along which human
    languages vary?

14
Morphological Variation
  • Isolating languages
      • Cantonese, Vietnamese: each word generally has
        one morpheme
  • Vs. polysynthetic languages
      • Siberian Yupik (Eskimo): a single word may have
        very many morphemes
  • Agglutinative languages
      • Turkish: morphemes have clean boundaries
  • Vs. fusion languages
      • Russian: a single affix may have many morphemes

15
Syntactic Variation
  • SVO (Subject-Verb-Object) languages
      • English, German, French, Mandarin
  • SOV languages
      • Japanese, Hindi
  • VSO languages
      • Irish, Classical Arabic
  • SVO languages generally have prepositions: to Yuriko
  • SOV languages generally have postpositions: Yuriko ni

16
Segmentation Variation
  • Not every writing system has word boundaries
    marked
      • Chinese, Japanese, Thai, Vietnamese
  • Some languages tend to have sentences that are
    quite long, closer to English paragraphs than
    sentences
      • Modern Standard Arabic, Chinese

17
Inferential Load
  • Some languages require the hearer to do more
    figuring out of who the various actors in the
    various events are
      • Japanese, Chinese
  • Other languages are pretty explicit about saying
    who did what to whom.
      • English

18
Lexical Divergences
  • Word to phrases
      • English "computer science", French
        "informatique"
  • POS divergences
      • Eng. she likes/VERB to sing
      • Ger. Sie singt gerne/ADV
      • Eng. I'm hungry/ADJ
      • Sp. tengo hambre/NOUN

19
Lexical Divergences: Specificity
  • Grammatical constraints
      • English has gender on pronouns, Mandarin does not.
      • So when translating 3rd person from Chinese to
        English, need to figure out gender of the person!
      • Similarly from English "they" to French
        "ils"/"elles"
  • Semantic constraints
      • English "brother"
      • Mandarin "gege" (older) versus "didi" (younger)
      • English "wall"
      • German "Wand" (inside), "Mauer" (outside)
      • German "Berg"
      • English "hill" or "mountain"

20
Lexical Divergence: one-to-many
21
Lexical Divergence: lexical gaps
  • Japanese: no word for "privacy"
  • English: no word for Cantonese "haauseun" or
    Japanese "oyakoko" (something like filial
    piety)
  • English "cow" versus "beef", Cantonese "ngau"

22
Event-to-argument divergences
  • English
      • The bottle floated out.
  • Spanish
      • La botella salió flotando.
      • "The bottle exited floating"
  • Verb-framed languages mark direction of motion on
    the verb
      • Spanish, French, Arabic, Hebrew, Japanese, Tamil,
        Polynesian, Mayan, Bantu families
  • Satellite-framed languages mark direction of motion
    on a satellite
      • Crawl out, float off, jump down, walk over to,
        run after
      • Rest of Indo-European, Hungarian, Finnish, Chinese

23
Structural divergences
  • G: Wir treffen uns am Mittwoch
  • E: We'll meet on Wednesday

24
Head Swapping
  • E: X swim across Y
  • S: X cruzar Y nadando
  • E: I like to eat
  • G: Ich esse gern
  • E: I'd prefer vanilla
  • G: Mir wäre Vanille lieber

25
Thematic divergence
  • S: Y me gusta
  • E: I like Y
  • G: Mir fällt der Termin ein
  • E: I remember the date

26
Divergence counts from Bonnie Dorr
  • 32% of sentences in UN Spanish/English Corpus (5K)

27
MT on the web
  • Babelfish
  • http://babelfish.altavista.com/

28
3 methods for MT
  • Direct
  • Transfer
  • Interlingua

29
Three MT Approaches: Direct, Transfer,
Interlingual
This slide from Bonnie Dorr! Original metaphor
due to Bernard Vauquois
  • [Figure: triangle with Source Text at the lower
    left, Target Text at the lower right, and
    Interlingua at the apex]
  • Source side, going up: Morphological Analysis,
    Syntactic Analysis, Semantic Analysis, Semantic
    Composition
  • Target side, going down: Semantic Decomposition,
    Semantic Generation, Syntactic Generation,
    Morphological Generation
  • Levels and the crossing at each level: Word
    Structure (Direct), Syntactic Structure (Syntactic
    Transfer), Semantic Structure (Semantic Transfer)
30
The Transfer Model
  • Idea: apply contrastive knowledge, i.e.,
    knowledge about the differences between two
    languages
  • Steps
      • Analysis: syntactically parse the Source language
      • Transfer: rules to turn this parse into a parse for
        the Target language
      • Generation: generate the Target sentence from the
        parse tree

31
Transfer architecture
32
English to French
  • Generally
      • English: Adjective Noun
      • French: Noun Adjective
  • Note: not always true
      • Route mauvaise: bad road, badly-paved road
      • Mauvaise route: wrong road
  • But is a reasonable first approximation
  • Rule
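
The three transfer steps can be sketched for the adjective-noun case as follows. This is a minimal sketch, not the rule shown in the lecture: the tuple-based tree, the function names `analyze`, `transfer` and `generate`, and the three-word lexicon are all assumptions made for illustration, and agreement is sidestepped by listing French forms that are already inflected.

```python
# Toy English-to-French transfer for a single noun phrase.
# Tree shape, function names and lexicon are hypothetical illustrations.

# Hypothetical bilingual lexicon; French forms are listed already inflected.
LEXICON = {"the": "la", "green": "verte", "house": "maison"}


def analyze(tagged_words):
    """Analysis: build a flat NP 'parse' from (word, tag) pairs."""
    return ("NP", [(tag, word) for word, tag in tagged_words])


def transfer(tree):
    """Transfer: rewrite English 'Adj N' as French 'N Adj'."""
    label, children = tree
    children = list(children)
    i = 0
    while i < len(children) - 1:
        if children[i][0] == "Adj" and children[i + 1][0] == "N":
            children[i], children[i + 1] = children[i + 1], children[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return (label, children)


def generate(tree):
    """Generation: look up each word and read the leaves left to right."""
    _, children = tree
    return " ".join(LEXICON[word] for _, word in children)


english = [("the", "Det"), ("green", "Adj"), ("house", "N")]
french = generate(transfer(analyze(english)))
print(french)  # la maison verte
```

A phrase such as "mauvaise route", where the adjective precedes the noun, would need a lexical exception that this sketch does not model.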

33
Example: English to Japanese Transfer
34
English to Japanese Transfer
  • From: niwa no teire o suru ojiisan ita
  • Add ga to mark subject
  • Choose verb to agree with subject
  • Inflect verbs
  • Linearize tree
  • Niwa no teire o shite ita ojiisan ga ita
  • Garden GEN upkeep OBJ do PASTPROG old man SUBJ
    was

35
E-to-J Transfer rules used
  • Existential-There-Sentence
      • There1 Verb2 NP3 Postnominal4
        ⇒ (NP → NP3 Relative-Clause4) Verb2
  • NP → NP1 Relative-Clause2
    ⇒ NP → Relative-Clause2 NP1

36
Lexical Transfer
  • Man
      • Ojiisan: old man
  • "Man is the only linguistic animal" →
      • Ningen: man, human being
      • Or
      • Hito: person, persons
  • Can treat like lexical ambiguity
      • Disambiguate during parsing

37
Transfer: some problems
  • N² sets of transfer rules!
  • Grammar and lexicon full of language-specific
    stuff
  • Hard to build, hard to maintain
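
The N² figure comes from counting language pairs. As a rough sketch, assume one rule set per ordered (source, target) pair for transfer, and compare it with a design that needs only one analyzer and one generator per language (the two function names are made up for this example):

```python
def transfer_rule_sets(n_languages):
    """One rule set per ordered (source, target) pair: N * (N - 1)."""
    return n_languages * (n_languages - 1)


def shared_meaning_modules(n_languages):
    """One analyzer plus one generator per language: 2 * N."""
    return 2 * n_languages


for n in (2, 5, 10, 20):
    print(n, transfer_rule_sets(n), shared_meaning_modules(n))
# 2 2 4
# 5 20 10
# 10 90 20
# 20 380 40
```

N(N - 1) grows like N², so the pairwise count overtakes the per-language count as soon as more than three languages are involved.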

38
MT Method 2: Interlingua
  • Intuition: instead of language-to-language knowledge
    rules, use the meaning of the sentence to help
  • Steps
      • 1) Translate source sentence into meaning
        representation
      • 2) Generate target sentence from meaning

39
Interlingua for "there was an old man gardening"
  • EVENT: GARDENING
  • AGENT: MAN
      • NUMBER: SG
      • DEFINITENESS: INDEF
  • ASPECT: PROGRESSIVE
  • TENSE: PAST
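
One way to picture step 2 (generation from meaning) is to store the features above as a nested dictionary and generate English from it. This is only a sketch under assumptions: grouping NUMBER and DEFINITENESS under AGENT, the `HEAD` key, the lookup tables and the function name `generate_english` are all hypothetical.

```python
# Hypothetical encoding of the slide's features as a nested dict.
# Grouping NUMBER and DEFINITENESS under AGENT is an assumption.
MEANING = {
    "EVENT": "GARDENING",
    "AGENT": {"HEAD": "MAN", "NUMBER": "SG", "DEFINITENESS": "INDEF"},
    "ASPECT": "PROGRESSIVE",
    "TENSE": "PAST",
}

# Tiny hand-written English generation tables (illustrative only).
NOUNS = {("MAN", "SG"): "man", ("MAN", "PL"): "men"}
DETERMINERS = {("INDEF", "SG"): "a", ("DEF", "SG"): "the",
               ("INDEF", "PL"): "some", ("DEF", "PL"): "the"}
BE = {("PAST", "SG"): "was", ("PAST", "PL"): "were",
      ("PRESENT", "SG"): "is", ("PRESENT", "PL"): "are"}
PROGRESSIVE_FORMS = {"GARDENING": "gardening"}


def generate_english(meaning):
    """Generate an existential English sentence from the meaning structure."""
    if meaning["ASPECT"] != "PROGRESSIVE":
        raise ValueError("this sketch only handles progressive aspect")
    agent = meaning["AGENT"]
    number = agent["NUMBER"]
    words = [
        "there",
        BE[(meaning["TENSE"], number)],
        DETERMINERS[(agent["DEFINITENESS"], number)],
        NOUNS[(agent["HEAD"], number)],
        PROGRESSIVE_FORMS[meaning["EVENT"]],
    ]
    return " ".join(words)


print(generate_english(MEANING))  # there was a man gardening
```

The features listed on the slide carry nothing for "old", so a generator working only from them cannot recover that word.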

40
Interlingua
  • Idea is that some of the MT work that we need to
    do is part of other NLP tasks
  • E.g., disambiguating E:book → S:libro from E:book →
    S:reservar
  • So we could have concepts like BOOKVOLUME and
    RESERVE and solve this problem once for each
    language

41
Vauquois diagram
42
Direct Translation
  • Idea: more robust, word-specific models
  • Start with a Source language sentence
  • Write little transformations, directly on words,
    to turn it into a Target language sentence.

43
Direct MT: J-to-E
  • Watashihatsukuenouenopenwojonniageta.
  • 1) Morphological analysis
      • Watashi ha tsukue no ue no pen wo jon ni ageru
        PAST
  • 2) Lexical transfer of content words
      • I ha desk no ue no pen wo John ni give PAST
  • 3) Various preposition work
      • I ha pen on desk wo John to give PAST.
  • 4) SVO rearrangements
      • I give PAST pen on desk John to.
  • 5) Miscellany
      • I give PAST the pen on the desk to John.
  • 6) Morphological generation
      • I gave the pen on the desk to John.
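
Stages 2 to 6 can be mimicked with a chain of small word-level rewrites. The sketch below is hand-built for this one sentence and is not a general system: the dictionaries, the function names, and the way each stage is coded are assumptions, and stage 1 (morphological analysis) is not modeled, so its output is taken as the input.

```python
# Toy direct J-to-E translation for the slide's single example sentence.
# Input is the output of stage 1 (morphological analysis), which is not modeled.

CONTENT_WORDS = {"watashi": "I", "tsukue": "desk", "pen": "pen",
                 "jon": "John", "ageru": "give"}
COMMON_NOUNS = {"pen", "desk"}
PAST_FORMS = {"give": "gave"}


def lexical_transfer(tokens):
    """Stage 2: replace content words, leave particles untouched."""
    return [CONTENT_WORDS.get(t, t) for t in tokens]


def preposition_work(tokens):
    """Stage 3: 'X no ue no Y' -> 'Y on X'; particle 'ni' -> 'to'."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i + 1:i + 4] == ["no", "ue", "no"] and i + 4 < len(tokens):
            out += [tokens[i + 4], "on", tokens[i]]
            i += 5
        else:
            out.append("to" if tokens[i] == "ni" else tokens[i])
            i += 1
    return out


def svo_rearrangement(tokens):
    """Stage 4: subject 'ha' object 'wo' rest verb -> subject verb object rest."""
    ha, wo = tokens.index("ha"), tokens.index("wo")
    subject, obj = tokens[:ha], tokens[ha + 1:wo]
    rest, verb = tokens[wo + 1:-2], tokens[-2:]  # verb plus tense marker
    return subject + verb + obj + rest


def miscellany(tokens):
    """Stage 5: add articles; move 'to' in front of the word it follows."""
    out = []
    for t in tokens:
        if t in COMMON_NOUNS:
            out += ["the", t]
        elif t == "to" and out:
            out.insert(len(out) - 1, "to")
        else:
            out.append(t)
    return out


def morphological_generation(tokens):
    """Stage 6: fold the PAST marker into the preceding verb."""
    out = []
    for t in tokens:
        if t == "PAST" and out:
            out[-1] = PAST_FORMS[out[-1]]
        else:
            out.append(t)
    return " ".join(out) + "."


STAGES = [lexical_transfer, preposition_work, svo_rearrangement, miscellany]

tokens = "watashi ha tsukue no ue no pen wo jon ni ageru PAST".split()
for stage in STAGES:
    tokens = stage(tokens)
    print(" ".join(tokens))
print(morphological_generation(tokens))  # I gave the pen on the desk to John.
```

Every rule here is tied to particular words or particles, which is the word-specific style that direct translation relies on.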

44
Direct MT stage 2 (ex. from Panov 1960 via
Hutchins 1986)
  • Function direct-translate-much/many
      • If preceding word is "how": return skolko
      • Else if preceding word is "as": return stolko zhe
      • Else if word is "much"
          • If preceding word is "very": return nil (not
            translated)
          • Else if following word is a noun: return mnogo
      • Else /* word is "many" */
          • If preceding word is PREP and following is NOUN:
            return mnogii
          • Else: return mnogo
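
The decision procedure above translates almost line for line into a function. In this sketch the parameter names and the boolean flags for part of speech are assumptions, the Russian words are plain transliterations, and "stolko zhe" (Russian "stol'ko zhe", "as much/as many") is used for the "as" case. The "much" case with no matching condition is not covered by the rules, so the sketch returns None there.

```python
def direct_translate_much_many(word, preceding=None,
                               preceding_is_prep=False,
                               following_is_noun=False):
    """Return a transliterated Russian rendering of 'much'/'many', or None."""
    if preceding == "how":
        return "skolko"
    if preceding == "as":
        return "stolko zhe"      # Russian "stol'ko zhe"
    if word == "much":
        if preceding == "very":
            return None          # not translated
        if following_is_noun:
            return "mnogo"
        return None              # case not covered by the rules above
    # Otherwise the word is "many".
    if preceding_is_prep and following_is_noun:
        return "mnogii"
    return "mnogo"


print(direct_translate_much_many("many", preceding="how"))
print(direct_translate_much_many("much", preceding="very"))
print(direct_translate_much_many("many", preceding="with",
                                 preceding_is_prep=True,
                                 following_is_noun=True))
```

Each English word gets its own little procedure of this kind, which is why rule proliferation shows up among the drawbacks of the direct approach.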

45
Three MT Approaches: Direct, Transfer,
Interlingual
This slide from Bonnie Dorr! Original metaphor
due to Bernard Vauquois
  • [Figure: triangle with Source Text at the lower
    left, Target Text at the lower right, and
    Interlingua at the apex; the levels in between are
    Word Structure (Direct), Syntactic Structure
    (Syntactic Transfer), and Semantic Structure
    (Semantic Transfer)]
46
3 methods: pros and cons
  • Thanks to Bonnie Dorr!

47
Direct MT: pros and cons (thanks to Bonnie Dorr)
  • Pros
      • Fast
      • Simple
      • Cheap
      • No translation rules hidden in lexicon
  • Cons
      • Unreliable
      • Not powerful
      • Rule proliferation
      • Requires lots of context
      • Major restructuring after lexical substitution

48
Interlingual MT: pros and cons (from B. Dorr)
  • Pros
      • Avoids the N² problem
      • Easier to write rules
  • Cons
      • Semantics is HARD
      • Useful information lost (paraphrase)