Translating Collocations for Bilingual Lexicons - PowerPoint PPT Presentation

Slides: 7
Provided by: Vasile3

Transcript and Presenter's Notes


1
Translating Collocations for Bilingual Lexicons
  • Collocations (idiomatic multi-word expressions)
    are difficult to translate
  • semantically opaque
  • cannot be translated word-by-word
  • a major obstacle to second language acquisition
  • Example: demonstrate support → prouver son
    adhésion (prove adherence)

2
The Champollion approach
  • Input: large parallel corpora
  • Output: list of collocations in each language,
    and equivalence mappings between these
    collocations
  • The method is statistical and language-independent
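
The input and output described above can be pictured with a small data-structure sketch. Everything here is an assumption made for illustration: the names `SentencePair`, `Collocation`, and `CollocationLexicon` are hypothetical, the one-sentence corpus is invented, and the slides do not say how Champollion actually stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical type aliases, not taken from the slides.
Collocation = Tuple[str, ...]                 # a multi-word expression as a word tuple
SentencePair = Tuple[List[str], List[str]]    # (source tokens, target tokens)


@dataclass
class CollocationLexicon:
    """Hypothetical container for the output: collocations per language plus mappings."""
    source: List[Collocation] = field(default_factory=list)
    target: List[Collocation] = field(default_factory=list)
    mappings: Dict[Collocation, Collocation] = field(default_factory=dict)

    def add(self, src: Collocation, tgt: Collocation) -> None:
        """Record both collocations and the equivalence between them."""
        if src not in self.source:
            self.source.append(src)
        if tgt not in self.target:
            self.target.append(tgt)
        self.mappings[src] = tgt


# Input: a parallel corpus, shown here as a single invented sentence pair.
corpus: List[SentencePair] = [
    (["we", "must", "take", "steps"], ["nous", "devons", "prendre", "des", "mesures"]),
]

# Output: one entry, using a pair from the deck's sample translations.
lexicon = CollocationLexicon()
lexicon.add(("take", "steps"), ("prendre", "mesures"))
```

Keeping the mappings keyed by word tuples rather than strings is only a convenience for later lookups; any representation that preserves the pairing would do.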

3
Algorithm
  • Align sentences across corpora
  • Extract collocations from co-occurrence
  • Identify all target-language words that frequently
    co-occur with a source collocation in aligned sentences
  • Iteratively consider and score combinations of
    those words
  • Select best set of words for the translation
  • Determine word order and fill in prepositions
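
The middle steps of the algorithm (finding correlated target words, scoring combinations, selecting the best set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's implementation: the slides do not name a scoring function, so the Dice coefficient is assumed here; the function name `translate_collocation`, the `threshold` and `max_size` parameters, and the toy corpus are all hypothetical. Sentence alignment and collocation extraction are taken as already done, and the final word-ordering step is not covered.

```python
from itertools import combinations


def dice(both, a, b):
    """Dice coefficient: 2 * |A and B| / (|A| + |B|)."""
    return 2.0 * both / (a + b) if (a + b) else 0.0


def translate_collocation(pairs, source, threshold=0.5, max_size=3):
    """pairs: list of (source_tokens, target_tokens) aligned sentences.
    source: tuple of source words forming the collocation.
    Returns (best_word_set, score), or (None, 0.0) if nothing passes."""
    # Aligned pairs whose source side contains every word of the collocation.
    src_hits = {i for i, (s, _) in enumerate(pairs) if set(source) <= set(s)}

    def score(words):
        # Aligned pairs whose target side contains every candidate word.
        tgt_hits = {i for i, (_, t) in enumerate(pairs) if set(words) <= set(t)}
        return dice(len(src_hits & tgt_hits), len(src_hits), len(tgt_hits))

    # Single target words that co-occur often enough with the collocation.
    vocab = {w for i in src_hits for w in pairs[i][1]}
    candidates = sorted(w for w in vocab if score((w,)) >= threshold)

    # Score combinations of growing size; keep the best, preferring
    # the larger word set when two scores are equal.
    best, best_score = None, 0.0
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            s = score(combo)
            if s >= threshold and (s, size) > (best_score, len(best or ())):
                best, best_score = combo, s
    return best, best_score


# Invented toy corpus (assumption, for illustration only).
pairs = [
    (["we", "take", "steps", "now"], ["nous", "prendre", "mesures", "maintenant"]),
    (["they", "take", "steps"], ["ils", "prendre", "mesures"]),
    (["take", "a", "seat"], ["prendre", "place"]),
    (["new", "steps", "ahead"], ["nouvelles", "mesures"]),
    (["we", "agree"], ["nous", "accord"]),
]
result = translate_collocation(pairs, ("take", "steps"), threshold=0.7)
print(result)  # (('mesures', 'prendre'), 1.0)
```

In the toy run, "prendre" and "mesures" each score 0.8 alone but 1.0 together, so the pair is selected. The returned words are in alphabetical order, which is why a separate step is still needed to determine word order and fill in prepositions.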

4
Sample translations
  • additional costs → coûts supplémentaires
  • affirmative action → action positive
  • free trade → libre-échange
  • freer trade → libéralisation échanges
  • take steps → prendre mesures
  • stock market → bourse

5
Evaluation results
  • Corpus of 3.5 million words, collocations
    selected from the same corpus: 78%
  • Corpus of 8.5 million words, collocations
    selected from the same corpus: 74%
  • Corpus of 3.5 million words, collocations
    selected from a different corpus: 65%

6
Conclusion
  • Champollion provides automatic collocation translation
  • Robust
  • Language-independent
  • Requires no tools
  • But: requires parallel corpora