Title: Translation Selection Using Bilingual Lexicon and Monolingual Corpus
2. Introduction
- In bilingual dictionary
- a word may have many senses, each sense may have
multiple target words
Example entry:
- English word: kill
- English glosses: deprive of life; put an end to
- Malay target words: membunuh, mengorbankan, memusnahkan, menghapuskan
3. Introduction (cont.)
- In machine translation
- translation selection is a process that selects
an appropriate target word corresponding to a
word in a source language (Lee et al., 2002)
Example: to kill someone
- memusnahkan seseorang (X)
- menghapuskan seseorang (X)
- mengorbankan seseorang (?)
- membunuh seseorang (O)
4. Introduction (cont.)
- Bilingual dictionary
- essential for translation selection
- usually contains less descriptive definitions than a monolingual dictionary
5. Objective
- to build an English-Malay translation selection system
- select an appropriate Malay word corresponding to an English word in an English sentence
6. Previous Work
- Lee and Kim, 2002
- Translation selection
- Source word sense disambiguation
- Target word selection
- Advantage
- Reduces complexity
- considers only the target words for each word sense
- Weakness
- uses only a simple method of word sense disambiguation
7. Proposed System Design
Figure 1. Proposed System Design
- Knowledge sources: Target Language Corpus; Bilingual Lexicon; Lexical Conceptual Distance Data (LCDD)
- Data labels: Sense Definition, Example Sentence; Sense, Target Word Equivalent; Target Word Co-occurrence
- Processing steps: Input sentence; Source word sense disambiguation; Target word selection; Word-level translation
8. Knowledge Source
Figure 2. Bilingual lexicon enriched by monolingual lexicons
- English-Malay Lexicon: Kamus Inggeris Melayu Dewan (KIMD)
- English Lexicons: Longman Dictionary of Contemporary English (LDOCE); WordNet
- Diagram labels: gloss, example; definition
9. Knowledge Source (cont.)
- Target language corpus
- Malay corpus
- provide target word co-occurrence
- a pair of words which co-occur within a predefined window (e.g. sentence)
- e.g. (melawan, pertandingan), (mengajar, sekolah)
- Lexical Conceptual Distance Data (LCDD)
- measurement of relatedness between two word senses
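
The target word co-occurrence described above can be sketched in a few lines of Python. This is only an illustration, assuming whitespace tokenization and one sentence per window; the function name `cooccurrence_counts` and the toy sentences are hypothetical and are not taken from the Malay corpus used in the slides.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count unordered word pairs that appear in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        # Assumption: lowercase, whitespace tokenization; one sentence is one window.
        words = sorted(set(sentence.lower().split()))
        # Each pair is stored in alphabetical order, so (a, b) and (b, a) are one key.
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Hypothetical toy corpus for illustration only.
corpus = [
    "dia mengajar di sekolah",
    "guru itu mengajar di sekolah rendah",
    "mereka melawan dalam pertandingan",
]
counts = cooccurrence_counts(corpus)
print(counts[("mengajar", "sekolah")])
print(counts[("melawan", "pertandingan")])
```

Storing each pair in sorted order keeps the table symmetric; a real system would likely also filter function words such as "di" before counting.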
10. Source Word Sense Disambiguation
- Methods
- part-of-speech tagging
- use word sense definitions and LCDD
- calculate relatedness between definitions
- the sense with the most related definition is the
preferred sense
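
A minimal sketch of the definition-based step is shown below. The slides do not say how LCDD scores relatedness, so a plain word-overlap count (Lesk-style) is swapped in as a stand-in; the names `relatedness` and `disambiguate`, the sense labels, and the context definition for "someone" are all hypothetical.

```python
def relatedness(definition_a, definition_b):
    """Stand-in for an LCDD lookup: number of words the two definitions share."""
    words_a = set(definition_a.lower().split())
    words_b = set(definition_b.lower().split())
    return len(words_a & words_b)


def disambiguate(sense_definitions, context_definitions):
    """Return the sense whose definition is most related to the context definitions."""
    def score(sense):
        return sum(
            relatedness(sense_definitions[sense], ctx) for ctx in context_definitions
        )
    return max(sense_definitions, key=score)


# Glosses for "kill" come from the slides; the sense labels are hypothetical.
senses = {
    "kill_1": "deprive of life",
    "kill_2": "put an end to",
}
# Hypothetical definition of the context word "someone".
context = ["a human being who has life"]
best = disambiguate(senses, context)
print(best)
```

Word overlap is sensitive to function words such as "of", so a practical version would remove stop words or replace `relatedness` with a real conceptual distance measure.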
11. Target Word Selection
- Method
- use target word co-occurrence
- calculate frequency
- count how often a target word co-occurs in the corpus with translations of the other words in the input sentence
- as frequency increases, probability increases
- apply distance factor
- if distance between two words in co-occurrence
increases, probability decreases
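
The two steps above (frequency, then a distance factor) might be combined as in the following sketch. The slides do not give a formula, so the weight 1/d for a pair of words d tokens apart is an assumption, as are the function name `select_target_word` and the toy corpus.

```python
def select_target_word(candidates, context_translations, corpus_sentences):
    """Pick the candidate with the highest distance-weighted co-occurrence score."""
    scores = {candidate: 0.0 for candidate in candidates}
    for sentence in corpus_sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token not in scores:
                continue
            for j, other in enumerate(tokens):
                if i != j and other in context_translations:
                    # Assumed distance factor: nearer pairs contribute more (1/d).
                    scores[token] += 1.0 / abs(i - j)
    best = max(scores, key=scores.get)
    return best, scores


# Candidates for "kill" come from the slides; the corpus is a hypothetical toy example.
candidates = ["membunuh", "mengorbankan", "memusnahkan", "menghapuskan"]
context_translations = {"seseorang"}  # translation of "someone"
corpus = [
    "dia membunuh seseorang semalam",
    "mereka cuba membunuh seseorang",
    "kerajaan menghapuskan cukai itu",
    "dia mengorbankan masa untuk seseorang",
]
best, scores = select_target_word(candidates, context_translations, corpus)
print(best, scores)
```

Here "membunuh" appears next to "seseorang" twice, while "mengorbankan" appears once at a distance of three tokens, so the nearer and more frequent pairing scores higher.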
12. Thank You