Translation Enhancement: a New Relevance Feedback Method for Cross-Language Information Access

1
Translation Enhancement: a New Relevance
Feedback Method for Cross-Language Information
Access
Daqing He, University of Pittsburgh, dah44@pitt.edu
Dan Wu, Wuhan University, woodan@whu.edu.cn
2
Outline
  • Motivations
  • Translation Enhancement
  • Experiments and Results
  • Conclusions

3
Query Translation Based CLIR in TREC-like
Environments
[Diagram: query translation]
4
Usages of RF Information
  • Query expansion (QE) methods perform QE before
    query translation (pre-translation QE) and/or
    after query translation (post-translation QE)
  • Post-translation QE, or the combination of the
    two, performed best [Ballesteros & Croft 97;
    McNamee & Mayfield 02]

[Diagram: Query; Pre-translation Query Expansion;
Query Translation; Post-translation Query
Expansion; Search on Target Language Collection;
Relevance Feedback; Search on a Source Language
Collection]
5
Translations in Query Translation Based CLIR
[Diagram: query translation; result translation]
6
What Can We Obtain from RF?
[Diagram: query translation (f1, f2, …, fn);
result translation (e1, e2, …, em);
translation pairs e1 ↔ f1, …, en ↔ fn]
7
Usages of RF Information - II
  • Query expansion (QE) methods expand the
    pre-translation and/or post-translation queries
  • Translation enhancement (TE) improves the query
    translation resources using the relevant
    translation relationships obtained from RF
[Diagram: Query; Pre-translation Query Expansion;
Query Translation; Post-translation Query
Expansion; Search on Target Language Collection;
Relevance Feedback; Search on a Source Language
Collection]
8
Benefits
  • Applying extracted translation relationships back
    to query translation can
    – make query translation and result document
      translation consistent with each other
    – help future pre-translation query expansion
    – tailor the query translation resources toward
      the user's current search
  • TE can introduce new translation alternatives
    – potentially resolving some out-of-vocabulary
      (OOV) terms
  • TE does not replace QE
    – they work at different steps of RF in CLIA
    – TE can help pre-translation QE
    – maybe they can be combined?

9
Contributions of This Work
  • Many related works apply extracted translation
    relationships to improve CLIR effectiveness
    – Nie, Simard, Isabelle and Durand 99 used
      web-mined parallel texts for CLIR
    – Xu, Weischedel and Nguyen 01 estimate
      translation probabilities from a parallel
      corpus
    – Lavrenko, Choquette and Croft 02 describe a
      cross-lingual relevance model that uses a
      parallel corpus as one translation resource
  • Our contributions:
    – Studying methods for extracting translation
      relationships
    – Using translation relationships extracted from
      pairs of relevant returned documents and their
      translations to enhance query translation
      directly
    – Exploring the combination of TE and QE

10
Research Questions on TE
  • How to obtain relevant translation relationships?
  • How to enhance query translation with the
    relevant translation relationships?
  • Does it make sense to integrate TE with other RF
    methods?

11
Obtain Translation Relationships
  • Borrow ideas from mining parallel corpora
    – Establish alignment at some level between the
      relevant docs and their translations
    – Best: word alignment level
    – Minimum: sentence alignment level
  • When word alignment is available
    – Translations based on Word Alignment (TWA):
      train GIZA++ to obtain a word alignment model,
      and take the word alignments from the model
  • When only sentence alignment is available
    – Keep All Translations (KAT): keep all the
      translation relationships of the query terms
      identified in the sentence pairs of the
      relevant docs
    – Keep One Best Translation (K1T): based on KAT,
      but keep only the translation with the highest
      translation probability in the dictionary
    – Keep Most Frequent Translation (KFT): based on
      KAT, but keep only the translation with the
      highest frequency in the relevant docs
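The TWA extraction step can be sketched roughly as below. This is a minimal sketch, not the authors' code: it assumes the word alignment has already been produced (for example by GIZA++) and parsed into lists of `(source_index, target_index)` links per sentence pair. That input layout, the name `twa_extract`, and the pinyin placeholder tokens are hypothetical. It counts, per relevant document, which target words each query term is aligned to.

```python
from collections import Counter, defaultdict


def twa_extract(query_terms, aligned_docs):
    """Count aligned translations of query terms in relevant documents.

    aligned_docs (hypothetical layout):
        {doc_id: [(src_tokens, tgt_tokens, links), ...]}
    where links is a list of (src_index, tgt_index) word-alignment pairs.
    Returns {query_term: {translation: {doc_id: count}}}.
    """
    query = set(query_terms)
    found = defaultdict(lambda: defaultdict(Counter))
    for doc_id, sentences in aligned_docs.items():
        for src, tgt, links in sentences:
            for i, j in links:
                # Keep only links whose source side is a query term.
                if src[i] in query:
                    found[src[i]][tgt[j]][doc_id] += 1
    # Convert the nested defaultdicts/Counters into plain dicts.
    return {t: {f: dict(c) for f, c in trans.items()} for t, trans in found.items()}


docs = {
    "D1": [(["the", "bank", "raised", "rates"], ["yinhang", "tigao", "lilv"],
            [(1, 0), (2, 1), (3, 2)])],
    "D2": [(["bank", "rates"], ["yinhang", "lilv"], [(0, 0), (1, 1)])],
}
print(twa_extract(["bank", "rates"], docs))
# {'bank': {'yinhang': {'D1': 1, 'D2': 1}}, 'rates': {'lilv': {'D1': 1, 'D2': 1}}}
```

Because no dictionary lookup is involved, a query term missing from the bilingual lexicon can still pick up a translation this way, which is the property the OOV results on slide 23 rely on.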

12
Obtain Translation Relationships without Word
Alignment
Dictionary: E1 → F11, …, F1m1; E2 → F21, …, F2m2;
  …; En → Fn1, …, Fnmn
Query: E1, E2, …, En
D1 (English side): E1 E2 E2 E1 E1 E2 E1 E1
D1 (Chinese side): F11 F21 F22 F11 F12 F22 F11 F11
Extracted from D1 (document, frequency):
  E1 → F11 (D1, 4); E1 → F12 (D1, 1);
  E2 → F21 (D1, 1); E2 → F22 (D1, 2)
Stemming and a back-off strategy are used to find
more instances of the query terms and their
translations inside the relevant docs and their
translated docs
KAT
  E1 → F11 (D1, 4); E1 → F12 (D1, 1);
  E2 → F21 (D1, 1); E2 → F22 (D1, 2);
  E1 → F11 (D2, 4); …
K1T
  E1 → F11 (D1, 4); E2 → F21 (D1, 1)
KFT
  E1 → F11 (D1, 4); E2 → F22 (D1, 2)
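The three sentence-alignment strategies can be sketched as follows, reproducing the D1 part of the toy example above. Assumptions: the dictionary probabilities are made up (chosen so that F21 outranks F22, as the K1T line implies), the split of D1 into three sentence pairs is hypothetical, and stemming and back-off are omitted. A dictionary translation is counted once for each time it appears in a target sentence whose aligned source sentence contains the query term.

```python
from collections import Counter, defaultdict

# Hypothetical dictionary probabilities: term -> {translation: P_dict}.
DICTIONARY = {
    "E1": {"F11": 0.6, "F12": 0.4},
    "E2": {"F21": 0.7, "F22": 0.3},
}


def kat(query_terms, relevant_docs, dictionary):
    """Keep All Translations.

    relevant_docs: {doc_id: [(src_tokens, tgt_tokens), ...]} sentence pairs.
    Returns {term: {translation: {doc_id: count}}}.
    """
    found = defaultdict(lambda: defaultdict(Counter))
    for doc_id, pairs in relevant_docs.items():
        for src, tgt in pairs:
            src_set = set(src)
            for term in query_terms:
                if term not in src_set:
                    continue
                candidates = dictionary.get(term, {})
                # Count every dictionary translation of `term` in the target sentence.
                for token in tgt:
                    if token in candidates:
                        found[term][token][doc_id] += 1
    return {t: {f: dict(c) for f, c in tr.items()} for t, tr in found.items()}


def k1t(kat_result, dictionary):
    """Keep One Best Translation: highest dictionary probability per term."""
    out = {}
    for term, trans in kat_result.items():
        best = max(trans, key=lambda f: dictionary[term][f])
        out[term] = {best: trans[best]}
    return out


def kft(kat_result):
    """Keep Most Frequent Translation: highest total extracted frequency per term."""
    out = {}
    for term, trans in kat_result.items():
        best = max(trans, key=lambda f: sum(trans[f].values()))
        out[term] = {best: trans[best]}
    return out


# D1 from the example, split into three hypothetical sentence pairs.
D1 = [(["E1", "E2"], ["F11", "F21"]),
      (["E2", "E1", "E1"], ["F22", "F11", "F12"]),
      (["E2", "E1", "E1"], ["F22", "F11", "F11"])]
all_trans = kat(["E1", "E2"], {"D1": D1}, DICTIONARY)
print(k1t(all_trans, DICTIONARY))  # keeps E1 -> F11 and E2 -> F21
print(kft(all_trans))              # keeps E1 -> F11 and E2 -> F22
```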
13
Convert Extracted Relationships into Translation
Probability
  • P_{i,j}(j is trans of i | j is in Rel): the
    probability that translation alternative j is
    the translation of term i, given that j occurs
    in the relevant document set
  • tf_{j,k}: the frequency with which j is extracted
    as the translation of i from relevant document k
  • n: the number of relevant documents
  • m_i: the number of translation alternatives of
    term i
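The slide's formula itself did not survive extraction. One reading consistent with the definitions above, stated here as an assumption, is a normalized frequency: P_{i,j} = Σ_{k=1..n} tf_{j,k} / Σ_{l=1..m_i} Σ_{k=1..n} tf_{l,k}, that is, the number of times j was extracted for term i across the n relevant documents, divided by the same count summed over all m_i alternatives. A minimal sketch, with the hypothetical function name `rel_translation_probs`:

```python
def rel_translation_probs(counts):
    """Assumed estimate of P(j is a translation of i | j in Rel) for one term i.

    counts: {translation_j: {doc_k: tf_jk}}, e.g. one term's entry of a KAT/TWA result.
    Returns {translation_j: probability}; probabilities sum to 1 when any count > 0.
    """
    # Numerator: sum over the n relevant documents of tf_{j,k}.
    totals = {j: sum(per_doc.values()) for j, per_doc in counts.items()}
    # Denominator: the same sum taken over all m_i alternatives of term i.
    z = sum(totals.values())
    return {j: t / z for j, t in totals.items()} if z else {}


print(rel_translation_probs({"F11": {"D1": 4}, "F12": {"D1": 1}}))
# {'F11': 0.8, 'F12': 0.2}
```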

14
Enhanced Translation Probability
  • Combine the translation probabilities obtained
    from relevant document set with that in the
    original dictionary
  • A weighting parameter adjusts the relative weight
    of the translation probabilities from the
    relevant document set and from the general
    dictionary
  • Normalization
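The combination step can be sketched as a linear interpolation followed by renormalization. This is an assumption, since the slide's equation was lost, and the parameter name `alpha` is hypothetical. Alternatives that appear only in the relevant documents (for example TWA translations of OOV terms) enter with weight `alpha * P_rel`.

```python
def enhance(p_dict, p_rel, alpha):
    """Assumed combination: alpha * P_rel + (1 - alpha) * P_dict, then normalized.

    p_dict: {translation: probability} from the general bilingual dictionary.
    p_rel:  {translation: probability} estimated from relevant documents.
    alpha:  hypothetical weight in [0, 1] given to the relevant-document estimate.
    """
    # Union keeps new translation alternatives introduced by TE.
    candidates = set(p_dict) | set(p_rel)
    mixed = {j: alpha * p_rel.get(j, 0.0) + (1 - alpha) * p_dict.get(j, 0.0)
             for j in candidates}
    z = sum(mixed.values())
    # Normalize so the term's translation probabilities sum to 1.
    return {j: v / z for j, v in mixed.items()} if z > 0 else mixed


print(enhance({"F11": 0.6, "F12": 0.4}, {"F11": 0.8, "F12": 0.2}, alpha=0.5))
```

Setting `alpha = 0` falls back to the original dictionary and `alpha = 1` trusts only the relevant documents; the value in between would have to be tuned.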

15
Experiment Goals and Objectives
  • Is translation enhancement an effective RF
    method?
    – Test whether TE methods can improve CLIA in
      blind RF
  • Can translation enhancement be combined with
    other RF methods?
    – Test whether combining TE with query expansion
      can improve CLIA in blind RF
  • Is translation enhancement effective in a real
    interactive search environment?
    – Test whether TE can improve CLIA in interactive
      RF (not discussed in this talk)

16
Experiment Resources
  • English-to-Chinese CLIR
    – English queries and Chinese documents
  • Preprocessing tools
    – Stanford Chinese segmentation tool for Chinese
      documents
    – Porter stemmer for English queries and
      documents
    – An English and a Chinese stop word list
  • Collections
    – TDT4 and TDT5 Chinese collection (83,627
      documents)
    – TDT4 and TDT5 English MT collection (83,627
      documents)
    – TDT4 and TDT5 English collection (306,498
      documents)
  • Translation resources
    – An English-Chinese bilingual lexicon with
      translation probabilities obtained from a large
      parallel corpus [Wang & Oard 06]
    – GIZA++ machine translation toolkit
  • Indri 2.4 search engine
  • Evaluation metric (TREC evaluation)
    – MAP: Mean Average Precision
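For reference, MAP as used in TREC-style evaluation (non-interpolated average precision per topic, averaged over topics) can be sketched as below; in practice a tool such as trec_eval computes it. The function names are illustrative.

```python
def average_precision(ranked_docs, relevant):
    """Non-interpolated AP: sum of precision@rank at each relevant hit,
    divided by the total number of relevant documents (so relevant
    documents that are never retrieved contribute 0)."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: {topic_id: (ranked_docs, set_of_relevant_docs)}."""
    return sum(average_precision(r, rel) for r, rel in runs.values()) / len(runs)


runs = {
    "T1": (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    "T2": (["d2", "d1"], {"d1"}),                    # AP = 1/2
}
print(round(mean_average_precision(runs), 4))  # 0.6667
```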

17
Query Types
  • Topics
    – 44 TDT4 and TDT5 English topics converted into
      TREC format
    – All topics manually translated into Chinese
  • Queries (TREC format)
    – Title (short T queries)
    – Title + Description (medium TD queries)
    – Title + Description + Narrative (long TDN
      queries)

18
Baselines
  • Monolingual baseline
    – Chinese queries searched on the Chinese
      collection
  • Lower cross-language baseline
    – English queries searched across languages on
      the Chinese collection, without any performance
      enhancement technique
    – The cumulative probability threshold (CPT) was
      varied from 0.0 to 1.0 in steps of 0.1; the
      setting with the best MAP is reported
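One common reading of a cumulative probability threshold, given here as an assumption rather than the authors' exact procedure, is: sort a query term's dictionary translations by probability and keep them until their cumulative probability reaches the threshold. Under that reading, CPT = 0.0 keeps only the most probable translation and CPT = 1.0 keeps all of them. The sketch leaves the kept probabilities unnormalized; `select_by_cpt` and the lexicon entry are hypothetical.

```python
def select_by_cpt(translations, threshold):
    """Keep the most probable translations until their cumulative
    probability reaches `threshold`. The first translation is always kept."""
    ranked = sorted(translations.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = {}, 0.0
    for translation, prob in ranked:
        kept[translation] = prob
        cumulative += prob
        if cumulative >= threshold:
            break
    return kept


# Sweep used for the lower baseline: 0.0, 0.1, ..., 1.0
thresholds = [round(0.1 * k, 1) for k in range(11)]

lexicon_entry = {"F11": 0.5, "F12": 0.3, "F13": 0.2}  # hypothetical probabilities
for cpt in (0.0, 0.7, 1.0):
    print(cpt, select_by_cpt(lexicon_entry, cpt))
```

Each threshold in the sweep would produce one run; the run with the best MAP is the one reported.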

19
Baselines - II
  • Higher cross-lingual baseline
    – Same as the lower CL baseline, but with query
      expansion
    – Uses Indri's default pseudo-RF mechanism
    – The top 20 documents of the ranked result list
      are used
    – The top 20 terms are added to the query
    – The relative weight between the original query
      and the expansion terms is tuned for each QE
      method
    – Three QE variants: pre-translation,
      post-translation, and combined pre- and
      post-translation query expansion
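In Indri, pseudo-relevance feedback is controlled by parameters such as `fbDocs`, `fbTerms` and `fbOrigWeight` (here 20 documents and 20 terms). As far as we know, the expanded query interpolates the original query with the weighted expansion terms using the `#weight` and `#combine` operators. The Python below only assembles a query string of that shape, to make the "relative weight" concrete. It does not select expansion terms, and the function name, example terms, and weights are hypothetical.

```python
def expanded_indri_query(original_terms, expansion_terms, orig_weight):
    """Build an Indri-style query interpolating the original query with
    weighted expansion terms.

    expansion_terms: list of (term, weight) pairs, e.g. the top 20 PRF terms.
    orig_weight: relative weight of the original query (tuned per QE method).
    """
    original = "#combine( " + " ".join(original_terms) + " )"
    expansion = "#weight( " + " ".join(f"{w:g} {t}" for t, w in expansion_terms) + " )"
    return f"#weight( {orig_weight:g} {original} {1 - orig_weight:g} {expansion} )"


print(expanded_indri_query(["bank", "rate"], [("loan", 0.5), ("interest", 0.25)], 0.6))
# #weight( 0.6 #combine( bank rate ) 0.4 #weight( 0.5 loan 0.25 interest ) )
```

For post-translation QE the same shape would apply to the translated (Chinese) query; for pre-translation QE it applies to the English query before translation.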

20
TE Methods vs Baselines
  • All four TE methods performed better than the
    lower CL baseline
    – TWA improved the most, KAT the least
    – TWA improved significantly for all three query
      types
    – KFT improved significantly for T and TD queries
  • But only TWA achieved 93% of the monolingual
    baseline at TDN

21
TE Methods vs QE Methods
  • Pre-QE performed the worst among the QE methods
  • All TE methods are at least comparable to the
    best QE method
  • TWA outperforms the best QE method at TD and
    TDN
    – significantly at TDN

22
TE and QE Combination
  • Combining TWA and post-QE
    – is comparable to state-of-the-art CLIR
      performance
    – is significantly better than the single runs
      for almost all query types

Significance levels: p < 0.01; 0.01 < p < 0.05
23
TE in Resolving OOV Terms
  • Through word alignment, some OOV terms can be
    resolved with high-quality translations
  • 11 OOV terms found their translations through
    TWA; only 2 of them are wrong (marked in the
    table)

24
Conclusion
  • Translation enhancement can improve CLIA in
    pseudo RF
  • Translation enhancement performs better in
    processes where humans are involved (discussed
    in the paper)
  • Translation enhancement can be combined with QE
    – TE and QE work on different parts of the RF
      process
    – Their combination significantly improves CLIR
      performance
  • Translation enhancement can help resolve
    out-of-vocabulary terms in query translation
    – The quality of the resolved OOV translations is
      reasonably high
  • Future work
    – Extract translation relationships from
      statistical MT output, with no word alignment
      needed
    – Better integration of TE and QE
    – Interactive translation enhancement

25
Thank you!