Multilingual Information Retrieval

1
Multilingual Information Retrieval
  • by
  • Jeanine Lilleng
  • IDI, NTNU

2
What is information retrieval?
  • "Information retrieval (IR) deals with the representation, storage,
    organisation of, and access to information items."
  • Baeza-Yates and Ribeiro-Neto, in Modern Information Retrieval

3
Problem
  • The available amount of information is huge and
    ever increasing.
  • We have little experience with handling these
    huge amounts of information.
  • This information must be accessible, to be
    usable.
  • The technology for navigating these huge amounts
    of information is still quite immature.

4
Applications of information retrieval
  • Searching for information on the Internet.
  • Searching for papers and books in digital
    libraries or normal libraries with digital
    inventory.
  • Any search in textual information.

5
Motivation for Multilingual Information Retrieval
  • The existing division of information due to
    language is artificial.
  • Norwegian bilingual research community
  • Still very immature technology.
  • Existing technology should be adapted to be used
    with Norwegian.

6
Multilingual information retrieval
  • IR in information expressed in more than one
    language.
  • IR in multilingual collections.

Collection: Documents or books that have something in common.
Multilingual Collection: A collection with content expressed in more
than one language.

7
MLIR solution
  • Translate query and / or documents. This enables us to use
    traditional IR methods on queries / documents.
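
The idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not a method from the presentation: the word list `NO_TO_EN`, the sample collection `DOCS`, and the helper names `translate_query` and `search` are all made up for the example, and ranking is plain term overlap rather than a real IR model.

```python
from collections import Counter

# Hypothetical Norwegian-to-English word list, for illustration only.
NO_TO_EN = {"hund": "dog", "katt": "cat", "mat": "food"}

# Hypothetical English-only collection.
DOCS = {
    "d1": "dog food reviews",
    "d2": "cat toys and cat food",
    "d3": "train timetable",
}

def translate_query(query, table):
    """Translate word by word; unknown words are passed through unchanged."""
    return [table.get(w, w) for w in query.lower().split()]

def search(terms, docs):
    """Rank documents by how many query term occurrences they contain."""
    scores = {}
    for doc_id, text in docs.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in terms)
        if score > 0:
            scores[doc_id] = score
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

english_terms = translate_query("hund mat", NO_TO_EN)
results = search(english_terms, DOCS)
print(english_terms, results)
```

Once the query is in the language of the collection, `search` needs no knowledge of the source language, which is the point of translating first.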

8
Issues in Multilingual Information Retrieval
  • Takes more time to do the necessary processing.
  • Inaccuracies due to translations can cause
    problems.
  • Methods created for information retrieval are
    mostly language dependent and applicable to only
    one language at a time.

9
Strategies for doing MLIR
  • Machine translation
  • Statistical methods
  • Dictionary / Thesaurus driven methods

Thesaurus (Treasury): An extended dictionary including references
between words and preferred words to be used.

10
Machine Translation
  • Automatically generated translation of query and
    / or document.
  • Based on AI technology
  • + Makes documents in foreign languages available
    to people who do not speak the language
  • - Expensive
  • - Language dependent
  • - Bilingual technology
  • - Cultural differences and ambiguities can
    introduce errors

11
Statistical methods
  • Several probable translations are suggested with
    different probabilities.
  • Uses parallel corpora to mine probable
    translations.
  • + Methods are mostly language independent
  • + Domain independent
  • - Requires parallel corpora
  • - Computationally expensive
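
One way to mine translation probabilities from a parallel corpus is the EM procedure of IBM Model 1. The slides do not name a specific algorithm, so this choice is an assumption, as are the three-sentence corpus `PARALLEL` and the function names `train_model1` and `candidates`. The sketch omits the NULL word used in the full model.

```python
from collections import defaultdict

# Hypothetical sentence-aligned parallel corpus (Norwegian, English).
PARALLEL = [
    ("et hus", "a house"),
    ("et tre", "a tree"),
    ("hus", "house"),
]

def train_model1(pairs, iterations=20):
    """Estimate t(e | f) with IBM Model 1 style EM (no NULL word)."""
    pairs = [(f.split(), e.split()) for f, e in pairs]
    e_vocab = {e for _, es in pairs for e in es}
    # Start from a uniform distribution over target words.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (e, f) links
        total = defaultdict(float)  # expected count of f
        for fs, es in pairs:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    frac = t[(e, f)] / norm
                    count[(e, f)] += frac
                    total[f] += frac
        # Re-estimate probabilities from the expected counts.
        for (e, f), c in count.items():
            t[(e, f)] = c / total[f]
    return t

def candidates(t, source_word):
    """Return (target word, probability) pairs, most probable first."""
    probs = {e: p for (e, f), p in t.items() if f == source_word}
    return sorted(probs.items(), key=lambda kv: -kv[1])

model = train_model1(PARALLEL)
for word in ("hus", "et", "tre"):
    print(word, candidates(model, word))
```

Each source word ends up with several suggested translations and a probability for each, and the nested loops over every sentence pair in every iteration hint at why the approach is computationally expensive on realistic corpora.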

12
Dictionary / Thesaurus driven methods
  • Translation is based on dictionaries and / or
    thesauri.
  • + Computationally inexpensive method
  • + Can capture / represent domain knowledge
  • - Domain and language dependent
  • - Expensive to create dictionaries / thesauri
  • - Ambiguities are introduced when one word has
    several translations.
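
The ambiguity problem can be made concrete with a small sketch. Everything here is an assumption for illustration: the two-entry `DICTIONARY`, the `PREFERRED` table standing in for a thesaurus, and the helper names `expand`, `normalise` and `matches`. The Norwegian word "gift" is used because it can mean both "married" and "poison".

```python
# Hypothetical bilingual dictionary; one word may have several translations.
DICTIONARY = {"gift": ["married", "poison"], "rotte": ["rat"]}

# Hypothetical thesaurus fragment: variant word -> preferred word.
PREFERRED = {"toxin": "poison", "rodent": "rat"}

def expand(query):
    """Map each source word to the set of all its dictionary translations."""
    return [set(DICTIONARY.get(w, [w])) for w in query.lower().split()]

def normalise(text):
    """Replace variant words with the preferred word from the thesaurus."""
    return {PREFERRED.get(w, w) for w in text.lower().split()}

def matches(groups, text):
    """Count source words for which at least one translation occurs in text."""
    words = normalise(text)
    return sum(1 for g in groups if g & words)

groups = expand("gift rotte")
for doc in ("rat poison for sale", "rodent toxin study", "married couples"):
    print(doc, "->", matches(groups, doc))
```

The lookups are cheap, but the document about married couples still matches one query word, which is the kind of noise an ambiguous dictionary entry introduces.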

13
Combination of methods
  • Most current research is based on a combination of
    the above-mentioned strategies.
  • This makes sense because different approaches have
    different shortcomings.
  • Recent results confirm this.
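
One possible combination, sketched under assumptions: a dictionary limits the candidate translations, and corpus-derived probabilities weight them. The slides do not say which combinations are used in practice, and the dictionary entry, the probability values in `CORPUS_PROB`, the `floor` parameter and the function name `weighted_translations` are all hypothetical.

```python
# Hypothetical dictionary candidates.
DICTIONARY = {"gift": ["married", "poison"]}

# Hypothetical probabilities, as if mined from a parallel corpus.
CORPUS_PROB = {
    ("gift", "poison"): 0.7,
    ("gift", "married"): 0.2,
    ("gift", "gift"): 0.1,  # noise that the dictionary will filter out
}

def weighted_translations(word, floor=0.05):
    """Keep only dictionary translations, weight them by corpus
    probability (at least `floor`), and renormalise to sum to 1."""
    cands = DICTIONARY.get(word, [word])
    raw = {c: max(CORPUS_PROB.get((word, c), 0.0), floor) for c in cands}
    total = sum(raw.values())
    return {c: p / total for c, p in raw.items()}

weights = weighted_translations("gift")
print(weights)
```

The dictionary removes the spurious candidate that the statistics alone would keep, while the statistics rank the senses that the dictionary alone would treat as equal.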

14
Aims in multilingual information retrieval
  • Adapt information retrieval techniques to
    multilingual information retrieval.
  • Create new methods, developed for multilingual
    information retrieval.

Simplify the searching process. Create new ways
to manage the ever-increasing information
overload.

15
My Thesis
  • Research on MLIR
  • Seen from a Norwegian perspective: bilingual
    Norwegians.
  • Experiment with combination of approaches.
  • Preferably low cost, language independent
    methods.