Evaluation Framework for Information Retrieval in Indian Languages - PowerPoint PPT Presentation

Provided by: tdilM


1
  • Evaluation Framework for Information
    Retrieval in Indian Languages
  • Proposed by: Mandar Mitra
  • Institution: Indian Statistical Institute, Kolkata
  • Language/language pair: Bengali, Hindi, English
  • Lexical resources to be built: benchmark data
    for cross-language IR (CLIR) evaluation

2
Objective of the Project
  • TREC, CLEF and NTCIR have had an enormous
    positive impact on IR research
  • Goal: a similar effort for Indian languages
  • Motivation: there is a lack of large-scale,
    standardized benchmark datasets for Indian
    language IR (ILIR)

3
IR Corpus
  • An IR corpus consists of documents, a set of
    sample queries, and relevance judgments
  • Relevance judgments will be obtained with the
    pooling method used at TREC
  • different indexing units, e.g., n-grams, single
    words, and multi-word phrases
  • various term-weighting schemes, e.g., BM25 and
    pivoted normalization
  • different retrieval models, e.g., vector space,
    language modeling, cover density ranking
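The pooling step can be illustrated with a minimal sketch of depth-k pooling, assuming each run is a ranked list of document IDs per topic. The function name build_pool, the run names, and the pool depth below are hypothetical and not taken from the proposal; only documents that land in a topic's pool would be sent to assessors for judgment.

```python
from collections import defaultdict


def build_pool(runs: dict[str, dict[str, list[str]]], depth: int) -> dict[str, set[str]]:
    """Hypothetical helper for depth-k pooling: for each topic, take the
    union of the top `depth` documents retrieved by every run."""
    pool: dict[str, set[str]] = defaultdict(set)
    for ranking_by_topic in runs.values():
        for topic, ranked_docs in ranking_by_topic.items():
            pool[topic].update(ranked_docs[:depth])
    return dict(pool)


if __name__ == "__main__":
    # Hypothetical runs from systems that differ in indexing units,
    # term-weighting schemes, and retrieval models.
    runs = {
        "ngram_bm25": {"T1": ["d3", "d1", "d7", "d9"]},
        "word_pivoted": {"T1": ["d1", "d4", "d3", "d2"]},
        "phrase_lm": {"T1": ["d5", "d3", "d8", "d1"]},
    }
    pool = build_pool(runs, depth=2)
    print(sorted(pool["T1"]))  # ['d1', 'd3', 'd4', 'd5']
```

Because the pool is a union, a document ranked highly by only one system (d4 or d5 above) still gets judged, which is one reason varied systems are useful when building the pool.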

4
Lexical Resource
  • Language/language pair: Bengali, Hindi
  • Lexical resource: news corpora for Hindi
    and Bengali
  • Final size of the lexical resource: 50,000
    documents per language
  • Typical size of such a resource in other
    languages: 16,000 to 100,000 documents

5
Lexical Resource
  • Language/language pair: Bengali, Hindi
  • Lexical resource: topics (test queries) and
    corresponding relevance judgments
  • Final size of the lexical resource: 50 topics
    per language
  • Typical size of such a resource in other
    languages: 50 topics

6
Lexical Resource
  • Language/language pair: Bengali, Hindi
  • Lexical resource: questions, along with
    corresponding answers, for evaluating
    question-answering systems
  • Final size of the lexical resource: 50 questions
    per language
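Question/answer pairs like these are often used to score systems with mean reciprocal rank (MRR). The sketch below assumes each system returns a ranked list of answer strings per question and that an answer counts as correct if it exactly matches a reference answer after lowercasing and whitespace collapsing; the matching rule, the cutoff of 5, and the function names are hypothetical choices, not the project's specification.

```python
def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def mean_reciprocal_rank(
    system_answers: dict[str, list[str]],
    gold_answers: dict[str, set[str]],
    max_rank: int = 5,
) -> float:
    """Average over gold questions of 1/r, where r is the rank of the first
    correct answer within the top `max_rank`; a question with no correct
    answer in that range contributes 0."""
    if not gold_answers:
        return 0.0
    total = 0.0
    for qid, gold in gold_answers.items():
        gold_norm = {normalize(g) for g in gold}
        top = system_answers.get(qid, [])[:max_rank]
        for rank, answer in enumerate(top, start=1):
            if normalize(answer) in gold_norm:
                total += 1.0 / rank
                break  # only the first correct answer counts
    return total / len(gold_answers)


if __name__ == "__main__":
    gold = {"q1": {"Kolkata"}, "q2": {"Rabindranath Tagore"}}
    system = {"q1": ["Delhi", "kolkata"], "q2": ["Tagore"]}
    print(mean_reciprocal_rank(system, gold))  # 0.25
```

Exact matching is strict (the partial answer "Tagore" earns nothing here); a real evaluation would need an agreed answer-matching policy, which the slide does not specify.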

7
Lexical Resource
  • Language/language pair: Bengali, Hindi
  • Lexical resource: documents manually tagged
    with named entities
  • Final size of the lexical resource: 1,000
    documents per language
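A manually tagged collection of this kind is typically used to score named-entity taggers by precision, recall, and F1. The sketch assumes each entity is a (start, end, type) tuple of token offsets and uses exact-match scoring, where a prediction counts only if both span and type agree; this is one common convention among several, and the representation and the function name span_prf are hypothetical.

```python
# Hypothetical representation: (start token, end token exclusive, entity type).
Entity = tuple[int, int, str]


def span_prf(gold: set[Entity], predicted: set[Entity]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = {(0, 2, "PER"), (5, 6, "LOC"), (8, 9, "ORG")}
    # (5, 6, "ORG") has the right span but the wrong type, so it is not a match.
    predicted = {(0, 2, "PER"), (5, 6, "ORG"), (8, 9, "ORG"), (10, 11, "LOC")}
    p, r, f = span_prf(gold, predicted)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```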

8
Lexical Resource
  • Language/language pair: Bengali, Hindi
  • Lexical resource: documents manually
    categorized following the conventions of the
    Reuters-21578 corpus
  • Final size of the lexical resource: 1,500
    documents per language
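Reuters-21578 allows a document to carry several category labels, so a categorization test set built on its conventions is commonly scored with micro- and macro-averaged F1. The sketch below assumes gold and predicted labels are given as sets of category names per document ID; the function name micro_macro_f1 and the sample categories are illustrative only.

```python
def micro_macro_f1(
    gold: dict[str, set[str]], predicted: dict[str, set[str]]
) -> tuple[float, float]:
    """Hypothetical helper: return (micro-F1, macro-F1) for multi-label
    categorization."""
    categories = set().union(*gold.values(), *predicted.values())
    # Per-category counts: [true positives, false positives, false negatives].
    counts = {c: [0, 0, 0] for c in categories}
    for doc_id in gold.keys() | predicted.keys():
        g = gold.get(doc_id, set())
        p = predicted.get(doc_id, set())
        for c in g & p:
            counts[c][0] += 1
        for c in p - g:
            counts[c][1] += 1
        for c in g - p:
            counts[c][2] += 1

    def f1(tp: int, fp: int, fn: int) -> float:
        # F1 = 2TP / (2TP + FP + FN), defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    if not counts:
        return 0.0, 0.0
    # Micro: pool counts across categories; macro: average per-category F1.
    micro = f1(*(sum(col) for col in zip(*counts.values())))
    macro = sum(f1(*v) for v in counts.values()) / len(counts)
    return micro, macro


if __name__ == "__main__":
    gold = {"d1": {"earn"}, "d2": {"acq", "trade"}, "d3": {"grain"}}
    predicted = {"d1": {"earn"}, "d2": {"acq"}, "d3": {"trade"}}
    micro, macro = micro_macro_f1(gold, predicted)
    print(f"{micro:.3f} {macro:.3f}")  # 0.571 0.500
```

Micro-averaging is dominated by frequent categories, while macro-averaging weights every category equally, so reporting both is a common choice.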

9
Lexical Resource
  • Language/language pair: Bengali, Hindi,
    English
  • Lexical resource: tri-lingual comparable
    corpus
  • Final size of the lexical resource: 1,000
    documents per language

10
Lexical Resource
  • Language/language pair: Bengali, Hindi,
    English
  • Lexical resource: evaluation software for
    calculating commonly used evaluation metrics
    for a given set of results
  • Final size of the lexical resource: N/A
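The slide does not list the metrics, but as an illustration, here is how two widely used ones, precision at rank k and mean average precision (MAP), could be computed from binary relevance judgments and a ranked run per topic. This is a minimal sketch, not the project's software; the function names and data layout are hypothetical, and established tools such as TREC's trec_eval compute these and many other measures.

```python
def precision_at_k(ranking: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def average_precision(ranking: list[str], relevant: set[str]) -> float:
    """Mean of the precision values at the rank of each relevant document;
    relevant documents never retrieved contribute 0. Assumes each document
    appears at most once in `ranking`."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run: dict[str, list[str]], qrels: dict[str, set[str]]) -> float:
    """MAP over topics with at least one relevant document (an assumption);
    a topic missing from the run scores 0."""
    topics = [t for t, rel in qrels.items() if rel]
    if not topics:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)


if __name__ == "__main__":
    qrels = {"T1": {"d1", "d3"}, "T2": {"d2"}}
    run = {"T1": ["d1", "d2", "d3", "d4"], "T2": ["d5", "d2"]}
    print(precision_at_k(run["T1"], qrels["T1"], k=2))  # 0.5
    print(round(mean_average_precision(run, qrels), 4))  # 0.6667
```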