IIT Bombay in IL-IL-MT Consortium

Provided by: profbhat

1
IIT Bombay in IL-IL-MT Consortium
  • Horizontals: multiway lexicon, lexical
    disambiguation, etc.
  • Marathi Vertical

2
Existing IITB Architecture for MT as Specified in
the Expression of Interest (EoI)
3
Experience Gathered from Existing Work: A
Multiway Lexicon Scenario
[Figure: multiway lexicon in which the words
shikshan, education, shikkhaa and shikshaa are
linked to the concept education (ako process),
annotated with a disambiguating restriction]
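The multiway lexicon pictured on this slide can be sketched as a concept-indexed table plus a reverse index from (language, word) pairs back to concepts, so that translation pivots through the shared concept. This is a minimal sketch under assumptions: the dictionary layout, the language codes, and the names `LEXICON`, `build_reverse_index` and `translate_word` are hypothetical; the word pairings follow the Hindi, Bengali and Marathi examples given on the next slide.

```python
from collections import defaultdict

# Hypothetical concept-indexed multiway lexicon: concept label -> {language code: word}.
# Words are the slide examples; the language codes are assumed.
LEXICON = {
    "education(ako process)": {
        "en": "education",
        "hi": "shikshaa",
        "bn": "shikkhaa",
        "mr": "prashikshan",
    },
}


def build_reverse_index(lexicon):
    """Map each (language, word) pair to the set of concepts it can express."""
    index = defaultdict(set)
    for concept, words in lexicon.items():
        for lang, word in words.items():
            index[(lang, word)].add(concept)
    return index


REVERSE = build_reverse_index(LEXICON)


def translate_word(word, src, tgt):
    """Pivot through every concept the source word hits and collect target words."""
    return sorted(
        LEXICON[concept][tgt]
        for concept in REVERSE.get((src, word), ())
        if tgt in LEXICON[concept]
    )


print(translate_word("shikshaa", "hi", "bn"))  # ['shikkhaa']
print(translate_word("shikshaa", "hi", "mr"))  # ['prashikshan']
```

Indexing by concept means a new language is added by extending each concept entry once, rather than by building a separate bilingual dictionary for every language pair.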
4
Lexical Disambiguation
  • Given a text containing shikshaa in Hindi, the
    lookup should hit the concept education (ako process)
  • Then pick up shikkhaa in Bengali and prashikshan
    in Marathi
  • Disambiguation algorithms with both language-dependent
    and language-independent components are needed
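When a source word maps to more than one concept, a restriction has to choose between them. One simple, language-independent option is a simplified Lesk-style overlap score between the sentence context and a set of cue words per concept; this is an illustrative substitute, not necessarily the method used at IIT Bombay. In the sketch below, the second sense of shikshaa (`lesson(ako moral)`), all cue words, and the names `SENSES`, `TARGETS`, `disambiguate` and `lexical_transfer` are hypothetical.

```python
# Hypothetical sense inventory: Hindi word -> {concept label: context cue words}.
# The second sense and every cue word are illustrative assumptions.
SENSES = {
    "shikshaa": {
        "education(ako process)": {"vidyaalay", "college", "paathyakram", "chhaatr"},
        "lesson(ako moral)": {"kahaanii", "naitik", "anubhav"},
    },
}

# Hypothetical concept -> target-language words (only the slide's example concept).
TARGETS = {"education(ako process)": {"bn": "shikkhaa", "mr": "prashikshan"}}


def disambiguate(word, context_tokens):
    """Return the concept whose cue set overlaps most with the context.

    Ties go to the first listed sense; unknown words return None.
    """
    senses = SENSES.get(word)
    if not senses:
        return None
    context = set(context_tokens)
    best_concept, best_score = None, -1
    for concept, cues in senses.items():
        score = len(cues & context)
        if score > best_score:
            best_concept, best_score = concept, score
    return best_concept


def lexical_transfer(word, context_tokens, tgt):
    """Disambiguate first, then pick the target-language word for that concept."""
    concept = disambiguate(word, context_tokens)
    return TARGETS.get(concept, {}).get(tgt)


print(lexical_transfer("shikshaa", "vah college mein shikshaa le rahaa hai".split(), "bn"))
# shikkhaa
```

The overlap scoring is language-independent; the cue sets are the language-dependent part, which is the split the slide asks for.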

5
Ancillary processes
  • Root identification in the concerned language
    (morphological analysis)
  • Morphological generation after transformation-rule
    application and lexical substitution
  • Case marker/postposition generation
  • Syntax planning
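Case marker generation and syntax planning can be illustrated together for Hindi: attach a postposition to each role-labelled constituent, then emit the constituents in verb-final (SOV) order. The rule that ergative "ne" marks the agent of a transitive verb in the perfective is standard Hindi grammar, but the role inventory, the fixed constituent order, the romanized output, and the names `POSTPOSITIONS`, `ORDER` and `generate_clause` are assumptions for this sketch. Gender/number agreement and morphological generation are left out.

```python
# Hypothetical role -> Hindi postposition table (romanized), covering two adjunct roles.
POSTPOSITIONS = {"location": "mein", "instrument": "se"}

# Assumed constituent order for a simple Hindi clause: verb-final (SOV).
# Roles not listed here are ignored.
ORDER = ["agent", "location", "instrument", "object", "verb"]


def generate_clause(constituents, transitive, perfective, definite_object=False):
    """Attach postpositions to role-labelled constituents and order them SOV."""
    words = []
    for role in ORDER:
        if role not in constituents:
            continue
        words.append(constituents[role])
        if role == "agent":
            # Ergative "ne" marks the agent of a transitive verb in the perfective.
            if transitive and perfective:
                words.append("ne")
        elif role == "object":
            # Simplification: mark only definite objects with "ko".
            if definite_object:
                words.append("ko")
        elif role != "verb":
            words.append(POSTPOSITIONS[role])
    return " ".join(words)


print(generate_clause(
    {"agent": "raam", "instrument": "chaaku", "object": "seb", "verb": "kaataa"},
    transitive=True, perfective=True,
))  # raam ne chaaku se seb kaataa
```

In a full generator, verb agreement would also change under "ne" (the verb agrees with the object), which is one reason morphological generation sits after this step.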

6
Evaluation Metric Determination
  • Automated as well as manual
  • N-gram matching (BLEU/NIST-like)
  • Examine existing strategies, e.g., IIIT Hyderabad's
    approaches
  • IL-IL texts are expected to be well aligned
  • N-gram matching might therefore work well
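BLEU-like n-gram matching can be sketched at sentence level as clipped n-gram precisions combined by a geometric mean and scaled by a brevity penalty. Assumptions: whitespace-tokenized input, uniform weights, no smoothing (any zero precision gives a score of 0), and the closest reference length for the penalty. NIST's information weighting of n-grams is not implemented, and the example sentences are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing."""
    if not candidate:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total) / max_n
    c = len(candidate)
    # Closest reference length (ties broken toward the shorter reference).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision_sum)


ref = "raam ne seb khaayaa".split()
print(bleu("raam ne aam khaayaa".split(), [ref], max_n=2))  # 0.5 (up to float rounding)
```

With max_n=2, the example has unigram precision 3/4 and bigram precision 1/3, whose geometric mean is 0.5; with the default max_n=4 the same pair scores 0, which is why smoothing is usually added for single sentences.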

7
Existing Capability
  • MCIT-funded project on Technology Development in
    Indian Languages (first phase completed)
  • World Bank-funded project on the Development
    Gateway Foundation (language technology part)
  • MCIT-funded Media Lab Asia project on
    AgroExplorer, a meaning-based multilingual search
    engine in the agricultural domain
  • TCS-funded project on the Laboratory for
    Intelligent Internet Research

8
Relevant Language Resources and Tools
Developed/Under Development (1/2)
  • Morphological analysers for Marathi and Hindi
  • PoS taggers for Marathi and Hindi
  • 3-way lexicon for Marathi, Hindi and English in
    the agricultural domain
  • Concept dictionaries linked to Marathi, Hindi and
    English words

9
Relevant Language Resources and Tools
Developed/Under Development (2/2)
  • WordNets for Hindi and Marathi
  • Limited word sense disambiguator for Hindi using
    the Hindi WordNet
  • Limited semantics extractor for English
  • Syntax planners for Marathi and Hindi
  • Morphological generators (morphers) for Marathi
    and Hindi

10
Existing Team
  • 5 PhD scholars
  • 6 M.Tech students
  • 8 B.Tech/B.E. students
  • 3 linguists
  • 7 lexicographers/lexiconists

11
Websites for Publications and Resources
  • www.cfilt.iitb.ac.in
  • www.cse.iitb.ac.in/pb
  • www.cse.iitb.ac.in/laiir
  • www.mlasia.iitb.ac.in
  • (Please also see the publications list in the
    Expression of Interest document of 15 February
    2006.)