The Hindi Surprise Language Exercise

Provided by: loril8

1
The Hindi Surprise Language Exercise
  • Language Technologies Institute
  • Carnegie Mellon University

2
Agenda
  • Overview of the DARPA TIDES project
  • What is a surprise language exercise
  • Quick intro to Machine Translation
  • List of job tasks

3
Language Technologies Institute
  • A department in the School of Computer Science
  • Language technologies: translating one natural
    language to another, speech recognition,
    information retrieval, information summarization,
    automatic question answering
  • We have courses that undergraduates can take, a
    master's degree program, and a Ph.D. program.

4
Overview of the DARPA TIDES Project
  • DARPA: Defense Advanced Research Projects Agency
  • Defense and non-defense research: ARPANET was
    the precursor of the Internet
  • TIDES: Translingual Information Detection,
    Extraction, and Summarization

5
Overview of the DARPA TIDES Project
  • Make news sources in other languages available to
    English speakers
  • Automatically translate
  • Automatically extract specific facts
  • Automatically summarize
  • How well does it work? Promising so far.
  • Participants: over 20 universities, government
    contractors, and government agencies; probably
    between 100 and 200 people across the US

6
What is a surprise language exercise?
  • Commercially available language technologies
    (such as machine translation) take many years to
    develop.
  • What if you needed language technologies for a
    new language quickly (in a matter of days or
    weeks)?
  • The surprise language exercise is a test of the
    TIDES research to see whether translation,
    extraction, and summarization systems can be
    built in one month for a new language.

7
Surprise Language Timeline
  • June 2, a.m.: The name of the surprise language
    is announced: Hindi.
  • June 30: Exercise ends.

8
Quick intro to machine translation
  • Machine translation: automatically translating
    from one human language to another human language
  • The ideal system:
  • High quality: as good as a human translator
  • Broad coverage: any topic, any type of text or
    speech
  • There is no ideal system currently

9
Tradeoffs based on the task
  • Dissemination: you want people to read something
    that you wrote (e.g., a user's manual). Requires
    high quality, but not broad coverage.
  • Conversation: medium quality is OK. People can
    compensate. Conversational systems currently
    do not have broad coverage.
  • Assimilation: you want to gather information
    from documents in another language. Broad
    coverage is critical; gisting quality is
    acceptable.

10
Methods for Machine Translation
  • Humans write translation rules
  • English: be NOUN (e.g., be a teacher)
  • Hindi: NOUN he
  • Example-Based Machine Translation
  • Rules are learned automatically from parallel
    corpora.
  • Statistical Machine Translation
  • Probabilities are learned automatically from
    parallel corpora.
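
The hand-written rule on this slide (English "be NOUN" becomes Hindi "NOUN he") can be sketched in a few lines of Python. This is only an illustration, not the project's actual rule format: the function name, the regular expression, and the transliterated toy lexicon are all assumptions introduced here.

```python
import re

# Hypothetical toy lexicon (transliterated Hindi); not taken from the slides.
LEXICON = {"teacher": "adhyapak", "student": "chhatra"}

# English side of the rule: "be", an optional article, then a single noun.
BE_NOUN = re.compile(r"^be\s+(?:(?:a|an|the)\s+)?(\w+)$")


def translate_be_noun(phrase):
    """Apply the slide's rule: English 'be NOUN' -> Hindi 'NOUN he'.

    Returns None when the rule does not apply or the noun is unknown.
    """
    match = BE_NOUN.match(phrase.strip().lower())
    if match is None:
        return None  # the phrase does not have the 'be NOUN' shape
    hindi_noun = LEXICON.get(match.group(1))
    if hindi_noun is None:
        return None  # noun is missing from the toy lexicon
    # The Hindi side of the rule puts the noun first and the verb last.
    return f"{hindi_noun} he"


print(translate_be_noun("be a teacher"))
```

The point of the sketch is the word-order change: the verb comes first in the English pattern and last in the Hindi one, which is the kind of knowledge a human rule writer has to supply.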

11
What is a parallel corpus?
  • The same text in two languages.
  • Corresponding sentences are aligned.
  • EBMT and SMT require huge amounts of parallel
    text: millions of words long.
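
To make "learning probabilities from a parallel corpus" concrete, here is a minimal sketch that stores aligned sentence pairs and estimates word translation probabilities from raw co-occurrence counts. The slides do not say how the SMT system estimates its probabilities, so this simple counting scheme is an assumption chosen for illustration; the three-sentence corpus is invented toy data in transliterated Hindi.

```python
from collections import Counter, defaultdict

# Invented toy parallel corpus: each pair is one aligned sentence pair.
PARALLEL_CORPUS = [
    ("red house", "lal ghar"),
    ("red book", "lal kitab"),
    ("new house", "naya ghar"),
]


def cooccurrence_probs(corpus):
    """Estimate P(hindi_word | english_word) from co-occurrence counts.

    Every Hindi word in a sentence pair counts once as a candidate
    translation of every English word in the same pair.
    """
    counts = defaultdict(Counter)
    for english, hindi in corpus:
        for e_word in english.split():
            for h_word in hindi.split():
                counts[e_word][h_word] += 1

    probs = {}
    for e_word, h_counts in counts.items():
        total = sum(h_counts.values())
        probs[e_word] = {h: c / total for h, c in h_counts.items()}
    return probs


probs = cooccurrence_probs(PARALLEL_CORPUS)
best_for_red = max(probs["red"], key=probs["red"].get)
print(best_for_red, probs["red"][best_for_red])
```

With only three sentence pairs the estimates are crude, and a word seen once ("new") cannot be disambiguated at all. That is why the slide stresses that these methods need millions of words of parallel text.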

12
Job Tasks
  • For someone who is not a linguist or a
    programmer:
  • Align sentences in parallel text
  • Make sure that parallel texts are really parallel
  • Align words in parallel sentences
  • Translate sentences
  • Add linguistic information to dictionaries, e.g.,
    whether nouns are masculine or feminine.
  • Check the quality of the translations that the
    system produces
  • For someone who is a linguist: work with Lori
    and some graduate students to identify rules.
  • For someone who is a programmer:
  • Help run alignment algorithms
  • Help to automatically identify and normalize
    temporal expressions (e.g., last year, on
    Tuesday, in 1993)
  • Help to automatically identify and normalize
    proper names, etc.
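
As a rough idea of what "identify and normalize temporal expressions" involves, the sketch below handles only the three examples on this slide. It is not the project's method: the function name, the reference date, and the choice to resolve a weekday to the first such day on or after the reference date are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]

# Matches only the three kinds of expression used as examples on the slide.
TEMPORAL = re.compile(r"\b(last year|on (\w+day)|in (\d{4}))\b", re.IGNORECASE)


def normalize_temporal(text, reference):
    """Return (expression, normalized value) pairs found in text.

    Relative expressions are resolved against the given reference date.
    """
    results = []
    for match in TEMPORAL.finditer(text):
        expression, weekday, year = match.group(1), match.group(2), match.group(3)
        if year:
            value = year  # "in 1993" is already absolute
        elif weekday:
            name = weekday.lower()
            if name not in WEEKDAYS:
                continue  # e.g. "on holiday" is not a temporal expression
            # Assumption: pick the first such weekday on or after the reference.
            days_ahead = (WEEKDAYS.index(name) - reference.weekday()) % 7
            value = (reference + timedelta(days=days_ahead)).isoformat()
        else:
            value = str(reference.year - 1)  # "last year"
        results.append((expression, value))
    return results


# Hypothetical reference date (a Monday); the year is not given in the slides.
found = normalize_temporal("It happened last year, on Tuesday, in 1993.",
                           date(2003, 6, 2))
print(found)
```

Even this small case shows why normalization needs a reference date: "last year" and "on Tuesday" mean nothing on their own, while "in 1993" can be read off directly.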

13
To get employed
  • Go to the payroll office at 407 S. Craig St.
  • Ask to fill out an I-9 form.
  • Bring a copy of the I-9 form to Cheryl Webber
    (assistant business manager of LTI)
  • Or fax a copy of the I-9 form to Cheryl Webber at
    312-268-6298.
  • Fill out an information sheet.