Speech-to-Speech MT: C-STAR/Nespole!/LingWear

Provided by: AlonL


1
Speech-to-Speech MT: C-STAR/Nespole!/LingWear
  • Lori Levin, Alon Lavie, Alex Waibel,
  • Bob Frederking, Tanja Schultz
  • LTI Immigration Course
  • August 24, 2001

2
Outline
  • Problems in Speech-to-Speech MT
  • The JANUS Approach
  • The Task-oriented Interlingua (IF)
  • System Design and Engineering
  • The C-STAR, Nespole! and LingWear Projects
  • Open Problems, Current and Future Research

3
Issues in Speech Translation
  • Spoken dialogue is very different from written
    text
  • different linguistically: syntax, constructions
  • contains unique phenomena: repairs, hesitations,
    filled pauses
  • Speech translation requires specialized
    approaches
  • robust analysis
  • focus on communicative goals and semantics rather
    than syntax

4
Our Speech Translation Approach
  • Translation via a task-oriented interlingua
    representation
  • Focus on large, well-defined domains
  • Robust analysis approaches
  • Semantic grammars
  • Modular grammar design
  • Incorporate alternative translation engines

5
The Travel Planning Domain
  • General Scenario
  • Dialogue between one traveler and a travel
    service provider (agent, hotel clerk, etc.)
  • Task-oriented: the goal is to obtain information,
    or to reserve or purchase travel-related services
  • Free, spontaneous speech

6
The Travel Planning Domain
  • Natural breakdown into several sub-domains
  • Hotel Information and Reservation
  • Transportation Information and Reservation
  • Information about Sights and Events
  • General Travel Information
  • Cross Domain

7
Semantic Grammars
  • Describe structure of semantic concepts instead
    of syntactic constituency of phrases
  • Well suited for task-oriented dialogue containing
    many fixed expressions
  • Appropriate for spoken language - often disfluent
    and syntactically ill-formed
  • Faster to develop reasonable coverage for limited
    domains

8
Semantic Grammars
  • Hotel Reservation Example
  • Input: we have two hotels available
  • Parse Tree:
      give-information+availability+hotel
        ( we have hotel-type
            ( quantity ( two )
              hotel ( hotels ) )
          available )
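As a toy illustration of a semantic grammar, the Python sketch below encodes the hotel example as rules whose nonterminals are concepts rather than syntactic categories. The grammar format, the rule set and the `parse`/`show` helpers are our own assumptions, sized only to cover this one input; a naive top-down recursive parser builds the concept tree, and `show` prints it with concept names in square brackets so they stand apart from the words.

```python
from typing import Optional

# Hypothetical toy grammar: concept -> list of alternative right-hand sides.
# Tokens in [brackets] are concepts (nonterminals); everything else is a word.
GRAMMAR: dict[str, list[list[str]]] = {
    "give-information+availability+hotel": [["we", "have", "[hotel-type]", "available"]],
    "hotel-type": [["[quantity]", "[hotel]"]],
    "quantity": [["two"], ["one"], ["three"]],
    "hotel": [["hotels"], ["hotel"]],
}

Tree = tuple  # (concept, children), where children are words or subtrees


def parse(concept: str, words: list[str], pos: int) -> Optional[tuple[Tree, int]]:
    """Top-down: try each alternative of `concept` starting at `pos`.

    Returns (tree, next_position) for the first alternative that matches, or None.
    There is no backtracking into sub-results, which is enough for this toy grammar.
    """
    for rhs in GRAMMAR.get(concept, []):
        children, i = [], pos
        for sym in rhs:
            if sym.startswith("[") and sym.endswith("]"):
                sub = parse(sym[1:-1], words, i)
                if sub is None:
                    break
                children.append(sub[0])
                i = sub[1]
            elif i < len(words) and words[i] == sym:
                children.append(sym)
                i += 1
            else:
                break
        else:  # every symbol of this alternative matched
            return (concept, children), i
    return None


def show(tree) -> str:
    """Render a tree with [concept] labels followed by parenthesized children."""
    if isinstance(tree, str):
        return tree
    concept, children = tree
    return f"[{concept}] ( " + " ".join(show(c) for c in children) + " )"


words = "we have two hotels available".split()
result = parse("give-information+availability+hotel", words, 0)
print(show(result[0]))
# [give-information+availability+hotel] ( we have [hotel-type] ( [quantity] ( two ) [hotel] ( hotels ) ) available )
```

Because the nonterminals name domain concepts, the same `quantity` and `hotel` rules can be reused by any other domain action that mentions hotels.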

9
HLT Server Architecture
10
HLT Server Architecture
11
Rule-based Translation Approach
12
The SOUP Parser
  • Specifically designed to parse spoken language
    using domain-specific semantic grammars
  • Robust - can skip over disfluencies in input
  • Stochastic - probabilistic CFG encoded as a
    collection of RTNs with arc probabilities
  • Top-Down - parses from top-level concepts of the
    grammar down to matching of terminals
  • Chart-based - dynamic matrix of parse DAGs
    indexed by start and end positions and head category
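The top-down, chart-based, robust behaviour described above can be sketched with a small memoized parser. This is not SOUP's implementation: the grammar format, the `ChartParser` class and the greedy left-to-right skipping strategy are hypothetical, arc probabilities are left out, and each chart cell keeps a single tree instead of a packed DAG. Completed constituents are stored under (start, end, category), so a sub-parse is built once and reused, and words at which no top-level concept can start (such as the filled pause "uh") are skipped.

```python
# Hypothetical toy grammar: category -> alternatives; "[x]" marks a category.
GRAMMAR = {
    "greeting": [["hello"], ["hi"]],
    "request-room": [["i", "need", "a", "[room-type]", "room"]],
    "room-type": [["single"], ["double"]],
}
TOP_LEVEL = ["greeting", "request-room"]


class ChartParser:
    def __init__(self, words):
        self.words = words
        self.chart = {}  # (start, end, category) -> parse tree for that span
        self.memo = {}   # (category, start) -> {end: tree}, so each is expanded once

    def match_seq(self, rhs, pos):
        """All ways to match the symbol list `rhs` from `pos`, as (end, children)."""
        partial = [(pos, [])]
        for sym in rhs:
            extended = []
            for i, kids in partial:
                if sym.startswith("["):
                    for end, tree in self.parse_at(sym[1:-1], i).items():
                        extended.append((end, kids + [tree]))
                elif i < len(self.words) and self.words[i] == sym:
                    extended.append((i + 1, kids + [sym]))
            partial = extended
        return partial

    def parse_at(self, cat, start):
        """Top-down: expand `cat` at `start`; return {end: tree} and fill the chart."""
        key = (cat, start)
        if key not in self.memo:
            self.memo[key] = {}  # guard: a recursive call sees "no parse yet"
            found = {}
            for rhs in GRAMMAR[cat]:
                for end, kids in self.match_seq(rhs, start):
                    if end not in found:  # keep the first tree per span (no packing)
                        found[end] = (cat, kids)
                        self.chart[(start, end, cat)] = found[end]
            self.memo[key] = found
        return self.memo[key]

    def robust_parse(self):
        """Cover the input with top-level concepts, skipping unparsable words."""
        pos, trees, skipped = 0, [], []
        while pos < len(self.words):
            best = None
            for cat in TOP_LEVEL:
                for end, tree in self.parse_at(cat, pos).items():
                    if best is None or end > best[0]:
                        best = (end, tree)
            if best is None:
                skipped.append(self.words[pos])  # e.g. a filled pause like "uh"
                pos += 1
            else:
                trees.append(best[1])
                pos = best[0]
        return trees, skipped


parser = ChartParser("hello uh i need a double room".split())
trees, skipped = parser.robust_parse()
print([t[0] for t in trees], skipped)  # ['greeting', 'request-room'] ['uh']
```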

13
The SOUP Parser
  • Supports parsing with large, multiple-domain
    grammars
  • Produces a lattice of parse analyses headed by
    top-level concepts
  • Disambiguation heuristics rank the analyses in
    the parse lattice and select a single best path
    through the lattice
  • Graphical grammar editor

14
SOUP Disambiguation Heuristics
  • Maximize coverage (of input)
  • Minimize number of parse trees (fragmentation)
  • Minimize number of parse tree nodes
  • Minimize the number of wild-card matches
  • Maximize the probability of parse trees
  • Find the sequence of domain tags with maximal
    probability given the input words, P(T | W), where
    T = t1, t2, ..., tn is a sequence of domain tags
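One plausible way to apply the first five heuristics is a lexicographic ordering in which each criterion only breaks ties left by the one before it. The slide does not say how SOUP actually combines them, so the ordering, the hypothetical `Analysis` record and the candidate numbers below are assumptions; the domain-tag model P(T | W) is not included.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Summary statistics for one candidate path through a parse lattice (hypothetical)."""
    label: str
    words_covered: int   # coverage of the input
    num_trees: int       # fragmentation
    num_nodes: int       # total parse-tree nodes
    num_wildcards: int   # wild-card matches used
    probability: float   # probability of the parse trees on this path


def rank_key(a: Analysis) -> tuple:
    # Tuples compare lexicographically in ascending order, so quantities to
    # maximize are negated and quantities to minimize are used as-is.
    return (-a.words_covered, a.num_trees, a.num_nodes, a.num_wildcards, -a.probability)


def best_analysis(candidates: list[Analysis]) -> Analysis:
    return min(candidates, key=rank_key)


# Invented candidates for illustration.
candidates = [
    Analysis("three fragments", 8, 3, 14, 0, 0.02),
    Analysis("one tree, wildcard", 8, 1, 12, 1, 0.01),
    Analysis("one tree", 8, 1, 12, 0, 0.005),
    Analysis("partial", 6, 1, 9, 0, 0.30),
]
print(best_analysis(candidates).label)  # one tree
```

Under this ordering, probability only decides between analyses that tie on every structural criterion; a weighted score would be the obvious alternative if the criteria should trade off against each other.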

15
Generation Modules
  • Two alternative generation modules
  • GenKit - unification-based generator augmented
    with Morphe morphology module - used for German
  • Top-Down context-free based generator - fast,
    used for English and Japanese
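A top-down, context-free generator can be pictured as rules keyed by interlingua concepts: generation starts from the domain action and recursively expands each argument slot until only words remain. The rule table, the `{slot}` template syntax and the flat dictionary standing in for an IF representation are hypothetical and only show the control flow, not the actual generator.

```python
import re

# Hypothetical English generation rules: concept -> template.
# "{name}" is a slot filled by recursively generating the IF argument `name`;
# "{value}" is filled with the argument's own value from the frame.
RULES = {
    "give-information+availability+room": "We have {room-type} available{for-whom}.",
    "room-type": "a {value} room",
    "for-whom": " for {value}",
}

SLOT = re.compile(r"\{([^}]+)\}")


def generate(concept: str, frame: dict) -> str:
    """Expand `concept` top-down using RULES and the argument frame."""
    template = RULES[concept]

    def fill(match: re.Match) -> str:
        slot = match.group(1)
        if slot == "value":
            return frame[concept]  # leaf: the argument's own value
        if slot not in frame:
            return ""  # optional argument absent from the IF
        return generate(slot, frame)

    return SLOT.sub(fill, template)


print(generate("give-information+availability+room", {"room-type": "double", "for-whom": "you"}))
# We have a double room available for you.
```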

16
Translation with Multiple Domain Grammars
17
A SOUP Parse Lattice
18
Hybrid Stat/Rule-based Analysis
  • Developing large-coverage semantic analysis
    grammars is time-consuming, so it is difficult to
    port the analysis system to new domains
  • low-level argument grammars are more
    domain-independent: they contain many concepts that
    are used across domains (time, location, prices,
    etc.)
  • high-level domain actions are domain-specific and
    must be redeveloped for each new domain (e.g.,
    give-info+onset+symptom)
  • Tagging data sets with interlingua
    representations is less time-consuming, and the
    tagged data is needed anyway for system development

19
Hybrid Rule/Stat Approach
  • Combines grammar-based and statistical approaches
    to analysis
  • Develop semantic grammars for phrase-level
    arguments that are more portable to new domains
  • Use statistical machine learning techniques for
    classifying into domain-actions
  • Porting to a new domain requires:
  • developing argument parse rules for new domain
  • tagging training set with domain-actions for new
    domain
  • training the classifiers for domain-actions on
    the tagged data

20
The Hybrid Analysis Process
  • Parse an utterance for arguments
  • Segment the utterance into sentences
  • Extract features from the utterance and the
    single best parse output
  • Use a learned classifier to identify the speech
    act
  • Use a learned classifier to identify the concept
    sequence
  • Combine into a full parse
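The six steps can be sketched as a small Python pipeline. Everything below is a stand-in: `parse_arguments` matches a hard-coded phrase table instead of a real argument grammar, segmentation splits on periods, and the two "classifiers" are hand-written rules where the system would use trained models. The sketch is meant to show the data flow: the predicted speech act is an input to the concept classifier, and speech act and concepts are joined with "+" (following the IF naming convention) into a domain action that heads the argument parse.

```python
# Hypothetical stand-in for the argument grammar: phrase -> argument label.
ARG_PHRASES = {("double", "room"): "room-type", ("for", "you"): "for-whom"}


def parse_arguments(words):
    """Step 1: find argument phrases anywhere in the utterance as (label, start, end)."""
    args = []
    for i in range(len(words)):
        for phrase, label in ARG_PHRASES.items():
            if tuple(words[i:i + len(phrase)]) == phrase:
                args.append((label, i, i + len(phrase)))
    return args


def segment(words):
    """Step 2: split into sentence spans at '.' tokens (a naive assumption)."""
    spans, start = [], 0
    for i, w in enumerate(words):
        if w == ".":
            if i > start:
                spans.append((start, i))
            start = i + 1
    if start < len(words):
        spans.append((start, len(words)))
    return spans


def analyze(utterance, speech_act_clf, concept_clf):
    words = utterance.lower().replace(".", " . ").split()
    args = parse_arguments(words)
    full_parses = []
    for s, e in segment(words):
        seg_args = [a for a in args if s <= a[1] and a[2] <= e]
        # Step 3: features from the sentence's words and its argument labels.
        feats = {"words": words[s:e], "args": [label for label, _, _ in seg_args]}
        act = speech_act_clf(feats)         # step 4: speech act
        concepts = concept_clf(feats, act)  # step 5: concepts, given the predicted act
        # Step 6: the domain action heads the argument parse.
        domain_action = "+".join([act, *concepts])
        arg_str = " ".join(f"{label} ( {' '.join(words[i:j])} )" for label, i, j in seg_args)
        full_parses.append(f"{domain_action} ( {arg_str} )")
    return full_parses


# Placeholder "classifiers" (hypothetical rules standing in for trained models).
def toy_speech_act(feats):
    return "give-information" if "have" in feats["words"] else "request-information"


def toy_concepts(feats, act):
    return ["availability", "room"] if "room-type" in feats["args"] else []


print(analyze("We have a double room available for you.", toy_speech_act, toy_concepts))
# ['give-information+availability+room ( room-type ( double room ) for-whom ( for you ) )']
```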

21
Automatic Classification of Domain Actions
  • Train classifiers for speech acts and concepts
  • Training data: utterances labeled with speech
    act, concepts, and best argument parse
  • Input features:
  • n most common words
  • Arguments and pseudo-arguments in best parse
  • Speaker
  • Predicted speech act (for concept classifier)
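The results slide names TiMBL, a memory-based learner. As a rough sketch of that idea (not TiMBL's API), the classifier below stores every training example and labels a new one by majority vote over its k nearest stored examples under an overlap distance. The features follow the list above, except that all words are used instead of only the n most common; the training examples are invented.

```python
from collections import Counter


def distance(x, y):
    """Overlap metric: number of features whose values differ.

    "words" and "args" are sets, i.e. one binary presence feature per item,
    so their mismatches are the symmetric difference; "speaker" and "act"
    are single symbolic features.
    """
    d = len(x["words"] ^ y["words"]) + len(x["args"] ^ y["args"])
    d += x["speaker"] != y["speaker"]
    d += x["act"] != y["act"]
    return d


class MemoryBasedClassifier:
    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label)

    def train(self, examples):
        # Memory-based learning: training just stores the examples.
        self.memory.extend(examples)

    def classify(self, x):
        # Find the k nearest stored examples and take a majority vote.
        nearest = sorted(self.memory, key=lambda ex: distance(x, ex[0]))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]


def features(words, args, speaker, act=None):
    """Build a feature dict; `act` is only set when training the concept classifier."""
    return {"words": set(words.split()), "args": set(args), "speaker": speaker, "act": act}


# Invented training examples for the speech-act classifier.
speech_act = MemoryBasedClassifier(k=1)
speech_act.train([
    (features("we have a room available", ["room-type"], "agent"), "give-information"),
    (features("do you have a room", ["room-type"], "client"), "request-information"),
    (features("thank you", [], "client"), "thank"),
])
query = features("we have a double room available", ["room-type"], "agent")
print(speech_act.classify(query))  # give-information
```

A concept classifier would be built the same way, with the predicted speech act passed as `act` so that it becomes one more feature.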

22
Argument Parse Example
Input: We have a double room available for you at
twenty-three thousand five hundred yen
availability:PSD ( we have
    super_room-type ( room-type ( a room:double ( double room ) ) )
    available )
arg-party:for-whom:ARG ( for you ( you ) )
arg:time:ARG ( point ( at hour-minute ( big:hour ( big:23 (
    twenty-three ) ) ) ) )
arg:super_price:ARG ( price ( one-price:main-quantity (
    n-1000 ( thousand ) price:n-100 ( five hundred ) )
    currency ( yen ( yen ) ) ) )
23
Full Parse Example
Input: We have a double room available for you at
twenty-three thousand five hundred yen
give-information+availability+room (
  availability:PSD ( we have
      super_room-type ( room-type ( a room:double ( double room ) ) )
      available )
  arg-party:for-whom:ARG ( for you ( you ) )
  arg:time:ARG ( point ( at hour-minute ( big:hour ( big:23 (
      twenty-three ) ) ) ) )
  arg:super_price:ARG ( price ( one-price:main-quantity (
      n-1000 ( thousand ) price:n-100 ( five hundred ) )
      currency ( yen ( yen ) ) ) )
)
24
Classification Results Using Memory-based (TiMBL) Classifiers
25
Alternative Approaches: MEMT
  • Glossary-based Translation
  • Translates directly into target language (no IF)
  • Based on Pangloss translation system developed at
    CMU
  • Uses a combination of EBMT, phrase glossaries and
    a bilingual dictionary
  • Good fall-back for uncovered utterances
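The fall-back role of the glossary-based engine can be sketched as follows: use the interlingua path when analysis covers the utterance, otherwise translate directly. Both engines are hypothetical toys, and the greedy longest-match strategy (phrase glossary first, then the bilingual dictionary, copying unknown words through) is a simplification of our own; EBMT and the real MEMT combination are not shown.

```python
# Hypothetical English->German resources, for illustration only.
PHRASE_GLOSSARY = {("double", "room"): "Doppelzimmer", ("thank", "you"): "danke"}
DICTIONARY = {"a": "ein", "we": "wir", "need": "brauchen"}


def glossary_translate(words):
    """Greedy left-to-right longest phrase match, then word dictionary, else copy."""
    longest = max(len(p) for p in PHRASE_GLOSSARY)
    out, i = [], 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in PHRASE_GLOSSARY:
                out.append(PHRASE_GLOSSARY[chunk])
                i += n
                break
        else:  # no glossary phrase starts at position i
            out.append(DICTIONARY.get(words[i], words[i]))
            i += 1
    return " ".join(out)


def translate(utterance, interlingua_translate):
    """Prefer the interlingua (IF) path; fall back to glossary-based translation."""
    words = utterance.lower().split()
    result = interlingua_translate(words)  # None means: analysis did not cover it
    return result if result is not None else glossary_translate(words)


def toy_if_engine(words):
    # Hypothetical IF-based engine that only covers a greeting.
    return "Hallo" if words == ["hello"] else None


print(translate("hello", toy_if_engine))                  # Hallo
print(translate("we need a double room", toy_if_engine))  # wir brauchen ein Doppelzimmer
```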

26
C-STAR-III
  • Partners: ATR, CMU, CLIPS, ETRI, IRST, UKA
  • Main Research Goals
  • Expandability - towards unlimited domains
  • Accessibility - Speech Translation over wireless
    phone
  • Usability - real service for real users

27
Nespole!
  • Speech-to-speech translation for eCommerce
  • CMU, Karlsruhe, IRST, CLIPS, 2 commercial
    partners
  • Improved limited-domain speech translation
  • Experiment with multimodality and with MEMT
  • EU-side has strict scheduling and deliverables
  • First test domain: Italian travel agency
  • Second showcase: international help desk
  • Tied in to C-STAR-III

28
LingWear for the Information Warrior
  • New Ideas
  • The pre-development of appropriate interlingua
    representations for domains of interest
    facilitates generation into a new language within
    two weeks.
  • The development of new MT engines (e.g.
    learnable transfer rules) and improved
    multi-engine integration supports rapid
    deployment of MT for a new language with scarce
    resources.
  • Gisting and summarization in the source language
    followed by MT is better than vice versa.
  • Impact
  • Allow military and relief organizations to
    converse in limited domains of interest with the
    local population in an area of conflict and/or
    disaster
  • Allow military and other operatives in the field
    to assimilate foreign-language information they
    encounter on the move
  • Rapidly port and deploy the technology into new
    languages with scarce resources

Schedule
  • Port to second language
  • Baseline summarizer ready
  • Baseline MT systems ready
  • Port to third language
Carnegie Mellon University, School of Computer Science:
A. Waibel, L. Levin, A. Lavie, R. Frederking
29
Domain Portability: Travel to Medical
  • Knowledge-Based Methods: re-usability of knowledge
    sources for translation and speech recognition
  • Corpus-Based Methods: reduce the amount of new
    training data for translation and speech
    recognition
30
Portability
  • Advantage: interlingua
  • Problem: writing semantic grammars
  • Domain dependent
  • Requires time, effort, and expertise
  • Approach
  • Grammar modularity
  • Domain action learning
  • Automatic/Interactive semantic grammar induction

31
Automatic Induction of Semantic Grammars
  • Seed grammar for a new domain has very limited
    coverage
  • Corpus of development data tagged with
    interlingua representations available
  • Expand the seed grammar by learning new rules for
    covering the same domain-actions
  • First step: how well can we do with no human
    intervention?
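For the no-human-intervention question, a deliberately naive baseline helps fix ideas: for each tagged utterance that no existing rule covers, replace the argument phrases the seed grammar already recognizes with their nonterminals and add the result as a flat new rule for the tagged domain action. The data structures and the `abstract`/`induce_rules` names are hypothetical; this illustrates the setting, not the induction method used in the project.

```python
# Hypothetical seed grammar pieces.
SEED_ARGS = {("double", "room"): "[room-type]", ("tomorrow",): "[time]"}
SEED_RULES = {"request-action+reservation+room": {("i", "want", "[room-type]")}}


def abstract(words):
    """Replace argument phrases the seed grammar knows with their nonterminals."""
    longest = max(len(p) for p in SEED_ARGS)
    out, i = [], 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            chunk = tuple(words[i:i + n])
            if chunk in SEED_ARGS:
                out.append(SEED_ARGS[chunk])
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return tuple(out)


def induce_rules(corpus, rules):
    """Return a copy of `rules` with one flat rule per uncovered tagged utterance."""
    grammar = {da: set(rhss) for da, rhss in rules.items()}
    for utterance, da in corpus:
        rhs = abstract(utterance.split())
        if rhs not in grammar.get(da, set()):  # "covered" = matches an existing rule exactly
            grammar.setdefault(da, set()).add(rhs)
    return grammar


corpus = [
    ("i want double room", "request-action+reservation+room"),
    ("i would like a double room tomorrow", "request-action+reservation+room"),
]
grammar = induce_rules(corpus, SEED_RULES)
for rhs in sorted(grammar["request-action+reservation+room"]):
    print(" ".join(rhs))
```

Such flat rules overfit to the exact wording of each utterance, which is precisely the kind of limitation a first no-human-intervention experiment would measure.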

32
System Evaluation Methodology
  • End-to-end evaluations conducted at the SDU
    (sentence) level
  • Multiple bilingual graders compare the input with
    translated output and assign a grade of Perfect,
    OK or Bad
  • OK: meaning of SDU comes across
  • Perfect: OK, and the output is fluent
  • Bad: translation incomplete or incorrect
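The slide does not say how the grades of multiple graders are combined. One simple possibility, sketched here as an assumption, is a majority vote per SDU with ties resolved toward the lower grade, followed by the fraction of SDUs graded Perfect and the fraction graded Perfect or OK (called "acceptable" below). The grades are made up.

```python
from collections import Counter

ORDER = {"Perfect": 2, "OK": 1, "Bad": 0}


def combine(grades: list[str]) -> str:
    """Majority vote over one SDU's grades; ties resolved toward the lower grade."""
    counts = Counter(grades)
    return max(counts, key=lambda g: (counts[g], -ORDER[g]))


def summarize(per_sdu: list[list[str]]) -> dict:
    final = [combine(g) for g in per_sdu]
    n = len(final)
    perfect = sum(g == "Perfect" for g in final)
    ok = sum(g == "OK" for g in final)
    return {"perfect": perfect / n, "acceptable": (perfect + ok) / n}


# Invented grades: one inner list per SDU, one entry per grader.
grades = [
    ["Perfect", "Perfect", "OK"],
    ["OK", "Bad", "OK"],
    ["Bad", "Bad", "OK"],
    ["Perfect", "OK"],  # tie, resolved to OK
]
print(summarize(grades))  # {'perfect': 0.25, 'acceptable': 0.75}
```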

33
C-STAR 1999 Evaluation Results
34
Evaluation - Progress Over Time
35
Current and Future Research
  • Expanding the domains of coverage
  • Machine-learning-based approaches to analysis:
    hybrid rule/stat analysis approach, grammar
    induction
  • Multiple interfaces: web, phone, PDAs
  • Integration of multiple MT approaches into a MEMT
    system
  • Disambiguation: improved sentence-level
    disambiguation; applying discourse and contextual
    information to disambiguation

36
Students Working on the Project
  • Chad Langley: hybrid rule/stat analyzer
  • Benjamin Han: grammar induction
  • Stan Jou: phone interfaces and recognizer
  • Alicia Tribble: language portability
  • Kornel Laskowski: H.323 speech recognizer

37
The C-STAR/Nespole!/LingWear Team
  • Project Leaders: Lori Levin, Alon Lavie, Alex
    Waibel, Bob Frederking, Tanja Schultz
  • Grammar and Component Developers: Donna
    Gates, Dorcas Wallace, Kay Peterson, Chad
    Langley, Benjamin Han, Alicia Tribble, Kornel
    Laskowski, Stan Jou, Celine Morel, Susie Burger