1
A Multi-Perspective Evaluation of the NESPOLE!
Speech-to-Speech Translation System
  • Alon Lavie, Carnegie Mellon University
  • Florian Metze, University of Karlsruhe
  • Roldano Cattoni, ITC-irst
  • Erica Costantini, University of Trieste
July 8, 2002

2
Outline
  • The NESPOLE! Project
  • Approach and System Architecture
  • Performance and Usability Challenges
  • Distributed real-time performance over the
    Internet
  • Integration and use of multi-modal capabilities
  • End-to-end translation performance
  • Lessons learned and conclusions

3
The NESPOLE! Project
  • Speech-to-speech translation for E-Commerce
    applications
  • Partners: CMU, Univ. of Karlsruhe, ITC-irst,
    UJF-CLIPS, AETHRA, APT-Trentino
  • Builds on successful collaboration within C-STAR
  • Improved limited-domain speech translation
  • Experiments with multimodality and with
    multi-engine MT (MEMT)
  • Showcase-1: Travel and Tourism in Trentino,
    completed and demonstrated in Nov-2001
  • Showcase-2: expanded travel and medical service
    domain

4
Speech-to-speech in E-commerce
  • Replace current passive web E-commerce with live
    interaction capabilities
  • Client starts via the web and can easily connect
    to an agent for specific information
  • Thin client: very little special hardware and
    software on the client PC (browser, MS NetMeeting,
    shared whiteboard)

5
NESPOLE! User Interfaces
6
NESPOLE! Architecture
7
Distributed S2S Translation over the Internet
8
Network Traffic Impact
9
NESPOLE! Monitor
10
Aethra Whiteboard
11
Recent Developments (Apr-02)
  • Improved analysis and generation grammars (using
    old C-STAR data)
  • Improved speech recognition (SR) engines
  • Packet-loss, video, and modem connection tests
  • Data Collection for Showcase-2A
  • Evaluation Scheme Experiment
  • Paper and Demo at HLT-02
  • Paper submissions to ACL-02, ICSLP-02, ESSLLI-02

12
IF (Interchange Format) Status Report
  • Presented by Donna Gates

13
WP5 HLT Modules
  • Data Collection for Showcase-2A completed in
    February 2002
  • Status of transcriptions from all sites?
  • CMU will maintain a data repository (Alon is
    collecting all data CDs here)
  • IF discussions and development have already
    started (Donna)
  • Development schedule?

14
WP7 Evaluation
  • D9 (Evaluation of Showcase-1) report: draft
    circulated earlier this week
  • Each site should verify that the most up-to-date
    results are being reported
  • Include detailed tables in the report?
  • Majority vote: finalize a common procedure
  • New evaluation experiments

15
Majority Vote Scheme
  • Issue: did all sites use the same guidelines?
  • What to do when there is no majority?
  • e.g., 4 graders assign P/P/K/K
  • What to do when there is complete disagreement?
  • e.g., 3 graders assign P/K/B (see the sketch
    below)
  • Do we need to recalculate scores from the
    previous evaluation?
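One way to make the tie cases concrete is a small voting function. The sketch below is a minimal Python illustration, not the procedure the consortium adopted; the P/K/B labels (perfect/OK/bad) follow the examples above, and falling back to the middle grade K when no strict majority exists is an assumption.

```python
from collections import Counter

def majority_grade(grades, fallback="K"):
    """Return the strict-majority grade, or `fallback` when no
    strict majority exists (a tie such as P/P/K/K, or complete
    disagreement such as P/K/B)."""
    label, count = Counter(grades).most_common(1)[0]
    return label if count > len(grades) / 2 else fallback

# The two problem cases raised above (P = perfect, K = OK, B = bad):
print(majority_grade(["P", "P", "K", "K"]))  # no majority       -> "K"
print(majority_grade(["P", "K", "B"]))       # full disagreement -> "K"
print(majority_grade(["P", "P", "P", "K"]))  # strict majority   -> "P"
```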

16
New Evaluation Experiments
  • We are investigating three main issues:
  • Binary versus 3-way grading
  • Majority vote versus averaging of scores (see the
    sketch after this list)
  • Intercoder and intracoder agreement
  • Grading experiment:
  • Four groups, three graders in each group
  • Each group grades two sets, two weeks apart
  • The sets are different but have a large common
    overlap
  • Groups differ in the evaluation scheme used
    (binary / 3-way)
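To illustrate the "majority vote versus averaging" comparison, grades can be mapped to numbers and averaged per utterance. The numeric mapping below (P=1.0, K=0.5, B=0.0 for the 3-way scheme, and 1/0 for a binary acceptable/bad scheme) is an illustrative assumption, not the scoring actually used in the experiment.

```python
# Illustrative numeric mappings (assumptions, not the project's scoring):
THREE_WAY = {"P": 1.0, "K": 0.5, "B": 0.0}  # perfect / OK / bad
BINARY = {"A": 1.0, "B": 0.0}               # acceptable / bad

def average_score(grades, mapping):
    """Average the graders' numeric scores for one utterance."""
    return sum(mapping[g] for g in grades) / len(grades)

# Three graders on the same utterance, under each scheme:
print(average_score(["P", "K", "K"], THREE_WAY))  # 0.666...
print(average_score(["A", "A", "B"], BINARY))     # 0.666...
```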

17
Planned Analysis of Data
  • Compare results across grading schemes (binary
    vs. 3-way) on the same set of data
  • Compare majority-vote scores with average scores
  • Evaluate intercoder agreement between graders (on
    the same set, with the same scheme); see the
    sketch after this list
  • Evaluate intracoder agreement of the same grader
    (on the overlap data in the two sets, with the
    same grading scheme in both sessions)
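A minimal sketch of the agreement computations, assuming raw pairwise agreement plus Cohen's kappa as the statistics (the slides do not say which statistic was used); the grader data below is made up.

```python
from itertools import combinations

def pairwise_agreement(a, b):
    """Fraction of items on which two graders assign the same grade."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two graders, corrected for
    the agreement expected by chance from their label distributions."""
    n = len(a)
    p_obs = pairwise_agreement(a, b)
    p_exp = sum((a.count(l) / n) * (b.count(l) / n)
                for l in set(a) | set(b))
    return (p_obs - p_exp) / (1 - p_exp)

# Three graders in one group, grading the same five utterances
# (intercoder agreement; intracoder agreement would instead compare
# one grader's week-1 and week-2 grades on the overlap data):
graders = [list("PPKKB"), list("PKKKB"), list("PPKBB")]
for (i, gi), (j, gj) in combinations(enumerate(graders), 2):
    print(f"graders {i} vs {j}: "
          f"agreement={pairwise_agreement(gi, gj):.2f}, "
          f"kappa={cohens_kappa(gi, gj):.2f}")
```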

18
Preliminary Results
Group (procedure W1/W2)  W1 Acc  W1 Bad  W2 Acc  W2 Bad
Gr1 (binary/3-way)         50.2    49.8    48.7    51.3
Gr2 (3-way/binary)         52.4    47.6    48.8    51.2
Gr3 (3-way/3-way)          53.8    46.2    54.9    45.1
Gr4 (binary/binary)        49.0    51.0    50.0    50.0
(Acc/Bad: percentage of translations graded acceptable/bad in the
week-1 (W1) and week-2 (W2) grading sessions; each pair sums to 100.)
19
Plans for Final Evaluations
  • Improved end-to-end evaluations
  • Additional component evaluations?
  • Additional user studies?
  • How do we evaluate user interfaces and
    communication effectiveness?