Agenda - PowerPoint PPT Presentation

Slides: 22
Provided by: i13p
Transcript and Presenter's Notes

Title: Agenda


1
Agenda
Monday
Tuesday
Wednesday
9:00-12:00 planning next steps (esp. review)
9:00-12:00 ITC, UPC demo, SonyUKA
WP10 Towards Lab/Museum Demo
13:00-15:30 suggestions/discussion on demonstrator type(s) and location(s)
14:00-15:00 project status and administrative items
15:30-18:30 reports on work done (from all sites)
16:00-18:00 discussion / decision on demonstrator type(s) and location(s)
19:00 … dinner
2
Status of the Project
3
Deliverable Status (now 23)
 
Deliverable (due)
9.1 Catalan Wordnet (due 12)
5.1 Software for robust speech rec. and speaker localisation / ID under reverberant/noisy condit. (due 24)
10.2 Initial Dialog/Discourse Model Software Component (due 15)
6.3 Software to adapt acoustic and lang. model to current context; 6.2 Conversational speech recog. optimised with data from distant-speaking microphones (due 24, 18)
7.2 Description of the interface to the conversation context model program for selecting appropr. mode of presentation (due 24)
3.2 Additional Development Test Data Package and update of test-bed software (due 24)
2.2, 10.3 (due 30)
7.3, 9.2 (due 32)
4.3, 11.2 (due 35)
8.2 (due 39)
4.2 Automatic selection of video presentation based on communication activity (due 24)
4
Project Review
  • probably end of October (week of 27th)
  • as part of greater review event together with
    other projects
  • same style as last year
  • probably same location as last year
  • same reviewers will be asked to do the review
    again (Jean-Marc Langé, Christian Wellekens)

5
Options for Demonstrator
  • Place/Event; Duration; Domain
  • a) Forum 2004; 6 Weeks; Forum Visitor Service
  • b) Seminar at ACL; 1-x Days; any
  • c) UPC 2003; 1-x Days; any
  • d) Museum Grenoble; ?; Museum Visitor Service
  • e) Museum KA (ZKM); 1 Week; Museum Visitor Service
  • f) Lab INPG and/or UJF; 1 Day; Lab Visitor Service
  • g) Lab Karlsruhe; 1 Day; Lab Visitor Service
  • h) Lab Sony; 1 Day; Lab Visitor Service

preferably Barcelona
6
Options Pros and Cons
Place/Event Pro Con
a) Forum 2004: Visibility, Contract Duration, Forum, Data, Cost
b) Seminar at ACL: at Forum, Languages, Duration, Domain (any)
c) UPC 2003: Languages, Duration, Visibility Domain, Organisation, Cost
d) Museum Grenoble: Experience, Data
e) Museum KA (ZKM): Cooperation Languages
f) g) h) Labs at INPG, UJF, UKA, Sony: Organisation Languages, Visibility
7
Personnel at Karlsruhe
  • Ivica Rogina will leave by the end of
    September (due to legislation)
  • core Team will be
  • Petra Gieselmann (Dialog)
  • Matthias Wölfel (Speech, Integration)
  • Hartwig Holzapfel (Dialog, Integration)
  • Tobias Kluge (Room, Integration)
  • others
  • Alex Waibel

8
OAA Integration Status < May 2003
[Diagram: OAA components with their integration status. Components shown: Voice Input Close, Binaural Head, Janus English Close N-gram, Close CFG, Janus Spanish Close, Janus Cat. Close, Camera Man, Testimony Tracker, Focus of Attention, Augmented Table, Information Retrieval, Dialog, Translation, Room. Legend: available and integrated; available, not integrated; in development; partially done.]
9
OAA Integration
[Diagram: OAA components and their numbered connections (1 to 18). Components shown: Voice Input Distant, Voice Input Close, Binaural Head, Janus English Distant, Janus English Close N-gram, Close CFG, Janus Spanish Close, Janus Spa. Dist., Janus Cat. Close, Janus Cat. Dist., Camera Man, Testimony Tracker, Focus of Attention, Topic Detection, Augmented Table, Information Retrieval, Dialog, Agent, Translation, Room. Further labels: Message, Topics, Testimonies, Data.]
10
OAA Integration Status May 2003
[Diagram: OAA components and their numbered connections (1 to 18), with integration status. Components shown: Voice Input Distant, Voice Input Close, Binaural Head, Janus English Distant, Janus English Close N-gram, Close CFG, Janus Spanish Close, Janus Spa. Dist., Janus Cat. Close, Janus Cat. Dist., Camera Man, Testimony Tracker, Focus of Attention, Topic Detection, Augmented Table, Information Retrieval, Dialog, Translation, Room. Further labels: Topics, Testimonies. Legend: available and integrated; available, not integrated; in development; partially done; planned.]
11
OAA Integration Status Sept. 2003
[Diagram: OAA components and their numbered connections (1 to 18), with integration status. Components shown: Voice Input Distant, Voice Input Close, Binaural Head, Janus English Distant, Janus English Close N-gram, Close CFG, Janus Spanish Close, Janus Spa. Dist., Janus Cat. Close, Janus Cat. Dist., Camera Man, Testimony Tracker, Focus of Attention, Topic Detection, Augmented Table, Information Retrieval, Dialog, Translation, Room. Further labels: Topics, Testimonies. Legend: available and integrated; available, not integrated; in development; partially done; planned.]
12
Topic Spotting Experimental Environment
[Diagram labels: Audio Stream, 1-1½m, Segmenter, Distant English Recogniser, OAA, Topic Detector]
  • seven topics (from the Switchboard evaluation) were
    offered
  • fully spontaneous multi-human conversation

13
Speech Recognizer
  • As presented at ARPA RT-03 Workshop, Boston
  • trained on 265h Switchboard and
    Call-Home-English data
  • no model adaptation (no MLLR or VTLN)
  • linear feature space adaptation, semi-tied full
    covariances
  • fully continuous models (10k codebooks, 50k
    mixtures)
  • vocabulary: 41k words from Switchboard, Broadcast
    News, CNN
  • language model: 3-gram SWB, 5-gram class SWB,
    4-gram BN, 4-gram CNN
  • performance on SWB evaluation 2003: 23.4% WER
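
For readers unfamiliar with the WER figure quoted above, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the example sentences are hypothetical, and this is not the scoring tool used for the evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution and one deletion over four words.
example_wer = word_error_rate("we watched the matrix", "we watch the")
print(f"WER: {example_wer:.1%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.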

14
First Experimental Results
correctly recognized / incorrectly inserted
lassie movies movie pavillion people probably
volume old stuff try system see puts lassie pets
movies nice dinosaurs count pets permit movies
source ten talk years cut taxes cinema watching
matrix mexico matrix old movie rather drugs two
matrix metrics two came year movie theater
tuesday promotion mad fold matrix three saturday
watch wheel rider little legalize bit bizarre
matrix computer game screen saver played
played mall seen sprint playing game play like
movie screen thing first rest person shooter
right three around kill all exactly smoking guns
right actually read statistics many germans
jumpes smoke almost forty percent germans jobs
smoke decreasing days cigarette tax went idea
suppose case especially kids girls still
human percentage men higher among olders mobiles
switch topics terrorists smoke persons world
trade center towers technicalities looked like
smoking cigarettes sponsored stuff lucky strike
trying funny show terrorism alzheimers
politically incorrect topic exhibit like pictures
exhibitions parallels construct warhol
museum mothers desctruction cook photograps
paintings certainly paintings warhol warhol topic
talking health help fitness exercise regularly
went going gym right true actually playing
volleyball really healthy fitness fun fun phone
fitness studio swimming sitting thomas schaaf
sharks around show sure education
depiction computers education asking question
computers harm morning improve education heard
c.m.u. developed system helps children read truth
good really fun learn lessons read specifically
insist agree american style reading claim tested
high schools three old children high school
children johanna oceans never sure right probably
testing see kindergarten kindergarten right
remember legitimizing ended guy wall arpa
awful conference coincidence said end america
like five percent analphabets cause closed damage
like means hundred million dollars guys per year
educational program reduce beans used damage
hundred billion dollars probably hundred swimming
pool internet like anybody hungry restaurant know
couple restaurants days italian need chinese
mensa mensa already morning italian restaurant
sparkling chinese better mexican food quite right
requirement true two weeks really read frequent
good spicy spicy
15
Statistics on First Experiment
Testset: 729 words (plus many noises)
Content-Words: 262 words
Topic-Indicators: 124 words
Correctly recognised: 54 content-words (20.6%)
Missed: 208 content-words (79.4%)
Inserted: 56 content-words
Correctly recognised: 29 topic-indicators (23.4%)
Missed: 95 topic-indicators (76.6%)
Inserted: 9 topic-indicators
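
The percentages above are recall figures (correct words divided by correct plus missed words). A small sketch that reproduces them from the counts on this slide follows. The precision values are not on the slide; they are derived here under the assumption that "inserted" words count as false positives.

```python
def recall(correct: int, missed: int) -> float:
    """Share of reference words that were recognised."""
    return correct / (correct + missed)


def precision(correct: int, inserted: int) -> float:
    """Share of recognised words that were right (assumes inserted = false positives)."""
    return correct / (correct + inserted)


# Counts taken from the slide.
counts = {
    "content-words": {"correct": 54, "missed": 208, "inserted": 56},
    "topic-indicators": {"correct": 29, "missed": 95, "inserted": 9},
}

for name, c in counts.items():
    r = recall(c["correct"], c["missed"])
    p = precision(c["correct"], c["inserted"])
    print(f"{name}: recall {r:.1%}, precision {p:.1%}")
```

Under that assumption, the topic-indicators that are recognised are mostly correct, while most of them are still missed.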
16
Problems Identified
  • wider range of signal energy -> difficulties
    with segmentation of speech
    - some utterances were missed completely
    - laughter and other noises of close people can
      have much more power than actual speech from
      distant people
  • -> difficulties with signal adaptation
  • so far very poor recognition accuracy
  • when talking about topic A, many triggers for
    topic B are uttered: smoking -> (cigarette) tax,
    Lassie movies -> pets, etc.

17
Room for Improvements
  • statistics on phrases
  • WordNet: create synonyms
  • tf-idf statistics on available topics
  • quality of speech recognizer (model and signal
    adaptation)
  • adaptive speech segmenter
  • improved signal quality (distant signal
    binaural processing)
  • adapt vocabulary and language model to set of
    topics
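
As an illustration of the tf-idf item above, here is a minimal sketch that scores words per topic so that words occurring in several topics are weighted down. The topic names and word lists are toy data made up for this example (loosely based on words from the first experiment), and `tfidf` is a hypothetical helper, not part of the project software.

```python
import math
from collections import Counter


def tfidf(topic_docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return tf-idf scores per topic, treating each topic as one document."""
    n_topics = len(topic_docs)

    # Document frequency: number of topics in which each word occurs.
    df = Counter()
    for words in topic_docs.values():
        df.update(set(words))

    scores = {}
    for topic, words in topic_docs.items():
        tf = Counter(words)
        total = len(words)
        scores[topic] = {
            word: (count / total) * math.log(n_topics / df[word])
            for word, count in tf.items()
        }
    return scores


# Toy data (assumption): "pets" shows up in two topics, as in the
# Lassie movies -> pets confusion.
topics = {
    "movies": ["movie", "matrix", "cinema", "watch", "movie", "pets"],
    "smoking": ["smoke", "cigarette", "tax", "smoke", "percent"],
    "pets": ["pets", "dog", "cat", "pets"],
}

scores = tfidf(topics)
for topic, words in scores.items():
    best = max(words, key=words.get)
    print(f"{topic}: strongest indicator '{best}' ({words[best]:.3f})")
```

With this weighting, a word shared between topics contributes less than a word unique to one topic, which is one way to reduce cross-topic triggers.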

18
Improvements to the Lecture Tracker
  • optimised speed to 0.2 × realtime (P4 2 GHz)
  • optimised automatic segmenter (min/max segment
    sizes)
  • added laser-pointer interaction functionality
  • stable pointing (projected finger)
  • highlighting of selected presentation items
  • drawing into presentation
  • operating pull-down menus within presentation

19
The ISL Lecture-Talk-Presentation Corpus
41 lectures, talks, presentations (29 hours of recorded audio)
  • recorded with lavaliere close microphone
  • plus video if possible
  • plus slides if available
  • transcription on word level, disfluencies, spont. events
  • topics: Speech Technology (ASR, IR, NLP, TTS,
    Multimodality)
  • Native speakers of English
  • Non-native speakers of English with accent

20
Lecture-Talk-Presentation Corpus
7 Lectures: CMU faculty, ave duration 73 min, stdev 9 min
23 Presentations: student presentations, ave duration 29 min, stdev 19 min
11 Talks: guest speakers and invited talks, ave duration 52 min, stdev 20 min
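
A quick arithmetic check, as a sketch: multiplying the counts by the average durations given on this slide yields a total that is consistent with the 41 recordings and roughly 29 hours stated for the corpus. The averages are rounded on the slide, so the total is approximate.

```python
# (number of recordings, average duration in minutes), as given on the slide
corpus = {
    "lectures": (7, 73),
    "presentations": (23, 29),
    "talks": (11, 52),
}

total_recordings = sum(count for count, _ in corpus.values())
total_minutes = sum(count * avg for count, avg in corpus.values())
total_hours = total_minutes / 60

print(f"{total_recordings} recordings, about {total_hours:.1f} hours")
```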
21
Lecture-Talk-Presentation Corpus
  • Transcription - first pass, second pass,
    final check
  • segmented in turns or logical paragraphs
  • contributions from the audience are
    transcribed as long as they were understandable
  • transcription conventions similar to
    Verbmobil conventions

Susi Burger: Send us your data. CMU/ISL offers
to transcribe English data recorded at partner
locations.