1
Overview of ICMI 02 (International Conference on
Multimodal Interfaces 2002)
  • Rong Yan

2
General Information
  • Official Website: http://www.is.cs.cmu.edu/icmi/
  • Submission results

3
Paper Submission
Total: 165 submissions
4
Paper Acceptance
Total: 87 papers in the Proceedings
5
Attendees
About 150 attendees
6
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
7
Keynote Speakers
  • Three keynote speakers:
  • Hiroshi Ishii, Director of the Tangible Media
    Group, MIT Media Lab
  • Lucas Parra, Sarnoff Corporation
  • Clifford Nass, Stanford University

8
Keynote Talk I
  • Hiroshi Ishii, "Tangible Bits: Designing the
    Seamless Interface between People, Bits, and
    Atoms"
  • Goal: change the "painted bits" of GUIs
    (Graphical User Interfaces) into "tangible bits"
    (physical user interfaces)

9
Keynote Talk I
  • Tangible user interfaces employ physical
    objects, surfaces, and spaces as tangible
    embodiments of digital information
  • Three key concepts:
  • Interactive Surfaces: surfaces (e.g. walls,
    desktops) become active interfaces
  • Coupling of Bits and Atoms: seamless coupling of
    graspable objects (e.g. books, cards) with the
    digital information that pertains to them
  • Ambient Media: use of ambient media (e.g. sound,
    light) as a background interface

10
Keynote Talk I
  • Demo Movie

11
Keynote Talk II
  • Lucas Parra, "Noninvasive Brain Computer
    Interfaces for Rehabilitation and Augmentation"
  • Brain Computer Interfaces:
  • Reading information directly from the brain,
    instead of typing, writing, and pointing

12
Keynote Talk II
  • Advances in non-invasive brain imaging
  • Avoids exposure to X-ray radiation
  • Applications:
  • Rehabilitation research (for the past two decades)
  • Augmenting HCI (now)

13
Keynote Talk III
  • Clifford Nass, "Integrating Multiple Modalities:
    Psychology and Design of Multimodal Interfaces"
  • Question: how do USERS integrate modalities and
    content? (social psychology experiments)
  • When should synthetic and veridical aspects of
    interfaces be mixed?
  • What is the link between the visual appearance
    and voice characteristics of pictorial agents?
  • How should interfaces respond to
    misunderstandings (a common problem in multimodal
    interfaces)?
  • How do multimodal interfaces manifest gender,
    personality, and emotion?

14
Keynote Talk III
  • An example of their findings:
  • When integrating voice and face, the two should
    be consistent
  • A non-native speaker's face with a non-native
    speaker's voice works better than a non-native
    speaker's face with a native speaker's voice
  • You can't internationalize a website or agent by
    manipulating only one dimension
  • Nass's slides

15
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
16
Speech Recognition/Generation
  • Covariance-tied Clustering in Speaker
    Identification (Zhiqiang Wang et al.)
  • Problem description:
  • Speaker identification trains a GMM per speaker
    with the EM algorithm, which suffers from local
    maxima
  • Solution: K-means clustering → better initial
    models
  • Focus: how to provide more robust initial models
    using K-means clustering (baseline sketched below)
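For context, a minimal sketch of the K-means-initialized GMM speaker-identification baseline that the paper starts from, written with scikit-learn; the helper names, diagonal covariances, and feature layout are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_per_speaker, n_mix=64):
    """Fit one GMM per speaker. init_params='kmeans' is the standard
    K-means initialization that the paper aims to make more robust."""
    models = {}
    for spk, X in features_per_speaker.items():   # X: (n_frames, n_mfcc)
        gmm = GaussianMixture(n_components=n_mix, covariance_type='diag',
                              init_params='kmeans', max_iter=100)
        models[spk] = gmm.fit(X)
    return models

def identify(models, X):
    """Classify an utterance by the model with the highest average log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(X))
```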

17
Speech Recognition/Generation
  • Euclidean distance
  • The most widely used distance for K-means
    clustering
  • Takes no account of the point distribution
  • Mahalanobis distance
  • Weights the distance by the covariance matrix
  • Spheres the data and defines a better distance
    (see the sketch below)
  • But data sparseness has to be solved:
  • More parameters to estimate (the covariance
    matrix)
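A minimal sketch of K-means under the Mahalanobis distance induced by one shared (tied) covariance matrix; whitening the data with that covariance makes ordinary Euclidean K-means equivalent. The function name and the single global covariance estimate are assumptions for illustration:

```python
import numpy as np

def mahalanobis_kmeans(X, k=2, n_iter=20, seed=0):
    """K-means under a shared-covariance Mahalanobis distance:
    sphere (whiten) the data, then run Euclidean K-means on it."""
    rng = np.random.default_rng(seed)
    cov = np.cov(X, rowvar=False)               # tied covariance estimate
    M = np.linalg.cholesky(np.linalg.inv(cov))  # whitening transform
    Z = X @ M                                   # sphered data
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):                      # keep old center if a cluster empties
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```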

18
Speech Recognition/Generation
  • Layered Clustering Algorithm (sketched below):
  • 1. Train one covariance matrix on all the speech
    data
  • 2. Use the K-means algorithm with the Mahalanobis
    distance to split the data into two clusters
  • 3. Estimate two new covariance matrices, one per
    cluster
  • 4. Iterate steps 2-3 until there are enough
    clusters
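A direct transcription of those steps, reusing the `mahalanobis_kmeans` sketch above (it re-estimates a covariance from whatever data it receives, so each split uses cluster-local covariances, matching steps 2-3); `layered_clustering` is an assumed name and the sketch assumes every cluster stays well-populated:

```python
def layered_clustering(X, n_layers=3):
    """Split recursively: each layer bisects every current cluster with
    Mahalanobis K-means, yielding 2**n_layers clusters overall."""
    clusters = [X]
    for _ in range(n_layers):
        split = []
        for C in clusters:
            labels = mahalanobis_kmeans(C, k=2)  # covariance re-trained per cluster
            split += [C[labels == 0], C[labels == 1]]
        clusters = split
    return clusters   # cluster means/covariances can seed the GMM
```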

19
Speech Recognition/Generation
  • Speaker database (83 male, 83 female speakers)
  • 26-dimensional MFCC vectors
  • 64-mixture GMMs
  • Error rate (results table not reproduced)
  • The numbers denote the number of clustering layers

20
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
21
Gesturing
  • Prosody-based Co-analysis for Continuous
    Recognition of Coverbal Gestures (Sanshzar
    Kettebekov et al.)
  • Motivation: better ways to recognize gestures
  • Previous work:
  • Combining speech and gesture to boost
    classification (semantically motivated)
  • This work:
  • Fuses more elementary features with the gesture
    features: prosodic features, namely the
    fundamental frequency (F0) contour and voiceless
    intervals (pauses)
  • Constructs a co-occurrence model (feature
    extraction sketched below)
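A minimal sketch of extracting the two prosodic cues the paper names, the F0 contour and voiceless intervals; the use of `librosa.pyin` and the 16 kHz rate are assumptions for illustration, not the authors' pipeline:

```python
import numpy as np
import librosa

def prosodic_features(wav_path):
    """Per-frame prosody stream: F0 contour plus a voiceless (pause) flag."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz('C2'),
        fmax=librosa.note_to_hz('C6'), sr=sr)
    pause = (~voiced_flag).astype(float)   # 1.0 inside voiceless intervals
    f0 = np.nan_to_num(f0)                 # pyin returns NaN for unvoiced frames
    return np.stack([f0, pause], axis=1)   # (n_frames, 2), ready to pair
                                           # with per-frame gesture features
```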

22
Gesturing
  • Results:
  • Correct recognition rate: 81.8% (vs. 72.4% with
    visual features only)
  • Deletion error: 8.6% (vs. 16.1%)
  • Substitution error: 5.8% (vs. 9.2%)
  • Comment: co-analysis of audio-visual features
    should also be helpful elsewhere, e.g. for
    monologue detection in TREC

23
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
24
Multilingual Interface
  • Improved Named Entity Translation and Bilingual
    Named Entity Extraction (Fei Huang, Stephan
    Vogel)
  • Motivation: improve named entity annotation
    quality using bilingual corpus information
  • Basic idea:
  • Cross-lingual information can hint at named
    entity extraction errors in the baseline system
  • An extracted named entity with a high alignment
    cost tends to be wrong

25
Multilingual Interface
  • Proposed NE annotation scheme (greedy step
    sketched below):
  • Annotate each side of the bilingual corpus
    separately, using BBN's IdentiFinder
  • Compute the Augmented Sentence Alignment Cost
    (ASAC) of the baseline annotation
  • Find all possible NEs in the corpus
  • Use a greedy approximation algorithm to find the
    alignment with minimal ASAC; if the cost is lower
    than the baseline's, accept the alignment
  • Tag each unaligned NE with its most frequent type
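A minimal sketch of the greedy approximation step under stated assumptions: `cost(s, t)` is a hypothetical stand-in for the ASAC contribution of pairing source NE `s` with target NE `t`, and alignment is assumed one-to-one:

```python
def greedy_ne_alignment(src_nes, tgt_nes, cost):
    """Greedily accept the cheapest remaining source/target NE pair."""
    candidates = sorted(
        (cost(s, t), i, j)
        for i, s in enumerate(src_nes)
        for j, t in enumerate(tgt_nes))
    used_src, used_tgt, pairs = set(), set(), []
    for c, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            pairs.append((src_nes[i], tgt_nes[j], c))
            used_src.add(i)
            used_tgt.add(j)
    return pairs   # accept only if the total cost beats the baseline annotation
```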

26
Multilingual Interface
  • Results
  • Comment: cross-lingual relations contain useful
    information

27
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
28
Sensors, Tools and Platforms
  • Audiovisual arrays for untethered spoken
    interfaces (Kevin Wilson, Trevor Darrell, et al.)
  • Motivation:
  • Given a distant speaker at a known location, a
    microphone array can improve speech
    recognition
  • Estimating a speaker's location in a reverberant
    environment is difficult, so a video camera
    array is used to aid localization
  • An audio-visual array approach to tracking the
    speaker

29
Sensors, Tools and Platforms
  • Two array processing problems:
  • Beamforming: spatial filtering of the speech
    signal
  • Amplify the signals coming from a selected
    region by delaying/filtering and summing across
    the array (see the sketch below)
  • Source localization: estimate the location of
    the signal source
  • One way: beamform toward every candidate
    location and choose the strongest (large
    computational cost)
  • Another way: use the delays across the array to
    calculate the location
  • The two problems are complementary
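A minimal delay-and-sum beamformer sketch of the "amplify signals from a selected region" idea; the function name, the 343 m/s speed of sound, and sample-accurate (non-fractional) delays are simplifying assumptions:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, source_pos, fs, c=343.0):
    """Align each channel by its propagation delay from a candidate
    source position, then average: signals from that position add
    coherently while sound from elsewhere does not."""
    mic_signals = np.asarray(mic_signals, dtype=float)   # (n_mics, n_samples)
    dists = np.linalg.norm(np.asarray(mic_positions) - source_pos, axis=1)
    delays = (dists - dists.min()) / c                   # relative delays (s)
    shifts = np.round(delays * fs).astype(int)           # delays in samples
    n = mic_signals.shape[1] - shifts.max()
    aligned = [sig[s:s + n] for sig, s in zip(mic_signals, shifts)]
    return np.mean(aligned, axis=0)                      # steered output
```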

30
Sensors, Tools and Platforms
  • Person tracking with multiple stereo views
  • Aids source localization
  • Beamforming: audio data only
  • Source localization: audio + video
  • Process (sketched below):
  • Vision tracker: initial guess for the location,
    accurate to within one meter
  • Beam power: gradient search for a local
    maximum
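A minimal sketch of that two-stage process, reusing `delay_and_sum` from the sketch above: start from the vision tracker's estimate and climb the beam power surface. A simple coordinate search stands in for the paper's gradient method, and the step size and iteration count are assumptions:

```python
import numpy as np

def localize(mic_signals, mic_positions, fs, vision_guess,
             step=0.05, n_iter=50):
    """Refine a vision-tracker position estimate by climbing beam power."""
    def beam_power(p):
        out = delay_and_sum(mic_signals, mic_positions, p, fs)
        return float(np.sum(out ** 2))

    pos = np.asarray(vision_guess, dtype=float)
    best = beam_power(pos)
    moves = np.vstack([np.eye(3), -np.eye(3)]) * step   # +/- x, y, z steps
    for _ in range(n_iter):
        improved = False
        for d in moves:
            p = beam_power(pos + d)
            if p > best:
                pos, best, improved = pos + d, p, True
        if not improved:
            break                                       # local maximum reached
    return pos
```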

31
Sensors, Tools and Platforms
  • Test Environments

32
Sensors, Tools and Platforms
  • Results

33
Summary
  • Informedia is related to multimodal interfaces:
  • Information gathering, searching, and browsing
  • Experience On Demand (EOD)
  • Capturing, Coordinating and Remembering Human
    Experience (CCRHE)
  • Cross-modal relationships
  • Cross-modal consistency