1
Overview of ICMI 02 (International Conference on
Multimodal Interfaces 2002)
  • Rong Yan

2
General Information
  • Official Website: http://www.is.cs.cmu.edu/icmi/
  • Submission results

3
Paper Submission
Total: 165 submissions
4
Paper Acceptance
Total: 87 papers in the Proceedings
5
Attendees
About 150 attendees
6
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
7
Keynote Speakers
  • Three keynote speakers:
  • Hiroshi Ishii, Director of the Tangible Media
    Group, MIT Media Lab
  • Lucas Parra, Sarnoff Corporation
  • Clifford Nass, Stanford University

8
Keynote Talk I
  • Hiroshi Ishii, "Tangible Bits: Designing the
    Seamless Interface between People, Bits, and
    Atoms"
  • Goal: change the "painted bits" of GUIs
    (Graphical User Interfaces) into "tangible bits"
    (physical user interfaces)

9
Keynote Talk I
  • Tangible user interfaces employ physical
    objects, surfaces, and spaces as tangible
    embodiments of digital information
  • Three key concepts:
  • Interactive Surfaces: surfaces (e.g. walls,
    desktops) become active interfaces
  • Coupling of Bits and Atoms: seamless coupling of
    graspable objects (e.g. books, cards) with the
    digital information that pertains to them
  • Ambient Media: use of ambient media (e.g. sound,
    light) as a background interface

10
Keynote Talk I
  • Demo Movie

11
Keynote Talk II
  • Lucas Parra, "Noninvasive Brain Computer
    Interfaces for Rehabilitation and Augmentation"
  • Brain Computer Interfaces:
  • Reading information directly from the brain,
    instead of typing, writing, and pointing

12
Keynote Talk II
  • Advances in non-invasive brain imaging
  • Avoids exposure to X-ray radiation
  • Applications:
  • Rehabilitation research (for the past two decades)
  • Augmenting HCI (now)

13
Keynote Talk III
  • Clifford Nass, "Integrating Multiple Modalities:
    Psychology and Design of Multimodal Interfaces"
  • Question: how do USERS integrate modalities and
    content? (social psychology experiments)
  • When should synthetic and veridical aspects of
    interfaces be mixed?
  • What is the link between the visual appearance
    and voice characteristics of pictorial agents?
  • How should interfaces respond to
    misunderstandings (a common problem in multimodal
    interfaces)?
  • How do multimodal interfaces manifest gender,
    personality, and emotion?

14
Keynote Talk III
  • An example of their findings:
  • When integrating voice and face, the two should
    be consistent
  • A non-native speaker's face with a non-native
    speaker's voice works better than a non-native
    speaker's face with a native speaker's voice
  • You can't internationalize a website or agent by
    manipulating only one dimension
  • Nass's slides

15
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
16
Speech Recognition/Generation
  • Covariance-tied Clustering in Speaker
    Identification (Zhiqiang Wang et al.)
  • Problem description:
  • Speaker identification trains a GMM per speaker
    with the EM algorithm, which suffers from local
    maxima
  • Solution: K-means clustering → better initial
    models
  • Focus: how to provide more robust initial models
    using K-means clustering (baseline sketched below)
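For context, a minimal sketch of the K-means-initialized GMM speaker-identification baseline that the paper starts from, written with scikit-learn; the helper names, diagonal covariances, and feature layout are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_speaker_models(features_per_speaker, n_mix=64):
    """Fit one GMM per speaker. init_params='kmeans' is the standard
    K-means initialization that the paper aims to make more robust."""
    models = {}
    for spk, X in features_per_speaker.items():   # X: (n_frames, n_mfcc)
        gmm = GaussianMixture(n_components=n_mix, covariance_type='diag',
                              init_params='kmeans', max_iter=100)
        models[spk] = gmm.fit(X)
    return models

def identify(models, X):
    """Classify an utterance by the model with the highest average log-likelihood."""
    return max(models, key=lambda spk: models[spk].score(X))
```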

17
Speech Recognition/Generation
  • Euclidean distance
  • The most widely used distance for K-means
    clustering
  • Takes no account of the point distribution
  • Mahalanobis distance
  • Weights the distance by the covariance matrix
  • Spheres the data and defines a better distance
    (see the sketch below)
  • But data sparseness has to be solved:
  • More parameters to estimate (the covariance
    matrix)
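A minimal sketch of K-means under the Mahalanobis distance induced by one shared (tied) covariance matrix; whitening the data with that covariance makes ordinary Euclidean K-means equivalent. The function name and the single global covariance estimate are assumptions for illustration:

```python
import numpy as np

def mahalanobis_kmeans(X, k=2, n_iter=20, seed=0):
    """K-means under a shared-covariance Mahalanobis distance:
    sphere (whiten) the data, then run Euclidean K-means on it."""
    rng = np.random.default_rng(seed)
    cov = np.cov(X, rowvar=False)               # tied covariance estimate
    M = np.linalg.cholesky(np.linalg.inv(cov))  # whitening transform
    Z = X @ M                                   # sphered data
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):                      # keep old center if a cluster empties
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```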

18
Speech Recognition/Generation
  • Layered Clustering Algorithm (sketched below):
  • 1. Train one covariance matrix on all the speech
    data
  • 2. Use the K-means algorithm with the Mahalanobis
    distance to split the data into two clusters
  • 3. Estimate two new covariance matrices, one per
    cluster
  • 4. Iterate steps 2-3 until there are enough
    clusters
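A direct transcription of those steps, reusing the `mahalanobis_kmeans` sketch above (it re-estimates a covariance from whatever data it receives, so each split uses cluster-local covariances, matching steps 2-3); `layered_clustering` is an assumed name and the sketch assumes every cluster stays well-populated:

```python
def layered_clustering(X, n_layers=3):
    """Split recursively: each layer bisects every current cluster with
    Mahalanobis K-means, yielding 2**n_layers clusters overall."""
    clusters = [X]
    for _ in range(n_layers):
        split = []
        for C in clusters:
            labels = mahalanobis_kmeans(C, k=2)  # covariance re-trained per cluster
            split += [C[labels == 0], C[labels == 1]]
        clusters = split
    return clusters   # cluster means/covariances can seed the GMM
```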

19
Speech Recognition/Generation
  • Speaker database (83 male, 83 female speakers)
  • 26-dimensional MFCC vectors
  • 64-mixture GMMs
  • Error rate (results table not reproduced)
  • The numbers denote the number of clustering layers

20
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
21
Gesturing
  • Prosody-based Co-analysis for Continuous
    Recognition of Coverbal Gestures (Sanshzar
    Kettebekov et al.)
  • Motivation: better ways to recognize gestures
  • Previous work:
  • Combining speech and gesture to boost
    classification (semantically motivated)
  • This work:
  • Fuses more elementary features with the gesture
    features: prosodic features, namely the
    fundamental frequency (F0) contour and voiceless
    intervals (pauses)
  • Constructs a co-occurrence model (feature
    extraction sketched below)
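A minimal sketch of extracting the two prosodic cues the paper names, the F0 contour and voiceless intervals; the use of `librosa.pyin` and the 16 kHz rate are assumptions for illustration, not the authors' pipeline:

```python
import numpy as np
import librosa

def prosodic_features(wav_path):
    """Per-frame prosody stream: F0 contour plus a voiceless (pause) flag."""
    y, sr = librosa.load(wav_path, sr=16000)
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz('C2'),
        fmax=librosa.note_to_hz('C6'), sr=sr)
    pause = (~voiced_flag).astype(float)   # 1.0 inside voiceless intervals
    f0 = np.nan_to_num(f0)                 # pyin returns NaN for unvoiced frames
    return np.stack([f0, pause], axis=1)   # (n_frames, 2), ready to pair
                                           # with per-frame gesture features
```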

22
Gesturing
  • Results:
  • Correct recognition rate: 81.8% (vs. 72.4% with
    visual features only)
  • Deletion error: 8.6% (vs. 16.1%)
  • Substitution error: 5.8% (vs. 9.2%)
  • Comment: co-analysis of audio-visual features
    should also be helpful elsewhere, e.g. for
    monologue detection in TREC

23
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
24
Multilingual Interface
  • Improved Named Entity Translation and Bilingual
    Named Entity Extraction (Fei Huang, Stephan
    Vogel)
  • Motivation: improve named entity annotation
    quality using bilingual corpus information
  • Basic idea:
  • Cross-lingual information can hint at named
    entity extraction errors in the baseline system
  • An extracted named entity with a high alignment
    cost tends to be wrong

25
Multilingual Interface
  • Proposed NE annotation scheme (greedy step
    sketched below):
  • Annotate each side of the bilingual corpus
    separately, using BBN's IdentiFinder
  • Compute the Augmented Sentence Alignment Cost
    (ASAC) of the baseline annotation
  • Find all possible NEs in the corpus
  • Use a greedy approximation algorithm to find the
    alignment with minimal ASAC; if the cost is lower
    than the baseline's, accept the alignment
  • Tag each unaligned NE with its most frequent type
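A minimal sketch of the greedy approximation step under stated assumptions: `cost(s, t)` is a hypothetical stand-in for the ASAC contribution of pairing source NE `s` with target NE `t`, and alignment is assumed one-to-one:

```python
def greedy_ne_alignment(src_nes, tgt_nes, cost):
    """Greedily accept the cheapest remaining source/target NE pair."""
    candidates = sorted(
        (cost(s, t), i, j)
        for i, s in enumerate(src_nes)
        for j, t in enumerate(tgt_nes))
    used_src, used_tgt, pairs = set(), set(), []
    for c, i, j in candidates:
        if i not in used_src and j not in used_tgt:
            pairs.append((src_nes[i], tgt_nes[j], c))
            used_src.add(i)
            used_tgt.add(j)
    return pairs   # accept only if the total cost beats the baseline annotation
```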

26
Multilingual Interface
  • Results
  • Comment: cross-lingual relations contain useful
    information

27
Conference Outline
(Diagram: topic areas arranged around the ICMI 02 hub)
  • Sensors, Tools and Platforms
  • Speech Generation/Recognition
  • Speech/Text
  • Dialogue/Language Understanding
  • Translation/Multilingual Interface
  • Signing, Gesturing, Writing
  • Vision
  • Gaze Tracking and Lipreading
  • Face Detection/Recognition
  • Application, User study, Evaluation
28
Sensors, Tools and Platforms
  • Audiovisual arrays for untethered spoken
    interfaces (Kevin Wilson, Trevor Darrell, et al.)
  • Motivation:
  • Given a distant speaker at a known location, a
    microphone array can improve speech
    recognition
  • Estimating a speaker's location in a reverberant
    environment is difficult, so a video camera
    array is used to aid localization
  • An audio-visual array approach to tracking the
    speaker

29
Sensors, Tools and Platforms
  • Two array processing problems:
  • Beamforming: spatial filtering of the speech
    signal
  • Amplify the signals coming from a selected
    region by delaying/filtering and summing across
    the array (see the sketch below)
  • Source localization: estimate the location of
    the signal source
  • One way: beamform toward every candidate
    location and choose the strongest (large
    computational cost)
  • Another way: use the delays across the array to
    calculate the location
  • The two problems are complementary
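A minimal delay-and-sum beamformer sketch of the "amplify signals from a selected region" idea; the function name, the 343 m/s speed of sound, and sample-accurate (non-fractional) delays are simplifying assumptions:

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, source_pos, fs, c=343.0):
    """Align each channel by its propagation delay from a candidate
    source position, then average: signals from that position add
    coherently while sound from elsewhere does not."""
    mic_signals = np.asarray(mic_signals, dtype=float)   # (n_mics, n_samples)
    dists = np.linalg.norm(np.asarray(mic_positions) - source_pos, axis=1)
    delays = (dists - dists.min()) / c                   # relative delays (s)
    shifts = np.round(delays * fs).astype(int)           # delays in samples
    n = mic_signals.shape[1] - shifts.max()
    aligned = [sig[s:s + n] for sig, s in zip(mic_signals, shifts)]
    return np.mean(aligned, axis=0)                      # steered output
```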

30
Sensors, Tools and Platforms
  • Person tracking with multiple stereo views
  • Aids source localization
  • Beamforming: audio data only
  • Source localization: audio + video
  • Process (sketched below):
  • Vision tracker: initial guess for the location,
    accurate to within one meter
  • Beam power: gradient search for a local
    maximum
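A minimal sketch of that two-stage process, reusing `delay_and_sum` from the sketch above: start from the vision tracker's estimate and climb the beam power surface. A simple coordinate search stands in for the paper's gradient method, and the step size and iteration count are assumptions:

```python
import numpy as np

def localize(mic_signals, mic_positions, fs, vision_guess,
             step=0.05, n_iter=50):
    """Refine a vision-tracker position estimate by climbing beam power."""
    def beam_power(p):
        out = delay_and_sum(mic_signals, mic_positions, p, fs)
        return float(np.sum(out ** 2))

    pos = np.asarray(vision_guess, dtype=float)
    best = beam_power(pos)
    moves = np.vstack([np.eye(3), -np.eye(3)]) * step   # +/- x, y, z steps
    for _ in range(n_iter):
        improved = False
        for d in moves:
            p = beam_power(pos + d)
            if p > best:
                pos, best, improved = pos + d, p, True
        if not improved:
            break                                       # local maximum reached
    return pos
```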

31
Sensors, Tools and Platforms
  • Test Environments

32
Sensors, Tools and Platforms
  • Results

33
Summary
  • Informedia is related to multimodal interfaces:
  • Information gathering, searching, and browsing
  • Experience On Demand (EOD)
  • Capturing, Coordinating and Remembering Human
    Experience (CCRHE)
  • Cross-modal relationships
  • Cross-modal consistency