1
Speech, Language and Human-Computer Interaction
William Marslen-Wilson
Steve Young
Johanna Moore, Martin Pickering, Mark Steedman
2
Contents
  • Background and motivation
  • State of the Art
  • Speech recognition and understanding
  • Cognitive neuroscience
  • Computational models of interaction
  • The Grand Challenge
  • Research Themes

3
Spoken language and human interaction will be an
essential feature of truly intelligent systems.
For example, Turing made it the basis of his
famous test to answer the question "Can machines
think?" ("Computing Machinery and Intelligence", Mind, 1950).
Spoken language is the natural mode of
communication, and truly ubiquitous computing will
rely on it.
The Vision: Apple's Knowledge Navigator
The Reality: a currently deployed flight-enquiry demo
... but we are not quite there yet!
4
Current situation
[Diagram: the Human Language System is observed and data are collected, feeding two complementary strands]
Cognitive Sciences: development of neuro-biologically and
psycholinguistically plausible accounts of human
language processes (comprehension and production)
Engineering Systems (computational language use): symbolic
and statistical models of human language processing
(e.g., via parsing, semantics, generation, discourse analysis)
5
State of the Art: Speech Recognition
Goal is to convert the acoustic signal Y into a word
sequence W (e.g., "He bought it").
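Stated as an equation (the standard formulation in statistical speech recognition, though not written out on the slide), the recogniser picks the most probable word sequence given the acoustics:

    \hat{W} = \arg\max_{W} P(W \mid Y) = \arg\max_{W} P(Y \mid W)\, P(W)

where P(Y | W) is the acoustic model and P(W) is the language model; the next slides show how each is built.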
6
General Approach: Hierarchy of Markov Chains
7
Model Building
[Diagram: training data such as "He said that ..." and
"Speaking from the White House, the president said today
that the nation would stand firm against the ..." feeding
model estimation]
Acoustic models: trained from about 100 hours of speech
Language model: trained from about 500 million words of text
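A minimal sketch of the language-model half of this step, assuming a simple add-one-smoothed bigram model (the toy corpus below just reuses fragments from the slide; a real model would be estimated from the full 500-million-word collection):

    from collections import Counter

    def train_bigram_lm(sentences):
        """Estimate add-one-smoothed bigram probabilities P(w2 | w1) from tokenised sentences."""
        unigrams, bigrams = Counter(), Counter()
        for words in sentences:
            words = ["<s>"] + words + ["</s>"]
            unigrams.update(words[:-1])                 # count each bigram history
            bigrams.update(zip(words[:-1], words[1:]))  # count word pairs
        vocab = {w for pair in bigrams for w in pair}
        def prob(w1, w2):
            return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + len(vocab))
        return prob

    # Toy usage: two sentence fragments stand in for the text corpus.
    corpus = [
        "he said that the nation would stand firm".split(),
        "the president said today that the nation would stand firm".split(),
    ]
    p = train_bigram_lm(corpus)
    print(p("the", "nation"))   # P(nation | the) under add-one smoothing

The acoustic models are estimated separately from the transcribed speech, using the hidden Markov models described on the following slides.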
8
Recognising
9
Progress in Automatic Speech Recognition
[Chart: word error rate over time, for tasks ranging from easy to hard]
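For reference, the word error rate plotted on such charts is computed from the substitutions S, deletions D and insertions I needed to turn the recognised word string into the N-word reference transcription:

    \mathrm{WER} = \frac{S + D + I}{N}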
10
Current Research in Acoustic Modelling
Hidden Markov Model
Quasi-stationary assumption a major weakness
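To make the weakness concrete: a standard HMM factorises the acoustic likelihood over a hidden state sequence s_1, ..., s_T, with each state emitting observations from a fixed output distribution, so speech is modelled as a sequence of piecewise-stationary segments:

    P(Y \mid W) = \sum_{s_1,\dots,s_T} \prod_{t=1}^{T} a_{s_{t-1} s_t}\, b_{s_t}(y_t)

where the a terms are state-transition probabilities (with a_{s_0 s_1} the initial-state probability) and b_{s_t}(\cdot) is the output distribution of state s_t, held fixed for as long as the model remains in that state.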
11
State of the Art: Cognitive Neuroscience of
Speech and Language
  • Scientific understanding of human speech and
    language is in a state of rapid transformation and
    development
  • Rooted in cognitive/psycholinguistic accounts of
    the functional structure of the language
    system.
  • Primary drivers now coming from neurobiology and
    new neuroscience techniques

12
Neurobiology of homologous brain systems: primate
neuroanatomy and neurophysiology
(Rauschecker & Tian, PNAS, 2000)
13
  • Provides a potential template for investigating
    the human system
  • Illustrates the level of neural and functional
    specificity that is achievable
  • Points to an explanation in terms of multiple
    parallel processing pathways, hierarchically
    organised

14
Speech and language processing in the human brain
  • Requires an interdisciplinary combination of
    neurobiology, psycho-acoustics,
    acoustic-phonetics, neuro-imaging, and
    psycholinguistics
  • Starting to deliver results with high degree of
    functional and neural specificity
  • Ingredients for a future neuroscience of speech
    and language

15
Hierarchical organisation of processes in primary
auditory cortex (belt, parabelt)
(from Patterson, Uppenkamp, Johnsrude &
Griffiths, Neuron, 2002)
16
Hierarchical organisation of processing streams:
activation as a function of intelligibility for
sentences heard in different types of noise
(Davis & Johnsrude, J. Neurosci., 2003). Colour
scale plots intelligibility-responsive regions
which were sensitive to the acoustic-phonetic
properties of the speech distortion (orange to
red) contrasted with regions (green to blue)
whose response was independent of lower-level
acoustic differences.
17
  • Essential to image brain activity in time as well
    as in space
  • EEG and MEG offer excellent temporal resolution
    and improving spatial resolution
  • This allows dynamic tracking of the
    spatio-temporal properties of language processing
    in the brain
  • Demonstration (Pulvermüller et al.) using MEG to
    track cortical activity related to spoken word
    recognition

18
[Figure: MEG demonstration, 700 ms frame]
19
  • Glimpse of future directions in cognitive
    neuroscience of language
  • Importance of understanding the functional
    properties of the domain
  • Neuroscience methods for revealing the
    spatio-temporal properties of the underlying
    systems in the brain

20
State of the Art: Computational Language Systems
  • Modelling interaction requires solutions for:
  • Parsing & interpretation
  • Generation & synthesis
  • Dialogue management
  • Integration of component theories and technologies

21
Parsing and Interpretation
  • Goal is to convert a string of words into an
    interpretable structure.
  • Example: Marks bought Brooks
        (TOP (S (NP-SBJ Marks)
                (VP (VBD bought)
                    (NP Brooks))
                ()))
  • Translate the treebank into a grammar and statistical
    model (a minimal sketch follows below)
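A minimal sketch of that treebank-to-grammar step, using NLTK purely as an illustrative toolkit (not necessarily the tools behind the results on the next slide); the single-tree "treebank" below is adapted from the slide's example, whereas real models are induced from the full Penn Treebank:

    import nltk
    from nltk import Nonterminal, ViterbiParser

    # A toy "treebank": one bracketed tree in the style of the slide's example.
    treebank = [nltk.Tree.fromstring(
        "(TOP (S (NP-SBJ Marks) (VP (VBD bought) (NP Brooks))))")]

    # Collect the productions observed in the treebank and estimate a PCFG from their counts.
    productions = []
    for tree in treebank:
        productions.extend(tree.productions())
    grammar = nltk.induce_pcfg(Nonterminal("TOP"), productions)
    print(grammar)

    # Use the induced grammar to find the most probable parse of a new word string.
    parser = ViterbiParser(grammar)
    for parse in parser.parse("Marks bought Brooks".split()):
        print(parse)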

22
Parser Performance
  • Improvement in performance in recent years over an
    unlexicalized baseline of about 80% in ParsEval
    (labelled precision/recall, defined below)
  • Magerman 1995: 84.3% LP, 84.0% LR
  • Collins 1997: 88.3% LP, 88.1% LR
  • Charniak 2000: 89.5% LP, 89.6% LR
  • Bod 2001: 89.7% LP, 89.7% LR
  • Interpretation is beginning to follow (Collins
    1999: 90.9% unlabelled dependency recovery)
  • However, there are signs of asymptote
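For reference, the ParsEval labelled precision (LP) and labelled recall (LR) figures above are computed from the labelled constituents the parser proposes, scored against the gold-standard treebank:

    \mathrm{LP} = \frac{\text{correct labelled constituents}}{\text{constituents in parser output}}
    \qquad
    \mathrm{LR} = \frac{\text{correct labelled constituents}}{\text{constituents in the gold standard}}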

23
Generation and Synthesis
  • Spoken dialogue systems use:
  • Pre-recorded prompts: natural-sounding speech, but
    practically limited flexibility
  • Text-to-speech: provides more flexibility, but lacks
    adequate theories of how timing, intonation, etc.
    convey discourse information
  • Natural language (text) generation: discourse planners
    to select content from data and knowledge bases and
    organise it into semantic representations, and
    broad-coverage grammars to realise the semantic
    representation in language

24
Spoken Dialogue Systems
  • Implemented as hierarchical finite-state machines
    or VoiceXML (a minimal finite-state sketch follows
    below)
  • Can:
  • Effectively handle simple tasks in real time, e.g.
    automated call routing, travel and entertainment
    information and booking
  • Be robust in the face of barge-in (e.g., "cancel" or
    "help")
  • Take action to get the dialogue back on track
  • Generate prompts sensitive to the task context
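A minimal sketch of the finite-state style of dialogue management mentioned above; the states, prompts and flight-enquiry task are invented for illustration, and real systems use much larger state graphs or VoiceXML documents:

    # Each state has a prompt and the name of the next state; "help" and "cancel"
    # are handled globally, mimicking simple barge-in/recovery commands.
    STATES = {
        "ask_origin":      {"prompt": "Where are you flying from?",      "next": "ask_destination"},
        "ask_destination": {"prompt": "Where are you flying to?",        "next": "confirm"},
        "confirm":         {"prompt": "Shall I search for flights now?", "next": None},
    }

    def run_dialogue(get_user_input):
        """Drive a toy flight-enquiry dialogue; get_user_input(prompt) -> recognised string."""
        state, slots = "ask_origin", {}
        while state is not None:
            reply = get_user_input(STATES[state]["prompt"]).strip().lower()
            if reply == "help":
                print("You can answer the question, or say 'cancel' to start again.")
                continue                              # stay in the same state
            if reply == "cancel":
                state, slots = "ask_origin", {}       # restart the task
                continue
            slots[state] = reply                      # fill the slot for this state
            state = STATES[state]["next"]             # move to the next dialogue state
        return slots

    # Example usage with canned answers standing in for a speech recogniser:
    answers = iter(["London", "help", "Edinburgh", "yes"])
    print(run_dialogue(lambda prompt: next(answers)))

Because every reachable dialogue has to be spelled out in such a state table, this style illustrates the design and maintenance burden discussed on the next slide.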

25
Limitations of Current Approaches
  • Design and maintenance are labour intensive,
    domain specific, and error prone
  • Must specify all plausible dialogues and content
  • Mix task knowledge and dialogue knowledge
  • Difficult to:
  • Generate responses sensitive to linguistic
    context
  • Handle user interruptions, user-initiated task
    switches or abandonment
  • Provide personalised advice or feedback
  • Build systems for new domains

26
What Humans Do that Today's Systems Don't
  • Use context to interpret and respond to questions
  • Ask for clarification
  • Relate new information to what's already been
    said
  • Avoid repetition
  • Use linguistic and prosodic cues to convey
    meaning
  • Distinguish what's new or interesting
  • Signal misunderstanding, lack of agreement,
    rejection
  • Adapt to their conversational partners
  • Manage the conversational turn
  • Learn from experience

27
Current directions
  • 1M words of labelled data is not nearly enough
  • Current emphasis is on lexical smoothing and
    semi-supervised methods for training parser
    models
  • Separation of dialogue management knowledge from
    domain knowledge
  • Integration of modern (reactive) planning
    technology with dialogue managers
  • Reinforcement learning of dialogue policies (the
    standard update rule is sketched below)
  • Anytime algorithms for language generation
  • Stochastic generation of responses
  • Concept-to-speech synthesis
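The reinforcement-learning direction mentioned above typically treats the dialogue as a Markov decision process over dialogue states s and system actions a, learning action values from a reward r (for example, task success penalised by dialogue length); one standard update rule is Q-learning:

    Q(s, a) \leftarrow Q(s, a) + \alpha \big[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \big]

where \alpha is a learning rate and \gamma a discount factor; the learned policy then chooses, in each dialogue state, the action with the highest estimated value.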

28
Grand Challenge
To understand and emulate human capability for
robust communication and interaction.
29
Research Programme
Three inter-related themes
  1. Exploration of language function in the human
    brain
  2. Computational modelling of human language use
  3. Analysis and modelling of human interaction

The development of all three themes will aim at a
strong integration of neuroscience and
computational approaches
30
Theme 1: Exploration of language function in the
human brain
  • Development of an integrated cognitive
    neuroscience account
  • precise neuro-functional mapping of speech
    analysis system
  • identification/analysis of different cortical
    processing streams
  • improved (multi-modal) neuro-imaging methods for
    capturing spatio-temporal patterning of brain
    activity supporting language function
  • linkage to theories of language learning/brain
    plasticity
  • Expansion of neurophysiological/cross-species
    comparisons
  • research into homologous/analogous systems in
    primates/birds
  • development of cross-species neuro-imaging to
    allow close integration with human data
  • Research across languages and modalities (speech
    and sign)
  • contrasts in language systems across different
    language families
  • cognitive and neural implications of spoken vs
    sign languages

31
Theme 2: Computational modelling of human
language function
  • Auditory modelling and human speech recognition
  • learn from human auditory system especially use
    of time synchrony and vocal tract normalisation
  • move away from quasi-stationary assumption and
    develop effective continuous state models
  • Data-driven language acquisition and learning
  • extend successful speech recognition and parsing
    paradigm to semantics, generation and dialogue
    processing
  • apply results as filters to improve speech and
    syntactic recognition beyond the current
    asymptote
  • develop methods for learning from large
    quantities of unannotated data
  • Neural networks for speech and language
    processing
  • develop kernel-based machine learning techniques
    such as SVM to work in continuous time domain
  • understand and learn from human neural processing

32
Theme 3: Analysis and modelling of human
interaction
  • Develop psychology, linguistics, and neuroscience
    of interactive language
  • Integrate psycholinguistic models with context to
    produce situated models
  • Study biological mechanisms for interaction
  • Controlled scientific investigation of natural
    interaction using hybrid methods
  • Integration of eye tracking with neuro-imaging
    methods
  • Computational modelling
  • Tractable computational models of situated
    interaction
  • e.g., Joint Action, interactive alignment,
    obligations, SharedPlans
  • Integration across levels
  • in interpretation: integrate planning, discourse
    obligations, and semantics into language models
  • in production:
  • semantics of intonation
  • speech synthesizers that allow control of
    intonation, timing

33
Summary of Benefits
Grand Challenge
To understand and emulate human capability for
robust communication and interaction.
  1. Greater scientific understanding of human
    cognition and communication
  2. Significant advances in noise-robust speech
    recognition, understanding, and generation
    technology
  3. Dialogue systems capable of adapting to their
    users and learning on-line
  • Improved treatment and rehabilitation of
    disorders in language function; novel language
    prostheses
