Transcript and Presenter's Notes

Title: Working memory and text-to-speech-to-text in language task


1
Working memory and text-to-speech-to-text in
language task
Lawrie Hunter, Kochi University of
Technology. http://www.core.kochi-tech.ac.jp/hunter
2
Working memory and text-to-speech-to-text in
language task
Recent advances have made
text-to-speech and speech-to-text (T2S2T)
software usable in a very practical sense, and
the user can now both create text by speaking
naturally and listen to electronic text. This
suggests that working memory as modeled by
Baddeley (1986, 2000, 2001) can now be
externalized to some extent, which would in turn
impact on cognitive load in language task. Olive
(2003) reports findings from dual-task
experimentation which link writing task and
short-term storage. In a time of earlier
technological capability, Ong (1998) claimed that
cultures that do not have a system of writing
(primary oral cultures) and those that do
(chirographic cultures) think differently as a
result of the writing difference. Ong said that a
second orality dominated by electronic modes of
communication has emerged in Western culture.
This second orality has aspects of both oral and
chirographic modes. Ong suggested that
orality-literacy differentiation would influence
our interpretation of various kinds of writing.
If text-to-speech-to-text empowerment were to
become broadly used, hypertext, which is just
settling into a mainstream niche, would have to
undergo a severe framework reconstruction. This
paper juxtaposes Ong's second orality and
Baddeley's model of working memory, with its
(since 2000) 4 components, the phonological loop,
the visuospatial sketchpad, the central executive
and the episodic buffer. Workable T2S2T promises
to change the nature of cognitive load
constraints in language learning task. It also
makes Baddeley's concept of working memory look
like a most promising task design tool. This
presentation examines whether a new third kind of
orality may emerge from the new T2S2T
technological reality, and makes some tentative
observations based on the exploratory hands-on
experience of second language users.
3
Current state: Fragmentation of knowledge as a
result of the ongoing creation of research
niches. A voracious, yet protective and
covetous knowledge industry.
4
Current state: Isn't CALL just a subset
of User Experience (UX)?
5
URGENT: Just-in-time learner sociology
URGENT: Near-instant learner profiling
Upgrade: Learner > USER
User Experience (UX) practice: UZANTO's
MindCanvas - user profiling for a large target
group in a matter of hours
RUMM: rapid user mental modelling
GEMS: game emulation
This may be very fruitfully adapted to the
foundation explorations leading to CALL
decision-making.
Hunter (2006) Learners are evolving:
The expanding palette: Emergent CALL
paradigms. (Invited virtual presentation) Antwerp
CALL 2006. http://www.core.kochi-tech.ac.jp/hunter/professional/CALLparadigms/index.html
6
Now text-to-speech and speech-to-text (T2S2T)
software have become truly usable in a very
practical sense. This blurs the line between
speech and text in a very immediate way.
http://www.nextuptech.com/
http://www.nuance.com/naturallyspeaking/
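The loop this describes can be sketched as a round trip: text goes out through the speech channel and comes back as text. The functions below are stand-ins for real engines (a real setup would pair something like NaturallySpeaking with a T2S reader); the simulated channel and all names here are illustrative assumptions, not any vendor's API.

```python
# Hypothetical T2S2T round trip. Both channels are simulated so the
# loop can be run end to end; a real pipeline would call commercial
# engines instead of these stand-ins.

def text_to_speech(text: str) -> list:
    """Stand-in for a T2S engine: render text as a stream of spoken words."""
    return text.lower().split()

def speech_to_text(words: list) -> str:
    """Stand-in for an S2T engine: transcribe the word stream back to text."""
    return " ".join(words)

def round_trip(text: str) -> str:
    """Externalize text through the speech channel and recover it as text."""
    return speech_to_text(text_to_speech(text))

print(round_trip("Composition by speaking"))
```

The point of the sketch is the blurred boundary: once both directions are cheap, the distinction between "writing" and "speaking" a text becomes a runtime choice.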
7
John Milton (1609-1674) wrote Paradise Lost, long
considered one of the great works of English
literature. The book was first published in
1667. Milton came from a middle class family.
According to biographer Gordon Teskey, as a
youth, Milton routinely studied and churned over
homework until midnight. By the time he was
college age, Milton was fluent in English, Latin,
and Greek, and further had a proficient
understanding of Italian, Hebrew, and French.
After years of failing eyesight, in 1652, this
man who devoted so much of his life to reading
and writing became totally blind. In the same
year his first wife died, leaving Milton a
widower with three daughters, the oldest of whom
was not even six. Milton remarried four
years later to a woman who died soon after in
childbirth, along with the child. To write
Paradise Lost, Milton had to dictate the entire
epic poem to a transcriber. In those days,
punctuation was more a concern of copyists
and printers, and the person taking his dictation
did not know how to use commas, quotation marks,
or other tools of grammar, forcing the blind
poet to dictate the poem's punctuation as well.
The book's first printing was considered a
success: the edition sold roughly 1,300 copies in
just under two years. It is said that to achieve
the standard of living an average American
enjoys today, a person in 1667 would have
required 200 servants.
8
Usable T2S2T: No more typing. No more reading. No
more hands. Composition by speaking... ooh!
Information acquisition by listening... ahh! If we
do this, we will be in a new orality.
9
What?? Audio is lame; VIDEO is the game. We are
in the YouTube era. Get a Second Life!
10
T2S will be fully usable in 2 (or x) years: we
must assume the future and shift our place of
work there.
11
Is audio going out?
(for second language learning systems development)
12
Is audio going out?
Number Of Voice Calls Dropping In the UK, by Carlo
Longino, May 7th, 2007, in Stats: Is the
post-voice mobile era upon us? Stats out of the
UK show a significant drop in the number of voice
calls both pre- and postpaid users are making
each week. Last year, prepay users made an
average of 14 calls per week; this year, it's
down to 10. Postpaid users similarly fell, from
35 to 27. Prepay users' texting levels held
steady, but postpaid users are now sending almost
50% more texts each week. What's interesting is
this is happening as voice prices are falling,
too, resulting in significantly lower spending,
according to the survey. It says prepaid spending
is down from £19.29 per month to £12.35 per
month, while postpaid is off 20 percent. I'm not
sure just how much I buy into the spending
figures, though, as looking over the ARPU stats
for Vodafone and T-Mobile for the last couple of
years doesn't show a similar level of disruption
(and their subscriber growth doesn't make me
think people are flocking to cut-rate
providers). Anyhow, it's worth noting the
apparent drop in call volume. People are talking
less, texting more and, hopefully, using more
data services in spite of the tariffs... Perhaps
we're running out of things to say, or are even
more fully embracing the brevity and non-verbal
communication offered by SMS, email or IM. Maybe
people are figuring out that they want to talk
less on their mobiles, and do more with
them. http://mobhappy.com/blog1/2007/05/07/number-of-voice-calls-dropping-in-the-uk/
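The drops the post reports can be restated as percentages. The figures are taken from the quoted text; the helper function is a generic calculation of mine, not something from the post.

```python
# Percentage changes implied by the figures in the quoted MobHappy post
# (calls per week, and prepay spend per month).

def pct_change(before: float, after: float) -> float:
    """Signed percentage change from `before` to `after`."""
    return (after - before) / before * 100.0

prepay_calls = pct_change(14, 10)        # roughly -29%
postpaid_calls = pct_change(35, 27)      # roughly -23%
prepay_spend = pct_change(19.29, 12.35)  # roughly -36%, steeper than the call drop

print(f"prepay calls:   {prepay_calls:+.1f}%")
print(f"postpaid calls: {postpaid_calls:+.1f}%")
print(f"prepay spend:   {prepay_spend:+.1f}%")
```

The spend decline being steeper than the call decline is consistent with the post's remark that prices were falling at the same time.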
13
TODAY: A search for principles governing the
use of voice in CALL
14
Investigation of voice and cognition
15
Walter Ong, 1982: Orality and Literacy: The
Technologizing of the Word
PRIMARY ORAL cultures (no system of writing)
think differently from CHIROGRAPHIC cultures.
16
Walter Ong, 1982: Orality and Literacy: The
Technologizing of the Word
Electronic media (e.g. telephone, radio and
television) brought about a "second orality"
(paraphrase). Both primary and secondary
oralities afford a strong sense of membership
in a group. (paraphrase)
17
Walter Ong, 1982: Orality and Literacy: The
Technologizing of the Word
Electronic media (e.g. telephone, radio and
television) brought about a "second orality"
(paraphrase). Both primary and secondary
oralities afford a strong sense of membership
in a group. (paraphrase)
BUT: Secondary orality is "essentially a more
deliberate and self-conscious orality, based
permanently on the use of writing and print,"
and produces much larger groups.
18
Kathleen Welch rejects claims that Ong posits a
mutually exclusive, competitive, reductive
orality-literacy divide. Welch argues that Ong
emphasizes:
- a mingling of these types of consciousness
- the tenacity of established forms as new ones
appear
Welch, K. (1999) Electric Rhetoric: Classical
Rhetoric, Oralism, and a New Literacy. MIT Press.
p. 59
19
Welch argues that TV's ubiquity has resulted in
a new, electronic literacy. We shall not go
there today.
20
Workable T2S2T promises to change the nature of
cognitive load constraints in text
production/decoding, and hence in language
learning task.
21
Workable T2S2T There is now S2T (Dragon Voice)
for Indian English, British English... but not
for Japanese English yet. (Ever?)
http://labnol.blogspot.com/2007/01/dragon-naturallyspeaking-9-speech.html
22
Workable T2S2T There is now S2T (Dragon Voice)
for Indian English, British English... but not
for Japanese English yet. (Ever?)
http://labnol.blogspot.com/2007/01/dragon-naturallyspeaking-9-speech.html
So the tech is there for computers to decode
human speech better than humans can...?
23
HOWEVER, we don't know much about how orality
works. Perhaps that is because orality is so
ingrained in us.
24
Walter Ong, 1982: Orality and Literacy: The
Technologizing of the Word
The three stages of consciousness:
- Primary orality: 200,000 years
- Literacy: 2,800 years (invention of the
phonetic alphabet in the 8th century BCE)
- Secondary orality: 163 years (telegraphy:
USA, 1844)
Rhys Carpenter (1933) The antiquity of the Greek
alphabet. American Journal of Archaeology 37:
8-29.
25
WIRED FOR SPEECH: Orality has been part of human
life for a long time. After 200,000 years of
evolution, ...humans have become
voice-activated, with brains that are wired to
equate voices with people and to act quickly on
that information.
Nass, C. & S. Brave (2005) Wired for Speech.
MIT Press.
26
Writing: a secondary modelling system
Lotman, J., trans. R. Vroon (1977) The Structure
of the Artistic Text. Michigan Slavic Studies, 7.
"Writing can never exist without orality." p. 8
Speeches that were studied as rhetoric could
only be studied if they were transcribed.
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge.
27
Writing: a secondary modelling system
Lotman, J., trans. R. Vroon (1977) The Structure
of the Artistic Text. Michigan Slavic Studies, 7.
"...to this day no concepts have yet been formed
for effectively, let alone gracefully,
conceiving of oral art as such without
reference, conscious or unconscious, to
writing." p. 10
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge.
28
Psychodynamics of orality: ...you know what you
can recall.
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge.
29
Psychodynamics of orality: Pythagoras and the
acousmatics
The term acousmatic dates back to Pythagoras, who
is believed to have tutored his students from
behind a screen so as not to let his presence
distract them from the content of his lectures.
wikipedia.org, May 20, 2007, edited from Chion,
M. (1994) Audio-Vision: Sound on Screen.
Columbia University Press.
30
Psychodynamics of orality: Pythagoras and the
acousmatics
In cinema, acousmatic sound is sound one hears
without seeing an originating cause: an
invisible sound source. Radio, phonograph and
telephone, all of which transmit sounds without
showing the source cause, are acousmatic media.
wikipedia.org, May 20, 2007, edited from Chion,
M. (1994) Audio-Vision: Sound on Screen.
Columbia University Press.
31
Psychodynamics of orality: Acousmatic sound is
ubiquitous in CALL. Aren't there situations
where acousmatic sound is appropriate, and
situations where it is not?
32
Orality and writing production
Kellogg: Sentence Production Demands Verbal
Working Memory
Orthographic as well as phonological
representations must be activated for written
spelling.
o Bonin, Fayol, & Gombert (1997)
Verbal WM is necessary to maintain
representations during grammatical, phonological,
and orthographic encoding.
o Levy & Marek (1999)
o Chenoweth & Hayes (2001)
o Kellogg, Olive, & Piolat (2006)
Kellogg, R. (2006) Training writing skills: A
cognitive developmental perspective. EARLI
SigWriting 2006, Antwerp. http://webhost.ua.ac.be/sigwriting2006/Kellogg_SigWriting2006.pdf
33
Audio sources in life
John Thackara tells of Ivan Illich's finding
that:
In the 1930s, 9 out of 10 words a man heard by
age 20 were spoken directly to him. In the
1970s, 9 out of 10 words a man heard by age 20
were spoken through a loudspeaker.

Illich (1982): Computers are doing to
communication what fences did to pastures and
what cars did to streets. (book: In the
Bubble; blog: http://www.doorsofperception.com/)
34
We are innately orate
Human beings can quickly distinguish one
person's voice from another. p. 3
We know these things from differing heartbeat
responses. Nass, C. & S. Brave (2005) Wired for
Speech. MIT Press.
35
We are innately orate
Human beings can quickly distinguish one
person's voice from another. p. 3
- even in the womb we can distinguish our
mother's voice from that of another.
We know these things from differing heartbeat
responses. Nass, C. & S. Brave (2005) Wired for
Speech. MIT Press.
36
We are innately orate
Human beings can quickly distinguish one
person's voice from another. p. 3
- even in the womb we can distinguish our
mother's voice from that of another.
- a few days after birth, newborns prefer their
mother's voice to that of others, and can
distinguish one unfamiliar voice from another.
We know these things from differing heartbeat
responses. Nass, C. & S. Brave (2005) Wired for
Speech. MIT Press.
37
We are innately orate
Human beings can quickly distinguish one
person's voice from another. p. 3
- even in the womb we can distinguish our
mother's voice from that of another.
- a few days after birth, newborns prefer their
mother's voice to that of others, and can
distinguish one unfamiliar voice from another.
- by 8 months of age we can attend to one voice
even when another is speaking at the same time.
We know these things from differing heartbeat
responses. Nass, C. & S. Brave (2005) Wired for
Speech. MIT Press.
38
Humans: experts at extracting social info from
speech
Word choice carries social information. UX work
makes choices, such as blaming:
1. Speak up.
2. I'm sorry, I didn't catch that.
3. We seem to have a bad connection.
Could you please repeat that?
Nass, C. & S. Brave (2005) Wired for Speech.
MIT Press.
39
Humans: experts at extracting social info from
speech
Word choice carries social information. UX work
makes choices, such as voice quality:
Booming deep voice: Could I possibly ask you if
you wouldn't mind doing a tiny favor?
High-pitched, soft voice: Pick up that shovel
and start digging!
Nass, C. & S. Brave (2005) Wired for Speech.
MIT Press.
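Nass and Brave's point that word choice assigns blame can be made concrete. Below is a sketch of a re-prompt policy that shifts blame away from the user as recognition keeps failing: the three wordings are the slides' own examples, but the escalation rule is a hypothetical illustration, not Nass and Brave's system.

```python
# Toy re-prompt policy for a voice interface. Wordings come from the
# slide's examples; the escalation logic is an illustrative assumption.

REPROMPTS = [
    "Speak up.",                        # blames the user
    "I'm sorry, I didn't catch that.",  # the system takes the blame
    "We seem to have a bad connection. "
    "Could you please repeat that?",    # the channel takes the blame
]

def reprompt(failed_attempts: int) -> str:
    """Shift blame away from the user as recognition keeps failing."""
    return REPROMPTS[min(failed_attempts, len(REPROMPTS) - 1)]

for n in range(4):
    print(n, reprompt(n))
```

The design choice being illustrated is that a voice interface's social stance is a parameter the designer sets, whether or not the designer thinks about it.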
40
Humans automatically react socially to voice
...the conscious knowledge that speech can have
a non-human origin is not enough for the brain
to overcome the historically appropriate
activation of social relationships by
voice, even when voice quality is low and
speech understanding is poor.
Nass, C. & S. Brave (2005) Wired for Speech.
MIT Press.
41
Interiority of sound
...in an oral noetic economy, mnemonic
serviceability is sine qua non... p. 70
In other words, oral information must be
arranged in a certain way, a visual way, if it
is to be remembered.
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge.
42
Incorporating interiority
The eye cannot perceive interiority, only
surfaces. Taste and smell are not much help in
registering interiority/exteriority. Touch can
detect interiority but in the process damages
it. Hearing can register interiority without
violating it. Sight isolates; sound
incorporates.
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge.
43
Incorporating interiority
44
Oral memory
In primary oral cultures, need for an aide-mémoire:
- heavily rhythmic speech
- balanced patterns
- epithetic expressions
- formulary expressions
- standard thematic settings
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge. p. 33
45
Oral memory
In primary oral cultures, thought and
expression are additive rather than subordinate.
Ong, W. (1982) Orality and Literacy: The
Technologizing of the Word. 1997 reprint,
Routledge. p. 37 ff.
46
3 modes of listening
Causal listening: to identify the source of a
sound.
Semantic listening: refers to a code or language
to interpret a message.
Reduced listening: focuses on the traits of the
sound itself.
Chion, M. (1994) Audio-Vision: Sound on Screen.
Columbia University Press. p. 37 ff.
47
Tentative observations based on the exploratory
hands-on experience of second language users.
Innisfree 1 Innisfree 2 Innisfree 3 Coney
Island 1 Coney Island 2 Coney Island 3
PhD technical writing class, KUT, May 24, 2007
48
Tentative observations based on the exploratory
hands-on experience of second language users.
PhD technical writing class, KUT, May 24, 2007
49
Tentative observations based on the exploratory
hands-on experience of second language users.
PhD technical writing class, KUT, May 24, 2007
50
Tentative observations based on the exploratory
hands-on experience of second language users.
Self-reported estimates of comprehension of
samples.
PhD technical writing class, KUT, May 24, 2007
51
Tentative observations based on the exploratory
hands-on experience of second language users.
Self-reported estimates of comprehension of
samples.
PhD technical writing class, KUT, May 24, 2007
52
How might language learning support systems be
influenced by the new T2S2T technological
reality?
53
Articulation at the phrase level
In the learner's awareness:
S2T software foregrounds articulation;
T2S foregrounds intonation, blending, pausing.
54
Articulation at the phrase level
Can S2T be used to improve pronunciation?
Mitra, S., Tooley, J., Inamdar, P. and Dixon, P.
(2003) Improving English Pronunciation: An
Automated Instructional Approach. Information
Technologies and International Development,
Volume 1, Number 1, Fall 2003, 75-84.
Massachusetts Institute of Technology. http://www.mitpressjournals.org/doi/abs/10.1162/itid.2003.1.1.75
55
Articulation at the phrase level
Can S2T be used to improve pronunciation?
Mitra, S., Tooley, J., Inamdar, P. and Dixon, P.
(2003) Improving English Pronunciation: An
Automated Instructional Approach. Information
Technologies and International Development,
Volume 1, Number 1, Fall 2003, p. 82.
Massachusetts Institute of Technology. http://www.mitpressjournals.org/doi/abs/10.1162/itid.2003.1.1.75
56
The looming prospect of a text-reduced world
Specificity as a foreign language, e.g.
University X web site: Japanese interview >
English web site
57
T2S2T brings richness to materials design. T2S2T
should imply that there will be a broad,
instantaneous choice of interface with
data. Aside from tangible choices of medium,
other parameters demand attention:
input density
- number of communication objects per signal
input complexity
- degree of text reduction
- visual field richness
- number of simultaneous signals
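One way to make these parameters operational for task design is to encode them as a comparable profile. A minimal sketch: the field names follow the slide, but the load index is an illustrative heuristic of mine, not a validated cognitive-load measure.

```python
from dataclasses import dataclass

# Hypothetical encoding of the slide's materials-design parameters so
# that competing interface configurations can be compared.

@dataclass
class InputProfile:
    density: int               # communication objects per signal
    text_reduction: float      # 0.0 = full text ... 1.0 = fully reduced
    visual_richness: int       # distinct elements in the visual field
    simultaneous_signals: int  # parallel channels competing for attention

    def load_index(self) -> float:
        """Crude index: denser, richer, more parallel input scores higher."""
        return (self.density * (1 + self.visual_richness)
                * self.simultaneous_signals * (1.0 - self.text_reduction))

plain_text = InputProfile(1, 0.0, 0, 1)
busy_page = InputProfile(3, 0.2, 4, 2)
print(plain_text.load_index(), busy_page.load_index())
```

Even a crude index like this lets a designer rank candidate materials before any learner testing.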
58
  • Sometimes signals are
  • 1. complementary, e.g. Chang's sound track
    supplies one of many possible intonations for a
    hypertext.
  • 2. conflicting, e.g. phone user in a movie
    theater
  • 3. mutually irrelevant, e.g. Muzak vs.
    supermarket sale signs
  • 4. channel competing, e.g. powerpoint text and
    speech, or mosquito buzz vs. TV images
  • 5. internal-external conflicting,
    e.g. on-screen text back-checking during S2T
    writing
  • http://www.yhchang.com

59
A cubist look at text and attention: Chang,
Young-Hae, NIPPON.html. Here, is text so reduced
as to be iconic? How is this parallel to sound
objects?
http://www.yhchang.com
60
We take the messenger as the source of the
message. p. 184
If (as the psychological literature suggests)
people orient to proximate sources, then:
Rule 1: The computer, not the programmer, will
be considered the source of information.
Rule 2: People working with computers will not
think about programmers during an interaction.
Rule 3: Interactions with computers are more
desirable for users when they don't think about
a programmer. p. 186
Presence is relevance: people do not respond to
computers as a result of imagining an
interaction with a programmer.
Reeves and Nass, The Media Equation
61
A marvel in this age of niche books: many
answers from one source
evolving from
Nass and Brave, Wired for Speech.
62
Improving voice interfaces by applying knowledge
of human speech
- gender choice
- gender stereotyping
- voice personalities
- accent, race, ethnicity
- user emotion / voice emotion
- voice and content emotion
- synthetic vs. recorded
- variation of synthetic voice
- character
- assignment of humanity
- input type
- error and blame
Nass and Brave, Wired for Speech.
63
Improving voice interfaces by applying knowledge
of human speech
Emotion can direct users towards or away from an
aspect of an interface. Emotion affects
cognition, e.g. in vehicle driving support
software. Finding: people find it easier and
more natural to attend to voice emotions
consistent with their own present emotions. p. 77
- gender choice
- gender stereotyping
- voice personalities
- accent, race, ethnicity
- user emotion / voice emotion
- voice and content emotion
- synthetic vs. recorded
- variation of synthetic voice
- character
- assignment of humanity
- input type
- error and blame
Nass and Brave, Wired for Speech.
64
A promising task design tool: Baddeley and
Hitch's 1986 model of working memory, with its
3 components.
  • Three-component model of working memory
  • -assumes an attentional controller, the central
    executive, aided by two subsidiary systems
  • the phonological loop, capable of holding
    speech-based information, and
  • the visuospatial sketchpad, which performs a
    similar function for visual information.
  • The two subsidiary systems form active stores
    that are capable of combining information from
    sensory input, and from the central executive.
    Hence a memory trace in the phonological store
    might stem either from a direct auditory input,
    or from the subvocal articulation of a visually
    presented item such as a letter.

Please read this on Hunter's web site.
65
Working memory model extended
Phonological loop: important for short-term
storage, ALSO for long-term phonological
learning. Associated with:
- development of vocabulary in children
- speed of FLA in adults
[Diagram: Central Executive; Phonological Loop;
Visuo-spatial Sketchpad; Visual semantics;
Episodic LTM; Language]
Baddeley, A. D. (2000) The episodic buffer: a new
component of working memory? Trends in Cognitive
Sciences 4(11): 417-423.
66
  • Working memory model extended
  • Phonological loop effects:
  • Phonological similarity
  • Word-length
  • Articulatory suppression
  • Code transfer
  • Central rehearsal code, not operation

[Diagram: Central Executive; Phonological Loop;
Visuo-spatial Sketchpad; Visual semantics;
Episodic LTM; Language]
Baddeley, A. D. (2000) The episodic buffer: a new
component of working memory? Trends in Cognitive
Sciences 4(11): 417-423.
67
A most promising task design tool: Baddeley's
model of working memory, with its (since 2000)
4 components.
The episodic buffer:
- assumed capable of storing information in a
multi-dimensional code
- thus provides a temporary interface between
the slave systems and LTM
- assumed to be controlled by the central
executive
- serves as a modelling space that is separate
from LTM, but which forms an important stage in
long-term episodic learning
[Diagram: Central Executive; Phonological Loop;
Visuo-spatial Sketchpad; Episodic Buffer;
Visual semantics; Episodic LTM; Language.
Shaded areas: crystallized cognitive systems
capable of accumulating long-term knowledge.
Unshaded areas: fluid capacities (such as
attention and temporary storage), themselves
unchanged by learning.]
Baddeley, A. D. (2000) The episodic buffer: a new
component of working memory? Trends in Cognitive
Sciences 4(11): 417-423.
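Read as a task design tool, the model can even be mocked up as a data structure. A toy sketch: the component names follow Baddeley (2000), but the four-item capacities and the modality-routing rule are illustrative assumptions of mine, not claims of the model.

```python
# Toy rendering of the four-component working memory model as a task
# design aid. Capacities and routing are illustrative assumptions.

class Store:
    """A temporary subsidiary store with limited capacity."""
    def __init__(self, name, capacity=4):
        self.name, self.capacity, self.items = name, capacity, []

    def hold(self, item):
        """Keep the item only while there is room; no long-term learning."""
        if len(self.items) < self.capacity:
            self.items.append(item)
            return True
        return False  # overload: the task exceeds this store

class CentralExecutive:
    """Attentional controller routing input to the subsidiary systems."""
    def __init__(self):
        self.loop = Store("phonological loop")            # speech-based code
        self.sketchpad = Store("visuospatial sketchpad")  # visual code
        self.buffer = Store("episodic buffer")            # multi-dimensional code

    def attend(self, item, modality):
        route = {"speech": self.loop, "visual": self.sketchpad,
                 "episode": self.buffer}
        return route[modality].hold(item)

wm = CentralExecutive()
print(wm.attend("phrase heard aloud", "speech"))
```

The design interest for CALL is the overload case: a task that keeps returning False on one store is a candidate for offloading that channel to T2S2T.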
68
Current state: Isn't CALL just a subset
of User Experience (UX)?
69
Thank you for your kind attention.
Don't hesitate to write to me.
Lawrie Hunter, Kochi University of
Technology. http://www.core.kochi-tech.ac.jp/hunter
70
Lawrie Hunter directs the critical thinking and
technical academic writing programs at Kochi
University of Technology in Japan. He is
interested in language task design, information
design for second language learning materials,
working memory and cognitive load, and hypertext
for second language readers.