Connecting Acoustics to Linguistics in Chinese Intonation

Provided by: gregkoc
Learn more at: https://kochanski.org

Transcript and Presenter's Notes

1
Connecting Acoustics to Linguistics in Chinese
Intonation
  • Greg Kochanski (Oxford Phonetics)
  • Chilin Shih (University of Illinois)
  • Tan Lee (CUHK)
  • with
  • Hongyan Jing (IBM)
  • Jiahong Yuan (Cornell)

2
Questions
Goal
Build a mathematical model that takes a sequence
of discrete symbols as input and produces a
quantitative prediction for f0.
  • Can we usefully include biomechanics into a
    phonetics model?
  • Can we objectively assign an importance to a
    syllable?
  • Can we write a unified description of F0 for both
    tone and accent languages?

3
The Challenge
4
Existing work
Rising?
5
Basic assumptions used in modeling
  • People plan their utterances several syllables in
    advance.
  • People produce speech optimized to communicate
    with minimal effort.
  • A realistic model for the muscles that control f0

6
Realistic model of muscle control for F0
  • We'd like a model of prosody that can apply
    beyond F0.

7
People talk nearly as fast as possible.
8
Speech could be optimal
  • Most of what we say is made from bits and pieces
    we've said before.
  • There are only 4 (Mandarin) or 6 (Cantonese)
    tones to combine.
  • A speaker has the chance to practice and optimize
    all the common 3- and 4- tone sequences.

9
Optimize what?
  • People want to minimize effort and/or talk faster
  • Chairs, Cars
  • People want to minimize the chance that they will
    be misunderstood.
  • Risk = P(misinterpreted) × cost(misinterpreted)
  • Minimize Effort + cost × Error
  • We allow each syllable to have a different
    weight, so error is a sum over syllables or
    words.
  • Perhaps cost matches importance.

10
Modeling math
Effort: [equation not preserved in the transcript]
Its variable is the muscle tension (frequency) at time t.
Each target encodes some linguistic information;
r_i is the error of the ith target, and s_i is its
importance.
Error: [equation not preserved in the transcript]
y_i is the ith pitch target, and a bar denotes an
average over a target.
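A sketch of how these definitions could fit together, offered as an assumption rather than the authors' exact formulas: write the pitch (muscle tension) curve as e(t), a symbol introduced here, let effort penalize its rate of change, and let each target's error enter weighted by its importance. Both the form of the effort term and the linear use of s_i are guesses.

```latex
\begin{aligned}
\text{Effort} &= \int \dot{e}(t)^2 \, dt
  && \text{(assumed form)} \\
r_i &= \overline{\bigl(e(t) - y_i(t)\bigr)^2}
  && \text{(bar: average over target } i\text{)} \\
\text{Error} &= \sum_i s_i \, r_i
  && \text{(assumed linear weighting)} \\
\hat{e} &= \arg\min_{e} \; \bigl( \text{Effort} + \text{Error} \bigr)
\end{aligned}
```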
11
Effort and Error
How does Effort depend on the form of the pitch
curve?
Error: mean-squared deviation between the f0 and
the templates.
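To make the two quantities concrete, here is a minimal sketch on sampled pitch curves. The error follows the slide's description (mean-squared deviation from a template). The effort measure, a sum of squared frame-to-frame changes, is only an assumed stand-in, since the slide does not preserve its formula. The function names and the sample values are hypothetical.

```python
def effort(curve):
    """Assumed stand-in for effort: sum of squared frame-to-frame changes."""
    return sum((b - a) ** 2 for a, b in zip(curve, curve[1:]))


def target_error(curve, template):
    """Mean-squared deviation between the curve and one target template."""
    if len(curve) != len(template) or not curve:
        raise ValueError("curve and template must be non-empty and equal in length")
    return sum((c - y) ** 2 for c, y in zip(curve, template)) / len(curve)


# Invented values in Hz, for illustration only.
template = [100.0, 110.0, 120.0, 130.0]   # a rising target
flat = [120.0, 120.0, 120.0, 120.0]       # no movement, misses the target
rising = [100.0, 110.0, 120.0, 130.0]     # matches the target, but moves

print("flat:  ", effort(flat), target_error(flat, template))
print("rising:", effort(rising), target_error(rising, template))
```

The flat curve costs no effort but misses the template, while the rising curve matches it at the price of movement. That is the trade-off the model balances.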
12
Model behavior
  • For cost >> 1, Error dominates, and pitch matches
    target.
  • For cost << 1, Effort dominates, both speaker and
    listener accept large deviations, and pitch
    smoothly interpolates.
  • For cost ≈ 1, everything compromises.

Cost plays the role of a prosodic strength.
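The three regimes can be reproduced with a toy version of the optimization. This sketch assumes a simplified objective, sum((x[t+1] - x[t])^2) + cost * sum((x[t] - target[t])^2), which is not the authors' full muscle model. The function name `smooth_pitch` and the target values are hypothetical.

```python
def smooth_pitch(targets, cost):
    """Minimize sum((x[t+1]-x[t])**2) + cost * sum((x[t]-targets[t])**2).

    Setting the gradient to zero gives a tridiagonal linear system,
    solved here with the Thomas algorithm.
    """
    if cost <= 0:
        raise ValueError("cost must be positive")
    n = len(targets)
    if n < 2:
        return [float(y) for y in targets]
    # Diagonal: number of neighbours (1 at the ends, 2 inside) plus cost.
    diag = [(1.0 if t in (0, n - 1) else 2.0) + cost for t in range(n)]
    rhs = [cost * float(y) for y in targets]
    # Forward elimination; every off-diagonal entry is -1.
    for t in range(1, n):
        diag[t] -= 1.0 / diag[t - 1]
        rhs[t] += rhs[t - 1] / diag[t - 1]
    # Back substitution.
    x = [0.0] * n
    x[-1] = rhs[-1] / diag[-1]
    for t in range(n - 2, -1, -1):
        x[t] = (rhs[t] + x[t + 1]) / diag[t]
    return x


# Invented step-shaped targets in Hz.
targets = [100.0, 100.0, 200.0, 200.0, 100.0, 100.0]
for cost in (1e6, 1.0, 1e-6):
    print(cost, [round(v, 1) for v in smooth_pitch(targets, cost)])
```

With a very large cost the curve sits on the targets, with a very small cost it flattens toward their mean, and near cost = 1 it compromises between the two.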
13
Another Challenge
1
F0 (Hz)
Time (10 ms intervals)
14
The rest of the model.
  • A model is a sequence of targets (used to compute
    the Error terms).
  • Each target has a strength (i.e. the cost of
    misinterpretation).
  • One target per tone.
  • Targets are stretched to fit syllable duration.
  • Only one phonological rule: 33 → 23
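Two of the steps listed above are easy to sketch: the single phonological rule (read here as Mandarin third-tone sandhi, where a tone 3 before another tone 3 surfaces as tone 2) and the stretching of a target to fit the syllable's duration. Both functions are illustrative assumptions. In particular, how the rule applies to runs of three or more third tones, and the use of linear interpolation for stretching, are choices made here and not taken from the slides.

```python
def apply_third_tone_sandhi(tones):
    """Rewrite tone 3 as tone 2 when the next syllable is also tone 3.

    Assumption: the rule is checked against the underlying tones, so a
    run such as 3 3 3 becomes 2 2 3.
    """
    out = list(tones)
    for i in range(len(tones) - 1):
        if tones[i] == 3 and tones[i + 1] == 3:
            out[i] = 2
    return out


def stretch_template(template, n_frames):
    """Linearly resample a target template to n_frames points."""
    if n_frames < 1 or not template:
        raise ValueError("need a non-empty template and n_frames >= 1")
    if n_frames == 1 or len(template) == 1:
        # Degenerate case: repeat the first template value.
        return [float(template[0])] * n_frames
    last = len(template) - 1
    out = []
    for k in range(n_frames):
        pos = k * last / (n_frames - 1)   # position in template coordinates
        lo = min(int(pos), last - 1)      # left neighbour index
        frac = pos - lo
        out.append(template[lo] * (1.0 - frac) + template[lo + 1] * frac)
    return out


print(apply_third_tone_sandhi([3, 3, 1]))
print(stretch_template([1.0, 2.0, 3.0], 5))
```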

15
Model fits for Mandarin Chinese
Tone class (input)
Strength (result)
Inside a word, strength is distributed by the
metrical pattern
16
What's the procedure?
Sequence of tones (phonology)
Data
Compute the pitch curve as a function of
phonological inputs and prosodic strength.
Predicted F0
Prosodic strengths
Nonlinear least-squares fitting algorithm
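The fitting loop can be illustrated with a deliberately tiny stand-in. Here the forward model `predict` is a hypothetical one-parameter toy (pitch moves from a baseline toward the target by a fraction strength / (1 + strength)), not the model from the slides, and the fit is a plain Gauss-Newton iteration with a finite-difference derivative. All names and numbers are assumptions for illustration.

```python
def predict(template, base, strength):
    """Toy forward model: approach the template by strength / (1 + strength)."""
    w = strength / (1.0 + strength)
    return [base + w * (y - base) for y in template]


def fit_strength(template, base, observed, start=1.0, iterations=30):
    """Gauss-Newton fit of the single strength parameter."""
    s = start
    h = 1e-6  # finite-difference step for the derivative
    for _ in range(iterations):
        pred = predict(template, base, s)
        bumped = predict(template, base, s + h)
        resid = [p - o for p, o in zip(pred, observed)]
        jac = [(b - p) / h for b, p in zip(bumped, pred)]
        denom = sum(j * j for j in jac)
        if denom == 0.0:
            break  # the data carry no information about the strength
        s -= sum(j * r for j, r in zip(jac, resid)) / denom
        s = max(s, 1e-9)  # keep the strength positive
    return s


# Synthetic data generated from the toy model itself.
template = [100.0, 120.0, 140.0, 160.0]
observed = predict(template, 120.0, 3.0)
print(fit_strength(template, 120.0, observed))
```

A real fit would estimate many strengths at once against measured f0. A library routine for nonlinear least squares would be the usual choice at that scale.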
17
Model fits to Mandarin Chinese
0.61 free parameters per syllable, 13 Hz RMS
error.
18
Strengths are stable under small changes in the
model.
This model allows extra freedom: different tones
are allowed to define their targets differently.
This model allows less freedom: all tones have
the same type of target.
The two models have words defined by different
labelers.
19
Model parameters
Cantonese
Phrasing is marked in speech.
Cantonese data courtesy of Prof. Tan Lee
Mandarin
20
Model parameters
Cantonese
Nouns are relatively important.
Mandarin
21
Model parameters
Cantonese
Longer words tend to be spoken more carefully.
Mandarin
22
Metrical patterns inside words
Normal segmentation of characters into words.
Mandarin
Random segmentation of characters into words.
Lexical acquisition
23
Other nice properties
  • Strengths are correlated with duration
  • (duration is a proxy for prominence)
  • r = 0.40 (sentence final)
  • r = 0.27 (non-final)
  • >95% confidence
  • Strength is correlated with mutual information of
    neighboring syllables
  • r = -0.175
  • >95% confidence
  • Sloppy when generating unsurprising syllables,
    and precise for surprising syllables.
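The r values above are correlation coefficients. As a minimal sketch, assuming they are Pearson correlations (the slides do not say which statistic was used), the computation looks like this. The strengths and durations below are invented numbers, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented example: fitted strengths and syllable durations (ms).
strengths = [0.8, 1.2, 0.5, 2.0, 1.1]
durations = [180.0, 210.0, 150.0, 260.0, 170.0]
print(pearson_r(strengths, durations))
```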

24
Local Conclusion
  • Intonation can be represented as
  • a small set of discrete symbols, in sequence,
    with
  • a per-person or per-style shape for each symbol
  • modulated by a variable prosodic strength.
  • One symbol per syllable seems enough
  • The strength parameter seems real
  • Similar across languages
  • Matches language structure

25
Q: But does it work for English?
A: Yes, under circumstances where the
intonational phonology is simple enough to be
obvious.
26
Reminder: Limitations of f0 and complexity of
prosody.
To show the range of information that can be
carried by prosody, observe an elegant experiment
by Stan Freberg (1950). The text has virtually
no lexical information, but it still tells a
story. Even so, it is very hard to label
individual words.
27
English
  • Sentences in the form 123-456-7890?
  • Speaker is trying to confirm a single digit.
  • Models have just 1.1 parameters per sentence.

28
The model for English
  • There are identical boundary tones on every
    utterance.
  • All target shapes are identical, except the
    focus.
  • X B B B B A B B B B B Y
  • X B B B A B B B B B B Y
  • X B A B B B B B B B B Y
  • Rather simple phonology.
  • Accent prominence depends on position in phrase
    and in utterance.
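The symbol strings above follow a simple pattern: a boundary symbol X, one symbol per digit with A on the focused digit and B elsewhere, then a boundary symbol Y. A small sketch that generates them, with the function name and zero-based indexing as assumptions:

```python
def focus_sequence(n_digits, focus_index):
    """Return X, one symbol per digit (A on the focused digit, B elsewhere), Y.

    focus_index is zero-based, counting digits from the left.
    """
    if not 0 <= focus_index < n_digits:
        raise ValueError("focus_index must point at one of the digits")
    body = ["A" if i == focus_index else "B" for i in range(n_digits)]
    return ["X"] + body + ["Y"]


# The three patterns shown on the slide, for a ten-digit number.
for focus in (4, 3, 1):
    print(" ".join(focus_sequence(10, focus)))
```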

29
Model details
910 999 - 1010
Decline over utterance
Strength
time
Decline over phrase
Local effect around accent
Compress range after accent
30
The rest of the model.
  • Where do you put the targets?
  • What are the targets?
  • Pitch values?
  • Slopes?
  • Do the targets change in f0 range with changes in
    strength?

31
Model fits well over a range of speeds.
Low speed
Merger of accent with boundary tone
High speed
32
Model reproduces nontrivial features of the data
and fits well over a range of speeds.
Low speed
Merger of accent with boundary tone
High speed
33
Conclusion
  • Physiologically-based models can capture
    important aspects of speech.
  • A very compact representation of behavior.
  • It can be applied broadly
  • Two dialects of Chinese
  • Some aspects of English
  • It raises questions about where the
    phonetics/phonology boundary actually sits.
  • Introduces an objective acoustic measure of
    prosodic prominence.
  • Suggests that the speaker may help the listener
    segment the speech stream.