Modeling Prosody for Automatic Corpus Labeling and Utterance Classification for Translation

1
Modeling Prosody for Automatic Corpus Labeling
and Utterance Classification for Translation
  • Shankar Ananthakrishnan
  • July 29, 2004

2
Labeling: Previous work
  • Automatic annotation of stress patterns
  • Stress patterns based on ToBI labeling
  • Previous approach
  • Static feature vectors
  • Syllable level feature extraction
  • Only features related to pitch employed
  • Limitations
  • Temporal misalignment between event and evidence
  • No history or language model
  • Syntactic information not incorporated

3
Labeling: Current approach
  • Binary decision at the syllable level
  • Simple stressed/unstressed classification
  • Avoids complicated ToBI stress categorization
  • Time series modeling framework
  • Integrate multiple streams of information
    (e.g., intensity, phone duration, pitch); see the
    feature-stream sketch below
  • Streams evolve at different rates (asynchrony)
  • Modeling framework: coupled HMMs (CHMMs)
  • Incorporate syntactic knowledge statistically
  • Part-of-Speech bigrams predict probability of
    stress
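A minimal sketch, assuming syllable-level features have already been extracted: separate observation streams (pitch, intensity, duration) per syllable. The feature names and numbers are illustrative placeholders, not the actual feature set.

```python
# Minimal illustrative sketch (not the original code): each syllable yields
# several observation streams that evolve at different rates and are later
# modeled jointly by the coupled HMM. Feature names/values are hypothetical.
from dataclasses import dataclass

@dataclass
class SyllableFeatures:
    pitch_mean_hz: float    # mean F0 over the syllable
    intensity_db: float     # mean energy
    duration_ms: float      # syllable duration

utterance = [
    SyllableFeatures(pitch_mean_hz=180.0, intensity_db=62.0, duration_ms=140.0),
    SyllableFeatures(pitch_mean_hz=235.0, intensity_db=69.0, duration_ms=210.0),
    SyllableFeatures(pitch_mean_hz=170.0, intensity_db=60.0, duration_ms=120.0),
]

# Separate streams handed to the acoustic model:
pitch_stream     = [s.pitch_mean_hz for s in utterance]
intensity_stream = [s.intensity_db for s in utterance]
duration_stream  = [s.duration_ms for s in utterance]
```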

4
Acoustic Model
  • Coupled Hidden Markov Model
  • Captures relationship between asynchronous
    information streams
  • Used in audio-visual ASR to jointly model
    acoustic/video data

[Figure: coupled HMM with two state chains, Stream 1 (states A1-A4) and
Stream 2 (states B1-B4), coupled across streams; a joint-state sketch follows]
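A minimal sketch of the coupling idea under the usual two-chain CHMM formulation, where each chain's next state depends on the previous states of both chains. State counts and probabilities are random placeholders, not the models used in this work.

```python
# Minimal sketch of a two-stream coupled HMM (illustrative): each chain's
# transition depends on the previous states of BOTH chains, p(a'|a,b) and
# p(b'|b,a). The coupled model is equivalent to an HMM over joint states
# (a, b), whose transition matrix is assembled below.
import numpy as np

nA, nB = 3, 3                                   # hypothetical state counts
rng = np.random.default_rng(0)

# Cross-coupled transition tensors: T_A[a, b, a2] = p(a2 | a, b), etc.
T_A = rng.random((nA, nB, nA)); T_A /= T_A.sum(axis=-1, keepdims=True)
T_B = rng.random((nB, nA, nB)); T_B /= T_B.sum(axis=-1, keepdims=True)

# Equivalent transition matrix over coupled states (a, b).
T_joint = np.zeros((nA * nB, nA * nB))
for a in range(nA):
    for b in range(nB):
        for a2 in range(nA):
            for b2 in range(nB):
                T_joint[a * nB + b, a2 * nB + b2] = T_A[a, b, a2] * T_B[b, a, b2]

assert np.allclose(T_joint.sum(axis=1), 1.0)    # each row is a distribution
```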
5
Language Model
  • Syntax is a powerful predictor of prosody
  • Can be used to determine a baseline prosody
    regardless of the specific utterance (Arnfield,
    1994)
  • Part-of-Speech (P-o-S) is a very accurate
    indicator of presence/absence of stress
  • Words in noun phrases and content words much more
    likely to be stressed
  • P-o-S based language model for prosody
    recognizer
  • Use tagger to label corpus with P-o-S
  • Annotated data can be used to build language
    models of the type p(NP_s | PP_u) (sketch below)
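A minimal counting sketch of such a syntactic-prosodic bigram model. The tag inventory and the toy tag sequence are hypothetical; they only illustrate how estimates like p(NP_s | PP_u) could be obtained.

```python
# Minimal sketch (illustrative data): estimate bigram probabilities over
# combined POS+stress tags, e.g. p(NP_s | PP_u), by simple counting.
from collections import Counter

# Toy tag sequence: POS tag plus _s (stressed) / _u (unstressed).
tags = ["DT_u", "NN_s", "PP_u", "NP_s", "VB_u", "NN_s", "PP_u", "NP_s"]

bigrams  = Counter(zip(tags, tags[1:]))
unigrams = Counter(tags[:-1])

def p(curr, prev):
    """Maximum-likelihood estimate of p(curr | prev)."""
    return bigrams[(prev, curr)] / unigrams[prev] if unigrams[prev] else 0.0

print(p("NP_s", "PP_u"))   # p(NP_s | PP_u) on the toy data
```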

6
Prosody Recognizer
  • Acoustic models at the syllable level
  • Stress occurs at the syllable level
  • Word/higher level models will have an undesirable
    averaging effect on stress labeling
  • Only two categories for the acoustic model:
    stressed and unstressed (two CHMMs)
  • Recognition (decoding)
  • Construct FSM from syntactic-prosodic language
    model and CHMM acoustic models using lexical
    stress information from standard dictionaries
  • Decode using the Viterbi algorithm to obtain the
    most likely stress-pattern sequence (sketch below)
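A minimal Viterbi sketch over the two syllable classes. The transition and acoustic scores are placeholder numbers standing in for the syntactic-prosodic language model and the CHMM likelihoods.

```python
# Minimal Viterbi sketch: decode the most likely stressed/unstressed sequence
# from per-syllable acoustic scores and transition probabilities (all values
# are illustrative placeholders, not trained model outputs).
import numpy as np

states = ["unstressed", "stressed"]
log_trans = np.log(np.array([[0.6, 0.4],        # p(next | prev), placeholder
                             [0.5, 0.5]]))
# Per-syllable acoustic log-likelihoods, shape (num_syllables, num_states).
log_acoustic = np.log(np.array([[0.7, 0.3],
                                [0.2, 0.8],
                                [0.6, 0.4]]))

T, S = log_acoustic.shape
delta = np.zeros((T, S))
back = np.zeros((T, S), dtype=int)
delta[0] = np.log(0.5) + log_acoustic[0]        # uniform initial prior
for t in range(1, T):
    scores = delta[t - 1][:, None] + log_trans + log_acoustic[t][None, :]
    back[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0)

path = [int(delta[-1].argmax())]                # backtrack from the best end
for t in range(T - 1, 0, -1):
    path.append(int(back[t][path[-1]]))
print([states[s] for s in reversed(path)])      # most likely stress pattern
```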

7
Machine Translation
  • Utterance classification
  • Map each utterance to its canonical form
  • Pick corresponding utterance in target language
  • Simple approach (Transonics)
  • Build class-specific language models p(w | c)
  • Evaluate the test utterance against each class LM
    and pick the class whose model provides the
    highest score (sketch below)
  • ĉ = argmax_c p(w | c)
  • Limitations of this approach
  • Insufficient data to train class-specific LMs
  • Ignores information contained in acoustics
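A minimal sketch of the ĉ = argmax_c p(w | c) decision rule. The two classes and their unigram probabilities are invented placeholders for the class-specific LMs.

```python
# Minimal sketch: pick the class whose LM scores the utterance highest
# (illustrative classes and unigram probabilities, not the system's LMs).
import math

class_lms = {
    "ask_name": {"what": 0.3, "is": 0.2, "your": 0.2, "name": 0.3},
    "ask_pain": {"where": 0.3, "does": 0.2, "it": 0.2, "hurt": 0.3},
}

def log_p(words, lm, floor=1e-6):
    """Unigram log-probability of the word sequence under one class LM."""
    return sum(math.log(lm.get(w, floor)) for w in words)

utt = "what is your name".split()
c_hat = max(class_lms, key=lambda c: log_p(utt, class_lms[c]))
print(c_hat)    # -> ask_name
```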

8
Prosody for Classification
  • Joint model incorporating acoustics and language
    information
  • Assume all classes are equally likely for the
    following formulation
  • MAP classifier
  • ĉ = argmax_c p(c | p, w) = argmax_c p(p | w, c) p(w | c)
    where p(p | w, c) is the acoustic-prosodic model
    and p(w | c) is the class LM (derivation below)
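A short worked version of this step, assuming the uniform class prior stated above:

```latex
\begin{align*}
\hat{c} &= \arg\max_{c}\, p(c \mid p, w)
         = \arg\max_{c}\, \frac{p(p, w \mid c)\, p(c)}{p(p, w)}
           && \text{Bayes' rule} \\
        &= \arg\max_{c}\, p(p, w \mid c)
           && \text{uniform } p(c),\ p(p, w)\ \text{constant in } c \\
        &= \arg\max_{c}\, \underbrace{p(p \mid w, c)}_{\text{acoustic-prosodic model}}\,
           \underbrace{p(w \mid c)}_{\text{class LM}}
           && \text{chain rule}
\end{align*}
```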
9
A Naïve Approach
  • Simplify the acoustic-prosodic model
  • ĉ = argmax_c p(c | p, w) = argmax_c p(p, w | c)
    ≈ argmax_c p(p | c) p(w | c)
  • Assumption: given the target class, prosody is
    only weakly dependent on the word sequence
  • Can use results from prosody labeling to build
    prosodic "language models" for each class
    (sketch below)
  • Class-based word LMs can be used as before
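A minimal sketch of the naïve factorization, combining a per-class prosodic "language model" over stressed/unstressed (s/u) syllable bigrams with the class word LM. All probabilities are invented placeholders.

```python
# Minimal sketch of c_hat = argmax_c p(p | c) p(w | c): add a prosodic-LM
# log-score over s/u stress bigrams to the word-LM log-score for each class.
import math

word_lm = {
    "ask_name": {"what": 0.3, "is": 0.2, "your": 0.2, "name": 0.3},
    "ask_pain": {"where": 0.3, "does": 0.2, "it": 0.2, "hurt": 0.3},
}
# p(next stress | previous stress, class), trainable from automatic labels.
prosody_lm = {
    "ask_name": {("u", "s"): 0.7, ("s", "u"): 0.6, ("u", "u"): 0.3, ("s", "s"): 0.4},
    "ask_pain": {("u", "s"): 0.4, ("s", "u"): 0.5, ("u", "u"): 0.6, ("s", "s"): 0.5},
}

def score(words, stresses, c, floor=1e-6):
    """Combined word-LM and prosodic-LM log-score for class c."""
    lw = sum(math.log(word_lm[c].get(w, floor)) for w in words)
    lp = sum(math.log(prosody_lm[c].get(bg, floor))
             for bg in zip(stresses, stresses[1:]))
    return lw + lp

words, stresses = "what is your name".split(), ["u", "s", "u", "s"]
print(max(word_lm, key=lambda c: score(words, stresses, c)))   # -> ask_name
```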

10
Joint Modeling
  • Sparsity of training data
  • Insufficient data for modeling p(p | w, c) if we
    model acoustics at the word or higher level
  • Even p(w | c) suffers from sparsity problems
    (only 4-5 sentences per class on average)
  • Modeling at phrase level
  • Captures high-level information useful for
    classification
  • CHMMs at the phrase level instead of syllable
    level
  • Issue: what units to use for acoustic models so
    as to capture phrase-level prosody and overcome
    problems posed by sparsity?

11
Semantic Annotation
  • Attach semantic tags to phrases
  • Automatic phrase parser segments text into phrase
    constituents (Charniak)
  • Statistical semantic tagger can be built from
    hand-annotated corpora (Gildea & Jurafsky)
  • UC Berkeley / ICSI FrameNet corpus
  • Eliminating sparsity
  • Few semantic roles for our translation problem
    (medical domain)
  • Class-based language and acoustic models can be
    built from tagged semantic roles instead of raw
    word sequences (sketch below)
  • LMs will be much more accurate
  • Acoustic models may perform better, since prosody
    depends more on the semantic role than on the
    actual word sequence
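A minimal sketch of estimating class-conditional models over semantic role tags rather than raw words. The role labels and the tiny tagged corpus are hypothetical, FrameNet-style placeholders.

```python
# Minimal sketch (hypothetical role tags): build class-conditional counts
# over semantic roles instead of raw word sequences, so paraphrases that
# share roles share statistics and sparsity is reduced.
from collections import Counter, defaultdict

tagged_corpus = [
    ("ask_pain", ["Interrogative", "Body_part", "Sensation"]),
    ("ask_pain", ["Interrogative", "Sensation"]),
    ("ask_name", ["Interrogative", "Person_name"]),
]

role_counts = defaultdict(Counter)
for cls, roles in tagged_corpus:
    role_counts[cls].update(roles)

def p_role(role, cls):
    """Class-conditional unigram probability of a semantic role."""
    total = sum(role_counts[cls].values())
    return role_counts[cls][role] / total if total else 0.0

print(p_role("Sensation", "ask_pain"))   # role-level class LM estimate
```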

12
Priorities
  • Procure FrameNet corpus (request sent)
  • Finalize feature set and topology for CHMM
    acoustic models
  • Begin implementing above ideas!
  • ICASSP 2005