1
Automatic Cue-Based Dialogue Act Tagging
  • Discourse & Dialogue
  • CMSC 35900-1
  • November 3, 2006

2
Roadmap
  • Task & Corpus
  • Dialogue Act Tagset
  • Automatic Tagging Models
  • Features
  • Integrating Features
  • Evaluation
  • Comparison & Summary

3
Task & Corpus
  • Goal: identify dialogue acts in conversational speech
  • Spoken corpus: Switchboard
  • Telephone conversations between strangers
  • Not task-oriented; topics suggested
  • 1000s of conversations
  • Recorded, transcribed, segmented

4
Dialogue Act Tagset
  • Cover general conversational dialogue acts
  • No particular task/domain constraints
  • Original set: 50 tags
  • Augmented with flags for task and conversation management
  • 220 distinct tags appeared in labeling; some rare
  • Final set: 42 mutually exclusive tags
  • Inter-annotator agreement: kappa = 0.80 (high; see the sketch below)
  • 1,155 conversations labeled, split into train/test
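The kappa statistic corrects raw agreement for chance agreement. A minimal sketch of Cohen's kappa for two annotators' DA tag sequences; the toy input below is hypothetical, not from the corpus:

```python
from collections import Counter

def cohens_kappa(tags_a, tags_b):
    """Chance-corrected agreement between two annotators' tag sequences."""
    n = len(tags_a)
    # Observed agreement: fraction of utterances tagged identically.
    p_o = sum(a == b for a, b in zip(tags_a, tags_b)) / n
    # Expected agreement from each annotator's marginal tag distribution.
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    p_e = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example with SWBD-DAMSL-style tag codes: 4 of 5 utterances agree.
print(cohens_kappa(["sd", "b", "qy", "sd", "aa"],
                   ["sd", "b", "qy", "sd", "b"]))
```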

5
Common Tags
  • Statement/Opinion: declarative form, +/- opinion
  • Question: Yes/No; declarative questions have statement form but question force
  • Backchannel: continuers like uh-huh, yeah
  • Turn Exit/Abandon: breaking off, +/- passing the turn
  • Answer: Yes/No; follows questions
  • Agreement: Accept/Reject/Maybe

6
Probabilistic Dialogue Models
  • HMM dialogue models
  • argmax_U P(U) P(E|U), where E = evidence, U = DA sequence
  • Assume decomposable by utterance
  • Evidence from true words, ASR words, prosody
  • Structured as an offline decoding process over the dialogue (see the Viterbi sketch below)
  • States = DAs, observations = utterances, P(Obs) = P(E_i|U_i), transitions from P(U)
  • P(U):
  • Conditioning on speaker improves the model
  • Bigram model adequate, useful
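A minimal sketch of the offline decoding: Viterbi over DA states with bigram transitions P(U_i | U_{i-1}) and per-utterance evidence likelihoods P(E_i | U_i). The function name and input formats are illustrative, not from the papers:

```python
def viterbi_da(evidence, trans, init):
    """Most probable DA sequence under argmax_U P(U) P(E|U).

    evidence: list, per utterance, of {tag: log P(E_i | U_i = tag)}
    trans:    {(prev_tag, tag): log P(tag | prev_tag)}  (bigram DA model)
    init:     {tag: log P(tag)} for the first utterance
    """
    tags = list(init)
    # delta[t] = best log score of any path ending in tag t so far
    delta = {t: init[t] + evidence[0][t] for t in tags}
    backptrs = []
    for obs in evidence[1:]:
        new_delta, ptr = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: delta[p] + trans[(p, t)])
            new_delta[t] = delta[best_prev] + trans[(best_prev, t)] + obs[t]
            ptr[t] = best_prev
        delta, backptrs = new_delta, backptrs + [ptr]
    # Trace the best path backwards from the best final tag.
    path = [max(delta, key=delta.get)]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]
```

Conditioning on the speaker, as the slide notes, amounts to keying the transition table on speaker change as well as the previous tag.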

7
DA Classification - Words
  • Words
  • Combine notions of discourse markers and collocations, e.g. uh-huh → Backchannel (see the sketch below)
  • Contrast true words, ASR 1-best, ASR n-best
  • Results
  • Best: 71% with true words, 65% with ASR 1-best
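The word evidence P(W_i | U_i) comes from DA-specific language models. A toy sketch using per-tag unigram models with add-one smoothing; the real systems used n-grams, and all names here are illustrative:

```python
import math
from collections import Counter, defaultdict

class DAWordModel:
    """Per-dialogue-act unigram word model for P(W_i | U_i)."""

    def __init__(self, tagged_utterances):
        # tagged_utterances: iterable of (tag, [word, ...]) pairs
        self.counts = defaultdict(Counter)
        self.vocab = set()
        for tag, words in tagged_utterances:
            self.counts[tag].update(words)
            self.vocab.update(words)

    def loglik(self, tag, words):
        """log P(words | tag) with add-one smoothing."""
        c = self.counts[tag]
        total = sum(c.values()) + len(self.vocab)
        return sum(math.log((c[w] + 1) / total) for w in words)

# Cue words fall out of the counts: a backchannel model assigns
# high probability to "uh-huh" and "yeah".
model = DAWordModel([("backchannel", ["uh-huh"]),
                     ("backchannel", ["yeah"]),
                     ("statement", ["i", "think", "so"])])
print(model.loglik("backchannel", ["uh-huh"]))
```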

8
DA Classification - Prosody
  • Features
  • Duration, pause, pitch, energy, rate, gender
  • Pitch accent, tone
  • Results
  • Decision trees over the 5 most common classes (see the sketch below)
  • 45.4% accuracy (baseline 16.6%)
  • In the HMM, with DT likelihoods as P(E_i|U_i)
  • 49.7% (vs. 35% baseline)
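A sketch of a prosodic decision-tree classifier, assuming scikit-learn is available; the feature values below are fabricated purely for illustration. Note the tree yields class posteriors P(U_i | F_i); dividing by the class priors gives likelihoods P(F_i | U_i) up to a constant, which is the form the HMM needs:

```python
from sklearn.tree import DecisionTreeClassifier

# Each row: [duration_s, pause_s, mean_f0_hz, energy_db, speech_rate, gender]
X = [[0.4, 0.1, 180.0, 62.0, 4.1, 0],
     [2.3, 0.5, 120.0, 70.0, 5.0, 1],
     [0.3, 0.0, 210.0, 60.0, 3.8, 0],
     [1.9, 0.6, 115.0, 71.0, 5.2, 1]]
y = ["backchannel", "statement", "backchannel", "statement"]

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Class posterior estimates for a new utterance's prosodic features.
posterior = tree.predict_proba([[0.5, 0.1, 190.0, 61.0, 4.0, 0]])[0]
print(dict(zip(tree.classes_, posterior)))
```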

9
DA Classification - All
  • Combine word and prosodic information
  • Consider the case with ASR words and acoustics
  • P(A_i, W_i, F_i | U_i) ≈ P(A_i, W_i | U_i) P(F_i | U_i)  (A = acoustics, W = ASR words, F = prosodic features)
  • Reweight the streams for their different accuracies (see the sketch below)
  • Slightly better than raw ASR words alone
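Under the independence assumption above, the combined score is a product of the two stream likelihoods; an exponent weight compensates for their different accuracies. A sketch, where the weight value is a hypothetical tuning knob:

```python
def classify(word_loglik, prosody_loglik, weight=0.2):
    """Pick the DA tag maximizing log P(A,W|U) + weight * log P(F|U).

    word_loglik, prosody_loglik: dicts {tag: log-likelihood}
    weight: down-weights the less accurate prosodic stream
    """
    return max(word_loglik,
               key=lambda t: word_loglik[t] + weight * prosody_loglik[t])

# Toy example: words slightly favor "statement", prosody strongly
# favors "backchannel"; the weighted sum decides.
print(classify({"statement": -10.0, "backchannel": -10.5},
               {"statement": -8.0, "backchannel": -4.0}))
```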

10
Integrated Classification
  • Focused analysis
  • Prosodically disambiguated class pairs:
  • Statement vs. Question-Y/N and Agreement vs. Backchannel
  • Prosodic decision trees for agreement vs. backchannel
  • Disambiguated by duration and loudness
  • Substantial improvement for prosody + words
  • True words: S/Q 85.9% → 87.6%; A/B 81.0% → 84.7%
  • ASR words: S/Q 75.4% → 79.8%; A/B 78.2% → 81.7%
  • Prosody is more useful when recognition is unreliable

11
Observations
  • DA classification can work in an open domain
  • Exploits word model, DA context, prosody
  • Best results for prosody + words
  • Words alone are quite effective, even from ASR
  • Questions
  • Whole-utterance models? More fine-grained ones?
  • Longer-range structure, long-term features?

12
Automatic Metadata Annotation
  • What is structural metadata?
  • Why annotate?

13
What is Structural Metadata?
  • Issue: speech is messy
  • Sentence/utterance boundaries are not marked
  • These are the basic units for dialogue acts, etc.
  • Speech has disfluencies
  • Result: automatic transcripts are hard to read
  • Structural metadata annotation:
  • Mark utterance boundaries
  • Identify fillers, repairs

14
Metadata Details
  • Sentence-like units (SUs)
  • Provide basic units for other processing
  • Not necessarily grammatical sentences
  • Distinguish full and incomplete SUs
  • Conversational fillers
  • Discourse markers, disfluencies: um, uh, anyway
  • Edit disfluencies
  • Repetitions, repairs, restarts
  • Mark material that should be excluded from the fluent transcript
  • Interruption point (IP): where the correction starts

15
Annotation Architecture
  • Two-step process:
  • For each word, mark the following boundary as IP, SU, ISU, or none
  • For each region delimited by boundaries and words, identify CFs/EDs
  • Post-process to remove insertions
  • Boundary detection: decision trees
  • Prosodic features: duration, pitch, amplitude, silence
  • Lexical features: POS tags, word/POS tag patterns, adjacent filler words

16
Boundary Detection - LM
  • Language-model-based boundary detection
  • Hidden event language model:
  • Trigram model over words with boundary tags as hidden tokens (see the sketch below)
  • Combine with decision tree:
  • Use the LM value as a feature in the DT
  • Linear interpolation of DT and LM probabilities
  • Jointly model with an HMM
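A toy hidden-event decoder. For brevity it uses a bigram model over words interleaved with boundary tokens, under which each inter-word decision decouples and a greedy choice is exact; the trigram model in the slides requires Viterbi-style dynamic programming. The `logp` table and token names are assumptions:

```python
def detect_boundaries(words, logp):
    """Between each word pair choose <SU>, <IP>, or no event so the
    interleaved token sequence is most probable under a bigram LM.

    logp: dict mapping (prev_token, token) -> log probability
          (assumed smoothed, so every pair has a value)
    """
    events = []
    for prev, nxt in zip(words, words[1:]):
        candidates = {
            None:   logp[(prev, nxt)],
            "<SU>": logp[(prev, "<SU>")] + logp[("<SU>", nxt)],
            "<IP>": logp[(prev, "<IP>")] + logp[("<IP>", nxt)],
        }
        events.append(max(candidates, key=candidates.get))
    return events  # one decision per inter-word slot
```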

17
Edit and Filler Detection
  • Transformation-based learning (see the sketch below)
  • Baseline predictor, rule templates, objective function
  • Classify with the baseline
  • Use rule templates to generate rules that fix errors
  • Add the best rule to the baseline and repeat
  • Training: supervised
  • Features: word, POS, word usage, repetition, location
  • Tags: filled pause, edit, discourse marker, edit term
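A skeletal transformation-based learning loop matching the steps above. The rule representation and template interface are illustrative; real Brill-style systems differ in detail:

```python
def tbl_train(tokens, gold, baseline_tag, templates, min_gain=1):
    """Learn an ordered rule list for filler/edit tagging.

    tokens:    list of observations, e.g. (word, POS) pairs
    gold:      gold tags (e.g. FILLED_PAUSE, EDIT, MARKER, NONE)
    templates: functions that propose candidate rules at an error
               position; a rule is (predicate, new_tag), where
               predicate(tokens, tags, i) -> bool
    """
    tags = [baseline_tag(tok) for tok in tokens]      # classify with baseline
    rules = []
    while True:
        # Generate candidate rules from templates at current errors.
        candidates = {rule
                      for i in range(len(tokens)) if tags[i] != gold[i]
                      for template in templates
                      for rule in template(tokens, tags, i)}
        if not candidates:
            break
        # Score each rule by net error reduction on the training data.
        def gain(rule):
            pred, new_tag = rule
            return sum((new_tag == gold[i]) - (tags[i] == gold[i])
                       for i in range(len(tokens)) if pred(tokens, tags, i))
        best = max(candidates, key=gain)
        if gain(best) < min_gain:
            break
        rules.append(best)                            # add best rule
        pred, new_tag = best
        tags = [new_tag if pred(tokens, tags, i) else t
                for i, t in enumerate(tags)]          # apply it and iterate
    return rules
```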

18
Evaluation
  • SU: best results combine all feature types
  • None are great
  • CF/ED: best features are lexical match and IP
  • Overall, SU detection is relatively good
  • Better on reference transcripts than on ASR output
  • Most filled-pause errors are due to ASR errors
  • Discourse-marker errors are not due to ASR
  • The remaining tasks are problematic

19
SU Detection
(R = recall, P = precision; all values in %; — = not given in the source)

Features                  | SU-R | SU-P | ISU-R | ISU-P | IP-R | IP-P
Prosody only              | 46.5 | 74.6 |   0   |   —   |  8.8 | 47.2
POS, pattern, LM          | 77.3 | 79.6 |  30   | 53.3  | 64.4 | 77.4
Prosody, POS, pattern, LM | 81.5 | 80.4 | 36.5  | 69.7  | 66.1 | 78.7
All + frag                | 81.1 | 81.6 | 20.1  | 60.7  | 80.7 | 80.4