Title: Automatic Cue-Based Dialogue Act Tagging
- Discourse & Dialogue
- CMSC 35900-1
- November 3, 2006
Roadmap
- Task & Corpus
- Dialogue Act Tagset
- Automatic Tagging Models
- Features
- Integrating Features
- Evaluation
- Comparison Summary
Task & Corpus
- Goal
- Identify dialogue acts in conversational speech
- Spoken corpus: Switchboard
- Telephone conversations between strangers
- Not task-oriented; topics were suggested
- Thousands of conversations
- Recorded, transcribed, and segmented
Dialogue Act Tagset
- Cover general conversational dialogue acts
- No particular task/domain constraints
- Original set: 50 tags
- Augmented with flags for task and conversation management
- 220 tags used in labeling; some rare
- Final set: 42 mutually exclusive tags
- Inter-annotator agreement: Kappa = 0.80 (high)
- 1,155 conversations labeled; split into train/test sets
Common Tags
- Statement/Opinion: declarative, +/- opinion
- Question: Yes/No; declarative form, question force
- Backchannel: continuers like uh-huh, yeah
- Turn Exit/Abandon: break off, +/- pass
- Answer: Yes/No; follows questions
- Agreement: Accept/Reject/Maybe
Probabilistic Dialogue Models
- HMM dialogue models
- argmax_U P(U)P(E|U), where E = evidence, U = DA sequence
- Assume decomposable by utterance
- Evidence from true words, ASR words, prosody
- Structured as an offline decoding process over the dialogue
- States: DAs; observations: utterances; P(Obs): P(E_i|U_i); transitions: P(U)
- Conditioning on speaker tags improves the model
- A bigram model of DA context is adequate and useful
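The argmax over DA sequences can be sketched as standard Viterbi decoding over a bigram DA prior and per-utterance evidence likelihoods. The tag set, transition probabilities, and likelihood values below are toy stand-ins, not the paper's trained models:

```python
import math

# Toy HMM for DA decoding: bigram prior P(U) over invented tags.
TAGS = ["Statement", "Question", "Backchannel"]

BIGRAM = {  # P(tag_i | tag_{i-1}); "<s>" marks conversation start
    "<s>": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
    "Statement": {"Statement": 0.4, "Question": 0.3, "Backchannel": 0.3},
    "Question": {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
}

def viterbi(likelihoods):
    """Return argmax_U P(U) * prod_i P(E_i|U_i) over tag sequences U.

    likelihoods: one dict per utterance mapping tag -> P(E_i | U_i).
    """
    scores = {t: math.log(BIGRAM["<s>"][t]) + math.log(likelihoods[0][t])
              for t in TAGS}
    back = []  # backpointers, one dict per utterance after the first
    for obs in likelihoods[1:]:
        prev, scores, ptrs = scores, {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: prev[p] + math.log(BIGRAM[p][t]))
            scores[t] = prev[best] + math.log(BIGRAM[best][t]) + math.log(obs[t])
            ptrs[t] = best
        back.append(ptrs)
    tag = max(TAGS, key=scores.get)  # best final state
    path = [tag]
    for ptrs in reversed(back):      # trace back the best path
        tag = ptrs[tag]
        path.append(tag)
    return path[::-1]
```

Because the model is a bigram over DAs, decoding is linear in dialogue length, which is what makes offline whole-dialogue decoding cheap.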
DA Classification - Words
- Words
- Combine notions of discourse markers and collocations, e.g. uh-huh -> Backchannel
- Contrast true words, ASR 1-best, ASR n-best
- Results
- Best: 71% with true words, 65% with ASR 1-best
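A minimal word-cue classifier in this spirit, using per-tag unigram likelihoods with add-one smoothing in place of the paper's per-tag n-gram LMs; the tiny training set is invented for illustration:

```python
from collections import Counter

# Hypothetical toy training data: utterances labeled with DA tags.
train = [
    ("uh-huh", "Backchannel"),
    ("yeah", "Backchannel"),
    ("i think that is right", "Statement"),
    ("do you like it", "Question"),
    ("is that so", "Question"),
]

# Per-tag word counts, estimated here as a unigram model
# (the paper uses per-tag n-gram LMs; unigrams keep the sketch short).
counts = {}
for text, tag in train:
    counts.setdefault(tag, Counter()).update(text.split())
vocab = {w for c in counts.values() for w in c}

def word_likelihood(words, tag):
    """Add-one-smoothed P(words | tag)."""
    c = counts[tag]
    total = sum(c.values()) + len(vocab)
    p = 1.0
    for w in words:
        p *= (c[w] + 1) / total
    return p

def classify(text):
    words = text.split()
    return max(counts, key=lambda tag: word_likelihood(words, tag))
```

The discourse-marker effect falls out of the counts: uh-huh only ever occurs under Backchannel, so its likelihood dominates there.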
DA Classification - Prosody
- Features
- Duration, pause, pitch, energy, rate, gender
- Pitch accent, tone
- Results
- Decision trees on 5 common classes
- 45.4% accuracy (baseline: 16.6%)
- In HMM with DT likelihoods as P(E_i|U_i)
- 49.7% (vs. 35% baseline)
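A hand-written stand-in for the prosodic decision trees: prosodic features in, a distribution over DA classes out, usable as the likelihood term P(F_i|U_i) in the HMM. The features, thresholds, and probabilities below are all invented for illustration:

```python
# Stand-in for a learned CART tree over prosodic features.
# Thresholds and class distributions are invented, not trained values.
def prosody_tree(duration_s, mean_f0_hz, final_f0_slope):
    if duration_s < 0.5:                # very short utterances
        return {"Backchannel": 0.7, "Statement": 0.2, "Question": 0.1}
    if final_f0_slope > 0:              # rising terminal pitch
        return {"Question": 0.6, "Statement": 0.3, "Backchannel": 0.1}
    return {"Statement": 0.6, "Question": 0.2, "Backchannel": 0.2}
```

Each leaf's class distribution can be plugged into the HMM directly, which is how the 45.4% tree becomes a 49.7% sequence model.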
DA Classification - All
- Combine word and prosodic information
- Consider case with ASR words and acoustics
- P(A_i,W_i,F_i|U_i) ≈ P(A_i,W_i|U_i)P(F_i|U_i)
- Reweight for different accuracies
- Slightly better than raw ASR alone
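The reweighting is typically done with an exponent on the less reliable stream; a minimal sketch, with the weight value invented rather than tuned:

```python
def combined_likelihood(p_asr_words, p_prosody, weight=0.3):
    """P(A_i,W_i,F_i|U_i) ~= P(A_i,W_i|U_i) * P(F_i|U_i)**weight.

    The exponent (0 < weight <= 1) downweights the prosodic stream,
    which is less accurate than the word stream; 0.3 is illustrative.
    """
    return p_asr_words * p_prosody ** weight
```

Per-class likelihoods combined this way drop into the HMM in place of the word-only term.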
Integrated Classification
- Focused analysis
- Prosodically disambiguated classes
- Statement vs. Question-Y/N and Agreement vs. Backchannel
- Prosodic decision trees for agreement vs. backchannel
- Disambiguated by duration and loudness
- Substantial improvement for prosody + words
- True words: S/Q 85.9% -> 87.6%; A/B 81.0% -> 84.7%
- ASR words: S/Q 75.4% -> 79.8%; A/B 78.2% -> 81.7%
- Prosody is more useful when recognition is unreliable
Observations
- DA classification can work in open domains
- Exploits word models, DA context, and prosody
- Best results for prosody + words
- Words are quite effective alone, even from ASR
- Questions
- Whole-utterance models? More fine-grained models?
- Longer structure, long-term features?
Automatic Metadata Annotation
- What is structural metadata?
- Why annotate?
What is Structural Metadata?
- Issue: speech is messy
- Sentence/utterance boundaries are not marked
- These are the basic units for dialogue acts, etc.
- Speech has disfluencies
- Result: automatic transcripts are hard to read
- Structural metadata annotation
- Mark utterance boundaries
- Identify fillers and repairs
Metadata Details
- Sentence-like units (SUs)
- Provide basic units for other processing
- Not necessarily grammatical sentences
- Distinguish full and incomplete SUs
- Conversational fillers
- Discourse markers and disfluencies: um, uh, anyway
- Edit disfluencies
- Repetitions, repairs, restarts
- Mark material that should be excluded from the fluent transcript
- Interruption point (IP): where the correction starts
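A sketch of why this annotation pays off downstream: once each token carries a filler/edit label, producing a readable transcript is a simple filter. The labels here are hand-assigned, hypothetical annotations:

```python
# Given tokens labeled as conversational filler (CF), edit region (ED),
# or fluent, keep only the fluent material.
def fluent_transcript(labeled_tokens):
    return " ".join(w for w, label in labeled_tokens if label == "fluent")

# "i i um really liked it": the repeated "i" is an edit, "um" a filler
utt = [("i", "ED"), ("i", "fluent"), ("um", "CF"),
       ("really", "fluent"), ("liked", "fluent"), ("it", "fluent")]
```

Here `fluent_transcript(utt)` yields "i really liked it", the cleaned-up reading of the disfluent original.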
Annotation Architecture
- 2-step process
- For each word, mark boundary type: IP, SU, ISU, or none
- For regions delimited by boundary words, identify CF/ED type
- Post-process to remove insertions
- Boundary detection: decision trees
- Prosodic features: duration, pitch, amplitude, silence
- Lexical features: POS tags, word/POS tag patterns, adjacent filler words
Boundary Detection - LM
- Language-model-based boundaries
- Hidden-event language model
- Trigram model with boundary tags
- Combine with decision tree
- Use LM value as a feature in the DT
- Linear interpolation of DT and LM probabilities
- Jointly model with an HMM
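A bigram sketch of the hidden-event idea (the actual system uses trigrams): boundaries are extra tokens in the word stream, and the LM decides whether inserting one at a gap raises the sequence probability. All probabilities below are hand-set for illustration:

```python
import math

# Hand-set log P(next | prev); pairs not listed back off to DEFAULT.
LOGP = {
    ("yeah", "<SU>"): math.log(0.6), ("yeah", "i"): math.log(0.1),
    ("<SU>", "i"): math.log(0.5), ("i", "know"): math.log(0.4),
    ("know", "<SU>"): math.log(0.5),
}
DEFAULT = math.log(0.01)  # crude back-off for unseen bigrams

def score(tokens):
    """Total bigram log-probability of a token sequence."""
    return sum(LOGP.get(pair, DEFAULT) for pair in zip(tokens, tokens[1:]))

def boundary_after(words, i):
    """True if inserting an SU boundary after position i raises the LM score."""
    with_boundary = words[:i + 1] + ["<SU>"] + words[i + 1:]
    return score(with_boundary) > score(words)
```

In the real system this hidden-event posterior is not used alone: it becomes a feature in the decision tree, is interpolated with the tree's probabilities, or is decoded jointly in an HMM.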
Edit and Filler Detection
- Transformation-based learning
- Baseline predictor, rule templates, objective function
- Classify with the baseline
- Use rule templates to generate rules that fix errors
- Add best rule to the baseline
- Training: supervised
- Features: word, POS, word use, repetition, location
- Tags: filled pause, edit, marker, edit term
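The loop above can be sketched Brill-style. For brevity the "templates" here are pre-instantiated (word, old-tag, new-tag) triples rather than true templates, and the data and tags are a toy example:

```python
# Minimal transformation-based learning: start from a baseline tagging,
# then greedily add the candidate rule that fixes the most errors.

def apply_rule(rule, words, tags):
    """Apply one (word, old_tag, new_tag) rule to a tag sequence."""
    word, old, new = rule
    return [new if w == word and t == old else t
            for w, t in zip(words, tags)]

def tbl_train(words, gold, candidate_rules, baseline_tag="O"):
    tags = [baseline_tag] * len(words)   # step 1: baseline predictor
    learned = []
    while True:
        # step 2: score every candidate rule by net error reduction
        def gain(rule):
            fixed = apply_rule(rule, words, tags)
            return (sum(f == g for f, g in zip(fixed, gold))
                    - sum(t == g for t, g in zip(tags, gold)))
        best = max(candidate_rules, key=gain)
        if gain(best) <= 0:
            break                        # no rule helps; stop
        learned.append(best)             # step 3: add best rule
        tags = apply_rule(best, words, tags)
    return learned, tags
```

At tagging time only the learned rule list is replayed in order, which is what makes TBL taggers fast and their decisions easy to inspect.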
Evaluation
- SU: best to combine all feature types
- No single type is great alone
- CF/ED: best features are lexical match and IP
- Overall SU detection relatively good
- Better on reference transcripts than ASR
- Most filled-pause (FP) errors due to ASR errors
- Discourse-marker (DM) errors not due to ASR
- Remaining tasks are problematic
SU Detection
Recall (R) and precision (P), in percent; ISU-P is undefined when ISU-R = 0:

Features                  SU-R  SU-P  ISU-R  ISU-P  IP-R  IP-P
Prosody only              46.5  74.6   0      –      8.8  47.2
POS, Pattern, LM          77.3  79.6  30     53.3   64.4  77.4
Pros, POS, Pattern, LM    81.5  80.4  36.5   69.7   66.1  78.7
All + frag                81.1  81.6  20.1   60.7   80.7  80.4