Recognizing Structure: Sentence and Topic Segmentation

Provided by: juliahir

Learn more at: http://www1.cs.columbia.edu

1
Recognizing Structure: Sentence and Topic
Segmentation

Julia Hirschberg
CS 4706

2
Types of Discourse Structure in Spoken Corpora

Domain independent
Sentence/utterance boundaries
Speaker turn segmentation
Topic/story segmentation
Domain dependent
Broadcast news
Meetings
Telephone conversations

3
Today

Theoretical studies of discourse structure
Text-based approaches and cues
Speech-based approaches and cues
Practical goals for speech segmentation and
state-of-the-art

4
Hierarchical Structure in Discourse?

Welcome to word processing.
That's using a computer to type letters and
reports. Make a typo?
No problem.
Just back up, type over the mistake, and it's
gone.
And, it eliminates retyping.

5
Structures of Discourse Structure (Grosz & Sidner
86)

Leading theory of discourse structure
Three components
Linguistic structure
What is said
Assumption: divided into appropriate units of
analysis (Discourse Segments)
Intentional structure
How are segments structured into topics,
subtopics
Relations: satisfaction-precedence, dominance
Attentional structure

6
How are these relations recognized in a discourse?

Linguistic markers
tense and aspect
cue phrases: now, well
Inference of S's intentions
Inference from task structure
Intonational Information

7
Structural Information in Text

Example: reviews of iPod nano
Cue phrases: now, well, first
Pronominal reference
Orthography and formatting (in text)
Lexical information (Hearst 94, Reynar 98,
Beeferman et al 99)
Domain dependent
Domain independent

8
Methods of Text Segmentation

Lexical cohesion methods vs. multiple source
Vocabulary similarity indicates topic cohesion
Intuition from Halliday & Hasan 76
Features to capture
Stem repetition
Entity repetition
Word frequency
Context vectors
Semantic similarity
Word distance
Methods
Sliding window

Lexical chains
Clustering
Combine lexical cohesion with other cues
Features
Cue phrases
Reference (e.g. pronouns)
Syntactic features
Methods
Machine Learning from labeled corpora
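
The sliding-window idea listed under the lexical cohesion methods can be sketched as follows. This is a minimal illustration, not any of the cited systems: it compares the vocabulary of a few sentences on either side of each gap with cosine similarity and proposes a boundary at the lowest-scoring gap. The stopword list, the window size, and the toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "of"}  # tiny illustrative list (assumption)

def bag(sentence):
    """Bag of words for one sentence, ignoring the stopwords above."""
    return Counter(w for w in sentence.lower().split() if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_scores(sentences, window=2):
    """Similarity of the `window` sentences before and after each gap."""
    bags = [bag(s) for s in sentences]
    scores = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        scores.append(cosine(left, right))
    return scores

sentences = [
    "the budget surplus is large",
    "candidates promise to spend the surplus",
    "the budget debate continues tonight",
    "a new treatment helps one child",
    "the family hopes the child recovers",
    "doctors call the treatment a breakthrough",
]
scores = gap_scores(sentences)
# The least cohesive gap is the proposed topic boundary;
# `boundary` is the index of the first sentence of the new segment.
boundary = scores.index(min(scores)) + 1
print(boundary)
```

Real systems would use stems rather than raw tokens and would smooth the score curve before picking boundaries.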

10
Choi 2000: An Example

Implements leading methods and compares new
algorithm to them on corpus of 700 concatenated
documents
Comparison algorithms
Baselines
No boundaries
All boundaries
Regular partition
Random # of random partitions
Actual # of random partitions

TextTiling algorithm (Hearst 94)
DotPlot algorithms (Reynar 98)
Segmenter (Kan et al 98)
Choi 00 proposal
Cosine similarity measure on stems
Same sentence = 1, no overlap = 0
Similarity matrix → rank matrix
Replace each cell with # of lower-valued neighbor
cells, normalized by # of neighboring cells
How likely is this sentence to be a boundary,
compared to other sentences?
Minimize effect of outliers
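
The rank transformation described above can be sketched in a few lines. Each similarity value is replaced by the fraction of its neighbors that hold a lower value, so only the local ordering of similarities matters. The neighborhood radius and the toy similarity matrix are assumptions for illustration; this is not Choi's implementation.

```python
def rank_matrix(sim, radius=1):
    """Replace each cell with the fraction of neighbours holding a lower value."""
    n = len(sim)
    ranks = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            lower = total = 0
            for x in range(max(0, i - radius), min(n, i + radius + 1)):
                for y in range(max(0, j - radius), min(n, j + radius + 1)):
                    if (x, y) == (i, j):
                        continue  # a cell is not its own neighbour
                    total += 1
                    if sim[x][y] < sim[i][j]:
                        lower += 1
            # Normalise by the number of neighbours actually examined,
            # so cells at the matrix edge are treated fairly.
            ranks[i][j] = lower / total if total else 0.0
    return ranks

# Toy sentence-by-sentence cosine similarities (assumed values).
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
ranks = rank_matrix(sim)
for row in ranks:
    print([round(v, 2) for v in row])
```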

Divisive clustering based on
D(n) = sum of rank values (s_i,j) of segment n /
inside area of segment n (j-i+1), for i,j the
sentences at the beginning and end of segment n
Homogeneous segments have large rank values
within a small area of the matrix
Keep dividing the corpus
until δD(n) = D(n) - D(n-1) shows little change
Choi's algorithm performs best (9-12% error)
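
A greedy version of the divisive clustering step might look like the sketch below. It repeatedly tries every possible split, keeps the one that maximizes inside density, and stops when the gain in density is small. Two points are assumptions rather than details from the slides: the area of a segment is taken as the number of matrix cells it covers, (j-i+1) squared, and the stopping threshold `min_gain` is an arbitrary illustrative value.

```python
def region_sum(m, i, j):
    """Sum of the cells in the square region covering sentences i..j (inclusive)."""
    return sum(m[x][y] for x in range(i, j + 1) for y in range(i, j + 1))

def density(m, segments):
    """Inside density: total rank sum over total area, across all segments."""
    total = sum(region_sum(m, i, j) for i, j in segments)
    area = sum((j - i + 1) ** 2 for i, j in segments)  # cell count (assumption)
    return total / area

def split_once(m, segments):
    """Try every possible split; return (density, segments) for the best one."""
    best = None
    for k, (i, j) in enumerate(segments):
        for cut in range(i, j):  # new boundary falls after sentence `cut`
            trial = segments[:k] + [(i, cut), (cut + 1, j)] + segments[k + 1:]
            d = density(m, trial)
            if best is None or d > best[0]:
                best = (d, trial)
    return best

def segment(m, min_gain=0.05):
    """Keep dividing until the change in density falls below min_gain."""
    segments = [(0, len(m) - 1)]
    current = density(m, segments)
    while True:
        best = split_once(m, segments)
        if best is None or best[0] - current < min_gain:
            return segments
        current, segments = best

# Toy rank matrix with two clearly homogeneous blocks (assumed values).
m = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
result = segment(m)
print(result)
```

On this toy matrix the first split raises the density from 0.5 to 1.0, and no further split improves it, so the procedure stops with two segments.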

13
Acoustic and Prosodic Cues to Segmentation

Intuition
Speakers vary acoustic and prosodic cues to
convey variation in discourse structure
Systematic? In read or spontaneous speech?
Evidence
Observations from recorded corpora
Laboratory experiments
Machine learning of discourse structure from
acoustic/prosodic features

14
Spoken Cues to Discourse/Topic Structure

Pitch range
Lehiste 75, Brown et al 83, Silverman 86,
Avesani & Vayra 88, Ayers 92, Swerts et al 92,
Grosz & Hirschberg 92, Swerts & Ostendorf 95,
Hirschberg & Nakatani 96
Preceding pause
Lehiste 79, Chafe 80, Brown et al 83,
Silverman 86, Woodbury 87, Avesani & Vayra 88,
Grosz & Hirschberg 92, Passonneau & Litman 93,
Hirschberg & Nakatani 96

Rate
Butterworth 75, Lehiste 80, Grosz &
Hirschberg 92, Hirschberg & Nakatani 96
Amplitude
Brown et al 83, Grosz & Hirschberg 92,
Hirschberg & Nakatani 96
Contour
Brown et al 83, Woodbury 87, Swerts et al 92

16
A Practical Problem: Finding Sentence and
Topic/Story Boundaries in ASR Transcripts

Motivation
Finding sentences critical to further
syntactic/semantic analysis
Topic/story id important to identify common
regions for q/a, extraction
Features
Lexical cues
Domain dependent
Sensitive to ASR performance
Acoustic/prosodic cues
Domain independent
Sensitive to speaker identity
Statistical, Machine Learning approaches with
large segmented corpora (e.g. Broadcast News)

17
ASR Transcription

aides tonight in boston in depth the truth squad
for special series until election day tonight the
truth about the budget surplus of the candidates
are promising the two international flash points
getting worse while the middle east and a new
power play by milosevic and a lifelong a family
tries to say one child life by having another
amazing breakthrough the u s was was told local
own boss good evening uh from the university of
massachusetts in boston the site of the widely
anticipated first of eight between vice president
al gore and governor george w bush with the
election now just five weeks away this is the
beginning of a sprint to the finish and a strong
start here tonight is important this is the stage
for the two candidates will appear before a
national television audience taking questions
from jim lehrer of p b s n b cs david gregory is
here with governor bush claire shipman is
covering the vice president claire you begin
tonight please

18
Speaker segmentation (Diarization)

Speaker 0 - aides tonight in boston in depth the
truth squad for special series until election day
tonight the truth about the budget surplus of the
candidates are promising the two international
flash points getting worse while the middle east
and a new power play by milosevic and a lifelong
a family tries to say one child life by having
another amazing breakthrough the u s was was told
local own boss good evening uh from the
university of massachusetts in boston
Speaker 1 - the site of the widely anticipated
first of eight between vice president al gore and
governor george w bush with the election now
just five weeks away this is the beginning of a
sprint to the finish and a strong start here
tonight is important this is the stage for the
two candidates will appear before a national
television audience taking questions from jim
lehrer of p b s n b cs david gregory is here
with governor bush claire shipman is covering the
vice president claire you begin tonight please

19
Sentence detection, punctuation

Speaker Anchor - Aides tonight in boston. In
depth the truth squad for special series until
election day. Tonight the truth about the budget
surplus of the candidates are promising. The two
international flash points getting worse. While
the middle east. And a new power play by
milosevic and a lifelong a family tries to say
one child life by having another amazing
breakthrough the u. s. was was told local own
boss. Good evening uh from the university of
massachusetts in boston.
Speaker Reporter - The site of the widely
anticipated first of eight between vice president
al gore and governor george w. bush. With the
election now just five weeks away. This is the
beginning of a sprint to the finish. And a strong
start here tonight is important. This is the
stage for the two candidates will appear before a
national television audience taking questions
from jim lehrer of p. b. s. n. b. c.'s david
gregory is here with governor bush. Claire
shipman is covering the vice president claire you
begin tonight please.

20
Story boundary detection

Speaker Anchor - Aides tonight in boston. In
depth the truth squad for special series until
election day. Tonight the truth about the budget
surplus of the candidates are promising. The two
international flash points getting worse. While
the middle east. And a new power play by
milosevic and a lifelong a family tries to say
one child life by having another amazing
breakthrough the u. s. was was told local own
boss. Good evening uh from the university of
massachusetts in boston.
Speaker Reporter - The site of the widely
anticipated first of eight between vice president
al gore and governor george w. bush. With the
election now just five weeks away. This is the
beginning of a sprint to the finish. And a strong
start here tonight is important. This is the
stage for the two candidates will appear before a
national television audience taking questions
from jim lehrer of p. b. s. n. b. c.'s david
gregory is here with governor bush. Claire
shipman is covering the vice president claire you
begin tonight please.

21
Prosodic Cues (Shriberg et al 00)

Text-based segmentation is fine, if you have
reliable text
Could prosodic cues perform as well or better at
sentence and topic segmentation in ASR
transcripts? more robust? more general?
Goal: identify sentence and topic boundaries at
ASR-defined word boundaries
CART decision trees and LM
HMM combined prosodic and LM results

Features (for each potential boundary location)
Pause at boundary (raw and normalized by speaker)
Pause at word before boundary (is this a new
turn or part of continuous speech segment?)
Phone and rhyme duration (normalized by inherent
duration) (phrase-final lengthening?)
F0 (smoothed and stylized) reset, range
(topline, baseline), slope and continuity
Voice quality (halving/doubling estimates as
correlates of creak or glottalization)
Speaker change, time from start of turn, turns
in conversation and gender
Trained/tested on Switchboard and Broadcast News

23
Sentence segmentation results

Prosodic only model
Better than LM for BN
Worse (on hand transcription) and same (for ASR
transcript) on SB
Slightly improves LM on SB
Useful features for BN
Pause at boundary, turn change/no turn change, f0
diff across boundary, rhyme duration
Useful features for SB
Phone/rhyme duration before boundary, pause at
boundary, turn/no turn, pause at preceding word
boundary, time in turn

24
Topic segmentation results (BN only)

Useful features
Pause at boundary, f0 range, turn/no turn,
gender, time in turn
Prosody alone better than LM
Combined model improves significantly

25
Recent Work on Story Segmentation (Rosenberg et
al 07)

Story segmentation goal: divide each show into
homogeneous regions, each about a single topic
Task: focused Q/A
Issue: what unit of analysis should we use in
assessing potential boundaries?

26
TDT-4 Corpus

English: 312.5 hours, 250 broadcasts, 6 shows
Arabic: 88.5 hours, 109 broadcasts, 2 shows
Mandarin: 109 hours, 134 broadcasts, 3 shows
Manually annotated story boundaries
ASR Hypotheses
Speaker Diarization Hypotheses

27
Approach

Identify a set of segments which define
Unit of analysis
Candidate boundaries
Classify each candidate boundary based on
features extracted from segments
C4.5 Decision Tree
Model each show-type separately
E.g. CNN Headline News and ABC World News
Tonight have distinct models
Evaluate using WindowDiff with k=100
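
For readers unfamiliar with the metric, WindowDiff slides a window of k units over the reference and hypothesized segmentations and counts the windows in which the two disagree on the number of boundaries. The sketch below is one common formulation; conventions for the exact number of windows vary between implementations, and the toy sequences use k=3 rather than the k=100 quoted above.

```python
def window_diff(ref, hyp, k):
    """Fraction of length-k windows where ref and hyp contain a
    different number of boundaries.

    ref, hyp: equal-length lists of 0/1, where 1 marks a boundary
    after that unit (word, sentence, ...).
    """
    if len(ref) != len(hyp):
        raise ValueError("segmentations must have the same length")
    windows = len(ref) - k + 1
    if windows <= 0:
        raise ValueError("k is larger than the sequence")
    errors = sum(
        1 for i in range(windows)
        if sum(ref[i:i + k]) != sum(hyp[i:i + k])
    )
    return errors / windows

# Toy example: the hypothesis places the first boundary one unit late.
ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
hyp = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]
score = window_diff(ref, hyp, k=3)
print(score)
```

Lower is better: a perfect hypothesis scores 0, and near misses are penalized less than they would be under exact boundary matching.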

28
Segment Boundary Modeling Features

Acoustic
Pitch & Intensity
speaker normalized
min, mean, max, stdev, slope
Speaking Rate
vowels/sec, voiced frames/sec
Final Vowel, Rhyme Length
Pause Length
Lexical
TextTiling scores
LCSeg scores
Story beginning and ending keywords
Structural
Position in show
Speaker participation
First or last speaker turn?
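
The speaker-normalized pitch statistics listed under the acoustic features could be computed roughly as follows. This is a sketch under assumptions: z-score normalization per speaker is one plausible reading of "speaker normalized", and the function names and F0 values are invented for illustration.

```python
from statistics import mean, pstdev

def speaker_stats(frames):
    """Per-speaker mean and standard deviation of F0 from (speaker, f0) pairs."""
    by_speaker = {}
    for speaker, f0 in frames:
        by_speaker.setdefault(speaker, []).append(f0)
    return {s: (mean(v), pstdev(v)) for s, v in by_speaker.items()}

def slope(values):
    """Least-squares slope of values against their frame index."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    den = sum((x - x_bar) ** 2 for x in range(n))
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    return num / den if den else 0.0

def pitch_features(segment_f0, speaker, stats):
    """min, mean, max, stdev and slope of z-scored F0 within one segment."""
    mu, sigma = stats[speaker]
    z = [(f - mu) / sigma if sigma else 0.0 for f in segment_f0]
    return {
        "min": min(z),
        "mean": mean(z),
        "max": max(z),
        "stdev": pstdev(z),
        "slope": slope(z),
    }

# Two hypothetical speakers with different pitch ranges (values in Hz).
frames = [("A", 100), ("A", 120), ("A", 140),
          ("B", 200), ("B", 220), ("B", 240)]
stats = speaker_stats(frames)
fa = pitch_features([140, 120, 100], "A", stats)
fb = pitch_features([240, 220, 200], "B", stats)
print(fa)
print(fb)
```

After normalization the two falling contours yield the same feature values even though speaker B's raw pitch is much higher, which is the point of normalizing by speaker.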

29
Input Segmentations

ASR Word boundaries
No segmentation baseline
Hypothesized Sentence Units
Boundaries with 0.5, 0.3 and 0.1 confidence
thresholds
Pause-based Segmentation
Boundaries at pauses over 500ms and 250ms
Hypothesized Intonational Phrases
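
The pause-based input segmentation is the simplest of these to illustrate. Assuming the ASR output provides a start and end time for every word, candidate boundaries are placed wherever the silence between consecutive words exceeds the threshold. The word timings below are invented for the example.

```python
def pause_boundaries(words, min_pause):
    """Indices i where the pause after word i exceeds min_pause seconds.

    words: list of (token, start_time, end_time) tuples in time order.
    """
    return [
        i for i in range(len(words) - 1)
        if words[i + 1][1] - words[i][2] > min_pause
    ]

# Hypothetical ASR word timings in seconds.
words = [
    ("good", 0.00, 0.20),
    ("evening", 0.25, 0.70),
    ("from", 1.30, 1.50),
    ("boston", 1.55, 2.00),
    ("the", 2.30, 2.40),
    ("site", 2.45, 2.80),
]
long_pauses = pause_boundaries(words, 0.500)   # 500 ms threshold
short_pauses = pause_boundaries(words, 0.250)  # 250 ms threshold
print(long_pauses, short_pauses)
```

Lowering the threshold yields more candidate boundaries, which raises coverage of true story boundaries at the cost of more candidates to classify.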

30
Hypothesizing Intonational Phrases

30 minutes manually annotated ASR BN from
reserved TDT-4 CNN show.
Phrase was marked if a phrase boundary occurred
since the previous word boundary.
C4.5 Decision Tree
Pitch, Energy and Duration Features
Normalized by hypothesized speaker id and
surrounding context
66.5% F-measure (p=.683, r=.647)
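
As a quick check on the figures above, the balanced F-measure is the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall; the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f = f_measure(0.683, 0.647)
print(round(f, 3))
```

The result rounds to 0.665, consistent with the reported F-measure.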

31
Story Segmentation Results

32
Input Segmentation Statistics

Exact Story Boundary Coverage (pct.)
Mean Distance to Nearest True Boundary (words)
Segment to True Boundary Ratio

33
Conclusions and Future Work