Screenplay Alignment for Closed-System Speaker Identification and Analysis of Feature Films

1
Screenplay Alignment for Closed-System Speaker
Identification and Analysis of Feature Films

Robert Turetsky
rob_at_ee.columbia.edu
Columbia Univ / Philips Research
ICME 2004 - June 29, 2004

2
Talk Organization

Motivation for Content-based Analysis
Unsupervised Speaker ID on films
Concerning Screenplays
Screenplay Alignment for Label Generation
Experiments, Evaluation, Future Work

3
Technology Impacts Film
Production
Distribution
Consumption

4
Content-Based Analysis Motivation

With so much content out there, users suffer from
information overload
How can they find exactly what they want? How
can they find new things they like?
Content-based analysis attempts to find what is
important and what is unique

5
Voice Fingerprinting for Speaker Identification

Extremely difficult on film audio!
Many different emotional contexts
Different acoustic environments (room tone)
Noise assumptions do not hold
Sound design/FX leads to burst noises
Noise is correlated with speech (soundtrack)
SNR can be low with soundtrack
Very little published work on film audio!

6
Deliverable

Closed-system speaker identification on any
main character (> 5% of dialogue)
Completely self-referential, requires no user
intervention
Takes advantage of supervised learning methods
Can be combined with face ID for robust character
detection
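
The closed set of speakers can be sketched as a filter over the screenplay's dialogue labels. This is a minimal illustration, not the authors' code: measuring a character's share by the number of dialogue blocks (rather than words or speaking time) is an assumption, and `main_characters` and `min_share` are hypothetical names.

```python
from collections import Counter


def main_characters(speaker_labels, min_share=0.05):
    """Return the characters who speak more than `min_share` of all dialogue blocks.

    Hypothetical helper: the share is measured by counting dialogue blocks per
    character, which is an assumption (words or speaking time would also work).
    """
    counts = Counter(speaker_labels)
    total = sum(counts.values())
    return sorted(name for name, n in counts.items() if n / total > min_share)


# One label per screenplay dialogue block (toy data).
labels = ["CRAIG"] * 50 + ["LOTTE"] * 46 + ["DOORMAN"] * 4
print(main_characters(labels))  # ['CRAIG', 'LOTTE']
```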

7
The Screenplay

Used as a map of the movie for every member of
the cast and crew
Contains description of scenes, characters,
costumes, action and dialogue
Usually formatted very regularly
Available for thousands of movies
An untapped resource in the automatic film
analysis community

Example Screenplay

8
(Image-only slide; no transcript)
9
Challenges with Screenplays

No timecode associated with events
Lines/scenes are often cut, shuffled or added
Formatting is a guideline, not a standard
Proposed Solution:
Parse the screenplay into a data structure
Align screenplay with timestamped subtitles
Use timestamped dialogues as ground truth for
multimodal statistical models of salient objects
within the film

10
Character ID Architecture
[Block diagram with components: Video signal, Audio Features, Statistical Model, Closed Captions, Screenplay, Alignment, Character ID, Actor Identification, IMDb.com]

11
Screenplay parsing

SCENE      := .* (SCENE | DIAL_START | SLUG | TRANSITION)
DIAL_START := \t <CHAR NAME> (V.O.|O.S.)? \n \t (DIALOGUE | PAREN)*
DIALOGUE   := \t .*? \n\n
PAREN      := \t (.*?)
TRANSITION := \t <TRANS NAME>
SLUG       := <SCENE #>?. <INT/EXT><ERNAL.>? - <LOC> <- TIME>?
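
A minimal Python sketch of a parser loosely following the grammar above. It assumes tab-indented character cues, dialogue and parentheticals (as the `\t` tokens suggest), unindented slug lines, and a blank line ending each dialogue block. The regular expressions, the `Dialogue` record and `parse_screenplay` are hypothetical and would need tuning for real scripts, whose formatting is only a guideline.

```python
import re
from dataclasses import dataclass, field

# Hypothetical token patterns loosely modeled on the grammar above.
SLUG_RE = re.compile(
    r"^(?:\d+\.?\s+)?(?:INT|EXT)(?:ERNAL)?\.?\s+(?P<loc>.+?)(?:\s+-\s+(?P<time>[A-Z ]+))?\s*$")
CHAR_RE = re.compile(r"^\t+(?P<name>[A-Z][A-Z .'-]*?)\s*(?:\((?:V\.O\.|O\.S\.)\))?\s*$")
PAREN_RE = re.compile(r"^\t+\((?P<text>.*?)\)\s*$")
TRANS_RE = re.compile(r"^\t+(?P<name>[A-Z][A-Z ]*):\s*$")


@dataclass
class Dialogue:
    scene: int                 # index of the enclosing slug line (-1 before the first)
    character: str
    text: str = ""
    parentheticals: list = field(default_factory=list)


def parse_screenplay(script):
    dialogues, scene, current = [], -1, None
    for line in script.splitlines():
        if not line.strip():
            current = None     # a blank line ends the current dialogue block
            continue
        if current is not None:
            # inside a dialogue block: either a parenthetical or more dialogue
            paren = PAREN_RE.match(line)
            if paren:
                current.parentheticals.append(paren.group("text"))
            else:
                current.text = (current.text + " " + line.strip()).strip()
            continue
        if SLUG_RE.match(line):
            scene += 1
        elif TRANS_RE.match(line):
            pass               # transitions (e.g. "CUT TO:") carry no dialogue
        elif (cue := CHAR_RE.match(line)):
            current = Dialogue(scene, cue.group("name").strip())
            dialogues.append(current)
        # any other line is treated as scene description / action
    return dialogues
```

Because character cues are only recognized outside a dialogue block, an all-caps line of dialogue such as "OKAY" is not mistaken for a new speaker.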

12
Closed Captions Capture

Subtitles stored on DVD as an MPEG movie overlay
SubRip 1.17.1 performs video OCR, with timestamps
Manual training period per font
Confusion between I and l
Alternative: closed captions from UDF
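
SubRip can save the recognized subtitles as an `.srt` file, in which each caption is a numbered block with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line followed by the text. A small reader for that format is sketched below, assuming the captions are exported this way; `Caption` and `parse_srt` are hypothetical names, and OCR errors such as the I/l confusion are not corrected here.

```python
import re
from dataclasses import dataclass

# "HH:MM:SS,mmm --> HH:MM:SS,mmm" timing line of an .srt block
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Caption:
    start: float   # seconds
    end: float     # seconds
    text: str


def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def parse_srt(srt_text):
    captions = []
    # caption blocks are separated by blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for k, line in enumerate(lines):
            timing = TIMING_RE.search(line)
            if timing:
                g = timing.groups()
                # every non-empty line after the timing line is caption text
                text = " ".join(t.strip() for t in lines[k + 1:] if t.strip())
                captions.append(Caption(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return captions
```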

13
Alignment Similarity Matrix

Binary array: compare each word in the CCs with each word in the screenplay
Dialogues that align form diagonal lines
Noise: common or repeated words
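
The binary similarity matrix can be sketched as follows: entry (i, j) is 1 when caption word i equals screenplay word j. The tokenizer (lowercase, punctuation stripped) and the optional `ignore` set for suppressing common words are assumptions added for illustration; the slide only notes that such words add noise.

```python
import re


def tokenize(text):
    """Lowercase words with punctuation stripped (apostrophes kept)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_matrix(cc_words, sp_words, ignore=frozenset()):
    """S[i][j] = 1 if caption word i equals screenplay word j, else 0.

    Words in `ignore` (a hypothetical stop-word set) never match, which is one
    way to reduce the noise contributed by common words.
    """
    return [[1 if c == s and c not in ignore else 0 for s in sp_words]
            for c in cc_words]


S = similarity_matrix(tokenize("I love you."), tokenize("I love you, too."))
for row in S:
    print(row)  # the aligned words form a diagonal of 1s
```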

14
Screenplay vs. CC Distance Matrix

Dynamic programming: find strong diagonal segments
Median filter on slope to identify properly aligned long dialogues and discard spurious short matches
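
One way to realize these two steps, sketched under stated assumptions: an LCS-style dynamic program finds the monotonic path through the binary matrix that matches the most words, and a median filter over the step slopes of that path estimates the local slope of the dominant diagonal. A match is kept only if a neighboring step agrees with the filtered slope, and surviving segments shorter than `min_run` matches are discarded. The DP recursion, the filter `window`, `tol` and `min_run` are hypothetical choices, not the parameters used in the talk.

```python
from statistics import median


def lcs_path(S):
    """LCS-style DP: monotonic path through binary matrix S with the most matches."""
    n, m = len(S), (len(S[0]) if S else 0)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = max(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1] + S[i - 1][j - 1])
    # backtrack from the bottom-right corner, collecting matched (cc, screenplay) pairs
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if S[i - 1][j - 1] and D[i][j] == D[i - 1][j - 1] + 1:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif D[i - 1][j] >= D[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def filter_by_slope(pairs, window=5, tol=0.5, min_run=3):
    """Keep matches lying on long diagonal segments of the alignment path."""
    if len(pairs) < 2:
        return []
    # slope of each step between consecutive matches (screenplay words per CC word)
    steps = [(j1 - j0) / (i1 - i0) for (i0, j0), (i1, j1) in zip(pairs, pairs[1:])]
    h = window // 2
    smooth = [median(steps[max(0, t - h):t + h + 1]) for t in range(len(steps))]
    ok = [abs(s - f) <= tol for s, f in zip(steps, smooth)]
    # a match survives if the step into it or out of it agrees with the filtered slope
    keep = [(k > 0 and ok[k - 1]) or (k < len(steps) and ok[k]) for k in range(len(pairs))]
    result, run = [], []
    for k, p in enumerate(pairs):
        # a dropped match or a disagreeing step ends the current segment
        if run and (not keep[k] or not ok[k - 1]):
            if len(run) >= min_run:
                result.extend(run)
            run = []
        if keep[k]:
            run.append(p)
    if len(run) >= min_run:
        result.extend(run)
    return result
```

For example, a caption word that matches a stray occurrence of the same word far away in the screenplay creates two steep steps; the median filter ignores those outliers, so that match is dropped while the diagonal segments on either side survive.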

15
Screenplay Alignment Result

Obtain time-stamped dialogues from closed-caption
times
Identify speaker by screenplay label

[Figure: Screenplay Alignment, Wall Street]

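
Once word pairs are aligned, each caption can inherit the speaker of the screenplay dialogue its words matched. A minimal sketch, assuming the word-to-caption and word-to-speaker index lists were recorded during tokenization; the per-caption majority vote and all names here are illustrative assumptions. Captions with no surviving matches stay unlabeled, which is why coverage is incomplete.

```python
from collections import Counter


def label_captions(captions, cc_word_caption, sp_word_speaker, pairs):
    """Attach a speaker to every caption that has aligned words.

    captions:        list of (start_sec, end_sec, text)
    cc_word_caption: cc_word_caption[i] = index of the caption holding CC word i
    sp_word_speaker: sp_word_speaker[j] = character who speaks screenplay word j
    pairs:           aligned (i, j) word pairs kept after filtering
    """
    votes = {}
    for i, j in pairs:
        votes.setdefault(cc_word_caption[i], Counter())[sp_word_speaker[j]] += 1
    labeled = []
    for c in sorted(votes):
        start, end, text = captions[c]
        speaker = votes[c].most_common(1)[0][0]   # majority vote over matched words
        labeled.append((start, end, speaker, text))
    return labeled
```
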
16
Analysis of Label Accuracy
          CRAIG LESTER  LOTTE  MALKO MAXINE  OTHER
CRAIG        82      0      1      1      0     11
LESTER        0     41      0      0      0      0
LOTTE         0      0     40      0      0      2
MALKO         0      0      0     25      0      2
MAXINE        0      0      1      0     71      4

Being John Malkovich: 311/334 (93.1%) of aligned closed
captions successfully labeled

17
Coverage: The Need for Statistical Methods
Movie                   CCs   Accuracy    Coverage
Being John Malkovich    1436  311 (93%)   334 (23%)
L.A. Confidential       1666  522 (95%)   548 (33%)
Wall Street             2342  850 (89%)   954 (41%)
Magnolia                2672  747 (89%)   843 (32%)
(Accuracy = correctly labeled / aligned CCs; coverage = aligned / total CCs)

For each film, alignment accuracy is about 90%, but
only about 30% of the dialogue is aligned!
Need statistical methods to learn from labeled
examples to label all dialogues
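
The percentages in the table follow from the raw counts: accuracy is correctly labeled captions over aligned captions, and coverage is aligned captions over all captions. A short check of that arithmetic; this reading of the columns is inferred from the numbers, which it reproduces, and `accuracy_and_coverage` is a hypothetical helper.

```python
# (film, total CCs, correctly labeled, aligned) as reported in the table
FILMS = [
    ("Being John Malkovich", 1436, 311, 334),
    ("L.A. Confidential", 1666, 522, 548),
    ("Wall Street", 2342, 850, 954),
    ("Magnolia", 2672, 747, 843),
]


def accuracy_and_coverage(total, correct, aligned):
    """Accuracy = correct / aligned, coverage = aligned / total, both in percent."""
    return 100 * correct / aligned, 100 * aligned / total


for name, total, correct, aligned in FILMS:
    acc, cov = accuracy_and_coverage(total, correct, aligned)
    print(f"{name}: accuracy {acc:.0f}%, coverage {cov:.0f}%")
```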

18
Speaker Identification

Alignment allows unsupervised speaker ID using a (much easier!) supervised classifier
Best-performing feature: 0.5 s (40 frames) of MFCCs
8-component GMM per main character, trained using EM
Winner-take-all likelihood for final voting

          CRAIG LESTER  LOTTE  MALKO MAXINE
CRAIG        57     17     21     27     28
LESTER        5     77      4     15      1
LOTTE         8      0     49     10     13
MALKO         7      2      8     31      4
MAXINE       23      4     19     17     55

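
The classifier can be sketched in NumPy as one diagonal-covariance GMM per character fitted with EM, plus a winner-take-all vote in which each 40-frame (about 0.5 s) chunk votes for the character whose model gives it the highest log-likelihood. This is an illustration, not the authors' implementation: MFCC extraction is omitted (frames are assumed to be precomputed MFCC vectors), and the diagonal covariances, the variance floor `reg`, the iteration count and non-overlapping chunks are assumptions.

```python
import numpy as np
from collections import Counter


def _logsumexp(a, axis):
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def _log_joint(X, weights, means, variances):
    """log w_k + log N(x_n | mu_k, diag(var_k)) for every frame n and component k."""
    d = X.shape[1]
    diff = X[:, None, :] - means[None, :, :]                    # (n, k, d)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)     # (n, k)
    log_det = np.sum(np.log(variances), axis=1)                  # (k,)
    return np.log(weights)[None, :] - 0.5 * (d * np.log(2 * np.pi) + log_det[None, :] + maha)


def fit_gmm(X, k=8, n_iter=50, reg=1e-3, seed=0):
    """Fit a k-component diagonal-covariance GMM to frames X (n x d) with EM."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame
        log_j = _log_joint(X, weights, means, variances)
        resp = np.exp(log_j - _logsumexp(log_j, axis=1)[:, None])
        # M-step: re-estimate weights, means and (floored) diagonal variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = (resp.T @ X ** 2) / nk[:, None] - means ** 2 + reg
    return weights, means, variances


def log_likelihood(X, gmm):
    """Total log-likelihood of frames X under one character's GMM."""
    return float(np.sum(_logsumexp(_log_joint(X, *gmm), axis=1)))


def identify_speaker(frames, models, window=40):
    """Winner-take-all vote: each `window`-frame chunk votes for its best-scoring model."""
    chunks = [frames[s:s + window] for s in range(0, len(frames) - window + 1, window)]
    chunks = chunks or [frames]          # dialogue shorter than one window
    votes = Counter(max(models, key=lambda name: log_likelihood(c, models[name]))
                    for c in chunks)
    return votes.most_common(1)[0][0]
```

Usage: `models = {name: fit_gmm(frames) for name, frames in training.items()}`, then `identify_speaker(dialogue_frames, models)`, where `training` is a hypothetical dict mapping each main character to the MFCC frames of their aligned dialogues.
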
19
Summary and Conclusions

The screenplay contains a wealth of detail on
film semantics and the intentions of the
filmmakers
The screenplay can be time-stamped and mined for
salient objects (e.g. characters) and story
descriptors
Incomplete alignment can be used to create models
of objects for further analysis
