Screenplay Alignment for Closed-System Speaker Identification and Analysis of Feature Films - PowerPoint PPT Presentation





1
Screenplay Alignment for Closed-System Speaker
Identification and Analysis of Feature Films
  • Robert Turetsky
  • rob_at_ee.columbia.edu
  • Columbia Univ / Philips Research
  • ICME 2004 - June 29, 2004

2
Talk Organization
  • Motivation for Content-based Analysis
  • Unsupervised Speaker ID on films
  • Concerning Screenplays
  • Screenplay Alignment for Label Generation
  • Experiments, Evaluation, Future Work

3
Technology Impacts Film
  • Production
  • Distribution
  • Consumption

4
Content-Based Analysis Motivation
  • With so much content out there, users suffer from
    information overload
  • How can they find exactly what they want? How
    can they find new things they like?
  • Content-based analysis attempts to find what is
    important and what is unique

5
Voice Fingerprinting for Speaker Identification
  • Extremely difficult on film audio!
    • Many different emotional contexts
    • Different acoustic environments (room tone)
    • Noise assumptions do not hold:
      • Sound design/FX leads to burst noises
      • Noise is correlated with speech (soundtrack)
      • SNR can be low with the soundtrack
  • Very little published work on film audio!

6
Deliverable
  • Closed-system speaker identification of any
    main character (> 5% of dialogue)
  • Completely self-referential, requires no user
    intervention
  • Takes advantage of supervised learning methods
  • Can be combined with face ID for robust character
    detection

7
The Screenplay
  • Used as a map of the movie for every member of
    the cast and crew
  • Contains description of scenes, characters,
    costumes, action and dialogue
  • Usually formatted very regularly
  • Available for thousands of movies
  • An untapped resource in the automatic film
    analysis community

Example Screenplay
8
(No Transcript)
9
Challenges with Screenplays
  • No timecode associated with events
  • Lines/scenes are often cut, shuffled or added
  • Formatting is a guideline, not a standard
  • Proposed solution:
    • Parse the screenplay into a data structure
    • Align the screenplay with timestamped subtitles
    • Use the timestamped dialogues as ground truth for
      multimodal statistical models of salient objects
      within the film

10
Character ID Architecture
[Diagram components: Audio Features, Statistical Model, Video signal, Closed Captions, Alignment, Character ID, Screenplay, Actor Identification, IMDb.com]

11
Screenplay parsing
  • SCENE . SCENE DIAL_START SLUG TRANSITION
  • DIAL_START \t <CHAR NAME> (V.O.|O.S.)? \n
  • \t DIALOGUE PAREN
  • DIALOGUE \t .? \n\n
  • PAREN \t (.?)
  • TRANSITION \t <TRANS NAME>
  • SLUG <SCENE >?. <INT/EXT><ERNAL.>? - <LOC> <- TIME>?
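The grammar above can be approximated with a few regular expressions. The following is a minimal, line-oriented sketch, assuming that character cues and transitions are tab-indented upper-case lines, slug lines start with INT. or EXT., parentheticals sit on their own line, and a blank line closes each dialogue block. The patterns and the names `parse_screenplay` and `Element` are hypothetical; this is not the parser used in the talk.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical patterns loosely following the grammar above; real scripts vary.
SLUG_RE = re.compile(r"^\s*(?:\d+\s*)?(INT|EXT)(?:ERIOR|ERNAL)?\.\s*([^-]+?)(?:\s*-\s*(.+?))?\s*$")
TRANS_RE = re.compile(r"^\t+([A-Z][A-Z .]*:)\s*$")
CHAR_RE = re.compile(r"^\t+([A-Z][A-Z .'-]*?)\s*(?:\((?:V\.O\.|O\.S\.)\))?\s*$")
PAREN_RE = re.compile(r"^\s*\((.*)\)\s*$")


@dataclass
class Element:
    kind: str                       # "slug", "transition" or "dialogue"
    text: str = ""
    speaker: Optional[str] = None
    parens: List[str] = field(default_factory=list)


def parse_screenplay(script: str) -> List[Element]:
    elements: List[Element] = []
    current: Optional[Element] = None   # dialogue block being filled, if any
    for line in script.splitlines():
        if not line.strip():
            current = None              # a blank line closes DIALOGUE
            continue
        if current is not None:         # inside a dialogue block: PAREN or DIALOGUE text
            paren = PAREN_RE.match(line)
            if paren:
                current.parens.append(paren.group(1))
            else:
                current.text = (current.text + " " + line.strip()).strip()
            continue
        if SLUG_RE.match(line):
            elements.append(Element("slug", line.strip()))
        elif (m := TRANS_RE.match(line)):
            elements.append(Element("transition", m.group(1)))
        elif (m := CHAR_RE.match(line)):  # DIAL_START; a (V.O.)/(O.S.) tag is matched but dropped
            current = Element("dialogue", speaker=m.group(1))
            elements.append(current)
        # anything else is action/description text and is skipped in this sketch
    return elements
```

Action and description lines are simply skipped here; a fuller parser would keep them as scene text for later mining.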

12
Closed Captions Capture
  • Subtitles are stored on the DVD as an MPEG video overlay
  • SubRip 1.17.1 performs video OCR and records timestamps
  • Manual training period per font
  • Confusion between I and l
  • Alternative: closed captions from UDF
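Assuming the OCR output is saved in SubRip's `.srt` text format (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then the caption text), the timestamped captions can be read back with a short parser. The `fix_il` step is a hypothetical heuristic for the I/l confusion noted above (a stand-alone lower-case l is rewritten as I); it is not part of SubRip.

```python
import re
from dataclasses import dataclass
from typing import List

TIME_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})")


@dataclass
class Caption:
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def fix_il(text: str) -> str:
    # Hypothetical OCR clean-up: a stand-alone lower-case "l" (as in "l'm") is almost always "I"
    return re.sub(r"\bl\b", "I", text)


def parse_srt(data: str) -> List[Caption]:
    captions = []
    blocks = re.split(r"\n\s*\n", data.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.splitlines()
        for k, line in enumerate(lines):
            m = TIME_RE.search(line)
            if m:
                g = m.groups()
                # Multi-line captions are joined into one string
                text = " ".join(t.strip() for t in lines[k + 1:] if t.strip())
                captions.append(Caption(_seconds(*g[:4]), _seconds(*g[4:]), fix_il(text)))
                break
    return captions
```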

13
Alignment Similarity Matrix
  • Binary array: compare each word in the CC with each
    word in the screenplay
  • Dialogues that align form diagonal lines
  • Noise: common or repeated words
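A minimal sketch of the binary similarity matrix, assuming both texts are lower-cased and split into word tokens. The optional `ignore` set is a hypothetical way to suppress common words that add noise; the slide only notes the problem.

```python
import re
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    # Lower-case word tokens; punctuation is dropped (a normalization assumption)
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity_matrix(cc_words: List[str], sp_words: List[str],
                      ignore: Iterable[str] = ()) -> List[List[int]]:
    """S[i][j] = 1 when caption word i equals screenplay word j (and is not ignored)."""
    skip = set(ignore)
    return [[1 if c == s and c not in skip else 0 for s in sp_words] for c in cc_words]
```

A word that occurs many times in both texts lights up entries far from the true diagonal, which is the noise described above.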

14
Screenplay vs. CC Distance Matrix
  • Dynamic programming: find strong diagonal
    segments
  • Median filter on the slope to identify properly
    aligned long dialogues and discard spurious short
    matches
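One plausible reading of these two steps, sketched in Python: a longest-common-subsequence style dynamic program picks a monotonic path of matching words through the binary matrix, then the local slope Δj/Δi between consecutive matches is median-filtered, and only matches whose filtered slope is close to 1 (a diagonal) are kept. The LCS formulation and the `window` and `tol` parameters are assumptions, not the exact method from the talk.

```python
from statistics import median
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (caption word index, screenplay word index)


def lcs_path(S: Sequence[Sequence[int]]) -> List[Pair]:
    """Monotonic path through the binary matrix S with the most matching words."""
    n = len(S)
    m = len(S[0]) if n else 0
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if S[i - 1][j - 1]:
                D[i][j] = D[i - 1][j - 1] + 1
            else:
                D[i][j] = max(D[i - 1][j], D[i][j - 1])
    path, i, j = [], n, m
    while i > 0 and j > 0:          # backtrack from the bottom-right corner
        if S[i - 1][j - 1]:
            path.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif D[i - 1][j] >= D[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def filter_by_slope(path: List[Pair], window: int = 5, tol: float = 0.5) -> List[Pair]:
    """Keep matches whose median-filtered local slope is close to 1 (diagonal)."""
    if len(path) < 2:
        return []
    slopes = []
    for k in range(len(path)):
        a, b = (k - 1, k) if k > 0 else (0, 1)
        (i0, j0), (i1, j1) = path[a], path[b]
        slopes.append((j1 - j0) / (i1 - i0))   # i strictly increases along an LCS path
    half = window // 2
    return [p for k, p in enumerate(path)
            if abs(median(slopes[max(0, k - half):k + half + 1]) - 1.0) <= tol]
```

The median, unlike a mean, lets a long diagonal run survive a single jump at its edge, while a cluster of scattered matches with large slopes is discarded as a whole.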

15
Screenplay Alignment Result
  • Obtain time-stamped dialogues from closed-caption
    times
  • Identify speaker by screenplay label

[Figure: Screenplay Alignment, Wall Street]

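Once words are aligned, each caption can take the speaker label of the screenplay dialogue it matches and keep the caption's own start and end times. The sketch below assumes a majority vote over the aligned words in each caption; `label_captions`, its arguments and the `min_votes` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def label_captions(matches: Sequence[Tuple[int, int]],
                   cc_word_caption: Sequence[int],
                   sp_word_speaker: Sequence[Optional[str]],
                   caption_times: Sequence[Tuple[float, float]],
                   min_votes: int = 2) -> List[Tuple[float, float, str]]:
    """Return (start, end, speaker) for every caption that received enough aligned words.

    matches          -- (caption word index, screenplay word index) pairs kept after filtering
    cc_word_caption  -- caption index of each caption word
    sp_word_speaker  -- speaker of each screenplay word (None outside dialogue)
    caption_times    -- (start, end) seconds of each caption
    """
    votes: Dict[int, Counter] = defaultdict(Counter)
    for i, j in matches:
        speaker = sp_word_speaker[j]
        if speaker is not None:
            votes[cc_word_caption[i]][speaker] += 1
    labeled = []
    for cap in sorted(votes):
        speaker, count = votes[cap].most_common(1)[0]
        if count >= min_votes:            # hypothetical threshold against stray matches
            start, end = caption_times[cap]
            labeled.append((start, end, speaker))
    return labeled
```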
16
Analysis of Label Accuracy
          CRAIG  LESTER  LOTTE  MALKO  MAXINE  OTHER
CRAIG        82       0      1      1       0     11
LESTER        0      41      0      0       0      0
LOTTE         0       0     40      0       0      2
MALKO         0       0      0     25       0      2
MAXINE        0       0      1      0      71      4
  • Being John Malkovich: 311/334 (93.1%) of aligned
    closed captions successfully labeled

17
Coverage: The Need for Statistical Methods
Movie                  CCs   Accuracy   Coverage
Being John Malkovich  1436   311 (93%)  334 (23%)
L.A. Confidential     1666   522 (95%)  548 (33%)
Wall Street           2342   850 (89%)  954 (41%)
Magnolia              2672   747 (89%)  843 (32%)
  • For each film, alignment accuracy is about 90%, but
    only about 30% of dialogue is aligned!
  • Need statistical methods to learn from labeled
    examples to label all dialogues
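Reading the table as accuracy = correctly labeled / aligned captions and coverage = aligned / total captions reproduces every percentage shown; pooling the four films gives about 90.7% accuracy and 33% coverage. The dictionary below only restates the table's counts.

```python
# Counts restated from the table: (total captions, correctly labeled, aligned)
FILMS = {
    "Being John Malkovich": (1436, 311, 334),
    "L.A. Confidential": (1666, 522, 548),
    "Wall Street": (2342, 850, 954),
    "Magnolia": (2672, 747, 843),
}


def rates(total: int, correct: int, aligned: int):
    accuracy = 100.0 * correct / aligned   # share of aligned captions labeled correctly
    coverage = 100.0 * aligned / total     # share of all captions that were aligned
    return accuracy, coverage


for film, counts in FILMS.items():
    acc, cov = rates(*counts)
    print(f"{film}: accuracy {acc:.0f}%, coverage {cov:.0f}%")

pooled = [sum(c[i] for c in FILMS.values()) for i in range(3)]
acc, cov = rates(*pooled)
print(f"All four films: accuracy {acc:.1f}%, coverage {cov:.1f}%")
```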

18
Speaker Identification
  • Alignment allows unsupervised speaker ID using a
    (much easier!) supervised classifier
  • Best-performing feature: 0.5 s (40 frames) of
    MFCCs
  • 8-component GMM per main character, trained using
    EM
  • Winner-take-all likelihood for final voting

          CRAIG  LESTER  LOTTE  MALKO  MAXINE
CRAIG        57      17     21     27      28
LESTER        5      77      4     15       1
LOTTE         8       0     49     10      13
MALKO         7       2      8     31       4
MAXINE       23       4     19     17      55

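A compact sketch of the per-character models, assuming MFCC frames have already been extracted (one frame per row of a NumPy array) and reading "winner-take-all" as: within a 40-frame segment, each frame votes for its most likely character and the segment goes to the character with the most votes. The diagonal-covariance EM code, the names `DiagGMM`, `train_speaker_models` and `identify`, and the variance floor `reg` are illustrative choices; a library GMM implementation could be used instead.

```python
import numpy as np


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True)), axis=axis)


class DiagGMM:
    """Diagonal-covariance Gaussian mixture trained with EM (illustrative sketch)."""

    def __init__(self, n_components=8, n_iter=50, reg=1e-3, seed=0):
        self.k, self.n_iter, self.reg = n_components, n_iter, reg
        self.rng = np.random.default_rng(seed)

    def _weighted_log_pdf(self, X):
        # log w_c + log N(x | mu_c, diag(var_c)) for every (frame, component) pair
        diff = X[:, None, :] - self.means[None, :, :]              # (N, K, D)
        mahal = np.sum(diff ** 2 / self.vars[None, :, :], axis=2)  # (N, K)
        log_det = np.sum(np.log(self.vars), axis=1)                # (K,)
        const = X.shape[1] * np.log(2 * np.pi)
        return np.log(self.weights) - 0.5 * (const + log_det + mahal)

    def fit(self, X):
        n = X.shape[0]
        self.means = X[self.rng.choice(n, self.k, replace=False)].copy()
        self.vars = np.tile(X.var(axis=0) + self.reg, (self.k, 1))
        self.weights = np.full(self.k, 1.0 / self.k)
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame
            log_p = self._weighted_log_pdf(X)
            resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])
            # M-step: re-estimate weights, means and diagonal variances
            nk = resp.sum(axis=0) + 1e-10
            self.weights = nk / n
            self.means = (resp.T @ X) / nk[:, None]
            second = (resp.T @ X ** 2) / nk[:, None]
            self.vars = np.maximum(second - self.means ** 2, 0.0) + self.reg
        return self

    def frame_loglik(self, X):
        # Per-frame log-likelihood under the whole mixture
        return _logsumexp(self._weighted_log_pdf(X), axis=1)


def train_speaker_models(features_by_speaker, n_components=8):
    # One GMM per main character, trained on that character's labeled MFCC frames
    return {name: DiagGMM(n_components).fit(X) for name, X in features_by_speaker.items()}


def identify(models, segment):
    # Winner-take-all: each frame votes for its most likely character;
    # the segment is assigned to the character with the most votes.
    names = list(models)
    ll = np.stack([models[name].frame_loglik(segment) for name in names], axis=1)
    votes = np.bincount(ll.argmax(axis=1), minlength=len(names))
    return names[int(votes.argmax())]
```

With film audio, the training frames would come from the time-stamped dialogues produced by the alignment, and each test segment would be 40 consecutive MFCC frames (0.5 s).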
19
Summary and Conclusions
  • The screenplay contains a wealth of detail on
    film semantics and the intentions of the
    filmmakers
  • The screenplay can be time-stamped and mined for
    salient objects (e.g. characters) and story
    descriptors
  • Incomplete alignment can be used to create models
    of objects for further analysis