Recording Meetings with the CMU Meeting Recorder Architecture - PowerPoint PPT Presentation



Recording Meetings with the CMU Meeting Recorder
  • Satanjeev Banerjee, et al.
  • School of Computer Science
  • Carnegie Mellon University

  • End goal: Build conversational agents
  • That understand meetings
  • E.g. Identify action items
  • And make contributions to meetings
  • E.g. Confirm details of action items
  • Part of Project CALO (Cognitive Agent that Learns
    and Organizes)
  • First goal: Create a corpus of human meetings
  • Capture the data that we expect agents to use
  • E.g. Speech, video, whiteboard markings, etc.

Desirable Properties of the Recorder
  • Need to record meetings anywhere
  • Emphasis on instrumenting user, not room
  • Assume low network bandwidth
  • Should still be able to record in the extreme
    situation where there is no network access!
  • Should be easy to add new data streams
  • Easy (low time cost) to incorporate a new stream
  • Should be able to support all major operating systems

The Recorder Architecture
  • Information stream is discretized into events
  • Either a sequence of events, e.g. utterances
  • Or one long event, e.g. video data
  • Each event is given start/end time stamps
  • Coincide for instantaneous events, e.g. keystroke
  • Events are stored on local disks
  • Laptops, shuttle PCs, etc.
  • Events are (slowly) uploaded to a central server
    when there is network access
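The event abstraction above can be sketched in a few lines. This is a minimal illustration, not the actual CMU recorder code; all class and field names are assumptions, chosen to match the terms on the slide (start/end time stamps, modality, local storage, deferred upload).

```python
# Sketch of the recorder's event model: each information stream is
# discretized into events with start/end time stamps; events are stored
# on local disk and queued for upload when the network is available.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Event:
    meeting: str
    user: str
    modality: str      # e.g. "SPEECH", "VIDEO", "HANDWRITING"
    start: float       # server-synchronized time stamp
    end: float         # equals `start` for instantaneous events
    path: str          # location of the event data on local disk

    @property
    def instantaneous(self) -> bool:
        # e.g. a keystroke: start and end time stamps coincide
        return self.start == self.end

@dataclass
class Recorder:
    upload_queue: List[Event] = field(default_factory=list)

    def record(self, event: Event) -> None:
        # Store locally first; the (slow) upload to the central
        # server happens later, whenever there is network access.
        self.upload_queue.append(event)

keystroke = Event("OTTER", "arudnicky", "KEYSTROKE", 100.0, 100.0, "/tmp/k1.dat")
```

A long-running stream such as video would simply be one `Event` whose `end` is set when recording stops.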

Event Identification and Logging
  • Each recorded event has the following identifying
    information associated with it
  • Start and stop time stamps
  • Name of the meeting and the user
  • Modality (speech, video, hand-writing, etc.)
  • After recording an event, its identification
    information is sent to a logging server
  • Server creates a list of all the events in a
    meeting
  • Good for book-keeping (but not essential)

Architecture of Meeting Recorder
DATA_BLOCK
  session  OTTER
  user     arudnicky
  datatype SPEECH
  file     \\spot\data\u1.raw
  Start    200309171827.600
  End
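A hypothetical serializer for the DATA_BLOCK record shown above. The field names (`session`, `user`, `datatype`, `file`, `Start`, `End`) come from the slide; the function itself and its example arguments, including the end time stamp, are illustrative assumptions.

```python
# Build a DATA_BLOCK log message (format as shown on the slide);
# this is a sketch, not the actual logging-server protocol code.
def data_block(session, user, datatype, file, start, end):
    return (f"DATA_BLOCK session {session} user {user} "
            f"datatype {datatype} file {file} "
            f"Start {start} End {end}")

msg = data_block("OTTER", "arudnicky", "SPEECH",
                 r"\\spot\data\u1.raw",
                 "200309171827.600", "200309171830.100")  # end value is hypothetical
```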
Synchronizing the Time Stamps
  • All event time stamps must be synchronized
  • We use the Simplified Network Time Protocol
  • Query a central NTP server for the time
  • Use the reply and the round-trip time to estimate
    time difference between local machine and server
  • Use this to create server-time time stamps
  • Rough experiments reveal a variance of about 10 ms
  • Caveat: experiments done on a high-speed network
  • What if there is no network access?
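The SNTP estimate described above uses the reply and the round-trip time: with `t0` the client's send time, `t1`/`t2` the server's receive/reply times, and `t3` the client's receive time, the standard offset estimate is `((t1 - t0) + (t2 - t3)) / 2`. A minimal sketch (function names are assumptions):

```python
def sntp_offset(t0, t1, t2, t3):
    """Estimate clock offset (server minus client) from one SNTP exchange.
    t0: client send, t1: server receive, t2: server reply, t3: client receive.
    Assumes the network delay is roughly symmetric in both directions."""
    return ((t1 - t0) + (t2 - t3)) / 2.0

def to_server_time(local_ts, offset):
    # Convert a local time stamp into a server-time time stamp
    return local_ts + offset

# Client clock 5 s behind the server, 0.2 s one-way network delay:
off = sntp_offset(t0=0.0, t1=5.2, t2=5.25, t3=0.45)
```

Here `off` comes out to 5.0 s, recovering the true offset exactly because the delay is symmetric; asymmetric delay is what produces the residual variance the slide mentions.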

Aggregating the Data
  • Upon network access availability, data is
    transferred from all sites to a central location
  • Current recording sites CMU and Stanford
  • Implemented a cross-platform version of the MS
    Background Intelligent Transfer Service
  • Uploads files in a transparent background process
  • Throttles bandwidth use as the user's activity
    goes up
  • Pauses if network connection is lost
  • Resumes once network access is restored
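The pause/resume behavior hinges on resuming a transfer from where it left off rather than restarting it. The sketch below stands in for a network upload with a local file copy; the resume-from-offset idea is the point, and all names are assumptions (this is not the actual cross-platform BITS clone).

```python
import os

CHUNK = 4096  # transfer in small chunks so throttling/pausing is cheap

def resume_upload(src_path, dst_path):
    """Copy src to dst, resuming from however many bytes dst already has.
    A dropped connection just means calling this again later."""
    done = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(done)                 # skip what already made it across
        while chunk := src.read(CHUNK):
            dst.write(chunk)

# Simulate an interrupted transfer: half the file is already "uploaded".
import tempfile
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "u1.raw")
dst = os.path.join(tmp, "u1.uploaded")
with open(src, "wb") as f:
    f.write(b"x" * 10000)
with open(dst, "wb") as f:
    f.write(b"x" * 5000)               # partial copy from a previous attempt
resume_upload(src, dst)                # completes the remaining half
```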

Data Collection Process (proposed)
Independent cross-site collection
Background data transmission
Capturing Close-Talking Speech
  • Implemented Meeting Recorder Cross Platform
    (MRCP) to record speech and notes
  • Speech recorded using head-mounted mics
  • 11.025 kHz sampling rate used for portability
  • End pointing done using CMU Sphinx 3 ASR
  • Each end-pointed utterance is an event
  • Utterance is recorded to local disk (wav format)
  • Time stamps are generated using Simple NTP
  • Utterance's identifying information is sent to the
    logging server; utterance is queued for upload

Capturing Typed Notes
  • Users type notes in the client's note-taking area
  • Snapshots of notes are taken at each carriage
    return
  • Each snapshot is an event
  • Each snapshot is saved to disk, time-stamped,
    logged, and queued for upload
  • Demonstration of MRCP
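The per-carriage-return snapshot scheme can be sketched as a generator that emits the full note text each time the user presses Enter; each yielded snapshot would become one event. The function name and input encoding are assumptions.

```python
def snapshots(keystrokes):
    """Yield the complete note text at each carriage return,
    mirroring MRCP's one-snapshot-per-Enter events."""
    buf = []
    for ch in keystrokes:
        buf.append(ch)
        if ch == "\n":
            yield "".join(buf)   # snapshot: the whole note so far

snaps = list(snapshots("buy milk\ncall Bob\n"))
```

Because each snapshot contains the whole note so far, every event is self-contained and can be time-stamped, logged, and uploaded independently.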

More Details about MRCP
  • Implemented using cross platform libraries
  • wxWidgets for GUI, file access, networking
  • PortAudio for audio libraries
  • Currently compiles on Windows, Macintosh OS-X and
    Linux operating systems
  • Windows version distributed to other Project CALO
    sites
  • Macintosh and Linux versions in beta-testing
  • WinCE version in development

Capturing Whiteboard Pen Strokes
  • We use Mimio to capture whiteboard pen strokes
  • Strokes consist of all the x-y coordinates
    between pen-down and pen-up
  • Each stroke is an event. It is recorded,
    time-stamped, logged, queued for upload.
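Segmenting the Mimio sample stream into strokes amounts to grouping the x-y samples that fall between a pen-down and the next pen-up. A minimal sketch (sample format and names are assumptions):

```python
def strokes(samples):
    """Group (x, y, pen_down) samples into strokes: every run of
    samples between a pen-down and the next pen-up is one stroke."""
    out, cur = [], None
    for x, y, pen_down in samples:
        if pen_down:
            if cur is None:
                cur = []          # pen just touched the board
            cur.append((x, y))
        elif cur is not None:
            out.append(cur)       # pen lifted: stroke complete
            cur = None
    if cur is not None:           # pen still down when capture ended
        out.append(cur)
    return out

demo = strokes([(0, 0, True), (1, 1, True), (1, 1, False),
                (5, 5, True), (6, 5, True), (6, 5, False)])
```

Each stroke in the result would then be wrapped as one event, with the pen-down and pen-up times as its start and end time stamps.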

Capturing Power Point Slides Information
  • We use Microsoft's PowerPoint API to capture slide
    change timing information, and slide contents
  • Events: slide changes
  • Event data: the content of the new slide
  • Content is in the form of all the text, and all
    the shapes on the slide
  • Events are instantaneous
  • Start and stop time stamps coincide
  • Events are processed as before

Capturing Panoramic Video
  • We capture panoramic video using a 4-camera CAMEO
  • Developed by the Physical Awareness group at CMU
  • Video recording done in MPEG-4 format
  • One long event is produced and uploaded

Current Status of Data Collection
  • Recorded meetings vary widely in size…
  • From 2 to 10 person meetings
  • …in meeting type
  • Scheduling meetings, presentations, brainstorms
  • …in content
  • Speech group meetings, dialog group meetings,
    physical awareness group meetings
  • Currently have a total of more than 11,000
    utterances (including cross talk)

Using the Data Some Initial Research
  • Question: Can we detect the state of a meeting,
    and the roles of participants, from simple speech
    features?
  • Introduced a taxonomy of meeting states and
    participant roles

Detection Methods and Initial Results
  • Used Anvil to hand-annotate 45 minutes of
    meeting video with states and roles
  • Trained decision tree classifier from 30 minutes
    of data
  • Input features
  • Speakers, lengths of utterances, pauses, and
    interruptions within a short history window
  • Initial results: about 50% detection accuracy on a
    separate 15 minutes of test data

  • Thanks to DARPA grant NBCH-D-02-0010