Transcript and Presenter's Notes

Title: Recording Meetings with the CMU Meeting Recorder Architecture


1
Recording Meetings with the CMU Meeting Recorder
Architecture
  • Satanjeev Banerjee, et al.
  • School of Computer Science
  • Carnegie Mellon University

2
Goals
  • End goal: Build conversational agents
  • That understand meetings
  • E.g. Identify action items
  • That make contributions to meetings
  • E.g. Confirm details of action items
  • Part of Project CALO (Cognitive Agent that
    Learns and Organizes)
  • First goal: Create a corpus of human meetings
  • Capture data that we expect agents to use
  • E.g. Speech, video, whiteboard markings, etc.

3
Desirable Properties of the Recorder
  • Need to record meetings anywhere
  • Emphasis on instrumenting user, not room
  • Assume low network bandwidth
  • Should still be able to record in the extreme
    situation where there is no network access!
  • Should be easy to add new data streams
  • Easy (low time) to incorporate a new stream
  • Should be able to support the major OSes

4
The Recorder Architecture
  • Information stream is discretized into events
  • Either a sequence of events, e.g. utterances
  • Or one long event, e.g. video data
  • Each event is given start/end time stamps
  • Coincide for instantaneous events, e.g. keystroke
  • Events are stored on local disks
  • Laptops, shuttle PCs, etc.
  • Events are (slowly) uploaded to a central server
    when there is network access
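
A minimal sketch, in C++, of how one recorded event might be represented under this architecture. The struct and its field names are illustrative assumptions (chosen to match the identifying information listed on the next slide), not the recorder's actual data structures.

// Illustrative event record: a slice of one information stream with
// server-synchronized start/end time stamps, held on the local disk until
// it can be uploaded. All names here are assumptions, not MRCP code.
#include <string>

struct MeetingEvent {
    std::string meeting;    // name of the meeting (session)
    std::string user;       // who produced the event
    std::string modality;   // SPEECH, VIDEO, HANDWRITING, NOTES, ...
    std::string localFile;  // where the event data sits on the local disk
    double start = 0.0;     // server-time start stamp (seconds)
    double end   = 0.0;     // server-time end stamp; equals start for
                            // instantaneous events such as a keystroke
};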

5
Event Identification and Logging
  • Each recorded event has the following identifying
    information associated with it
  • Start and stop time stamps
  • Name of the meeting and the user
  • Modality (speech, video, hand-writing, etc.)
  • After recording an event, its identification
    information is sent to a logging server
  • Server creates a list of all the events in a
    meeting
  • Good for book-keeping (but not essential)

6
Architecture of Meeting Recorder
Example event metadata block sent from a recording client to the master
logging server:

  DATA_BLOCK
    session:  OTTER
    user:     arudnicky
    datatype: SPEECH
    file:     \\spot\data\u1.raw
    Start:    200309171827.600
    End:      200309171835.357
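
A minimal sketch of assembling the DATA_BLOCK text above from an event's fields. The slide shows only the message contents; how MRCP actually frames and transports it to the logging server is not specified here, so the function below is purely illustrative.

// Illustrative only: reproduce the DATA_BLOCK message shown on the slide.
// The framing/transport used by the real logging client is not shown.
#include <iostream>
#include <sstream>
#include <string>

std::string makeDataBlock(const std::string& session, const std::string& user,
                          const std::string& datatype, const std::string& file,
                          const std::string& startStamp,  // e.g. 200309171827.600
                          const std::string& endStamp)
{
    std::ostringstream out;
    out << "DATA_BLOCK session " << session << " user " << user
        << " datatype " << datatype << " file " << file
        << " Start " << startStamp << " End " << endStamp;
    return out.str();
}

int main()
{
    // Reproduces the example block from the slide.
    std::cout << makeDataBlock("OTTER", "arudnicky", "SPEECH",
                               R"(\\spot\data\u1.raw)",
                               "200309171827.600", "200309171835.357")
              << "\n";
}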
7
Synchronizing the Time Stamps
  • All event time stamps must be synchronized
  • We use the Simple Network Time Protocol (SNTP)
  • Query a central NTP server for the time
  • Use the reply and the round-trip time to estimate
    time difference between local machine and server
  • Use this to create server-time time stamps
  • Rough experiments reveal about 10 ms of variance
  • Caveat: Experiments were done on a high-speed
    network
  • What if there is no network access?
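
A minimal sketch, under the usual SNTP assumptions (a single request/reply exchange and roughly symmetric network delay), of how the server's reply and the round-trip time yield an offset that converts local time stamps to server-time time stamps. The names and numbers are illustrative, not the project's code.

// Illustrative SNTP offset estimation; all times are seconds since a
// common epoch, and the measurements in main() are made up.
#include <iostream>

double estimateOffset(double clientSend,  // t1: local clock when request left
                      double serverRecv,  // t2: server clock when request arrived
                      double serverSend,  // t3: server clock when reply left
                      double clientRecv)  // t4: local clock when reply arrived
{
    // Average the apparent offset in each direction; this cancels the
    // network delay if it is symmetric.
    return ((serverRecv - clientSend) + (serverSend - clientRecv)) / 2.0;
}

// Convert a local-clock time stamp into a server-time time stamp.
double toServerTime(double localTime, double offset)
{
    return localTime + offset;
}

int main()
{
    // Hypothetical measurements from one 50 ms round trip.
    double offset = estimateOffset(100.000, 100.120, 100.121, 100.050);
    std::cout << "Estimated offset: " << offset << " s\n";
    std::cout << "Event at local 105.000 -> server "
              << toServerTime(105.000, offset) << "\n";
}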

8
Aggregating the Data
  • Upon network access availability, data is
    transferred from all sites to a central location
  • Current recording sites: CMU and Stanford
  • Implemented a cross-platform version of the MS
    Background Intelligent Transfer Service (BITS)
  • Uploads files in a transparent background process
  • Throttles bandwidth use as the user's activity
    goes up
  • Pauses if network connection is lost
  • Resumes once network access is restored
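
A minimal sketch of the BITS-like behaviour described above: a background loop that pauses while the network is down, resumes from where it left off, and throttles chunk size and pacing when the user is active. networkUp(), userIsActive(), and uploadChunk() are hypothetical stand-ins, and the stubs below only simulate a transfer; this is not the actual cross-platform implementation.

// Illustrative background uploader, not MRCP's BITS re-implementation.
#include <chrono>
#include <cstddef>
#include <queue>
#include <string>
#include <thread>

bool networkUp()    { return true; }    // stub: pretend the link is always up
bool userIsActive() { return false; }   // stub: pretend the user is idle
// Stub transfer: report "more to send" until ~1 MB has gone out.
bool uploadChunk(const std::string&, std::size_t offset, std::size_t) {
    return offset < 1024 * 1024;
}

void backgroundUpload(std::queue<std::string> pending)
{
    using namespace std::chrono_literals;
    while (!pending.empty()) {
        const std::string file = pending.front();
        std::size_t offset = 0;
        for (;;) {
            if (!networkUp()) {                 // pause: connection lost
                std::this_thread::sleep_for(5s);
                continue;                       // resume later from same offset
            }
            // Throttle: smaller chunks, longer sleeps while the user is busy.
            const std::size_t chunk = userIsActive() ? 16 * 1024 : 256 * 1024;
            if (!uploadChunk(file, offset, chunk))
                break;                          // file fully uploaded
            offset += chunk;
            std::this_thread::sleep_for(userIsActive() ? 500ms : 10ms);
        }
        pending.pop();                          // next queued event file
    }
}

int main()
{
    std::queue<std::string> q;
    q.push("u1.raw");                           // hypothetical queued utterance file
    backgroundUpload(q);
}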

9
Data Collection Process (proposed)
[Flow diagram: preparation, independent cross-site collection, integration
(via background data transmission), and research]
10
Capturing Close-Talking Speech
  • Implemented Meeting Recorder Cross Platform
    (MRCP) to record speech and notes
  • Speech recorded using head-mounted mics
  • 11.025 kHz sampling rate used for portability
  • End pointing done using CMU Sphinx 3 ASR
  • Each end-pointed utterance is an event
  • Utterance is recorded to local disk (wav format)
  • Time stamps are generated using Simple NTP
  • Each utterance's identifying information is sent
    to the logging server, and the utterance is
    queued for upload
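
A minimal sketch of capturing close-talking audio at the 11.025 kHz rate mentioned above, using PortAudio's blocking API (the audio library named on slide 12). It illustrates only the capture step: endpointing with Sphinx, the WAV container, time-stamping, and logging are all omitted, and the output path is hypothetical.

// Illustrative close-talking capture with PortAudio, not MRCP's code.
// Records ~5 seconds of 16-bit mono audio at 11.025 kHz to a raw PCM file.
#include <portaudio.h>
#include <cstdio>
#include <vector>

int main()
{
    const double sampleRate = 11025.0;       // portability-driven rate from the slides
    const unsigned long framesPerChunk = 1024;

    if (Pa_Initialize() != paNoError) return 1;

    PaStream* stream = nullptr;
    // One input channel, 16-bit samples, blocking mode (no callback).
    if (Pa_OpenDefaultStream(&stream, 1, 0, paInt16, sampleRate,
                             framesPerChunk, nullptr, nullptr) != paNoError) {
        Pa_Terminate();
        return 1;
    }

    std::FILE* out = std::fopen("utterance.raw", "wb");  // hypothetical output path
    std::vector<short> buffer(framesPerChunk);

    Pa_StartStream(stream);
    // Fixed-length capture; MRCP instead segments utterances by endpointing.
    const int chunks = static_cast<int>(5 * sampleRate / framesPerChunk);
    for (int i = 0; i < chunks; ++i) {
        Pa_ReadStream(stream, buffer.data(), framesPerChunk);   // blocking read
        std::fwrite(buffer.data(), sizeof(short), framesPerChunk, out);
    }
    Pa_StopStream(stream);

    std::fclose(out);
    Pa_CloseStream(stream);
    Pa_Terminate();
    return 0;
}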

11
Capturing Typed Notes
  • Users type notes in the client's note-taking area
  • Snapshots of notes are taken at each carriage
    return
  • Each snapshot is an event
  • Each snapshot is saved to disk, time-stamped,
    logged, and queued for upload
  • Demonstration of MRCP
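
A minimal sketch of the snapshot-on-carriage-return idea: each time a line is completed, the entire current note text becomes one time-stamped event. Reading from standard input stands in for the GUI note-taking area, and saveSnapshotEvent() is a hypothetical placeholder for the record/log/queue steps described above.

// Illustrative only, not MRCP's note-taking code.
#include <ctime>
#include <iostream>
#include <string>

void saveSnapshotEvent(const std::string& notes)
{
    std::time_t now = std::time(nullptr);   // one stamp; presumably start == end
    std::cout << "[snapshot @ " << now << "] " << notes.size()
              << " chars saved, logged, and queued for upload\n";
}

int main()
{
    std::string notes, line;
    while (std::getline(std::cin, line)) {   // each Enter ends a line
        notes += line + "\n";
        saveSnapshotEvent(notes);            // snapshot of the whole note area
    }
}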

12
More Details about MRCP
  • Implemented using cross platform libraries
  • wxWidgets for GUI, file access, networking
  • PortAudio for audio libraries
  • Currently compiles on Windows, Macintosh OS X,
    and Linux operating systems
  • Windows version distributed to other Project CALO
    sites
  • Macintosh and Linux versions in beta-testing
  • WinCE version in development

13
Capturing Whiteboard Pen Strokes
  • We use Mimio to capture whiteboard pen strokes
  • Strokes consist of all the x-y coordinates
    between pen-down and pen-up
  • Each stroke is an event: it is recorded,
    time-stamped, logged, and queued for upload
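
A minimal sketch of a stroke event as described above: the x-y points sampled between pen-down and pen-up plus the event's time stamps. The types and names are illustrative; they are not Mimio's API or MRCP's actual representation.

// Illustrative whiteboard stroke event; processed like any other event
// (recorded, time-stamped, logged, queued for upload).
#include <vector>

struct Point { double x, y; };

struct StrokeEvent {
    double startTime;            // pen-down time (server-synchronized)
    double endTime;              // pen-up time
    std::vector<Point> points;   // pen trajectory between pen-down and pen-up
};

int main()
{
    // Hypothetical three-point stroke lasting 0.42 seconds.
    StrokeEvent s{0.0, 0.42, {{10, 12}, {11, 14}, {13, 15}}};
    return s.points.empty() ? 1 : 0;
}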

14
Capturing Power Point Slides Information
  • We use MS's PowerPoint API to capture slide
    change timing information and slide contents
  • Events: slide changes
  • Event data: content of the new slide
  • Content is in the form of all the text, and all
    the shapes on the slide
  • Events are instantaneous
  • Start and stop time stamps coincide
  • Events are processed as before

15
Capturing Panoramic Video
  • We capture panoramic video using a 4-camera CAMEO
    device
  • Developed by the Physical Awareness group at CMU
  • Video recording done in MPEG-4 format
  • One long event is produced and uploaded

16
Current Status of Data Collection
  • Recorded meetings vary widely in size…
  • From 2-person to 10-person meetings
  • …in meeting type
  • Scheduling meetings, presentations, brainstorms
  • …in content
  • Speech group meetings, dialog group meetings,
    physical awareness group meetings
  • Currently have a total of more than 11,000
    utterances (including cross talk)

17
Using the Data Some Initial Research
  • Question: Can we detect the state of a meeting
    and the roles of participants from simple speech
    data?
  • Introduced a taxonomy of meeting states and
    participant roles

18
Detection Methods and Initial Results
  • Used Anvil to hand-annotate 45 minutes of
    meeting video with states and roles
  • Trained a decision-tree classifier on 30 minutes
    of data
  • Input features:
  • Speakers, lengths of utterances, pauses, and
    interruptions within a short history of the
    meeting
  • Initial results: About 50% detection accuracy on
    a separate 15 minutes of test data
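
A minimal sketch of deriving the kinds of input features listed above (who spoke, utterance lengths, pauses, interruptions) from a short window of time-stamped utterances. The Utterance struct and the feature definitions are assumptions for illustration, not the actual feature extractor whose output fed the decision-tree classifier.

// Illustrative feature extraction over a short history of utterances,
// assumed sorted by start time. Not the actual extractor.
#include <cstddef>
#include <iostream>
#include <set>
#include <string>
#include <vector>

struct Utterance {
    std::string speaker;
    double start, end;            // server-synchronized time stamps (seconds)
};

struct WindowFeatures {
    int    numSpeakers   = 0;
    double meanLength    = 0.0;   // mean utterance length in the window
    double totalPause    = 0.0;   // silence between consecutive utterances
    int    interruptions = 0;     // next speaker starts before previous one ends
};

WindowFeatures extractFeatures(const std::vector<Utterance>& window)
{
    WindowFeatures f;
    std::set<std::string> speakers;
    for (std::size_t i = 0; i < window.size(); ++i) {
        speakers.insert(window[i].speaker);
        f.meanLength += window[i].end - window[i].start;
        if (i > 0) {
            const double gap = window[i].start - window[i - 1].end;
            if (gap > 0.0)
                f.totalPause += gap;                  // silence between turns
            else if (window[i].speaker != window[i - 1].speaker)
                f.interruptions += 1;                 // overlap with another speaker
        }
    }
    f.numSpeakers = static_cast<int>(speakers.size());
    if (!window.empty()) f.meanLength /= window.size();
    return f;     // features of this kind would feed the decision-tree classifier
}

int main()
{
    // Two hypothetical utterances, the second interrupting the first.
    std::vector<Utterance> w = { {"A", 0.0, 2.5}, {"B", 2.0, 3.0} };
    WindowFeatures f = extractFeatures(w);
    std::cout << f.numSpeakers << " speakers, " << f.interruptions
              << " interruption(s)\n";
}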

19
Questions?
  • Thanks to DARPA grant NBCH-D-02-0010