Final Year Project 2003/2004 LYU0302 PVCAIS Personal Video Conference Archives Indexing System

Department of Computer Science and Engineering, CUHK.

1
Final Year Project 2003/2004 LYU0302 PVCAIS
Personal Video Conference Archives Indexing
System
  • Supervisor: Prof. Michael Lyu
  • Presented by: Lewis Ng, Philip Chan
  • 25 November 2003

2
Outline
  • Introduction
  • Motivation
  • Architecture of PVCAIS
  • - Media Acquisition Module
  • - Archive Indexing Module
  • - Videoconference Accessing Module
  • Implementation in First Term
  • Future Work
  • Conclusion

3
Introduction
  • PVCAIS stands for
  • Personal Video Conference Archives Indexing
    System
  • A system that lets videoconferencing users
    conveniently search and browse past
    videoconference archives

4
Introduction
  • What is a videoconference?
  • A real-time communication technology which
    combines different media:
  • audio, video, text chat, file transfer,
    whiteboard and shared communications
  • - More precisely, it is a multimedia conference

5
Motivation
  • Videoconferencing is becoming popular in:
  • education, business, personal communication
  • Participants wish to keep videoconference
    archives for later reference
  • Plain video and audio files are neither
    searchable nor helpful for recalling their contents
  • Indexing of videoconference archives has not been
    investigated until now

6
Architecture of PVCAIS
  • Consists of 3 modules:
  • - Media Acquisition Module
  • - Archive Indexing Module
  • - Videoconference Accessing Module

7
Architecture of PVCAIS
[Diagram: the Media Acquisition, Archive Indexing and Videoconference Accessing modules]

8
Media Acquisition
  • Extracts channel data and forms media files
  • A videoconference physically contains 4 types of
    channels: Audio, Video, Data and Control
  • Audio and Video channels transmit incoming/outgoing
    audio and video information
  • The Data channel carries information for user
    applications such as Text Chat, Whiteboard and
    File Transfer
  • The Control channel transmits system control
    information such as Member Information

9
Media Acquisition
  • Video-in and Video-out channels
  • Reduce redundancy: store only key frames
  • Detect scene changes in real time
  • Each key frame picture is stored with a timestamp
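
The key-frame idea above can be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how scene changes are detected, so a simple intensity-histogram comparison is used here, and the frame representation (a flat sequence of grayscale pixels), the function names and the threshold are all hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Frame = Sequence[int]  # assumed: flattened grayscale pixels in the range 0-255


def histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame) or 1
    return [c / total for c in counts]


def select_key_frames(
    frames: Iterable[Tuple[float, Frame]], threshold: float = 0.3
) -> List[Tuple[float, Frame]]:
    """Keep the first frame, then every frame whose histogram differs from
    the last kept key frame by more than `threshold` (half the L1 distance,
    so the value lies between 0 and 1). Each key frame keeps its timestamp."""
    key_frames: List[Tuple[float, Frame]] = []
    last_hist = None
    for timestamp, frame in frames:
        hist = histogram(frame)
        if last_hist is None:
            changed = True
        else:
            distance = 0.5 * sum(abs(a - b) for a, b in zip(hist, last_hist))
            changed = distance > threshold
        if changed:
            key_frames.append((timestamp, frame))
            last_hist = hist
    return key_frames
```

Comparing against the last stored key frame, rather than the previous frame, keeps slow gradual changes from slipping through unnoticed.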

10
Media Acquisition
  • Audio-in and Audio-out channel
  • Mixed into one stream after the videoconference
  • Used later for Speech Recognition
  • Text Chat channel
  • Sender, receiver
  • Message
  • Stored with a timestamp

11
Media Acquisition
  • Whiteboard channel
  • Consists of a text-based index file and a number
    of snapshot pictures
  • Index file records timestamp for each whiteboard
    update event and the path of the corresponding
    snapshot picture
  • An update of this channel happens over a period
    of time -> need to detect when an update begins
    and ends by monitoring data transfer in this channel
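
One way to find where a whiteboard update begins and ends is to group packet arrival times by idle gaps. The sketch below assumes that approach; the slides only say that data transfer is monitored, so the `idle_gap` value and the function name are hypothetical.

```python
from typing import Iterable, List, Tuple


def detect_updates(
    packet_times: Iterable[float], idle_gap: float = 2.0
) -> List[Tuple[float, float]]:
    """Group packet arrival times (seconds, ascending) into (begin, end)
    update events. An update is considered finished once no data has
    arrived for more than `idle_gap` seconds."""
    updates: List[Tuple[float, float]] = []
    begin = end = None
    for t in packet_times:
        if begin is None:
            begin = end = t
        elif t - end > idle_gap:
            # Long silence: close the current update and start a new one.
            updates.append((begin, end))
            begin = end = t
        else:
            end = t
    if begin is not None:
        updates.append((begin, end))
    return updates
```

Each returned `end` time is a natural moment to take the snapshot picture and write one line to the whiteboard index file.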

12
Media Acquisition
  • File Transfer channel
  • Copies of the sent/received files are saved to
    the archive directory, together with an index file
  • The index file includes the senders' and
    recipients' user names and the paths of the files
  • Control channel
  • Contains a timestamp and information for each
    event, such as member joined and member left
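
As an illustration of how timestamped control events could be used later, the sketch below replays join/leave events to find who was present at a given moment. The record layout and the event names `"joined"` and `"left"` are assumptions; the slides do not specify the stored format.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ControlEvent:
    timestamp: float  # seconds since the conference started (assumed unit)
    kind: str         # "joined" or "left" (hypothetical event names)
    member: str


def members_at(events: Iterable[ControlEvent], at_time: float) -> Set[str]:
    """Replay events in time order and return the members present at `at_time`."""
    present: Set[str] = set()
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > at_time:
            break
        if event.kind == "joined":
            present.add(event.member)
        elif event.kind == "left":
            present.discard(event.member)
    return present
```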

13
Media Acquisition

14
Archive Indexing
  • 7 raw files are extracted in the Media Acquisition
    Module
  • Need to implement some indexing functions to
    retrieve more information. These include Face
    Detection, Face Recognition, Speech Recognition,
    OCR, Time-based Text Merging, Keyword Selection
    and Title Generation

15
Archive Indexing
  • Face Detection
  • - distinguish between Slides and Faces
  • - if a face is detected, locate the face region

[Figure: Face Detection applied to two key frames, one labelled Slide and one labelled Face]

16
Archive Indexing
  • Face Recognition
  • - Associate human faces in Video-in with names
  • - Need to keep a face base
  • - If there is no match in the face base, ask the
    remote user to enter the name
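
The face-base lookup can be sketched as a nearest-neighbour match with a rejection threshold. This is an assumption, not the recognition method used in the project: faces are taken to be already reduced to numeric feature vectors, and `identify` and `max_distance` are hypothetical names.

```python
import math
from typing import Dict, Optional, Sequence


def identify(
    face: Sequence[float],
    face_base: Dict[str, Sequence[float]],
    max_distance: float = 0.5,
) -> Optional[str]:
    """Return the name of the closest known face, or None when nothing in
    the face base lies within `max_distance` (Euclidean distance)."""
    best_name: Optional[str] = None
    best_distance = max_distance
    for name, known in face_base.items():
        distance = math.dist(face, known)
        if distance <= best_distance:
            best_name, best_distance = name, distance
    return best_name
```

When `identify` returns `None`, the caller would ask the remote user for a name and add the new vector to the face base under that name.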

17
Archive Indexing
  • Speech Recognition
  • - Generate a speech script from the audio archive
  • - The speech of a videoconference contains the
    most information
  • - Can use a commercial library: Microsoft SAPI,
    IBM ViaVoice
  • OCR
  • - Takes the slide archive as input and recognizes
    text from it
  • - Needs to identify and localize text on
    complex backgrounds

18
Archive Indexing
  • Time-based Text Merging
  • - Merge the Speech transcript, Chat script,
    Whiteboard script and slide text archive into a
    single Text source according to their timestamps
  • Keyword Selection
  • - takes the Text source as input
  • - generates keywords for the videoconference
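
Both steps above can be sketched in a few lines. The entry layout `(timestamp, source, text)`, the stopword list and the frequency-based keyword ranking are assumptions made for illustration; the slides do not state which keyword selection method is used.

```python
import heapq
import re
from collections import Counter
from typing import Iterable, List, Tuple

Entry = Tuple[float, str, str]  # (timestamp, source, text), assumed layout

# A tiny illustrative stopword list; a real system would need a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "we", "for", "on"}


def merge_by_time(*streams: Iterable[Entry]) -> List[Entry]:
    """Merge text streams that are each already sorted by timestamp."""
    return list(heapq.merge(*streams, key=lambda entry: entry[0]))


def select_keywords(entries: Iterable[Entry], top_n: int = 3) -> List[str]:
    """Rank non-stopword terms by how often they occur in the merged text."""
    counts = Counter(
        word
        for _, _, text in entries
        for word in re.findall(r"[a-z]+", text.lower())
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_n)]
```

Because each channel is recorded in time order, `heapq.merge` can combine the streams without re-sorting the whole Text source.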

19
Archive Indexing
  • Title Generation
  • - takes the Text source as input
  • - automatically generates a title for the
    videoconference
  • Generate XML index file
  • - integrates all the archives
  • - stores all the related files of a
    videoconference into a single directory
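
A minimal sketch of writing such an index with Python's standard `xml.etree.ElementTree` module follows. The slides list XML under Future Work and give no schema, so every element and attribute name below is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable


def build_index(
    title: str,
    keywords: Iterable[str],
    members: Iterable[str],
    archives: Dict[str, str],
) -> str:
    """Return an XML index string for one conference.

    All element and attribute names are illustrative assumptions."""
    root = ET.Element("conference")
    ET.SubElement(root, "title").text = title
    keywords_el = ET.SubElement(root, "keywords")
    for word in keywords:
        ET.SubElement(keywords_el, "keyword").text = word
    members_el = ET.SubElement(root, "members")
    for name in members:
        ET.SubElement(members_el, "member").text = name
    archives_el = ET.SubElement(root, "archives")
    for kind, path in archives.items():
        # Paths are relative to the conference's single archive directory.
        ET.SubElement(archives_el, "file", type=kind, path=path)
    return ET.tostring(root, encoding="unicode")
```

Storing relative paths keeps the index valid if the whole conference directory is moved or copied.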

20
Videoconference Accessing
  • Provides an interface for the user to manage,
    search and review all indexed conferences.
  • Allows the user to modify the content of a
    conference, such as editing its title or keywords,
    or to delete a conference.
  • Allows the user to search for a conference by
    different criteria, such as member name or
    keyword.
  • Allows the user to review a conference by playing
    back the audio or the key frames.
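
Searching by member name or keyword could look like the sketch below. The `ConferenceRecord` type and its fields are hypothetical stand-ins for whatever the index actually stores; matching is exact and case-insensitive.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ConferenceRecord:
    title: str
    members: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)


def search(
    records: Iterable[ConferenceRecord],
    member: Optional[str] = None,
    keyword: Optional[str] = None,
) -> List[ConferenceRecord]:
    """Return records matching every criterion that was supplied."""

    def matches(record: ConferenceRecord) -> bool:
        if member is not None:
            if member.lower() not in [m.lower() for m in record.members]:
                return False
        if keyword is not None:
            if keyword.lower() not in [k.lower() for k in record.keywords]:
                return False
        return True

    return [record for record in records if matches(record)]
```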

21
Implementation
  • NetMeeting 3.0
  • A Windows feature that provides Internet
    conferencing functions.
  • Supports video, audio and data conferencing,
    including application sharing, chat, whiteboard
    and file transfer.
  • Other features include remote desktop sharing.

22
Implementation
  • NetMeeting 3.0 SDK
  • An extension of NetMeeting that provides an
    interface for programmers and Web developers to
    integrate conferencing capabilities into their
    applications.
  • The API takes the form of COM interfaces and
    functions.

23
Implementation
  • A simple NetMeeting-compatible videoconference
    program built on top of the NetMeeting 3.0 SDK.
  • Supports:
  • Video
  • Audio
  • Text message
  • File Transfer
  • Whiteboard

24
Implementation
  • By directly using the functions of the API, the
    following raw data can be obtained:
  • member information
  • file transfer records
  • text message records
  • Video, audio and whiteboard data cannot be
    directly obtained.

25
Implementation
  • Video
  • create a thread to check the display of the video
    windows
  • if a scene change is detected, the current video
    frame is captured and stored as a still image.
  • the stored images are the key frames of the
    conference and will be used for face detection
    and recognition after the conference.

26
Implementation
  • Audio
  • create a thread to record the local audio from
    the microphone.
  • when a certain amount of audio data has been
    recorded, send the audio data to all members of
    the conference.
  • all the received audio files and locally recorded
    audio files will be combined to generate a single
    audio file.
  • the final audio file will be used for voice
    recognition; the voice engine used is Microsoft
    SAPI.
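
Combining the received and locally recorded audio into one stream can be sketched as sample-wise mixing. The chunk layout (a start offset in samples plus 16-bit PCM values) is an assumption; the slides do not describe the audio format or how chunks are aligned.

```python
from typing import List, Sequence, Tuple

Chunk = Tuple[int, Sequence[int]]  # (start offset in samples, 16-bit PCM samples)


def mix_chunks(chunks: Sequence[Chunk]) -> List[int]:
    """Sum overlapping chunks sample by sample and clip to the 16-bit range."""
    length = max((start + len(samples) for start, samples in chunks), default=0)
    mixed = [0] * length
    for start, samples in chunks:
        for i, sample in enumerate(samples):
            mixed[start + i] += sample
    # Clip so that loud overlapping speech cannot overflow 16-bit storage.
    return [max(-32768, min(32767, value)) for value in mixed]
```

The resulting list could then be written out with the standard `wave` module before being passed to the speech recognizer.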

27
Implementation
  • Whiteboard
  • cannot capture the NetMeeting whiteboard
    information because the format of the data is not
    documented in the API.
  • solution: create our own whiteboard function and
    data format.

28
Conclusion
  • We developed a videoconferencing agent
  • All channel data except whiteboard can be
    collected.
  • Speech Recognition and Face Detection/Recognition
    are integrated into the system, but accuracy
    needs to be improved
  • Simple searching can be performed on stored
    archives

29
Future Work
  • Whiteboard
  • Improve accuracy of Voice Recognition
  • XML
  • Better searching method
  • OCR for slides in video
  • Improve User Interface

30
  • Q & A Session