Concepts of Multimedia Processing and Transmission - PowerPoint PPT Presentation

About This Presentation
Title:

Concepts of Multimedia Processing and Transmission


Slides: 47
Provided by: teal9

Transcript and Presenter's Notes

Title: Concepts of Multimedia Processing and Transmission


1
Concepts of Multimedia Processing and Transmission
  • IT 481, Lecture 1
  • Dennis McCaughey, Ph.D.
  • 28 August, 2006

2
Outline
  • Course Description
  • Instructor
  • Student Survey
  • Exams, Homework and Project
  • Grading
  • General Policies
  • Lecture Schedule

3
Course Description
  • Topics
  • The fundamentals of signal and image processing,
    including algorithms for signal processing that
    have applications to multimedia
  • Techniques for voice coding and recognition, CD
    and DVD technology, streaming video, WANs and
    LANs, and videoconferencing technology
  • Text: Multimedia Communication Systems: Techniques,
    Standards, and Networks, K. R. Rao, Zoran S.
    Bojkovic, Dragorad A. Milovanovic, Prentice Hall
    PTR, 1st edition (April 26, 2002), ISBN 013031398X.

4
Instructor
  • Dennis McCaughey
  • Contact Information
  • 703-263-7425 (Office)
  • 703-624-6830 (Cell)
  • dgm_at_rincon.com (e-mail)
  • Office Hours: one hour before class
  • Background
  • PhD in EE, University of Southern California, 1977
  • Thesis: Degrees of Freedom for Projection Imaging

5
Student Survey
  • Name
  • Contact Information
  • Last degree, along with current degree objective,
    i.e.:
  • Undergrad seeking Bachelor's, Grad seeking
    MS/PhD, Other
  • Mathematical Background
  • Calculus?
  • Differential Equations?
  • Linear Algebra?
  • Probability, Statistics, Random Processes?

6
Student Survey Contd
  • Systems Background
  • Linear Systems?
  • Signal Processing
  • Image processing
  • Programming Languages
  • C or C++?
  • MATLAB?

7
Exams, Homework and Project
  • Mid-Term: 1 hour, closed book
  • Will cover the key topics presented in class and
    in the homework
  • Final: format to be determined
  • Homework: 1) reading assignments, 2) written
    answers to selected questions based on the reading
    assignments, 3) some limited math problems
  • Project format (preliminary): MATLAB
    implementation of a multimedia processing
    application.

8
More on the Project
  • A course project will be required, exploring
    aspects of multimedia signal processing; the work
    may be computer based, using MATLAB.
  • Project topics will be of the student's choice,
    subject to review by the instructor.
  • Each student will also be required to present a
    short briefing on the results.
  • Projects will be evaluated on the content of the
    presentation and not on the briefing itself.
  • Details regarding topics, content, and format
    will be provided during the course.

9
Grading
  • The final grade will be determined by a weighted
    average of the homework assignments, a mid-term
    exam, a final exam, and a project:

Component   Weight (%)
Homework    10
Mid-Term    20
Project     30
Final       40
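The weighted average can be sketched as follows, assuming each component is scored on a 0-100 scale (the slides do not state the scale or any letter-grade cutoffs):

```python
# Component weights in percent, taken from the grading table (they sum to 100).
WEIGHTS = {"Homework": 10, "Mid-Term": 20, "Project": 30, "Final": 40}


def final_grade(scores: dict) -> float:
    """Weighted average of component scores.

    Assumption: every score is on a 0-100 scale; the course materials
    do not specify the scale.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("need exactly one score per graded component")
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS) / total_weight


# Example: 0.1*90 + 0.2*80 + 0.3*85 + 0.4*75 = 80.5
print(final_grade({"Homework": 90, "Mid-Term": 80, "Project": 85, "Final": 75}))
```

Because the Final carries 40% of the weight, a 10-point change on it moves the course grade by 4 points.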
10
General Policies
  • Collaboration
  • Students are permitted and encouraged to
    collaborate on homework assignments. 
  • All graded work, however, must be the original
    effort of the student submitting the paper. 
  • Homework
  • Homework will be collected at the beginning of
    each class period. Note: late homework will be
    accepted provided the reason for the delay is
    coordinated with the instructor within 2 days of
    its assignment. Homework solutions will be
    discussed in class.
  • Make-up Exams
  • Make-up exams will not be given unless a detailed
    written explanation, accompanied by documentation
    for the absence, is provided. If this information
    is not provided, an F grade will be given for the
    exam. The location and time for a make-up exam
    will be decided by the instructor.
    Also, students are expected to be in class and
    on time for every class.

11
Lecture Schedule (Preliminary)
Week Date Chapter Topic Reading Homework
1 8/28 1, 2 Lecture 1 Introduction to Multimedia Communications 4
2 9/11 4 Lecture 2 Networks and Multimedia Applications 3  
3 9/18 3 Lecture 3 Signal Processing Fundamentals 3  
4 9/25 3 Lecture 4 Audio Coding MATLAB Tutorial 3  
5 10/2 3 Lecture 5 Video Coding 1 3  
6 10/9 3 Lecture 6 Video Coding 2 Review    
7 10/17 1-4 Mid-Term Exam Project Review    
8 10/30 5 Lecture 7 MPEG-1 5  
9 11/6 5 Lecture 8 MPEG-2 5  
10 11/13 5 Lecture 9 MPEG-4 5  
11 11/20   Lecture 10 MPEG-4, MPEG-7, MPEG-21    
12 11/27 6 Lecture 11 Audio and video streaming 6  
13 12/4 Lecture 12  
14 12/11 Final Exam Review 6  
15 12/18 Final Exam 5-6  
12
Multimedia Communications
13
What is Multimedia?
  • Multimedia is a combination of text, art, sound,
    animation, and video.

Slide Courtesy, Hung Nguyen
14
Multimedia Components Simplified
  • Multimedia can be viewed as the combination of
    audio, video, and data, together with how they
    interact with the user (more than the sum of the
    individual components)

15
Background
  • Fast-paced emergence of applications in medicine,
    education, travel, etc.
  • Characterized by large documents that must be
    communicated with short delays
  • Glamorous applications such as distance learning
    and video teleconferencing
  • Applications enhanced by video are often seen as
    the driver for the development of multimedia
    networks

16
Forces Driving Communications That Facilitate
Multimedia Communications
  • Evolution of communications and data networks
  • Increasing availability of almost unlimited
    bandwidth on demand
  • Availability of ubiquitous access to the network
  • Ever increasing amount of memory and
    computational power
  • Sophisticated terminals
  • Digitization of virtually everything

17
New Information System Paradigm
Slide Courtesy, Hung Nguyen
18
Elements of Multimedia Systems
  • Two key communication modes
  • Person-to-person
  • Person-to-machine

Slide Courtesy, Hung Nguyen
19
Multimedia Networks
  • The world has been wrapped in copper and glass
    fiber and can be viewed as a hair ball with
    physical, wireless and satellite entry/exit
    points.
  • Physical: LAN-WAN connections
  • Wireless: cellular telephony, wireless PC
    connectivity
  • Satellite: INMARSAT, Thuraya, ACeS, etc.

20
Multimedia Communication Model
  • Partitioning of information objects into distinct
    types, e.g., text, audio, video
  • Standardization of service components per
    information type
  • Creation of platforms at two levels: network
    service and multimedia communication
  • Define general applications for multiple use in
    various multimedia environments
  • Define specific applications, e.g. e-commerce,
    tele-training, using building blocks from
    platform and general applications

21
Requirements
  • User Requirements
  • Fast preparation and presentation
  • Dynamic control of multimedia applications
  • Intelligent support to users
  • Standardization
  • Network Requirements
  • High speed and variable bit rates
  • Multiple virtual connections using the same
    access
  • Synchronization of different information types
  • Suitable standardized services along with support

22
Network Requirements
  • ATM-BISDN and SS7 have enabled switching-based
    communications capabilities over the PSTN that
    support the necessary services
  • ATM-BISDN-SS7 will evolve to all-optical,
    switchless networks based on packet transfer

23
Packet Transfer Concept
  • Allows voice, video and data to be dealt with in
    a common format
  • More flexible than circuit switching, which it can
    emulate, while also allowing the multiplexing of
    data streams with varied bit rates
  • Dynamic allocation of bandwidth
  • Handle Variable Bit Rate (VBR) directly
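To illustrate how a common packet format lets streams of different bit rates share one link, here is a minimal Python sketch. It borrows the ATM cell size (53 bytes: a 5-byte header plus a 48-byte payload) but is not an ATM adaptation layer; the `(stream_id, seq, payload)` tuple format, the zero padding, and the round-robin scheduler are simplifying assumptions.

```python
from itertools import zip_longest

CELL_PAYLOAD = 48  # payload bytes in a 53-byte ATM cell (5-byte header not modelled)


def segment(stream_id, data):
    """Cut one media stream into fixed-size cells (stream_id, seq, payload).

    The last payload is zero-padded to the full cell size (a simplification;
    real adaptation layers carry length information instead).
    """
    cells = []
    for seq, start in enumerate(range(0, len(data), CELL_PAYLOAD)):
        chunk = data[start:start + CELL_PAYLOAD]
        cells.append((stream_id, seq, chunk.ljust(CELL_PAYLOAD, b"\x00")))
    return cells


def multiplex(*cell_lists):
    """Round-robin the cells of several streams onto one shared link.

    A stream with more data simply occupies more cell slots.
    """
    link = []
    for slot in zip_longest(*cell_lists):
        link.extend(cell for cell in slot if cell is not None)
    return link


voice = segment(1, b"v" * 100)   # low-rate stream: 3 cells
video = segment(2, b"x" * 200)   # higher-rate stream: 5 cells
print([(sid, seq) for sid, seq, _ in multiplex(voice, video)])
```

The printed schedule interleaves the two streams until the voice stream runs out, after which the video stream takes the remaining slots; no bandwidth is reserved for the idle stream.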

24
Considerations
  • Buffering is required for constant-bit-rate data
    such as audio
  • Re-sequencing and recovery capabilities must be
    provided over networks where packets may be
    received in an order different from that in which
    they were transmitted, or may be dropped
  • In an ATM network some packets can be dropped
    while others may not (e.g., voice vs. bank transfer
    data packets)
  • Optimum packet lengths for voice, video, and data
    differ in an ATM network
  • IP packets over the internet may arrive in a
    different order or be dropped.
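A receiver-side playout buffer addresses both the buffering and the re-sequencing considerations: it puts packets back in order and releases them at a constant rate, treating packets that are missing or that arrive after their playout time as lost. This sketch assumes one audio frame per packet, sent every `frame_ms` milliseconds, and a fixed `playout_delay_ms`; both names and their default values are illustrative, not taken from the course text.

```python
def playout(packets, frame_ms=20, playout_delay_ms=60):
    """Schedule received packets for constant-rate playback.

    packets: (seq, arrival_ms, payload) tuples in arrival order, where packet
    `seq` was sent at time seq * frame_ms and must arrive by its playout time
    seq * frame_ms + playout_delay_ms. Returns one slot per sequence number
    from 0 up to the highest one received, holding the payload or None for a
    lost or late packet (to be concealed by the decoder). Losses after the
    highest received sequence number cannot be detected here.
    """
    if not packets:
        return []
    slots = {}
    for seq, arrival_ms, payload in packets:
        deadline = seq * frame_ms + playout_delay_ms
        if arrival_ms <= deadline and seq not in slots:  # drop late packets and duplicates
            slots[seq] = payload
    last = max(seq for seq, _, _ in packets)
    return [slots.get(seq) for seq in range(last + 1)]


# Packets 1 and 0 arrive out of order; packet 3 arrives after its 120 ms deadline.
arrivals = [(1, 45, "b"), (0, 30, "a"), (3, 150, "d"), (2, 70, "c")]
print(playout(arrivals))  # ['a', 'b', 'c', None]
```

A larger playout delay tolerates more network jitter at the cost of a longer end-to-end delay.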

25
Digital Video Signal Transport
[Figure: transport chain linking Video sources, Encoder, Application, Network Multiplexing/Routing, Decoder, and Users]
  • Encoder: transformation, quantization, entropy
    coding, bit-rate control
  • Application: data structuring, re-synchronization
  • Network multiplexing/routing
  • Error control: error detection, loss detection,
    error correction, erasure correction, overhead
    (FEC), retransmission
  • Decoder: entropy decoding, de-quantization, inverse
    transform, loss concealment, post-processing
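The erasure-correction and FEC-overhead entries can be illustrated with the simplest erasure code, a single XOR parity packet per group. This Python sketch assumes equal-length packets and that the receiver already knows which packet was lost (an erasure, e.g. from a sequence-number gap); one parity packet can repair at most one loss per group.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group: List[bytes]) -> bytes:
    """Parity packet for a group of equal-length data packets."""
    return reduce(xor_bytes, group)


def recover(group: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild a group in which at most one packet was erased (None).

    XOR of the parity with every surviving packet equals the missing packet.
    """
    missing = [i for i, p in enumerate(group) if p is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one erasure")
    repaired = list(group)
    if missing:
        survivors = [p for p in group if p is not None]
        repaired[missing[0]] = reduce(xor_bytes, survivors, parity)
    return repaired


data = [b"AAAA", b"BBBB", b"CCCC"]  # 3 data packets + 1 parity = 33% overhead
parity = make_parity(data)
print(recover([b"AAAA", None, b"CCCC"], parity) == data)  # True
```

The overhead trade-off is visible in the group size: smaller groups cost more parity bandwidth but leave fewer packets exposed to a double loss.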

26
Quality of Service (QoS)
  • The set of parameters that defines the properties
    of media streams
  • Can define four QoS layers
  • User QoS: perception of the multimedia data at
    the user interface (qualitative)
  • Application QoS: parameters such as end-to-end
    delay (quantitative)
  • System QoS: requirements on the communications
    services derived from the application QoS
  • Network QoS: parameters such as network load and
    performance

27
Audio-Visual Integration
28
Importance of Interaction
  • Multimedia is more than the combination of text,
    audio, video and data
  • Interaction among media is important
  • Consider a poorly dubbed movie
  • Audio not synchronized with video
  • Lip movements inconsistent with language
  • Audio dynamic range inconsistent with the scene

Slide Courtesy, Hung Nguyen
29
Media Interaction
  • Process and Model

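[Figure: media-interaction diagram relating Audio, Text, and Image/Video through Multimedia]
  • Audio: compression, synthesis, 3D sound
  • Audio and video: lip synch, face animation, joint
    A/V coding
  • Audio and text: speech recognition, text-to-speech
  • Image/video and text: sign language, lip reading
  • Image/video: compression, graphics, database
    indexing/retrieval
  • Text: translation, natural language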
Slide Courtesy, Hung Nguyen
30
Bimodality of Human Speech
  • Human speech is produced by vibration of the
    vocal cords and by the configuration of the vocal
    tract, whose muscles also generate facial
    expressions

Audio-visual fusion (McGurk effect):
Audio   Visual   Perceived
ba      ga       da
pa      ga       ta
ma      ga       na
Slide Courtesy, Hung Nguyen
31
Basic Definitions
  • The basic unit of acoustic speech is called a
    phoneme
  • In the visual domain, the basic unit of mouth
    movement is called a viseme
  • A viseme is the smallest visibly distinguishable
    unit of speech
  • One viseme can correspond to several phonemes,
    which together form a viseme group
  • The mapping from phonemes to visemes is thus
    many-to-one
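The many-to-one phoneme-to-viseme mapping can be pictured as a lookup table. The grouping below is a hypothetical, partial example built on familiar articulatory classes (for instance, /p/, /b/ and /m/ all close the lips); it is not a standardized viseme set, and the phoneme symbols are ad hoc.

```python
# Hypothetical, partial viseme groups (not a standardized viseme set).
VISEME_GROUPS = {
    "bilabial": ["p", "b", "m"],     # lips pressed together
    "labiodental": ["f", "v"],       # lower lip against upper teeth
    "rounded": ["w", "uw"],          # protruded, rounded lips
    "open": ["aa", "ae"],            # wide jaw opening
}

# Invert the groups: each phoneme maps to exactly one viseme (many-to-one).
PHONEME_TO_VISEME = {ph: vis for vis, group in VISEME_GROUPS.items() for ph in group}


def to_visemes(phonemes):
    """Map a phoneme sequence to visemes; unlisted phonemes map to 'other'."""
    return [PHONEME_TO_VISEME.get(ph, "other") for ph in phonemes]


print(to_visemes(["b", "aa"]))  # ['bilabial', 'open']
print(to_visemes(["p", "aa"]))  # ['bilabial', 'open']: looks the same as "ba"
```

Because the table cannot be inverted, video alone cannot tell "pa" from "ba"; the audio channel is needed to resolve them.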

Slide Courtesy, Hung Nguyen
32
Lip Reading System
  • Application: supporting hearing-impaired persons
  • People learn to understand spoken language by
    combining visual content with lexical, syntactic,
    semantic, and pragmatic information
  • Automated lip reading systems
  • Speech recognition possible using only visual
    information
  • Integrated with speech recognition systems to
    improve accuracy

Slide Courtesy, Hung Nguyen
33
Lip Synchronization
  • Applications
  • In VTC (video teleconferencing), where video
    frames are dropped (to meet low-bandwidth
    requirements) but audio must remain continuous
  • In non-real-time uses such as studio dubbing,
    where the recorded voice is full of background
    noise
  • Time-warping is commonly used in both audio and
    video modes
  • Time-frequency analysis
  • Video time-warping could be used for VTC
  • Audio time-warping could be used for dubbing
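Dynamic time warping (DTW) is one common way to perform this kind of time-warping: it finds the monotonic alignment between two feature sequences that minimizes the total local distance. The sketch aligns two 1-D feature tracks; treating them as an audio energy envelope and a mouth-opening track is an illustrative assumption.

```python
def dtw(a, b):
    """Align sequences a and b with dynamic time warping.

    Local cost is |x - y|; allowed steps are (1, 0), (0, 1) and (1, 1).
    Returns (total distance, warping path as (i, j) index pairs).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m) along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path


audio_energy = [0, 1, 3, 1, 0]
mouth_opening = [0, 0, 1, 1, 3, 3, 1, 0]  # same gesture, stretched in time
distance, path = dtw(audio_energy, mouth_opening)
print(distance, path)
```

The warping path tells the renderer which video frame to show (or repeat, or drop) for each audio frame; here the two tracks align with zero cost because one is a time-stretched copy of the other.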

Slide Courtesy, Hung Nguyen
34
Lip Tracking
  • Goal: avoid excessive jerkiness in the motion
    rendering and excessive loss of lip
    synchronization
  • Involves real-time analysis of the video signal
    in 3 dimensions plus one temporal dimension
  • Produces meaningful parameters
  • Classification of mouth images into visemes
  • Measures of dimension, e.g., mouth widths and
    heights
  • Analysis tools: Fourier transform,
    Karhunen-Loeve Transform (KLT), Probability
    Density Function (pdf) estimation
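The Karhunen-Loeve transform projects feature vectors onto the eigenvectors of their covariance matrix, so the transformed coefficients are uncorrelated and ordered by variance. For 2-D features such as (mouth width, mouth height) the eigen-decomposition has a closed form, which keeps this Python sketch dependency-free; the feature choice and sample values are illustrative assumptions, and applying the KLT to whole mouth-image patches would need a general eigen-solver.

```python
import math


def klt_2d(points):
    """Karhunen-Loeve transform of 2-D feature vectors.

    Returns ((l1, l2), (e1, e2), coeffs): eigenvalues in descending order,
    the matching unit eigenvectors, and each point's coordinates along them
    after removing the mean.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    # Eigenvalues of the symmetric covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    l1, l2 = half_trace + disc, half_trace - disc
    # Angle of the maximum-variance axis; e2 is orthogonal to e1.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))
    coeffs = [((x - mx) * e1[0] + (y - my) * e1[1],
               (x - mx) * e2[0] + (y - my) * e2[1]) for x, y in points]
    return (l1, l2), (e1, e2), coeffs


# Hypothetical (width, height) measurements that all lie on one line.
(l1, l2), (e1, e2), coeffs = klt_2d([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
print(l1, l2, e1)
```

Since these sample points vary along a single direction, all of the variance lands in the first coefficient (l2 is essentially zero), so one KLT coefficient would be enough to describe them.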

Slide Courtesy, Hung Nguyen
35
Audio-to-Visual Mapping for Lip Tracking
  • Conversion of acoustic speech to mouth shape
    parameters
  • A mapping of phonemes to visemes
  • Could be most precisely implemented with a
    complete speech recognizer followed by a look-up
    table
  • High computational overhead plus table look-up
    complexity
  • It is not necessary to recognize the spoken words
    to achieve audio-to-visual mapping
  • Physical relationships exist between vocal tract
    shape and the sound produced, so functional
    relationships exist between speech and visual
    parameters
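Because the relationship is functional, a direct mapping from acoustic to visual parameters can be estimated without a recognizer. The simplest instance is a least-squares line fitted on paired training frames. This Python sketch assumes a single scalar acoustic feature, a single mouth-shape parameter, and a linear relationship between them; all three are illustrative assumptions, and richer features or nonlinear mappings would normally be needed.

```python
def fit_linear_map(acoustic, visual):
    """Least-squares fit of visual ≈ slope * acoustic + intercept."""
    n = len(acoustic)
    mean_a = sum(acoustic) / n
    mean_v = sum(visual) / n
    cov = sum((a - mean_a) * (v - mean_v) for a, v in zip(acoustic, visual))
    var = sum((a - mean_a) ** 2 for a in acoustic)
    if var == 0:
        raise ValueError("acoustic feature must vary across training frames")
    slope = cov / var
    return slope, mean_v - slope * mean_a


def predict(model, acoustic_value):
    """Continuous mouth-shape estimate for a new acoustic frame."""
    slope, intercept = model
    return slope * acoustic_value + intercept


# Hypothetical training pairs: acoustic feature -> mouth opening.
model = fit_linear_map([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(model)                # (2.0, 1.0)
print(predict(model, 2.5))  # 6.0
```

Unlike a recognizer followed by a table lookup, the output varies continuously with the input, and no word-level decision is made.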

Slide Courtesy, Hung Nguyen
36
Classification-Based Conversion Approaches for
Lip Tracking
  • Two-step process
  • Classification of the acoustic signal using VQ
    (vector quantization), an HMM (hidden Markov
    model), or an NN (neural network)
  • Mapping of each acoustic class to its
    corresponding visual outputs, which are averaged
    to obtain a visual centroid
  • Shortcomings
  • Error resulting from averaging the visual vectors
    to obtain the visual centroid
  • Not a continuous mapping: only a finite number of
    output levels
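The two-step classification approach, with VQ as the classifier, can be sketched as follows. The acoustic codebook is assumed to be already trained (for example by k-means clustering), and all feature values are hypothetical. The example also exposes both shortcomings: each class outputs the average of its training visuals, and there are only as many possible outputs as codewords.

```python
def nearest(codebook, x):
    """Index of the codeword closest to x (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - xi) ** 2 for c, xi in zip(codebook[k], x)))


def train_visual_centroids(codebook, pairs):
    """Average the visual vectors of the training frames in each acoustic class."""
    dim = len(pairs[0][1])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for acoustic, visual in pairs:
        k = nearest(codebook, acoustic)          # step 1: acoustic classification
        counts[k] += 1
        sums[k] = [s + v for s, v in zip(sums[k], visual)]
    # Step 2: one visual centroid per class (None if a class saw no data).
    return [[s / n for s in total] if n else None
            for total, n in zip(sums, counts)]


def convert(codebook, centroids, acoustic):
    """Map a new acoustic frame to the visual centroid of its class."""
    return centroids[nearest(codebook, acoustic)]


codebook = [[0.0, 0.0], [1.0, 1.0]]                   # hypothetical acoustic codebook
pairs = [([0.1, 0.0], [1.0]), ([0.0, 0.2], [3.0]),    # (acoustic, mouth opening)
         ([0.9, 1.0], [10.0]), ([1.1, 1.0], [12.0])]
centroids = train_visual_centroids(codebook, pairs)
print(centroids)                                      # [[2.0], [11.0]]
print(convert(codebook, centroids, [0.05, 0.05]))     # [2.0]
```

The output 2.0 matches neither training frame in its class (1.0 and 3.0), which is the averaging error, and no input can produce any value other than 2.0 or 11.0.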

Slide Courtesy, Hung Nguyen
37
Classification-Based Conversion
Slide Courtesy, Hung Nguyen
38
Audio and Visual Integration for Lip Reading
Applications
  • Three major steps
  • Audio-visual pre-processing: Principal Component
    Analysis (PCA) has been used for feature
    extraction
  • Pattern recognition strategy (HMM, NN,
    time-warping)
  • Integration strategy (decision making)
  • Heuristic rules that incorporate phonetic
    knowledge about the two modalities
  • Combination of independent evaluation scores for
    each modality
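One simple way to combine independent evaluation scores is a weighted sum of per-class log-likelihoods from the audio and visual recognizers. The parameter `audio_weight` is an assumption of this sketch (it might, for example, be lowered when the acoustic signal is noisy), and the class labels and scores are hypothetical.

```python
def fuse(audio_scores, visual_scores, audio_weight):
    """Late integration of two independent recognizers.

    audio_scores, visual_scores: dicts mapping class label -> log-likelihood.
    audio_weight in [0, 1]; the visual scores get weight 1 - audio_weight.
    Returns the winning label and the fused scores.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    labels = audio_scores.keys() & visual_scores.keys()
    fused = {c: audio_weight * audio_scores[c] + (1.0 - audio_weight) * visual_scores[c]
             for c in labels}
    return max(fused, key=fused.get), fused


audio = {"ba": -2.0, "da": -1.8}    # noisy audio slightly prefers "da"
visual = {"ba": -0.5, "da": -3.0}   # lips clearly closed, as for "ba"
print(fuse(audio, visual, 1.0)[0])  # da  (audio only)
print(fuse(audio, visual, 0.5)[0])  # ba  (audio + video)
```

Keeping the two recognizers independent means each can be trained separately; only the weight couples them at decision time.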

Slide Courtesy, Hung Nguyen
39
Application in Biometrics Bimodal Person
Verification
  • Existing methods for person verification are
    mainly based on a single modality, which limits
    their security and robustness
  • Audio-visual integration using a camera and a
    microphone makes person verification more
    reliable

Slide Courtesy, Hung Nguyen
40
Joint Audio-Video Coding
  • Correlation between audio and video can be used
    to achieve more efficient coding
  • Predictive coding of audio and video information
    is used to construct an estimate of the current
    frame (exploiting cross-modal redundancy)
  • The difference between the original and estimated
    signals can be transmitted as parameters
  • The decision on what to send, and how, is based on
    Rate-Distortion (R-D) criteria
  • Reconstruction done at receiver according to
    agreed-upon decoding rules
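The R-D decision can be sketched with a Lagrangian cost J = D + λR: for each visual parameter the encoder compares sending nothing (the receiver then relies on its own audio-to-visual estimate) with sending a quantized prediction residual. The quantizer step, the fixed 8-bit rate per residual, and λ (`lam`) are illustrative assumptions, not values from the course text.

```python
def choose_mode(x, x_hat, lam, step=0.5, residual_bits=8):
    """Choose what to transmit for one visual parameter x.

    x_hat is the estimate the receiver can form from the audio alone.
    'nothing':  rate 0, distortion (x - x_hat)^2
    'residual': rate residual_bits, distortion = squared quantization error
    Returns (mode, value to send or None), minimizing J = D + lam * R.
    """
    residual = x - x_hat
    q = step * round(residual / step)  # uniform quantizer
    options = {
        "nothing": (residual ** 2, 0),
        "residual": ((residual - q) ** 2, residual_bits),
    }
    costs = {mode: d + lam * r for mode, (d, r) in options.items()}
    best = min(costs, key=costs.get)
    return best, (q if best == "residual" else None)


print(choose_mode(3.0, 2.9, lam=0.01))  # ('nothing', None): estimate already good
print(choose_mode(3.0, 1.0, lam=0.01))  # ('residual', 2.0): estimate poor
print(choose_mode(3.0, 1.0, lam=1.0))   # ('nothing', None): bits too expensive
```

Raising λ makes bits more expensive, so the encoder falls back on the receiver's audio-derived estimate more often; the receiver reconstructs either x_hat or x_hat plus the received residual.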

Slide Courtesy, Hung Nguyen
41
Cross-Modal Predictive Coding
[Figure: block diagram with Visual Analysis producing Parameter X, an A-to-V Mapping, and a Decision Module (R-D) whose output is either Nothing or Parameter X]
Slide Courtesy, Hung Nguyen
42
Applications of Multimedia
  • Business - Business applications for multimedia
    include presentations, training, marketing,
    advertising, product demos, databases,
    catalogues, instant messaging, and networked
    communication.
  • Schools - Educational software can be developed
    to enrich the learning process.

Slide Courtesy, Hung Nguyen
43
Applications of Multimedia
  • Home - Most multimedia projects reach the homes
    via television sets or monitors with built-in
    user inputs.
  • Public places - Multimedia will become available
    at stand-alone terminals or kiosks to provide
    information and help.

Slide Courtesy, Hung Nguyen
44
Compact Disc Read-Only (CD-ROM)
  • CD-ROM is the most cost-effective distribution
    medium for multimedia projects.
  • It can contain up to 80 minutes of full-screen
    video or sound.
  • CD burners can read discs and can write (burn)
    audio, video, and data formats to recordable
    discs.
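The 80-minute figure follows from the CD-DA audio parameters (44.1 kHz sampling, 16-bit samples, two channels) and the disc's rate of 75 sectors per second. The arithmetic below reproduces it; it covers uncompressed audio only, so fitting video of that length on a disc depends on compression.

```python
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2
SECTORS_PER_SECOND = 75         # CD sector rate at normal (1x) speed
AUDIO_BYTES_PER_SECTOR = 2_352  # audio bytes carried by each CD-DA sector
DATA_BYTES_PER_SECTOR = 2_048   # user data per CD-ROM Mode 1 sector


def cd_figures(minutes):
    """Storage figures for a disc of the given playing time."""
    seconds = minutes * 60
    bit_rate = SAMPLE_RATE_HZ * BITS_PER_SAMPLE * CHANNELS  # bits per second
    sectors = seconds * SECTORS_PER_SECOND
    return {
        "audio_bit_rate": bit_rate,
        "audio_bytes": bit_rate * seconds // 8,
        "sectors": sectors,
        "audio_bytes_from_sectors": sectors * AUDIO_BYTES_PER_SECTOR,
        "cd_rom_bytes": sectors * DATA_BYTES_PER_SECTOR,
    }


for name, value in cd_figures(80).items():
    print(f"{name}: {value:,}")
```

The two audio byte counts agree (846,720,000 bytes), and the 737,280,000 bytes (about 703 MiB) of Mode 1 data is the familiar "700 MB" CD-R capacity.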

Slide Courtesy, Hung Nguyen
45
Digital Versatile Disc (DVD)
  • Multilayered DVD technology increases the
    capacity of current optical technology to about
    17 GB (double-sided, dual-layer DVD-18).
  • DVD authoring and integration software is used to
    create interactive front-end menus for films and
    games.
  • DVD burners can read discs and can write (burn)
    audio, video, and data formats to recordable
    discs.

Slide Courtesy, Hung Nguyen
46
Multimedia Communications
  • Multimedia communications is the delivery of
    multimedia to the user by electronic or digitally
    manipulated means.

Slide Courtesy, Hung Nguyen