1
CS598kn Basic Concepts of Audio, Video and
Compression
  • Klara Nahrstedt
  • 01/21/2005

2
Content
  • Introduction on Multimedia
  • Audio encoding
  • Video encoding
  • Compression

3
Video on Demand
  • Video On Demand (a) ADSL vs. (b) cable

4
Multimedia Files
  • A movie may consist of several files

5
Multimedia Issues
  • Analog to digital
  • Problem: the digitized result must be acceptable to the ears or eyes
  • Jitter
  • Require high data rate
  • Large storage
  • Compression
  • Require real-time playback
  • Scheduling
  • Quality of service
  • Resource reservation

6
Audio
  • Sound is a continuous wave that travels through
    the air.
  • The wave is made up of pressure differences.

7
How do we hear sound?
  • Sound is detected by measuring the pressure level
    at a point
  • When an acoustic signal reaches the outer ear
    (pinna), the generated wave is transformed
    into energy and filtered through the middle ear.
    The inner ear (cochlea) transforms the energy
    into nerve activity.
  • In a similar way, when an acoustic wave strikes a
    microphone, the microphone generates an
    electrical signal representing the sound
    amplitude as a function of time.

8
Basic Sound Concepts
  • Frequency: the number of periods per second
    (measured in hertz, cycles/second)
  • Human hearing frequency range: 20 Hz - 20 kHz
    (audio); voice is about 500 Hz to 2 kHz.
  • Amplitude of a sound is the measure of
    displacement of the air pressure wave from its
    mean.

9
Computer Representation of Audio
  • Speech is analog in nature and it is converted to
    digital form by an analog-to-digital converter
    (ADC).
  • A transducer converts pressure to voltage levels.
  • Convert analog signal into a digital stream by
    discrete sampling
  • Discretization both in time and amplitude
    (quantization)

10
Audio Encoding (1)
  • Audio Waves Converted to Digital
  • electrical voltage input
  • sample voltage levels at intervals to get a
    vector of values (0, 0.2, 0.5, 1.1, 1.5, 2.3,
    2.5, 3.1, 3.0, 2.4,...)
  • A computer measures the amplitude of the waveform
    at regular time intervals to produce a series of
    numbers (samples).
  • The ADC process is governed by four factors:
    sampling rate, quantization, linearity, and
    conversion speed; the output is a binary number
    (see the sketch below)
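
A minimal sketch of the sampling step described above, assuming a 440 Hz sine
wave stands in for the microphone voltage and an 8000 Hz sampling rate; the
names and constants are illustrative, not part of the slides.

import math

SAMPLE_RATE = 8000          # samples per second (telephone quality)
DURATION = 0.01             # seconds of signal to capture

def analog_signal(t):
    # Hypothetical continuous "voltage" as a function of time.
    return math.sin(2 * math.pi * 440 * t)

# Measure the waveform at regular time intervals to get a vector of samples.
samples = [analog_signal(n / SAMPLE_RATE)
           for n in range(int(DURATION * SAMPLE_RATE))]
print(samples[:8])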

11
Audio Encoding (2)
  • Sampling rate: the rate at which a continuous wave
    is sampled (measured in Hertz)
  • Examples: CD standard - 44100 Hz, telephone
    quality - 8000 Hz
  • The audio industry uses 5.5125 kHz, 11.025 kHz,
    22.05 kHz, and 44.1 kHz as the standard sampling
    frequencies. These frequencies are supported by
    most sound cards.
  • Question: how often do you need to sample a
    signal to avoid losing information?

12
Audio Encoding (3)
  • Answer: it depends on how fast the signal is
    changing. Real answer: twice per cycle of the
    highest frequency present (this follows from the
    Nyquist sampling theorem)
  • Nyquist Sampling Theorem: if a signal f(t) is
    sampled at regular intervals of time and at a
    rate higher than twice the highest significant
    signal frequency, then the samples contain all
    the information of the original signal.
  • Example: CD audio carries frequencies up to about
    22050 Hz; by the Nyquist theorem we must sample
    at least twice that fast, so the CD sampling
    frequency is 44100 Hz.

13
Audio Encoding (4)
  • The best-known technique for voice digitization
    is Pulse-Code Modulation (PCM).
  • PCM is based on the sampling theorem.
  • If voice data are band-limited to 4000 Hz, then PCM
    takes 8000 samples/second, which is sufficient
    to represent the input voice signal.
  • PCM provides analog samples which must be
    converted to digital representation. Each of
    these analog samples must be assigned a binary
    code. Each sample is approximated by being
    quantized as explained above.

14
Audio Encoding (5)
  • Quantization (sample precision): the resolution
    of a sample value.
  • Samples are typically stored as raw numbers
    (linear PCM format) or as logarithms (u-law or
    A-law)
  • Quantization depends on the number of bits used
    to measure the height of the waveform
  • Example: 16-bit CD-quality quantization results
    in 65536 possible values
  • Audio formats are described by the sample rate
    and quantization
  • Voice quality: 8-bit quantization, 8000 Hz, u-law
    mono (8 kBytes/s)
  • 22 kHz: 8-bit linear mono (22 kBytes/s) and
    stereo (44 kBytes/s)
  • CD quality: 16-bit quantization, 44100 Hz, linear
    stereo (176.4 kBytes/s = 44100 samples/s x 16
    bits/sample x 2 channels / 8 bits per byte; the
    arithmetic is checked in the sketch below)
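
A quick check of the data rates quoted above, assuming the usual formula
sample rate x bits per sample x channels / 8 for bytes per second (the helper
name is illustrative):

def audio_rate_bytes_per_sec(sample_rate_hz, bits_per_sample, channels):
    # bytes/s = samples/s * bits/sample * channels / 8 bits per byte
    return sample_rate_hz * bits_per_sample * channels / 8

print(audio_rate_bytes_per_sec(8000, 8, 1))     # voice quality:   8000.0 B/s
print(audio_rate_bytes_per_sec(22000, 8, 2))    # 22 kHz stereo:  44000.0 B/s
print(audio_rate_bytes_per_sec(44100, 16, 2))   # CD quality:    176400.0 B/s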

15
Audio Formats
  • Audio formats are characterized by four
    parameters:
  • Sample rate: sampling frequency (samples per
    second)
  • Encoding: audio data representation
  • u-law: CCITT G.711 standard for voice data in
    telephone companies (USA, Canada, Japan)
  • A-law: CCITT G.711 standard for voice data in
    telephony elsewhere (e.g., Europe)
  • A-law and u-law are sampled at 8000
    samples/second with a precision of 12 bits and
    compressed to 8-bit samples (a companding sketch
    follows this list)
  • Linear PCM: uncompressed audio where samples are
    proportional to the audio signal voltage
  • Precision: number of bits used to store each
    audio sample.
  • Channels: multiple channels of audio may be
    interleaved at sample boundaries
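
A minimal sketch of u-law companding, assuming the textbook curve with
mu = 255 and samples normalized to [-1, 1]; it illustrates the logarithmic
storage mentioned above and is not a bit-exact G.711 encoder.

import math

MU = 255  # companding constant used in u-law voice coding

def mu_law_compress(x):
    # Map a linear sample in [-1.0, 1.0] through the logarithmic u-law curve.
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def to_8bit(x):
    # Round the companded value down to an 8-bit signed range.
    return int(round(mu_law_compress(x) * 127))

print([to_8bit(v) for v in (-1.0, -0.1, 0.0, 0.1, 1.0)])
# [-127, -75, 0, 75, 127]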

16
Basic Concepts of Video
  • Visual Representation (Video Encoding)
  • Objective is to offer the viewer a sense of
    presence in the scene and of participation in the
    events portrayed
  • Transmission
  • Video signals are transmitted to a receiver
    through a single television channel
  • Digitization
  • Analog-to-digital conversion: sampling of
    gray/color levels followed by quantization

17
Visual Representation (1)
  • Video signals are generated at the output of a
    camera by scanning a two-dimensional moving scene
    and converting it into a one-dimensional
    electric signal
  • A moving scene is a collection of individual
    images, where each scanned picture generates a
    frame of the picture
  • Scanning starts at the top-left corner of the
    picture and ends at the bottom-right.
  • Aspect ratio: ratio of picture width to height
  • Pixel: discrete picture element, a digitized
    light point in a frame
  • Vertical frame resolution: number of pixels in
    the picture height
  • Horizontal frame resolution: number of pixels in
    the picture width
  • Spatial resolution: vertical x horizontal
    resolution
  • Temporal resolution: rapid succession of
    different frames (frame rate)

18
Visual Representation (2)
  • Continuity of Motion
  • Minimum: 15 frames per second
  • NTSC: 29.97 Hz repetition rate, about 30 frames/sec
  • PAL: 25 Hz, 25 frames/sec
  • HDTV: 59.94 Hz, 60 frames/sec
  • Flicker Effect
  • A periodic fluctuation of brightness
    perception. To avoid this effect, we need at
    least 50 refresh cycles per second. Display
    devices achieve this using a display refresh
    buffer
  • Picture Scanning
  • Progressive scanning: a single scan of the whole
    picture
  • Interlaced scanning: the frame is formed by
    scanning two pictures at different times, with
    the lines interleaved, so that two consecutive
    lines of a frame belong to alternate fields
    (odd and even lines are scanned separately)
  • NTSC TV uses interlaced scanning to trade off
    vertical resolution against temporal resolution
  • HDTV and computer displays have high spatial and
    temporal resolution and use progressive
    scanning.

19
Video Color Encoding (3)
  • During scanning, a camera creates three
    signals: R, G, B (red, green, and blue).
  • For compatibility with black-and-white video, and
    because the three color signals are highly
    correlated, a new set of signals in a different
    color space is generated.
  • The new color systems correspond to standards
    such as NTSC, PAL, and SECAM.
  • For transmission of the visual signal we use
    three signals: one luminance (brightness, the
    basic signal) and two chrominance (color) signals.
  • YUV signal: Y = 0.30R + 0.59G + 0.11B,
    U = (B - Y) x 0.493, V = (R - Y) x 0.877
    (see the sketch below)
  • Coding ratio between the components Y:U:V is 4:2:2
  • In the NTSC signal the luminance and chrominance
    signals are interleaved
  • The goal at the receiver is to (1) separate
    luminance from chrominance components, and (2)
    avoid interference between them (cross-color,
    cross-luminance)
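
The YUV formulas above, written out as a small sketch; R, G, B are assumed to
be normalized to [0, 1].

def rgb_to_yuv(r, g, b):
    # Luminance plus two color-difference (chrominance) components.
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = (b - y) * 0.493
    v = (r - y) * 0.877
    return y, u, v

print(rgb_to_yuv(1.0, 0.0, 0.0))   # pure red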

20
Basic Concepts of Image Formats
  • Important Parameters for Captured Image Formats
  • Spatial Resolution (pixels x pixels)
  • Color encoding (quantization level of a pixel,
    e.g., 8-bit, 24-bit)
  • Example: the SunVideo video digitizer board allows
    pictures of 320 by 240 pixels with 8-bit
    gray-scale or color resolution.
  • For a practical demonstration of basic image
    concepts, try the program xv, which displays
    images and allows one to show, edit and manipulate
    image characteristics.
  • Important Parameters for Stored Image Formats
  • Images are stored as a 2D array of values where
    each value represents the data associated with a
    pixel in the image (a bitmap or a color image).
  • Stored images can use flexible formats such
    as RIFF (Resource Interchange File Format).
    RIFF includes formats such as bitmaps,
    vector representations, animations, audio and
    video.
  • Currently, the most-used image storage formats are
    GIF (Graphics Interchange Format), XBM (X11
    Bitmap), PostScript, JPEG (see the compression
    chapter), TIFF (Tagged Image File Format), PBM
    (Portable Bitmap), and BMP (Bitmap).

21
Digital Video
  • The process of digitizing analog video involves
  • Filtering, sampling, quantization
  • Filtering is employed to avoid aliasing
    artifacts in the subsequent sampling process
  • Filtered luminance and chrominance signals are
    sampled to generate a discrete-time signal
  • Digitization means sampling the gray/color levels
    in the frame at an M x N array of points
  • The minimum rate at which each component (Y, U, V)
    can be sampled is the Nyquist rate, which
    corresponds to twice the signal bandwidth
  • Once the points are sampled, they are quantized
    into pixels, i.e., each sampled value is mapped
    to an integer. The quantization level depends on
    how many bits we allocate to represent the
    resulting integer (e.g., 8 bits per pixel, or 24
    bits per pixel)

22
Digital Transmission Bandwidth
  • Bandwidth requirements for Images
  • Raw image transmission bandwidth = size of the
    image = spatial resolution x pixel resolution
  • Compressed image transmission bandwidth:
    depends on the compression scheme (e.g., JPEG) and
    the content of the image
  • Symbolic image transmission bandwidth = size of
    the instructions and variables carrying graphics
    primitives and attributes
  • Bandwidth Requirements for Video
  • Uncompressed video bandwidth = image size x frame
    rate
  • Compressed video bandwidth: depends on the
    compression scheme (e.g., Motion JPEG, MPEG) and
    the content of the video (scene changes).
  • Example: assume the following video
    characteristics - 720,000 pixels per
    image (frame), 8 bits per pixel quantization, and
    a frame rate of 60 frames per second. The video
    bandwidth = 720,000 pixels per frame x 8 bits per
    pixel x 60 fps,
  • which results in an HDTV data rate of 43,200,000
    bytes per second = 345.6 Mbps (checked in the
    sketch below). When we use MPEG compression, the
    bandwidth goes down to 34 Mbps with some loss in
    image/video quality.
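
Reproducing the uncompressed-video arithmetic above (a small sketch; the
variable names are illustrative):

pixels_per_frame = 720_000
bits_per_pixel = 8
frames_per_sec = 60

bits_per_sec = pixels_per_frame * bits_per_pixel * frames_per_sec
print(bits_per_sec / 8)            # 43,200,000 bytes per second
print(bits_per_sec / 1_000_000)    # 345.6 Mbps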

23
Compression Classification
  • Compression is important due to limited bandwidth
  • All compression systems require two algorithms
  • Encoding at the source
  • Decoding at the destination
  • Entropy Coding
  • Lossless encoding
  • Used regardless of the medium's specific
    characteristics
  • Data are taken as a simple digital sequence
  • The decompression process regenerates the data
    completely
  • Examples: run-length coding, Huffman coding,
    arithmetic coding
  • Source Coding
  • Lossy encoding
  • Takes into account the semantics of the data
  • Degree of compression depends on the data content
  • Examples: DPCM, delta modulation
  • Hybrid Coding
  • Combines entropy coding with source coding
  • Examples: JPEG, MPEG, H.263

24
Compression (1)
Compression pipeline (figure): Uncompressed Picture -> Picture Preparation ->
Picture Processing -> Quantization -> Entropy Encoding, with adaptive
feedback between the processing and quantization stages.
25
Compression (2)
  • Picture Preparation
  • Analog-to-digital conversion
  • Generation of appropriate digital representation
  • Image division into 8x8 blocks
  • Fix the number of bits per pixel
  • Picture Processing (Compression Algorithm)
  • Transformation from the time to the frequency
    domain (e.g., the Discrete Cosine Transform, DCT)
  • Motion vector computation for motion video
  • Quantization
  • Mapping real numbers to integers (reduction in
    precision)
  • Entropy Coding
  • Compress a sequential digital stream without loss

26
Compression (3)(Entropy Encoding)
  • A simple lossless compression algorithm is
    run-length coding, where runs of repeated bytes
    are grouped together as Number-of-Occurrences,
    Special-Character, Compressed-Byte. For example,
    'AAAAAABBBBBDDDDDAAAAAAAA' can be encoded as
    '6!A5!B5!D8!A', where '!' is the special
    character. The compression ratio is 50%
    (12/24 x 100); see the sketch below.
  • Fixed-length coding: each symbol gets allocated
    the same number of bits independent of frequency
    (L = log2(N), where N is the number of symbols)
  • Statistical encoding: each symbol has a
    probability of occurrence (e.g., P(A) = 0.16,
    P(B) = 0.51, P(C) = 0.33)
  • The theoretical minimum average number of bits
    per codeword is known as the entropy (H).
    According to Shannon:
  • H = -Σi Pi log2 Pi bits per codeword
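
A minimal sketch of the run-length scheme and of Shannon's entropy formula
above (the helper names are illustrative):

from itertools import groupby
import math

def rle_encode(data, marker="!"):
    # Group consecutive identical bytes as count, marker, symbol.
    return "".join(f"{len(list(g))}{marker}{ch}" for ch, g in groupby(data))

print(rle_encode("AAAAAABBBBBDDDDDAAAAAAAA"))   # 6!A5!B5!D8!A

def entropy(probs):
    # Shannon entropy: minimum average number of bits per codeword.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.16, 0.51, 0.33]))   # about 1.45 bits per codeword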

27
Huffman Coding
Example Huffman tree (figure): P(A) = 0.16 and P(C) = 0.33 are combined
first into P(AC) = 0.49, which is then combined with P(B) = 0.51 into the
root P(ACB) = 1; labeling the branches 0 and 1 gives the codes
A = 00, C = 01, B = 1 (a small code-builder sketch follows).
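
A compact Huffman-code builder, sketched to reproduce the example above;
the slide's probabilities are assumed, and the exact 0/1 labeling can differ
with tie-breaking.

import heapq

def huffman_codes(probs):
    # Each heap entry: (probability, tie-breaker, {symbol: partial code}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

print(huffman_codes({"A": 0.16, "C": 0.33, "B": 0.51}))
# {'A': '00', 'C': '01', 'B': '1'}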
28
JPEG Joint Photographic Experts Group
  • 6 major steps to compress an image: (1) block
    preparation, (2) DCT (Discrete Cosine Transform)
    transformation, (3) quantization, (4) further
    compression via differential coding, (5)
    zig-zag scanning and run-length coding, and
    (6) Huffman coding
  • The quantization step is the lossy step, where
    we lose data in a non-invertible fashion.
  • Differential compression means that we consider
    similar blocks in the image and encode only the
    first block; for the rest of the similar
    blocks, we encode only the differences between the
    previous block and the current block. The hope is
    that the difference is a much smaller value,
    hence we need fewer bits to represent it. Also,
    the differences often end up close to 0 and can
    be compressed very well by the next compression
    step, run-length coding.
  • Huffman compression is a lossless statistical
    encoding algorithm which takes into account
    frequency of occurrence (not every byte has the
    same weight)

29
JPEG Block Preparation
  • RGB input data and block preparation
  • The eye responds to luminance (Y) more than to
    chrominance (I and Q)

30
Image Processing
  • After image preparation we have
  • Uncompressed image samples grouped into data
    units of 8x8 pixels
  • Precision: 8 bits/pixel
  • Values are in the range [0, 255]
  • Steps in image processing
  • Pixel values are shifted into the range
    [-128, 127] with center 0
  • DCT maps values from the time (spatial) domain
    to the frequency domain:
  • S(u,v) = 1/4 C(u) C(v) Σx Σy s(x,y)
    cos((2x+1)uπ/16) cos((2y+1)vπ/16), where
    C(0) = 1/√2 and C(k) = 1 for k > 0
    (see the sketch below)
  • S(0,0): lowest frequency in both directions (DC
    coefficient); determines the fundamental color of
    the block
  • S(0,1), ..., S(7,7): AC coefficients
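
A direct (slow) implementation of the 8x8 forward DCT formula above, as a
sketch; the input block is assumed to be an 8x8 list of level-shifted samples
in [-128, 127].

import math

def dct_8x8(block):
    # C(0) = 1/sqrt(2), C(k) = 1 otherwise, as in the formula above.
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = sum(block[x][y]
                        * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                        for x in range(8) for y in range(8))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

coeffs = dct_8x8([[100 - 128] * 8 for _ in range(8)])
print(round(coeffs[0][0], 1))   # DC coefficient of a flat block: -224.0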

31
Quantization
  • The goal of quantization is to throw out bits
  • Consider the example 101101 (base 2) = 45 (6 bits):
    we can truncate this string to 4 bits, 1011 = 11,
    or to 3 bits, 101 = 5 (representing the original
    value 40) or 110 = 6 (representing the original
    value 48)
  • Uniform quantization is achieved by dividing the
    DCT coefficient value S(u,v) by N and rounding the
    result (see the sketch below).
  • JPEG uses quantization tables
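
Uniform quantization as described above, as a small sketch (the example
coefficients are illustrative):

def quantize(coeffs, step):
    # Divide each DCT coefficient by the step size N and round.
    return [[round(c / step) for c in row] for row in coeffs]

def dequantize(q, step):
    # Reconstruction: the precision lost in rounding cannot be recovered.
    return [[c * step for c in row] for row in q]

row = [[-224.0, 31.0, -6.2, 2.4, 0.9, -0.3, 0.1, 0.0]]
q = quantize(row, 16)
print(q)                    # [[-14, 2, 0, 0, 0, 0, 0, 0]]
print(dequantize(q, 16))    # [[-224, 32, 0, 0, 0, 0, 0, 0]]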

32
Entropy Encoding
  • After image processing we have quantized DC and
    AC coefficients
  • The initial step of entropy encoding is to map the
    8x8 plane into a 64-element vector using a
    zig-zag scan (see the sketch below)
  • DC coefficient processing: use difference coding
  • AC coefficient processing: apply run-length coding
  • Apply Huffman coding to the DC and AC coefficients
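
One way to build the zig-zag order that maps an 8x8 block to a 64-element
vector, as a sketch: low frequencies come first, so runs of zero AC
coefficients group together for run-length coding.

def zigzag_order(n=8):
    # Walk the anti-diagonals, alternating direction, to order (row, col) pairs.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else reversed(diag))
    return order

print(zigzag_order()[:10])
# [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2), (0, 3), (1, 2), (2, 1), (3, 0)]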

33
MPEG Motion Picture Experts Group
  • MPEG-1 was designed for video-recorder-quality
    output (320x240 for NTSC) at a bit rate of
    1.2 Mbps.
  • MPEG-2 is for broadcast-quality video at
    4-6 Mbps (it fits into the NTSC or PAL broadcast
    channel)
  • MPEG takes advantage of temporal and spatial
    redundancy. Temporal redundancy means that two
    neighboring frames are similar, almost identical.
  • MPEG-2 output consists of three different kinds
    of frames that have to be processed
  • I (Intracoded) frames - self-contained
    JPEG-encoded still pictures
  • P (Predictive) frames - Block-by-block difference
    with the last frame
  • B (Bidirectional) frames - Differences with the
    last and next frames

34
The MPEG Standard
  • I frames - self-contained, hence they are used
    for fast forward and rewind operations in VOD
    applications
  • P frames code interframe differences. The
    algorithm searches for similar macroblocks in the
    current and previous frame, and if they are only
    slightly different, it encodes only the
    difference plus a motion vector used to find
    the position of the macroblock during decoding.
  • B frames - encoded when three frames are available
    at once: the past one, the current one and the
    future one. Similar to P frames, the algorithm
    takes a macroblock in the current frame and looks
    for similar macroblocks in the past and future
    frames.
  • MPEG is suitable for stored video because it is
    an asymmetric lossy compression scheme. The
    encoding takes a long time, but the decoding is
    very fast.
  • The frames are delivered to the receiver in
    dependency order rather than display order, hence
    we need buffering to reorder the frames.

35
MPEG/Video I-Frames
  • I frames (intra-coded images)
  • MPEG uses JPEG compression algorithm for I-frame
    encoding
  • I-frames use 8x8 blocks defined within a
    macro-block. On these blocks, the DCT is performed.
    Quantization is done with a constant value for all
    DCT coefficients, i.e., no quantization tables
    are used as is the case in JPEG

36
MPEG/Video P-Frames
  • P-frames (predictive coded frames) require the
    previous I-frame and/or previous P-frame for
    encoding and decoding
  • Use a motion estimation method at the encoder
  • Define a match window within a given search
    window. The match window corresponds to a
    macro-block; the search window is an arbitrary
    window size depending on how far away we are
    willing to look.
  • Matching methods (see the sketch below)
  • SSD correlation uses SSD = Σi (xi - yi)²
  • SAD correlation uses SAD = Σi |xi - yi|
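
The two block-matching cost functions above, sketched for flat lists of pixel
values (x is the macro-block, y a candidate block from the search window; the
data are illustrative):

def ssd(x, y):
    # Sum of squared differences between corresponding pixels.
    return sum((a - b) ** 2 for a, b in zip(x, y))

def sad(x, y):
    # Sum of absolute differences between corresponding pixels.
    return sum(abs(a - b) for a, b in zip(x, y))

block     = [10, 12, 11, 9]
candidate = [11, 12, 10, 9]
print(ssd(block, candidate), sad(block, candidate))   # 2 2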

37
MPEG/Video B-Frame
  • B-frames (bi-directionally predictive-coded
    frames) require information from the previous and
    following I- and/or P-frames


(Figure: the current macro-block minus ½ x (past reference block + future
reference block) is passed through DCT, quantization, run-length encoding,
and Huffman coding; two motion vectors are transmitted with the result.)
38
MPEG/Audio Encoding
  • Precision is 16 bits
  • Sampling frequencies are 32 kHz, 44.1 kHz, 48 kHz
  • 3 compression methods exist: Layer 1, Layer 2,
    Layer 3 (MP3)

Layer 1: 32 kbps - 448 kbps, target 192 kbps
Layer 2: 32 kbps - 384 kbps, target 128 kbps; the decoder also accepts Layer 1
Layer 3: 32 kbps - 320 kbps, target 64 kbps; the decoder also accepts
Layer 2 and Layer 1
39
MPEG/System Data Stream
  • Video is interleaved with audio.
  • Audio consists of three layers
  • Video consists of 6 layers:
  • (1) sequence layer (Video Param, Bitstream
    Param, ...)
  • (2) group of pictures layer (Time Code, GOP
    Param, ...)
  • (3) picture layer (Type, Buffer Param, Encode
    Param, ...)
  • (4) slice layer (Qscale, ...)
  • (5) macro-block layer (Type, Motion Vector,
    Qscale, ...)
  • (6) block layer