Basic Concepts of Audio, Video and Compression - PowerPoint PPT Presentation



Slides: 40
Provided by: Yuan76

Basic Concepts of Audio, Video and Compression
  • Anupriya Sharma

  • Introduction on Multimedia
  • Audio encoding
  • Video encoding
  • Compression

Video on Demand
  • Video On Demand (a) ADSL vs. (b) cable

Multimedia Files
  • A movie may consist of several files

Multimedia Issues
  • Analog to digital
  • Problem: the digitized signal needs to be acceptable to the ears or eyes
  • Jitter
  • Require high data rate
  • Large storage
  • Compression
  • Require real-time playback
  • Scheduling
  • Quality of service
  • Resource reservation

  • Sound is a continuous wave that travels through
    the air.
  • The wave is made up of pressure differences.

How do we hear sound?
  • Sound is detected by measuring the pressure level
    at a point
  • When an acoustic signal reaches the outer ear
    (pinna), the generated wave is transformed
    into energy and filtered through the middle ear.
    The inner ear (cochlea) transforms the energy
    into nerve activity.
  • In a similar way, when an acoustic wave strikes a
    microphone, the microphone generates an
    electrical signal that represents the sound
    amplitude as a function of time.

Basic Sound Concepts
  • Frequency is the number of periods per
    second (measured in hertz, i.e., cycles/second)
  • Human hearing covers roughly 20 Hz - 20 kHz
    (audio); voice is about 500 Hz to 2 kHz.
  • Amplitude of a sound is the measure of
    displacement of the air pressure wave from its
    mean (quiescent) state.

Computer Representation of Audio
  • Speech is analog in nature and it is converted to
    digital form by an analog-to-digital converter
  • A transducer converts pressure to voltage levels.
  • Convert analog signal into a digital stream by
    discrete sampling
  • Discretization both in time and amplitude

Audio Encoding (1)
  • Audio waves are converted to digital form
  • Input: electrical voltage; output: binary numbers
  • Sample voltage levels at intervals to get a
    vector of values (0, 0.2, 0.5, 1.1, 1.5, 2.3,
    2.5, 3.1, 3.0, 2.4, ...)
  • A computer measures the amplitude of the waveform
    at regular time intervals to produce a series of
    numbers (samples).
  • The ADC process is governed by four factors:
    sampling rate, quantization, linearity, and
    conversion speed.

Audio Encoding (2)
  • Sampling rate: the rate at which a continuous wave is
    sampled (measured in hertz)
  • Examples: CD standard - 44100 Hz; telephone
    quality - 8000 Hz
  • The audio industry uses 5.5125 kHz, 11.025 kHz,
    22.05 kHz, and 44.1 kHz as the standard sampling
    frequencies. These frequencies are supported by
    most sound cards.
  • Question: how often do you need to sample a
    signal to avoid losing information?

Audio Encoding (3)
  • Answer: it depends on how fast the signal is
    changing. Real answer: more than twice per cycle
    of the highest frequency present (this follows
    from the Nyquist sampling theorem)
  • Nyquist sampling theorem: if a signal f(t) is
    sampled at regular intervals of time and at a
    rate higher than twice the highest significant
    signal frequency, then the samples contain all
    the information of the original signal.
  • Example: a CD can represent frequencies up to
    22050 Hz, which covers the range of human
    hearing. By the Nyquist theorem the sampling rate
    must be twice that, so the CD sampling
    frequency is 44100 Hz.
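
The sampling theorem above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names `sample_tone` and `alias_frequency` are illustrative. It shows that a 5000 Hz tone sampled at 8000 Hz (below twice its frequency) produces exactly the same samples as a 3000 Hz tone, so the two cannot be told apart after sampling.

```python
import math

def sample_tone(freq_hz, rate_hz, n):
    """Return n samples of a unit-amplitude cosine at freq_hz, sampled at rate_hz."""
    return [math.cos(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]

def alias_frequency(freq_hz, rate_hz):
    """Frequency in the range 0..rate_hz/2 that a sampled tone appears to have."""
    f = freq_hz % rate_hz
    return min(f, rate_hz - f)

rate = 8000                             # telephone-quality sampling rate
low = sample_tone(3000, rate, 16)       # below rate/2: represented correctly
high = sample_tone(5000, rate, 16)      # above rate/2: aliases onto 3000 Hz

# The two sample sequences match to within floating-point error.
same = all(abs(a - b) < 1e-9 for a, b in zip(low, high))
print(same, alias_frequency(5000, rate))
```

This is why the input is filtered to below half the sampling rate before sampling.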

Audio Encoding (4)
  • The best-known technique for voice digitization
    is Pulse-Code Modulation (PCM).
  • PCM is based on the sampling theorem.
  • If voice data are limited to frequencies below
    4000 Hz, then 8000 samples/second are sufficient
    to capture the input voice signal.
  • Sampling yields analog samples, which must be
    converted to a digital representation: each
    sample is quantized, i.e., approximated by the
    nearest of a fixed set of levels, and assigned a
    binary code.

Audio Encoding (5)
  • Quantization (sample precision): the resolution
    of a sample value.
  • Samples are typically stored as raw numbers
    (linear PCM format) or as logarithms (u-law or
    A-law).
  • Quantization depends on the number of bits used
    to measure the height of the waveform.
  • Example: 16-bit CD-quality quantization results
    in 65536 possible values.
  • Audio formats are described by the sample rate
    and quantization:
  • Voice quality: 8-bit quantization, 8000 Hz u-law
    mono (8 kBytes/s)
  • 22 kHz 8-bit linear mono (22 kBytes/s) and
    stereo (44 kBytes/s)
  • CD quality: 16-bit quantization, 44100 Hz linear
    stereo (176.4 kBytes/s = 44100 samples/s x 16
    bits/sample x 2 channels / 8000 bits per kByte)
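
The data rates listed above follow from one multiplication. A short sketch of that arithmetic, with illustrative function names that do not come from the slides:

```python
def quantization_levels(bits_per_sample):
    """Number of distinct values a sample of the given precision can take."""
    return 2 ** bits_per_sample

def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed audio data rate in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels / 8

voice = pcm_bytes_per_second(8000, 8, 1)     # telephone-quality mono
cd = pcm_bytes_per_second(44100, 16, 2)      # CD-quality stereo
print(quantization_levels(16), voice, cd)
```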

Audio Formats
  • Audio formats are characterized by four
    parameters:
  • Sample rate: sampling frequency (samples per second)
  • Encoding: audio data representation
  • u-law: CCITT G.711 standard for voice data in
    telephone companies (USA, Canada, Japan)
  • A-law: CCITT G.711 standard for voice data in
    telephony elsewhere (Europe, ...)
  • A-law and u-law are sampled at 8000
    samples/second with a precision of 12 bits and
    compressed to 8-bit samples.
  • Linear PCM: uncompressed audio where samples are
    proportional to the audio signal voltage
  • Precision: number of bits used to store each
    audio sample.
  • Channel: multiple channels of audio may be
    interleaved at sample boundaries
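
To illustrate the logarithmic encoding mentioned above, here is a minimal sketch of the continuous u-law companding curve. It is an assumption-laden illustration: it uses the textbook formula with mu = 255 on samples normalized to [-1, 1], not the segmented table that G.711 actually specifies, and all function names are hypothetical.

```python
import math

MU = 255  # value commonly used with 8-bit u-law; assumed here

def mu_law_compress(x, mu=MU):
    """Map a linear sample in [-1, 1] to a companded value in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def to_8bit(y):
    """Quantize a companded value in [-1, 1] to an integer code 0..255."""
    return round((y + 1) / 2 * 255)

quiet = 0.01
print(mu_law_compress(quiet), to_8bit(mu_law_compress(quiet)))
```

Quiet samples are spread over many more codes than a linear 8-bit quantizer would give them, which is the point of the logarithmic representation.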

Basic Concepts of Video
  • Visual Representation Video Encoding
  • Objective is to offer the viewer a sense of
    presence in the scene and of participation in the
    events portrayed
  • Transmission
  • Video signals are transmitted to a receiver
    through a single television channel
  • Digitalization
  • Analog to digital conversion, sampling of
    gray/color level - Quantization

Visual Representation (1)
  • Video signals are generated at the output of a
    camera by scanning a two-dimensional moving scene
    and converting it into a one-dimensional
    electric signal
  • A moving scene is a collection of individual
    images, where each scanned picture generates a
    frame of the picture
  • Scanning starts at the top-left corner of the
    picture and ends at the bottom-right.
  • Aspect ratio: ratio of picture width to height
  • Pixel: discrete picture element, a digitized
    light point in a frame
  • Vertical frame resolution: number of pixels in
    picture height
  • Horizontal frame resolution: number of pixels in
    picture width
  • Spatial resolution: vertical x horizontal
  • Temporal resolution: rapid succession of
    different frames

Visual Representation (2)
  • Continuity of motion
  • Minimum 15 frames per second
  • NTSC: 29.97 Hz repetition rate, 30 frames/sec
  • PAL: 25 Hz, 25 frames/sec
  • HDTV: 59.94 Hz, 60 frames/sec
  • Flicker effect
  • Flicker is a periodic fluctuation of brightness
    perception. To avoid this effect, we need at
    least 50 refresh cycles per second. Display
    devices achieve this using a display refresh buffer.
  • Picture scanning
  • Progressive scanning: single scanning of a
    picture
  • Interlaced scanning: the frame is formed by
    scanning two pictures at different times, with
    the lines interleaved, such that two consecutive
    lines of a frame belong to alternate fields
    (odd and even lines are scanned separately)
  • NTSC TV uses interlaced scanning to trade off
    vertical against temporal resolution
  • HDTV and computer displays have high
    spatio-temporal resolution and use progressive
    scanning
Video Color Encoding (3)
  • During scanning, a camera creates three
    signals: RGB (red, green, and blue).
  • For compatibility with black-and-white video, and
    because the three color signals
    are highly correlated, a new set of signals in a
    different color space is generated.
  • The new color systems correspond to standards
    such as NTSC, PAL, and SECAM.
  • For transmission of the visual signal we use
    three signals: 1 luminance (brightness, the basic
    signal) and 2 chrominance (color signals).
  • YUV signal: Y = 0.30R + 0.59G + 0.11B, U = (B - Y) x
    0.493, V = (R - Y) x 0.877
  • Coding ratio between components Y, U, V: 4:2:2
  • In the NTSC signal the luminance and chrominance
    signals are interleaved
  • The goal at the receiver is to (1) separate
    luminance from chrominance components, and (2)
    avoid interference between them (cross-color,
    cross-luminance)
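
The YUV conversion above can be written directly as code. A minimal sketch using the coefficients from the slide; the function name `rgb_to_yuv` and the 0..255 input range are assumptions for illustration.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to (Y, U, V) using the coefficients quoted above."""
    y = 0.30 * r + 0.59 * g + 0.11 * b   # luminance
    u = (b - y) * 0.493                  # blue color difference
    v = (r - y) * 0.877                  # red color difference
    return y, u, v

# A gray pixel carries no color information: U and V are (close to) zero.
print(rgb_to_yuv(128, 128, 128))
# A pure red pixel has low luminance and a large V component.
print(rgb_to_yuv(255, 0, 0))
```

Because a gray pixel maps to U = V = 0, a black-and-white receiver can use Y alone, which is the compatibility argument made above.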

Basic Concepts of Image Formats
  • Important Parameters for Captured Image Formats
  • Spatial Resolution (pixels x pixels)
  • Color encoding (quantization level of a pixel
    e.g., 8-bit, 24-bit)
  • Example: the SunVideo Video Digitizer Board allows
    pictures of 320 by 240 pixels with 8-bit
    gray-scale or color resolution.
  • For a hands-on demonstration of basic image
    concepts, try the program xv, which displays images
    and lets you show, edit, and manipulate them
  • Important Parameters for Stored Image Formats
  • Images are stored as a 2D array of values where
    each value represents the data associated with a
    pixel in the image (bitmap or a color image).
  • Stored images can use flexible formats such
    as RIFF (Resource Interchange File Format).
    RIFF includes formats such as bitmaps,
    vector representations, animations, audio, and
    video.
  • Currently, the most used image storage formats are
    GIF (Graphics Interchange Format), XBM (X11
    Bitmap), PostScript, JPEG (see compression
    chapter), TIFF (Tagged Image File Format), PBM
    (Portable Bitmap), and BMP (Bitmap).

Digital Video
  • The process of digitizing analog video involves
    filtering, sampling, and quantization
  • Filtering is employed to avoid the aliasing
    artifacts of the subsequent sampling process
  • Filtered luminance and chrominance signals are
    sampled to generate a discrete-time signal
  • Digitization means sampling gray/color levels in
    the frame at an M x N array of points
  • The minimum rate at which each component (YUV)
    can be sampled is the Nyquist rate, which
    corresponds to twice the signal bandwidth
  • Once the points are sampled, they are quantized
    into pixels, i.e., each sampled value is mapped
    to an integer. The quantization level depends on
    how many bits we allocate to represent the
    resulting integer (e.g., 8 bits per pixel, or 24
    bits per pixel)

Digital Transmission Bandwidth
  • Bandwidth requirements for Images
  • Raw image transmission bandwidth = size of the
    image = spatial resolution x pixel resolution
  • Compressed image transmission bandwidth:
    depends on the compression scheme (e.g., JPEG) and
    the content of the image
  • Symbolic image transmission bandwidth = size of
    the instructions and variables carrying graphics
    primitives and attributes.
  • Bandwidth requirements for video
  • Uncompressed video bandwidth = image size x frame
    rate
  • Compressed video bandwidth: depends on the
    compression scheme (e.g., Motion JPEG, MPEG) and
    the content of the video (scene changes).
  • Example: assume the following video
    characteristics: 720,000 pixels per
    image (frame), 8 bits per pixel quantization, and
    a frame rate of 60 frames per second. The video
    bandwidth = 720,000 pixels per frame x 8 bits per
    pixel x 60 fps,
  • which results in an HDTV data rate of 43,200,000
    bytes per second = 345.6 Mbps. With MPEG
    compression, the bandwidth drops to 34 Mbps with
    some loss in image/video quality.
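
The HDTV example above is a single product. A short sketch of the arithmetic, with an illustrative function name, that also derives the compression ratio implied by the 34 Mbps figure:

```python
def raw_video_bits_per_second(pixels_per_frame, bits_per_pixel, fps):
    """Uncompressed video bandwidth in bits per second."""
    return pixels_per_frame * bits_per_pixel * fps

bits = raw_video_bits_per_second(720_000, 8, 60)
bytes_per_second = bits // 8
mbps = bits / 1_000_000
ratio = mbps / 34          # ratio implied by the compressed rate quoted above

print(bytes_per_second, mbps, round(ratio, 1))
```

So the quoted MPEG rate corresponds to roughly a 10:1 reduction.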

Compression Classification
  • Compression is important due to limited bandwidth
  • All compression systems require two algorithms
  • Encoding at the source
  • Decoding at the destination
  • Entropy coding
  • Lossless encoding
  • Used regardless of the medium's specific
    characteristics
  • Data taken as a simple digital sequence
  • Decompression process regenerates the data completely
  • Examples: run-length coding, Huffman coding,
    arithmetic coding
  • Source coding
  • Lossy encoding
  • Takes into account the semantics of the data
  • Degree of compression depends on the data content
  • Examples: DPCM, delta modulation
  • Hybrid coding
  • Combines entropy coding with source coding
  • Examples: JPEG, MPEG, H.263, ...

Compression (1)
  • [Figure: compression pipeline. Stages shown:
    Uncompressed Picture, Picture Preparation,
    Picture Processing, Entropy Encoding, with
    Adaptive Feedback]

Compression (2)
  • Picture Preparation
  • Analog-to-digital conversion
  • Generation of appropriate digital representation
  • Image division into 8x8 blocks
  • Fix the number of bits per pixel
  • Picture Processing (Compression Algorithm)
  • Transformation from the spatial (time) domain to
    the frequency domain (e.g., Discrete Cosine
    Transform, DCT)
  • Motion vector computation for motion video
  • Quantization
  • Mapping real numbers to integers (reduction in
    precision)
  • Entropy Coding
  • Compress a sequential digital stream without loss

Compression (3)(Entropy Encoding)
  • A simple lossless compression algorithm is
    run-length coding, where repeated bytes
    are grouped together as
    Number-of-Occurrences, Special-Character,
    Compressed-Byte. For example,
    'AAAAAABBBBBDDDDDAAAAAAAA' can be encoded as
    '6!A5!B5!D8!A', where '!' is the special
    character. The compressed string is 12 bytes
    instead of 24, a compression ratio of 50%.
  • Fixed-length coding: each symbol is allocated
    the same number of bits independent of frequency
    (L >= log2(N), where N is the number of symbols)
  • Statistical encoding: each symbol has a
    probability of occurrence (e.g., P(A) = 0.16, P(B)
    = 0.51, P(C) = 0.33)
  • The theoretical minimum average number of bits
    per codeword is known as the entropy (H). According
    to Shannon:
  • $H = -\sum_i P_i \log_2 P_i$ bits per codeword
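
Both ideas on this slide fit in a few lines. The sketch below is a simplified illustration: it encodes every run in the count, '!', byte form of the example (a practical coder would only encode long runs and would have to escape the special character), and it evaluates Shannon's entropy formula for the example probabilities. The function names are illustrative.

```python
import math
import re
from itertools import groupby

def rle_encode(text, marker="!"):
    """Encode each run as <count><marker><char>, as in the slide's example."""
    return "".join(f"{len(list(run))}{marker}{ch}" for ch, run in groupby(text))

def rle_decode(encoded, marker="!"):
    """Invert rle_encode (assumes the data itself contains no digits or marker)."""
    pairs = re.findall(r"(\d+)" + re.escape(marker) + r"(.)", encoded)
    return "".join(ch * int(count) for count, ch in pairs)

def entropy(probabilities):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

original = "AAAAAABBBBBDDDDDAAAAAAAA"
packed = rle_encode(original)
print(packed, len(packed) / len(original))
print(entropy([0.16, 0.51, 0.33]))
```

For the three-symbol example the entropy is a little under 1.45 bits per symbol, compared with the 2 bits a fixed-length code needs.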

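Huffman Coding
  • [Figure: Huffman tree for P(A) = 0.16, P(B) = 0.51,
    P(C) = 0.33. A and C are merged first into a node
    with P(AC) = 0.49, which is then merged with B
    into the root with P(ACB) = 1]
  • Resulting codes: A = 00, C = 01, B = 1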
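
The Huffman example can be reproduced with a priority queue. This is a minimal sketch of the standard construction, not taken from the slides. The choice of which branch gets 0 and which gets 1 is a convention; here the less probable subtree gets 0, which happens to match the codes in the example.

```python
import heapq
from itertools import count

def huffman_codes(probabilities):
    """Build a Huffman code for a dict {symbol: probability} (two or more symbols)."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)   # least probable subtree
        p1, _, codes1 = heapq.heappop(heap)   # next least probable subtree
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"A": 0.16, "B": 0.51, "C": 0.33}
codes = huffman_codes(probs)
average_bits = sum(probs[s] * len(codes[s]) for s in probs)
print(codes, average_bits)
```

The average code length of about 1.49 bits per symbol sits just above the entropy of the same distribution, as expected for a Huffman code.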

JPEG (Joint Photographic Experts Group)
  • 6 major steps to compress an image (1) block
    preparation, (2) DCT (Discrete Cosine Transform)
    transformation, (3) quantization, (4) further
    compression via differential compression, (5)
    zig-zag scanning and run-length coding
    compression, (6) Huffman coding compression
  • The quantization step is the lossy step, where
    we lose data in a non-invertible fashion.
  • Differential compression means that we consider
    similar blocks in the image and encode only the
    first block; for the rest of the similar
    blocks, we encode only the differences between the
    previous block and the current block. The hope is
    that the difference is a much smaller value,
    so we need fewer bits to represent it. Also,
    the differences often end up close to 0 and can
    be compressed very well by the next compression
    step, run-length coding.
  • Huffman compression is a lossless statistical
    encoding algorithm that takes into account the
    frequency of occurrence (not every byte has the
    same weight)

JPEG Block Preparation
  • RGB input data and block preparation
  • The eye responds to luminance (Y) more than
    chrominance (I and Q)

Image Processing
  • After image preparation we have
  • Uncompressed image samples grouped into data
    units of 8x8 pixels
  • Precision: 8 bits/pixel
  • Values are in the range [0, 255]
  • Steps in image processing
  • Pixel values are shifted into the range
    [-128, 127], centered on 0
  • DCT maps values from the spatial (time) domain to
    the frequency domain
  • $S(u,v) = \frac{1}{4} C(u) C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} s(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}$,
    where s(x,y) is the shifted pixel value and
    $C(k) = 1/\sqrt{2}$ for k = 0, otherwise C(k) = 1
  • S(0,0): lowest frequency in both directions, the DC
    coefficient; determines the fundamental color of
    the block
  • S(0,1), ..., S(7,7): AC coefficients
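
A direct, unoptimized sketch of the 8x8 forward DCT may help here. It assumes the standard JPEG form of the transform, with a cosine term in each direction and C(0) = 1/sqrt(2); real encoders use fast factorizations instead of this four-level loop. The function names are illustrative.

```python
import math

def level_shift(block):
    """Shift 8-bit pixel values from [0, 255] to [-128, 127]."""
    return [[p - 128 for p in row] for row in block]

def dct_8x8(block):
    """Forward 2D DCT of an 8x8 block of shifted pixel values s(x, y)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            total = 0.0
            for x in range(8):
                for y in range(8):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * total
    return out

# A flat block: every pixel has the value 228, i.e. 100 after the level shift.
flat = level_shift([[228] * 8 for _ in range(8)])
coeffs = dct_8x8(flat)
print(coeffs[0][0])
```

For a flat block all the energy lands in the DC coefficient S(0,0) and every AC coefficient is zero up to rounding error, which is why smooth image regions compress so well.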

  • The goal of quantization is to throw out bits
  • Example: binary 101101 = 45 (6 bits). We can
    truncate this string to 4 bits, binary 1011 = 11, or to
    3 bits, binary 101 = 5 (reconstructed value 40) or,
    with rounding, binary 110 = 6 (reconstructed value 48)
  • Uniform quantization is achieved by dividing the DCT
    coefficient value S(u,v) by N and rounding the
    result
  • JPEG uses quantization tables
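
The truncation example and the divide-and-round rule can be sketched as follows. The function names and the small 2x2 table are hypothetical, chosen only to keep the example short; JPEG itself uses 8x8 tables.

```python
def truncate_bits(value, drop):
    """Drop the lowest `drop` bits; return (code, reconstructed value)."""
    code = value >> drop
    return code, code << drop

def quantize(coefficient, step):
    """Uniform quantization: divide by the step size and round."""
    return round(coefficient / step)

def dequantize(level, step):
    """Approximate reconstruction; the rounding error is lost for good."""
    return level * step

def quantize_block(coeffs, table):
    """Quantize each coefficient with its own step from a quantization table."""
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, table)]

print(truncate_bits(45, 2), truncate_bits(45, 3))
print(quantize(45, 8), dequantize(quantize(45, 8), 8))
print(quantize_block([[800, 10], [-7, 3]], [[16, 11], [12, 14]]))
```

Small high-frequency coefficients collapse to 0, which sets up the run-length step that follows.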

Entropy Encoding
  • After image processing we have quantized DC and
    AC coefficients
  • The initial step of entropy encoding is to map the 8x8
    plane into a 64-element vector using a zig-zag
    scan
  • DC coefficient processing: use difference coding
  • AC coefficient processing: apply run-length coding
  • Apply Huffman coding to the DC and AC coefficients
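
The three preparation steps before Huffman coding can be sketched as below. This is a simplified illustration with hypothetical function names: the run-length step emits (zero run, value) pairs and uses (0, 0) as an end-of-block marker, and it ignores the special handling that real JPEG applies to runs longer than 15 zeros.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Walk the anti-diagonals; the direction alternates from one to the next.
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def dc_differences(dc_values):
    """Code each block's DC coefficient as the difference from the previous block."""
    previous, out = 0, []
    for dc in dc_values:
        out.append(dc - previous)
        previous = dc
    return out

def run_length_ac(ac_values):
    """Encode AC coefficients as (zero_run, value) pairs, ending with (0, 0)."""
    pairs, run = [], 0
    for v in ac_values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))  # end of block: only zeros remain
    return pairs

print(zigzag_order()[:6])
print(dc_differences([52, 55, 54]))
print(run_length_ac([5, 0, 0, -3, 0, 0, 0]))
```

The zig-zag scan places low frequencies first, so the zeros produced by quantization cluster at the end of the vector where one marker can cover them.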

MPEG (Moving Picture Experts Group)
  • MPEG-1 was designed for video recorder-quality
    output (320x240 for NTSC) at a bit rate of
    1.2 Mbps.
  • MPEG-2 is for broadcast-quality video at
    4-6 Mbps (it fits into an NTSC or PAL broadcast
    channel)
  • MPEG takes advantage of temporal and spatial
    redundancy. Temporal redundancy means that two
    neighboring frames are similar, almost identical.
  • MPEG-2 output consists of three different kinds
    of frames that have to be processed
  • I (Intracoded) frames - self-contained
    JPEG-encoded still pictures
  • P (Predictive) frames - Block-by-block difference
    with the last frame
  • B (Bidirectional) frames - Differences with the
    last and next frames

The MPEG Standard
  • I frames - self-contained, hence they are used
    for fast forward and rewind operations in VOD
  • P frames code interframe differences. The
    algorithm searches for similar macroblocks in the
    current and previous frame, and if they are only
    slightly different, it encodes only the
    difference and the motion vector needed to find
    the position of the macroblock for decoding.
  • B frames - encoded if three frames are available
    at once: the past one, the current one, and the
    future one. As with P frames, the algorithm
    takes a macroblock in the current frame and looks
    for similar macroblocks in the past and future
    frames.
  • MPEG is suitable for stored video because it is
    an asymmetric lossy compression: encoding
    takes a long time, but decoding is very fast.
  • The frames are delivered at the receiver in the
    dependency order rather than display order, hence
    we need buffering to reorder the frames.
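
The difference between display order and dependency order can be made concrete. The sketch below assumes a simple rule that is not spelled out in the slides: each B frame is sent right after the next I or P frame it depends on. The function name is illustrative.

```python
def to_coding_order(frame_types):
    """Return display indices in the order the frames would be transmitted.

    frame_types is a sequence such as "IBBPBBP" given in display order.
    """
    order, pending_b = [], []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append(index)      # must wait for its future reference
        else:
            order.append(index)          # I or P frame goes out first
            order.extend(pending_b)      # then the B frames that needed it
            pending_b = []
    order.extend(pending_b)  # trailing B frames (simplification for this sketch)
    return order

display = "IBBPBBP"
coding = to_coding_order(display)
print(coding, [display[i] for i in coding])

# The receiver restores display order by sorting on the display index.
print(sorted(coding))
```

The receiver has to hold each decoded reference frame until the B frames that precede it in display order have been shown, which is the buffering mentioned above.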

MPEG/Video I-Frames
  • I frames (intra-coded images)
  • MPEG uses the JPEG compression algorithm for I-frame
    encoding
  • I-frames use 8x8 blocks defined within a
    macro-block. DCT is performed on these blocks.
    Quantization uses a constant value for all
    DCT coefficients, i.e., there are no quantization
    tables as there are in JPEG

MPEG/Video P-Frames
  • P-frames (predictive-coded frames) require the
    previous I-frame and/or previous P-frame for
    encoding and decoding
  • Use a motion estimation method at the encoder
  • Define a match window within a given search window.
    The match window corresponds to a macro-block; the
    search window corresponds to an arbitrary window size
    depending on how far away we are willing to look.
  • Matching methods
  • SSD correlation uses the sum of squared differences: $SSD = \sum_i (x_i - y_i)^2$
  • SAD correlation uses the sum of absolute differences: $SAD = \sum_i |x_i - y_i|$
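
The match-window search can be sketched as an exhaustive block-matching loop. This is a minimal illustration under stated assumptions: frames are lists of rows of gray values, the search is a full search over every offset, and the function names and the tiny test frames are hypothetical. Real encoders use faster search strategies.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def ssd(a, b):
    """Sum of squared differences between two equally sized blocks."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def get_block(frame, top, left, size):
    return [row[left:left + size] for row in frame[top:top + size]]

def best_motion_vector(ref, cur, top, left, size, search, cost=sad):
    """Find the offset (dy, dx) into ref that best matches a block of cur."""
    target = get_block(cur, top, left, size)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            t, l = top + dy, left + dx
            if t < 0 or l < 0 or t + size > len(ref) or l + size > len(ref[0]):
                continue  # candidate window falls outside the reference frame
            c = cost(target, get_block(ref, t, l, size))
            if best is None or c < best[0]:
                best = (c, (dy, dx))
    return best

def frame_with_square(top, left, n=8, size=2):
    """Black n x n frame containing one bright size x size square."""
    return [[255 if top <= r < top + size and left <= c < left + size else 0
             for c in range(n)] for r in range(n)]

ref = frame_with_square(2, 2)   # previous frame
cur = frame_with_square(3, 4)   # current frame: the square has moved
print(best_motion_vector(ref, cur, top=3, left=4, size=2, search=2))
```

A cost of 0 means the block is an exact copy, so only the motion vector needs to be coded; otherwise the residual difference is coded as well.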

MPEG/Video B-Frame
  • B-frames (bi-directionally predictive-coded
    frames) require information of the previous and
    following I and/or P-frame

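  • [Figure: B-frame encoding. Labels shown: "- ½ x (...)"
    (subtraction of the averaged reference blocks), DCT,
    Quant., RLE, Motion Vectors (two), Huffman Coding]

MPEG/Audio Encoding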
  • Precision is 16 bits
  • Sampling frequency is 32 kHz, 44.1 kHz, or 48 kHz
  • 3 compression methods exist: Layer 1, Layer 2,
    Layer 3 (MP3)

  • Layer 1: 32 kbps-448 kbps, target 192 kbps
  • Layer 2: 32 kbps-384 kbps, target 128 kbps;
    the decoder also accepts Layer 1
  • Layer 3: 32 kbps-320 kbps, target 64 kbps;
    the decoder also accepts Layer 2 and Layer 1

MPEG/System Data Stream
  • Video is interleaved with audio.
  • Audio consists of three layers
  • Video consists of 6 layers
  • (1) sequence layer (Video Param, Bitstream Param, ...)
  • (2) group of pictures layer (Time code, GOP Param, ...)
  • (3) picture layer (Type, Buffer Param, Encode
    Param, ...)
  • (4) slice layer (Qscale, ...)
  • (5) macro-block layer (Type, Motion Vector, Qscale, ...)
  • (6) block layer