Title: Sections 14.1 - 14.4 Streaming Media on Demand and Live Broadcast Multimedia over IP and wireless networks: compression, networking, and systems Mihaela van der Schaar
1. Sections 14.1 - 14.4: Streaming Media on Demand and Live Broadcast
Multimedia over IP and Wireless Networks: Compression, Networking, and Systems, Mihaela van der Schaar and Philip A. Chou
- Presented by H. Mark Okada
- CMPT 820
- February 18, 2009
2. Streaming Media
- Media on demand: a user scenario resembling audio or video playback from a local CD or DVD
  - interactive controls: fast forward, pause, seek, etc.
- Live broadcast: a user scenario resembling tuning into a radio or television program
  - the user can only join or leave a session
- Both are prevalent on the Internet today, e.g.
  - interactive music and video playback
  - Internet radio
- Chapter 14 looks at how these services are provided
- Sections 14.2 - 14.4 cover only media on demand
3. Overview
- Section 14.2: overview of
  - architectures
  - protocols
  - file format issues
- Section 14.3: buffering and timing fundamentals
- Section 14.4: how media data is communicated for streaming on demand
- NOT COVERED: Section 14.5, live broadcast
4. Architectures - 14.2.1
- Streaming media on demand and live broadcast
require different architectures
Figure 14.1
5. Streaming media on demand
- The media source is encoded offline into a media file
- The file is streamed using one of several protocols (Section 14.2.2)
- The media file may be specialised to support various modes of streaming (Section 14.2.3)
- The client temporarily buffers encoded media in a decoder buffer
- It also buffers decoded media in a render buffer
  - the render buffer is fairly short (a frame or two) because decoded frames are large
- The user controls the experience through playback commands
  - play, fast forward, stop, seek
- Communication between server and client is tailored to
  - the client's resources
  - the network connection
Figure 14.1a
6. Progressive downloading
- A type of streaming in which media can be transferred faster than it is played back, i.e., the entire file is downloaded
- If the client can decode the file sequentially as it arrives
  - progressive downloading can use simple file transfer protocols, e.g. FTP or HTTP, both over TCP/IP (i.e., from an FTP server or a web server)
- If the client buffer is limited
  - progressive downloading can rely on simple TCP flow control
  - the client accepts data from TCP only if there is space in its media buffer
- Popularised by SHOUTcast, an early music streaming service
Condition: network bandwidth > media content bit rate (the source coding rate)
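The flow-control behaviour above can be illustrated with a small discrete-time simulation. This is a sketch under simplifying assumptions: fixed per-tick network and playback rates, and a bit counter standing in for TCP's receive window. The function name and parameters are hypothetical.

```python
def simulate_progressive_download(file_bits, net_rate, play_rate, buffer_cap, preroll_ticks):
    """Discrete-time sketch of progressive downloading with receiver flow control.
    Rates are in bits per tick. Returns (ticks until playback ends, stalled ticks)."""
    if buffer_cap < play_rate:
        raise ValueError("buffer must hold at least one tick of playback")
    downloaded = played = buffered = 0
    tick = stalls = 0
    while played < file_bits:
        # flow control: accept only what fits in the client's media buffer
        accepted = min(net_rate, buffer_cap - buffered, file_bits - downloaded)
        downloaded += accepted
        buffered += accepted
        if tick >= preroll_ticks:
            need = min(play_rate, file_bits - played)
            if buffered >= need:
                buffered -= need
                played += need
            else:
                stalls += 1          # decoder starved: playback pauses this tick
        tick += 1
    return tick, stalls

# network faster than playback: the buffer fills up to its cap and never starves
print(simulate_progressive_download(1000, net_rate=20, play_rate=10, buffer_cap=100, preroll_ticks=1))  # (101, 0)
# network slower than playback: playback keeps stalling
print(simulate_progressive_download(1000, net_rate=5, play_rate=10, buffer_cap=100, preroll_ticks=0))   # (200, 100)
```

The buffer never exceeds `buffer_cap` because the client refuses data that would not fit, which is the role TCP flow control plays in the slide.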
7. Progressive downloading (continued)
- Must account for network jitter and temporary interference
- Want the highest possible source coding rate (i.e., not held down to the worst-case network bandwidth)
- These are the main issues for streaming media on demand, and for the communication protocol between client and server
8. Live broadcast
- The encoder may be connected directly to the server through an encoder buffer
  - the encoder buffer holds a limited amount of data to maintain a fixed, short end-to-end delay
- The server accesses data at the current playback point, not at an arbitrary position in a file
- This restricts adaptivity, which matters especially with multiple receivers
  - interactive access to the media is not possible
  - it is difficult to adapt the transmission rate to clients with varying conditions
  - it is difficult for the server to use retransmission-based error control, due to the negative acknowledgement (NAK) implosion problem
  - error control therefore becomes a delicate issue for live broadcast
- Receiver-driven layered multicast (RLM) allows adaptation of the transmission rate. See also:
  - S. R. McCanne. Scalable Compression and Transmission of Internet Multicast Video. Ph.D. thesis, University of California, Berkeley, CA, December 1996.
  - S. R. McCanne, V. Jacobson, and M. Vetterli. Receiver-Driven Layered Multicast. In Proc. ACM SIGCOMM, pages 117-130, Stanford, CA, August 1996.
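Receiver-driven adaptation can be sketched as follows. This is a deliberately simplified model, not McCanne's full RLM protocol. A single receiver adds one layer per step while the cumulative rate of its subscribed layers fits its bandwidth, and drops the top layer on congestion. Real RLM uses join experiments with backed-off join timers and shared learning across receivers to damp the oscillation this sketch shows. The layer rates and bandwidth are hypothetical.

```python
from itertools import accumulate

def rlm_levels(layer_rates, bandwidth, steps):
    """Simplified receiver-driven layer subscription (not full RLM).
    layer_rates: rate of each layer, base layer first; bandwidth: receiver bottleneck."""
    cumulative = list(accumulate(layer_rates))   # rate of subscribing to layers 1..k
    level = 1                                    # always keep the base layer
    history = []
    for _ in range(steps):
        congested = cumulative[level - 1] > bandwidth
        if congested and level > 1:
            level -= 1                           # drop the top layer on congestion
        elif not congested and level < len(layer_rates):
            level += 1                           # join experiment: add one more layer
        history.append(level)
    return history

print(rlm_levels([32, 64, 128, 256], bandwidth=250, steps=6))  # [2, 3, 4, 3, 4, 3]
```

Each receiver decides for itself, so the sender never has to adapt one stream to many clients.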
9. Protocols - 14.2.2
- Streaming on demand requires many protocols at different levels
- This section covers a subset of the protocols described in week 2 of this class
  - RTP: Real-time Transport Protocol
  - RTSP: Real Time Streaming Protocol
  - RTCP: RTP Control Protocol
  - SIP: Session Initiation Protocol
10. Real Time Streaming Protocol (RTSP)
- RFC 2326
- At the topmost level are application-level protocols for
  - content discovery
  - connection to a specific streaming media server
- Content discovery is done out of band, e.g.
  - http://www.microsoft.com/directory/contentname.asx
  - http://www.realnetworks.com/directory/contentname.ram
  - http://www.apple.com/directory/contentname.mov
  - each URL points to metadata (an auxiliary file on a web server) that references a separate media file
  - the format differs for each type: .asx, .ram, .mov
- The client contacts the streaming server using the URL for the content, e.g.
  - rtsp://wms.microsoft.com/directory/contentname.wmv
  - rtsp://helixserver.example.com/audio1.rm?start=55&end=125
  - rtsp://qtserver.apple.com/directory/contentname.mov
- The prefix indicates the streaming protocol used
- The suffix passes information to the server, e.g. seek position, play speed, etc.
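The prefix/suffix split can be seen by parsing one of the example URLs with Python's standard `urllib.parse`. This sketch only parses the URL. How a particular server interprets the query parameters (here `start` and `end`, as in the Helix example) is server-specific.

```python
from urllib.parse import urlsplit, parse_qs

def describe_stream_url(url):
    """Split a streaming URL into its protocol prefix, server, path and query suffix."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,          # prefix, e.g. 'rtsp'
        "server": parts.hostname,
        "path": parts.path,
        # suffix: server-specific options such as a start or end time
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }

print(describe_stream_url("rtsp://helixserver.example.com/audio1.rm?start=55&end=125"))
```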
11. Example of auxiliary files
- Microsoft ASX file:
    <ASX Version="3.0">
      <ENTRY>
        <REF HREF="mms://streamingmedia/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
      </ENTRY>
      <ENTRY>
        <REF HREF="mms://winmedianw/studios/0505/24721/MTV_XBOX_preview_160k.wmv" />
      </ENTRY>
    </ASX>
- RealNetworks RAM file:
  - First URL, which opens a related info pane:
    rtsp://helixserver.example.com/video3.rm?rpcontextheight=350&rpcontextwidth=300&rpcontexturl="http://www.example.com/relatedinfo2.html"&rpcontexttime=5.5&rpvideofillcolor=rgb(30,60,200)
  - Second URL, which keeps the same related info pane but changes the media playback pane's background color:
    rtsp://helixserver.example.com/video4.rm?rpcontexturl=_keep
Figure 14.2
12. Streaming protocol
- Control commands are typically sent reliably over a TCP connection (in many forms)
- The Real Time Streaming Protocol (RTSP, RFC 2326) is widely adopted
- The idea is simple, but SET_PARAMETER can be complicated
  - a media file may have multiple audio and video streams for different languages, subtitles, source coding rates, etc.
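RTSP requests are text messages in an HTTP-like format (RFC 2326): a request line, a CSeq header, other headers, a blank line and an optional body. The helper below only formats messages and opens no connection. The session ID, the range values and the `stream_language` parameter in the SET_PARAMETER body are hypothetical placeholders, because RFC 2326 leaves parameter names to the application. This illustrates why SET_PARAMETER gets complicated: its meaning depends entirely on what client and server agree to put in the body.

```python
def rtsp_request(method, url, cseq, headers=None, body=""):
    """Format an RTSP/1.0 request message (RFC 2326 style); no networking."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # Content-Length counts the bytes of the message body
        lines.append(f"Content-Length: {len(body.encode('utf-8'))}")
    return "\r\n".join(lines) + "\r\n\r\n" + body


url = "rtsp://helixserver.example.com/audio1.rm"
# Seek: play from 55 s to 125 s of normal play time (npt)
play = rtsp_request("PLAY", url, 3, {"Session": "12345678", "Range": "npt=55-125"})
# Hypothetical application-defined parameter carried by SET_PARAMETER
body = "stream_language: fr\r\n"
set_param = rtsp_request("SET_PARAMETER", url, 4,
                         {"Session": "12345678", "Content-Type": "text/parameters"}, body)
print(play)
print(set_param)
```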
13. Real-time Transport Protocol (RTP)
- The client can specify which lower-level data transport protocol to use
- Data transport is usually either
  - RTP over UDP, or
  - RTP over TCP
  - both are preferred for bandwidth efficiency
- With RTP over UDP, there must be some means of transmission rate control and error control
  - there is no standard means of transmission rate control and error control for RTP
- HTTP over TCP may be used to avoid firewall issues
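Every RTP packet starts with the 12-byte fixed header defined in RFC 3550. The header holds the version, padding, extension and CSRC count bits, the marker bit, payload type, sequence number, timestamp and SSRC. The minimal sketch below packs and unpacks this header with Python's `struct` module. Payload type 96 (a dynamic type) and the SSRC value are arbitrary examples.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # network byte order, 12 bytes

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Fixed RTP header: V=2, no padding, no extension, no CSRCs."""
    byte0 = 2 << 6                                   # version 2 in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data):
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(data)
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

header = pack_rtp_header(96, seq=1, timestamp=3000, ssrc=0x1234ABCD)
print(len(header), unpack_rtp_header(header))
```

The sequence number lets a receiver detect loss and reordering. This matters over UDP, where, as noted above, rate and error control must be added separately.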
14. RTP Control Protocol (RTCP)
- RFC 3550
- Often used with RTP
- Receivers typically send statistical feedback to the sender (reports)
- Interoperability issues and proprietary features limit its use as a standard
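Two statistics carried in RTCP receiver reports are the fraction of packets lost and the interarrival jitter. RFC 3550 specifies both, and the sketch below follows its formulas: fraction lost is an 8-bit fixed-point value, and jitter is a running estimate with gain 1/16. Times are assumed to be in RTP timestamp units, and the sample values are made up.

```python
def fraction_lost(expected_interval, received_interval):
    """8-bit fixed-point fraction of packets lost since the last report (RFC 3550)."""
    lost_interval = expected_interval - received_interval
    if expected_interval == 0 or lost_interval <= 0:
        return 0
    return (lost_interval << 8) // expected_interval

def update_jitter(jitter, prev_arrival, prev_ts, arrival, ts):
    """Interarrival jitter estimate J += (|D| - J) / 16 (RFC 3550)."""
    d = (arrival - prev_arrival) - (ts - prev_ts)
    return jitter + (abs(d) - jitter) / 16.0

# (RTP timestamp, arrival time), both in timestamp units
packets = [(0, 1000), (160, 1170), (320, 1320)]
jitter = 0.0
for (ts0, a0), (ts1, a1) in zip(packets, packets[1:]):
    jitter = update_jitter(jitter, a0, ts0, a1, ts1)
print(fraction_lost(100, 90), jitter)
```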
15. Windows Media system
- Uses RTP over UDP
- Transmission rate control is normally based on the source coding rate of the content
  - the client can detect congestion
  - the client signals the server to lower or increase the source coding rate
16. Alternative methods of transmission rate control
- 1) TFRC: TCP-friendly rate control
- 2) A TCP-like congestion control algorithm
- Both are being standardised as two profiles of the Datagram Congestion Control Protocol (DCCP)
- Must be paired with a source coding rate control algorithm so that the source coding rate matches the transmission rate
  - e.g. the rate-distortion optimised (RaDiO) scheduling algorithm
- Error control in Windows Media uses selective retransmission
  - when the client detects a gap, it sends a NAK (negative acknowledgement) to the server, causing retransmission
  - audio has higher priority than video
  - Windows Media Player stalls if audio packets are missing and waits for them to arrive
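TFRC, now specified in RFC 5348, sets the sending rate from the TCP throughput equation, using the measured round-trip time and loss event rate. The sketch below uses the simplifications RFC 5348 recommends: b = 1 packet acknowledged per ACK, and t_RTO = 4 × RTT. The packet size, RTT and loss rate are example values.

```python
import math

def tfrc_rate(s, rtt, p, b=1):
    """TCP throughput equation used by TFRC (RFC 5348), in bytes/s.
    s: segment size in bytes, rtt: round-trip time in s, p: loss event rate."""
    if p <= 0:
        raise ValueError("loss event rate must be positive")
    t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom

rate = tfrc_rate(s=1460, rtt=0.1, p=0.01)
print(f"{rate * 8 / 1e6:.2f} Mbit/s")
```

A source coding rate controller, as the slide notes, would then pick a source coding rate no higher than this allowed rate.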
17. File formats - 14.2.3
- It is challenging to adapt a fixed media file to various network and client conditions
  - encoding must be done before streaming, with no knowledge of the streaming context
  - so flexibility must be built into the media file
- It is unrealistic to compress or transcode to the needs of every client
  - the best approach is to allow the server to select which parts of the file to stream
18. Some streaming formats
- The major players:
  - MPEG-4 file format
  - QuickTime format (on which the MPEG-4 file format is based)
  - RealMedia format
  - Microsoft Advanced Streaming Format (ASF)
- All can contain (multiplex) multiple media and multiple versions of each medium
  - each is recorded into a track (MPEG-4/QuickTime) or a stream (ASF)
  - data units are organised as chunks (MPEG-4/QuickTime) or packets (ASF)
19. Streaming formats
- Each has a header containing metadata about the overall file and about specific tracks or streams
  - title, author, date, encryption, rights management, table of contents, track/stream enumeration and descriptions
- Information on individual track/stream properties
  - start time, duration, bit rate, buffer size, sampling rate, picture size, scalability capabilities
- Time-varying metadata can be associated with each track/stream
  - network packetisation, decoding and presentation timestamps, SMPTE time codes, key frames, switch frames
- Two types of metadata
  - static metadata: size is independent of the length of the data; inexpensive to transmit over the network
  - time-varying metadata: size grows with the data; expensive to transmit
20. Streaming formats
- The format provides a structure that lets the server select which parts of the data to transmit, at either
  - coarse granularity: the server streams only a particular subset of the streams to the client
  - fine granularity: in addition, a fraction of the data within a stream can be chosen
- A Lagrange multiplier parameter can be set that determines which data units are not transmitted
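One way to read the last point: each data unit has a size ΔR (bits) and an importance ΔD (the distortion reduction if it is delivered). A unit is transmitted only when ΔD ≥ λ·ΔR, so raising λ drops more units. This sketch rests on that assumption and treats units as independent. Real schemes must also respect decoding dependencies; for example, a B frame is useless without its reference frames. The unit names and values are hypothetical.

```python
def select_units(units, lam):
    """units: list of (name, size_bits, distortion_reduction). Keep those whose
    distortion reduction per bit is at least the Lagrange multiplier lam."""
    return [name for name, size, dd in units if dd - lam * size >= 0]

units = [("I0", 10_000, 500.0), ("P3", 4_000, 120.0), ("B1", 2_000, 20.0)]
print(select_units(units, 0.02))  # ['I0', 'P3']
print(select_units(units, 0.04))  # ['I0']
```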
21. Encoding media into a stream
- Two methods:
  - 1) Multiple bit rate (MBR)
    - multiple independent encodings (each at a different coding rate) are stored in separate streams in the same file
    - a choice can be made of which streams to play
  - 2) Scalable coding
    - covered later in Section 14.3.3
22. Data units
- Streaming formats use packets as data units
  - e.g. H.264/AVC uses Network Abstraction Layer (NAL) units
- In general, files formatted for local playback or storage are not suitable for streaming
  - it is hard for the server to choose the right portions of the file to stream
  - it is difficult to randomly access (seek to) arbitrary points in the stream
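As an illustration of packetised data units, an H.264/AVC Annex B byte stream can be split into NAL units at 0x000001 start codes. The one-byte NAL header then gives nal_ref_idc and nal_unit_type. Type 5 (IDR slice) marks a random access point, which is what a server needs for seeking. This is a simplified sketch: it only inspects headers, not payloads, and the sample bytes are synthetic.

```python
START_CODE = b"\x00\x00\x01"
IDR_SLICE = 5

def split_annexb(stream):
    """Split an Annex B byte stream into NAL units at 0x000001 start codes."""
    units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # a 4-byte start code (0x00000001) leaves a zero byte at the end of the
        # previous unit; H.264 NAL units themselves never end in a zero byte
        units.append(stream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units

def nal_header(unit):
    first = unit[0]
    return {"nal_ref_idc": (first >> 5) & 0x3, "nal_unit_type": first & 0x1F}

stream = (b"\x00\x00\x00\x01\x67\x42" b"\x00\x00\x01\x68\xce"
          b"\x00\x00\x01\x65\x88\x84" b"\x00\x00\x01\x41\x9a")
types = [nal_header(u)["nal_unit_type"] for u in split_annexb(stream)]
seek_points = [i for i, t in enumerate(types) if t == IDR_SLICE]
print(types, seek_points)  # [7, 8, 5, 1] [2]
```

Here types 7 and 8 are the sequence and picture parameter sets, and type 1 is a non-IDR slice.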
24. Fundamental abstractions - 14.3
- Fundamental abstractions of streaming media on demand (Section 14.3)
- The section covers
  - leaky bucket models of bit streams
  - constant bit rate (CBR) vs. variable bit rate (VBR)
  - compound (multiple media) streams
  - preroll delay
  - playback speed
  - timing: clocks, decoder and presentation timestamps
- Goal: know when it is safe for the client to begin playback
25. Buffering and leaky bucket models
- Scenario 1: constant bit rate (CBR)
  - isochronous, noiseless communication channel
  - encoder buffer between the encoder and the channel
  - decoder buffer between the channel and the decoder
  - schedule: the sequence of times at which successive bits of the encoded bit stream pass a given point in the pipeline
- Isochronous: equal amounts of data are communicated in equal amounts of time
Figure 14.3
Figure 14.4
(Figure labels: B bits; encoding buffer; decoding buffer)
26. Buffer tube
- The CBR schedule can be viewed as a buffer tube
- Characterised by 3 parameters
  - R: slope (bit rate)
  - B: height of the tube in bits
  - Fe: offset (initial fullness) from the bottom of the tube
  - or equivalently Fd: offset from the top of the tube, $F_d = B - F_e$
- From a buffer point of view
  - overflow of the encoder buffer => underflow of the decoder buffer
  - underflow of the encoder buffer => overflow of the decoder buffer
  - B = size of the encoder buffer = size of the decoder buffer
  - Fe: initial fullness of the encoder buffer
- The tube is managed by a rate control algorithm
  - which assigns a number of bits b(n) to each frame n
27. Buffer tube
- De: initial delay before data enters the channel, $D_e = F_e / R$
- Dd: delay before the decoder extracts data received from the channel, $D_d = F_d / R$
(R, B, F) tube
- Aim: keep the decoder buffer delay $D_d = F_d / R$ low
Figure 14.5
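A worked example of the tube delays, with illustrative numbers (not taken from the book): R = 1 Mbit/s, B = 2 Mbit, Fe = 0.5 Mbit. Because Fd = B - Fe, the two delays always sum to B/R. For a fixed B, a fuller encoder buffer at the start therefore means a shorter decoder buffer delay.

```latex
\begin{aligned}
D_e &= \frac{F_e}{R} = \frac{0.5\ \text{Mbit}}{1\ \text{Mbit/s}} = 0.5\ \text{s} \\
F_d &= B - F_e = 2 - 0.5 = 1.5\ \text{Mbit} \\
D_d &= \frac{F_d}{R} = \frac{1.5\ \text{Mbit}}{1\ \text{Mbit/s}} = 1.5\ \text{s} \\
D_e + D_d &= \frac{F_e + F_d}{R} = \frac{B}{R} = 2\ \text{s}
\end{aligned}
```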
28. Variable bit rate stream (VBR)
- Scenario 2: variable bit rate (VBR) stream
- Unlike CBR, VBR has a variable amount of data per time segment
  - higher bit rate for complex segments
  - lower bit rate for less complex segments
- VBR streams tend to need wider buffer tubes
  - => larger start-up delay
- Part of an overall problem: it is difficult to determine the average bit rate of the stream
29. Variable bit rate stream (VBR)
- Recall the (R, B, F) tube
  - the parameters are not unique for a given bit stream
- Defining the average rate is non-trivial, e.g.
  - fit the closest slope along the staircase (the cumulative bit count over time), or
  - number of bits in the stream / duration of the stream
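Both definitions can be computed from per-frame bit counts. In this sketch, the first function divides total bits by duration. The second fits a least-squares line through the cumulative bit count, which is one reasonable reading of "closest slope" and an assumption here. For CBR the two agree, while for bursty VBR they generally differ. Frame sizes and the frame rate are made-up values.

```python
def average_rate_total(bits, fps):
    """Number of bits in the stream divided by its duration (bits/s)."""
    return sum(bits) / (len(bits) / fps)

def average_rate_fit(bits, fps):
    """Least-squares slope of the cumulative bit count versus time (bits/s)."""
    times, cumulative, total = [], [], 0
    for n, b in enumerate(bits):
        total += b
        times.append((n + 1) / fps)   # staircase step n placed at the end of frame n
        cumulative.append(total)
    t_mean = sum(times) / len(times)
    c_mean = sum(cumulative) / len(cumulative)
    cov = sum((t - t_mean) * (c - c_mean) for t, c in zip(times, cumulative))
    var = sum((t - t_mean) ** 2 for t in times)
    return cov / var

vbr = [40_000, 10_000, 10_000, 40_000, 10_000, 10_000]
print(average_rate_total(vbr, 30), average_rate_fit(vbr, 30))
```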
30. Variable bit rate
- The encoder does not use the channel continuously
- The channel has a peak transmission rate R higher than the average stream bit rate
  - when needed, packets are sent at rate R
  - otherwise at rate 0
  - typical of packet networks and shared channels
- Best modelled by a leaky bucket
- Defined by (R, B, Fe)
  - n: frame number
  - b(n): number of bits placed in the leaky bucket for frame n
  - t(n): time at which frame n is processed
  - R: bit rate of data leaking out of the bucket
  - Fe(n): fullness of the encoder buffer before frame n is added
  - Be(n): fullness of the encoder buffer after frame n is added
- Schedule:
  - $B_e(n) = F_e(n) + b(n)$
  - $F_e(n+1) = \max\{0,\ B_e(n) - R\,(t(n+1) - t(n))\}$, with $F_e(0) = F_e$
31. Leaky bucket
- Be(n): fullness of the encoder buffer after frame n is added to the bucket
- Fe(n): fullness of the encoder buffer before frame n is added to the bucket
- No overflow: $B_e(n) \le B$ for all $n = 0, 1, \ldots, N$
- Aim: find the smallest decoder buffer size and the smallest decoder buffer delay
32. Leaky bucket
- For a given stream, define
  - the minimum bucket capacity with leak rate R and initial fullness Fe: $B_{\min}(R, F_e) = \max_n B_e(n)$
  - the corresponding initial decoder buffer fullness: $F_{d,\min}(R, F_e) = B_{\min}(R, F_e) - F_e$
- It follows that there is a minimum capacity B, as well as a minimum decoder buffer delay $D_d = F_d / R$, provided the bucket starts with initial fullness $F_e = F_e^{\min}(R)$
- Source coding rate Rc: the maximum leak rate R such that a leaky bucket (R, B, Fe) does not underflow with initial fullness $F_e = F_e^{\min}(R)$
- Larger leak rates R => smaller required capacity
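The definitions above translate into a few lines of code. This sketch assumes frame times in seconds and bits per frame given as plain lists; the function names are hypothetical and the frame sizes are made up. It computes Be(n) with the standard leaky bucket recursion, Be(n) = Fe(n) + b(n) and Fe(n+1) = max{0, Be(n) - R(t(n+1) - t(n))}. From those values it derives Bmin(R, Fe) = max_n Be(n), the decoder fullness Bmin - Fe, and the corresponding preroll delay Fd/R.

```python
def encoder_fullness_after(bits, times, rate, fe0=0.0):
    """Be(n) for each frame n of a leaky bucket with leak rate `rate` (bits/s)."""
    fe = fe0
    be_list = []
    for n, b in enumerate(bits):
        be = fe + b                     # Be(n) = Fe(n) + b(n)
        be_list.append(be)
        if n + 1 < len(bits):           # leak until the next frame arrives
            fe = max(0.0, be - rate * (times[n + 1] - times[n]))
    return be_list

def bucket_parameters(bits, times, rate, fe0=0.0):
    """Return (Bmin, Fd, preroll) for leak rate `rate` and initial fullness fe0."""
    b_min = max(encoder_fullness_after(bits, times, rate, fe0))
    f_d = b_min - fe0                   # Fd = B - Fe
    return b_min, f_d, f_d / rate       # preroll delay Dd = Fd / R

fps = 30
bits = [40_000, 40_000, 10_000, 10_000, 10_000, 10_000]   # average rate 600 kbit/s
times = [n / fps for n in range(len(bits))]
for rate in (600_000, 1_200_000):
    print(rate, bucket_parameters(bits, times, rate))
```

Doubling the leak rate in this example shrinks the required capacity from about 60 kbit to about 40 kbit and cuts the preroll from 0.1 s to about 0.033 s, matching the last bullet.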
33. Leaky bucket
- If the transmission rate R > the source coding rate Rc
  - the required decoder buffer size is reduced
  - the decoder buffer delay is also reduced
- The client can determine the required buffer size and preroll delay
  - using the functions Bmin(R) and Fdmin(R)
  - computed offline at a set of transmission rates $R_1 < R_2 < \cdots < R_L$
  - stored in the bit stream header as a set of leaky bucket parameters $(R_i, B_i, F_i)$, where $B_i = B_{\min}(R_i)$ and $F_i = F_{d,\min}(R_i)$
  - the points $i = 1, \ldots, L$ are the breakpoints of the piecewise linear functions Bmin(R) and Fdmin(R)
  - by linear interpolation (and extrapolation at the ends), Bmin(R) and Fdmin(R) can be estimated at any rate R
Figure 14.7
34. Leaky bucket
- Linear interpolation of Bmin(R) and Fdmin(R)
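Given the header's (Ri, Bi, Fi) breakpoints, a client can estimate Bmin(R) and Fdmin(R) at its actual rate R and derive a preroll delay of Fdmin(R)/R. This sketch assumes the breakpoints are sorted by rate and that there are at least two of them. Clamping extrapolated values at zero is an added assumption, and the sample breakpoints are hypothetical.

```python
from bisect import bisect_left

def interpolate_bucket(params, rate):
    """Estimate (Bmin(rate), Fdmin(rate)) from breakpoints (Ri, Bi, Fi) sorted by Ri."""
    if len(params) < 2:
        raise ValueError("need at least two breakpoints")
    rates = [r for r, _, _ in params]
    k = bisect_left(rates, rate)
    # use the segment containing `rate`; the first or last segment extrapolates
    i = min(max(k - 1, 0), len(params) - 2)
    (r0, b0, f0), (r1, b1, f1) = params[i], params[i + 1]
    w = (rate - r0) / (r1 - r0)
    b = max(0.0, b0 + w * (b1 - b0))
    f = max(0.0, f0 + w * (f1 - f0))
    return b, f

header = [(100e3, 400e3, 300e3), (200e3, 200e3, 120e3), (400e3, 80e3, 40e3)]
b, f = interpolate_bucket(header, 150e3)
print(b, f, f / 150e3)   # buffer size (bits), initial fullness (bits), preroll (s)
```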
35. Compound streams (Section 14.3.2)
- Compound streams encapsulate many streams meant to be played and streamed concurrently
- View them as a single compound stream with a set of leaky buckets
  - a leaky bucket (R, B, F) for the compound stream is the sum of its component leaky buckets
  - e.g. if audio has bucket (Ra, Ba, Fa) and video has bucket (Rv, Bv, Fv), the parameters add: $R = R_a + R_v$, $B = B_a + B_v$, $F = F_a + F_v$
- Find a combination of component leaky buckets such that the combined leaky bucket won't overflow
36. Compound streams
- Find a combination of component leaky buckets such that the combined leaky bucket won't overflow
  - i.e. a combination of breakpoint i among the La audio buckets and j among the Lv video buckets
- Minimising with a Lagrangian shows that at most La + Lv index pairs (i, j) lie on the resulting set
- This extends to M concurrent media streams
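A brute-force sketch of the pairing problem follows. It adds each audio and video bucket componentwise, then keeps the pair with the smallest combined buffer size among those whose combined rate fits a rate budget. Minimising buffer size is an assumed objective for illustration; the Lagrangian approach on this slide avoids enumerating all La × Lv pairs. The bucket values are hypothetical.

```python
from itertools import product

def combine(*buckets):
    """Componentwise sum of leaky buckets given as (R, B, F) tuples."""
    return tuple(sum(values) for values in zip(*buckets))

def best_pair(audio, video, rate_budget):
    """Brute force over all (i, j): smallest combined B with combined R <= budget."""
    best = None
    for (i, a), (j, v) in product(enumerate(audio), enumerate(video)):
        r, b, f = combine(a, v)
        if r <= rate_budget and (best is None or b < best[2][1]):
            best = (i, j, (r, b, f))
    return best

audio = [(32e3, 64e3, 32e3), (64e3, 40e3, 20e3)]
video = [(300e3, 900e3, 600e3), (500e3, 500e3, 300e3), (800e3, 300e3, 150e3)]
print(best_pair(audio, video, 600e3))
```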
37. Multiple bit rate (MBR)
- Multiple independent encodings (each at a different coding rate) are stored in separate streams in the same file
  - a choice can be made of which streams to play
- The streams are mutually independent, each at a different source coding rate
- Combining all possible mutually exclusive streams (e.g. Na audio and Nv video streams) gives combinations, each with a different leaky bucket
  - most of the Na × Nv combinations are unlikely to be useful; typically only about Na + Nv are
- Use a distortion-rate approach to decide
38. Distortion-rate approach
- Decide which streams to pair
  - assign a distortion $D_i^a$ and source coding rate $R_i^a$ to each audio stream, $i = 0, \ldots, N_a$
  - assign a distortion $D_j^v$ and source coding rate $R_j^v$ to each video stream, $j = 0, \ldots, N_v$
- For each combined stream (i, j), define the distortion and source coding rate
  - $D_{ij} = \alpha D_i^a + D_j^v$, $R_{ij} = R_i^a + R_j^v$
  - where α is an arbitrary weight of audio distortion relative to video distortion
- Using a Lagrangian again, find the combination with the lowest total distortion among all combinations with the same or lower total bit rate
- This extends to other sets of media
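This sketch of the pairing step assumes the combined definitions D(i,j) = α·D_i^a + D_j^v and R(i,j) = R_i^a + R_j^v. It enumerates all pairs and discards any pair beaten by another with the same or lower rate and lower distortion. It then picks the pair minimising D + λR. The rate and distortion values, α and λ are hypothetical.

```python
def pair_points(audio, video, alpha):
    """All (R, D, i, j) for audio/video streams given as (rate, distortion)."""
    return [(ra + rv, alpha * da + dv, i, j)
            for i, (ra, da) in enumerate(audio)
            for j, (rv, dv) in enumerate(video)]

def efficient_pairs(points):
    """Keep pairs not dominated by a pair with the same or lower rate."""
    kept, best_d = [], float("inf")
    for r, d, i, j in sorted(points):
        if d < best_d:
            kept.append((r, d, i, j))
            best_d = d
    return kept

def lagrangian_choice(points, lam):
    """Pair (i, j) minimising D + lam * R."""
    r, d, i, j = min(points, key=lambda p: p[1] + lam * p[0])
    return i, j

audio = [(16, 8.0), (32, 4.0), (64, 2.0)]        # (kbit/s, distortion)
video = [(100, 50.0), (300, 20.0), (700, 10.0)]
points = efficient_pairs(pair_points(audio, video, alpha=2.0))
print(len(points), lagrangian_choice(points, 0.05), lagrangian_choice(points, 0.01))
```

A smaller λ puts less weight on rate, so the choice moves towards higher-rate, lower-distortion pairs.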
39. Temporal coordinate systems and timestamps (Section 14.3.4)
- Each frame has a decoding timestamp (DTS, in MPEG terminology)
  - instructs the client when to decode the frame
  - also acts as a decoding deadline
- A presentation buffer holds decoded frames before the renderer
  - each frame is assigned a presentation timestamp (PTS), which instructs the client when to play it
  - critical for synchronising different streams
  - PTS are a layer above the DTS
- Note that presentation order ≠ decoding order, e.g.
  - I0, B1, B2, P3, B4, B5, P6, ... (presentation order)
  - I0, P3, B1, B2, P6, B4, B5, ... (decoding order)
- It is assumed that frames are time stamped with both DTS and PTS
  - the book uses only DTS
40. Clocks (temporal coordinate systems)
- Media time t: clock of the device used to capture and timestamp the original content (real time)
- Client time t': clock of the device playing the content
  - e.g. DTS in media time: tDTS(0), tDTS(1), etc.
  - the same DTS in client time: t'DTS(0), t'DTS(1), etc.
- Conversion: $t' = t'_0 + (t - t_0)/v$
  - where v is the playback rate (v = 2 => playing at 2x speed)
  - t0 and t'0 are the media and client times of a common initial event (e.g. the first frame after seeking or rebuffering)
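41. Leaky bucket update
- In client time, the leaky bucket update becomes
  - $F_e(n+1) = \max\{0,\ B_e(n) - R'\,(t'(n+1) - t'(n))\} = \max\{0,\ B_e(n) - (R'/v)\,(t(n+1) - t(n))\}$
- where
  - $R' = Rv$ is the arrival rate of bits into the client (bits per unit of client time)
  - $R = R'/v$ is the rate that must be used to compute the required buffer size Bmin(R) and the initial decoder buffer fullness Fdmin(R)
- The preroll delay is $F_{d,\min}(R)/R' = F_{d,\min}(R)/(Rv)$
  - for a fixed R, a larger playback speed => a smaller preroll delay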
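This is a sketch of the timing relations above. It converts a media-time timestamp to client time with t' = t'0 + (t - t0)/v. It also computes the preroll delay at playback speed v as Fdmin(R)/R' with R = R'/v, given some Fdmin(R) function (for example, one interpolated from header breakpoints). The `example_fd_min` function is a hypothetical stand-in that decreases with rate.

```python
def media_to_client(t, t0, t0_client, v):
    """t' = t'_0 + (t - t_0) / v."""
    return t0_client + (t - t0) / v

def preroll_delay(fd_min, arrival_rate_client, v):
    """Preroll (client seconds) = Fdmin(R) / R', with R = R' / v in bits per media second."""
    r_media = arrival_rate_client / v
    return fd_min(r_media) / arrival_rate_client

def example_fd_min(rate):
    # hypothetical Fdmin(R): less initial fullness is needed at higher rates
    return max(0.0, 600_000 - 0.5 * rate)

print(media_to_client(10.0, t0=4.0, t0_client=100.0, v=2.0))   # 103.0
for v in (1.0, 2.0):
    print(v, preroll_delay(example_fd_min, 1_000_000, v))
```

In this example R' is held fixed, so R = R'/v drops as v grows and the preroll rises from 0.1 s to 0.35 s. If R is held fixed instead, Fdmin(R)/(Rv) shrinks as v grows.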
43. Packet networks - 14.4
- RC: source coding rate
- RS: sending rate, the rate at which data is injected into the transport layer
  - measured in bits/s of client time
- RX: transmission rate, the rate at which the transport layer (TCP or UDP) injects data into the network layer
  - RX - RS = error control overhead
  - RS / RX = channel coding rate
- Ra: arrival rate
  - assumed to equal RS
  - usually set to $R_a = v R_c$
- Decoupling Rc and Ra has advantages
Figure 14.8a
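The rate relations can be checked with simple arithmetic. In this sketch, with made-up rates, the arrival rate Ra is taken to equal RS. At playback speed v the decoder consumes v·Rc bits per second of client time, so the client buffer gains Ra - v·Rc bits per second. This gain is positive only when the arrival rate exceeds the consumption rate.

```python
def rate_summary(rc, rs, rx, v=1.0):
    """Rates in bits/s of client time; assumes the arrival rate Ra equals RS."""
    ra = rs
    return {
        "error_control_overhead": rx - rs,          # RX - RS
        "channel_coding_rate": rs / rx,             # RS / RX
        "buffer_growth_bits_per_s": ra - v * rc,    # Ra - v * Rc
    }

print(rate_summary(rc=500_000, rs=600_000, rx=660_000))
```

With Ra = v·Rc exactly (here v = 1.2), the buffer neither grows nor drains; decoupling Ra from v·Rc is what lets the buffer be built up or drawn down on purpose.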
44. Decoupling Ra and vRc
- Adjusting the source coding rate is the problem of source coding rate control
  - choose Rc as a function of Ra, the client buffer duration, and their history
- The stream offers a variety of average bit rates R(1), R(2), ...
  - each with a tight buffer tube (R(i), B(i), Fe(i))
- Playback can be delayed to guarantee continuous playback
45. Control theoretic model - 14.4.2.1
- Client buffer duration: the gap between frame n's arrival time ta(n) and its playback deadline td(n)
  - overflow when the gap is too large
  - underflow when the gap is too small
- If the gap shrinks, Rc must be reduced to adjust tb(n)
Figure 14.9
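The effect of Rc on the gap can be seen in a toy model: frames of Rc/f bits arrive back to back at rate Ra, and frame n must be played at td(n) = T0 + n/f. This sketch assumes constant rates, playback speed 1 and a preroll T0, and all values are hypothetical.

```python
def buffer_gaps(rc, ra, fps, preroll, n_frames):
    """Gap td(n) - ta(n) in seconds for frames of rc/fps bits arriving at ra bits/s."""
    bits_per_frame = rc / fps
    gaps = []
    for n in range(n_frames):
        t_arrival = (n + 1) * bits_per_frame / ra   # frame n fully received
        t_deadline = preroll + n / fps              # frame n due for playback
        gaps.append(t_deadline - t_arrival)
    return gaps

steady = buffer_gaps(rc=500_000, ra=500_000, fps=25, preroll=1.0, n_frames=5)
shrinking = buffer_gaps(rc=600_000, ra=500_000, fps=25, preroll=1.0, n_frames=5)
growing = buffer_gaps(rc=250_000, ra=500_000, fps=25, preroll=1.0, n_frames=5)
print(steady, shrinking, growing, sep="\n")
```

When Rc exceeds Ra the gap shrinks towards underflow. The growing case, Rc = Ra/2, corresponds to the start-up strategy in Section 14.4.2.6 of building up the client buffer duration.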
46. Control objective - 14.4.2.2
- Underflow is prevented by the model in the previous section
- Quality fluctuates with the complexity of the content
- The target schedule provides a margin of safety
- The cost function includes penalties for
  - deviation of the buffer tube from the target schedule
  - the difference in coding rate between successive frames
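This sketch of such a cost assumes a quadratic form, the usual choice for this kind of control problem but an assumption here. A hypothetical weight sigma trades buffer deviation against rate changes: J = Σn (tb(n) - tT(n))² + σ Σn (r(n) - r(n-1))².

```python
def control_cost(tb, target, rates, sigma):
    """Quadratic cost: squared deviation of tb(n) from the target tT(n),
    plus sigma times the squared change in coding rate between successive frames."""
    deviation = sum((b - t) ** 2 for b, t in zip(tb, target))
    smoothness = sum((r1 - r0) ** 2 for r0, r1 in zip(rates, rates[1:]))
    return deviation + sigma * smoothness

print(control_cost([1.0, 2.0], [1.0, 1.5], [100.0, 110.0], sigma=0.01))  # 1.25
```

A larger sigma favours smooth quality at the cost of drifting further from the target schedule.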
47. Target schedule design - 14.4.2.3
- Want the smallest client buffer duration at start-up
  - start with a small delay, then increase the gap
- The slope is the ratio of the average source coding rate to the average arrival rate
- If the upper bound of the buffer tube aligns with the target schedule
  - $t_b(n) = t_T(n)$
- Eventually, logarithmic growth of the buffer duration is wanted
Figure 14.10
48. Controller design - 14.4.2.4
- Adjusts the source coding rate
- At time n, the controller changes the coding rate of frame n + 2
- Uses the notion of an error e(n) and a vector feedback gain G
- The optimal G is solved for
49. Controller interpretation - 14.4.2.6
- A virtual frame rate is used to reduce the feedback rate, and because it is difficult to specify a frame rate for merged streams
- Start with a source coding rate of 1/2 the arrival rate to build up the client buffer duration
Figure 14.11a