Understanding User Behavior in Large Scale Video-on-Demand Systems

1
Understanding User Behavior in Large Scale
Video-on-Demand Systems
  • Hongliang Yu, Dongdong Zheng, Ben Y. Zhao, Weimin
    Zheng
  • Tsinghua University and UC Santa Barbara
  • Eurosys Conference 2006

2
Motivation
  • VOD: the future of media networks
  • Select your favorite movies as you like, any
    time, anywhere: impressive
  • In China, as of Jan. 2005: 8 million VOD users, 5
    million of them frequent users, growing at a
    rate of 35% per year (China Telecommunication
    Newspaper)
  • Globally: 90 million VOD users in 2003, 138
    million users in 2005, 327 million users
    estimated for 2010 (Information Media Group)
  • Most current systems are not true VOD. Business
    reasons? Technical reasons?

3
Motivation
  • Characteristics of VOD
  • Multiple data sources
  • Asynchronous data streams
  • High interactivity (VCR operations)
  • Challenges
  • High Network Bandwidth
  • High Random I/O capacity
  • Technical approaches
  • Caching Policies
  • Data replication
  • Distributed content delivery
  • Providing VOD service to a huge number of clients
    in a scalable way is still unsolved

4
Motivation
  • It is challenging to build a user behavior model
    for VOD system optimization
  • Little is known about user behavior in deployed
    large-scale VOD systems: chicken and egg?
  • Researchers so far have based their studies on
    rental data from video stores, small-scale VOD
    systems, or web streaming services
  • Video rental: not enough video titles, limited
    physical copies
  • Web streaming: narrow-band service, smaller file
    sizes, and poor video quality, all of which
    strongly affect user behavior

5
The focus of this paper
  • Things useful to video streaming system design
    and maintenance
  • What is the user arrival rate in such a system?
  • In what situations do people stay patient?
  • What parts of the content do people tend to visit?
  • How do user interests change over time?
  • What features should we keep in such services?

6
Source of Data
  • Log data from an infrastructure-based large-scale
    video-on-demand service deployed in China
  • The system has over 1.5 million users in total;
    we use data from one region with about 150
    thousand users
  • 21,498,338 sessions in 219 days; 7,036 movies
    involved
  • Movie length: 38.23% > 90 min, 41.76% 45-90 min
  • Average data rate is about 384 Kbps (512K ADSL
    support)

7
Outline
  • Motivation
  • Source of Data
  • Poisson Distribution
  • Session Length
  • User Interests
  • Summary

8
User arrival rate
[Figure: probability (PROB) vs. user arrivals per 5 sec]
  • 0-27 arrivals per 5 seconds; does not match the
    Poisson distribution

9
User arrival rate
[Figure: probability (PROB) vs. user arrivals per 5 sec]
  • 0-27 arrivals per 5 seconds; does not match the
    Poisson distribution
  • Guess: system idle time may be responsible for
    the failure of the Poisson fit

10
User Arrival Pattern
[Figure: probability (PROB) vs. user arrivals per 5 sec]
  • Using data from rush hour (6 PM to 9 PM): similar
    shape to Poisson

12
User Arrival Pattern
[Figure: probability (PROB) vs. user arrivals per 5 sec]
  • Using data from rush hour (6 PM to 9 PM): similar
    shape to Poisson
  • A modified version of Poisson fits the real
    workload well

[Equation: modified Poisson distribution, X = 0, 1, 2, ...]
13
Indication
  • The Poisson distribution underestimates the
    probability of small arrival counts and
    overestimates the probability of large ones,
    which leads to inefficient resource reservation
  • With the modified model, you can design for the
    maximum user arrival rate (N) according to user
    requirements and the investment plan
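
The comparison behind these slides can be sketched as follows: count arrivals per 5-second window, then set the observed frequencies against a Poisson distribution with the same mean. This is only a minimal sketch. The `arrivals` list is made-up illustrative data, not the trace, and the function names are hypothetical. The modified Poisson formula from the talk is not reproduced here because it did not survive in the transcript.

```python
import math
from collections import Counter


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def empirical_pmf(counts: list[int]) -> dict[int, float]:
    """Fraction of windows that saw exactly k arrivals, for each observed k."""
    total = len(counts)
    return {k: n / total for k, n in sorted(Counter(counts).items())}


# Hypothetical arrivals per 5-second window (NOT data from the trace).
arrivals = [0, 0, 0, 1, 0, 2, 0, 0, 5, 1, 0, 0, 3, 0, 9, 0, 1, 0, 0, 2]
lam = sum(arrivals) / len(arrivals)  # Poisson mean estimated from the sample

observed = empirical_pmf(arrivals)
for k, p_obs in observed.items():
    print(f"k={k}: observed={p_obs:.3f}  poisson={poisson_pmf(k, lam):.3f}")
```

Large gaps between the two columns are the kind of mismatch the slides report for the full-day data.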

14
Outline
  • Motivation
  • Source of Data
  • Poisson Distribution
  • Session Length
  • User Interests
  • Summary

15
Session length: impatient audience
[Figure: CDF of session length (minutes)]
  • 37% of users terminate their session in the first
    5 minutes
  • 52.55% in 10 minutes
  • 75% in 25 minutes
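
Figures like these are points on the empirical CDF of session length. A minimal sketch of how such a point could be computed, assuming session lengths are available in minutes; the sample values and the name `fraction_ended_by` are hypothetical:

```python
def fraction_ended_by(session_minutes: list[float], t: float) -> float:
    """Empirical CDF: share of sessions whose length is at most t minutes."""
    if not session_minutes:
        raise ValueError("need at least one session")
    return sum(1 for s in session_minutes if s <= t) / len(session_minutes)


# Hypothetical session lengths in minutes (NOT taken from the trace).
sessions = [1, 2, 4, 4, 7, 9, 12, 20, 45, 95]
for t in (5, 10, 25):
    print(f"{fraction_ended_by(sessions, t):.0%} of sessions end within {t} minutes")
```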

16
Session length: related to popularity?
[Figure: CDF of NSL]
  • NSL: the ratio SessionLength / VideoLength
  • Expected: movies with higher popularity have
    longer session length.

17
Session length: related to popularity?
[Figure: CDF of NSL]
  • NSL: the ratio SessionLength / VideoLength

18
Session length: related to popularity?
[Figure: CDF of NSL]
  • NSL: the ratio SessionLength / VideoLength
  • Movies with HIGHER popularity tend to have
    SHORTER session length!

Surprise!!!
19
Session length: related to popularity?
[Figure: CDF of NSL]
  • NSL: the ratio SessionLength / VideoLength
  • A relation between movie popularity and session
    length does exist, but it is not very strong
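
NSL as defined above is straightforward to compute per session and then average per movie. The sketch below assumes session records of the form (movie, minutes watched, video length in minutes). The records and movie labels are invented for illustration and do not reproduce the measured result.

```python
from statistics import mean


def nsl(session_length: float, video_length: float) -> float:
    """Normalized session length: SessionLength / VideoLength."""
    return session_length / video_length


# Hypothetical (movie, session minutes, video minutes) records.
sessions = [
    ("popular", 10, 100), ("popular", 30, 100), ("popular", 5, 100),
    ("niche", 60, 120), ("niche", 90, 120),
]

by_movie: dict[str, list[float]] = {}
for movie, watched, total in sessions:
    by_movie.setdefault(movie, []).append(nsl(watched, total))

mean_nsl = {movie: mean(values) for movie, values in by_movie.items()}
print(mean_nsl)
```

Normalizing by video length lets sessions on movies of different durations be compared on one scale.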

20
Example: caching optimization
  • Movie A: A0 A1 A2 A3
  • Movie B: B0 B1 B2 B3
  • Movie C: C0 C1 C2 C3
  • Movie A is the most popular movie, movie B
    second, movie C last

Caching priority
  • List 1: A0 A1 A2 A3 B0 B1 B2 B3 C0 C1 C2 C3
  • List 2: A0 A1 B0 C0 B1 C1 A2 B2 B3 C2 A3 C3
  • The latter priority list is more reasonable
  • Not every part of the most popular movie should
    be stressed
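
One way to arrive at a priority list like the second one is to rank segments by their own access counts instead of by the popularity of the whole movie. The sketch below assumes per-segment hit counters exist. The counts are hypothetical and were chosen only so that the segment-level ordering matches the slide.

```python
# Hypothetical per-segment access counts; only the ordering matters here.
segment_hits = {
    "A0": 100, "A1": 90, "A2": 40, "A3": 15,
    "B0": 80, "B1": 60, "B2": 30, "B3": 25,
    "C0": 70, "C1": 50, "C2": 20, "C3": 10,
}
movie_rank = {"A": 0, "B": 1, "C": 2}  # A most popular, C least


def by_movie(segments: dict[str, int]) -> list[str]:
    """Whole-movie priority: all of A, then all of B, then all of C."""
    return sorted(segments, key=lambda s: (movie_rank[s[0]], int(s[1:])))


def by_segment(segments: dict[str, int]) -> list[str]:
    """Segment-level priority: most-accessed segments first."""
    return sorted(segments, key=lambda s: -segments[s])


print(" ".join(by_movie(segment_hits)))
print(" ".join(by_segment(segment_hits)))
```

Because many sessions end early, the leading segments of less popular movies can outrank the trailing segments of the most popular one.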

21
Example: ALM optimization
  • Movies 1, 2, 3, 4 are ordered from least popular
    to most popular

[Figure: two ALM trees over nodes A, B, C, D, each node viewing one of movies 1-4; the left and right trees arrange the nodes differently]
The right ALM tree has a better chance of being
stable
22
Indication
  • Caching the prefix is effective
  • Popularity may not reflect the potential of the
    content
  • On eBay, high-reputation users later gain much
    higher reputation than they already had
  • In Powerinfo VOD, high-reputation movies are not
    always that attractive; people are drawn only by
    the reputation
  • A caching policy based on per-segment popularity
    counts is more effective
  • Placing nodes that view relatively colder content
    near the root of the ALM tree will be effective
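
The last point can be sketched as a tree-building rule for the ALM (presumably application-layer multicast) overlay: sort the nodes by the popularity of what they are watching and fill the tree breadth-first, coldest first. The reasoning is an inference from the earlier slides: viewers of popular movies tend to leave sooner, so they cause less disruption as leaves. The node-to-movie assignment, the fanout, and the function name are all hypothetical.

```python
def build_alm_tree(viewers: dict[str, int], fanout: int = 2) -> dict[str, list[str]]:
    """Return parent -> children, filling the tree breadth-first.

    viewers maps a node name to the popularity rank of the movie it is
    watching (1 = least popular). Colder viewers are placed first, so they
    end up closest to the root.
    """
    order = sorted(viewers, key=lambda node: viewers[node])
    children: dict[str, list[str]] = {node: [] for node in order}
    for i, node in enumerate(order[1:], start=1):
        parent = order[(i - 1) // fanout]
        children[parent].append(node)
    return children


# Hypothetical assignment of nodes to movies 1-4 (1 = least popular).
viewers = {"A": 4, "B": 2, "C": 1, "D": 3}
tree = build_alm_tree(viewers)
print(tree)
```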

23
Outline
  • Motivation
  • Source of Data
  • Poisson Distribution
  • Session Length
  • User Interests
  • Summary

24
User interests distribution
[Figure: CDF over movie index (sorted by popularity)]
  • 10% of objects cover 60% of accesses
  • 23% of objects get 80% of the hits
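
Concentration figures of this kind can be computed by ranking objects by access count and finding the smallest top share that reaches a target fraction of all accesses. A minimal sketch with invented counts; `share_of_objects_needed` is a hypothetical helper name:

```python
def share_of_objects_needed(access_counts: list[int], target: float) -> float:
    """Smallest fraction of objects (most popular first) covering `target` of accesses."""
    ranked = sorted(access_counts, reverse=True)
    total = sum(ranked)
    running = 0
    for i, count in enumerate(ranked, start=1):
        running += count
        if running >= target * total:
            return i / len(ranked)
    return 1.0


# Hypothetical per-movie access counts (NOT from the trace).
counts = [450, 200, 120, 80, 50, 40, 25, 15, 12, 8]
print(share_of_objects_needed(counts, 0.6))
print(share_of_objects_needed(counts, 0.8))
```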

25
User interests transferring
[Figure: rate vs. hour]
  • User interest changes slowly

26
Understanding popularity: recommendation
[Figure: ADA, videos sorted by maximum daily access]
  • ADA: average daily access / maximum daily access
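
A short sketch of the ADA ratio as defined on this slide, assuming a list of daily access counts per video. The two sample series are invented to contrast a steadily watched title with one that has a single burst.

```python
def ada(daily_accesses: list[int]) -> float:
    """ADA = average daily access / maximum daily access."""
    peak = max(daily_accesses)
    if peak == 0:
        return 0.0
    return (sum(daily_accesses) / len(daily_accesses)) / peak


# Hypothetical daily access counts for two movies.
steady = [40, 50, 45, 50, 40]   # watched evenly over time
spiky = [2, 3, 100, 5, 0]       # one burst day, little interest otherwise

print(f"steady ADA = {ada(steady):.2f}, spiky ADA = {ada(spiky):.2f}")
```

An ADA near 1 means daily demand stays close to its peak; a value near 0 means the peak was a short-lived spike.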

27
Indication
  • User interests change slowly
  • Interest inducement: user interests can be
    induced with mechanisms like movie
    recommendation
  • Features like movie recommendation benefit
    performance

28
Summary
  • Indications
  • Poisson over-estimates the probability of large
    arrivals
  • Caching and forwarding with regard to content
    popularity will be necessary
  • Features like content recommendation greatly
    benefit caching policy
  • Future work
  • VCR studies
  • Data set open
  • Optimization deployment

29
Thanks!!!
  • Any Questions?

30
Backup
  • Eurosys Conference 2006

31
Global Infrastructure
[Figure: infrastructure diagram showing central servers, regional servers, and edge servers connected over a WAN]
http://www.powerinfo.com.cn
32
Session length: a close view
  • Three kinds of spikes: 1 min, 5 mins, whole length

33
Session length: related to popularity?
[Figure: ASL]
  • There is no strong relation between movie
    popularity and session length

34
User request distribution
  • Differs from Gummadi and Gribble's Kazaa log
    analysis (fetch-at-most-once model)
  • Fits Zipf fairly well except for the ending
    part: big tail

35
User request distribution
  • Many works have raised doubts about the Zipf
    distribution
  • The Kolmogorov-Smirnov test is very useful for
    deciding whether a sample comes from a population
    with a specific distribution, and it is defined as
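
The formula that followed is missing from this transcript. For reference, the standard one-sample Kolmogorov-Smirnov statistic, which may differ in notation from what the slide showed, is:

```latex
D_n = \sup_x \left| F_n(x) - F(x) \right|
```

Here $F_n$ is the empirical CDF of the $n$ observations and $F$ is the CDF of the hypothesized distribution (Zipf, in this talk). A larger $D_n$ means a poorer fit.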

36
User request distribution
0 am to 12 am, 07/08/2004: 23,484 sessions,
coldest day, skew factor 0.18783
11 am to 23 pm, 10/01/2004: 76,771 sessions,
hottest day, skew factor 0.21712
  • Checking the validity of Zipf using the
    Kolmogorov-Smirnov goodness-of-fit test
  • Over the 219 days in total, the skew factor varies
    between 0 and 0.34847; the average is 0.1987

37
Understanding popularity: external factors
  • Surprise 1: sudden drop from top 15
  • Surprise 2: old movie review

38
User interests transferring
[Figure: rate vs. week day]
  • User interest changes slowly
  • In a system with a fixed set of candidate
    objects, user interest will probably shift
    very little

39
Understanding popularity: recommendation
[Figure: ADA, videos sorted by maximum daily access]
  • ADA: average daily access / maximum daily access
  • Recommendation has a great impact on popularity