Potential Costs and Benefits of Long-term Prefetching for CDNs

1
Potential Costs and Benefits of Long-term
Prefetching for CDNs
  • Arun Venkataramani, Praveen Yalagandula, Ravi
    Kokku
  • Sadia Sharif, Mike Dahlin
  • Laboratory for Advanced Systems Research (LASR)
  • Dept. of Computer Sciences, UT Austin
  • 21 June 2001
  • WCW01, Boston

2
Talk focus
  • Aggressive replication can significantly improve
    hit rates at modest costs.
  • Simple selection algorithm
  • Key parameters
  • Object popularity
  • Object update rate

3
Outline
  • Motivation
  • Our approach
  • Evaluation
  • Conclusions and future work

4
Motivation (contd)
  • Passive caching is limited
  • - typical hit rates: 20% to 40%
  • Impact of prefetching

5
Technology trends favor aggressive replication
  • Storage is cheap
  • Today less than $200 per 100 GB
  • Network prices are falling
  • Improving at > 100% per year
  • New technologies
  • - Lower cost of prefetch traffic [Byers98,
    Crovella98]
  • User time is valuable

6
Short-term vs. long-term prefetching (LTP)
  • Short-term prefetching
  • Use last few requests to predict next few
    requests
  • Widely studied [Bestavros95, Padm96, Cunha97,
    Cohen98]
  • Long-term prefetching
  • Replicate and update globally popular objects
  • Future work
  • combine short-term and long-term prefetching

7
Naive algorithm fails
[Figure: object hit rate and bandwidth consumed (Kbps)
vs. number of objects prefetched]
8
Bandwidth Equilibrium
  • Equilibrium
  • Rate of incoming objects = Rate of outgoing
    objects

9
Bandwidth Equilibrium
[Figure: rate of object insertion (ReqRate x MissRate(X))
and rate of object invalidation vs. X (number of fresh
objects in cache). Prefetching increases the insertion
rate, moving the system to a new equilibrium; prefetching
long-lived objects reduces the invalidation rate]
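The equilibrium on this slide can be found numerically: the number of fresh objects X settles where the insertion rate, ReqRate x MissRate(X), equals the invalidation rate, X divided by the average lifetime. A minimal sketch, assuming a toy linear miss-rate model (the function name and parameters are illustrative, not from the paper):

```python
def equilibrium_fresh_objects(req_rate, miss_rate, avg_lifetime, steps=10000):
    """Find X where insertion rate equals invalidation rate by relaxation.

    req_rate: request arrival rate (req/sec)
    miss_rate: function mapping X -> miss rate in [0, 1]
    avg_lifetime: mean object lifetime (sec); invalidation rate = X / avg_lifetime
    """
    x = 0.0
    for _ in range(steps):
        insertion = req_rate * miss_rate(x)   # fresh objects entering the cache
        invalidation = x / avg_lifetime       # fresh objects going stale
        x += insertion - invalidation         # relax toward the balance point
    return x
```

With a linear miss-rate model such as `1 - X/20000`, the iteration converges to the point where the two rate curves cross.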
10
Threshold Algorithm
  • Threshold on the probability that a prefetched
    object is accessed before it changes
  • PgoodFetch(i) = 1 - (1 - Pi)^(lf(i) x req_rate)
  • lf(i) = object i's expected lifetime
  • req_rate = avg. arrival rate of requests
  • Pi = object i's probability of access
  • Prefetch object i if
  • PgoodFetch(i) > T
  • Bandwidth blow-up is at most 1/T
  • Equivalent to the value-density heuristic for the
    0-1 Knapsack problem (NP-complete)
  • - within a constant factor (2x) of the optimal
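The threshold test can be written directly from the formula above (function names are illustrative):

```python
def p_good_fetch(p_access, lifetime, req_rate):
    """Probability that a prefetched object is accessed before it changes.

    p_access: per-request access probability of the object (Pi)
    lifetime: expected object lifetime in seconds (lf(i))
    req_rate: average request arrival rate (req/sec)
    """
    return 1.0 - (1.0 - p_access) ** (lifetime * req_rate)

def should_prefetch(p_access, lifetime, req_rate, threshold):
    """Prefetch object i iff PgoodFetch(i) exceeds the threshold T."""
    return p_good_fetch(p_access, lifetime, req_rate) > threshold
```

A popular, long-lived object (many expected requests per lifetime) clears the threshold easily; an unpopular, short-lived one does not.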

11
Evaluation Methodology
  • Analytic evaluation
  • knowledge of global popularity
  • lacks temporal locality
  • Trace-based simulation
  • exhibits temporal locality
  • real object sizes, arrival rates/patterns
  • opportunity to test predictors

12
Analytic Evaluation
  • Assumptions
  • - Poisson model of request arrivals [Cho00]
  • - Fixed universe of one billion objects
  • - Zipf popularity distribution, with parameter
    0.982 [Breslau99]
  • - Sizes follow a lognormal/Pareto distribution
    [Crovella98]
  • - Object lifetime distribution obtained from
    [Douglis97]
  • - No correlation between lifetimes, sizes,
    popularities [Crovella98, Breslau99]
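Under these assumptions, a small sketch can estimate how many objects clear a given threshold. Names and parameter values here are illustrative (the paper's universe is far larger than this example):

```python
def zipf_popularities(n, alpha=0.982):
    """Per-request access probabilities p_i = C / i^alpha, normalized to 1."""
    weights = [i ** -alpha for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def prefetch_set_size(pops, lifetime, req_rate, threshold):
    """Count objects whose PgoodFetch exceeds the threshold.

    pops is sorted in descending popularity, so PgoodFetch is
    descending too and we can stop at the first failure.
    """
    count = 0
    for p in pops:
        pgf = 1.0 - (1.0 - p) ** (lifetime * req_rate)
        if pgf <= threshold:
            break
        count += 1
    return count
```

Lowering the threshold can only grow the prefetch set, which is why bandwidth and cache costs rise as T shrinks.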

13
Analytic results: hit rate
[Figure: object hit rate vs. arrival rate (req/sec) for
several thresholds]
  • Significant improvements in hit rate for small
    thresholds
  • e.g. for T = 0.1 and arrival rate 10 req/sec, the
    hit rate improvement is 13%
  • Benefits across a wide range of cache sizes

14
Analytic results: costs
[Figure: steady-state bandwidth (Kbps) and steady-state
cache size (GB) vs. arrival rate (req/sec), for thresholds
T = 0.01, 0.1, 0.5, 0.9 and demand-only]
  • Increase is modest for large caches
  • At T = 0.1, arrival rate 10 req/sec
  • Total bandwidth < 2x demand bandwidth
  • Total cache size < 2x steady-state cache size

15
Trace-based simulation
  • Input: 12-day (March 2000) trace obtained from a UC
    Squid cache
  • 10 million accesses, 4.2 million unique objects
  • LRU-based demand cache
  • Query URLs considered uncacheable
  • Object lifetimes synthetically generated
  • Predictors
  • Predictor1 has future knowledge, precomputes
    popularities
  • Predictor2 learns popularity from past observations
  • Performance comparison against
  • Demand cache
  • EverFresh
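The LRU demand cache can be sketched as a size-bounded ordered map; this is a simplified stand-in for the simulator, with hypothetical names:

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU demand cache keyed by URL, bounded by total bytes."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # url -> size, most recently used last

    def access(self, url, size):
        """Return True on a hit; on a miss, insert and evict LRU entries."""
        if url in self.entries:
            self.entries.move_to_end(url)
            return True
        self.entries[url] = size
        self.used += size
        # Evict least recently used entries until we fit (keep at least one).
        while self.used > self.capacity and len(self.entries) > 1:
            _, evicted_size = self.entries.popitem(last=False)
            self.used -= evicted_size
        return False
```

A real trace replay would additionally track object freshness against the synthetic lifetimes to decide whether a cached copy still counts as a hit.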

16
Trace results: hit rate
[Figure: object hit rate vs. threshold for Predictor1,
Predictor2, EverFresh, and a demand cache]
  • Predictor2 performs close to EverFresh
  • Predictor1 gives huge improvements
  • At T = 0.1, the hit rate improvement is 10% for
    Predictor2

17
Trace-based results: bandwidth
[Figure: bandwidth (Kbps) vs. 1/threshold for Predictor1,
Predictor2, and a demand cache]
  • Both predictors are within a 2x blow-up for T > 0.1
  • Extremely low Ts are conceivable => aggressive
    replication

18
Conclusions
  • LT prefetching must consider both popularity and
    lifetime
  • LT prefetching can significantly improve hit
    rates at modest costs
  • Analysis shows benefits for a wide range of cache
    sizes
  • Simple predictors work well

19
Research Agenda
  • Two level prefetching system
  • - long-term prefetcher at the proxy/CDN
  • - short-term prefetcher at browser level
  • Improve statistics gathering and prediction
  • - cooperating caches
  • - server assisted hints
  • Extension to a cooperative caching system
  • - object placement problem [PODC01]
  • Minimize interference of prefetching on demand
    traffic

20
Threshold vs. cache size
[Figure: overall cache size (GB) vs. 1/threshold for a
demand cache and Predictor1]
21
Reduction to 0-1 Knapsack
  • Problem input
  • A universe of n objects
  • Popularity distribution pi, i in [1, n]
  • Lifetime distribution li
  • Available bandwidth B, infinite cache
  • Goal: compute the set S of objects to prefetch
    that maximizes hit rate
  • value = PgoodFetch(i)
  • cost = size(i) / lf(i)
  • Value-density = PgoodFetch(i) x lf(i) / size(i)
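The value-density heuristic amounts to sorting objects by value per unit of refresh bandwidth and greedily packing under the budget. A sketch with hypothetical field names (not the paper's code):

```python
def select_prefetch_set(objects, bandwidth_budget):
    """Greedy value-density heuristic for the 0-1 knapsack formulation.

    objects: list of dicts with keys "p_good_fetch", "size" (bytes),
             and "lifetime" (seconds).
    bandwidth_budget: available update bandwidth (bytes/sec).
    """
    def density(o):
        # value / cost = PgoodFetch(i) / (size(i) / lf(i))
        return o["p_good_fetch"] * o["lifetime"] / o["size"]

    chosen, used = [], 0.0
    for o in sorted(objects, key=density, reverse=True):
        cost = o["size"] / o["lifetime"]  # steady-state refresh bandwidth
        if used + cost <= bandwidth_budget:
            chosen.append(o)
            used += cost
    return chosen
```

Because each prefetched object must be re-fetched every lifetime, its knapsack cost is its steady-state refresh bandwidth, size(i)/lf(i).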

22
Trace-based results: bandwidth
[Figure: bandwidth (Kbps) vs. 1/threshold for Predictor1,
Predictor2, and a demand cache]
  • Both predictors remain within a 2x blow-up

23
Challenge Large working set
  • Zipf popularity distribution
  • pi = C / i^a

24
Motivation
  • Passive caching is limited
  • - typical hit rates: 20% to 40%