Title: Potential Costs and Benefits of Long-term Prefetching for CDNs
Slide 1: Potential Costs and Benefits of Long-term Prefetching for CDNs
- Arun Venkataramani, Praveen Yalagandula, Ravi Kokku, Sadia Sharif, Mike Dahlin
- Laboratory for Advanced Systems Research (LASR)
- Dept. of Computer Sciences, UT Austin
- 21 June 2001
- WCW'01, Boston
Slide 2: Talk focus
- Aggressive replication can significantly improve hit rates at modest costs
- Simple selection algorithm
- Key parameters
  - Object popularity
  - Object update rate
Slide 3: Outline
- Motivation
- Our approach
- Evaluation
- Conclusions and future work
Slide 4: Motivation (contd.)
- Passive caching is limited
  - typical hit rates: 20 to 40%
- Impact of prefetching
Slide 5: Technology trends favor aggressive replication
- Storage is cheap
  - Today less than $200/100GB
- Network prices are falling
  - Improving at > 100% per year
- New technologies
  - Lower cost of prefetch traffic [Byers98, Crovella98]
- User time is valuable
Slide 6: Short-term vs. long-term prefetching (LTP)
- Short-term prefetching
  - Use last few requests to predict next few requests
  - Widely studied [Bestavros95, Padm96, Cunha97, Cohen98]
- Long-term prefetching
  - Replicate and update globally popular objects
- Future work
  - Combine short-term and long-term prefetching
Slide 7: Naive algorithm fails
[Figure: object hit rate and bandwidth consumed (Kbps) vs. number of objects prefetched]
Slide 8: Bandwidth Equilibrium
- Equilibrium
  - Rate of incoming objects = Rate of outgoing objects
Slide 9: Bandwidth Equilibrium
[Figure: rate of object insertion (ReqRate × MissRate(X)) and rate of object invalidation, in objects/second, vs. X, the number of fresh objects in the cache; prefetching raises the insertion rate, shifting the cache from the original equilibrium to a new one]
- Prefetching long-lived objects reduces the invalidation rate
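The equilibrium on this slide can be illustrated numerically. Below is a toy sketch (our own model, not from the talk): insertions arrive at ReqRate × MissRate(X), invalidations occur at X / mean_lifetime, and bisection finds the X where the two rates balance. The function name and the example miss-rate curve are hypothetical.

```python
def equilibrium_fresh_objects(req_rate, mean_lifetime, miss_rate, lo=0.0, hi=1e9):
    """Find X where ReqRate * MissRate(X) equals X / mean_lifetime.

    The net inflow f(X) = req_rate * miss_rate(X) - X / mean_lifetime
    is decreasing in X, so simple bisection locates the fixed point.
    """
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if req_rate * miss_rate(mid) - mid / mean_lifetime > 0.0:
            lo = mid   # more insertions than invalidations: X still grows
        else:
            hi = mid
    return (lo + hi) / 2.0
```

For example, with 10 req/sec, a one-day mean lifetime, and a hypothetical miss rate of 1/(1 + X/1000), the cache settles near X ≈ 29,000 fresh objects.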
Slide 10: Threshold Algorithm
- Threshold on the probability that a prefetched object is accessed before it changes:
  - PgoodFetch(i) = 1 - (1 - Pi)^(lf(i) × req_rate), where
    - lf(i) = expected lifetime of object i,
    - req_rate = avg. arrival rate of requests,
    - Pi = probability that a request is for object i
- Prefetch object i if
  - PgoodFetch(i) > T
- Bandwidth blow-up is at most 1/T
- Equivalent to the value-density heuristic for the 0-1 Knapsack problem (NP-complete)
  - within a constant factor (2x) of the optimal
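The threshold rule above is a one-liner in code. A minimal sketch (function names are ours, not from the talk):

```python
def p_good_fetch(p_i, lifetime, req_rate):
    """Probability that a prefetched object is accessed before it changes.

    p_i      -- per-request probability that object i is requested (Pi)
    lifetime -- expected lifetime lf(i) of object i, in seconds
    req_rate -- average request arrival rate, in requests/second
    """
    # Over the object's lifetime we expect lifetime * req_rate requests;
    # the fetch is "good" if at least one of them is for object i.
    return 1.0 - (1.0 - p_i) ** (lifetime * req_rate)

def should_prefetch(p_i, lifetime, req_rate, threshold):
    # Threshold rule: prefetch object i iff PgoodFetch(i) > T.
    return p_good_fetch(p_i, lifetime, req_rate) > threshold
```

For instance, an object with per-request probability 0.5, a 1-second lifetime, and 2 req/sec has PgoodFetch = 1 - 0.5² = 0.75, so it is prefetched at T = 0.1 but not at T = 0.8.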
Slide 11: Evaluation Methodology
- Analytic evaluation
  - knowledge of global popularity
  - lacks temporal locality
- Trace-based simulation
  - exhibits temporal locality
  - real object sizes, arrival rates/patterns
  - opportunity to test predictors
Slide 12: Analytic Evaluation
- Assumptions
  - Poisson model of request arrival [Cho00]
  - Fixed universe of one billion objects
  - Zipf popularity distribution, with parameter 0.982 [Breslau99]
  - Sizes follow a hybrid lognormal/Pareto distribution [Crovella98]
  - Object lifetime distribution obtained from [Douglis97]
  - No correlation between lifetimes, sizes, popularities [Crovella98, Breslau99]
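The first three assumptions are easy to sketch as a scaled-down synthetic workload generator (universe size, duration, and seed below are illustrative; the talk's universe is far larger):

```python
import random

def zipf_probs(n, alpha):
    # Zipf popularity: p_i proportional to 1 / i^alpha, normalized to sum to 1.
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def synthetic_requests(n_objects, alpha, req_rate, duration, rng):
    """Generate request IDs: Poisson arrivals, Zipf-distributed object choice."""
    probs = zipf_probs(n_objects, alpha)
    t, requests = 0.0, []
    while True:
        t += rng.expovariate(req_rate)   # Poisson process: exponential gaps
        if t > duration:
            return requests
        requests.append(rng.choices(range(n_objects), weights=probs)[0])
```

With 100 objects, alpha = 0.982, and 10 req/sec for 100 seconds, roughly a thousand requests are generated and the most popular object is requested far more often than the least popular one.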
Slide 13: Analytic results: hit rate
[Figure: object hit rate vs. arrival rate (req/sec)]
- Significant improvements in hit rate for small thresholds
  - e.g. for T = 0.1, arrival rate = 10/sec: hit rate improvement ≈ 13%
- Benefits across a wide range of cache sizes
Slide 14: Analytic results: costs
[Figure: steady-state bandwidth (Kbps) and steady-state cache size (GB) vs. arrival rate (req/sec), for T = 0.01, 0.1, 0.5, 0.9 and the demand-only baseline]
- Increase is modest for large caches
- At T = 0.1, arrival rate = 10/sec:
  - Total bandwidth < 2× demand bandwidth
  - Total cache size < 2× steady-state demand cache size
Slide 15: Trace-based simulation
- Input: 12-day (Mar 2000) trace obtained from a UC Squid cache
  - 10 million accesses, 4.2 million unique objects
- LRU-based demand cache
- Query URLs considered uncacheable
- Object lifetimes synthetically generated
- Predictors
  - Predictor 1 has future knowledge, precomputes popularities
  - Predictor 2 learns popularity from past observations
- Performance comparison against
  - Demand cache
  - EverFresh
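A "Predictor 2"-style popularity learner can be sketched as a running frequency estimate over past requests (the class name, and the absence of any windowing or smoothing, are our simplifications):

```python
from collections import Counter

class PastPopularityPredictor:
    """Estimate each object's per-request access probability from history."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, url):
        # Record one observed request for this URL.
        self.counts[url] += 1
        self.total += 1

    def access_probability(self, url):
        # Empirical per-request probability p_i; 0 for unseen objects.
        return self.counts[url] / self.total if self.total else 0.0
```

The estimated p_i can then be fed into the PgoodFetch threshold test from the algorithm slide.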
Slide 16: Trace results: hit rate
[Figure: object hit rate vs. threshold T, for Predictor 1, Predictor 2, EverFresh, and the demand cache]
- Predictor 2 performs close to EverFresh
- Predictor 1 gives huge improvements
- At T = 0.1, hit rate improvement ≈ 10% for Predictor 2
Slide 17: Trace-based results: bandwidth
[Figure: bandwidth (Kbps) vs. 1/Threshold, for Predictor 1, Predictor 2, and the demand cache]
- Both predictors are within a 2× blow-up for T > 0.1
- Extremely low Ts are conceivable => aggressive replication
Slide 18: Conclusions
- LT prefetching must consider both popularity and lifetime
- LT prefetching can significantly improve hit rates at modest costs
- Analysis shows benefits for a wide range of cache sizes
- Simple predictors work well
Slide 19: Research Agenda
- Two-level prefetching system
  - long-term prefetcher at the proxy/CDN
  - short-term prefetcher at the browser level
- Improve statistics gathering and prediction
  - cooperating caches
  - server-assisted hints
- Extension to a cooperative caching system
  - object placement problem [PODC01]
- Minimize interference of prefetching with demand traffic
Slide 20: Threshold vs. cache size
[Figure: overall cache size (GB) vs. 1/Threshold, for Predictor 1 and the demand cache]
Slide 21: Reduction to 0-1 Knapsack
- Problem input
  - A universe of n objects
  - Popularity distribution pi, i in [1, n]
  - Lifetime distribution li
  - Available bandwidth B, infinite cache
- Goal: compute the set S of objects to prefetch
  - To maximize hit rate
  - value(i) = PgoodFetch(i) × size(i) / lf(i)
  - cost(i) = size(i) / lf(i)
  - Value-density = value(i) / cost(i) = PgoodFetch(i)
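With cost(i) = size(i)/lf(i) being object i's steady-state refresh bandwidth, the value-density heuristic amounts to greedily packing objects in decreasing PgoodFetch order until the bandwidth budget B is exhausted. A sketch (the tuple layout and function name are ours):

```python
def select_prefetch_set(objects, bandwidth_budget):
    """Greedy value-density selection for the prefetch knapsack.

    objects          -- list of (name, p_good_fetch, size, lifetime) tuples
    bandwidth_budget -- available refresh bandwidth B (bytes/second)
    """
    # Value-density = value/cost = PgoodFetch, so rank by PgoodFetch alone.
    ranked = sorted(objects, key=lambda o: o[1], reverse=True)
    chosen, used = [], 0.0
    for name, pgf, size, lifetime in ranked:
        cost = size / lifetime           # bytes/second of refresh traffic
        if used + cost <= bandwidth_budget:
            chosen.append(name)
            used += cost
    return chosen
```

Thresholding PgoodFetch at T, as on the algorithm slide, is the budget-free analogue of this ranking.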
Slide 22: Trace-based results: bandwidth
[Figure: bandwidth (Kbps) vs. 1/Threshold, for Predictor 1, Predictor 2, and the demand cache]
- Both predictors remain within a 2× blow-up
Slide 23: Challenge: large working set
- Zipf popularity distribution
  - pi = C / i^α
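A quick way to see the large-working-set challenge: count how many of the most popular objects are needed to cover a given fraction of requests under pi = C / i^α (a sketch with a scaled-down, illustrative universe):

```python
def objects_needed_for_coverage(n, alpha, coverage):
    """Smallest k such that the top-k Zipf objects cover `coverage` of requests."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    total = sum(weights)                 # normalization constant 1/C
    running, k = 0.0, 0
    for w in weights:
        running += w
        k += 1
        if running / total >= coverage:
            return k
    return n
```

Because the Zipf tail is heavy for α near 1 (the talk uses 0.982), the number of objects needed for high coverage grows quickly with universe size, which is why a billion-object universe implies a very large working set.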
Slide 24: Motivation
- Passive caching is limited
  - typical hit rates: 20 to 40%