Title: On the scale and performance of cooperative Web proxy caching
1On the scale and performance of cooperative Web
proxy caching
2Related Work
- Static Analysis
- request rate,
- number of requests
- diversity of population
- Trace driven cache simulation
- Temporal locality of proxy traces
- Web page requests follow a Zipf-like distribution
- the number of requests to the ith most popular
document is proportional to 1/i^α for some
constant α
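Written out with a normalizing constant, the Zipf-like law above can be sketched as follows. The symbols p_i (probability that a request is for the ith most popular document), n (number of documents) and Ω (normalizing constant) are notation assumed here, not taken from the slides; α is the constant exponent of the law.

```latex
p_i = \frac{\Omega}{i^{\alpha}},
\qquad
\Omega = \left( \sum_{j=1}^{n} \frac{1}{j^{\alpha}} \right)^{-1},
\qquad
i = 1, \dots, n
```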
3Related Work
- Hit ratio for a Web proxy grows logarithmically
with the client population of the proxy and the
number of requests seen by the proxy
4Coop Web Caching Work
- reduce access latency
- Bandwidth consumption
- Hierarchical
- Directory based
- Multicast based
- Upshot: cooperation requires the cache manager to
determine when to cooperate
5WebDoc Sharing and Caching
- 4 questions
- 1. What is the best performance one could achieve
with perfect cooperative caching?
- 2. For what range of client populations can
cooperative caching work effectively?
- 3. Does the way in which clients are assigned to
caches matter?
- 4. What cache hit rates are necessary to achieve
worthwhile decreases in document access latency?
6WebDoc Sharing and Caching
- Two sites analyzed
- UW
- 200 organizations
- Microsoft Campus
- Big corporation: large population, a number of
proxies
7WebDoc Sharing and Caching
- Work specifically not done
- Did not investigate the effects of cooperative
caching on server load
- General
- Hot spot conditions
9Trace Collection
- Existing traces inadequate
- New design
- UW has few proxies
- Traces are anonymized
- Grouped based on organizational membership
- Track organizations across the campus
- Table 1 shows that 7 days gave 108 million
requests by 60,000 clients to 360,000 servers in
the Microsoft trace.
10Simulation methodology
- Real caches will incur misses due to capacity
limitations that we do not model.
- Capacity misses are rarely the bottleneck for Web
caches.
- For example, only 3% of the requests to the MS
Web proxies missed due to the finite capacity of
the proxies (which have 9GB of RAM and 180GB of
disk capacity).
11Simulation Methodology
- Practical cache
- models the cacheability of documents according to
the algorithms in the Squid V2 implementation
- Ideal cache
- All documents are cacheable, i.e. equal
cacheability for all
- Upper bound of improvement on workloads due to
improvements in Internet protocols
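The difference between the two cache models can be sketched with a tiny trace replay. This is a minimal illustration, not the simulator behind the slides: the cache is infinite, the per-request `cacheable` flag is a hypothetical stand-in for Squid-like cacheability rules, and the six-request trace is made up.

```python
from typing import List, Tuple

def hit_rate(trace: List[Tuple[str, bool]], ideal: bool) -> float:
    """Replay (url, cacheable) requests through an infinite cache.

    ideal=True treats every document as cacheable; ideal=False honours
    the per-request cacheable flag (assumed stand-in for Squid-like rules).
    """
    cached = set()
    hits = 0
    for url, cacheable in trace:
        if not (ideal or cacheable):
            continue  # uncacheable request: always a miss, never stored
        if url in cached:
            hits += 1
        else:
            cached.add(url)  # first cacheable access is a compulsory miss
    return hits / len(trace) if trace else 0.0

# Made-up trace: document "b" is marked uncacheable.
trace = [("a", True), ("b", False), ("a", True),
         ("b", False), ("c", True), ("a", True)]
practical = hit_rate(trace, ideal=False)
ideal = hit_rate(trace, ideal=True)
print(f"practical: {practical:.2f}  ideal: {ideal:.2f}")
```

The ideal figure can never be lower than the practical one on the same trace, which is why it serves as an upper bound.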
12Population size
- In a cooperative-caching scheme, a proxy forwards
a missing request to other proxies to determine
if
- Another proxy holds the requested document
- The document can be returned faster than a
request to the server.
13Population size
- At best, a collection of cooperating caches will
achieve the hit rate of a single proxy acting over
the combined population of all the proxies.
- In practice, proxies will pay the overheads of
inter-proxy communication latency.
- Examining a single, top-level proxy thus gives us
an upper bound on cooperative-caching performance.
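The upper-bound argument can be checked on a synthetic workload. The sketch below is an illustration under assumptions that are not from the slides: requests are drawn from a Zipf-like distribution with made-up parameters, caches are infinite, documents never change, and requests are dealt to proxies round-robin rather than by client.

```python
import random
from itertools import accumulate

def zipf_requests(num_docs, num_requests, alpha, rng):
    """Draw document ids with probability proportional to 1 / rank**alpha."""
    cum_weights = list(accumulate(rank ** -alpha
                                  for rank in range(1, num_docs + 1)))
    return rng.choices(range(num_docs), cum_weights=cum_weights,
                       k=num_requests)

def count_hits(requests):
    """Hits in an infinite cache: every request after the first per document."""
    seen, hits = set(), 0
    for doc in requests:
        if doc in seen:
            hits += 1
        else:
            seen.add(doc)
    return hits

rng = random.Random(0)  # fixed seed so the run is repeatable
requests = zipf_requests(num_docs=5000, num_requests=20000,
                         alpha=0.8, rng=rng)

num_proxies = 4
# Isolated proxies: request k is served by proxy k % num_proxies, no sharing.
isolated_hits = sum(count_hits(requests[p::num_proxies])
                    for p in range(num_proxies))
# Ideal cooperation: one cache sees the whole request stream.
combined_hits = count_hits(requests)

isolated_rate = isolated_hits / len(requests)
combined_rate = combined_hits / len(requests)
print(f"isolated: {isolated_rate:.3f}  combined: {combined_rate:.3f}")
```

With infinite caches the combined cache misses once per distinct document overall, while the isolated proxies miss once per distinct document per proxy, so the combined hit rate is never lower. Inter-proxy latency is deliberately left out of this sketch.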
14Population size
15Population size
16Hit rate vs. latency and bandwidth
- Latency, not hit rate, is crucial to clients
- To ISPs, hit rate matters as bandwidth savings
that reduce congestion
17Hit rate vs. latency and bandwidth
18Hit rate vs. latency and bandwidth
19Proxies and organizations
- If locality is high in a small population
- Then the maximum hit rate is achieved
20Question
- What benefit would clients in real organizations
see if their proxies were to cooperate with other
real organizational proxies?
- UW environment is an attempted answer
- each org acts like a business, with its own proxy
sitting on its connection to the Internet.
- Each client categorized into 1 of 200 UW
organizations
21Proxies and organizations
22
- Question: is a different grouping of clients to
proxies, for example one based on each client's
document interests, better?
- Solution
- A clustering algorithm is used to optimize
intracluster sharing
- randomly assigned clusters have a consistently
lower hit rate than the optimally clustered
organizations.
23Proxies and organizations
24Impact of larger population size
- Recall cooperative caching can increase hit rate
- This indicates that there is little correlation
between sharing and the cacheability of documents
for the UW population.
- cooperative caching among populations larger than
2.4 million does not increase the hit rate for
cacheable documents
25Impact of larger population size
- Experiment: have MS and UW cooperate
- Results
- When scaled by equal factors, MS gains more
benefit by cooperating with the UW population
than the UW population gains by cooperating with
MS.
- Unpopular documents are universally unpopular:
a request that misses in UW or MS is unlikely to
find the document in the other proxy
- 1/500 (first access) has hit rate increase
regardless of popularity
26Proxies and organizations
27Impact of larger population size
28Docs and Proxy Sharing Summary
- 1. The behavior of cooperative caching is
characterized by two different regions of the hit
rate vs. population curve. For smaller
populations, hit rate increases rapidly with
population; it is in this region that cooperative
caching can be used effectively. However, these
population sizes can be handled by a single
proxy.
- Cooperative caching is only necessary to adapt to
proxy assignments made for political or
geographical reasons.
29Docs and Proxy Sharing Summary
- 2. For larger populations (beyond the knee of the
population vs. hit rate curve), cooperative
caching is unlikely to provide significant
benefit.
- Simultaneous traces of the MS and UW populations
show this:
- UW: a 4x increase of population via cooperative
caching netted only a 2.7% increase in cacheable
hit rate.
30Docs and Proxy Sharing Summary
- 3. MS and others show clustering does occur but
that cooperative caching specialized to interest
groups is unlikely to be effective.
31Docs and Proxy Sharing Summary
- 4. Previous work has hinted at the general
trends, but end conclusions about cooperative
caching have not been shown yet
32An analytic model of Web accesses
- Steady-state performance
- The model
- Model parameters
- Performance of large scale proxy caching
- Summary
33Steady-state performance
34The Model
- Population has N clients
- n total documents
- The important characteristic of a Zipf-like
distribution is that it is heavy-tailed: a
significant fraction of the probability mass is
concentrated in the tail, which in this case
means that a significant fraction of requests go
to the relatively unpopular documents.
35The Model
- Zipf Distribution Cont'd
- Popularity of the ith document is proportional
to 1/i^α
- As α increases, the distribution becomes less
heavy-tailed, and a larger fraction of the
probability mass is concentrated on the most
popular documents
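The effect of the exponent on the tail can be seen by computing how much of the request probability falls on the most popular documents. This is a minimal sketch; the document count, the cutoff and the three exponent values are illustrative choices, not figures from the slides.

```python
def top_k_mass(alpha: float, n: int, k: int) -> float:
    """Fraction of requests that go to the k most popular of n documents
    when popularity of rank i is proportional to 1 / i**alpha."""
    weights = [rank ** -alpha for rank in range(1, n + 1)]
    return sum(weights[:k]) / sum(weights)

n_docs, top_k = 100_000, 1_000  # illustrative sizes only
masses = {alpha: top_k_mass(alpha, n_docs, top_k)
          for alpha in (0.6, 0.8, 1.0)}
for alpha, mass in masses.items():
    print(f"alpha={alpha}: top {top_k} documents get {mass:.1%} of requests")
```

A larger exponent moves probability mass from the tail to the head, which is what makes the workload more cache-friendly.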
36The Model
- The probability that a requested document is
cacheable is p_c.
- Avg document size is E(S). Document size is
independent of document popularity, latency, and
change rate.
- The last-byte latency to the server that houses
a document has average value E(L). Last-byte
latency is independent of document popularity and
document change rate.
37The Model
- Performance Characteristics
38The Model
- Performance characteristics continued
- The expected last-byte latency to serve a request
is given by
- average bandwidth savings per request due to
proxy caching
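The formulas themselves do not survive in this transcript. One common way to express the two quantities is sketched below, purely as an assumption: a hit costs a fixed proxy latency, a miss costs the average server latency E(L), and every hit saves one average-sized document E(S). The function names and the `hit_latency` parameter are hypothetical and may not match the model's exact form.

```python
def expected_latency(hit_rate: float, miss_latency: float,
                     hit_latency: float) -> float:
    """Mean last-byte latency: hits are served at hit_latency (assumed),
    misses at miss_latency, which plays the role of E(L)."""
    return hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency

def bandwidth_saved_per_request(hit_rate: float,
                                mean_doc_size: float) -> float:
    """Mean bytes not fetched from the server per request, assuming each
    hit saves one document of average size E(S)."""
    return hit_rate * mean_doc_size

# Illustrative numbers only: 40% hit rate, 2.0 s miss, 0.1 s hit, 8 KB docs.
latency = expected_latency(0.4, miss_latency=2.0, hit_latency=0.1)
saved = bandwidth_saved_per_request(0.4, mean_doc_size=8192)
print(f"expected latency: {latency:.2f} s, saved per request: {saved:.0f} B")
```

Under this form, latency falls linearly with hit rate, so the small hit-rate gains at large populations translate into equally small latency gains.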
39The Model
- Differences between new and previous work
- we consider the steady state behavior of caching
systems rather than caching behavior based on a
finite request sequence
- Incorporate document change rate into the model
rather than assuming that documents are static
- Goal: use our model to understand the performance
of large-scale, cooperative-caching schemes in
terms of hit rate, latency, bandwidth savings,
and storage consumed.
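A steady-state hit rate with changing documents can be sketched as follows. This is one standard way to set up such a model, not necessarily the exact equation behind the slides. The assumptions are stated here: requests for each document arrive as a Poisson process, each document changes at an exponential rate, a change invalidates the cached copy, and the cache is infinite. Under those assumptions a request hits exactly when the most recent event for that document was a request rather than a change. All parameter names and example values are hypothetical.

```python
def steady_state_hit_rate(population: int, n_docs: int, alpha: float,
                          req_rate: float, change_rate: float,
                          p_cacheable: float = 1.0) -> float:
    """Request-weighted steady-state hit rate of one infinite cache.

    population  -- number of clients N behind the cache
    req_rate    -- requests per client per unit time (assumed Poisson)
    change_rate -- changes per document per unit time (assumed exponential)
    p_cacheable -- probability that a requested document is cacheable
    """
    weights = [rank ** -alpha for rank in range(1, n_docs + 1)]
    total = sum(weights)
    hit = 0.0
    for w in weights:
        p = w / total                          # share of requests for this doc
        arrivals = population * req_rate * p   # request rate for this doc
        # Hit iff the last event was a request, not a change.
        hit += p * arrivals / (arrivals + change_rate)
    return p_cacheable * hit

# Illustrative parameters only.
small = steady_state_hit_rate(100, 10_000, 0.8, 1.0, 0.01)
large = steady_state_hit_rate(10_000, 10_000, 0.8, 1.0, 0.01)
print(f"N=100: {small:.3f}   N=10000: {large:.3f}")
```

Because every term grows with the population but is capped at that document's request share, the curve rises quickly and then flattens, which is the knee the slides refer to.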
40Model parameters
41Model parameters
42Performance of large scale proxy caching
- Hit rate, latency, and bandwidth
- Document rate of change
- Client request rate
- Document popularity and size of the Web
43Performance of large scale proxy caching
44Document popularity and size of the Web
- Zipf parameter α
- Number of documents n
- Increasing α skews the distribution
towards popular documents, significantly
increasing hit rates for slower rates of change
- Slight increase in hit rates for faster rates of
change.
- Increasing the number of documents n shifts the
curves for slow and fast rates of change to
larger populations
- This population shift is in proportion to the
increase in n
- n = 3.2 billion: the slow curve reaches a 90% hit
rate at a population of 250,000
- n = 32 billion: the slow curve reaches a 90% hit
rate at a population of 25 million
- n = 320 billion: the slow curve reaches a 90% hit
rate at a population of 250 million.
45Model Summary
- analytic model used to examine the steady-state
performance of cooperative caching schemes.
- small populations achieve most of the performance
benefits of cooperative caching.
49Wrap Up
- 1. Without client behavior changes
- little point in continuing design and evaluation
of highly scalable, cooperative-caching schemes
- Cooperative caching makes sense up to the level
of a medium-sized city
50Wrap Up
- The largest benefit for cooperative caching is
achieved for relatively small populations.
- Analysis of cooperation among small organizations
within the university environment.
- Traces of UW and MS confirmed marginal benefit of
cooperative caching among organizations with
populations of 20K clients or more (large).
- Scaling this large makes sense only in very
high-bandwidth, low-latency environments.
51Wrap Up
- Performance of cooperative caching limited by
document cacheability.
- Increasing cacheability of documents is the main
challenge for future Web cache behavior research
52Wrap Up
- Cluster-based analysis of client access patterns
indicates that cooperative-caching organizations
based on mutual interest offer no obvious
advantages over randomly assigned or
organization-based groupings.
53Wrap Up
- Fundamentally, the usefulness of cooperative Web
proxy caching depends upon the scale at which it
is being applied.
- Whether or not they use cooperative caching
locally, large organizations should use proxy
caching for their user populations.
- Concern: cooperative caching is only marginally
helpful
54Wrap Up
- Results shown are on static data
- A shift in user workload will change this, e.g.
streaming media
- Average size is orders of magnitude larger
- Reveals that better utilization of network
resources is necessary
- Size and time of transfer for streaming objects
suggest that multicast methods might be good
55Assumptions
- Static objects like web pages, documents
- User populations not too large
- Network latency, performance medium
56Questions?