On the scale and performance of cooperative Web proxy caching

Slides: 57

Provided by: informatio125

Learn more at: http://fac-staff.seattleu.edu

Transcript and Presenter's Notes

Title: On the scale and performance of cooperative Web proxy caching

1
On the scale and performance of cooperative Web
proxy caching

2/3/06

2
Related Work

Static Analysis
request rate,
number of requests
diversity of population
Trace driven cache simulation
Temporal locality of proxy traces
Web page requests follow a Zipf-like distribution
the number of requests to the ith
most popular document is proportional to 1/i^α
for some constant α
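
As a minimal sketch of the Zipf-like rule above, the request probability of the document at rank i can be computed by normalizing 1/i^α over all documents. The function name, the document count of 1000, and α = 0.8 are illustrative assumptions, not values from the slides.

```python
def zipf_probabilities(n_docs, alpha):
    """Return request probabilities for documents ranked 1..n_docs."""
    # Unnormalized weight of rank i is 1 / i^alpha.
    weights = [1.0 / (rank ** alpha) for rank in range(1, n_docs + 1)]
    total = sum(weights)
    # Normalize so the probabilities sum to 1.
    return [w / total for w in weights]


# Hypothetical parameters, chosen only for illustration.
probs = zipf_probabilities(1000, 0.8)
print(f"most popular document: {probs[0]:.4f}")
print(f"share of the top 10 documents: {sum(probs[:10]):.4f}")
```

The ratio between consecutive ranks depends only on α, so the second most popular document receives 1/2^α of the requests of the first.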

3
Related Work

Hit ratio for a Web proxy grows logarithmically
with the
Client population of the proxy and the number of
requests seen by the proxy

4
Coop Web Caching Work

reduce access latency
Bandwidth consumption
Hierarchical
Directory based
Multicast based
Upshot: cooperation requires the cache manager to
determine when to cooperate

5
WebDoc Sharing and Caching

4 questions
1. What is the best performance one could achieve
with perfect cooperative caching?
2. For what range of client populations can
cooperative
caching work effectively?
3. Does the way in which clients are assigned to
caches matter?
4. What cache hit rates are necessary to achieve
worthwhile decreases in document access latency?

6
WebDoc Sharing and Caching

Two sites analyzed
UW
200 organizations
Microsoft Campus
Big corporation: large population, number of
proxies

7
WebDoc Sharing and Caching

Work specifically not done
Did not investigate the effects of cooperative
caching on server load
General
Hot spot conditions

First results

9
Trace Collection

Existing traces inadequate
New design
UW has few proxies
Traces are anonymized
Grouped based on organizational membership
Track organizations across the campus
Table 1 shows that 7 days gave 108 million
requests by 60,000 clients to 360,000 servers in
the Microsoft trace.

10
Simulation methodology

Real caches will incur misses due to capacity
limitations that we do not model.
Capacity misses are rarely the bottleneck for Web
caches.
For example, only 3% of the requests to the MS
Web proxies missed due to the finite capacity of
the proxies (which have 9GB of RAM and 180GB of
disk capacity).

11
Simulation Methodology

Practical cache
models the cacheability of documents according to
the algorithms in the Squid V2 implementation
Ideal cache
All documents are cacheable i.e. equal
cacheability for all
Upper bound of improvement on workloads due to
improvements in internet protocols
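
The two cache models can be sketched as a trace-driven simulation of an infinite cache (no capacity misses, as on the previous slide). This is only an illustration: the boolean `cacheable` flag stands in for the Squid V2 cacheability rules, and the six-request trace is made up.

```python
def hit_rate(trace, ideal):
    """Replay (url, cacheable) requests against an infinite cache."""
    cached = set()
    hits = 0
    for url, cacheable in trace:
        if url in cached:
            hits += 1
        elif ideal or cacheable:
            # The ideal cache stores everything; the practical
            # cache stores only documents flagged as cacheable.
            cached.add(url)
    return hits / len(trace)


# Hypothetical trace: document "b" is marked uncacheable.
trace = [("a", True), ("b", False), ("a", True),
         ("b", False), ("c", True), ("a", True)]

practical_rate = hit_rate(trace, ideal=False)
ideal_rate = hit_rate(trace, ideal=True)
print(f"practical cache hit rate: {practical_rate:.2f}")
print(f"ideal cache hit rate:     {ideal_rate:.2f}")
```

The gap between the two numbers is the headroom that better cacheability could recover on this trace.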

12
Population size

In a cooperative-caching scheme, a proxy forwards
a missing request to other proxies to determine
whether:
another proxy holds the requested document
the document can be returned faster than a
request to the server.

13
Population size

A collection of cooperating caches will achieve
the hit-rate of a single proxy acting over the
combined population of all the proxies.
Proxies will pay the overheads of inter-proxy
communication latency.
Examining a single, top-level proxy thus gives us
an upper bound on cooperative-caching performance.
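
The upper-bound argument can be illustrated with a small sketch: replay the same trace once through independent per-proxy caches and once through a single cache that serves the combined population. The proxy names and the trace are hypothetical, caches are infinite, and inter-proxy communication overhead is ignored.

```python
def separate_hit_rate(trace):
    """Each proxy keeps its own infinite cache; no cooperation."""
    caches = {}
    hits = 0
    for proxy, url in trace:
        cache = caches.setdefault(proxy, set())
        if url in cache:
            hits += 1
        else:
            cache.add(url)
    return hits / len(trace)


def combined_hit_rate(trace):
    """One top-level cache serves the combined population."""
    cache = set()
    hits = 0
    for _proxy, url in trace:
        if url in cache:
            hits += 1
        else:
            cache.add(url)
    return hits / len(trace)


# Hypothetical trace of (proxy, url) requests.
trace = [("p1", "a"), ("p2", "a"), ("p1", "b"),
         ("p2", "b"), ("p1", "a"), ("p2", "c")]

separate = separate_hit_rate(trace)
combined = combined_hit_rate(trace)
print(f"separate proxies: {separate:.2f}")
print(f"single combined proxy: {combined:.2f}")
```

Any real cooperative scheme would land between these two numbers, since it can at best match the combined cache and still pays communication costs.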

14
Population size
15
Population size
16
Hit rate vs. latency and bandwidth

Latency, not hit rate, is crucial to clients
To ISPs, hit rate matters: bandwidth savings,
improving congestion

17
Hit rate vs. latency and bandwidth
18
Hit rate vs. latency and bandwidth
19
Proxies and organizations

If locality is high within a small population,
then the max hit rate is achieved

20
Question

What benefit would clients in real organizations
see if their proxies were to cooperate with other
real organizational proxies?
UW environment is an attempted answer
Each org acts like a business, with its own proxy
sitting on its connection to the Internet.
Each client categorized into 1 of 200 UW
organizations

21
Proxies and organizations
22

Question: is another grouping of clients to proxies, for
example one based on each client's document
interests, better?
Solution
Clustering algorithm used to optimize intra-cluster
sharing
Randomly assigned clusters have a consistently
lower hit rate than the optimally clustered
organizations.

23
Proxies and organizations
24
Impact of larger population size

Recall cooperative caching can increase hit rate
This indicates that there is little correlation
between sharing and the cacheability of documents
for the UW population.
cooperative caching among populations larger than
2.4 million does not increase the hit rate to
cacheable documents

25
Impact of larger population size

Experiment: have MS and UW cooperate
Results
When scaled by equal factors, MS gains more
benefit by cooperating with the UW population
than the UW population gains by cooperating with
MS.
Unpopular documents are universally unpopular →
a miss at UW or MS is unlikely to find the document in the other proxy
1/500 (first access) has hit rate increase
regardless of popularity

26
Proxies and organizations
27
Impact of larger population size
28
Docs and Proxy Sharing Summary

1. The behavior of cooperative caching is characterized
by two different regions of the hit rate vs. population
curve. For smaller populations, hit rate increases
rapidly with population; it is in this region that
cooperative caching can be used effectively. However,
these population sizes can be handled by a single proxy.
Cooperative caching is only necessary to adapt to
proxy assignments made for political or
geographical reasons.

29
Docs and Proxy Sharing Summary

2. For larger populations (beyond the knee of the
population vs. hit rate curve), cooperative
caching is unlikely to provide significant
benefit.
Simultaneous traces of the MS and UW populations
show this:
UW: a 4x increase in population via cooperative
caching netted only a 2.7% increase in cacheable
hit rate.

30
Docs and Proxy Sharing Summary

3. MS and others show clustering does occur but
that cooperative caching specialized to interest
groups is unlikely to be effective.

31
Docs and Proxy Sharing Summary

4. Previous work has hinted at the general
trends, but CC end conclusions have not been shown
yet

32
An analytic model of Web accesses

Steady-state performance
The model
Model parameters
Performance of large scale proxy caching
Summary

33
Steady-state performance
34
The Model

Population has N clients
n total documents
The important characteristic of a Zipf-like
distribution is that it is heavy-tailed: a
significant fraction of the probability mass is
concentrated in the tail, which in this case
means that a significant fraction of requests go
to the relatively unpopular documents.

35
The Model

Zipf Distribution Cont'd
Popularity of document i is proportional to 1/i^α
As α increases, the distribution becomes less
heavy-tailed, and a larger fraction of the
probability mass is concentrated on the most
popular documents
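
The effect of the exponent can be checked numerically. The sketch below computes the share of requests that go to the 100 most popular of 10,000 documents for three values of α; the document count, the cut-off of 100, and the α values are illustrative assumptions.

```python
def top_share(n_docs, alpha, top_k):
    """Fraction of requests going to the top_k most popular documents."""
    weights = [rank ** -alpha for rank in range(1, n_docs + 1)]
    return sum(weights[:top_k]) / sum(weights)


# Hypothetical sizes and exponents, for illustration only.
shares = {alpha: top_share(10_000, alpha, 100) for alpha in (0.6, 0.8, 1.0)}
for alpha, share in shares.items():
    print(f"alpha={alpha}: top 100 documents get {share:.1%} of requests")
```

A larger share for the top documents means a lighter tail, which is why higher α tends to favor caching.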

36
The Model

The probability that a requested document is
cacheable is p_c.
Avg document size is E(S). Document size is
independent of document popularity, latency, and
change rate.
The last-byte latency to the server that houses
that document has average value E(L). Last-byte
latency is independent of document popularity and
document change rate.

37
The Model

Performance Characteristics

38
The Model

Performance characteristics continued
The expected last-byte latency to serve a request
is given by
average bandwidth savings per request due to
proxy caching
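
The formulas for this slide did not survive in the transcript. One natural reading, offered here as an assumption rather than the presenter's exact expressions, is that a miss pays the average server latency E(L), a hit pays some smaller hit latency, and each hit saves E(S) bytes on average. The function names and all numeric values below are hypothetical.

```python
def expected_latency(hit_rate, server_latency, hit_latency):
    """Average last-byte latency: misses go to the server, hits do not."""
    return (1.0 - hit_rate) * server_latency + hit_rate * hit_latency


def bandwidth_savings(hit_rate, avg_doc_size):
    """Average bytes saved per request: each hit avoids one transfer."""
    return hit_rate * avg_doc_size


# Hypothetical values: 40% hit rate, 1.9 s to the server,
# 0.1 s for a cache hit, 8000-byte average document.
latency = expected_latency(0.4, 1.9, 0.1)
savings = bandwidth_savings(0.4, 8000)
print(f"expected latency: {latency:.2f} s")
print(f"bandwidth saved per request: {savings:.0f} bytes")
```

Under this reading, latency falls linearly with hit rate, so small hit-rate gains translate into equally small latency gains.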

39
The Model

Differences between new and previous work
we consider the steady state behavior of caching
systems rather than caching behavior based on a
finite request sequence
Incorporate document change rate into the model
rather than assuming that documents are static
Goal: use our model to understand the performance
of large-scale, cooperative-caching schemes in
terms of hit rate, latency, bandwidth savings,
and storage consumed.
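
A steady-state hit rate with document change can be sketched as follows. This is an assumed form, not a formula shown on the slides: a request to the document at rank x hits if the previous event for that document was another request rather than a change. With exponential arrivals, that happens with probability r/(r + μ), where r is the population's request rate for that document and μ is the change rate. All parameter values are hypothetical.

```python
import math


def trapezoid(f, xs):
    """Integrate f over the grid xs with the trapezoid rule."""
    return sum((b - a) * (f(a) + f(b)) / 2 for a, b in zip(xs, xs[1:]))


def steady_state_hit_rate(population, n_docs, alpha, request_rate,
                          change_rate, p_cacheable=1.0, steps=2000):
    """Assumed steady-state hit rate for a Zipf-like workload."""
    # Log-spaced grid over document ranks 1..n_docs.
    xs = [math.exp(math.log(n_docs) * k / steps) for k in range(steps + 1)]
    # Normalizing constant for the continuous Zipf-like density.
    norm = trapezoid(lambda x: x ** -alpha, xs)

    def integrand(x):
        p = x ** -alpha / norm                    # popularity of rank x
        doc_rate = request_rate * population * p  # requests/day for rank x
        return p * doc_rate / (doc_rate + change_rate)

    return p_cacheable * trapezoid(integrand, xs)


# Hypothetical parameters: 1M documents, 500 requests per client per day,
# each document changing on average once every 10 days.
rates = [steady_state_hit_rate(n, 1_000_000, 0.8, 500.0, 0.1)
         for n in (100, 10_000, 1_000_000)]
for n, rate in zip((100, 10_000, 1_000_000), rates):
    print(f"population {n:>9,}: hit rate {rate:.2f}")
```

Because the per-document request rate scales with population, hit rate rises with population and flattens once most requested documents are re-requested faster than they change.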

40
Model parameters

UW Trace

41
Model parameters
42
Performance of large scale proxy caching

Hit rate, latency, and bandwidth
Document rate of change
Client request rate
Document popularity and size of the Web

43
Performance of large scale proxy caching
44
Document popularity and size of the Web

Zipf parameter α
Number of documents n
Increasing α (alpha) skews the distribution
towards popular documents, significantly
increasing hit rates for slower rates of change
Slight increase in hit rates for faster rates of
change.
Increasing the number of documents n shifts the
curves for slow and fast rates of change to
larger populations
This population shift is in proportion to the
increase in n
n = 3.2 billion → the slow curve reaches a 90% hit
rate at a population of 250,000
n = 32 billion → the slow curve reaches a 90% hit
rate at a population of 25 million
n = 320 billion → the slow curve reaches a 90% hit
rate at a population of 250 million.

45
Model Summary

analytic model used to examine the steady-state
performance of cooperative caching schemes.
small populations achieve most of the performance
benefits of cooperative caching.

49
Wrap Up

1. Without client behavior changes, there is
little point in continuing design and evaluation
of highly scalable, cooperative-caching schemes
Cooperative caching makes sense up to the level
of a medium-sized city

50
Wrap Up

The largest benefit for cooperative caching is
achieved for relatively small populations.
Analysis of cooperation among small organizations
within the university environment.
Traces of UW and MS confirmed marginal benefit of
cooperative caching among large organizations with
populations of 20K clients or more.
Scale this big only in very high-bandwidth,
low-latency environments.

51
Wrap Up

Performance of cooperative caching limited by
document cacheability.
Increasing cacheability of documents is the main
challenge for future Web cache behavior research

52
Wrap Up

Cluster-based analysis of client access patterns
indicates that
cooperative-caching organizations based on mutual
interest offer no obvious advantages over
randomly assigned or organization-based
groupings.

53
Wrap Up

Fundamentally, the usefulness of cooperative Web
proxy caching depends upon the scale at which it
is being applied.
Whether or not they use cooperative caching
locally, large organizations should use proxy
caching for their user populations.
Concern: cooperative caching only marginally
helpful

54
Wrap Up

Results shown are on static data
Shifts in user workload will change them, e.g.
streaming media
Average object size is orders of magnitude larger
Reveals that better utilization of network resources
is necessary
Size and time of transfer for streaming objects
suggest that multicast methods might be good

55
Assumptions