On the scale and performance of cooperative Web proxy caching

1
On the scale and performance of cooperative Web
proxy caching
  • 2/3/06

2
Related Work
  • Static analysis
  • request rate,
  • number of requests,
  • diversity of population
  • Trace-driven cache simulation
  • Temporal locality of proxy traces
  • Web page requests follow a Zipf-like distribution
  • the number of requests to the ith
    most popular document is proportional to 1/i^α
    for some constant α
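
Written out in normalized form, the Zipf-like popularity law from the last bullet looks as follows. This is the standard textbook normalization rather than a formula taken from the slides; α is the constant exponent, n (the total number of documents) is an assumed symbol, and P(i) is the probability that a request goes to the i-th most popular document.

```latex
P(i) = \frac{1/i^{\alpha}}{\sum_{j=1}^{n} 1/j^{\alpha}}, \qquad i = 1, \dots, n
```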

3
Related Work
  • Hit ratio for a Web proxy grows logarithmically
    with the client population of the proxy and the
    number of requests seen by the proxy

4
Cooperative Web Caching Work
  • Reduce access latency
  • Reduce bandwidth consumption
  • Hierarchical
  • Directory based
  • Multicast based
  • Upshot: cooperation requires the cache manager to
    determine when to cooperate

5
WebDoc Sharing and Caching
  • 4 questions
  • 1. What is the best performance one could achieve
    with perfect cooperative caching?
  • 2. For what range of client populations can
    cooperative caching work effectively?
  • 3. Does the way in which clients are assigned to
    caches matter?
  • 4. What cache hit rates are necessary to achieve
    worthwhile decreases in document access latency?

6
WebDoc Sharing and Caching
  • Two sites analyzed
  • UW
  • 200 organizations
  • Microsoft Campus
  • Big corporation: large population, a number of
    proxies

7
WebDoc Sharing and Caching
  • Work specifically not done
  • Did not investigate the effects of cooperative
    caching on server load, either
  • in general, or
  • under hot-spot conditions

8
  • First results

9
Trace Collection
  • Existing traces inadequate
  • New design
  • UW has few proxies
  • Traces are anonymized
  • Grouped based on organizational membership
  • Track organizations across the campus
  • Table 1 shows that 7 days gave 108 million
    requests by 60,000 clients to 360,000 servers in
    the Microsoft trace.

10
Simulation methodology
  • Real caches will incur misses due to capacity
    limitations that we do not model.
  • Capacity misses are rarely the bottleneck for Web
    caches.
  • For example, only 3% of the requests to the MS
    Web proxies missed due to the finite capacity of
    the proxies (which have 9 GB of RAM and 180 GB of
    disk capacity).

11
Simulation Methodology
  • Practical cache
  • models the cacheability of documents according to
    the algorithms in the Squid V2 implementation
  • Ideal cache
  • All documents are cacheable, i.e. equal
    cacheability for all
  • Gives an upper bound on the improvement these
    workloads could see from improvements in
    Internet protocols
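
The difference between the two cache models can be illustrated with a small trace replay. This is only a sketch under stated assumptions: the cache has infinite capacity (matching the no-capacity-miss assumption above), the per-request `cacheable` flag is a hypothetical stand-in for Squid's much richer cacheability rules, document changes are ignored, and the toy trace is invented for illustration.

```python
from typing import Iterable, Tuple


def hit_rate(trace: Iterable[Tuple[str, bool]], ideal: bool) -> float:
    """Replay (url, cacheable) requests through an infinite-capacity cache.

    ideal=True treats every document as cacheable;
    ideal=False honors the per-request cacheable flag.
    """
    seen = set()
    hits = 0
    total = 0
    for url, cacheable in trace:
        total += 1
        if not (ideal or cacheable):
            continue  # uncacheable request: always goes to the server
        if url in seen:
            hits += 1
        else:
            seen.add(url)  # first reference is a compulsory miss
    return hits / total if total else 0.0


# Hypothetical toy trace: document "b" is marked uncacheable.
trace = [("a", True), ("b", False), ("a", True),
         ("b", False), ("c", True), ("a", True)]
ideal_rate = hit_rate(trace, ideal=True)
practical_rate = hit_rate(trace, ideal=False)
```

On this toy trace the ideal cache scores higher than the practical one, because the repeated request for "b" counts as a hit only when everything is treated as cacheable.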

12
Population size
  • In a cooperative-caching scheme, a proxy forwards
    a missing request to other proxies to determine
    if:
  • another proxy holds the requested document
  • the document can be returned faster than a
    request to the server.
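
The two checks on this slide can be sketched as a lookup routine. This is a minimal illustration, not a description of any real proxy protocol: the `Proxy` class, the fixed `peer_latency_ms` cost, and the `fetch_from_server` stub are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Proxy:
    name: str
    store: Dict[str, bytes] = field(default_factory=dict)
    peer_latency_ms: float = 10.0  # assumed cost of fetching from this proxy


def fetch_from_server(url: str) -> bytes:
    """Stub standing in for a request to the origin server."""
    return b"origin:" + url.encode()


def cooperative_get(local: Proxy, peers: List[Proxy], url: str,
                    server_latency_ms: float) -> Tuple[bytes, str]:
    """Return (body, source) for a request arriving at the local proxy."""
    if url in local.store:
        return local.store[url], "local"
    # Use a peer only if it holds the document AND beats the server.
    candidates = [p for p in peers
                  if url in p.store and p.peer_latency_ms < server_latency_ms]
    if candidates:
        best = min(candidates, key=lambda p: p.peer_latency_ms)
        body, source = best.store[url], "peer:" + best.name
    else:
        body, source = fetch_from_server(url), "server"
    local.store[url] = body  # keep a copy for later local hits
    return body, source


peer = Proxy("b", store={"/x": b"cached"}, peer_latency_ms=10.0)
_, fast_peer = cooperative_get(Proxy("a"), [peer], "/x", server_latency_ms=50.0)
_, slow_peer = cooperative_get(Proxy("a"), [peer], "/x", server_latency_ms=5.0)
```

In the second call the peer holds the document but is slower than the server, so the request goes to the server instead.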

13
Population size
  • A collection of cooperating caches will achieve
    the hit-rate of a single proxy acting over the
    combined population of all the proxies.
  • Proxies will pay the overheads of inter-proxy
    communication latency.
  • Examining a single, top-level proxy thus gives us
    an upper bound on cooperative-caching performance.

14
Population size
15
Population size
16
Hit rate vs. latency and bandwidth
  • Latency, not hit rate, is crucial to clients
  • To ISPs, hit rate matters because it translates
    into bandwidth savings, which eases congestion

17
Hit rate vs. latency and bandwidth
18
Hit rate vs. latency and bandwidth
19
Proxies and organizations
  • If locality is high, even a small population
  • can achieve the maximum hit rate

20
Question
  • What benefit would clients in real organizations
    see if their proxies were to cooperate with other
    real organizational proxies?
  • The UW environment is an attempt at an answer
  • Each org acts like a business, with its own proxy
    sitting on its connection to the Internet.
  • Each client is categorized into 1 of 200 UW
    organizations

21
Proxies and organizations
22
  • Question: is a different grouping of clients to
    proxies, for example one based on each client's
    document interests, better?
  • Solution
  • A clustering algorithm is used to optimize
    intra-cluster sharing
  • randomly assigned clusters have a consistently
    lower hit rate than the optimally clustered
    organizations.

23
Proxies and organizations
24
Impact of larger population size
  • Recall cooperative caching can increase hit rate
  • This indicates that there is little correlation
    between sharing and the cacheability of documents
    for the UW population.
  • cooperative caching among populations larger than
    2.4 million does not increase the hit rate to
    cacheable documents

25
Impact of larger population size
  • Experiment: have MS and UW cooperate
  • Results
  • When scaled by equal factors, MS gains more
    benefit by cooperating with the UW population
    than the UW population gains by cooperating with
    MS.
  • Unpopular documents are universally unpopular →
    a request that misses in UW or MS is unlikely to
    find the document in the other proxy
  • 1/500 (first access) has hit rate increase
    regardless of popularity

26
Proxies and organizations
27
Impact of larger population size
28
Docs and Proxy Sharing Summary
  • 1. The behavior of cooperative caching is
    characterized by two different regions of the
    hit rate vs. population curve. For smaller
    populations, hit rate increases rapidly with
    population; it is in this region that
    cooperative caching can be used effectively.
    However, these population sizes can be handled
    by a single proxy.
  • Cooperative caching is only necessary to adapt to
    proxy assignments made for political or
    geographical reasons.

29
Docs and Proxy Sharing Summary
  • 2. For larger populations (beyond the knee of the
    population vs. hit rate curve), cooperative
    caching is unlikely to provide significant
    benefit.
  • Simultaneous traces of the MS and UW populations
    show this:
  • For UW, a 4x increase of population via
    cooperative caching netted only a 2.7% increase
    in cacheable hit rate.

30
Docs and Proxy Sharing Summary
  • 3. MS and others show clustering does occur but
    that cooperative caching specialized to interest
    groups is unlikely to be effective.

31
Docs and Proxy Sharing Summary
  • 4. Previous work has hinted at the general
    trends, but firm conclusions about cooperative
    caching (CC) have not been shown yet

32
An analytic model of Web accesses
  • Steady-state performance
  • The model
  • Model parameters
  • Performance of large scale proxy caching
  • Summary

33
Steady-state performance
34
The Model
  • Population has N clients
  • n total documents
  • The important characteristic of a Zipf-like
    distribution is that it is heavy-tailed: a
    significant fraction of the probability mass is
    concentrated in the tail, which in this case
    means that a significant fraction of requests go
    to the relatively unpopular documents.

35
The Model
  • Zipf distribution, contd.
  • Popularity of the ith document is proportional
    to 1 / i^α
  • As α increases, the distribution becomes less
    heavy-tailed, and a larger fraction of the
    probability mass is concentrated on the most
    popular documents
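
The effect of the exponent on the tail can be checked numerically. The sketch below computes the share of requests that go to the k most popular documents under a Zipf-like law; the document count (10,000), the cut-off k = 100, and the three exponent values are arbitrary illustrative choices, not parameters from the slides.

```python
def zipf_top_share(n: int, alpha: float, k: int) -> float:
    """Fraction of request probability on the k most popular of n documents,
    when document i has weight 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    return sum(weights[:k]) / sum(weights)


# Share of requests captured by the top 100 of 10,000 documents.
shares = {alpha: zipf_top_share(10_000, alpha, 100)
          for alpha in (0.6, 0.8, 1.0)}
```

The share grows with the exponent, which is the "less heavy-tailed" behavior described above: more of the probability mass sits on the popular head and less in the tail.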

36
The Model
  • The probability that a requested document is
    cacheable is p_c.
  • Average document size is E(S). Document size is
    independent of document popularity, latency, and
    change rate
  • The last-byte latency to the server that houses
    that document has average value E(L). Last-byte
    latency is independent of document popularity and
    document change rate.

37
The Model
  • Performance Characteristics

38
The Model
  • Performance characteristics continued
  • The expected last-byte latency to serve a request
    is given by
  • average bandwidth savings per request due to
    proxy caching
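
The formulas for these two quantities did not survive in the transcript. As a rough sketch of how such quantities are commonly computed, and not a reconstruction of the slide's equations, the code below treats expected latency as a hit-rate-weighted average and bandwidth savings as the bytes not fetched on hits. The parameter `mean_hit_latency` is a hypothetical addition; the other arguments correspond to the model's hit rate, E(L), and E(S).

```python
def expected_latency(hit_rate: float, mean_server_latency: float,
                     mean_hit_latency: float = 0.0) -> float:
    """Expected last-byte latency per request.

    Misses pay the server latency E(L); hits pay an assumed
    (hypothetical) proxy-side latency, zero by default.
    """
    return ((1.0 - hit_rate) * mean_server_latency
            + hit_rate * mean_hit_latency)


def expected_bandwidth_savings(hit_rate: float, mean_doc_size: float) -> float:
    """Average bytes per request that need not be fetched from the server."""
    return hit_rate * mean_doc_size
```

For example, with a 50% hit rate and a 2-second mean server latency, `expected_latency(0.5, 2.0)` gives 1 second when hits are assumed to be free.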

39
The Model
  • Differences between new and previous work
  • we consider the steady state behavior of caching
    systems rather than caching behavior based on a
    finite request sequence
  • Incorporate document change rate into the model
    rather than assuming that documents are static
  • Goal: use our model to understand the performance
    of large-scale, cooperative-caching schemes in
    terms of hit rate, latency, bandwidth savings,
    and storage consumed.

40
Model parameters
  • UW Trace

41
Model parameters
42
Performance of large scale proxy caching
  • Hit rate, latency, and bandwidth
  • Document rate of change
  • Client request rate
  • Document popularity and size of the Web

43
Performance of large scale proxy caching
44
Document popularity and size of the Web
  • Zipf parameter α
  • Number of documents n
  • Increasing α (alpha) skews the distribution
    towards popular documents, significantly
    increasing hit rates for slower rates of change
  • Slight increase in hit rates for faster rates of
    change.
  • Increasing the number of documents n shifts the
    curves for slow and fast rates of change to
    larger populations
  • This population shift is in proportion to the
    increase in n
  • n = 3.2 billion → the slow curve reaches a 90%
    hit rate at a population of 250,000
  • n = 32 billion → the slow curve reaches a 90%
    hit rate at a population of 25 million
  • n = 320 billion → the slow curve reaches a 90%
    hit rate at a population of 250 million.

45
Model Summary
  • analytic model used to examine the steady-state
    performance of cooperative caching schemes.
  • small populations achieve most of the performance
    benefits of cooperative caching.

49
Wrap Up
  • 1. Without changes in client behavior, there is
    little point in continuing the design and
    evaluation of highly scalable,
    cooperative-caching schemes
  • Cooperative caching makes sense up to the level
    of a medium-sized city

50
Wrap Up
  • The largest benefit for cooperative caching is
    achieved for relatively small populations.
  • Analysis of cooperation among small organizations
    within the university environment.
  • Traces of UW and MS confirmed only marginal
    benefit of cooperative caching among large
    organizations, with populations of 20K clients
    or more.
  • Scaling this large makes sense only in very
    high-bandwidth, low-latency environments.

51
Wrap Up
  • Performance of cooperative caching limited by
    document cacheability.
  • Increasing cacheability of documents is the main
    challenge for future Web cache behavior research

52
Wrap Up
  • Cluster-based analysis of client access patterns
    indicates that cooperative-caching organizations
    based on mutual interest offer no obvious
    advantages over randomly assigned or
    organization-based groupings.

53
Wrap Up
  • Fundamentally, the usefulness of cooperative Web
    proxy caching depends upon the scale at which it
    is being applied.
  • Whether or not they use cooperative caching
    locally, large organizations should use proxy
    caching for their user populations.
  • Concern: cooperative caching is only marginally
    helpful

54
Wrap Up
  • Results shown are on static data
  • A shift in user workload, e.g. toward streaming
    media, will change the results
  • Average object size is orders of magnitude larger
  • Reveals that better utilization of network
    resources is necessary
  • Size and time of transfer for streaming objects
    show that multicast methods might be good

55
Assumptions
  • Static objects like web pages, documents
  • User populations not too large
  • Network latency, performance medium

56
Questions?