Cooperative Web Caching Using Server-Directed Proxy Sharing

1
Cooperative Web Caching Using Server-Directed
Proxy Sharing
  • Sandra G. Dykes
  • Ph.D. Dissertation Proposal
  • University of Texas at San Antonio

2
Talk Outline
  • Problem: Internet performance
  • Dissertation overview
  • Web server workloads
  • Taxonomy and related research
  • SDP cache mesh
  • Simulation
  • Prototype

3
Internet Performance
  • Slow, variable response time
  • Denial-of-service (cannot connect)
  • Due to:
    • Congestion at network routers, NAPs, MAEs
    • Overloaded servers
    • Interaction of TCP/IP with HTTP

4
Internet Caching
[Diagram: Users, Proxy Servers, Internet, Server]

5
Internet Caching
  • The Internet needs a scalable cache system.
  • Cache designs should consider network traffic.
  • Simulations should consider network traffic.

6
Dissertation Proposal
  • Design, simulate and implement a protocol for
    cooperative web caching: Server-Directed Proxy
    Sharing (SDP).
  • Develop methods to improve simulation of network
    cache designs.

7
Phases of Work
  • Collect and analyze Web server traces
  • Specify web cache design
  • Simulate protocol
  • Implement prototype

8
Web Server Workloads (Arlitt & Williamson;
Bestavros et al.; Mogul; Gwertzman & Seltzer)
  • Exponential growth
  • Skewed popularity: 90% of requests go to 10% of files
  • Small objects: mean size [value missing]
  • HTML and images: 90% of requests, 54-92% of bytes
  • Remote requests: 70%
  • Duplicate requests: 97%
  • Long lifetimes: 50 - 100 days
  • Bursty arrivals
  • Geography affects popularity
  • Flash crowds and hot spots

9
UTSA Web Server Traces
  • CS Division
    • May 1997 - Sept 1997
    • 561,292 requests
    • 3.8 GB (24 MB/day)
  • Visualization and Image Processing Lab
    • May 1996 - Sept 1997
    • 552,046 requests
    • 4.2 GB (8 MB/day)

10
Growth in Server Load: UTSA-CS
11
Popularity Skew: UTSA-CS, UTSA-VIS
12
Object Size - Transfers: UTSA-CS, UTSA-VIS
13
Workload Comparison
                            UTSA-CS     UTSA-VIS    Literature
Growth                      Yes         Yes         Exponential
10% of objects satisfy      69% req.    79% req.    90% req.
Object size: mean           (missing)   (missing)   (missing)
Object size: distribution   (missing)   (missing)   (missing)
HTML and images: requests   92%         96%         90%
HTML and images: bytes      77%         45%         52 - 94%
Remote requests             82%         78%         70%
Duplicate transfers         99%         99%         97%

14
Object Types
                   UTSA                     Literature
                   Transfers   Avg Size     Transfers   Avg Size
HTML               43%         4 KB         42%         4 KB
Image              51%         12 KB        54%         13 KB
Audio              0.3%        200 KB       5%          179 KB
Video              0.5%        452 KB       0.4%        2300 KB
Dynamic            2%          1 KB         0-9%        5 KB

Embedded images    42%
Images per page    3.3                      ( 3 - 5 ? )

15
Implications for Web caching
  • Web caching is viable
  • Small percent of objects satisfy most requests
  • Long object lifetimes
  • Dynamic objects are small fraction of load
  • Bandwidth and latency are both important
  • Consider geography and network topology
  • Adapt quickly to shifts in popularity
  • Use Web page structure (embedded images)

16
Current Internet Caching
  • Proxy server caching
  • Hierarchical proxy server caching

17
Proxy Server Caching
[Diagram: Web Server and three Proxy Servers]

18
Proxy server caching is not enough
  • Hit rates depend upon overlap in user requests,
    so need many users or users accessing the same
    objects.
  • Maximum achievable hit rates (infinite cache, 2-4
    months):
    • Boston Univ.: 53%
    • Virginia Tech.: 29%, 30%, 46%
    • DEC (Glassman): 30 - 50%

19
Hierarchical Proxy Server Caching (Harvest / Squid)
[Diagram: Web Server, NLANR, and a hierarchy of Proxy caches]

20
Network Caching Research
  • Taxonomy for network caching
  • Research projects

21
Taxonomy for Network Caching
  • Discovery
    • Fixed Cache
    • Group Query: Manual, Automatic
    • Cache Site Directory
  • Dissemination
    • Client-initiated
    • Server-initiated
  • Delivery
    • Direct
    • Indirect

25
Web Caching Projects
Project                    Delivery   Discovery                  Dissemination
Proxy server cache         Direct     Fixed cache                Client-initiated
Harvest / Squid            Indirect   Group query (Manual)       Client-initiated
Zhang, Floyd, Jacobson     Indirect   Group query (Auto)         Client-initiated
Malpani, Lorch, Berger     Indirect   Group query (Auto)         Client-initiated
Gwertzman & Seltzer        Direct     Directory (Centralized)    Server-initiated
Bestavros et al.           Direct     Directory (Centralized)    Server-initiated
Tewari, Dahlin, Vin, ...   Direct     Directory (Hierarchical)   Server-initiated
SDP                        Direct     Directory (Lazy mesh)      Client-initiated

26
SDP Design Choices
  • Discovery: Cache Site Directory
    • O(1) discovery if hit in local cache site
      directory
    • Fixed cache and non-hierarchical group query do
      not scale.
    • Cache hierarchies have large miss penalty
      (response time)
  • Dissemination: Client-initiated
    • Automatically adapts to shifts in popularity.
    • Uses current data while server-initiated uses
      historic data.
  • Delivery: Direct
    • Less network traffic and lower response times.
    • Indirect delivery under HTTP is store-and-forward.

27
Cache Site Directory
  • Discovery, dissemination & delivery of location
    info
    • Similar to object caching, but smaller size makes
      prefetching viable.
  • Directory organization
  • Cache consistency
  • Overhead
    • Keep low - don't spend more time getting location
      info than objects!
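
As an illustration only, a cache site directory could be held as a small bounded map from object URL to the cache sites believed to hold that object. The sketch below is one possible Python version. The class name `CacheSiteDirectory`, the LRU eviction policy, the `max_entries` bound and the `forget_site` helper are all assumptions made for the example, not details given in the slides.

```python
from collections import OrderedDict


class CacheSiteDirectory:
    """Hypothetical directory: object URL -> cache sites thought to hold it."""

    def __init__(self, max_entries=1000):
        self.max_entries = max_entries      # assumed bound, to keep overhead low
        self._entries = OrderedDict()       # URL -> list of site names

    def add(self, url, sites):
        """Record location info, evicting the least recently used entry if full."""
        known = self._entries.pop(url, [])
        for site in sites:
            if site not in known:
                known.append(site)
        self._entries[url] = known          # re-inserted as most recently used
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)

    def lookup(self, url):
        """Return known cache sites for url (empty list on a directory miss)."""
        if url not in self._entries:
            return []
        self._entries.move_to_end(url)      # mark as recently used
        return list(self._entries[url])

    def forget_site(self, url, site):
        """Drop a stale location, e.g. when the site no longer caches the object."""
        sites = self._entries.get(url, [])
        if site in sites:
            sites.remove(site)
        if url in self._entries and not sites:
            del self._entries[url]


directory = CacheSiteDirectory(max_entries=2)
directory.add("/index.html", ["proxy-a", "proxy-b"])
directory.add("/logo.gif", ["proxy-b"])
directory.forget_site("/index.html", "proxy-a")
print(directory.lookup("/index.html"))
```

A lookup is a single dictionary access, which matches the constant-time discovery goal; how entries should really be aged or invalidated is left open by the slides.
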

28
Server-Directed Proxy Sharing (SDP)
  • Proxy server caches share objects
  • Flat mesh of caches

29
[Diagram: side-by-side comparison of Proxy Caching, Hierarchical Proxy Caching (Web Server, NLANR, Proxies), and SDP (Web Server, Proxies)]

30
SDP Components
[Diagram: Server, Proxy, Cache, Proxy Table, Cache Site Directory, Popular List]

31
SDP Protocol
1. Proxy looks for object in its cache. If found, then done.
[Diagram: Proxy, Cache]

32
SDP Protocol
2. Proxy looks up object cache sites in its Cache Site Directory. If found, then skip next step.
[Diagram: Proxy, Cache Site Directory]
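
Steps 1 and 2 amount to two local lookups before anything goes over the network. A minimal sketch of that ordering follows; the function name `locate`, the plain dictionaries standing in for the cache and the Cache Site Directory, and the returned tags are hypothetical choices for the example.

```python
def locate(url, cache, site_directory):
    """Return ("hit", body), ("sites", [sites]) or ("server", None)."""
    if url in cache:                        # step 1: local cache hit, done
        return "hit", cache[url]
    sites = site_directory.get(url, [])
    if sites:                               # step 2: cache sites known, skip step 3
        return "sites", list(sites)
    return "server", None                   # nothing known locally: go to step 3


cache = {"/a.html": b"<html>a</html>"}
site_directory = {"/b.gif": ["proxy-i", "proxy-j"]}

for url in ("/a.html", "/b.gif", "/c.html"):
    outcome, _ = locate(url, cache, site_directory)
    print(url, outcome)
```
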
33
SDP Protocol
3. Proxy sends request to Server. Server returns object and/or list of cache sites.
[Diagram: Proxy sends REQUEST to Server; Server returns OBJECT and/or Cache Sites. Proxy Table: request info. Cache: object. Cache Site Directory: cache sites for object.]
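
The server side of step 3 can be sketched as follows. The slides do not say when the Server returns the object, the cache sites, or both, so the policy here (serve the object when no other proxy is known to hold it, otherwise return only cache sites) is an assumption, as are the class name `SdpServer`, the `max_sites` limit and the shape of the Proxy Table.

```python
class SdpServer:
    """Hypothetical server side of step 3."""

    def __init__(self, objects, max_sites=3):
        self.objects = objects              # URL -> object body
        self.proxy_table = {}               # URL -> proxies that requested it
        self.max_sites = max_sites          # assumed cap on the returned list

    def handle(self, proxy, url):
        """Return (body_or_None, cache_sites) for a request from `proxy`."""
        holders = self.proxy_table.setdefault(url, [])
        # Most recent requesters other than the caller are offered as cache sites.
        sites = [p for p in holders if p != proxy][-self.max_sites:]
        if proxy not in holders:
            holders.append(proxy)           # record request info in the Proxy Table
        # Assumed policy: redirect if another proxy should have the object.
        body = None if sites else self.objects[url]
        return body, sites


server = SdpServer({"/a.html": b"<html>a</html>"})
first = server.handle("proxy-1", "/a.html")
second = server.handle("proxy-2", "/a.html")
print(first, second)
```

In this sketch the first requester receives the object itself and later requesters receive only a list of cache sites, which is how request load would shift from the Server to the proxy caches.
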
34
SDP Protocol
4. Proxy determines fastest sites.
[Diagram: Proxy sends ECHO to each candidate cache site; the 1st REPLY identifies the fastest site]
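
One way to picture step 4 is to probe all candidate sites at once and rank them by the order in which replies arrive. The sketch below does this with threads. The `probe` callable stands in for a real ECHO round trip, and `fastest_sites`, `fake_echo` and the delay values are hypothetical names and numbers chosen for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as ProbeTimeout


def fastest_sites(sites, probe, timeout=1.0):
    """Probe every site concurrently; return responders in order of first reply."""
    if not sites:
        return []
    ranked = []
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        pending = {pool.submit(probe, site): site for site in sites}
        try:
            for reply in as_completed(pending, timeout=timeout):
                if reply.exception() is None:   # failed probes are skipped
                    ranked.append(pending[reply])
        except ProbeTimeout:
            pass                                # leave out sites slower than timeout
    return ranked


# Stand-in for a real ECHO round trip: sleep for a made-up delay per site.
DELAYS = {"proxy-i": 0.15, "proxy-j": 0.05, "proxy-k": 0.30}


def fake_echo(site):
    time.sleep(DELAYS[site])


ranked = fastest_sites(list(DELAYS), fake_echo)
print(ranked[0])
```

Note that leaving the `with` block still waits for outstanding probe threads to finish, so a production version would need probes that honor their own timeout.
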
35
SDP Protocol
5. Proxy requests object from fastest site. Site returns object and Popular List.
[Diagram: Proxy sends REQUEST to Proxy i; Proxy i returns OBJECT and POPULAR LIST. Cache: object. Cache Site Directory: Proxy i Popular List.]
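
The piggybacked Popular List in step 5 is what fills the requester's Cache Site Directory without extra messages. A minimal sketch of that merge is below, assuming the Popular List is simply a list of URLs that the replying site caches; the function name `merge_popular_list` and the dictionary layout are hypothetical.

```python
def merge_popular_list(site_directory, site, popular_list):
    """Record that `site` caches every URL on its piggybacked Popular List.

    Returns the number of new (URL, site) locations learned.
    """
    learned = 0
    for url in popular_list:
        sites = site_directory.setdefault(url, [])
        if site not in sites:
            sites.append(site)
            learned += 1
    return learned


site_directory = {"/logo.gif": ["proxy-j"]}
learned = merge_popular_list(
    site_directory, "proxy-i", ["/logo.gif", "/news.html"]
)
print(learned, site_directory)
```
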
36
SDP Protocol
For Web pages, Proxy requests embedded images in parallel from different cache sites, using the fastest sites.
[Diagram: Proxy sends REQUEST GIF 1 to Proxy i, which returns GIF 1 and POPULAR LIST; Proxy sends REQUEST GIF 2 to Proxy j, which returns GIF 2 and POPULAR LIST]
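
The parallel retrieval described above could look roughly like this. The slides only say that images are requested in parallel from different fast sites; the round-robin assignment, the function name `fetch_embedded` and the `fetch(site, url)` callable are assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_embedded(image_urls, ranked_sites, fetch):
    """Spread image requests round-robin over the fastest sites, in parallel.

    Returns {url: (site, body)}.
    """
    if image_urls and not ranked_sites:
        raise ValueError("no cache sites available")
    plan = [
        (url, ranked_sites[i % len(ranked_sites)])
        for i, url in enumerate(image_urls)
    ]
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        bodies = list(pool.map(lambda job: fetch(job[1], job[0]), plan))
    return {url: (site, body) for (url, site), body in zip(plan, bodies)}


def fake_fetch(site, url):
    """Stand-in for an HTTP request to a cache site."""
    return f"{url} from {site}".encode()


result = fetch_embedded(
    ["gif1", "gif2", "gif3"], ["proxy-i", "proxy-j"], fake_fetch
)
print(result["gif2"])
```
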

37
Advantages of Design
  • Technical
    • Reduces denial-of-service.
    • Designed to reduce server load AND response time.
    • Network correct - designed to reduce
      congestion.
  • Practical
    • No change to routers or HTTP protocol.
    • No change to local cache policy.
    • No central administration, equipment or
      personnel.

38
Design Issues & Questions
  • Cache consistency
    • Stale object
    • Object removed from cache
  • What are hit rates at Cache Site Directories?
  • Use response-time measurements or static criteria to
    choose cache site?

39
Simulation
  • Analytical workload
  • Preliminary isolated single server model
  • Next step:
    • Model network traffic, multiple servers and cache
      sites
    • Consider using Berkeley ns network simulator

40
Why Analytical Workloads?
  • Model the functional dependence of performance
    metrics on workload variables.
  • Separate the effects of workload variables.
  • Predict behavior for different workloads.

41
Single Server Simulation
  • Sessions: Ta varied, Exp (a)
  • Embedded images: Ta = 221 ms, Log-Normal (a,b)
  • Connection duration: Ts = 289 ms, Log-Normal (a,b)
  • Session probability: P = 0.59
    • HTML with images: P = 0.13
    • HTML no images: P = 0.30
    • Non-HTML: P = 0.16
  • Embedded image: P = 0.41
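
The probabilities above sum to 1 (0.13 + 0.30 + 0.16 = 0.59 for session requests, plus 0.41 for embedded images), so they can drive a simple request generator. The sketch below draws request classes with those weights and generates session arrivals with exponentially distributed gaps. The names, the seed and the 500 ms mean inter-arrival time are assumptions; the slides only say the session arrival time was varied.

```python
import random

# Request class probabilities taken from the slide.
REQUEST_CLASSES = {
    "html_with_images": 0.13,
    "html_no_images": 0.30,
    "non_html": 0.16,
    "embedded_image": 0.41,
}


def sample_request_class(rng):
    """Draw one request class according to the slide's probabilities."""
    names = list(REQUEST_CLASSES)
    weights = list(REQUEST_CLASSES.values())
    return rng.choices(names, weights=weights)[0]


def session_arrival_times(rng, mean_interarrival_ms, count):
    """Arrival times in ms with exponentially distributed gaps."""
    now, times = 0.0, []
    for _ in range(count):
        now += rng.expovariate(1.0 / mean_interarrival_ms)
        times.append(now)
    return times


rng = random.Random(1)                      # arbitrary seed for repeatability
sample = [sample_request_class(rng) for _ in range(10_000)]
session_share = sum(c != "embedded_image" for c in sample) / len(sample)
arrivals = session_arrival_times(rng, mean_interarrival_ms=500.0, count=5)
print(round(session_share, 2), [round(t) for t in arrivals])
```
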

42
Size and Type Distributions
Object type    P_request   Mean Size   Size Distribution
HTML           0.430       4 KB        Pareto (a)
Image          0.506       11 KB       Pareto (a)
Audio          0.003       140 KB      Pareto (a)
Video          0.004       452 KB      Pareto (a)
Application    0.007       260 KB      Pareto (a)
Dynamic        0.019       1 KB        Pareto (a)
Other          0.031       11 KB       Pareto (a)
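
To show how such a table can feed a simulator, the sketch below draws an object type by its request probability and then a Pareto-distributed size with the listed mean. For a Pareto distribution with shape a > 1 and scale x_m the mean is a·x_m/(a − 1), so the scale is recovered as x_m = mean·(a − 1)/a. The shape values are not preserved in the transcript, so `SHAPE = 1.5` is purely an assumed placeholder, and the function names are hypothetical.

```python
import random

# (P_request, mean size in KB) per object type, from the slide.
OBJECT_TYPES = {
    "HTML": (0.430, 4),
    "Image": (0.506, 11),
    "Audio": (0.003, 140),
    "Video": (0.004, 452),
    "Application": (0.007, 260),
    "Dynamic": (0.019, 1),
    "Other": (0.031, 11),
}
SHAPE = 1.5     # assumed Pareto shape; the real parameter is not in the transcript


def pareto_scale(mean_kb, shape):
    """Scale x_m that gives the requested mean (requires shape > 1)."""
    return mean_kb * (shape - 1.0) / shape


def sample_object(rng, shape=SHAPE):
    """Draw (object type, size in KB) for one request."""
    names = list(OBJECT_TYPES)
    weights = [OBJECT_TYPES[name][0] for name in names]
    name = rng.choices(names, weights=weights)[0]
    mean_kb = OBJECT_TYPES[name][1]
    # random.paretovariate has scale 1, so multiply by x_m.
    size_kb = pareto_scale(mean_kb, shape) * rng.paretovariate(shape)
    return name, size_kb


rng = random.Random(7)                      # arbitrary seed for repeatability
draws = [sample_object(rng) for _ in range(1000)]
print(draws[0])
```

With a shape this close to 1 the sample mean converges slowly, which is the heavy-tail behavior a Pareto size model is meant to capture.
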

43
Server Load - MB Transferred
44
Server Load - Requests received
45
Connection Refusals
46
Effect on network congestion
  • ?

47
Prototype Design
[Diagram: SDP Proxy Server, SDP Server, HTTP Proxy Servers, HTTP Server; links labeled SDP PROTOCOL and HTTP]

48
Contributions
  • Taxonomy
  • Network cache design
    • Scalable design: Direct delivery AND
      Internet-wide sharing
    • Cache site directory: Lazy mesh organization
    • Cache site selection: Dynamic vs. static
      criteria
    • Web page structure: Concurrent requests from
      different sites
  • Network cache simulation
    • Analytical workload
    • Model network traffic, multiple servers & caches
    • Estimate relative response times
  • Implementation

49
  • Clinton Jeffery
  • Samir Das
  • Garry Bernal
  • Shannon Williams

51
Cache Site Selection
  • Static
    • Geography
    • Hops
    • Capacity BW
    • Average BW
  • Dynamic
    • Latency measurement (ping)
    • BW measurement (? ping)

52
How is server load reduced?
  • Balance request load across servers and proxy
    server caches.

53
How is congestion reduced?
  • Selecting cache sites from run-time response
    measurements helps use less congested routes.
  • Retrieving embedded objects from multiple sites
    helps distribute network traffic.

54
How is response time reduced?
  • Discovery
    • Web pages - piggyback Proxy List onto HTML text.
    • Cache Site Directories and piggybacked Popular
      Lists.
  • Delivery
    • Direct
    • Choose cache site from run-time estimates of
      response time.
    • Retrieve embedded images in parallel from
      multiple sites.