Taxonomy and Design Analysis for Distributed Web Caching

1
Taxonomy and Design Analysis for Distributed Web
Caching
  • Sandra G. Dykes, Clinton L. Jeffery, Samir Das
  • Division of Computer Science
  • University of Texas at San Antonio
  • http://www.cs.utsa.edu/research/proxy/proxy.html

2
Outline
  • Web traffic characteristics
  • Taxonomy for distributed web caches
  • Analysis of web cache design
  • Which taxonomy categories best match Web traffic?
  • Server-directed proxy sharing
  • Simulation results

3
Proxy Server Caching
[Diagram labels: ISP boundary, Proxy, User (x4)]
4
Proxy server caching is not enough
  • Hit rates depend upon overlap in user requests
      • high hit rates only if there are many users or if users
        access the same objects.
  • Maximum proxy cache hit rates (% of requests)
      • DEC: 30-50%
      • Virginia Tech: 27%, 28%, 43%
      • AOL: 50%
  • Measured hit rates
      • Duska, Marwood and Feeley: 24-45%

5
Web Traffic Characteristics
Characteristic                 UTSA        Other
Small objects                  <12 KB      <21 KB
Pareto distribution
Duplicated requests            >99%        >97%
Skewed popularity
  10% of objects satisfy       74% req.    90% req.
HTML and images
  Transfers                    94%         >90%
  Bytes                        61%         52-94%
Embedded images
  Transfers                    42%
  Bytes                        34%
  Per page                     3.3         3-5
HICSS 32
6
Web Traffic Characteristics
  • Most objects are long-lived (Bestavros et al.)
      • HTML: 50 days
      • GIF: 85 days
      • JPEG: 100 days
  • Popularity varies by region.
  • Popularity can change rapidly.
  • Requests often arrive in bursts.

7
Implications for Web caching
  • Small objects → latency important
  • Pareto distribution → bandwidth important
  • Bursty arrivals → dynamically
    distribute load
  • Most cached objects won't be out-of-date.
  • Adapt quickly to popularity shifts.
  • Utilize popularity skew and Web page information
    (embedded images).
  • Consider geography.
  • Consider network topology.
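
To make the popularity-skew point concrete, the following is a minimal sketch, not taken from the slides. It assumes a Zipf-like popularity model (request probability proportional to 1/rank^alpha), which the presentation does not state; the function name and parameters are hypothetical. It computes the share of requests that the most popular fraction of objects would satisfy under that assumption.

```python
def top_fraction_share(num_objects: int, top_fraction: float, alpha: float = 1.0) -> float:
    """Share of requests going to the most popular `top_fraction` of objects.

    Assumption (not from the slides): the object with popularity rank r is
    requested with probability proportional to 1 / r**alpha.
    """
    weights = [1.0 / rank ** alpha for rank in range(1, num_objects + 1)]
    top_n = max(1, int(num_objects * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


if __name__ == "__main__":
    share = top_fraction_share(num_objects=10_000, top_fraction=0.10)
    print(f"Top 10% of objects receive {share:.0%} of requests")
```

With alpha = 0 (uniform popularity) the top 10% of objects would receive only 10% of requests; the stronger the skew, the more a small cache of popular objects pays off.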

8
Taxonomy for Web Caching
  • Discovery
      • Fixed cache
      • Group query: Manual, Automatic
      • Directory lookup: Centralized, Distributed
  • Dissemination
      • Client-initiated
      • Server-initiated
  • Delivery
      • Direct
      • Indirect
9
[Diagrams, each showing a Web server: Proxy Caching (Fixed Cache); Flat mesh (Directory); Hierarchical (Group query)]
10
Discovery
  • Distributed metadata directories
      • fast, local lookups
      • separate discovery from site selection
      • metadata can include URLs of related objects,
        timestamps, object size, site performance data,
        ...
  • Fixed cache, non-hierarchical groups, and
    centralized directories
      • do not scale
  • Hierarchical group query
      • large miss penalties at each level

11
Dissemination
  • Client-initiated
      • adapts automatically to popularity shifts
      • uses current request patterns
  • Server-initiated
      • uses historic data
      • may be sensitive to time window
      • proxies do not control what they cache

12
Delivery
  • Direct
      • lower latency
      • fewer connections
      • less network traffic
  • Indirect
      • Each intermediate site requires a remote
        connection and object transfer.
      • HTTP delivery is store-and-forward, making
        indirection most costly for large files.
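
The cost of indirection can be illustrated with a small back-of-the-envelope model. This is a sketch under simple assumptions that are not from the slides: each hop pays a fixed connection setup time and then receives the whole object before forwarding it (store-and-forward). The function name and the numbers in the example are hypothetical.

```python
def delivery_time(size_bytes: float, hops: list[tuple[float, float]]) -> float:
    """Total delivery time in seconds over a chain of store-and-forward hops.

    Each hop is (connection_setup_s, bandwidth_bytes_per_s). A hop must
    receive the complete object before it can forward it, so the per-hop
    costs simply add up.
    """
    return sum(setup + size_bytes / bandwidth for setup, bandwidth in hops)


if __name__ == "__main__":
    hop = (0.1, 100_000.0)  # assumed: 100 ms setup, 100 KB/s
    for size in (10_000, 1_000_000):
        direct = delivery_time(size, [hop])
        indirect = delivery_time(size, [hop] * 3)
        print(f"{size:>9} bytes: direct {direct:.1f} s, via 2 intermediates {indirect:.1f} s")
```

In this model the extra cost of each intermediate site is one setup plus one full transfer, so the absolute penalty grows with object size, which matches the slide's point about large files.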

13
  • How do we propagate metadata?

14
Server-Directed Proxy Sharing
  • Server direction
    [Diagram: Web server with its Proxy Table]
  • Lazy prefetching
    [Diagram: a Request is answered with Object + Popular List]
15
SDP Components
  • Metadata Directory: proxies and other metadata for remote objects.
  • Popular List: most popular objects in the local object cache.
  • Proxy Table (at the Web server): proxies and other metadata for
    the server's objects.
16
SDP Protocol
1. Proxy looks in local object cache.
17
SDP Local cache miss
2. Proxy looks in Metadata Directory.
18
SDP Directory miss
3. Proxy requests object from server. Server
returns object + metadata for related objects
(server direction).
[Diagram: Request to the Web server (Proxy Table); reply is Object + Proxy List]
19
SDP Site selection
[Diagram labels: Web Server, Ping (x3), Response]
20
SDP Proxy-to-Proxy request
Peer proxy returns object + metadata for popular
objects (lazy prefetching).
[Diagram: Request to peer proxy; reply is Object + Popular List]
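
The protocol steps on the preceding slides can be sketched as a small in-memory model. This is an illustration only, not the authors' implementation: the class and method names (`WebServer`, `Proxy`, `fetch`, `serve_peer`, `request`) are hypothetical, the metadata is reduced to lists of proxy names, and site selection is simplified to "try listed proxies in order".

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class WebServer:
    objects: dict[str, bytes]
    related: dict[str, list[str]]                      # url -> related (e.g. embedded) urls
    proxy_table: dict[str, list[str]] = field(default_factory=dict)  # url -> proxies holding it

    def fetch(self, url: str, proxy_name: str) -> tuple[bytes, dict[str, list[str]]]:
        # Server direction: return the object plus proxy lists for related objects.
        proxy_list = {u: list(self.proxy_table.get(u, [])) for u in self.related.get(url, [])}
        holders = self.proxy_table.setdefault(url, [])
        if proxy_name not in holders:
            holders.append(proxy_name)
        return self.objects[url], proxy_list


@dataclass
class Proxy:
    name: str
    server: WebServer
    peers: dict[str, Proxy] = field(default_factory=dict)
    cache: dict[str, bytes] = field(default_factory=dict)
    directory: dict[str, list[str]] = field(default_factory=dict)  # Metadata Directory
    hits: dict[str, int] = field(default_factory=dict)

    def popular_list(self, n: int = 3) -> list[str]:
        # Most requested objects currently in the local cache.
        return sorted(self.cache, key=lambda u: self.hits.get(u, 0), reverse=True)[:n]

    def _remember(self, url: str, proxy_names: list[str]) -> None:
        known = self.directory.setdefault(url, [])
        for proxy_name in proxy_names:
            if proxy_name != self.name and proxy_name not in known:
                known.append(proxy_name)

    def serve_peer(self, url: str) -> tuple[bytes | None, list[str]]:
        # Lazy prefetching: piggyback the popular list onto the reply.
        return self.cache.get(url), self.popular_list()

    def request(self, url: str) -> tuple[bytes, str]:
        self.hits[url] = self.hits.get(url, 0) + 1
        if url in self.cache:                                   # 1. local object cache
            return self.cache[url], "local cache"
        for peer_name in list(self.directory.get(url, [])):     # 2. Metadata Directory
            peer = self.peers.get(peer_name)
            if peer is None:
                continue
            body, popular = peer.serve_peer(url)
            for popular_url in popular:
                self._remember(popular_url, [peer_name])
            if body is not None:
                self.cache[url] = body
                return body, f"peer {peer_name}"
        body, proxy_list = self.server.fetch(url, self.name)    # 3. directory miss
        for related_url, names in proxy_list.items():
            self._remember(related_url, names)
        self.cache[url] = body
        return body, "server"
```

In this sketch a proxy that fetches an HTML page from the server learns which proxies hold the page's related objects, so a later request for one of those objects can go to a peer without involving the server.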
21
Simulation Results
[Plots: Web server traffic (MB/s); Web server requests (req/s)]
22
Simulation results
[Plot: Connection refusals]
23
Advantages of SDP design
  • Takes advantage of Web characteristics
      • linked object requests
      • object popularity skew
      • rapid changes in popularity
  • Scalable
      • Internet-wide sharing using local discovery
      • Reduces server load, response time, network
        congestion
      • Separates discovery from site selection
  • Flexible
      • Metadata content can support different purposes
      • No change to routers or HTTP protocol
      • No change to local cache policy
      • No central administration, equipment or personnel

24
Questions and Future Work
  • What will hit rates be at Metadata Directories?
  • Cache site selection
      • Static: site bandwidth, network proximity, ...
      • Statistical: average site load, historical
        performance, ...
      • Run-time: ping, tcping, bandwidth probe, ...
  • Implementation
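
One way the three kinds of site-selection input could be combined is sketched below. The scoring formula is an assumption made purely for illustration (the slides leave site selection as an open question), and the names `SiteEstimate`, `estimated_response_time`, and `select_site` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteEstimate:
    name: str
    rtt_s: float            # run-time input, e.g. from a ping
    bandwidth_bps: float    # static input: site bandwidth in bytes/s
    load: float             # statistical input: average load in [0, 1)


def estimated_response_time(site: SiteEstimate, size_bytes: float) -> float:
    # Assumed model: round trip plus transfer time, inflated as load approaches 1.
    transfer_s = size_bytes / site.bandwidth_bps
    return (site.rtt_s + transfer_s) / max(1e-6, 1.0 - site.load)


def select_site(sites: list[SiteEstimate], size_bytes: float) -> SiteEstimate:
    # Pick the candidate with the lowest estimated response time.
    return min(sites, key=lambda s: estimated_response_time(s, size_bytes))


if __name__ == "__main__":
    candidates = [
        SiteEstimate("origin server", rtt_s=0.08, bandwidth_bps=1e6, load=0.9),
        SiteEstimate("peer proxy", rtt_s=0.02, bandwidth_bps=5e5, load=0.2),
    ]
    print(select_site(candidates, size_bytes=10_000).name)
```

Because the load term is part of the estimate, a heavily loaded server loses to a slower but idle peer, which is one way load balancing could fall out of site selection.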

25
Web Caching Projects
Project                   Discovery                  Dissemin.      Delivery
Proxy server cache        Fixed cache                Client-init.   Direct
Harvest / Squid           Group query - Manual       Client-init.   Indirect
Zhang, Floyd, Jacobson    Group query - Automatic    Client-init.   Indirect
Gwertzman, Seltzer        Directory - Centralized    Server-init.   Direct
Bestavros et al.          Directory - Centralized    Server-init.   Direct
Tewari, Dahlin, Vin, ...  Directory - Distributed    Server-init.   Direct
Dykes, Jeffery, Das       Directory - Distributed    Client-init.   Direct
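
The classification above can also be written down as data, which makes it easy to ask which projects fall into a given combination of categories. The sketch below transcribes the rows of the table; the enum and function names are hypothetical and chosen only for this example.

```python
from enum import Enum
from typing import NamedTuple


class Discovery(Enum):
    FIXED_CACHE = "Fixed cache"
    GROUP_QUERY_MANUAL = "Group query - Manual"
    GROUP_QUERY_AUTOMATIC = "Group query - Automatic"
    DIRECTORY_CENTRALIZED = "Directory - Centralized"
    DIRECTORY_DISTRIBUTED = "Directory - Distributed"


class Dissemination(Enum):
    CLIENT_INITIATED = "Client-initiated"
    SERVER_INITIATED = "Server-initiated"


class Delivery(Enum):
    DIRECT = "Direct"
    INDIRECT = "Indirect"


class Design(NamedTuple):
    project: str
    discovery: Discovery
    dissemination: Dissemination
    delivery: Delivery


C, S = Dissemination.CLIENT_INITIATED, Dissemination.SERVER_INITIATED
DESIGNS = [
    Design("Proxy server cache", Discovery.FIXED_CACHE, C, Delivery.DIRECT),
    Design("Harvest / Squid", Discovery.GROUP_QUERY_MANUAL, C, Delivery.INDIRECT),
    Design("Zhang, Floyd, Jacobson", Discovery.GROUP_QUERY_AUTOMATIC, C, Delivery.INDIRECT),
    Design("Gwertzman, Seltzer", Discovery.DIRECTORY_CENTRALIZED, S, Delivery.DIRECT),
    Design("Bestavros et al.", Discovery.DIRECTORY_CENTRALIZED, S, Delivery.DIRECT),
    Design("Tewari, Dahlin, Vin", Discovery.DIRECTORY_DISTRIBUTED, S, Delivery.DIRECT),
    Design("Dykes, Jeffery, Das", Discovery.DIRECTORY_DISTRIBUTED, C, Delivery.DIRECT),
]


def matching(discovery: Discovery, dissemination: Dissemination, delivery: Delivery) -> list[str]:
    """Names of the projects that fall into the given taxonomy categories."""
    return [
        d.project
        for d in DESIGNS
        if (d.discovery, d.dissemination, d.delivery) == (discovery, dissemination, delivery)
    ]


if __name__ == "__main__":
    print(matching(Discovery.DIRECTORY_DISTRIBUTED, Dissemination.CLIENT_INITIATED, Delivery.DIRECT))
```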
26
Simulation
  • Analytical workload
  • Preliminary single server model
  • Future extensions
      • multiple servers
      • network protocols, router queues

27
Why Analytical Workloads?
  • Model functional dependencies of performance
    metrics on workload variables.
  • Separate the effects of workload variables.
  • Predict behavior for different workloads.

28
Single Server Simulation
  • Sessions: Ta varied, Exp(a)
  • Embedded images: Ta = 221 ms, Log-Normal(a,b)
  • Connection duration: Ts = 289 ms, Log-Normal(a,b)
  • Session probability: P = 0.59
      • HTML with images: P = 0.13
      • HTML no images: P = 0.30
      • Non-HTML: P = 0.16
  • Embedded image: P = 0.41
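
A workload generator along these lines might look like the sketch below. Several things here are assumptions rather than facts from the slides: the 221 ms and 289 ms figures are treated as means, the log-normal shape parameter `sigma` and the mean session gap are placeholders (the slides give the distribution parameters only as a and b), and all function names are hypothetical.

```python
import math
import random

# Request-type probabilities as listed on the slide.
REQUEST_TYPE_P = {
    "HTML with images": 0.13,
    "HTML no images": 0.30,
    "Non-HTML": 0.16,
    "Embedded image": 0.41,
}


def lognormal_with_mean(rng: random.Random, mean: float, sigma: float) -> float:
    # For a log-normal, mean = exp(mu + sigma^2 / 2); solve for mu.
    mu = math.log(mean) - sigma ** 2 / 2
    return rng.lognormvariate(mu, sigma)


def sample_request(rng: random.Random, mean_session_gap_s: float = 1.0,
                   sigma: float = 0.5) -> tuple[str, float, float]:
    """Return (request type, inter-arrival gap in s, connection duration in s)."""
    kinds = list(REQUEST_TYPE_P)
    kind = rng.choices(kinds, weights=[REQUEST_TYPE_P[k] for k in kinds])[0]
    if kind == "Embedded image":
        gap = lognormal_with_mean(rng, 0.221, sigma)       # assumed mean of 221 ms
    else:
        gap = rng.expovariate(1.0 / mean_session_gap_s)    # session arrivals, Exp
    duration = lognormal_with_mean(rng, 0.289, sigma)      # assumed mean of 289 ms
    return kind, gap, duration


if __name__ == "__main__":
    rng = random.Random(42)
    for _ in range(5):
        print(sample_request(rng))
```

Varying `mean_session_gap_s` while holding the other parameters fixed is one way to separate the effect of session arrival rate from the other workload variables.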

29
Size and Type Distributions
Object type     P_request    Avg Size    Distribution
HTML            0.430        4 KB        Pareto (a)
Image           0.506        11 KB       Pareto (a)
Audio           0.003        140 KB      Pareto (a)
Application     0.007        260 KB      Pareto (a)
Dynamic         0.019        1 KB        Pareto (a)
Other           0.031        11 KB       Pareto (a)
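
Drawing object types and sizes from this table could be sketched as follows. The Pareto shape parameter is not given on the slide, so `alpha=1.5` is a placeholder assumption, and the scale is chosen so that the distribution's mean equals the listed average size (which requires alpha > 1). The function names are hypothetical.

```python
import random

# (P_request, average size in KB) as listed on the slide.
OBJECT_TYPES = {
    "HTML": (0.430, 4),
    "Image": (0.506, 11),
    "Audio": (0.003, 140),
    "Application": (0.007, 260),
    "Dynamic": (0.019, 1),
    "Other": (0.031, 11),
}


def pareto_scale(mean_kb: float, alpha: float) -> float:
    """Minimum size x_m such that a Pareto(alpha, x_m) has the given mean."""
    if alpha <= 1:
        raise ValueError("the Pareto mean is finite only for alpha > 1")
    return mean_kb * (alpha - 1) / alpha


def sample_object(rng: random.Random, alpha: float = 1.5) -> tuple[str, float]:
    """Return (object type, size in KB) for one simulated request."""
    names = list(OBJECT_TYPES)
    weights = [OBJECT_TYPES[name][0] for name in names]   # choices() normalizes these
    kind = rng.choices(names, weights=weights)[0]
    mean_kb = OBJECT_TYPES[kind][1]
    # random.paretovariate(alpha) has minimum 1, so multiply by the scale.
    size_kb = pareto_scale(mean_kb, alpha) * rng.paretovariate(alpha)
    return kind, size_kb


if __name__ == "__main__":
    rng = random.Random(7)
    for _ in range(5):
        kind, size_kb = sample_object(rng)
        print(f"{kind:12s} {size_kb:8.1f} KB")
```

Most sampled objects come out well below the listed average while occasional ones are far larger, which is the heavy-tail behavior that makes bandwidth matter alongside latency.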

30
Prototype Design
[Diagram labels: SDP Server, SDP Proxy Server, SDP]
31
How does SDP reduce server load?
  • Proxies retrieve objects from other proxies
    without involving the Web server.
  • Balances load between servers and proxies by
    including load estimators in site selection.

32
How does SDP reduce congestion?
  • Direct delivery reduces the number of
    store-and-forward object transfers.
  • Using run-time probes for site selection favors
    less congested routes.
  • Retrieving embedded objects from multiple sites
    helps distribute network traffic.

33
How does SDP reduce response time?
  • Discovery
      • local lookup
      • low overhead for metadata propagation
          • servers piggyback metadata onto HTML files
          • proxies piggyback metadata onto objects
  • Delivery
      • direct
      • select cache site from estimates of response time
      • retrieve embedded images concurrently from
        multiple sites

34
[Plot: Popularity Skew (UTSA-CS, UTSA-VIS)]
35
[Plot: Object Size - Transfers (UTSA-CS, UTSA-VIS)]