Simulation Evaluation of Web Caching Architectures presentation

Title: Simulation Evaluation of Web Caching Architectures

1
Simulation Evaluation of Web Caching Architectures

Carey Williamson
Mudashiru Busari
Department of Computer Science
University of Saskatchewan

2
Outline

Introduction: Web Caching
Proxy Workload Generator (ProWGen)
Evaluation of Single-Level Caches
Evaluation of Multi-Level Caches
Conclusions and Future Work
Questions?

3
Introduction

The Web is both a blessing and a curse
Blessing
Internet available to the masses
Seamless exchange of information
Curse
Internet available to the masses
Stress on networks, protocols, servers, users
Motivation: techniques to improve the performance
and scalability of the Web

4
Why is the Web so slow?

Three main possible reasons:
Client-side bottlenecks (PC, modem)
Solution: better access technologies (TRLabs)
Server-side bottlenecks (busy Web site)
Solution: faster, scalable server designs
Network bottlenecks (Internet congestion)
Solutions: caching, replication, improved
protocols for client-server communication

5
What is a Web proxy cache?

Intermediary between Web clients (browsers) and
Web servers
Controlled Internet access point for an
institution or organization (e.g., firewall)
Natural point for Web document caching
Store local copies of popular documents
Forward requests to servers only if needed

6
Web Caching Proxy
[Figure: Web clients (C) inside a region or organization boundary reach Web servers across the Internet through a proxy]
7
Some Technical Issues

Size of cache
Replacement policy when cache is full
Cache coherence (Get-If-Modified)
Some content is uncacheable
Multi-cache coordination, peering (ICP)
Security and privacy; hit metering
Other issues...

8
Our Previous Work

Collaborative project with CANARIE, through the
Advanced Networks Applications program
(July 1998 to June 1999)
Design and evaluation of Web caching strategies
for Canada's CA*net II backbone (National Web
Caching Infrastructure)
For more information, see URL: http://www.cs.usask.ca/faculty/carey/projects/nwci.html

9
CA*net II Web Caching Hierarchy (Dec 1998)
10
CA*net II Web Caching Hierarchy (Dec 1998)
[Figure: hierarchy diagram. Selected measurement points for our traffic analyses (3-6 months of data from each): USask, CANARIE (Ottawa), with a link to NLANR]
11
Caching Hierarchy Overview
Cache hit ratios (empirically observed) at each level of proxies:
Top-Level/International (20-50 GB): 5-10%
National (10-20 GB): 15-20%
Regional/Univ. (5-10 GB): 30-40%
Clients (C) connect to the regional/university proxies
12
NWCI Project Contributions

Workload characterization and evaluation of
CA*net II Web caching hierarchy (IEEE
Network, May/June 2000)
Developed Web proxy caching simulator for
trace-driven simulation evaluation of Web proxy
caching hierarchies
Recommendations for CANARIE NWCI about
configuration of future caches

13
Overview of This Talk

Constructed synthetic Web proxy workload
generation tool (ProWGen) that captures the
salient characteristics of empirical Web proxy
workloads
Use ProWGen to evaluate sensitivity of proxy
caches to workload characteristics
Use ProWGen to evaluate effectiveness of
multi-level Web caching hierarchies (and
cache management techniques)

14
Research Methodology

Design, construction, and parameterization of
workload models
Validation of ProWGen (statistically, and versus
empirical workloads)
Simulation evaluation of single cache
Sensitivity to workload characteristics
Different cache sizes, replacement policies
Simulation evaluation of multi-level cache
Sensitivity to workload characteristics
Novel (heterogeneous) cache management policies

15
Key Workload Characteristics

One-timers (60-70%; useless!!!)
Zipf-like document referencing popularity
Heavy-tailed file size distribution (i.e., most
files small, but most bytes are in big files)
Correlations (if any) between document size and
document popularity (debate!)
Temporal locality (temporal correlation between
recent past and near future references)
[Mahanti et al. 2000]
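
To make the Zipf-like popularity and one-timer characteristics concrete, here is a minimal Python sketch. It is not ProWGen: the function names, the document count, the request count, and the slope value are all illustrative assumptions, and one-timers here simply emerge from sampling rather than being a separate input parameter.

```python
import random
from collections import Counter

def zipf_probabilities(num_docs, slope):
    """Return P(rank) proportional to rank**(-slope) for ranks 1..num_docs."""
    weights = [rank ** -slope for rank in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_requests(num_docs, num_requests, slope, seed=0):
    """Draw document ids (0 = most popular) from a Zipf-like distribution."""
    rng = random.Random(seed)
    probs = zipf_probabilities(num_docs, slope)
    return rng.choices(range(num_docs), weights=probs, k=num_requests)

def one_timer_fraction(requests):
    """Fraction of distinct requested documents that are requested exactly once."""
    counts = Counter(requests)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

# Hypothetical workload size and slope, chosen only for illustration.
requests = generate_requests(num_docs=10_000, num_requests=50_000, slope=0.8)
print(f"one-timers: {one_timer_fraction(requests):.1%} of distinct documents")
```

Changing `slope` or the ratio of requests to documents shifts the one-timer fraction, which is one way to see why these characteristics interact.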

16
ProWGen Conceptual View
[Figure: Input Parameters (labelled 1, Z, a, c, L) feed the ProWGen Software, which produces a Synthetic Workload. Slides 17-19 repeat this view; slide 17 adds an inset labelled "Zipf" with axes P and r.]
20
ProWGen Workload Modeling Details

Modeled workload characteristics
One-time referencing
Zipf-like referencing behaviour (Zipf's Law)
File size distribution
Body: lognormal distribution
Tail: Pareto distribution
Correlation between file size and popularity
Temporal locality
Static probabilities in finite-size LRU stack
model
Dynamic probabilities in finite-size LRU stack
model
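
The hybrid file size model listed above (lognormal body, Pareto tail) can be sketched as follows. This is a rough illustration rather than ProWGen's code: the function name and every default value (`tail_fraction`, `mu`, `sigma`, `tail_start`, `tail_index`) are assumptions picked only to produce plausible-looking sizes.

```python
import random

def sample_file_size(rng, tail_fraction=0.07, mu=8.0, sigma=1.3,
                     tail_start=10_000, tail_index=1.2):
    """Draw one file size in bytes from a hybrid lognormal/Pareto model.

    All default parameter values are illustrative assumptions.
    """
    if rng.random() < tail_fraction:
        # Pareto tail: paretovariate returns values >= 1, so scale by tail_start.
        return int(tail_start * rng.paretovariate(tail_index))
    # Lognormal body: redraw until the size falls below the tail cutoff.
    while True:
        size = rng.lognormvariate(mu, sigma)
        if size < tail_start:
            return max(1, int(size))

rng = random.Random(42)
sizes = sorted(sample_file_size(rng) for _ in range(100_000))
largest = sizes[-1000:]  # the largest 1% of files
print(f"median size: {sizes[len(sizes) // 2]} bytes")
print(f"bytes held by largest 1% of files: {sum(largest) / sum(sizes):.1%}")
```

A smaller `tail_index` makes the tail heavier, so a larger share of all bytes ends up in a few very large files.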

21
Validation of ProWGen

To establish that the synthetic workloads possess
the desired characteristics (quantitative and
qualitative), and that the characteristics are
similar to those in empirical workloads

Example: analyze 5 million requests from a proxy
server trace and parameterize ProWGen to generate
a similar workload

22
Workload Synthesis
23
Zipf-like Referencing Behaviour
Empirical trace: slope 0.81
Synthetic trace: slope 0.83
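
One common way to obtain a slope like those quoted above is a least-squares line fit on the log-log rank-frequency plot. The sketch below assumes that method; the slides do not say how the slopes were computed, and the function name `zipf_slope` is hypothetical.

```python
import math

def zipf_slope(reference_counts):
    """Estimate the Zipf slope from per-document reference counts.

    Fits log(count) = a - slope * log(rank) by ordinary least squares
    and returns the (positive) slope. Needs at least two documents.
    """
    counts = sorted(reference_counts, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx

# Idealized counts that follow rank**(-0.8) exactly, for illustration only.
ideal_counts = [1000.0 * rank ** -0.8 for rank in range(1, 201)]
print(f"estimated slope: {zipf_slope(ideal_counts):.2f}")
```

For a real trace, the counts would come from something like `collections.Counter(requests).values()`.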
24
Transfer Size Distribution
25
Research Questions: Single-Level Caches

In a single-level proxy cache, how sensitive is
Web proxy caching performance to certain workload
characteristics (one-timers, Zipf-ness,
heavy-tail index)?
How does the degree of sensitivity change
depending on the cache replacement policy?

26
Simulation Model
[Figure: simulation model connecting Web Clients to Web Servers]
27
Factors and Levels

Cache size
Cache Replacement Policy
Recency-based: LRU
Frequency-based: LFU-Aging
Size-based: GD-Size
Workload Characteristics
One-timers, Zipf slope, tail index, correlation,
temporal locality model
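
As a rough illustration of how two of these policies differ, the sketch below implements a byte-capacity LRU cache and a Greedy-Dual-Size cache with unit cost (priority H = L + 1/size, where L is raised to the priority of each evicted document). It is not the simulator used in the study: the class names are hypothetical, LFU-Aging is left out, and the linear scan for the eviction victim is chosen for brevity, not speed.

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used document when space is needed."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.docs = OrderedDict()  # doc_id -> size, least recent first

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, admit the document."""
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                _, evicted_size = self.docs.popitem(last=False)
                self.used -= evicted_size
            self.docs[doc_id] = size
            self.used += size
        return False

class GDSizeCache:
    """Greedy-Dual-Size with cost 1, which favours keeping small documents."""
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0  # the running value L
        self.docs = {}        # doc_id -> (priority H, size)

    def access(self, doc_id, size):
        """Return True on a hit; on a miss, admit the document."""
        if doc_id in self.docs:
            _, stored_size = self.docs[doc_id]
            self.docs[doc_id] = (self.inflation + 1.0 / stored_size, stored_size)
            return True
        if size <= self.capacity:
            while self.used + size > self.capacity:
                victim = min(self.docs, key=lambda d: self.docs[d][0])
                self.inflation, victim_size = self.docs.pop(victim)
                self.used -= victim_size
            self.docs[doc_id] = (self.inflation + 1.0 / size, size)
            self.used += size
        return False

# Same three requests, different survivors.
trace = [("s1", 10), ("big", 90), ("s2", 10)]
lru, gds = LRUCache(100), GDSizeCache(100)
for doc_id, size in trace:
    lru.access(doc_id, size)
    gds.access(doc_id, size)
print("LRU keeps:", list(lru.docs))
print("GD-Size keeps:", sorted(gds.docs))
```

In this tiny trace LRU evicts the oldest document regardless of size, while GD-Size evicts the large one, which tends to raise hit ratio at some cost in byte hit ratio.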

28
Performance Metrics

Cache hit ratio
Percent of requested docs found in cache (HR)
Percent of requested bytes found in cache (BHR)
User response time
Estimated analytically using request rates, cache
hit ratios, and (relative) cache miss penalties
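
The two hit ratios can be computed directly from per-request outcomes, as sketched below. The response-time function is only one simple analytic form (hit time plus miss probability times miss penalty); the slides do not give the exact model used, so treat that formula, the function names, and the sample numbers as assumptions.

```python
def hit_ratios(outcomes):
    """Compute (HR, BHR) from (size_bytes, was_hit) pairs, one per request."""
    requests = hits = total_bytes = hit_bytes = 0
    for size, was_hit in outcomes:
        requests += 1
        total_bytes += size
        if was_hit:
            hits += 1
            hit_bytes += size
    return hits / requests, hit_bytes / total_bytes

def mean_response_time(hit_ratio, hit_time, miss_penalty):
    """Assumed model: every request pays hit_time, misses also pay miss_penalty."""
    return hit_time + (1.0 - hit_ratio) * miss_penalty

# Small documents hit, large documents miss: HR is high relative to BHR.
outcomes = [(100, True), (900, False), (100, True), (900, False)]
hr, bhr = hit_ratios(outcomes)
print(f"HR = {hr:.0%}, BHR = {bhr:.0%}")
print(f"mean response time = {mean_response_time(hr, 1.0, 10.0)} (relative units)")
```

The example shows why both ratios are reported: a policy that favours small documents can score well on HR while saving few bytes.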

29
Simulation Results (Preview)

Cache performance is very sensitive to
Slope of Zipf-like doc referencing popularity
Temporal locality property
Correlations between size and popularity
Cache performance relatively insensitive to
Tail index of heavy-tailed file size distribution
One-timers

30
Sensitivity to One-timers (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
31
Sensitivity to Zipf Slope (LRU)
Difference of 0.2 in Zipf slope impacts
performance by as much as 10-15% in hit ratio
and byte hit ratio
(a) Hit Ratio
(b) Byte Hit Ratio
32
Sensitivity to Heavy Tail Index (LRU Replacement
Policy)
(a) Hit Ratio
(b) Byte Hit Ratio
33
Sensitivity to Heavy Tail Index (GD-Size
Replacement Policy)
Difference of 0.2 in heavy tail index impacts
performance by less than 3%
(a) Hit Ratio
(b) Byte Hit Ratio
34
Sensitivity to Correlation (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
35
Sensitivity to Temporal Locality (LRU)
(a) Hit Ratio
(b) Byte Hit Ratio
36
Summary: Single-Level Caches

Cache performance is sensitive to
Slope of Zipf-like document referencing
popularity
Temporal locality
Correlation between size and popularity

Cache performance is relatively insensitive to
Tail index of heavy-tailed file size
distribution
One-timers

37
Multi-Level Caching...

Workload characteristics change as you move up
the Web caching hierarchy (due to filtering
effects, aggregation, etc.)
Idea 1: Try different cache replacement policies
at different levels of hierarchy
Idea 2: Limit replication of cache content in
overall hierarchy through partitioning (size,
type, sharing, ...)

38
Research Questions: Multi-Level Caches

In a multi-level caching hierarchy, can overall
caching performance be improved by using
different cache replacement policies at different
levels of the hierarchy?
In a multi-level caching hierarchy, can overall
performance be improved by keeping disjoint
document sets at each level of the hierarchy?

39
Simulation Model
[Figure: simulation model connecting Web Clients to Web Servers]
40
Experiment 1: Different Policies at Different
Levels of the Hierarchy
(a) Hit Ratio
(b) Byte Hit Ratio
41
Experiment 2: Shared Files at the Upper Level of
the Hierarchy
42
Experiment 3: Size-based Partitioning

Partition files across the two levels based on
size (e.g., keep small files at the lower level
and large files at the upper level, or vice
versa)
Three size thresholds:
5,000 bytes
10,000 bytes
100,000 bytes
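
A minimal sketch of size-based partitioning follows, assuming one lower-level (child) proxy, one upper-level (parent) proxy, and LRU at both levels. The study's actual topology and policies may differ; the class, the `admit` predicate, the cache capacities, and the sample trace are hypothetical.

```python
from collections import OrderedDict

class LRUCache:
    """Byte-capacity LRU cache that only admits sizes accepted by `admit`."""
    def __init__(self, capacity_bytes, admit=lambda size: True):
        self.capacity = capacity_bytes
        self.admit = admit
        self.used = 0
        self.docs = OrderedDict()  # doc_id -> size, least recent first

    def lookup(self, doc_id):
        if doc_id in self.docs:
            self.docs.move_to_end(doc_id)
            return True
        return False

    def insert(self, doc_id, size):
        if not self.admit(size) or size > self.capacity or doc_id in self.docs:
            return
        while self.used + size > self.capacity:
            _, evicted_size = self.docs.popitem(last=False)
            self.used -= evicted_size
        self.docs[doc_id] = size
        self.used += size

def simulate(trace, child, parent):
    """Replay (doc_id, size) requests; return (child_hits, parent_hits)."""
    child_hits = parent_hits = 0
    for doc_id, size in trace:
        if child.lookup(doc_id):
            child_hits += 1
            continue
        if parent.lookup(doc_id):
            parent_hits += 1
        else:
            parent.insert(doc_id, size)  # fetched from the origin server
        child.insert(doc_id, size)
    return child_hits, parent_hits

THRESHOLD = 5_000  # bytes; one of the three thresholds listed above
child = LRUCache(20_000, admit=lambda size: size <= THRESHOLD)       # small files
parent = LRUCache(1_000_000, admit=lambda size: size > THRESHOLD)    # large files
trace = [("small", 1_000), ("large", 50_000), ("small", 1_000), ("large", 50_000)]
child_hits, parent_hits = simulate(trace, child, parent)
print(f"child hits: {child_hits}, parent hits: {parent_hits}")
```

Swapping the two `admit` predicates gives the reversed configuration (large files at the lower level, small files at the upper level).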

43
Small files at the lower level; large files at
the upper level
[Figure: panel labelled "Parent"; size threshold: 5,000 bytes]
44
Large files at the lower level; small files at
the upper level
[Figure: size threshold: 5,000 bytes]
45
Summary: Multi-Level Caches

Different policies at different levels
LRU/LFU-Aging at the lower level with GD-Size at
the upper level provided improvement in performance
GD-Size at both levels provided better performance in
hit ratio, but with some penalty in byte hit ratio

Sharing-based approach
no benefit compared to the other cases studied

Size-threshold approach
small files at the lower level with large files at
the upper level provided improvement in
performance
reversing this policy offered no performance advantage

46
Conclusions

ProWGen is a valuable tool for the evaluation of
Web proxy caching architectures, using synthetic
workloads
Existing multi-level caching hierarchies are not
always that effective
Heterogeneous caching architectures may better
exploit workload characteristics and improve Web
caching performance

47
Future Work

Extend ProWGen
model response time
model file size modifications

Extend the multi-level experiments
look into configurations where there is
communication between the lower level proxies
investigate configurations involving more levels
and more lower level proxies

48
For More Information...

M. Busari, Simulation Evaluation of Web Caching
Hierarchies, M.Sc. Thesis, June 2000
Two papers available soon (under review)
ProWGen tool is available now
Email: carey_at_cs.usask.ca
http://www.cs.usask.ca/faculty/carey/
