Web Caching - PowerPoint PPT Presentation

About This Presentation
Title: Web Caching
Slides: 63
Provided by: alpa6
Transcript and Presenter's Notes

Title: Web Caching


1
Web Caching
  • By
  • Amisha Thakkar
  • Alpa Shah

2
Overview
  • What is a Web Cache ?
  • Caching Terminology
  • Why use a cache?
  • Disadvantages of Web Cache
  • Other Features
  • Caching Rules

3
Overview
  • Caching Architectures
  • Comparison of Architectures
  • Cache Deployment Scheme
  • Client Side Cache Cooperation
  • Active Caching

4
What is a Web Cache ?
  • Cache is a place where temporary copies of
    objects are stored
  • Cached information is generally closer to the
    requester than the permanent information is
  • Objects: HTML pages, images, files

5
What is a Web Cache?
6
Caching Terminology
  • Client: an application program that establishes
    connections for sending requests
  • Server: an application program that accepts
    connections and services requests by sending back
    responses
  • Origin server: the server on which a given
    resource resides or is to be created

7
Caching Terminology
  • Proxy: an intermediary program that acts as both
    a server and a client, making requests on behalf
    of other clients
  • A proxy is not necessarily a cache
  • A proxy does not always cache the replies
    passing through it
  • It may be used on a firewall to monitor
    accesses

8
Why use a cache ?
  • To reduce latency
  • To reduce network traffic
  • To reduce load on origin servers
  • Can isolate end users from network failures

9
Disadvantages of Web cache
  • With cached data there is always a chance of
    receiving stale information
  • Content providers lose access counts when cache
    hits are served
  • Manual configuration is often required
  • Operation of cache requires additional resources
  • In some situations the cache can be a single
    point of failure

10
Other Features
  • Depending on the perspective, the following may be
    good or bad
  • Because the cache requests on behalf of clients,
    the servers never see the clients' IP addresses
  • The cache provides an easy opportunity to
    monitor and analyze browsing activities
  • The cache can be used to block certain requests

11
Types of Web Caches
  • Proxy caches
    • Serve a large number of users
    • Large corporations and ISPs often set them up
      on their firewalls
    • They are a type of shared cache
  • Browser caches
    • Use a section of the computer's hard disk to
      store objects that you have seen

12
Caching Rules
  • Rules on which caches work:
    • Some of them are set in protocols
    • Some are set by the cache administrator
  • Most common rules:
    • If the object is authenticated or secure, it
      won't be cached
    • The object's headers indicate whether the
      object is cacheable or not
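These common rules can be sketched as a simple predicate. A minimal illustration in Python, assuming lowercase header names; the function name and dict layout are illustrative, not from the presentation:

```python
def is_cacheable(method, status, headers):
    """Apply the common caching rules from the slide above:
    skip authenticated/secure objects and objects whose
    headers say they must not be cached."""
    if method != "GET":
        return False
    if "authorization" in headers:           # authenticated object
        return False
    cache_control = headers.get("cache-control", "")
    if "no-store" in cache_control or "private" in cache_control:
        return False                         # headers forbid caching
    return status == 200
```

Real caches apply many more rules (Vary, cookies, query strings); this only captures the two bullets above.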

13
Caching Rules
  • An object is considered fresh when:
    • It has an expiry time or other age-controlling
      directive set, and is still within the fresh
      period
    • A browser cache has already seen the object,
      and has been set to check only once a session

14
Caching Rules
  • Or: a proxy cache has seen the object recently,
    and it was modified relatively long ago
  • Fresh documents are served directly from the
    cache without checking with the origin server

15
Caching Rules
  • For a stale object, the origin server will be
    asked to validate the object, or to tell the
    cache whether the copy is still good
  • The most common validator is the time that the
    object was last changed
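The freshness check and last-modified validation described above can be sketched in Python; the cache-entry dict layout is an assumption for illustration:

```python
import time
import email.utils

def is_fresh(entry, now=None):
    """A fresh object is served directly from the cache;
    a stale one must be validated with the origin server."""
    now = time.time() if now is None else now
    return entry.get("expires", 0.0) > now

def validation_headers(entry):
    """Build a conditional request using the most common
    validator: the time the object was last changed."""
    headers = {}
    if "last_modified" in entry:
        headers["If-Modified-Since"] = email.utils.formatdate(
            entry["last_modified"], usegmt=True)
    return headers
```

If the origin server answers 304 Not Modified, the cached copy is still good and can be served.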

16
Caching Architectures: Hierarchical/Simple Cache
  • Browser-cache interaction is the same as
    browser-host interaction, i.e. a TCP connection
    is made and the item requested
  • If the item is not found, the request is sent to
    the parent cache
  • A hierarchy is built up, each level indirectly
    serving a wider community of users

17
Caching Architectures: Hierarchical/Simple Cache
18
Caching Architectures: Distributed/Co-operating Cache
  • Decentralized (cache mesh)
  • Multiple servers cooperate in such a way that
    they share their individual caches to create a
    large distributed one
  • Simply put, caching proxies communicate with
    each other to serve different users
  • On a cache miss, a proxy checks with the other
    proxy caches before contacting the origin server

19
Caching Architectures: Distributed/Co-operating Cache
  • Caches communicate amongst themselves using a
    protocol like ICP (Internet Cache Protocol)
  • Caches can be selected on the basis of:
    • Distance from the end user
    • Specialization in particular URLs (location
      hints)

20
Caching Architectures: Distributed/Co-operating Cache
  • Why distributed? Limitations of the hierarchy:
    • Width of the hierarchy: caches at the same
      level are inaccessible to each other
    • The LRU policy implies sufficient disk space
    • Cost of replicating disk storage
    • The amount of disk space required depends on
      the number of users served and their breadth
      of reading

21
Caching Architectures: Distributed/Co-operating Cache
  • The more users, the more disk space needed
    higher in the hierarchy
  • Exponential growth of the number of documents on
    the WWW

22
Caching Architectures: Distributed/Co-operating Cache
  • Caching close to the user is more effective; the
    higher the level, the lower the efficiency
  • Can be created for load balancing
  • Most effective when serving a community of
    interests

23
Caching Architectures: Distributed/Co-operating Cache
  • First, a UDP packet is sent as a cache inquiry
  • The cache selection decision is determined by RTT
  • Potential problem: network congestion because of
    the extra UDP traffic
  • In favor:
    • A UDP exchange is 2 IP packets; TCP takes at
      least 8 packets

24
Caching Architectures: Distributed/Co-operating Cache
  • UDP reply from cache can indicate
  • a. Presence
  • b. Speed
  • c. Availability of requested documents
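The UDP inquiry can be illustrated with the ICP query format from RFC 2186: a 20-byte header, then a 4-byte requester host address and the null-terminated URL. A minimal sketch; the zeroed options and sender-address fields are simplifications of what a real cache like Squid would fill in:

```python
import struct

ICP_OP_QUERY = 1   # opcode for a cache inquiry (RFC 2186)
ICP_VERSION = 2

def build_icp_query(reqnum, url):
    """Build an ICP query datagram: 20-byte header followed
    by a 4-byte requester host address and the URL."""
    payload = struct.pack("!I", 0) + url.encode("ascii") + b"\x00"
    length = 20 + len(payload)
    header = struct.pack("!BBHIIII",
                         ICP_OP_QUERY, ICP_VERSION, length,
                         reqnum,      # matches the reply to this query
                         0, 0, 0)    # options, option data, sender address
    return header + payload
```

The datagram would be sent over a UDP socket to each neighbor cache, which replies with a hit or miss opcode for the requested document.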

25
Caching Architectures: Hybrid Cache
  • Combines a hierarchy with distributed
    cooperation; note the use of ICP between caches

26
Comparison of Architectures
  • Hierarchical: caches are placed at multiple
    levels
  • Distributed: caches only at the bottom level,
    with no intermediate caches

27
Comparison of Architectures
  • Performance parameters:
    • Connection time (Tc): the time from when the
      document is requested until the first data
      byte is received
    • Transmission time (Tt): the time taken to
      transmit the document
    • Total latency = Tc + Tt
    • Bandwidth usage
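These parameters combine into a simple latency model. A toy Python sketch (the bandwidth figure in the test is arbitrary) showing why transmission time dominates for large documents:

```python
def total_latency(t_connect_s, size_bytes, bandwidth_bps):
    """Total latency = connection time (Tc) + transmission
    time (Tt), with Tt = document size / available bandwidth."""
    t_transmit_s = size_bytes * 8 / bandwidth_bps
    return t_connect_s + t_transmit_s
```

For a small page, Tc dominates; for a megabyte file on the same link, Tt dominates, which is the trade-off the following comparison slides measure.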

28
Comparison of Architectures
  • Fig 3: Connection time for documents of different
    popularity

29
Comparison of Architectures
  • For unpopular documents, connection time is high
  • As the number of requests increases, the average
    connection time decreases
  • For extremely popular documents, distributed
    caching has smaller connection times

30
Comparison of Architectures
  • Fig 4: Network traffic generated

31
Comparison of Architectures
  • On the lower levels, distributed caching
    practically doubles the network bandwidth usage
  • Around the root node in the national network,
    the network traffic is reduced by half
  • Distributed caching uses all possible network
    shortcuts between institutional caches,
    generating more traffic in the less congested
    low network levels

32
Comparison of Architectures
  • Fig 5a: Uncongested national network

33
Comparison of Architectures
  • The only bottleneck on the path from the client
    to the origin server is the international path,
    hence transmission times are similar for both
    architectures

34
Comparison of Architectures
  • Fig 5b: Congested national networks

35
Comparison of Architectures
  • Both have higher transmission times compared to
    the previous case
  • Distributed caching gives shorter transmission
    times than hierarchical because many requests
    travel through lower network levels

36
Comparison of Architectures
  • Fig 6: Average total latency

37
Comparison of Architectures
  • For large documents, transmission time matters
    more than connection time
  • Hierarchical caching gives lower latencies for
    documents smaller than 200 KB, due to lower
    connection times
  • Distributed caching gives lower latencies for
    larger documents, due to lower transmission times

38
Comparison of Architectures
  • The size threshold depends on the degree of
    congestion in the national network
  • The higher the congestion, the lower the size
    threshold
  • Above the threshold, distributed caching has
    lower latencies than hierarchical

39
Comparison of Architectures: With Hybrid Scheme
  • Fig 7: Connection time

40
Comparison of Architectures: With Hybrid Scheme
  • Fig 8

41
Comparison of Architectures: With Hybrid Scheme
  • In the hybrid scheme, if the number of
    cooperating caches (kc) is very small, the
    connection time is high
  • As the number of cooperating caches increases,
    the connection time decreases to a minimum
  • If the number increases beyond the threshold,
    the connection time increases very fast

42
Comparison of Architectures: With Hybrid Scheme
  • Fig 9: Transmission time

43
Comparison of Architectures: With Hybrid Scheme
  • For an uncongested network, the number of
    cooperating caches (kt) at every level hardly
    influences Tt
  • If the number of cooperating caches is very
    small, Tt is high, and vice versa
  • If the number increases above the threshold, Tt
    increases
  • The optimum number of caches depends on the
    number of caches reachable while avoiding
    congested links

44
Comparison of Architectures: With Hybrid Scheme
  • Fig 10

45
Comparison of Architectures: With Hybrid Scheme
  • Fig 11: Total latency

46
Comparison of Architectures: With Hybrid Scheme
  • The number of cooperating caches (kopt) that
    minimizes the total latency at every level
    depends on the document size
  • For small documents, the optimum number is
    closer to kc
  • For large documents, the optimum number is
    closer to kt

47
Comparison of Architectures: With Hybrid Scheme
  • Fig 12

48
Comparison of Architectures: With Hybrid Scheme
  • For any document, the optimum kopt that
    minimizes the total latency satisfies
    kc ≤ kopt ≤ kt

49
Cache Deployment Schemes
  • Proxy caching

50
Cache Deployment Schemes
  • Advantages:
    • Clients point all web requests directly to the
      cache; no effect on non-web traffic
    • The cost of upgrading hardware/software is
      limited
    • Administration of the caches is limited to
      basic configuration

51
Cache Deployment Schemes
  • Disadvantages:
    • Every browser must be configured to point to
      the cache
    • Each client can hit only one cache
    • Single point of failure
    • Unnecessary duplication of data
    • Bottleneck in cases where the content is
      otherwise available in the LAN

52
Cache Deployment Schemes
  • Transparent Proxy caching

53
Cache Deployment Schemes
  • Advantages:
    • No browser configuration
    • The cost of upgrading hardware/software is
      limited
    • No administration of intermediate systems
      required

54
Cache Deployment Schemes
  • Disadvantages:
    • Each client can hit only one cache
    • If the cache goes down, both Internet and
      intranet access are lost
    • Negative impact on non-web traffic
    • The cache has to route non-web traffic
    • Routing, packet examination, and network
      address translation steal CPU cycles from the
      main cache-serving function

55
Cache Deployment Schemes
  • Transparent proxy caching with web cache
    redirection.

56
Cache Deployment Schemes
  • Advantages:
    • The switch/router examines the packets
    • Minimal impact on non-web traffic
    • Frees up CPU cycles for the web cache
    • Allows the client load to be dynamically
      spread over multiple caches
    • Eliminates the single point of failure,
      especially if redundant redirectors are used

57
Cache Deployment Schemes
  • Disadvantages:
    • Additional intermediate systems must be
      deployed
    • Increases expense

58
Client Side Cache Cooperation.
59
Active Caching
  • Current problem: dynamic documents cannot be
    cached
  • Goal: cache dynamic content on the web using an
    active cache
  • A cache applet is server-supplied code that is
    attached to a URL, or a collection of URLs
  • The applet is written in a platform-independent
    language

60
Active Caching
  • On a user request, the applet is invoked by the
    cache
  • The applet decides what is to be sent to the user
  • Other functions of the applet:
    • Logging user accesses
    • Checking access permissions
    • Rotating advertising banners
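A cache applet's role can be sketched as a small class the proxy invokes before serving a cached copy. All names here are illustrative assumptions, not the actual Active Cache API:

```python
class BannerApplet:
    """Illustrative cache applet: invoked by the cache on each
    user request, so dynamic work (access logging, banner
    rotation) still happens even though the document itself is
    served from the cache."""

    def __init__(self, banners):
        self.banners = banners
        self.access_log = []          # logging user accesses

    def process(self, user, cached_body):
        self.access_log.append(user)
        # rotate the advertising banner on every hit
        banner = self.banners[len(self.access_log) % len(self.banners)]
        return cached_body.replace(b"{{banner}}", banner)
```

The proxy would call process() per request; if it chooses not to run the applet, it must forward the request to the origin server rather than serve the cached copy.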

61
Active Caching
  • The proxy has the freedom not to invoke the
    applet, but instead to send the request to the
    server
  • The proxy promises not to send back a cached
    copy without invoking the applet
  • If the applet is too large, the proxy sends the
    request to the server
  • The proxy is not obligated to cache any applet;
    in that case it agrees not to service requests
    for that document

62
Active Caching
  • The proxy can devote resources to the applets
    associated with the URLs hottest among its users
  • Since the proxy that receives the request is
    typically the proxy closest to the user, the
    scheme automatically migrates server processing
    to the nodes that are close to users
  • This increases the scalability of web-based
    services