Analysis of Caching and Replication Strategies for Web Applications

Provided by: sudarsa3

Learn more at: https://jmvidal.cse.sc.edu

1
Analysis of Caching and Replication Strategies
for Web Applications
Authors: Swaminathan Sivasubramanian, Guillaume
Pierre, Maarten van Steen.

Presented By
Sudarsan Maddi
Graduate Student

2
Topics We Will Cover

Introduction
Techniques to scale Web applications
Performance Analysis
Choosing the Right Strategy

3
Introduction

In this paper, the authors present a qualitative and
quantitative analysis of replication and caching
techniques for hosting Web applications.
Their analysis shows that selecting the best
mechanism depends heavily on the data workload and
application characteristics.

4
Introduction

Web sites can be slow for many reasons; one of
the main reasons is the dynamic generation of Web
documents.
Web page caching: fragments of the HTML pages that
the application generates are cached to serve
future requests.
Content-delivery networks such as Akamai do this
by deploying edge servers around the Internet,
thus reducing the network latency of requests.
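The fragment-caching idea above can be sketched as follows. This is a minimal, hypothetical illustration: `FragmentCache` and `render_fragment` are names invented here, fragments are keyed by a plain string, and there is no expiry or invalidation, which a real edge server would need.

```python
from typing import Callable, Dict


class FragmentCache:
    """Hypothetical edge-side cache of generated HTML fragments."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate      # slow path: dynamic generation at the origin
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        if key in self._store:         # served from the cache, no regeneration
            self.hits += 1
            return self._store[key]
        self.misses += 1
        html = self._generate(key)     # generate once, reuse for future requests
        self._store[key] = html
        return html


def render_fragment(key: str) -> str:
    # Hypothetical stand-in for expensive dynamic page generation.
    return f"<div id='{key}'>generated content</div>"


cache = FragmentCache(render_fragment)
cache.get("header")   # miss: the fragment is generated and stored
cache.get("header")   # hit: the fragment is served from the cache
print(cache.hits, cache.misses)  # prints "1 1"
```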

5
Introduction

Limitations of page caching have given rise to
different approaches for scalable Web
applications, classified broadly into:
Application code replication
Cache database records
Cache query results
Entire database replication
In this article, the authors give an overview of
various scalability techniques and compare and
analyze their features and performance.

6
Techniques to scale Web Applications

The techniques we will look at are:
Edge Computing
Data Replication
Content-Aware data Caching (CAC)
Content-Blind data Caching (CBC)

7
Edge Computing

In edge computing (EC), the application code is
replicated at multiple edge servers while the data
remains centralized.
Akamai and ACDN use this technique.
Centralizing the data creates two problems:
If the edge servers are located worldwide, each
data access incurs WAN latency.
The central database becomes a performance
bottleneck as the load increases.
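The WAN-latency problem can be made concrete with a back-of-the-envelope model. This sketch assumes each request issues a fixed number of sequential queries and that every query pays a full edge-to-database round trip; the function name and all millisecond figures are hypothetical, not measurements from the paper, and the database-bottleneck effect is not modeled.

```python
def ec_request_latency_ms(client_edge_rtt_ms: float,
                          edge_db_rtt_ms: float,
                          queries_per_request: int,
                          db_exec_ms_per_query: float,
                          app_exec_ms: float) -> float:
    """Edge computing: code runs at the edge, but every query goes to the central DB."""
    per_query_ms = edge_db_rtt_ms + db_exec_ms_per_query
    # The WAN round trip is paid once per data access, so it scales with the query count.
    return client_edge_rtt_ms + app_exec_ms + queries_per_request * per_query_ms


# Hypothetical numbers: 20 ms client-edge RTT, 10 ms of application work,
# 5 sequential queries that each take 2 ms to execute at the database.
db_nearby = ec_request_latency_ms(20, 0, 5, 2, 10)      # database co-located with the edge
db_over_wan = ec_request_latency_ms(20, 100, 5, 2, 10)  # database 100 ms away
print(db_nearby, db_over_wan)  # prints "40 540"
```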

8
Data Replication

One solution to the problems of edge computing is
to place the data at each edge server.
Database replication (REPL) techniques can help
maintain identical copies of the database at
multiple locations.

Continued
9
Data Replication

The problem with this approach arises on database
updates: every update must be propagated to all
replicas.
This creates heavy network traffic and performance
overhead.
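Why updates are expensive under replication can be seen in a minimal sketch, assuming eager propagation in which every write is applied to every copy. `ReplicatedDB` is a hypothetical class; real replication systems use more elaborate protocols.

```python
from typing import Dict, List, Optional


class ReplicatedDB:
    """Hypothetical full replication: each edge server holds a complete copy."""

    def __init__(self, n_replicas: int) -> None:
        self.replicas: List[Dict[str, int]] = [{} for _ in range(n_replicas)]
        self.update_messages = 0  # messages sent to remote replicas

    def read(self, replica_id: int, key: str) -> Optional[int]:
        # Reads are answered by the local copy without any network traffic.
        return self.replicas[replica_id].get(key)

    def write(self, origin_id: int, key: str, value: int) -> None:
        # To keep the copies identical, every update is applied at every replica.
        for i, replica in enumerate(self.replicas):
            replica[key] = value
            if i != origin_id:
                self.update_messages += 1


db = ReplicatedDB(n_replicas=10)
for n in range(100):
    db.write(origin_id=0, key=f"item{n}", value=n)
print(db.read(7, "item42"), db.update_messages)  # prints "42 900"
```

In this model, update traffic grows with the number of updates multiplied by the number of replicas, which is why write-heavy workloads hurt full replication.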

10
Content-Aware data Caching (CAC)

Instead of maintaining full copies of the database,
CAC systems cache database query results as the
application code issues them.
Query containment check: when the application
running at the edge server issues a query, the
local database checks whether it has enough data
to answer the query locally.
If the containment check succeeds, the query is
answered locally; otherwise, it is sent to the
central database and the result is inserted into
the local database.

Continued
11
An Example of CAC

CAC stores query results efficiently.
For example:
Query Q1: SELECT * FROM items WHERE price < 50
Query Q2: SELECT * FROM items WHERE price < 20
Q2 is contained in Q1: once Q1's result is cached,
Q2 can be answered locally.
Query template QT1:
SELECT * FROM items WHERE price < ?
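The containment check and the Q1/Q2 example can be sketched with Python's built-in `sqlite3` module, using one in-memory database as the central server and another as the edge server's local database. This is a simplified, hypothetical sketch: it only handles queries of the form `price < ?`, so containment reduces to comparing bounds, and it ignores updates and consistency entirely. The sample rows and the `CACEdge` class are invented for illustration.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = "CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
SQL = "SELECT id, name, price FROM items WHERE price < ? ORDER BY id"

central = sqlite3.connect(":memory:")
central.execute(SCHEMA)
central.executemany("INSERT INTO items VALUES (?, ?, ?)",
                    [(1, "pen", 5.0), (2, "book", 15.0),
                     (3, "lamp", 35.0), (4, "chair", 80.0)])


class CACEdge:
    """Hypothetical content-aware cache for queries 'price < bound'."""

    def __init__(self, central_db: sqlite3.Connection) -> None:
        self.central = central_db
        self.local = sqlite3.connect(":memory:")  # edge server's local database
        self.local.execute(SCHEMA)
        self.cached_bounds: List[float] = []      # bounds of cached queries
        self.remote_queries = 0

    def is_contained(self, bound: float) -> bool:
        # 'price < bound' is contained in 'price < b' whenever bound <= b.
        return any(bound <= b for b in self.cached_bounds)

    def price_below(self, bound: float) -> List[Tuple]:
        if not self.is_contained(bound):
            self.remote_queries += 1
            rows = self.central.execute(SQL, (bound,)).fetchall()
            # Merge the result into the local database; rows cached by an
            # earlier query are skipped instead of being stored twice.
            self.local.executemany("INSERT OR IGNORE INTO items VALUES (?, ?, ?)", rows)
            self.cached_bounds.append(bound)
        # Either way, the answer now comes from the local database.
        return self.local.execute(SQL, (bound,)).fetchall()


edge = CACEdge(central)
q1 = edge.price_below(50)  # Q1: not contained, fetched from the central database
q2 = edge.price_below(20)  # Q2: contained in Q1, answered locally
print(q2, edge.remote_queries)  # prints "[(1, 'pen', 5.0), (2, 'book', 15.0)] 1"
```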

12
Content-Aware data Caching (CAC)

This query containment check is highly
computationally expensive because it must compare
the new query against all previously cached queries.
To reduce this cost, CAC makes use of query
templates: parameterized SQL queries whose
parameter values are passed at runtime.
In CAC systems, update queries are always
executed at the central database.
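The role of query templates can be illustrated with a small index that groups cached queries by the template they were instantiated from, so a new query is compared only against cached queries of the same template. This is a hypothetical sketch: the template names, the per-template containment tests, and the `TemplateIndex` class are assumptions made for illustration, and update handling is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Params = Tuple[float, ...]

# Hypothetical templates and their containment tests (new query vs. cached query):
#   QT1: SELECT * FROM items WHERE price < ?  -> contained if new bound <= cached bound
#   QT2: SELECT * FROM items WHERE id = ?     -> contained only for the same id
CONTAINMENT: Dict[str, Callable[[Params, Params], bool]] = {
    "QT1": lambda new, cached: new[0] <= cached[0],
    "QT2": lambda new, cached: new[0] == cached[0],
}


class TemplateIndex:
    def __init__(self) -> None:
        self.cached: Dict[str, List[Params]] = defaultdict(list)
        self.comparisons = 0  # number of containment tests performed

    def add(self, template: str, params: Params) -> None:
        self.cached[template].append(params)

    def is_contained(self, template: str, params: Params) -> bool:
        test = CONTAINMENT[template]
        # Only queries built from the same template are candidates.
        for cached_params in self.cached[template]:
            self.comparisons += 1
            if test(params, cached_params):
                return True
        return False


index = TemplateIndex()
for item_id in range(100):
    index.add("QT2", (float(item_id),))   # 100 cached point lookups
index.add("QT1", (50.0,))                 # cached Q1: price < 50
print(index.is_contained("QT1", (20.0,)), index.comparisons)  # prints "True 1"
```

Without the grouping, the same check could have had to examine all 101 cached queries instead of one.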

13
Content-Blind data Caching (CBC)

Here, edge servers don't need to run a database
at all.
Instead, they store the results of remote database
queries independently.
The query results aren't merged, so redundant
information may be stored, and the cache has a hit
only if the application issues exactly the same
query, so hit rates can be low.

Continued.
14
Content-Blind data Caching (CBC)

CBC has some advantages over CAC:
It incurs very little computational load.
It caches query results as result sets instead of
database records, so it can return results
immediately.
Finally, inserting a new element into the cache
doesn't require a query rewrite.
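The exact-match behavior of CBC can be sketched as a dictionary keyed by the query text and its parameters, whose values are stored result sets. This is a hypothetical illustration: `CBCCache`, `fake_central`, and the sample rows are invented, and invalidation on updates is omitted. It shows both sides of the trade-off: a lookup needs no containment check or query rewrite, but Q2 misses even though its result is contained in Q1's.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str, float]
ITEMS: List[Row] = [(1, "pen", 5.0), (2, "book", 15.0), (3, "lamp", 35.0), (4, "chair", 80.0)]


def fake_central(sql: str, params: Tuple) -> List[Row]:
    # Hypothetical stand-in for the central database; understands only 'price < ?'.
    return [row for row in ITEMS if row[2] < params[0]]


class CBCCache:
    def __init__(self, run_remote: Callable[[str, Tuple], List[Row]]) -> None:
        self.run_remote = run_remote
        self.results: Dict[Tuple[str, Tuple], List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql: str, params: Tuple) -> List[Row]:
        key = (sql, params)             # exact query text and parameters
        if key in self.results:
            self.hits += 1
            return self.results[key]    # stored result set, returned as-is
        self.misses += 1
        rows = self.run_remote(sql, params)
        self.results[key] = rows        # stored independently: no merge, no rewrite
        return rows


SQL = "SELECT * FROM items WHERE price < ?"
cbc = CBCCache(fake_central)
cbc.query(SQL, (50,))   # Q1: miss
cbc.query(SQL, (20,))   # Q2: miss, even though contained in Q1
cbc.query(SQL, (50,))   # Q1 again: hit
print(cbc.hits, cbc.misses)  # prints "1 2"
```

Note that the rows for "pen" and "book" are now stored twice, once in each result set, which is the redundancy mentioned above.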

15
Scalable Web hosting. (a) edge computing, (b)
content-aware caching, (c) content-blind
caching, and (d) data replication.
16
Performance Analysis

To compare the four techniques, the authors used
two benchmark applications:
RUBBoS, a bulletin-board benchmark application
that models Slashdot.org,
http://jmob.objectweb.org/rubbos.html
TPC-W, an industry-standard e-commerce benchmark
that models an online bookstore such as
Amazon.com,
http://pgfoundry.org/projects/tpc-w-php/

17
Performance Analysis

The authors measured the end-to-end client latency,
which is the sum of the network latency and the
internal latency.
The results show that, for RUBBoS, CBC performed
best in terms of client latency, whereas EC
performed worst.
For TPC-W, REPL performed best and EC again
performed worst.

18
Performance Results
(a) RUBBoS benchmark (b) TPC-W Browsing (c) TPC-W
Ordering
19
Choosing the Right Strategy

According to the authors, Web designers should
choose a scalability technique by carefully
analyzing their Web application's characteristics.
They suggest that the best strategy is the one
that minimizes the application's end-to-end client
latency.
This latency is affected by many parameters, such
as the hit ratio, database query execution time,
and application server execution time.
To estimate these parameters, they propose a
concept called virtual caches (VCs).
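How the hit ratio and execution times combine into end-to-end latency can be sketched with a simple additive model. The model, the `StrategyEstimate` class, and all numbers below are hypothetical assumptions for illustration (sequential queries, a fixed local cost on a hit, and a fixed WAN round trip plus central execution on a miss); they are not the paper's model or measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StrategyEstimate:
    name: str
    hit_ratio: float            # fraction of data accesses answered at the edge
    local_ms: float             # cost of a data access answered locally
    remote_ms: float            # WAN round trip + central DB execution on a miss
    app_ms: float               # application server execution time per request
    queries_per_request: int

    def end_to_end_ms(self, client_rtt_ms: float) -> float:
        # Expected cost of one data access, weighted by the hit ratio.
        per_query = self.hit_ratio * self.local_ms + (1 - self.hit_ratio) * self.remote_ms
        # Network latency (client RTT) + internal latency (app + data accesses).
        return client_rtt_ms + self.app_ms + self.queries_per_request * per_query


def best_strategy(estimates: List[StrategyEstimate], client_rtt_ms: float) -> StrategyEstimate:
    # The recommended strategy is the one with the lowest end-to-end latency.
    return min(estimates, key=lambda e: e.end_to_end_ms(client_rtt_ms))


# Hypothetical inputs only: here one strategy answers hits faster, the other hits more often.
cbc = StrategyEstimate("CBC", hit_ratio=0.6, local_ms=1.0, remote_ms=102.0,
                       app_ms=10.0, queries_per_request=5)
cac = StrategyEstimate("CAC", hit_ratio=0.8, local_ms=8.0, remote_ms=102.0,
                       app_ms=10.0, queries_per_request=5)
print(round(cbc.end_to_end_ms(20), 1), round(cac.end_to_end_ms(20), 1))  # prints "237.0 164.0"
print(best_strategy([cbc, cac], 20).name)  # prints "CAC"
```

With different hypothetical inputs the ranking can flip, which is the point of measuring the parameters per application rather than picking a strategy up front.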

Continued
20
Choosing the Right Strategy

A VC behaves just like a real cache, but it stores
only metadata, such as the list of objects in the
cache and their sizes, so it requires less memory
than a real cache.
With the help of VCs, we can obtain hit ratios and
server execution times and thereby estimate the
end-to-end latency.
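A virtual cache can be sketched as a cache simulator that keeps only each object's key and size. This is a hypothetical sketch that assumes LRU replacement and a capacity measured in bytes; the slides do not state which replacement policy the authors' virtual caches use. Replaying a request trace through it yields a hit ratio that could feed an end-to-end latency estimate.

```python
from collections import OrderedDict


class VirtualLRUCache:
    """Simulates an LRU cache of capacity_bytes while storing only metadata
    (object key -> object size), never the cached objects themselves."""

    def __init__(self, capacity_bytes: int) -> None:
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: "OrderedDict[str, int]" = OrderedDict()
        self.hits = 0
        self.requests = 0

    def access(self, key: str, size: int) -> bool:
        """Record one request; return True if a real cache would have hit."""
        self.requests += 1
        if key in self.entries:
            self.entries.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return True
        if size > self.capacity:
            return False                    # too large to be cached at all
        while self.used + size > self.capacity:
            _, evicted_size = self.entries.popitem(last=False)  # evict LRU entry
            self.used -= evicted_size
        self.entries[key] = size
        self.used += size
        return False

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


vc = VirtualLRUCache(capacity_bytes=100)
trace = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40)]
for key, size in trace:
    vc.access(key, size)
print(vc.hits, vc.requests, vc.hit_ratio)  # prints "1 5 0.2"
```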

21
Thank You.