Title: Analysis of Caching and Replication Strategies for Web Applications
1Analysis of Caching and Replication Strategies
for Web Applications
Authors: Swaminathan Sivasubramanian, Guillaume
Pierre, Maarten van Steen.
- Presented By
- Sudarsan Maddi
- Graduate Student
2Topics That We Will be Seeing
- Introduction
- Techniques to scale Web applications
- Performance Analysis
- Choosing the Right Strategy
3Introduction
- In this paper, the authors present a qualitative and
quantitative analysis of replication and caching
techniques for hosting Web applications.
- Their analysis shows that selecting the best
mechanism depends heavily on the data workload and
application characteristics.
4Introduction
- Web sites are slow for many reasons; one of
the main reasons is the dynamic generation of Web
documents.
- Web page caching: fragments of the HTML pages the
application generates are cached to serve future
requests.
- Content-delivery networks such as Akamai do this
by deploying edge servers around the Internet,
thus reducing the network latency of requests.
5Introduction
- Limitations of page caching have given rise to
different approaches for scalable Web
applications, classified broadly into:
- Application code replication
- Database record caching
- Query result caching
- Entire database replication
- In this article, the authors give an overview of
various scalability techniques and compare and analyze
their features and performance.
6Techniques to scale Web Applications
- The techniques we are going to see are:
- Edge Computing (EC)
- Data Replication (REPL)
- Content-Aware data Caching (CAC)
- Content-Blind data Caching (CBC)
7Edge Computing
- Here the application code is replicated at
multiple edge servers while the data remains centralized.
- Akamai and ACDN use this technique.
- Centralizing the data creates two problems:
- If the edge servers are located worldwide, each
data access incurs WAN latency.
- The central database becomes a performance
bottleneck as the load increases.
8Data Replication
- A solution to the problems of edge computing is to place the data
at each edge server.
- Database replication (REPL) techniques can help
maintain identical copies at multiple
locations.
Continued
9Data Replication
- The problem with this approach arises when the
database is updated.
- Keeping all copies identical after each update creates
heavy network traffic and performance overhead.
10Content-Aware data Caching (CAC)
- Instead of maintaining full copies of the database,
CAC systems cache database query results as the
application code issues them.
- Query containment check: when the application running
at the edge server issues a query, the local
database checks whether it has enough data to answer
the query locally.
- If the containment check is positive, the query is
answered locally; otherwise it is sent to the central
database and the result is inserted into the local
database.
Continued
11An Example of CAC
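- CAC stores query results efficiently.
- For example:
- Query Q1: SELECT * FROM items WHERE price < 50
- Query Q2: SELECT * FROM items WHERE price < 20
- Q2's result is a subset of Q1's, so the rows cached
for Q1 are enough to answer Q2 locally.
- Query template QT1:
- SELECT * FROM items WHERE price < ?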
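
The containment check for template QT1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the mechanism of any particular CAC system: the `CENTRAL_ITEMS` list stands in for the central database, and `TemplateCache` and its fields are hypothetical names. It handles the single template "price < ?" and ignores updates and invalidation.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-in for the "items" table at the central database.
CENTRAL_ITEMS = [
    {"id": 1, "price": 5},
    {"id": 2, "price": 15},
    {"id": 3, "price": 30},
    {"id": 4, "price": 70},
]


@dataclass
class TemplateCache:
    """Edge-side cache for the template: SELECT * FROM items WHERE price < ?"""

    cached_bound: Optional[float] = None  # largest parameter fetched so far
    rows: list = field(default_factory=list)  # locally stored records
    central_fetches: int = 0  # how often the central database was contacted

    def query(self, bound):
        # Containment check: rows cached for "price < B" can answer
        # any "price < b" with b <= B.
        if self.cached_bound is None or bound > self.cached_bound:
            # Miss: fetch from the central database and store the rows locally.
            self.rows = [r for r in CENTRAL_ITEMS if r["price"] < bound]
            self.cached_bound = bound
            self.central_fetches += 1
        # Answer the query from the local copy.
        return [r for r in self.rows if r["price"] < bound]


cache = TemplateCache()
q1 = cache.query(50)  # Q1: goes to the central database
q2 = cache.query(20)  # Q2: contained in Q1, answered locally
print([r["id"] for r in q1], [r["id"] for r in q2], cache.central_fetches)
```

Because the template fixes the shape of the query, the check reduces to comparing one parameter value instead of analyzing arbitrary SQL.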
12Content-Aware data Caching (CAC)
- The query containment check is
computationally expensive because it must compare
the new query with all previously cached queries.
- To reduce this cost, CAC makes use of
query templates: a query template is a parameterized SQL
query whose parameter values are passed at runtime.
- In CAC systems, update queries are always
executed at the central database.
13Content-Blind data Caching (CBC)
- Here, edge servers don't need to run a database
at all.
- Instead, they store the results of remote database
queries independently.
- The query results aren't merged, so redundant
information may be stored, and the cache gets a hit only
if the application issues exactly the same query, so hit rates
are low.
Continued.
14Content-Blind data Caching (CBC)
- CBC has some advantages over CAC:
- It incurs very little computational load.
- It caches query results as result sets instead of
database records, so it can return results
immediately.
- Finally, inserting a new element into the cache
doesn't require a query rewrite.
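
A content-blind cache can be sketched as a plain lookup table keyed by the exact query text. The sketch below is an assumption-laden illustration: `ContentBlindCache`, `fake_central_db`, and the price list are hypothetical names and data, and invalidation on updates is left out.

```python
class ContentBlindCache:
    """Stores complete result sets keyed by the exact query text."""

    def __init__(self, run_on_central_db):
        # Hypothetical callable that executes SQL at the central database.
        self._run_on_central_db = run_on_central_db
        self._results = {}
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        # Hit only if the identical query string was seen before.
        if sql in self._results:
            self.hits += 1
            return self._results[sql]
        # Miss: forward the query and store the result set as-is.
        self.misses += 1
        result = self._run_on_central_db(sql)
        self._results[sql] = result
        return result


def fake_central_db(sql):
    # Stand-in for the remote database: understands only "... price < N".
    prices = [5, 15, 30, 70]
    bound = int(sql.rsplit("<", 1)[1])
    return [p for p in prices if p < bound]


cache = ContentBlindCache(fake_central_db)
cache.query("SELECT * FROM items WHERE price < 50")  # miss
cache.query("SELECT * FROM items WHERE price < 50")  # hit: exact same text
cache.query("SELECT * FROM items WHERE price < 20")  # miss, although contained in the first result
print(cache.hits, cache.misses)
```

The lookup is a single dictionary access with no query analysis, which matches the low computational load described above; the price is that the third query misses even though its rows are already cached.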
15Scalable Web hosting. (a) edge computing, (b)
content-aware caching, (c) content-blind
caching, and (d) data replication.
16Performance Analysis
- To compare the four techniques, the authors
used two different applications:
- RUBBoS, a bulletin-board benchmark application
that models Slashdot.org,
http://jmob.objectweb.org/rubbos.html
- TPC-W, an industry-standard e-commerce benchmark
that models an online book store such as
Amazon.com,
http://pgfoundry.org/projects/tpc-w-php/
17Performance Analysis
- They measured the end-to-end client latency,
which is the sum of network latency and internal
latency.
- The results show that, for RUBBoS, CBC performed best in
terms of client latency whereas EC performed
worst.
- For TPC-W, REPL performed best and EC was
again the worst.
18Performance Results
(a) RUBBoS benchmark (b) TPC-W Browsing (c) TPC-W
Ordering
19Choosing the Right Strategy
- According to the authors, Web designers should
choose a scalability technique by carefully
analyzing their Web application's characteristics.
- They suggest that the best strategy is the one
that minimizes the application's end-to-end client
latency.
- This latency is affected by many parameters, such as
hit ratio, database query execution time, and
application server execution time.
- To estimate these parameters, they propose a concept called
virtual caches (VC).
Continued
20Choosing the Right Strategy
- A VC behaves just like a real cache but stores
only metadata, such as the list of objects in
the cache and their sizes, so it requires less memory
than a real cache.
- With the help of VCs, we can obtain the hit
ratios and execution times for the servers and
estimate the end-to-end latency.
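
The idea of a virtual cache can be illustrated with a short Python sketch. Several things here are assumptions rather than details from the paper: the LRU eviction policy, the `VirtualCache` and `estimate_latency` names, and the weighted-average latency estimate with made-up hit and miss costs.

```python
from collections import OrderedDict


class VirtualCache:
    """Tracks only keys and sizes of cached objects, never the objects."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self._sizes = OrderedDict()  # key -> size in bytes (metadata only)
        self._used = 0
        self.hits = 0
        self.requests = 0

    def access(self, key, size_bytes):
        """Record one request; return True if a real cache would have hit."""
        self.requests += 1
        if key in self._sizes:
            self.hits += 1
            self._sizes.move_to_end(key)  # mark as most recently used
            return True
        # Miss: record the metadata, then evict least recently used entries
        # (assumed policy) until the tracked size fits the capacity.
        self._sizes[key] = size_bytes
        self._used += size_bytes
        while self._used > self.capacity_bytes and self._sizes:
            _, evicted_size = self._sizes.popitem(last=False)
            self._used -= evicted_size
        return False

    @property
    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


def estimate_latency(hit_ratio, hit_ms, miss_ms):
    # Assumed model: average of hit and miss costs, weighted by hit ratio.
    return hit_ratio * hit_ms + (1 - hit_ratio) * miss_ms


vc = VirtualCache(capacity_bytes=100)
for key in ["a", "b", "a", "c", "b"]:
    vc.access(key, 40)
print(vc.hit_ratio, estimate_latency(vc.hit_ratio, hit_ms=10, miss_ms=110))
```

Running one such virtual cache per candidate strategy on the live request stream would give a hit-ratio estimate for each strategy without storing any result data.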
21Thank You.