Analysis of Caching and Replication Strategies for Web Applications


1
Analysis of Caching and Replication Strategies
for Web Applications
Authors: Swaminathan Sivasubramanian, Guillaume
Pierre, Maarten van Steen.
  • Presented By
  • Sudarsan Maddi
  • Graduate Student

2
Topics We Will Cover
  • Introduction
  • Techniques to scale Web applications
  • Performance Analysis
  • Choosing the Right Strategy

3
Introduction
  • In this paper the authors present a qualitative
    and quantitative analysis of replication and
    caching techniques for hosting Web applications.
  • Their analysis shows that selecting the best
    mechanism depends heavily on data workload and
    application characteristics.

4
Introduction
  • Web sites are slow for many reasons; one of the
    main reasons is the dynamic generation of Web
    documents.
  • Web page caching: fragments of the HTML pages that
    the application generates are cached to serve
    future requests.
  • Content-delivery networks such as Akamai do this
    by deploying edge servers around the Internet,
    thus reducing the network latency of requests.

5
Introduction
  • Limitations of page caching have given rise to
    different approaches for scalable Web
    applications, classified broadly into:
  • Application code replication
  • Caching of database records
  • Caching of query results
  • Entire database replication
  • The article gives an overview of these scalable
    techniques and compares and analyzes their
    features and performance.

6
Techniques to scale Web Applications
  • The techniques covered are:
  • Edge Computing
  • Data Replication
  • Content-Aware data Caching (CAC)
  • Content-Blind data Caching (CBC)

7
Edge Computing
  • In edge computing (EC), the application code is
    replicated at multiple edge servers while the
    data stays centralized.
  • Akamai and ACDN use this technique.
  • Centralizing the data creates two problems:
  • If the edge servers are located worldwide, each
    data access incurs WAN latency.
  • The central database becomes a performance
    bottleneck as the load increases.

8
Data Replication
  • One solution to the problems of edge computing is
    to place the data at each edge server.
  • Database replication (REPL) techniques can help
    maintain identical copies at multiple
    locations.

Continued
9
Data Replication
  • The problem with replication arises when the
    database is updated: every update must be
    propagated to all replicas.
  • This creates heavy network traffic and
    performance overhead.

10
Content-Aware data Caching (CAC)
  • Instead of maintaining full copies of the
    database, CAC systems cache database query
    results as the application code issues them.
  • Query containment check: when the application
    running at the edge server issues a query, the
    local database checks whether it has enough data
    to answer the query locally.
  • If the containment check is positive, the query
    is answered locally; otherwise it is sent to the
    central database, and the result is inserted
    into the local database.
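The lookup flow above can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the class name `ContentAwareCache`, the in-memory dictionaries standing in for the local and central databases, and the restriction to queries of the form "price < bound" are all hypothetical and not taken from the paper.

```python
class ContentAwareCache:
    """Toy edge cache for queries of the form: price < bound (hypothetical)."""

    def __init__(self, central_items):
        self.central_items = central_items  # stands in for the central database
        self.local_items = {}               # item id -> price, the partial local copy
        self.cached_bound = None            # largest bound fetched so far
        self.central_queries = 0            # how often we had to go to the center

    def _contained(self, bound):
        # Containment check: a cached "price < B" covers "price < b" when b <= B.
        return self.cached_bound is not None and bound <= self.cached_bound

    def items_cheaper_than(self, bound):
        if not self._contained(bound):
            # Negative check: forward to the central database, then insert
            # the returned rows into the local database.
            self.central_queries += 1
            for item_id, price in self.central_items.items():
                if price < bound:
                    self.local_items[item_id] = price
            self.cached_bound = bound
        # Answer from the local copy.
        return sorted(i for i, p in self.local_items.items() if p < bound)


cache = ContentAwareCache({"a": 10, "b": 30, "c": 70})
first = cache.items_cheaper_than(50)   # miss: fetched from the central database
second = cache.items_cheaper_than(20)  # contained in the first result: local
```

A real CAC system runs a full local database and handles arbitrary SQL; the single numeric bound here only keeps the containment test readable.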

Continued
11
An Example of CAC
  • CAC stores query results efficiently.
  • For example:
  • Query Q1: Select * from items where price < 50
  • Query Q2: Select * from items where price < 20
  • Query template QT1:
  • Select * from items where price < ?

12
Content-Aware data Caching (CAC)
  • The query containment check is computationally
    expensive because it must compare the new query
    with all previously cached queries.
  • To reduce this cost, CAC makes use of query
    templates. A query template is a parameterized
    SQL query whose parameter values are passed at
    runtime.
  • In CAC systems, update queries are always
    executed at the central database.
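One way to picture how templates cut the cost of the containment check: group cached queries by template, so a new query is compared only with cached queries that share its template. The sketch below is an assumption-laden illustration; `to_template` and `TemplateIndex` are hypothetical names, and deriving the template by replacing numeric literals with `?` is a simplification (in a real system the application would supply the template and its parameters directly).

```python
import re

# Matches standalone integer or decimal literals such as 50 or 19.99.
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def to_template(sql):
    """Split a query into (template, parameters) by replacing numbers with '?'."""
    params = tuple(float(m) for m in _NUMBER.findall(sql))
    template = _NUMBER.sub("?", sql)
    return template, params


class TemplateIndex:
    """Groups cached queries by template so checks stay within one group."""

    def __init__(self):
        self.by_template = {}  # template -> list of cached parameter tuples

    def add(self, sql):
        template, params = to_template(sql)
        self.by_template.setdefault(template, []).append(params)

    def candidates(self, sql):
        # Only cached queries with the same template need to be examined.
        template, _ = to_template(sql)
        return self.by_template.get(template, [])


index = TemplateIndex()
index.add("Select * from items where price < 50")
index.add("Select * from users where age < 30")
same_template = index.candidates("Select * from items where price < 20")
```

Here the lookup for the `price < 20` query only sees the cached `price < 50` entry; the unrelated `users` query is never compared.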

13
Content-Blind data Caching (CBC)
  • Here, edge servers don't need to run a database
    at all.
  • Instead, they store the results of remote
    database queries independently.
  • Query results aren't merged, so the cache may
    store redundant information, and a hit occurs
    only if the application issues exactly the same
    query again. Hit rates are therefore low.

Continued.
14
Content-Blind data Caching (CBC)
  • CBC has some advantages over CAC:
  • It incurs very little computational load.
  • It caches query results as result sets instead
    of database records, so it can return results
    immediately.
  • Inserting a new element into the cache doesn't
    require a query rewrite.
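A content-blind cache can be sketched as a plain lookup table keyed by the exact query string. The class name `ContentBlindCache` and the `run_on_central` callable are hypothetical stand-ins for illustration, not an API from the paper.

```python
class ContentBlindCache:
    """Stores whole result sets, keyed by the exact query text (hypothetical)."""

    def __init__(self, run_on_central):
        self.run_on_central = run_on_central  # callable: SQL string -> result set
        self.results = {}                     # exact SQL string -> stored result set
        self.hits = 0
        self.misses = 0

    def query(self, sql):
        if sql in self.results:
            # Hit: return the stored result set as is, with no local query
            # execution and no containment check.
            self.hits += 1
            return self.results[sql]
        # Miss: run the query remotely and store the result under its text.
        self.misses += 1
        result = self.run_on_central(sql)
        self.results[sql] = result
        return result


central = {
    "Select * from items where price < 50": ["a", "b"],
    "Select * from items where price < 20": ["a"],
}
cache = ContentBlindCache(central.__getitem__)
cache.query("Select * from items where price < 50")  # miss
cache.query("Select * from items where price < 20")  # miss, although covered by Q1
cache.query("Select * from items where price < 50")  # hit: identical text
```

The second query misses even though its answer is a subset of the first result, which is the hit-rate penalty described above; in exchange, a hit costs only a dictionary lookup.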

15
Scalable Web hosting.  (a) edge computing, (b)
content-aware caching, (c) content-blind
caching, and (d) data replication.
16
Performance Analysis
  • To compare the four techniques, the authors used
    two different applications:
  • RUBBoS, a bulletin-board benchmark application
    that models Slashdot.org,
    http://jmob.objectweb.org/rubbos.html
  • TPC-W, an industry-standard e-commerce benchmark
    that models an online bookstore such as
    Amazon.com,
    http://pgfoundry.org/projects/tpc-w-php/

17
Performance Analysis
  • The authors measured the end-to-end client
    latency, which is the sum of network latency and
    internal latency.
  • For RUBBoS, the results show that CBC performed
    best in terms of client latency, whereas EC
    performed worst.
  • For TPC-W, REPL performed best and EC again
    performed worst.

18
Performance Results
(a) RUBBoS benchmark (b) TPC-W Browsing (c) TPC-W
Ordering
19
Choosing the Right Strategy
  • According to the authors, Web designers should
    choose a scaling technique by carefully
    analyzing the characteristics of their Web
    application.
  • They suggest that the best strategy is the one
    that minimizes the application's end-to-end
    client latency.
  • This latency is affected by many parameters, such
    as hit ratio, database query execution time, and
    application server execution time.
  • To estimate these parameters, they propose a
    concept called virtual caches (VC).
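To see how these parameters interact, here is a deliberately simple latency model for a caching edge server. The formula is an assumption made for illustration and is not the estimate used in the paper: a request always pays the application server time, a cache hit pays only the local query time, and a miss pays a WAN round trip plus the central query time. The function name and all parameter names are hypothetical.

```python
def estimate_latency_ms(hit_ratio, app_ms, local_query_ms, central_query_ms, wan_ms):
    """Expected per-request latency under the simplified model described above."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    hit_cost = local_query_ms              # answered at the edge
    miss_cost = wan_ms + central_query_ms  # forwarded to the central database
    return app_ms + hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost


# Same workload, two hypothetical hit ratios.
low_hit = estimate_latency_ms(0.2, app_ms=10, local_query_ms=5,
                              central_query_ms=20, wan_ms=100)
high_hit = estimate_latency_ms(0.9, app_ms=10, local_query_ms=5,
                               central_query_ms=20, wan_ms=100)
```

Even this crude model shows why the hit ratio matters so much: with a 100 ms WAN round trip, the miss term dominates unless most queries are answered at the edge.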

Continued
20
Choosing the Right Strategy
  • A VC behaves just like a real cache, but it stores
    only metadata, such as the list of objects in
    the cache and their sizes, so it requires less
    memory than a real cache.
  • With the help of VCs we can obtain the hit
    ratios and execution times for the servers and
    estimate the end-to-end latency.
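A virtual cache can be sketched as a cache that tracks only keys and sizes, never the cached payloads. The version below assumes LRU replacement and a byte-size capacity; both are illustrative choices, since the slides do not say which policy the authors use, and `VirtualCache` is a hypothetical name.

```python
from collections import OrderedDict


class VirtualCache:
    """Metadata-only LRU cache used to measure the hit ratio (hypothetical)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.sizes = OrderedDict()  # key -> object size in bytes, no payloads
        self.used_bytes = 0
        self.hits = 0
        self.lookups = 0

    def access(self, key, size_bytes):
        """Record one request; return True if a real cache would have hit."""
        self.lookups += 1
        if key in self.sizes:
            self.hits += 1
            self.sizes.move_to_end(key)  # mark as most recently used
            return True
        if size_bytes <= self.capacity_bytes:
            self.sizes[key] = size_bytes
            self.used_bytes += size_bytes
            # Evict least recently used entries until the object fits.
            while self.used_bytes > self.capacity_bytes:
                _, evicted_size = self.sizes.popitem(last=False)
                self.used_bytes -= evicted_size
        return False

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0


vc = VirtualCache(capacity_bytes=100)
for key in ["q1", "q2", "q2", "q1"]:
    vc.access(key, size_bytes=60)
ratio = vc.hit_ratio()
```

Replaying the application's query stream through such a structure yields the hit ratio that each caching strategy would achieve, without paying the memory cost of storing the results themselves.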

21
Thank You.