On the Placement of Web Server Replicas - PowerPoint PPT Presentation

1
On the Placement of Web Server Replicas
  • Lili Qiu, Microsoft Research
  • Venkata N. Padmanabhan, Microsoft Research
  • Geoffrey M. Voelker, UCSD
  • IEEE INFOCOM 2001, Anchorage, AK, April 2001

2
Outline
  • Overview
  • Related work
  • Our approach
  • Simulation methodology and results
  • Summary

3
Motivation
  • Growing interest in Web server replicas
      • Exponential growth in Web usage
      • Content providers want to offer better service at
        lower cost
      • Solution: replication
  • Forms of Web server replicas
      • Mirror sites
      • Content Distribution Networks (CDNs)
      • CDN: a network of servers
      • Examples: Akamai, Digital Island

[Figure: content providers, replicas, and clients connected through the Internet]
4
Placement of Web Server Replicas
  • Problem specification
      • Among a set of N potential sites, pick K sites as
        replicas to minimize user latency or bandwidth
        usage

[Figure: content providers and clients connected through the Internet]
5
Related Work
  • Placement of Web proxies [LGI99]
  • Cache location [KRS00]
  • Placement of Internet instrumentation [JJJ00]

6
Our Approach
  • Model the Internet as a graph
  • Parameterize the graph using measured inputs
      • Number of requests generated from each region
      • Distance between different regions
  • Map the placement problem onto a graph
    optimization problem
      • Assumption: each client uses the single replica
        that is closest to it
  • Solve the graph optimization problem
      • Using various approximation algorithms

7
Minimum K-median Problem
  • Given a complete graph $G = (V, E)$ with $d(j)$ and $c(i, j)$
      • $d(j)$: number of requests at node $j$
      • $c(i, j)$: distance between nodes $i$ and $j$
        (latency, hop count, or another metric to be optimized)
  • Find a subset $V' \subseteq V$ with $|V'| = K$ such that it
    minimizes
      • $\sum_{v \in V} \min_{w \in V'} d(v)\, c(v, w)$
  • NP-hard problem
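
The objective above can be made concrete with a small sketch. It assumes the graph is given as a demand list and a full distance matrix; the function names and the 4-node path example are hypothetical and not from the talk. The exhaustive search is only there to show what "optimal" means on a tiny instance, since the problem is NP-hard in general.

```python
from itertools import combinations

def placement_cost(replicas, demand, dist):
    """Sum over clients v of d(v) * c(v, w), with w the nearest replica."""
    return sum(
        demand[v] * min(dist[v][w] for w in replicas)
        for v in range(len(demand))
    )

def brute_force_k_median(demand, dist, k):
    """Try every size-k subset of sites; feasible only for tiny graphs."""
    nodes = range(len(demand))
    return min(
        combinations(nodes, k),
        key=lambda sites: placement_cost(sites, demand, dist),
    )

# Hypothetical example: a path 0 - 1 - 2 - 3 with unit-length links.
demand = [5, 1, 1, 3]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
best = brute_force_k_median(demand, dist, 2)
print(best, placement_cost(best, demand, dist))  # prints (0, 3) 2
```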

8
Placement Algorithms
  • Tree-based algorithm [LGG99]
      • Assume the underlying topologies are trees, and
        model placement as a dynamic programming problem
      • $O(N^3 M^2)$ for choosing M replicas among N potential
        sites
  • Random
      • Pick the best among several random assignments
  • Hot spot
      • Place replicas near the clients that generate the
        largest load
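
A minimal sketch of the random and hot spot heuristics follows. The slide does not say how "near" is defined for hot spot, so the `radius` parameter (0 means a site counts only its own load) is an assumption, as are the function names and the example data.

```python
import random

def placement_cost(replicas, demand, dist):
    """Total cost when each client uses its nearest replica."""
    return sum(demand[v] * min(dist[v][w] for w in replicas)
               for v in range(len(demand)))

def random_placement(demand, dist, k, trials=20, seed=0):
    """Keep the cheapest of `trials` randomly drawn size-k site sets."""
    rng = random.Random(seed)
    nodes = range(len(demand))
    candidates = [rng.sample(nodes, k) for _ in range(trials)]
    return min(candidates, key=lambda s: placement_cost(s, demand, dist))

def hot_spot_placement(demand, dist, k, radius=0):
    """Rank sites by the load generated within `radius`; take the top k."""
    n = len(demand)
    load_nearby = [sum(demand[v] for v in range(n) if dist[u][v] <= radius)
                   for u in range(n)]
    return sorted(range(n), key=lambda u: load_nearby[u], reverse=True)[:k]

# Hypothetical example: a path 0 - 1 - 2 - 3 with unit-length links.
demand = [5, 1, 1, 3]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
hot = hot_spot_placement(demand, dist, 2)
rnd = random_placement(demand, dist, 2)
print(hot, placement_cost(hot, demand, dist))
print(sorted(rnd), placement_cost(rnd, demand, dist))
```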

9
Placement Algorithms (Cont.)
  • Greedy algorithm
      • Calculate the cost of assigning clients to each
        candidate replica
      • Select the replica with the lowest cost
      • Adjust costs based on that assignment; repeat until
        all replicas are placed
  • Super-optimal algorithm
      • Lagrangian relaxation with the subgradient method
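
The greedy steps can be sketched as follows. This is one plausible reading of the bullets (add one site per round, always the one that lowers the total cost most given the sites already chosen), not necessarily the authors' exact implementation. The function name and example data are hypothetical.

```python
def greedy_placement(demand, dist, k):
    """Pick k sites one at a time, each minimizing the resulting total cost."""
    n = len(demand)
    chosen = []
    # nearest[v]: distance from client v to its closest chosen replica so far
    nearest = [float("inf")] * n
    for _ in range(k):
        def cost_with(site):
            # Total cost if `site` were added to the replicas chosen so far
            return sum(demand[v] * min(nearest[v], dist[v][site])
                       for v in range(n))
        site = min((s for s in range(n) if s not in chosen), key=cost_with)
        chosen.append(site)
        # Adjust each client's assignment cost to reflect the new replica
        nearest = [min(nearest[v], dist[v][site]) for v in range(n)]
    return chosen

# Hypothetical example: a path 0 - 1 - 2 - 3 with unit-length links.
demand = [5, 1, 1, 3]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
sites = greedy_placement(demand, dist, 2)
print(sites)  # prints [0, 3]
```

Each round costs O(N^2) in this form, so placing K replicas takes O(K N^2) time on a dense distance matrix.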

10
Simulation Methodology
  • Network topology
      • Randomly generated topologies, using the GT-ITM
        Internet topology generator
      • Real Internet network topology: AS-level topology
        obtained using BGP routing data from a set of
        seven geographically dispersed BGP peers
  • Web workload
      • Real server traces: MSNBC, ClarkNet, NASA Kennedy
        Space Center
  • Performance metric
      • Relative performance:
        $\mathrm{cost}_{\text{practical}} / \mathrm{cost}_{\text{super-optimal}}$
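
To illustrate the metric, here is a sketch of how a super-optimal cost (a lower bound on the optimum) could be obtained with Lagrangian relaxation and subgradient updates. It uses the textbook K-median relaxation, in which the "each client is assigned exactly once" constraints are relaxed with one multiplier per client. The step-size rule, iteration count, and all names are assumptions, and the talk's actual procedure may differ.

```python
def placement_cost(replicas, demand, dist):
    """Total cost when each client uses its nearest replica."""
    return sum(demand[v] * min(dist[v][w] for w in replicas)
               for v in range(len(demand)))

def lagrangian_lower_bound(demand, dist, k, iterations=200, step=1.0):
    """Best Lagrangian lower bound found on the optimal K-median cost."""
    n = len(demand)
    cost = [[demand[v] * dist[v][w] for w in range(n)] for v in range(n)]
    lam = [0.0] * n      # one multiplier per client
    best_bound = 0.0     # valid because all costs are non-negative
    for t in range(1, iterations + 1):
        # rho[w]: reduced cost of opening site w under the current multipliers
        rho = [sum(min(0.0, cost[v][w] - lam[v]) for v in range(n))
               for w in range(n)]
        opened = sorted(range(n), key=lambda w: rho[w])[:k]
        bound = sum(lam) + sum(rho[w] for w in opened)
        best_bound = max(best_bound, bound)
        # Subgradient: 1 minus the number of opened sites serving client v
        for v in range(n):
            served = sum(1 for w in opened if cost[v][w] < lam[v])
            lam[v] += (step / t) * (1 - served)
    return best_bound

# Hypothetical example: a path 0 - 1 - 2 - 3 with unit-length links.
demand = [5, 1, 1, 3]
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
practical = [0, 3]  # e.g. the output of some placement heuristic
lower = lagrangian_lower_bound(demand, dist, k=2)
relative_performance = placement_cost(practical, demand, dist) / lower
print(round(relative_performance, 3))
```

A ratio close to 1 means the practical placement is close to the bound. The division assumes the bound is positive, which can fail on degenerate inputs such as K equal to N.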

11
Simulation Methodology (Cont.)
  • Simulate a network of N nodes ($100 \le N \le 3000$)
  • Cluster clients using network-aware clustering
    [KW00]
      • IP addresses with the same address prefix belong
        to a cluster
  • A small number of popular clusters account for
    most requests
      • The top 10, 100, 1000, and 3000 clusters account for
        about 24%, 45%, 78%, and 94% of the requests,
        respectively
  • Pick the top N clusters
      • Map them to different nodes
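
The clustering and top-N selection might look like the sketch below. It assumes a list of routing prefixes is available and uses longest-prefix matching. The prefixes and client addresses are made-up documentation values, and the function names are hypothetical.

```python
from collections import Counter
import ipaddress

def cluster_requests(client_ips, prefixes):
    """Count requests per cluster via longest-prefix match on `prefixes`."""
    networks = sorted((ipaddress.ip_network(p) for p in prefixes),
                      key=lambda net: net.prefixlen, reverse=True)
    counts = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        match = next((net for net in networks if addr in net), None)
        if match is not None:
            counts[str(match)] += 1
    return counts

def top_clusters(counts, n):
    """Return the n busiest clusters and their share of all requests."""
    total = sum(counts.values())
    top = counts.most_common(n)
    share = sum(c for _, c in top) / total
    return [name for name, _ in top], share

# Hypothetical request log (one entry per request) and prefix table.
prefixes = ["10.0.0.0/8", "10.1.0.0/16", "192.0.2.0/24"]
requests = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.0.2.7"]
counts = cluster_requests(requests, prefixes)
names, share = top_clusters(counts, 1)
print(names, share)  # prints ['10.1.0.0/16'] 0.5
```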

12
Simulation Methodology (Cont.)
  • Random trees
  • Random graphs
  • AS-level topologies
  • Sensitivity to the error in the input

13
Random Tree Topologies
The tree-based algorithm performs well, as
expected. The greedy algorithm performs equally
well.
14
Random Graph Topologies
The greedy and hot-spot algorithms outperform
the tree-based algorithm.
15
Large Random Graph Topologies
The greedy algorithm performs best, and the
hot-spot algorithm performs nearly as well.
16
AS-level Internet Topologies
The greedy algorithm performs best, and the
hot-spot algorithm performs nearly as well.
17
Effects of Imperfect Knowledge about Input Data
  • Predicted workload (using moving window average)
  • Perfect topology information

Within 5% degradation when using predicted
workload
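
A moving window average predictor can be sketched in a few lines. The window length and the layout of `history` (one demand vector per past period, oldest first) are assumptions, since the slide gives neither.

```python
def moving_window_prediction(history, window=3):
    """Predict each cluster's next-period load as the mean of its last
    `window` observed periods."""
    recent = history[-window:]
    n = len(recent[0])
    return [sum(period[v] for period in recent) / len(recent)
            for v in range(n)]

# Hypothetical per-period request counts for two client clusters.
history = [[10, 2], [12, 4], [14, 6], [16, 8]]
predicted = moving_window_prediction(history, window=3)
print(predicted)  # prints [14.0, 6.0]
```

The predicted vector would then replace the true demand as input to the placement algorithm.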
18
Effects of Imperfect Knowledge about Input Data (Cont.)
  • Predicted workload (using moving window average)
  • Noisy topology information
      • Perturb the distance between two nodes i and j by
        up to a factor of 2

Within 15% degradation when using predicted
workload and noisy topology information
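
One way to inject such noise is sketched below. It assumes "up to a factor of 2" means each distance is multiplied by a random factor between 1/2 and 2, drawn log-uniformly and applied symmetrically. The talk does not specify the distribution, so this is an illustrative choice, and the function name is hypothetical.

```python
import random

def perturb_distances(dist, max_factor=2.0, seed=0):
    """Scale each pairwise distance by a random factor in
    [1/max_factor, max_factor], keeping the matrix symmetric."""
    rng = random.Random(seed)
    n = len(dist)
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            factor = max_factor ** rng.uniform(-1.0, 1.0)
            noisy[i][j] = noisy[j][i] = dist[i][j] * factor
    return noisy

# Hypothetical example: a path 0 - 1 - 2 - 3 with unit-length links.
dist = [[abs(i - j) for j in range(4)] for i in range(4)]
noisy = perturb_distances(dist)
for row in noisy:
    print([round(x, 2) for x in row])
```

The placement is then computed on `noisy` but evaluated on the true `dist`, which is how the degradation would be measured.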
19
Summary
  • One of the first experimental studies on
    placement of Web server replicas
  • Knowledge about client workload and topology is
    needed for provisioning replicas
  • The greedy algorithm performs very well
      • Within a factor of 1.1 to 1.5 of the super-optimal
      • Insensitive to noise
      • Stays within a factor of 2 of the super-optimal
        when the salted error is a factor of 4
  • The hot spot algorithm performs nearly as well
      • Within a factor of 1.6 to 2 of the super-optimal
  • Obtaining input data
      • Moving window average for load prediction
      • Using BGP router data to obtain topology
        information

20
Conclusion
  • Recommend using the greedy algorithm for deciding
    the placement of Web server replicas

21
Acknowledgement
  • Craig Labovitz
  • Yin Zhang
  • Ravi Kumar

22
Comments on greedy algorithm performance
  • Worst-case performance is unbounded
  • Bad example
      • A full homogeneous binary tree with $n = 2^i$ leaves
        and n caches
      • Optimal cost: 0
      • Greedy cost: $(n-1)d$
  • However, the worst-case scenario seems unlikely
    to occur in real and random topologies

[Figure: example tree with edges labeled 0 and d]
23
Simulation Results in Random Tree Topologies
24
Random Tree Topologies
The tree-based algorithm performs well, as
expected. The greedy algorithm performs equally
well.
25
Random Graph Topologies
The greedy and hot-spot algorithms outperform
the tree-based algorithm.
26
Large Random Graph Topologies
The greedy algorithm performs best, and the
hot-spot algorithm performs nearly as well.
27
AS-level Internet Topologies
The greedy algorithm performs best, and the
hot-spot algorithm performs nearly as well.
28
Simulation Results in Real Internet Topologies
29
Obtaining Input Data
  • Workload
      • The number of requests generated by popular
        client clusters
      • Stable
      • Placement algorithm can use moving window average
        for predicting load with negligible impact on
        performance
  • Network topology
      • Propagation delay
      • Hop count
      • AS hop count
      • Internet weather map

30
Placement of Web Server Replicas
  • Goal
      • Place K replicas to minimize user latency or
        bandwidth usage
  • Minimum K-median problem
      • Select K servers to minimize the sum of
        assignment costs
      • NP-hard problem

[Figure: content providers, replicas, and clients connected through the Internet]