1
Programming Support and Resource Management for
Cluster-based Internet Services
  • Hong Tang
  • Department of Computer Science
  • University of California, Santa Barbara

2
Cluster-based Internet Services
  • Advantages:
  • Cost-effectiveness.
  • Incremental scalability.
  • High availability.
  • Examples:
  • Yahoo, MSN, AOL, Google, Teoma.

[Figure: typical cluster architecture: a firewall/traffic switch in front of web servers/query handlers and service nodes, connected by a local-area network]
3
Challenges
  • Hardware failures and configuration errors due to
    a large number of components.
  • Platform heterogeneity due to irregularity in
    hardware, networking, and data partitions.
  • Serving highly concurrent and fluctuating traffic
    under interactive response constraints.

4
Neptune: Programming and Runtime Support for
Cluster-based Internet Services
  • Programming support:
  • The component-oriented style allows programmers to
    focus on application functionality.
  • The Neptune API provides high-level primitives for
    service programming.
  • Runtime support:
  • The Neptune runtime glues components together and
    takes care of reliability and scalability issues.
  • Applications:
  • Discussion groups, online auctions, index search,
    a persistent cache utility, BLAST-based protein
    sequence matching.
  • Industrial deployment: Teoma/Ask Jeeves.

5
Example: Document Search Engine
[Figure: search engine cluster: a firewall/traffic switch in front of web servers/query handlers, query caches, three index server partitions, and two document server partitions on a local-area network]
6
Outline
  • Cluster-based Internet services: background and
    challenges.
  • Programming support for data aggregation
    operations.
  • Integrated resource management and QoS support.
  • Future work.

7
Data Aggregation Operation
  • Aggregate request processing results from
    multiple data partitions.
  • Examples: search engines, discussion groups.
  • Naïve approach:
  • Rely on a fixed server for data collection and
    aggregation.
  • The fixed server is a scalability bottleneck.
  • Actually used in the TACC framework [Fox97] and a
    previous version of the Neptune system.
  • Need explicit programming support and efficient
    runtime system design!

8
Data Aggregation Operation: The Search Engine Example
[Figure: the same search engine cluster, highlighting result aggregation across the three index server partitions]

10
Design Objectives
  • An easy-to-use programming primitive.
  • Scalable to a large number of partitions.
  • Interactive responses and high throughput.
  • Reminder: All must be achieved in a cluster
    environment!
  • Component failures.
  • Platform heterogeneity.

11
Data Aggregation Call (DAC): The Basic Semantics
DAC(P, op_proc, op_reduce)
Requirement on op_reduce(): it must be commutative
and associative.
[Figure: DAC over four data partitions: op_proc runs on each partition and op_reduce merges the partial results]
12
Adding Quality Control to DAC
  • What if some server fails?
  • Partial aggregation results may still be useful.
  • Provide an aggregation quality guarantee.
  • Aggregation quality: the percentage of partitions
    that contributed to the aggregation result.
  • What if some server is very slow?
  • Better to return partial results than to keep
    waiting.
  • Provide a soft deadline guarantee.

DAC(P, op_proc, op_reduce, q, T)
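To make the semantics concrete, here is a minimal reference sketch in Python (illustrative only: the real Neptune primitive is not a Python call, and the partition handles and the op_proc/op_reduce signatures are assumptions). It applies op_proc to every partition in parallel, folds results with op_reduce in completion order (legal because op_reduce is commutative and associative), stops at the soft deadline T, and checks the quality guarantee q.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FutureTimeout

def dac(partitions, op_proc, op_reduce, q=1.0, deadline=None):
    """Reference semantics of DAC(P, op_proc, op_reduce, q, T)."""
    pool = ThreadPoolExecutor(max_workers=len(partitions))
    futures = [pool.submit(op_proc, p) for p in partitions]
    result, contributed = None, 0
    try:
        for fut in as_completed(futures, timeout=deadline):
            try:
                partial = fut.result()
            except Exception:
                continue              # failed partition: skip it
            # Merge order does not matter: op_reduce is
            # commutative and associative.
            result = partial if result is None else op_reduce(result, partial)
            contributed += 1
    except FutureTimeout:
        pass                          # soft deadline T: keep the partial result
    pool.shutdown(wait=False)
    quality = contributed / len(partitions)
    if quality < q:
        raise RuntimeError(f"quality {quality:.2f} below guarantee {q:.2f}")
    return result, quality
```

For the document search engine, op_proc would search one index partition and op_reduce would merge two ranked result lists into one.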
13
Design Alternatives
[Figure: three design alternatives: (a) Base, where a fixed server collects results from all partitions; (b) Flat, where the results of all n partitions are collected at a single level; (c) Hierarchical tree, where partial results are merged along a reduction tree]
14
Tree-based Reduction
[Figure: a reduction tree spanning the participating servers, delivering the final result to the service client]
The reduction tree is built dynamically for each
request.
15
Building Dynamic Reduction Trees
  • Objective:
  • High throughput and low response time.
  • Achieving high throughput:
  • Balance load; keep all servers busy.
  • Achieving low response time?

16
Building Dynamic Reduction Trees
17
Building Dynamic Reduction Trees
  • Objective:
  • High throughput and low response time.
  • Achieving high throughput:
  • Balance load; keep all servers busy.
  • Achieving low response time:
  • Reduce the longest queue length.
  • Queue length indicates server load.
  • Balance load!
  • Observation: Under a highly concurrent workload,
    reducing response time and improving throughput
    both require balancing load!
  • Decisions: tree shape and server assignment.
18
Load-aware Server Assignment
  • A server's load increase is determined by its
    number of children.
  • k children: 1 local processing operation plus k
    reduction operations (see the sketch below).
  • Underloaded servers: nodes with more children.
  • Overloaded servers: leaf nodes, or nodes with
    fewer children.
[Figure: example reduction tree over servers A-G, with less-loaded servers given more children]
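A sketch of this cost model (Python, illustrative; queue length as the load metric follows the previous slide, but the greedy rule is a simplification): a tree over n servers has n-1 parent-child edges, and each child slot is handed to the server whose estimated load is currently smallest.

```python
import heapq

def assign_children(queue_lengths):
    """Greedy load-aware assignment of reduction operations.

    Returns children[i], the number of children server i gets.
    Estimated load of server i = queue_lengths[i] + 1 (local
    processing) + children[i] (one reduction per child).
    """
    n = len(queue_lengths)
    children = [0] * n
    heap = [(q + 1, i) for i, q in enumerate(queue_lengths)]  # +1: local proc
    heapq.heapify(heap)
    for _ in range(n - 1):                 # one slot per tree edge
        load, i = heapq.heappop(heap)
        children[i] += 1                   # one more reduction op
        heapq.heappush(heap, (load + 1, i))
    return children
```

For example, assign_children([0, 2, 5]) yields [2, 0, 0]: the idle server does both reductions while the loaded servers remain leaves.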
19
Choosing Reduction Tree Shapes
  • Static tree shapes: balanced d-ary tree, binomial
    tree.
  • Problem: Not sufficient to correct the load
    imbalance caused by platform heterogeneity in a
    cluster environment.

20
Load-adaptive Tree Formation (LAT)
[Figure: LAT example over eight servers A-H: servers are ordered by load, and reduction operations are assigned to the least-loaded servers first, which then become the interior nodes of the tree]
21
LAT Adjustment
  • Problem: When all servers have similar load, LAT
    will assign one reduction operation per server,
    resulting in a linked list.
  • Solution: A final adjustment ensures the tree depth
    is no more than log N. If a subtree is in the form
    of a linked list, change it to a binomial tree (a
    standard indexing for it is sketched below).
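For reference, the standard way to index a binomial reduction tree (Python, illustrative; node 0 is the root): each node's parent is obtained by clearing its lowest set bit, so a node's depth equals its population count, which caps the depth at log2 N.

```python
def binomial_parent(rank):
    """Parent of node `rank` in a binomial reduction tree.

    Clearing the lowest set bit, e.g. 6 (110) -> 4 (100);
    the root (rank 0) has no parent.
    """
    return rank & (rank - 1) if rank else -1
```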

22
LAT Summary
  • Steps (sketched below):
  • Collecting server load information.
  • Assigning operations to servers.
  • Constructing the reduction tree.
  • Adjusting the tree shape.
  • Time complexity: O(n log n).
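Putting the steps together, a compact sketch of the tree construction (Python, illustrative; it reuses the hypothetical assign_children helper above and returns the tree as a parent array). Servers with more assigned reduction operations are wired closer to the root; the O(n log n) cost comes from the sort and the heap.

```python
def build_reduction_tree(queue_lengths):
    """LAT sketch: load-aware assignment, then tree construction.

    Returns parent[i] for each server (the root has parent -1).
    """
    counts = assign_children(queue_lengths)
    # Servers with more children become the root and upper levels.
    order = sorted(range(len(counts)), key=lambda i: -counts[i])
    parent = [-1] * len(counts)
    slots = []                  # (server, remaining child slots), FIFO
    for i in order:
        if slots:               # attach under the earliest open server
            s, left = slots[0]
            parent[i] = s
            if left == 1:
                slots.pop(0)
            else:
                slots[0] = (s, left - 1)
        if counts[i] > 0:
            slots.append((i, counts[i]))
    # Final LAT adjustment (not shown): if similar loads make this a
    # chain, rewrite chains into binomial trees so depth <= log N.
    return parent
```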

23
Request Scheduling in a Server
  • Problem: Blocking threads while waiting for data
    from children would reduce throughput.
  • Solution: Event-driven scheduling (skeleton below).
[Figure: per-request state machine: a received request initiates local processing; local processing done at a leaf node sends data to the parent; data received from a child triggers a reduction; once all children's data is aggregated, a timeout fires, or local processing finishes at a non-leaf node, data is sent to the parent]
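A minimal skeleton of that state machine (Python, illustrative; the event names and fields are assumptions, not Neptune's actual interface). No thread ever blocks on a child: each event updates per-request state, and the result ships to the parent once local processing is done and every child has reported or timed out.

```python
class RequestState:
    """Per-request state at one node of the reduction tree."""

    def __init__(self, num_children, op_reduce):
        self.pending = num_children    # children not yet heard from
        self.partial = None            # running reduction result
        self.local_done = False
        self.op_reduce = op_reduce

    def on_local_done(self, result):   # local processing finished
        self.local_done = True
        self._merge(result)

    def on_child_data(self, data):     # data received from a child
        self.pending -= 1
        self._merge(data)

    def on_timeout(self):              # deadline: give up on stragglers
        self.pending = 0

    def _merge(self, data):
        self.partial = data if self.partial is None \
            else self.op_reduce(self.partial, data)

    def ready_to_send(self):           # fire "send data to parent"
        return self.local_done and self.pending <= 0
```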
24
Handling Server Failures
  • Failures:
  • Server stopped: no heartbeat packets.
  • Server unresponsive: very long queue.
  • Solutions:
  • Exclude stopped servers from the reduction tree.
  • Use staged timeouts to eagerly prune unresponsive
    servers (see the sketch below).
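The slide does not spell out how the timeouts are staged; one plausible realization (an assumption, for illustration only) is to give deeper nodes earlier deadlines, so an unresponsive subtree is pruned before its parent itself must answer:

```python
def staged_deadline(global_deadline, depth, stagger=0.05):
    """Deadline (seconds) for a node at `depth` in the tree (root = 0).

    Each level times out `stagger` seconds before its parent, so a
    parent can still merge and forward a partial result after
    eagerly pruning a slow child.
    """
    return global_deadline - depth * stagger
```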

25
Evaluation Settings
  • A cluster of Linux servers (kernel version 2.4.18):
  • 30 dual-CPU (400 MHz Pentium II) nodes with 512 MB
    memory; 4 quad-CPU (500 MHz Pentium II) nodes with
    1 GB memory.
  • Benchmark I: search engine index server.
  • Dataset: 28 partitions, 1-1.2 GB each.
  • Workload: trace-driven.
  • A one-week trace from Ask Jeeves.
  • Contains only uncached queries.
  • Benchmark II: CPU-spinning microbenchmark.
  • Workload: synthetic.

26
Ease of Use
  • Applications: index server; NCBI's BLAST protein
    sequence matcher; online facial recognizer.
  • Each was first implemented without DAC.
  • A graduate student then modified each to use DAC.

Services   Code Size (lines)   Changed Lines   Effort (days)
Index      2,384               142 (5.9%)      1.5
BLAST      1060K               307 (0.03%)     2
Face       4,306               190 (4.4%)      1
27
Tree Formation Schemes
  • 24 dual-CPU nodes, index server benchmark.

28
Tree Formation Schemes
  • 20 dual-CPU and 4 quad-CPU nodes (heterogeneous).

[Figure: (A) response time (ms) and (B) throughput (req/sec) vs. request rate (10-30 req/sec), Binomial vs. LAT trees, on the heterogeneous configuration]
29
Handling Server Failures
  • LAT with staged timeouts (ST).
  • Event-driven request scheduling (ED).
  • Three versions compared: None, ED-only, ED+ST.

30
Scalability (simulation)
[Figure: simulated scalability from 100 to 500 server partitions: (A) response time (s) and (B) throughput (req/sec) at 60%, 80%, 90%, and 95% demand levels]
31
Summary
  • Programming support:
  • The DAC primitive.
  • Runtime system:
  • LAT tree formation.
  • Event-driven scheduling.
  • Staged timeouts.
  • PPoPP'03.

32
Outline
  • Cluster-based Internet services: background and
    challenges.
  • Programming support for data aggregation
    operations.
  • Integrated resource management and QoS support.
  • Future work.

33
Research Objectives
  • Service-specific resource management objectives:
  • Previous research relies on concrete metrics to
    measure resource management efficiency.
  • Observation: Different services may have different
    objectives.
  • Statement: Resource management objectives should
    not be built into the runtime system.
  • Differentiated service qualities for multiple
    request classes (QoS):
  • Internet traffic is bursty: a 3:1 peak-to-average
    load ratio was reported at Ask Jeeves.
  • Prioritized resource allocation is desirable.

34
Service Yield Function
  • Service yield: the benefit achieved from serving
    a request.
  • A monotonically non-increasing function of the
    response time.
  • Service yield function Y(r): specified by the
    service provider.
  • Optimization goal: maximize the aggregate yield
    over all served requests, Σ_i Y_i(r_i).

35
Sample Service Yield Functions
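The figure itself is not preserved in this transcript. For illustration only (the shapes and constants below are assumptions, not the deck's actual curves), here are two yield-function shapes the model admits, both monotonically non-increasing in the response time r:

```python
def step_yield(full_yield, deadline):
    """Hard deadline: full yield if served by the deadline, else zero."""
    return lambda r: full_yield if r <= deadline else 0.0

def linear_decay_yield(full_yield, soft, hard):
    """Full yield up to `soft` seconds, linear decay to zero at `hard`."""
    def y(r):
        if r <= soft:
            return full_yield
        if r >= hard:
            return 0.0
        return full_yield * (hard - r) / (hard - soft)
    return y

# A hypothetical latency-tolerant class vs. a deadline-bound one.
gold = linear_decay_yield(full_yield=4.0, soft=1.0, hard=6.0)
bronze = step_yield(full_yield=1.0, deadline=6.0)
```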
36
Service Differentiation
  • Service class: a category of service requests
    that enjoy the same level of QoS support.
  • Client identity (paid vs. unpaid membership).
  • Service type (order placement vs. catalog
    browsing).
  • Provisions:
  • Differentiated service yield functions.
  • Proportional resource allocation guarantees.

37
Runtime System Request Scheduling
  • Functionally homogeneous sub-cluster.
  • Example: replicas of index server partition 1.
  • Cluster level:
  • Which server should handle a request?
  • Server level:
  • When should a request be served?

[Figure: service clients issue requests through cluster-level request dispatch to one server in a sub-cluster of the service cluster]
38
Cluster Level: Partitioning or Not?
  • Periodic server partitioning [Infocom'01]:
  • Partition the sub-cluster among service classes.
  • Periodically adjust server pool sizes based on the
    request demand of the service classes.
  • Problems:
  • Decisions are made by a centralized dispatcher.
  • Periodic adjustment means slow response to demand
    changes.
  • This work: random polling (sketched below).
  • Service differentiation at the server level.
  • Functional symmetry and decentralization.
  • Better handling of demand spikes and failures.
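A sketch of random-polling dispatch (Python, illustrative; the poll size is an assumption): each client independently polls a few randomly chosen servers in the sub-cluster and routes the request to the least loaded one, so no centralized dispatcher or partitioning state is needed.

```python
import random

def random_polling_dispatch(servers, get_queue_length, poll_size=3):
    """Pick a server by polling a small random subset for its load.

    Fully decentralized: any client can run this on its own, which
    is what lets the scheme react quickly to demand spikes and
    route around failed servers.
    """
    polled = random.sample(servers, min(poll_size, len(servers)))
    return min(polled, key=get_queue_length)
```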

39
Server Level Scheduling
  • Drop requests that are likely to generate zero
    yield.
  • If there is any under-allocated service class,
    schedule a request from that class.
  • Otherwise, pick the request with the best chance
    of maximizing the aggregate yield.
  • System underloaded?
  • Observation: Yield is lost due to missed deadlines.
  • Idea: Schedule requests with tight deadlines first.
  • Solution: YID (yield-inflated deadline) scheduling.
  • System overloaded?
  • Observation: Yield is lost due to lack of resources.
  • Idea: Schedule requests with low resource
    consumption first.
  • Solution: YIC (yield-inflated cost) scheduling.
A sketch of the whole policy follows the figure.

[Figure: per-class request queues (class 1 through class N) feed a request scheduler for service differentiation, backed by a thread pool]
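A condensed sketch of the policy above (Python, illustrative; the request fields, the yield-weighting formulas, and the load test are all assumptions rather than the system's actual definitions of YID and YIC):

```python
def pick_next(queues, now, underloaded, allocation_deficit):
    """Pick the next request across per-class FIFO queues.

    Assumed request fields: cls, arrival, deadline, cost (estimated
    resource use), full_yield, and yield_fn (its yield function).
    allocation_deficit[cls] > 0 means class cls is under its
    guaranteed resource share.
    """
    # 1. Drop requests that can no longer produce any yield.
    for q in queues.values():
        q[:] = [r for r in q if r.yield_fn(now - r.arrival) > 0]

    # 2. Repay under-allocated classes first (proportional guarantee).
    for cls, deficit in allocation_deficit.items():
        if deficit > 0 and queues[cls]:
            return queues[cls].pop(0)

    # 3. Otherwise, chase aggregate yield.
    candidates = [r for q in queues.values() for r in q]
    if not candidates:
        return None
    if underloaded:
        # YID-style: deadline weighted by yield; urgent, valuable first.
        best = min(candidates, key=lambda r: r.deadline / r.full_yield)
    else:
        # YIC-style: cost weighted by yield; cheap, valuable first.
        best = min(candidates, key=lambda r: r.cost / r.full_yield)
    queues[best.cls].remove(best)
    return best
```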
40
Evaluation Settings
  • A cluster of 24 dual-CPU Linux servers.
  • Benchmark: differentiated index search service.
  • Three service classes:
  • Gold, Silver, and Bronze memberships.
  • Request composition: 10% : 30% : 60%.
  • Service yield ratio: 4 : 2 : 1.
  • A 20% resource guarantee for the Bronze class.
  • Workload: trace-driven.
  • A one-week trace from Ask Jeeves.
  • Contains only uncached queries.

[Figure: the service yield functions of the Gold, Silver, and Bronze classes over response times of 0-6 seconds]
41
Service Differentiation During a Demand Spike and
Server Failure
  • Demand spike for the Silver class between time 50
    and 150.
  • One server failure between time 200 and 250.

[Figure: per-class performance over elapsed time (sec) during the demand spike and server failure]
42
Service Differentiation During a Demand Spike and
Server Failure
  • Periodic server partitioning.

[Figure: the same experiment under periodic server partitioning, over elapsed time (sec)]
43
Summary
  • The service yield function:
  • As a mechanism to express resource management
    objectives.
  • As a means to differentiate service qualities.
  • Two-level decentralized request scheduling:
  • Cluster level: random polling.
  • Server level: adaptive scheduling.
  • OSDI'02.

44
Related Work
  • Programming support for cluster-based Internet
    services: TACC [Fox97], MultiSpace [Gribble99],
    Ninja [vonBehren02].
  • Event-driven request processing: Flash [Pai99],
    SEDA [Welsh01].
  • Tree-based reduction in MPI: [Gropp96], MagPIe
    [Kielmann99], TMPI [Tang01].
  • Data aggregation: aggregation queries for
    databases [Saito99, Madden02]; scientific
    applications [Chang01].
  • QoS for computer networks: Weighted Fair Queuing
    [Demers90, Parekh93], Leaky Bucket, LIRA
    [Stoica98], [Dovrolis99].
  • QoS and real-time scheduling at the single-host
    level: [Huang89, Haritsa93, Waldspurger94,
    Mogul96], LRP [Druschel96], [Jones97], Eclipse
    [Bruno98], Resource Containers [Banga99],
    [Steere99].
  • QoS and resource management for Web servers:
    [Almeida98, Pandey98, Abdelzaher99, Bhatti99,
    Chandra00, Li00, Voigt01].
  • QoS and load balancing for Internet services:
    LARD [Pai98], Cluster Reserves [Aron00],
    [Sullivan00], DDSD [Zhu01], [Chase01, Goswami93,
    Mitzenmacher97, Zhou87].

45
Outline
  • Cluster-based Internet services: background and
    challenges.
  • Programming support for data aggregation
    operations.
  • Integrated resource management and QoS support.
  • Future work.

46
Self-organizing Storage Cluster
  • Challenge: Distributed storage resources are hard
    to manage and utilize.
  • Fragmented storage space.
  • Frequent disk failures.
  • Objective: Let the cluster manage storage
    resources by itself.
  • Storage virtualization.
  • Incremental scalability.
  • Automatic redundancy maintenance.

47
Dynamic Service Composition
  • Challenge: Internet services are evolving
    rapidly.
  • More functionality requires more service
    components.
  • Existing service components should be reusable.
  • Objective: Programming and runtime support for
    dynamic service composition.
  • Easy-to-use composition mechanisms.
  • On-the-fly service reconfiguration.

48
Q & A
  • Acknowledgements:
  • Tao Yang, Lingkun Chu (UCSB).
  • Kai Shen (University of Rochester).
  • Project web site: http://www.cs.ucsb.edu/projects/neptune/
  • Personal home page: http://www.cs.ucsb.edu/~htang/

49
Event-driven Scheduling
[Figure: (A) response time (ms) and (B) throughput (req/sec) vs. request rate (5-25 req/sec) on 24 partitions, with and without event-driven scheduling]
50
Evaluation Workload Trace
Traces              Total requests   Non-cached requests   Mean arrival interval   Mean service time
Gold (Tue peak)     507,202          154,466               163.1 ms                247.9 ms
Silver (Wed peak)   512,227          151,827               166.0 ms                249.7 ms
Bronze (Thu peak)   517,116          156,214               161.3 ms                245.1 ms
51
Comparing MPI_Reduce and DAC

                                     MPI_Reduce        DAC
Primitive semantics
  Tolerating failures                All or nothing    Allows partial results
  Deadline requirement               No                Yes
  Programming model                  Procedure-based   Request-driven
Runtime system design
  Tree shape                         Static            Dynamic
  Server assignment                  Static            Dynamic