A Model for Usage Policy-based Resource Allocation in Grids - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: A Model for Usage Policy-based Resource Allocation in Grids


1
A Model for Usage Policy-based Resource Allocation in Grids
  • Catalin L. Dumitrescu, The University of Chicago
  • Michael Wilde, Argonne National Laboratory / University of Chicago
  • Ian Foster, Argonne National Laboratory / The University of Chicago
2
Introduction
  • Grid resource sharing is challenging when
    multiple institutions are involved
  • Participants might wish to delegate resource
    utilization under various constraints
  • How are such usage SLAs expressed, discovered,
    interpreted and enforced?
  • We propose
  • An architecture and a recursive policy model
  • Roles and functions for controlled
    resource sharing in Grids

3
Talk Outline / Part I
  • Part I
  • Introduction
  • Problem Statement
  • Syntax and Semantics
  • Usage Policy Enforcement Model
  • Site Usage Policy Enforcement
  • VO Usage Policy Enforcement
  • Verifying Monitoring Infrastructure
  • Part II
  • Architecture Simulations
  • Usage Policy Experimental Results
  • S-PEP Usage Policy Enforcement
  • Local RM Usage Policy Enforcement
  • Quantitative Comparison
  • Part III
  • Grid3 Usage Policy Enforcement Status
  • Conclusions

4
Main Questions
  • How are usage policies enforced at the resource
    and VO levels?
  • What strategies must a VO deploy to ensure usage
    policy enforcement?
  • How are usage policies distributed to enforcement
    points?
  • How are usage policies made available to VO job
    and data planners?

5
Problem Overview
  • Main players
  • Resource providers (a site, a VO, etc.)
  • Resource consumers (e.g., a VO user)
  • Each provider-consumer relationship is governed
    by an appropriate SLA

6
Site SLA Example
  • Assumption
  • Provider P has agreed to make a resource R
    available to consumer C for a period of one month
  • How is this agreement to be interpreted?
  • R might be dedicated to C if needed
  • P might make R available to others when C is not
    using R
  • P might commit to preempt other users as soon as
    C requests R
  • Over-usage
  • If C is allowed to acquire more than R, this
    may or may not result in C's allocation being
    reduced later in the month

7
VO Level and beyond
  • A VO can act as a broker for a set of
    resources
  • These brokering functions can be implemented in
    different ways
  • The VO partitions resources and establishes SLAs
    directly between its consumers and its providers
  • If providers support the necessary mechanisms,
    the VO hands its consumers digital tickets

8
Syntax and Semantics
  • Resource allocations (a minimal encoding is
    sketched after this list)
  • < resource-type, provider, consumer,
    epoch-allocation, burst-allocation >
  • where
  • resource-type = CPU | NET | STORAGE
  • provider = site-name | vo-name
  • consumer = vo-name | (vo-name, group-name)
  • epoch-allocation = (interval, percentage)
  • burst-allocation = (interval, percentage)
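A minimal sketch of this allocation tuple as a data structure; the field names follow the grammar above, the Python representation itself is only illustrative:

from dataclasses import dataclass
from typing import Tuple

# Illustrative encoding of the allocation tuple above; the grammar comes from
# the slide, the Python class itself is only a sketch.
@dataclass
class Allocation:
    resource_type: str                     # "CPU" | "NET" | "STORAGE"
    provider: str                          # site-name or vo-name
    consumer: str                          # vo-name or "vo-name/group-name"
    epoch_allocation: Tuple[int, float]    # (interval, percentage)
    burst_allocation: Tuple[int, float]    # (interval, percentage)

# Example: VO0 gets 20% of Site1's CPUs per 3600 s epoch, 60% in 5 s bursts.
example = Allocation("CPU", "Site1", "VO0", (3600, 20), (5, 60))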

9
UP Enforcement Components
  • Policy enforcement points (PEPs)
  • Responsible for enforcing usage policies
  • Several types (S-PEP, V-PEP, etc)
  • Site policy enforcement points (S-PEPs)
  • Reside at all sites and enforce site-specific
    policies
  • VO policy enforcement points (V-PEPs)
  • Associated with VOs
  • Operate in a similar way to S-PEPs
  • Make decisions on a per-job basis to enforce
    policy regarding VO specifications

10
UP Enforcement Prototype
11
Stand-alone S-PEPs (Sol 1)
  • Does not require a usage policy-cognizant cluster
    resource manager (RM)
  • Works with any primitive batch system that
  • Provides accurate usage and state information
    about all scheduled jobs
  • Has job start/stop/hold/remove capabilities
  • Can increase or decrease the priority of running
    jobs (these capabilities are summarized in the
    interface sketch after this list)
  • Interacts with the RM
  • Continuously checks the status of jobs in all
    queues
  • Invokes management operations on the cluster
    resource manager when required to enforce usage
    policies
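As a sketch, the batch-system capabilities listed above can be summarized as the following abstract interface; the method names are hypothetical and do not belong to any real resource manager's API:

from abc import ABC, abstractmethod

# Hypothetical interface naming the minimal capabilities an S-PEP needs from
# the underlying batch system; not the API of any real resource manager.
class ClusterRM(ABC):
    @abstractmethod
    def job_info(self):
        """Accurate usage and state information for all scheduled jobs."""

    @abstractmethod
    def start(self, job_id): ...

    @abstractmethod
    def stop(self, job_id): ...

    @abstractmethod
    def hold(self, job_id): ...

    @abstractmethod
    def remove(self, job_id): ...

    @abstractmethod
    def adjust_priority(self, job_id, delta):
        """Increase or decrease the priority of a running job."""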

12
S-PEP Processing Logic
1.  foreach VOi with EPi
2.    Case 1: fill BPi
3.      if Σ(BAj) = 0 and BAi < BPi and Qi has jobs then
4.        release a job of VOi from some Qi
5.    Case 2: resources available and BAi < BPi
6.      else if Σ(BAk) < TOTAL and BAi < BPi and Qi has jobs then
7.        release a job of VOi from some Qi
8.    Case 3: resource contention, fill EPi
9.      else if Σ(BAk) = TOTAL and BAi < EPi and Qi has jobs then
10.       if there exists j with BAj > EPj then
11.         suspend an over-quota job from Qj
12.         release a job of VOi from some Qi
13. foreach VOi with EPi
14.   if EAi > EPi then
15.     suspend jobs of VOi from all Qi

where
  EPi = epoch allocation policy for VOi
  BPi = burst allocation policy for VOi
  Qi = set of queues with jobs from VOi
  BAi = burst resource allocation for VOi
  EAi = epoch resource allocation for VOi
  TOTAL = possible allocation on the site
  over-quota job = a job of a VOj with BAj > EPj
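The same logic, transcribed as a Python sketch; the vo and rm objects and their helper methods are assumed for illustration, and only the control flow follows the pseudocode above:

# Sketch of one S-PEP pass; vo/rm fields and helpers are assumed, the
# branching mirrors the pseudocode above.
def spep_step(vos, rm, total):
    busy = sum(vo.burst_usage for vo in vos)             # Σ(BAk) over all VOs
    for vo in vos:                                       # foreach VOi with EPi
        if not rm.has_queued_jobs(vo):                   # Qi has jobs?
            continue
        if busy == 0 and vo.burst_usage < vo.burst_policy:
            rm.release_one(vo)                           # Case 1: fill BPi
        elif busy < total and vo.burst_usage < vo.burst_policy:
            rm.release_one(vo)                           # Case 2: spare capacity
        elif busy == total and vo.burst_usage < vo.epoch_policy:
            # Case 3: contention; preempt an over-quota VO, then release.
            over = next((v for v in vos if v.burst_usage > v.epoch_policy), None)
            if over is not None:
                rm.suspend_one(over)
                rm.release_one(vo)
    for vo in vos:                                       # epoch enforcement
        if vo.epoch_usage > vo.epoch_policy:             # EAi > EPi
            rm.suspend_all(vo)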
13
S-PEPs over RMs (Sol 2)
  • Developed and deployed successfully in the Grid3
    environment
  • Decoupled S-PEP functionalities
  • A standalone site policy observation point (S-POP)
  • The cluster scheduler for resource allocation
    control
  • We assume the cluster scheduler is able to
    enforce the desired usage policies
  • Examples: Condor, Portable Batch System (PBS), and
    Load Sharing Facility (LSF), all widely used on
    Grid3

14
V-PEPs
  • Operate at the submission host queues
  • Execute various logic to determine whether new
    jobs should be submitted to sites according to
    the VO usage policies
  • Provide answers to two questions
  • What jobs should be scheduled next?
  • When should job j start?
  • A third question that is important here, "Where
    should job j run?", is also addressed

15
V-PEP Logic
1.  foreach Gi with EPi, BPi, BEi
2.    Case 1: fill BPi + BEi
3.      if Σ(BAj) = 0 and BAi < BPi and Qi has jobs then
4.        schedule a job from some Qi to the least loaded site
5.    Case 2: resources available and BAi < BPi
6.      else if Σ(BAk) < TOTAL and BAi < BPi and Qi has jobs then
7.        schedule a job from some Qi to the least loaded site
8.    Case 3: resource contention, fill EPi
9.      else if Σ(BAk) = TOTAL and BAi < EPi and Qi exists then
10.       if there exists j with BAj > EPj then
11.         stop scheduling jobs for VOj
12.   Need to fill with extra jobs?
13.     if BAi < EPi + BEi then
14.       schedule a job from some Qi to the least loaded site
15.     if EAi < EPi and Qi has jobs then
16.       schedule additional backfill jobs
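A corresponding Python sketch of one V-PEP pass; the group and site objects and their helpers are assumed for illustration, and only the control flow follows the pseudocode above:

# Sketch of one V-PEP pass; group/site fields and helpers are assumed, the
# branching mirrors the pseudocode above.
def vpep_step(groups, sites, total):
    busy = sum(g.burst_usage for g in groups)
    target = min(sites, key=lambda s: s.load)            # least loaded site
    for g in groups:                                     # foreach Gi
        if not g.queued_jobs:
            continue
        if busy == 0 and g.burst_usage < g.burst_policy:
            target.schedule(g.queued_jobs.pop(0))        # Case 1: fill BPi + BEi
        elif busy < total and g.burst_usage < g.burst_policy:
            target.schedule(g.queued_jobs.pop(0))        # Case 2: spare capacity
        elif busy == total and g.burst_usage < g.epoch_policy:
            for other in groups:                         # Case 3: contention
                if other.burst_usage > other.epoch_policy:
                    other.paused = True                  # stop scheduling for it
        # Backfill: fill up to EPi + BEi with extra jobs when possible.
        if g.queued_jobs and g.burst_usage < g.epoch_policy + g.backfill_policy:
            target.schedule(g.queued_jobs.pop(0))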
16
UP Enforcement Verification
  • Focus on measuring
  • The performance achieved by each consumer
  • Resource utilization, in the providers' interest
  • Composed of
  • A host sensor collector
  • An aggregation meta-daemon (its core step is
    sketched after this list)
  • A cluster/host web-interface (for human
    consumption)
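A minimal sketch of the aggregation step, assuming a simple (host, vo, cpu_seconds) report format that is not taken from the slides:

from collections import defaultdict

# Sketch of the aggregation meta-daemon's core step: fold per-host sensor
# reports into per-consumer utilization.  The report format is assumed.
def aggregate(reports, interval_s, total_cpus):
    used = defaultdict(float)
    for host, vo, cpu_seconds in reports:
        used[vo] += cpu_seconds
    # Fraction of the whole pool each VO consumed during the interval.
    return {vo: secs / (total_cpus * interval_s) for vo, secs in used.items()}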

17
Web-Interface Example
18
Talk Outline / Part II
  • Part I
  • Introduction
  • Problem Statement
  • Syntax and Semantics
  • Usage Policy Enforcement Model
  • Site Usage Policy Enforcement
  • VO Usage Policy Enforcement
  • Verifying Monitoring Infrastructure
  • Part II
  • Architecture Simulations
  • Usage Policy Experimental Results
  • S-PEP Usage Policy Enforcement
  • Local RM Usage Policy Enforcement
  • Quantitative Comparison
  • Part III
  • Grid3 Usage Policy Enforcement Status
  • Conclusions

19
Architecture Simulations
  • Workloads
  • Overlaid workloads for two VOs / two groups (440
    jobs)
  • One-hour simulation period
  • Usage policy (allocating grid resources from 2
    sites to 2 VOs)
  • (1) < CPU, Site1, VO0, (3600,20), (5,60) >
  • (2) < CPU, Site2, VO0, (3600,20), (5,60) >
  • (3) < CPU, Site1, VO1, (3600,80), (5,90) >
  • (4) < CPU, Site2, VO1, (3600,80), (5,90) >
  • Jobs / Job States
  • Jobs arrive, are executed, and leave the system
    (Poisson arrivals; a generation sketch follows
    this list)
  • Without any precedence constraints
  • Four states: submitted by a user to a submission
    host; submitted by a submission host to a site,
    but queued or held; running at a site; and
    completed
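A minimal sketch of how such a Poisson workload can be generated; the rate and seed below are illustrative, not the paper's parameters:

import random

# Poisson job arrivals over a one-hour simulation period: exponential
# inter-arrival times yield a Poisson arrival process.
def poisson_arrivals(rate_per_s, horizon_s=3600, seed=0):
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            return arrivals
        arrivals.append(t)

# Roughly 440 jobs spread over one hour, as in the simulated workload.
arrival_times = poisson_arrivals(440 / 3600)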

20
Simulation Results (Allocation of Grid Resources to VOs)
Figure 2: VO0 job execution on two sites
Figure 3: Usage policy for VO0 on two sites
Figure 4: VO1 job execution on two grid sites
Figure 5: Usage policy for VO1 on two sites
21
Simulation Results (Allocation of VO Resources to Groups)
Figure 6: Workload for VO0 Group0
Figure 7: Policy enforcement for VO0 Group0
Figure 8: Workload for VO0 Group1
Figure 9: Policy enforcement for VO0 Group1
22
Simulation Results (Overall Grid Utilization and Generality)
Figure 10: Overall grid workload
Figure 11: Overall grid CPU utilization
Figure 12: Policy for VO0 on 5 sites
Figure 13: Policy for VO0 Group0 on 5 sites
23
Architecture Experiments
  • Comparison between the two S-PEP solutions
    presented before, as well as with the simulation
    results
  • Problem
  • How well are resource managers able to enforce
    the extensible usage policies?
  • Settings (similar to the simulation studies)
  • Identical workloads
  • Usage policies (as before)
  • < CPU, Site0, VO0, (3000s,20), (30s,60) >
  • < CPU, Site0, VO1, (3000s,80), (30s,90) >
  • Jobs were submitted via the Globus Toolkit 2.0

24
Experimental Results (Allocation of Grid Resources to VOs: Comparisons)
Figure 14: S-PEP with Condor (VO0)
Figure 16: S-PEP with Maui/OpenPBS (VO0)
Figure 18: Condor as S-PEP (VO0)
Figure 20: OpenPBS/Maui as S-PEP (VO0)
25
Quantitative Comparisons
  • Burst Usage Policy Violation
  • BUPV = Σ(BETi) / (cpus × Δt)
  • Epoch Usage Policy Violation
  • EUPV = Σ(EETi) / (cpus × Δt)
  • Aggregated Resource Utilization
  • ARU = Σ(ETi) / (cpus × Δt)
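A sketch of how these metrics can be computed; interpreting BETi, EETi, and ETi as burst over-quota time, epoch over-quota time, and total execution time is an assumption based on the symbol names:

# Sketch computing the three metrics above; cpus and delta_t are the pool
# size and interval length, the input lists hold per-measurement CPU times.
def usage_metrics(bet, eet, et, cpus, delta_t):
    denom = cpus * delta_t
    bupv = sum(bet) / denom    # Burst Usage Policy Violation
    eupv = sum(eet) / denom    # Epoch Usage Policy Violation
    aru = sum(et) / denom      # Aggregated Resource Utilization
    return bupv, eupv, aru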

26
Comparison Results
  • S-PEP achieves better enforcement of epoch usage
    policies than do local resource managers
  • For burst usage policies, there is no clear
    winner
  • PBS is slightly better at enforcing burst usage
    policies than is Condor

Policy BUPV EUPV ARU
S-PEP/Condor 0.12 0.01 70.2
S-PEP/PBS 0.10 0.08 88.5
RM/Condor 0.16 0.16 73.2
RM/PBS 0.03 0.17 67.6
27
Talk Outline / Part III
  • Part I
  • Introduction
  • Problem Statement
  • Syntax and Semantics
  • Usage Policy Enforcement Model
  • Site Usage Policy Enforcement
  • VO Usage Policy Enforcement
  • Verifying Monitoring Infrastructure
  • Part II
  • Architecture Simulations
  • Usage Policy Experimental Results
  • S-PEP Usage Policy Enforcement
  • Local RM Usage Policy Enforcement
  • Quantitative Comparison
  • Part III
  • Grid3 Usage Policy Enforcement Status
  • Conclusions

28
Grid3 UP Status
  • U. Chicago (Grid3): site usage policies and
    actual resource utilization over a period of two
    weeks for two VOs (USATLAS and Grid-Exerciser)
  • Usage policies (based on Condor's extensible fair
    share)
  • 35 for USATLAS and
  • 0.1 for GRIDEX
  • USATLAS's jobs are immediately executed
  • As soon as the USATLAS load decreases (X = 800 or
    X = 1030), Grid-Exerciser's jobs take over and get
    all the resources they request
  • When the USATLAS jobs start again (X = 950),
    Grid-Exerciser's jobs are throttled back

29
Grid3 Advantages
  • Aggregated response time (ART)
  • ART = Σ(i=1..N) RTi / N
  • Aggregated job completion (AJC)
  • AJC = jobs completed / jobs submitted
  • The use of policy information leads to
  • A better job completion rate
  • A higher response time

Policy ART ARU AJC
RA /NP 97.01 0.27 0.54
RR /NP 114.99 0.27 0.54
RA /SC 126.43 0.35 0.69
RR /SC 130.70 0.33 0.65
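Both metrics reduce to a few lines; the sketch below assumes per-job response times and raw job counts as inputs:

# Sketch of the two Grid3 metrics defined above.
def art(response_times):
    """Aggregated response time: mean of the N per-job response times."""
    return sum(response_times) / len(response_times)

def ajc(jobs_completed, jobs_submitted):
    """Aggregated job completion: fraction of submitted jobs that completed."""
    return jobs_completed / jobs_submitted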
30
Work in Progress GRUBER
[Architecture diagram] Components: users and higher-level automated VO Grid
schedulers (match-makers); a Site Selector (select site where
recommendation-honored and condition); a Start-time Predictor (adds
start-time predictions to the Choice-Table); a Site Recommender (can be
multi-level: Grid, VO, and Group), with SiteRec(VO, ResRequest) =>
Choice-Table; a usage policy database (UP DB); S-POP data; an automated
agent; and job information.
31
Related Work
  • SPHINX: a centralized resource broker
  • CREMONA: a framework for SLA specification and
    automated negotiation
  • GARA
  • MAUI

32
Conclusions
  • Achievements
  • A model for usage policy-based resource
    scheduling that can be applied successfully in
    real-world scenarios
  • A comparison of two cluster-wide enforcement
    prototypes
  • The design of an infrastructure for usage
    policy-based grid scheduling
  • Open problems
  • Resource over-provisioning, e.g., a policy that
    allocates 40% of the CPUs to VO0 and 80% to VO1
  • Focusing on resources other than processor time,
    such as disk and network

33
Addressed Questions
  • How are usage policies enforced at the resource
    and VO levels?
  • What strategies must a VO deploy to ensure usage
    policy enforcement?
  • How are usage policies distributed to enforcement
    points?
  • How are usage policies made available to VO job
    and data planners?

34
Thanks
  • Questions?