A Model for Usage Policy-based Resource Allocation in Grids - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: A Model for Usage Policy-based Resource Allocation in Grids


1
A Model for Usage Policy-based Resource Allocation in Grids
  • Catalin L. Dumitrescu, The University of Chicago
  • Michael Wilde, Argonne National Laboratory / University of Chicago
  • Ian Foster, Argonne National Laboratory / The University of Chicago
2
Introduction
  • Grid resource sharing is challenging when
    multiple institutions are involved
  • Participants might wish to delegate resource
    utilization under various constraints
  • How are such usage SLAs expressed, discovered,
    interpreted and enforced?
  • We propose
  • An architecture and a recursive policy model
  • Roles and functions for controlled
    resource sharing in Grids

3
Talk Outline / Part I
  • Part I
  • Introduction
  • Problem Statement
  • Syntax and Semantics
  • Usage Policy Enforcement Model
  • Site Usage Policy Enforcement
  • VO Usage Policy Enforcement
  • Verifying Monitoring Infrastructure
  • Part II
  • Architecture Simulations
  • Usage Policy Experimental Results
  • S-PEP Usage Policy Enforcement
  • Local RM Usage Policy Enforcement
  • Quantitative Comparison
  • Part III
  • Grid3 Usage Policy Enforcement Status
  • Conclusions

4
Main Questions
  • How are usage policies enforced at the resource
    and VO levels?
  • What strategies must a VO deploy to ensure usage
    policy enforcement?
  • How are usage policies distributed to enforcement
    points?
  • How are usage policies made available to VO job
    and data planners?

5
Problem Overview
  • Main players
  • Resource providers (a site, a VO, etc.)
  • Resource consumers (e.g., a VO user)
  • Each provider-consumer relationship is governed
    by an appropriate SLA

6
Site SLA Example
  • Assumption
  • Provider P has agreed to make a resource R
    available to consumer C for a period of one month
  • How is this agreement to be interpreted?
  • R might be dedicated to C if needed
  • P might make R available to others when C is not
    using R
  • P might commit to preempt other users as soon as
    C requests R
  • Over-usage
  • If C is allowed to acquire more than R, this
    may or may not result in C's allocation being
    reduced later in the month

7
VO Level and beyond
  • A VO can act as a broker for a set of
    resources
  • These brokering functions can be implemented in
    different ways
  • The VO partitions resources and establishes SLAs
    directly between its consumers and its providers
  • If providers support the necessary mechanisms,
    the VO hands its consumers digital tickets

8
Syntax and Semantics
  • Resource allocations (a minimal encoding is
    sketched after this list)
  • < resource-type, provider, consumer,
    epoch-allocation, burst-allocation >
  • where
  • resource-type = CPU | NET | STORAGE
  • provider = site-name | vo-name
  • consumer = vo-name | (vo-name, group-name)
  • epoch-allocation = (interval, percentage)
  • burst-allocation = (interval, percentage)
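A minimal sketch of this allocation tuple as a data structure; the field names follow the grammar above, the Python representation itself is only illustrative:

from dataclasses import dataclass
from typing import Tuple

# Illustrative encoding of the allocation tuple above; the grammar comes from
# the slide, the Python class itself is only a sketch.
@dataclass
class Allocation:
    resource_type: str                     # "CPU" | "NET" | "STORAGE"
    provider: str                          # site-name or vo-name
    consumer: str                          # vo-name or "vo-name/group-name"
    epoch_allocation: Tuple[int, float]    # (interval, percentage)
    burst_allocation: Tuple[int, float]    # (interval, percentage)

# Example: VO0 gets 20% of Site1's CPUs per 3600 s epoch, 60% in 5 s bursts.
example = Allocation("CPU", "Site1", "VO0", (3600, 20), (5, 60))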

9
UP Enforcement Components
  • Policy enforcement points (PEPs)
  • Responsible for enforcing usage policies
  • Several types (S-PEP, V-PEP, etc)
  • Site policy enforcement points (S-PEPs)
  • Reside at all sites and enforce site-specific
    policies
  • VO policy enforcement points (V-PEPs)
  • Associated with VOs
  • Operate in a similar way to S-PEPs
  • Make decisions on a per-job basis to enforce
    policy regarding VO specifications

10
UP Enforcement Prototype
11
Stand-alone S-PEPs (Sol 1)
  • Does not require a usage policy-cognizant cluster
    resource manager (RM)
  • Works with any primitive batch system that
  • Provides accurate usage and state information
    about all scheduled jobs
  • Has job start/stop/hold/remove capabilities
  • Can increase or decrease the priority of running
    jobs (these capabilities are summarized in the
    interface sketch after this list)
  • Interacts with the RM
  • Continuously checks the status of jobs in all
    queues
  • Invokes management operations on the cluster
    resource manager when required to enforce usage
    policies
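As a sketch, the batch-system capabilities listed above can be summarized as the following abstract interface; the method names are hypothetical and do not belong to any real resource manager's API:

from abc import ABC, abstractmethod

# Hypothetical interface naming the minimal capabilities an S-PEP needs from
# the underlying batch system; not the API of any real resource manager.
class ClusterRM(ABC):
    @abstractmethod
    def job_info(self):
        """Accurate usage and state information for all scheduled jobs."""

    @abstractmethod
    def start(self, job_id): ...

    @abstractmethod
    def stop(self, job_id): ...

    @abstractmethod
    def hold(self, job_id): ...

    @abstractmethod
    def remove(self, job_id): ...

    @abstractmethod
    def adjust_priority(self, job_id, delta):
        """Increase or decrease the priority of a running job."""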

12
S-PEP Processing Logic
1.  foreach VOi with EPi
2.    Case 1: fill BPi
3.      if Σ(BAj) = 0 and BAi < BPi and Qi has jobs then
4.        release a job of VOi from some Qi
5.    Case 2: resources available and BAi < BPi
6.      else if Σ(BAk) < TOTAL and BAi < BPi and Qi has jobs then
7.        release a job of VOi from some Qi
8.    Case 3: resource contention, fill EPi
9.      else if Σ(BAk) = TOTAL and BAi < EPi and Qi has jobs then
10.       if there exists j with BAj > EPj then
11.         suspend an over-quota job from Qj
12.         release a job of VOi from some Qi
13. foreach VOi with EPi
14.   if EAi > EPi then
15.     suspend jobs of VOi from all Qi

where
  EPi = epoch allocation policy for VOi
  BPi = burst allocation policy for VOi
  Qi = set of queues with jobs from VOi
  BAi = burst resource allocation for VOi
  EAi = epoch resource allocation for VOi
  TOTAL = possible allocation on the site
  over-quota job = a job of a VOj with BAj > EPj
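The same logic, transcribed as a Python sketch; the vo and rm objects and their helper methods are assumed for illustration, and only the control flow follows the pseudocode above:

# Sketch of one S-PEP pass; vo/rm fields and helpers are assumed, the
# branching mirrors the pseudocode above.
def spep_step(vos, rm, total):
    busy = sum(vo.burst_usage for vo in vos)             # Σ(BAk) over all VOs
    for vo in vos:                                       # foreach VOi with EPi
        if not rm.has_queued_jobs(vo):                   # Qi has jobs?
            continue
        if busy == 0 and vo.burst_usage < vo.burst_policy:
            rm.release_one(vo)                           # Case 1: fill BPi
        elif busy < total and vo.burst_usage < vo.burst_policy:
            rm.release_one(vo)                           # Case 2: spare capacity
        elif busy == total and vo.burst_usage < vo.epoch_policy:
            # Case 3: contention; preempt an over-quota VO, then release.
            over = next((v for v in vos if v.burst_usage > v.epoch_policy), None)
            if over is not None:
                rm.suspend_one(over)
                rm.release_one(vo)
    for vo in vos:                                       # epoch enforcement
        if vo.epoch_usage > vo.epoch_policy:             # EAi > EPi
            rm.suspend_all(vo)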
13
S-PEPs over RMs (Sol 2)
  • Developed and deployed successfully in the Grid3
    environment
  • Decoupled S-PEP functionalities
  • A standalone site policy observation point (S-POP)
  • The cluster scheduler for resource allocation
    control
  • We assume the cluster scheduler is able to
    enforce the desired usage policies
  • Examples: Condor, Portable Batch System (PBS), and
    Load Sharing Facility (LSF), all widely used on
    Grid3

14
V-PEPs
  • Operate at the submission host queues
  • Execute various logic to determine whether new
    jobs should be submitted to sites according to
    the VO usage policies
  • Provide answers to two questions
  • What jobs should be scheduled next?
  • When should job j start?
  • A third question that is important here, "Where
    should job j run?", is also addressed

15
V-PEP Logic
1.  foreach Gi with EPi, BPi, BEi
2.    Case 1: fill BPi + BEi
3.      if Σ(BAj) = 0 and BAi < BPi and Qi has jobs then
4.        schedule a job from some Qi to the least loaded site
5.    Case 2: resources available and BAi < BPi
6.      else if Σ(BAk) < TOTAL and BAi < BPi and Qi has jobs then
7.        schedule a job from some Qi to the least loaded site
8.    Case 3: resource contention, fill EPi
9.      else if Σ(BAk) = TOTAL and BAi < EPi and Qi exists then
10.       if there exists j with BAj > EPj then
11.         stop scheduling jobs for VOj
12.   Need to fill with extra jobs?
13.     if BAi < EPi + BEi then
14.       schedule a job from some Qi to the least loaded site
15.     if EAi < EPi and Qi has jobs then
16.       schedule additional backfill jobs
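A corresponding Python sketch of one V-PEP pass; the group and site objects and their helpers are assumed for illustration, and only the control flow follows the pseudocode above:

# Sketch of one V-PEP pass; group/site fields and helpers are assumed, the
# branching mirrors the pseudocode above.
def vpep_step(groups, sites, total):
    busy = sum(g.burst_usage for g in groups)
    target = min(sites, key=lambda s: s.load)            # least loaded site
    for g in groups:                                     # foreach Gi
        if not g.queued_jobs:
            continue
        if busy == 0 and g.burst_usage < g.burst_policy:
            target.schedule(g.queued_jobs.pop(0))        # Case 1: fill BPi + BEi
        elif busy < total and g.burst_usage < g.burst_policy:
            target.schedule(g.queued_jobs.pop(0))        # Case 2: spare capacity
        elif busy == total and g.burst_usage < g.epoch_policy:
            for other in groups:                         # Case 3: contention
                if other.burst_usage > other.epoch_policy:
                    other.paused = True                  # stop scheduling for it
        # Backfill: fill up to EPi + BEi with extra jobs when possible.
        if g.queued_jobs and g.burst_usage < g.epoch_policy + g.backfill_policy:
            target.schedule(g.queued_jobs.pop(0))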
16
UP Enforcement Verification
  • Focus on measuring
  • The performance achieved by each consumer
  • Resource utilization, in the providers' interest
  • Composed of
  • A host sensor collector
  • An aggregation meta-daemon (its core step is
    sketched after this list)
  • A cluster/host web-interface (for human
    consumption)
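A minimal sketch of the aggregation step, assuming a simple (host, vo, cpu_seconds) report format that is not taken from the slides:

from collections import defaultdict

# Sketch of the aggregation meta-daemon's core step: fold per-host sensor
# reports into per-consumer utilization.  The report format is assumed.
def aggregate(reports, interval_s, total_cpus):
    used = defaultdict(float)
    for host, vo, cpu_seconds in reports:
        used[vo] += cpu_seconds
    # Fraction of the whole pool each VO consumed during the interval.
    return {vo: secs / (total_cpus * interval_s) for vo, secs in used.items()}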

17
Web-Interface Example
18
Talk Outline / Part II
  • Part I
  • Introduction
  • Problem Statement
  • Syntax and Semantics
  • Usage Policy Enforcement Model
  • Site Usage Policy Enforcement
  • VO Usage Policy Enforcement
  • Verifying Monitoring Infrastructure
  • Part II
  • Architecture Simulations
  • Usage Policy Experimental Results
  • S-PEP Usage Policy Enforcement
  • Local RM Usage Policy Enforcement
  • Quantitative Comparison
  • Part III
  • Grid3 Usage Policy Enforcement Status
  • Conclusions

19
Architecture Simulations
  • Workloads
  • Overlaid workloads for two VOs / two groups (440
    jobs)
  • One-hour simulation period
  • Usage policy (allocating grid resources from 2
    sites to 2 VOs)
  • (1) < CPU, Site1, VO0, (3600,20), (5,60) >
  • (2) < CPU, Site2, VO0, (3600,20), (5,60) >
  • (3) < CPU, Site1, VO1, (3600,80), (5,90) >
  • (4) < CPU, Site2, VO1, (3600,80), (5,90) >
  • Jobs / Job States
  • Jobs arrive, are executed, and leave the system
    (Poisson arrivals; a generation sketch follows
    this list)
  • Without any precedence constraints
  • Four states: submitted by a user to a submission
    host; submitted by a submission host to a site,
    but queued or held; running at a site; and
    completed
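A minimal sketch of how such a Poisson workload can be generated; the rate and seed below are illustrative, not the paper's parameters:

import random

# Poisson job arrivals over a one-hour simulation period: exponential
# inter-arrival times yield a Poisson arrival process.
def poisson_arrivals(rate_per_s, horizon_s=3600, seed=0):
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t > horizon_s:
            return arrivals
        arrivals.append(t)

# Roughly 440 jobs spread over one hour, as in the simulated workload.
arrival_times = poisson_arrivals(440 / 3600)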

20
Simulation Results (Allocation of Grid Resources to VOs)
Figure 2: VO0 job execution on two sites
Figure 3: Usage policy for VO0 on two sites
Figure 4: VO1 job execution on two grid sites
Figure 5: Usage policy for VO1 on two sites
21
Simulation Results (Allocation of VO Resources to Groups)
Figure 6: Workload for VO0 Group0
Figure 7: Policy enforcement for VO0 Group0
Figure 8: Workload for VO0 Group1
Figure 9: Policy enforcement for VO0 Group1
22
Simulation Results (Overall Grid Utilization and Generality)
Figure 10: Overall grid workload
Figure 11: Overall grid CPU utilization
Figure 12: Policy for VO0 on 5 sites
Figure 13: Policy for VO0 Group0 on 5 sites
23
Architecture Experiments
  • Comparison between the two S-PEP solutions
    presented before, as well as with the simulation
    results
  • Problem
  • How well are resource managers able to enforce
    the extensible usage policies?
  • Settings (similar to the simulation studies)
  • Identical workloads
  • Usage policies (as before)
  • < CPU, Site0, VO0, (3000s,20), (30s,60) >
  • < CPU, Site0, VO1, (3000s,80), (30s,90) >
  • Jobs were submitted via the Globus Toolkit 2.0

24
Experimental Results (Allocation of Grid Resources to VOs: Comparisons)
Figure 14: S-PEP with Condor (VO0)
Figure 16: S-PEP with Maui/OpenPBS (VO0)
Figure 18: Condor as S-PEP (VO0)
Figure 20: OpenPBS/Maui as S-PEP (VO0)
25
Quantitative Comparisons
  • Burst Usage Policy Violation
  • BUPV = Σ(BETi) / (cpus × Δt)
  • Epoch Usage Policy Violation
  • EUPV = Σ(EETi) / (cpus × Δt)
  • Aggregated Resource Utilization
  • ARU = Σ(ETi) / (cpus × Δt)
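A sketch of how these metrics can be computed; interpreting BETi, EETi, and ETi as burst over-quota time, epoch over-quota time, and total execution time is an assumption based on the symbol names:

# Sketch computing the three metrics above; cpus and delta_t are the pool
# size and interval length, the input lists hold per-measurement CPU times.
def usage_metrics(bet, eet, et, cpus, delta_t):
    denom = cpus * delta_t
    bupv = sum(bet) / denom    # Burst Usage Policy Violation
    eupv = sum(eet) / denom    # Epoch Usage Policy Violation
    aru = sum(et) / denom      # Aggregated Resource Utilization
    return bupv, eupv, aru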

26
Comparison Results
  • S-PEP achieves better enforcement of epoch usage
    policies than do local resource managers
  • For burst usage policies, there is no clear
    winner
  • PBS is slightly better at enforcing burst usage
    policies than is Condor

Policy BUPV EUPV ARU
S-PEP/Condor 0.12 0.01 70.2
S-PEP/PBS 0.10 0.08 88.5
RM/Condor 0.16 0.16 73.2
RM/PBS 0.03 0.17 67.6
27
Talk Outline / Part III
  • Part I
  • Introduction
  • Problem Statement
  • Syntax and Semantics
  • Usage Policy Enforcement Model
  • Site Usage Policy Enforcement
  • VO Usage Policy Enforcement
  • Verifying Monitoring Infrastructure
  • Part II
  • Architecture Simulations
  • Usage Policy Experimental Results
  • S-PEP Usage Policy Enforcement
  • Local RM Usage Policy Enforcement
  • Quantitative Comparison
  • Part III
  • Grid3 Usage Policy Enforcement Status
  • Conclusions

28
Grid3 UP Status
  • U. Chicago (Grid3): site usage policies and
    actual resource utilization over a period of two
    weeks for two VOs (USATLAS and Grid-Exerciser)
  • Usage policies (based on Condor's extensible fair
    share)
  • 35 for USATLAS and
  • 0.1 for GRIDEX
  • USATLAS's jobs are immediately executed
  • As soon as the USATLAS load decreases (X = 800 or
    X = 1030), Grid-Exerciser's jobs take over and get
    all the resources they request
  • When the USATLAS jobs start again (X = 950),
    Grid-Exerciser's jobs are throttled back

29
Grid3 Advantages
  • Aggregated response time (ART)
  • ART = Σ(i=1..N) RTi / N
  • Aggregated job completion (AJC)
  • AJC = jobs completed / jobs submitted
  • The use of policy information leads to
  • A better job completion rate
  • A higher response time

Policy ART ARU AJC
RA /NP 97.01 0.27 0.54
RR /NP 114.99 0.27 0.54
RA /SC 126.43 0.35 0.69
RR /SC 130.70 0.33 0.65
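Both metrics reduce to a few lines; the sketch below assumes per-job response times and raw job counts as inputs:

# Sketch of the two Grid3 metrics defined above.
def art(response_times):
    """Aggregated response time: mean of the N per-job response times."""
    return sum(response_times) / len(response_times)

def ajc(jobs_completed, jobs_submitted):
    """Aggregated job completion: fraction of submitted jobs that completed."""
    return jobs_completed / jobs_submitted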
30
Work in Progress GRUBER
[Architecture diagram] Components: users and higher-level automated VO Grid
schedulers (match-makers); a Site Selector (select site where
recommendation-honored and condition); a Start-time Predictor (adds
start-time predictions to the Choice-Table); a Site Recommender (can be
multi-level: Grid, VO, and Group), with SiteRec(VO, ResRequest) =>
Choice-Table; a usage policy database (UP DB); S-POP data; an automated
agent; and job information.
31
Related Work
  • SPHINX: a centralized resource broker
  • CREMONA: a framework for SLA specification and
    automated negotiation
  • GARA
  • MAUI

32
Conclusions
  • Achievements
  • A model for usage policy-based resource
    scheduling that can be applied successfully in
    real-world scenarios
  • A comparison of two cluster-wide enforcement
    prototypes
  • The design of an infrastructure for usage
    policy-based grid scheduling
  • Open problems
  • Resource over-provisioning, e.g., a policy that
    allocates 40% of the CPUs to VO0 and 80% to VO1
  • Focusing on resources other than processor time,
    such as disk and network

33
Addressed Questions
  • How are usage policies enforced at the resource
    and VO levels?
  • What strategies must a VO deploy to ensure usage
    policy enforcement?
  • How are usage policies distributed to enforcement
    points?
  • How are usage policies made available to VO job
    and data planners?

34
Thanks
  • Questions?