Title: Ivan Rodero, Julita Corbalán, Rosa M. Badia, Jesús Labarta
eNANOS Grid Resource Broker
European Grid Conference, February 14, 2005
- Ivan Rodero, Julita Corbalán, Rosa M. Badia, Jesús Labarta
- CEPBA-IBM Research Institute
- Technical University of Catalonia (UPC), Spain
- irodero, juli, rosab, jesus@ac.upc.edu
Motivations and requirements
- Resource Broker oriented to HPC research (eNANOS project)
- Built on top of Globus Toolkit 3 (based on Web Services)
- Dynamic policy management
- Extensible and modular design
- Easy-to-use, practical interfaces (API and command line)
- Data persistence
- Conclusion: develop a Resource Broker from scratch to fit our specific requirements
Agenda
- System architecture
- Components and behavior
- Main functionalities
- User criteria and policy management
- Functionality evaluation and conclusions
- Future work
System Architecture
Components and behavior
[Figure: Broker components and job flow. User environment: Client and API (submit, returns a job ID such as 5@1092731973163). Broker (Grid Service): Job Submission, Job Monitoring, Resource Discovery, Resource Selection and Resource Monitoring modules, with a pending-jobs queue and job states idle, running and finished. Execution environment: resource servers 1 to n. Interactions shown: submits, enqueues, schedules, selects, discovers, notifies, finishes.]
Main functionalities
- Job Submission
- Job Monitoring
- Resource Discovery
- Resource Selection
- Resource Monitoring
Job Submission
- RSL and user criteria are required
- Jobs are scheduled periodically according to the Broker policy
- Examples: FIFO, REALTIME, EDF
- Uses GRAM to submit and manage both GT2 and GT3 jobs
[Figure: Client (user, application, portal) → API (RSL, user criteria) → BROKER → RESOURCE]
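The periodic scheduling step can be pictured as a pluggable policy that picks the next job from the pending queue. The Java sketch below (Java 16+) is an illustration only: the `Job` record, its `deadline` field and the `JobSchedulingPolicy` interface are hypothetical names, not the eNANOS Broker API. It covers FIFO and EDF (earliest deadline first); REALTIME is left out because its rules are not described here.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical job description: submission time and absolute deadline (ms).
record Job(String id, long submitTime, long deadline) {}

// Hypothetical policy interface: choose the next pending job, if any.
interface JobSchedulingPolicy {
    Optional<Job> next(List<Job> pending);
}

// FIFO: the job submitted first is scheduled first.
final class FifoPolicy implements JobSchedulingPolicy {
    @Override
    public Optional<Job> next(List<Job> pending) {
        return pending.stream().min(Comparator.comparingLong(Job::submitTime));
    }
}

// EDF: the job whose deadline expires soonest is scheduled first.
final class EdfPolicy implements JobSchedulingPolicy {
    @Override
    public Optional<Job> next(List<Job> pending) {
        return pending.stream().min(Comparator.comparingLong(Job::deadline));
    }
}
```

For example, with job `a` submitted first (deadline 100) and job `b` submitted second (deadline 50), `FifoPolicy` picks `a` while `EdfPolicy` picks `b`.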
Job Monitoring
- Controls the job status through:
- GRAM notifications (callbacks)
- An additional thread that forces job status updates
- Implements job persistence through a thread that periodically updates the recovery file (which contains the queued jobs)
- Works in coordination with the Resource Monitoring module
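A minimal sketch of the recovery-file thread, assuming a plain-text file with one queued job ID per line (the actual file format is not described in the slides). `RecoveryFileWriter`, `saveNow` and `restore` are hypothetical names; the periodic thread is a standard `ScheduledExecutorService`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically snapshots the IDs of queued jobs so they
// can be re-queued after a Broker restart.
final class RecoveryFileWriter {
    private final Path recoveryFile;
    private final ConcurrentLinkedQueue<String> queuedJobIds;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    RecoveryFileWriter(Path recoveryFile, ConcurrentLinkedQueue<String> queuedJobIds) {
        this.recoveryFile = recoveryFile;
        this.queuedJobIds = queuedJobIds;
    }

    // Starts the background thread that rewrites the file every periodSeconds.
    void start(long periodSeconds) {
        timer.scheduleAtFixedRate(() -> {
            try {
                saveNow();
            } catch (UncheckedIOException e) {
                System.err.println("recovery file update failed: " + e.getMessage());
            }
        }, 0, periodSeconds, TimeUnit.SECONDS);
    }

    void stop() {
        timer.shutdown();
    }

    // Writes one job ID per line to a temporary file, then moves it over the old one.
    void saveNow() {
        try {
            Path tmp = recoveryFile.resolveSibling(recoveryFile.getFileName() + ".tmp");
            Files.write(tmp, new ArrayList<>(queuedJobIds), StandardCharsets.UTF_8);
            Files.move(tmp, recoveryFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the saved job IDs back, e.g. when the Broker starts again.
    static List<String> restore(Path recoveryFile) {
        try {
            return Files.exists(recoveryFile)
                    ? Files.readAllLines(recoveryFile, StandardCharsets.UTF_8)
                    : List.of();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Writing to a temporary file first and then moving it means a reader normally sees either the previous snapshot or the new one rather than a half-written file; whether the move is truly atomic depends on the file system.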
Resource Discovery
- Keeps local information about Grid resources
- This information is updated periodically by a thread
- Supports both GT2 and GT3 resources
- GLUE specification
- Resources are obtained from resource servers (GGRIS):
- GIIS and GRIS from GT2
- Index Service from GT3
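The local resource table refreshed by the discovery thread might look like the following sketch. `ResourceServer` is a hypothetical abstraction standing in for the GT2 GIIS/GRIS and GT3 Index Service queries (which would go through the Globus APIs), and the `Resource` fields are an illustrative subset of GLUE-style attributes, not the actual schema.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative subset of GLUE-style attributes (hypothetical field names).
record Resource(String host, int ramMb, int clockMhz, String os, int totalCpus) {}

// Stand-in for a resource server such as a GT2 GIIS/GRIS or the GT3 Index Service.
interface ResourceServer {
    List<Resource> query();
}

// Local copy of the Grid resource information, rebuilt on every refresh.
final class ResourceCache {
    private final List<ResourceServer> servers;
    private volatile Map<String, Resource> byHost = Map.of();

    ResourceCache(List<ResourceServer> servers) {
        this.servers = List.copyOf(servers);
    }

    // Meant to be called periodically by the discovery thread. A server that
    // throws is skipped so one unreachable index does not abort the refresh.
    void refresh() {
        Map<String, Resource> fresh = new HashMap<>();
        for (ResourceServer server : servers) {
            try {
                for (Resource r : server.query()) {
                    fresh.put(r.host(), r); // a later server overrides an earlier one per host
                }
            } catch (RuntimeException e) {
                System.err.println("resource server query failed: " + e.getMessage());
            }
        }
        byHost = Map.copyOf(fresh); // publish an immutable snapshot in one assignment
    }

    List<Resource> snapshot() {
        return List.copyOf(byHost.values());
    }
}
```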
Resource Selection
- Filters resources according to the RSL and the user requirements
- Returns an ordered set of candidate resources
- Policies can be changed dynamically by the user
[Figure: the RESOURCE SELECTION POLICY takes the RSL, the User Criteria and the Resources provided by Resource Discovery, and returns an ordered Result (e.g. Resource 2, Resource 4); the user can change the policy.]
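One way to read "filter, then order, with a policy the user can change at run time" is the sketch below: hard requirements become a `Predicate`, a policy becomes a `Comparator`, and the active policy is held in an `AtomicReference` so it can be replaced while selections are running. The policy names `FASTEST_CPU` and `MOST_FREE_CPUS` and the `Resource` fields are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Hypothetical resource description used only for this sketch.
record Resource(String host, int ramMb, int clockMhz, int freeCpus) {}

final class ResourceSelector {
    // Each named policy is an ordering of candidate resources (best first).
    private static final Map<String, Comparator<Resource>> POLICIES = Map.of(
            "FASTEST_CPU", Comparator.comparingInt(Resource::clockMhz).reversed(),
            "MOST_FREE_CPUS", Comparator.comparingInt(Resource::freeCpus).reversed());

    private final AtomicReference<Comparator<Resource>> current =
            new AtomicReference<>(POLICIES.get("FASTEST_CPU"));

    // Switches the active policy at run time; returns false for an unknown name.
    boolean setPolicy(String name) {
        Comparator<Resource> policy = POLICIES.get(name);
        if (policy == null) {
            return false;
        }
        current.set(policy);
        return true;
    }

    // Drops resources that fail the hard requirements, then orders the rest.
    List<Resource> select(List<Resource> discovered, Predicate<Resource> requirements) {
        return discovered.stream()
                .filter(requirements)
                .sorted(current.get())
                .toList();
    }
}
```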
User Criteria
- XML file used by the Resource Selection module to choose and order resources
- List of criteria (attributes)
- Examples: RAM size, clock speed, OS, total CPUs, etc.
- Hard and soft attributes (requirements and recommendations, respectively)
- Priority (for soft attributes)
- Supports operators
- Example:
<Attribute Name="ClockSpeed" Type="INTEGER" Operator=">" Value="500" Importance="SOFT" Priority="7"/>
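The sketch below parses criteria written in the `<Attribute .../>` form shown on the slide with the JDK's DOM parser and scores a resource against them. The scoring rule (a failed hard attribute rejects the resource; each satisfied soft attribute adds its priority) is an assumption made for illustration, as are the wrapping `<Criteria>` root element, the `RAM` attribute name in the usage example and the set of supported operators; only the INTEGER type is handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// One <Attribute .../> entry from the user criteria file (INTEGER type only).
record Criterion(String name, String operator, long value, boolean hard, int priority) {

    boolean matches(long actual) {
        return switch (operator) {
            case ">" -> actual > value;
            case ">=" -> actual >= value;
            case "<" -> actual < value;
            case "<=" -> actual <= value;
            case "=" -> actual == value;
            default -> throw new IllegalArgumentException("unsupported operator: " + operator);
        };
    }
}

final class UserCriteria {
    // Reads every <Attribute> element, wherever it sits in the document.
    static List<Criterion> parse(String xml) {
        NodeList nodes;
        try {
            nodes = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("Attribute");
        } catch (Exception e) {
            throw new IllegalArgumentException("invalid criteria XML", e);
        }
        List<Criterion> criteria = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            String priority = e.getAttribute("Priority"); // "" when absent
            criteria.add(new Criterion(
                    e.getAttribute("Name"),
                    e.getAttribute("Operator"),
                    Long.parseLong(e.getAttribute("Value")),
                    "HARD".equals(e.getAttribute("Importance")), // anything else is soft
                    priority.isEmpty() ? 0 : Integer.parseInt(priority)));
        }
        return criteria;
    }

    // -1 if any hard criterion fails; otherwise the sum of the priorities of
    // the satisfied soft criteria (higher is better).
    static int score(Map<String, Long> attributes, List<Criterion> criteria) {
        int score = 0;
        for (Criterion c : criteria) {
            Long actual = attributes.get(c.name());
            boolean ok = actual != null && c.matches(actual);
            if (c.hard() && !ok) {
                return -1;
            }
            if (!c.hard() && ok) {
                score += c.priority();
            }
        }
        return score;
    }
}
```

With the `ClockSpeed` criterion (SOFT, priority 7) plus a hard `RAM >= 256` criterion, a 2400 MHz machine with 1024 MB scores 7, a 400 MHz machine with 1024 MB scores 0, and a machine with 128 MB is rejected with -1.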
User Criteria - GUI
Resource Monitoring
- A thread periodically checks for changes in the resources available in the Grid
- When a machine fails, all jobs running on it are rescheduled through a separate, higher-priority queue
- The Broker implements the persistence of resource servers with an XML file
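The rescheduling rule can be sketched as two queues: jobs recovered from a failed machine go to a retry queue that is always served before the normal pending queue. `JobQueues` and its methods are hypothetical names, and failure detection itself (the monitoring thread) is not shown.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-level queue: jobs from a failed host are retried first.
final class JobQueues {
    private final Deque<String> retry = new ArrayDeque<>();
    private final Deque<String> pending = new ArrayDeque<>();
    private final Map<String, List<String>> runningByHost = new HashMap<>();

    synchronized void submit(String jobId) {
        pending.addLast(jobId);
    }

    // Next job to schedule: retry queue first, then pending; null if both are empty.
    synchronized String nextJob() {
        String job = retry.pollFirst();
        return job != null ? job : pending.pollFirst();
    }

    synchronized void markRunning(String jobId, String host) {
        runningByHost.computeIfAbsent(host, h -> new ArrayList<>()).add(jobId);
    }

    // Called by the monitoring thread once it decides that a host has failed.
    synchronized void onHostFailure(String host) {
        List<String> lost = runningByHost.remove(host);
        if (lost != null) {
            lost.forEach(retry::addLast);
        }
    }
}
```

For instance, if job `3@1087831873921` is running on `pcirodero.ac.upc.es` when that host fails, `onHostFailure` moves it to the retry queue, and the next `nextJob()` call returns it ahead of any job still waiting in the pending queue.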
Policy management
- Basic policies: job scheduling and resource selection
- Extensible implementation through interfaces
- Policy management:
- Interface example:
- evaluate(List jobs, List resources) → (job, resource)
- Client interface examples:
- get_MetaSchPolicies() → string
- set_JobSchPolicy(String policy) → int
[Figure: the SCHEDULING POLICY reduces n jobs to k candidate jobs (job1 ... jobk) and the RESOURCE SELECTION POLICY reduces m resources to j candidate resources per job (e.g. res1 ... resj for job1); a META-POLICY (e.g. genetic algorithm) chooses the final job/resource pair.]
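The `evaluate(List jobs, List resources)` interface can be sketched in Java as returning an optional job/resource pair. `MetaPolicy`, `Assignment` and `GreedyMetaPolicy` are hypothetical names, and the greedy composition (take the first job under the scheduling order and the best resource under the selection order) is a deliberately simple stand-in, not the genetic-algorithm meta-policy mentioned on the slide.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical types for the sketch.
record Job(String id, long deadline) {}
record Resource(String host, int clockMhz) {}
record Assignment(Job job, Resource resource) {}

// Mirrors evaluate(List jobs, List resources) -> (job, resource).
interface MetaPolicy {
    Optional<Assignment> evaluate(List<Job> jobs, List<Resource> resources);
}

// Greedy composition of a job-scheduling order and a resource-selection order.
final class GreedyMetaPolicy implements MetaPolicy {
    private final Comparator<Job> jobOrder;
    private final Comparator<Resource> resourceOrder;

    GreedyMetaPolicy(Comparator<Job> jobOrder, Comparator<Resource> resourceOrder) {
        this.jobOrder = jobOrder;
        this.resourceOrder = resourceOrder;
    }

    // Empty when there is no job or no resource to pair it with.
    @Override
    public Optional<Assignment> evaluate(List<Job> jobs, List<Resource> resources) {
        Optional<Job> job = jobs.stream().min(jobOrder);
        Optional<Resource> resource = resources.stream().min(resourceOrder);
        return job.flatMap(j -> resource.map(r -> new Assignment(j, r)));
    }
}
```

A smarter meta-policy would search over job/resource pairs jointly instead of choosing each side independently; the interface stays the same either way.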
Functionality
- We have performed functionality tests, including data persistence and job rescheduling
- Basic performance and behavior studies
irodero@pcmas/broker/proves job_history 3@1087831873921
21/5/2004 17:20:11 > JOB CREATION
21/5/2004 17:20:11 > JOB QUEUED in PENDING queue
21/5/2004 17:31:13 > RESTORED From Recovery File (to be created another time)
21/5/2004 17:31:13 > JOB CREATION
21/5/2004 17:31:13 > JOB QUEUED in PENDING queue
21/5/2004 17:32:01 > JOB SUBMITED to pcirodero.ac.upc.es
21/5/2004 17:33:01 > JOB QUEUED FOR RETRYING because resource pcirodero.ac.upc.es has failed
21/5/2004 17:34:11 > JOB SUBMITED to pcmas
21/5/2004 17:46:25 > JOB DONE
Conclusions
- We have presented a Resource Broker implemented as a Grid Service and compatible with both GT3 and GT2 services
- Using GT3 and Java gave us more flexibility and a Web Services-oriented, standard and portable design
- We have encountered some problems with GT3 (especially with GRAM)
- The main GT3 problems, such as overhead, are being solved in GT4
Future Work
- Coordinated scheduling between
- Grid Broker (eNANOS Broker)
- Local Scheduler (eNANOS Scheduler)
- Queuing System (LoadLeveler)
- Processor Scheduler (NANOS-RM)
- for MPI+OpenMP applications in HPC environments
- Scheduling based on prediction techniques
- Porting the eNANOS Broker to GT4
Thank you :-)
irodero@ac.upc.edu