Condor-G: Making Condor Grid Enabled (PowerPoint presentation)

Provided by: Miron1
Learn more at: https://www.cs.wisc.edu

Transcript and Presenter's Notes

1
Condor-G: Making Condor Grid Enabled
2
Outline
  • Why use Condor-G
  • Globus Universe
  • GlideIn
  • Status and Future Work

3
What is Condor-G?
  • Extensions to Condor to allow access to the Grid
    through Globus
  • Two parts:
      • Globus Universe
      • GlideIn

4
Why Use Condor-G
  • Condor
      • Designed to run jobs within a single
        administrative domain
  • Globus
      • Designed to run jobs across many administrative
        domains
  • Condor-G
      • Combine the strengths of both

5
Condor-G Helps Condor Users
  • Machines available to Condor users are limited:
      • Local Condor Pool
      • Friendly Condor Pools (via Flocking)
  • Through Globus, many more machines become
    available to run your jobs

6
Condor-G Helps Globus Users
  • Globus is primarily an infrastructure upon which
    to develop distributed applications
  • Command-line tools are limited
  • Some users don't want to rewrite their
    applications to use Globus
  • Condor-G provides them a powerful interface to
    the Grid to run their existing applications

7
Globus Universe
  • Advantages of using Condor as a front-end to
    Globus:
      • Full-featured queuing service
      • Fault-tolerance
      • Credential Management

8
Full-Featured Queue
  • Persistent queue
  • Many queue-manipulation tools
  • Set up job dependencies (DAGMan)
  • E-mail notification of events
  • Log files
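DAGMan runs jobs in an order that respects their declared parent/child dependencies. The minimal Python sketch below illustrates only that ordering idea, using the standard-library `graphlib` module (Python 3.9+); the job names are hypothetical, and this is not how DAGMan itself is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each job maps to the set of jobs it must wait for,
# similar in spirit to "PARENT A CHILD B" lines in a DAGMan input file.
dependencies = {
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},
}

def submission_waves(deps):
    """Group jobs into waves; every job in a wave has all parents finished."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs whose parents are all done
        waves.append(ready)
        sorter.done(*ready)  # pretend the whole wave completed successfully
    return waves

print(submission_waves(dependencies))  # [['A'], ['B', 'C'], ['D']]
```

Jobs in the same wave (here B and C) have no dependency on each other and could be in the queue at the same time.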

9
Fault-Tolerance
  • Local Crash
      • Queue state kept on disk
      • Condor Master restarts other daemons
  • Remote Crash
      • Condor will resubmit jobs
      • Globus jobmanager enhanced to improve
        recoverability
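Both recovery paths on this slide can be illustrated with a toy persistent queue. The Python sketch below is hypothetical: it stores job states as JSON (real Condor uses its own job-queue log format) and writes the file atomically, so a local crash cannot leave it half-written. Reloading the file after a crash recovers the queue, and jobs lost to a remote crash are returned to idle so they can be resubmitted.

```python
import json
import os
import tempfile

class PersistentQueue:
    """Toy job queue whose state survives a crash of the submitting process.

    Hypothetical format: a JSON object mapping job id -> status string.
    Real Condor keeps its queue in its own transaction log, not like this.
    """

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        if os.path.exists(path):  # recover state left by a previous run
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        # Write a temporary file in the same directory, then atomically
        # replace the old queue file, so a crash never leaves it half-written.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def set_status(self, job_id, status):
        self.jobs[job_id] = status
        self._save()

    def resubmit_lost_jobs(self, lost_job_ids):
        """Put jobs that were running on a crashed remote resource back to idle."""
        requeued = []
        for job_id in lost_job_ids:
            if self.jobs.get(job_id) == "running":
                self.set_status(job_id, "idle")
                requeued.append(job_id)
        return requeued

path = os.path.join(tempfile.mkdtemp(), "queue.json")
queue = PersistentQueue(path)
queue.set_status("job1", "running")
queue.set_status("job2", "done")

# Simulate a local crash: a fresh object reloads the queue from disk.
recovered = PersistentQueue(path)
print(recovered.jobs)                                  # {'job1': 'running', 'job2': 'done'}
print(recovered.resubmit_lost_jobs(["job1", "job2"]))  # ['job1']
```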

10
Credential Management
  • Authentication in Globus is done with
    limited-lifetime X.509 proxies
  • Proxy may expire before jobs finish executing
  • Condor can put jobs on hold and e-mail the user to
    refresh the proxy
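The hold-and-notify behavior can be sketched as a simple policy check. The Python below is a hypothetical illustration, not Condor's implementation: the `Job` type, the 30-minute safety margin, and the message text are assumptions; only the idea of holding jobs and asking the user to refresh an expiring proxy comes from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Job:
    job_id: str
    status: str  # "idle", "running", or "held"

def jobs_to_hold(jobs, proxy_expires, now, margin=timedelta(minutes=30)):
    """Return the jobs to put on hold because the proxy is about to expire.

    Hypothetical policy: once the proxy has less than `margin` of lifetime
    left, every job not already held is held until the user refreshes the
    proxy (e.g. with the Globus grid-proxy-init command).
    """
    if proxy_expires - now > margin:
        return []
    return [job for job in jobs if job.status != "held"]

def notification_text(held, proxy_expires):
    """Build the (hypothetical) e-mail body sent to the user."""
    ids = ", ".join(job.job_id for job in held)
    return (f"Your Globus proxy expires at {proxy_expires:%Y-%m-%d %H:%M}. "
            f"Jobs placed on hold: {ids}. Please refresh your proxy.")

now = datetime(2024, 1, 1, 12, 0)
jobs = [Job("1.0", "running"), Job("2.0", "idle"), Job("3.0", "held")]
held = jobs_to_hold(jobs, proxy_expires=now + timedelta(minutes=10), now=now)
print([job.job_id for job in held])  # ['1.0', '2.0']
```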

11-15
How It Works
[Diagram, built up over slides 11-15: a Personal Condor (Schedd) and a Globus Resource (LSF). Slide 13 adds the GridManager on the Personal Condor side, slide 14 adds the JobManager on the Globus Resource, and slide 15 shows the User Job running on the Globus Resource.]
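From the user's side, a Globus Universe job is described in an ordinary Condor submit description file that names the remote Globus resource; the Schedd hands the job to the GridManager, which contacts the JobManager on that resource. The Python sketch below only generates such a file. The host `gatekeeper.example.edu` and program `my_sim` are hypothetical, and the keywords `universe = globus` and `globusscheduler` follow the Condor-G syntax of this era (later Condor releases use `universe = grid` with `grid_resource` instead).

```python
def globus_submit_description(executable, gatekeeper,
                              jobmanager="jobmanager-lsf", arguments=None):
    """Return the text of a Condor-G submit description for one Globus job.

    `gatekeeper` is a placeholder host name; "jobmanager-lsf" asks the
    remote gatekeeper for the JobManager that feeds the site's LSF queue.
    """
    lines = [
        "universe        = globus",
        f"globusscheduler = {gatekeeper}/{jobmanager}",
        f"executable      = {executable}",
    ]
    if arguments:
        lines.append(f"arguments       = {arguments}")
    lines += [
        "output          = job.out",  # standard I/O is staged back via GASS
        "error           = job.err",
        "log             = job.log",  # Condor's per-job event log
        "queue",
    ]
    return "\n".join(lines) + "\n"

print(globus_submit_description("my_sim", "gatekeeper.example.edu",
                                arguments="-n 100"))
```

Saving this text to a file and running `condor_submit` on it would place the job in the local Condor queue like any other job.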
16
Globus Universe
  • Disadvantages:
      • No matchmaking or dynamic scheduling of jobs
      • No job checkpoint or migration
      • No remote system calls

17
Solution: GlideIn
  • Use the Globus Universe to run the Condor daemons
    on Globus resources
  • When the resources run these GlideIn jobs, they
    will join your Condor Pool
  • Submit your jobs as Standard or Vanilla Universe
    jobs and they will be matched and run on the
    Globus resources

18-24
How It Works
[Diagram, built up over slides 18-24: a Personal Condor (Schedd and Collector) and a Globus Resource (LSF). Slide 20 adds the GridManager, slide 21 adds the JobManager on the Globus Resource, slide 22 adds a glided-in Startd that joins the Personal Condor pool through the Collector, and slide 24 shows the User Job running under that Startd.]
25
GlideIn Concerns
  • What if a Globus resource kills my GlideIn?
      • That resource will disappear from your pool and
        your jobs will be rescheduled on other machines
  • What if all my jobs are done before a GlideIn
    runs?
      • If the glided-in Condor daemons are not matched
        with a job within 10 minutes, they terminate
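The 10-minute rule amounts to an idle timer on the glided-in daemons. The Python sketch below models it with a hypothetical `GlideInStartd` class and an injectable clock, so it can be exercised without waiting. Restarting the timer when a claim ends is an assumption; the slide only covers the case where no job is ever matched.

```python
import time

IDLE_TIMEOUT = 10 * 60  # seconds: the 10-minute limit from the slide

class GlideInStartd:
    """Toy model of a glided-in startd that gives back an unused resource.

    Hypothetical class, not Condor code. The idle clock starts when the
    daemon joins the pool and (by assumption) restarts when a claim ends.
    """

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.claimed = False
        self.idle_since = clock()

    def claim(self):
        """A job has been matched to this resource."""
        self.claimed = True

    def release(self):
        """The job finished; start counting idle time again (assumption)."""
        self.claimed = False
        self.idle_since = self.clock()

    def should_exit(self):
        """True once the daemon has gone IDLE_TIMEOUT seconds unmatched."""
        return (not self.claimed
                and self.clock() - self.idle_since >= IDLE_TIMEOUT)

fake_now = [0.0]  # fake clock so the example runs instantly
startd = GlideInStartd(clock=lambda: fake_now[0])
fake_now[0] = 9 * 60
print(startd.should_exit())   # False: only 9 minutes unmatched
fake_now[0] = 10 * 60
print(startd.should_exit())   # True: 10 minutes without a match
```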

26
Group Condor
27
Group Condor
28
Current Status
  • First version of GridManager ready
      • Runs jobs using Globus GRAM
      • Stages executable and standard I/O using Globus
        GASS
  • Jobmanager changes will be folded into a future
    release of Globus
  • Credential management in progress

35
Future Work
  • GridManager
      • Stage user jobs' data files
  • Automatic GlideIn
      • Condor creates GlideIn jobs when more resources
        are needed
  • Matchmaking in Globus Universe
      • Use Globus GRIS to create ClassAds for Globus
        resources and match them to job ClassAds
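The last two Future Work items fit together: if Globus resources were described as ClassAds (for example, built from GRIS data), jobs could be matched to them directly, and any jobs left unmatched would signal that more GlideIns are needed. The Python sketch below models ads as plain dictionaries with a `requirements` predicate and a `rank` function. This is a greatly simplified stand-in for the ClassAd language and the Condor matchmaker; every attribute name, machine name, and the glide-in cap are hypothetical.

```python
def matches(job, machine):
    """Two-way match: each ad's requirements must accept the other ad."""
    return job["requirements"](machine) and machine["requirements"](job)

def matchmake(jobs, machines):
    """Greedily give each job its highest-ranked compatible free machine."""
    free = list(machines)
    assignments, unmatched = {}, []
    for job in jobs:
        candidates = [m for m in free if matches(job, m)]
        if not candidates:
            unmatched.append(job["name"])
            continue
        best = max(candidates, key=job["rank"])
        assignments[job["name"]] = best["name"]
        free.remove(best)
    return assignments, unmatched

def glideins_needed(unmatched, max_glideins=10):
    """Hypothetical policy: one GlideIn per unmatched job, up to a cap."""
    return min(len(unmatched), max_glideins)

machines = [
    {"name": "lsf-node1", "Memory": 512, "OpSys": "LINUX",
     "requirements": lambda job: True},
    {"name": "lsf-node2", "Memory": 2048, "OpSys": "LINUX",
     "requirements": lambda job: job["Owner"] != "guest"},
]
jobs = [
    {"name": "sim.0", "Owner": "alice",
     "requirements": lambda m: m["OpSys"] == "LINUX" and m["Memory"] >= 1024,
     "rank": lambda m: m["Memory"]},
    {"name": "sim.1", "Owner": "alice",
     "requirements": lambda m: m["OpSys"] == "LINUX",
     "rank": lambda m: m["Memory"]},
    {"name": "sim.2", "Owner": "alice",
     "requirements": lambda m: m["OpSys"] == "LINUX",
     "rank": lambda m: m["Memory"]},
]

assignments, unmatched = matchmake(jobs, machines)
print(assignments)                 # {'sim.0': 'lsf-node2', 'sim.1': 'lsf-node1'}
print(unmatched)                   # ['sim.2']
print(glideins_needed(unmatched))  # 1
```

The greedy loop is a design simplification: it never revisits an earlier assignment, so a different job order could leave a different job unmatched.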

36
Questions and Thank You!