Condor Tutorial for Users INFN-Bologna, 6/29/99 - PowerPoint PPT Presentation

1
Condor Tutorial for Users INFN-Bologna, 6/29/99
  • Derek Wright
  • Computer Sciences Department
  • University of Wisconsin-Madison
  • wright@cs.wisc.edu

2
Conventions Used In This Presentation
  • A slide with an all-yellow background is the
    beginning of a new chapter
  • The slides after it will describe each entry on
    the yellow slide in great detail
  • A Condor tool that users would use will be in red
    italics
  • A ClassAd attribute name will be in blue
  • A UNIX shell command or file name will be in
    courier font

3
What is Condor?
  • A system for High-Throughput Computing
  • Lots of jobs over a long period of time, not a
    short burst of high-performance
  • Condor manages both resources (machines) and
    resource requests (jobs)
  • Supports additional features for jobs that are
    re-linked with Condor libraries
  • checkpointing
  • remote system calls

4
What's Condor Good For?
  • Managing a large number of jobs
  • You specify the jobs in a file and submit them to
    Condor, which runs them all and sends you email
    when they complete
  • Mechanisms to help you manage huge numbers of
    jobs (1000s), all the data, etc.
  • Condor can handle inter-job dependencies (DAGMan)

5
What's Condor Good For? (cont'd)
  • Robustness
  • Checkpointing allows guaranteed forward progress
    of your jobs, even jobs that run for weeks before
    completion
  • If an execute machine crashes, you only lose
    work done since the last checkpoint
  • Condor maintains a persistent job queue - if the
    submit machine crashes, Condor will recover

6
What's Condor Good For? (cont'd)
  • Giving you access to more computing resources
  • Checkpointing allows your job to run on
    opportunistic resources (not dedicated)
  • Checkpointing also provides migration - if a
    machine is no longer available, move!
  • With remote system calls, you don't even need an
    account on a machine where your job executes

7
What is a Condor Pool?
  • Pool can be a single machine, or a group of
    machines
  • Determined by a central manager - the
    matchmaker and centralized information repository
  • Each machine runs various daemons to provide
    different services, either to the users who
    submit jobs, the machine owners, or the pool
    itself

8
What Kind of Job Do You Have?
  • You must know some things about your job to
    decide if and how it will work with Condor
  • What kind of I/O does it do?
  • Does it use TCP/IP? (network sockets)
  • Can the job be resumed?
  • Is the job multi-process? (fork(), pvm_addhost(),
    etc.)

9
What Kind of I/O Does Your Job Do?
  • Interactive TTY
  • Batch TTY (just reads from STDIN and writes to
    STDOUT or STDERR, but you can redirect to/from
    files)
  • X Windows
  • NFS, AFS, or another network file system
  • Local file system
  • TCP/IP

10
What Does Condor Support?
  • Condor can support various combinations of these
    features in different Universes
  • Different Universes provide different
    functionality for your job
  • Vanilla
  • Standard
  • Scheduler
  • PVM

11
What Does Condor Support?
12
Condor Universes
  • A Universe specifies a Condor runtime
    environment
  • STANDARD
  • Supports Checkpointing
  • Supports Remote System Calls
  • Has some limitations (no fork(), socket(), etc.)
  • VANILLA
  • Any Unix executable (shell scripts, etc)
  • No Condor Checkpointing or Remote I/O

13
Condor Universes (cont'd)
  • PVM (Parallel Virtual Machine)
  • Allows you to run parallel jobs in Condor (more
    on this later)
  • SCHEDULER
  • Special kind of Condor job: the job is run on the
    submit machine, not a remote execute machine
  • Job is automatically restarted if the
    condor_schedd is shut down
  • Used to schedule jobs (e.g. DAGMan)

14
Submitting Jobs to Condor
  • Choosing a Universe for your job (already
    covered this)
  • Preparing your job
  • Making it batch-ready
  • Re-linking if checkpointing and remote system
    calls are desired (condor_compile)
  • Creating a submit description file
  • Running condor_submit
  • Sends your request to the User Agent
    (condor_schedd)

15
Preparing Your Job
  • Making your job batch-ready
  • Must be able to run in the background: no
    interactive input, windows, GUI, etc.
  • Can still use STDIN, STDOUT, and STDERR (the
    keyboard and the screen), but files are used for
    these instead of the actual devices
  • If your job expects input from the keyboard, you
    have to put the input you want into a file

16
Preparing Your Job (cont'd)
  • If you are going to use the standard universe
    with checkpointing and remote system calls, you
    must re-link your job with Condor's special
    libraries
  • To do this, you use condor_compile
  • Place condor_compile in front of the command
    you normally use to link your job

condor_compile gcc -o myjob myjob.c
17
Creating a Submit Description File
  • A plain ASCII text file
  • Tells Condor about your job
  • Which executable, universe, input, output and
    error files to use, command-line arguments,
    environment variables, any special requirements
    or preferences (more on this later)
  • Can describe many jobs at once (a cluster), each
    with different input, arguments, output, etc.

18
Example Submit Description File
# Example condor_submit input file
# (Lines beginning with # are comments)
# NOTE: the words on the left side are not
#       case sensitive, but filenames are!
Universe   = standard
Executable = /home/wright/condor/my_job.condor
Input      = my_job.stdin
Output     = my_job.stdout
Error      = my_job.stderr
Log        = my_job.log
Arguments  = -arg1 -arg2
InitialDir = /home/wright/condor/run_1
Queue

19
Example Submit Description File Described
  • Submits a single job to the standard universe,
    specifies files for STDIN, STDOUT and STDERR,
    creates a UserLog, defines command line arguments,
    and specifies the directory the job should be run
    in
  • Equivalent to (for outside of Condor):

cd /home/wright/condor/run_1
/home/wright/condor/my_job.condor -arg1 -arg2 \
    > my_job.stdout 2> my_job.stderr \
    < my_job.stdin
20
Clusters and Processes
  • If your submit file describes multiple jobs, we
    call this a cluster
  • Each job within a cluster is called a process
    or proc
  • If you only specify one job, you still get a
    cluster, but it has only one process
  • A Condor Job ID is the cluster number, a
    period, and the process number (23.5)
  • Process numbers always start at 0

21
Example Submit Description File for a Cluster
# Example condor_submit input file that defines
# a whole cluster of jobs at once
Universe   = standard
Executable = /home/wright/condor/my_job.condor
Input      = my_job.stdin
Output     = my_job.stdout
Error      = my_job.stderr
Log        = my_job.log
Arguments  = -arg1 -arg2
InitialDir = /home/wright/condor/run_$(Process)
Queue 500
22
Example Submit Description File for a Cluster -
Described
  • Now, the initial directory for each job is
    specified with the $(Process) macro, and instead
    of submitting a single job, we use Queue 500 to
    submit 500 jobs at once
  • $(Process) will be expanded to the process number
    for each job in the cluster (from 0 up to 499 in
    this case), so we'll have run_0, run_1, ...,
    run_499 directories
  • All the input/output files will be in different
    directories!
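
The macro expansion described above can be sketched in Python. This is only an illustrative model of the substitution, not how condor_submit is implemented; the function name expand_process and the case-insensitive matching of the macro name are assumptions made for this example.

```python
import re

def expand_process(template: str, count: int) -> list[str]:
    """Return `template` with $(Process) replaced by 0..count-1.

    Illustrative only: mimics the substitution that "Queue <count>"
    triggers; it is not Condor code.
    """
    # Treat the macro name as case-insensitive (an assumption of this sketch).
    pattern = re.compile(r"\$\(process\)", re.IGNORECASE)
    return [pattern.sub(str(proc), template) for proc in range(count)]

dirs = expand_process("/home/wright/condor/run_$(Process)", 500)
print(dirs[0])    # /home/wright/condor/run_0
print(dirs[-1])   # /home/wright/condor/run_499
print(len(dirs))  # 500
```

Each of the 500 jobs gets its own directory name, which is why their input and output files never collide.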

23
Running condor_submit
  • You give condor_submit the name of the submit
    file you have created
  • condor_submit parses the file and creates a
    ClassAd that describes your job(s)
  • Creates the files you specified for STDOUT and
    STDERR
  • Sends your job's ClassAd(s) and executable to the
    condor_schedd, which stores the job in its queue

24
Monitoring Your Jobs
  • Using condor_q
  • Using a User Log file
  • Using condor_status
  • Using condor_rm
  • Getting email from Condor
  • Once they complete, you can use condor_history to
    examine them

25
Using condor_q
  • To view the jobs you have submitted, you use
    condor_q
  • Displays the status of your job, how much compute
    time it has accumulated, etc.
  • Many different options
  • A single job, a single cluster, all jobs that
    match a certain constraint, or all jobs
  • Can view remote job queues (either individual
    queues, or -global)

26
Using a User Log file
  • A UserLog must be specified in your submit file
  • Log = filename
  • You get a log entry for everything that happens
    to your job
  • When it was submitted, when it starts executing,
    if it is checkpointed or vacated, if there are
    any problems, etc.
  • Very useful! Highly recommended!

27
Using condor_status
  • To view the status of the whole Condor pool, you
    use condor_status
  • Can use the -run option to see which machines
    are running jobs, as well as
  • The user who submitted each job
  • The machine they submitted from
  • Can also view the status of various submitters
    with -submitter <name>

28
Using condor_rm
  • If you want to remove a job from the Condor
    queue, you use condor_rm
  • You can only remove jobs that you own (you can't
    run condor_rm on someone else's jobs unless you
    are root)
  • You can give specific job IDs (cluster or
    cluster.proc), or you can remove all of your jobs
    with the -a option.

29
Getting Email from Condor
  • By default, Condor will send you email when your
    job completes
  • If you don't want this email, put this in your
    submit file:
  • notification = never
  • If you want email every time something happens to
    your job (checkpoint, exit, etc.), use this:
  • notification = always

30
Getting Email from Condor (cont'd)
  • If you only want email if your job exits with an
    error, use this:
  • notification = error
  • By default, the email is sent to your account on
    the host you submitted from. If you want the
    email to go to a different address, use this:
  • notify_user = email@address.here

31
Using condor_history
  • Once your job completes, it will no longer show
    up in condor_q
  • Now, you must use condor_history to view the
    job's ClassAd
  • The status field (ST) will have either a "C"
    for completed, or an "X" if the job was removed
    with condor_rm

32
Any questions?
  • Nothing is too basic
  • If I was unclear, you probably are not the only
    person who doesn't understand, and the rest of
    the day will be even more confusing

33
Hands-On Exercise 1: Submitting and Monitoring a
Simple Test Job
34
Hands-On Exercise 1
  • Log in to your machine as user condor
  • You will see two windows
  • Netscape, with instructions
  • An xterm, where you execute commands
  • To begin, click on Simple Test Job
  • Please follow the directions carefully
  • Any lines beginning with are commands that you
    should execute in your xterm
  • If you accidentally exit Netscape, click on
    Tutorial in the Start menu

35
Lunch break
  • Please be back by 13:30

36
Welcome Back
37
Classified Advertisements
  • ClassAds
  • Language for expressing attributes
  • Semantics for evaluating them
  • Intuitively, a ClassAd is a set of named
    expressions
  • Each named expression is an attribute
  • Expressions are similar to C
  • Constants, attribute references, operators

38
Classified Advertisements: Example
  • MyType = "Machine"
  • TargetType = "Job"
  • Name = "froth.cs.wisc.edu"
  • StartdIpAddr = "<128.105.73.44:33846>"
  • Arch = "INTEL"
  • OpSys = "SOLARIS26"
  • VirtualMemory = 225312
  • Disk = 35957
  • KFlops = 21058
  • Mips = 103
  • LoadAvg = 0.011719
  • KeyboardIdle = 12
  • Cpus = 1
  • Memory = 128
  • Requirements = LoadAvg < 0.300000 &&
    KeyboardIdle > 15 * 60
  • Rank = 0

39
Classified Advertisements: Matching
  • ClassAds are always considered in pairs
  • Does ClassAd A match ClassAd B (and vice versa)?
  • This is called 2-way matching
  • If the same attribute appears in both ClassAds,
    you can specify which attribute you mean by
    putting MY. or TARGET. in front of the
    attribute name

40
Classified Advertisements: Examples
  • ClassAd A
  • MyType = "Apartment"
  • TargetType = "ApartmentRenter"
  • SquareArea = 3500
  • RentOffer = 1000
  • HeatIncluded = False
  • OnBusLine = True
  • Rank = UnderGrad==False +
    TARGET.RentOffer
  • Requirements = MY.RentOffer -
    TARGET.RentOffer < 150
  • ClassAd B
  • MyType = "ApartmentRenter"
  • TargetType = "Apartment"
  • UnderGrad = False
  • RentOffer = 900
  • Rank = 1/(TARGET.RentOffer + 100.0) +
    50*HeatIncluded
  • Requirements = OnBusLine &&
    SquareArea > 2700
41
ClassAds in the Condor System
  • ClassAds allow Condor to be a general system
  • Constraints and ranks on matches expressed by the
    entities themselves
  • Only priority logic integrated into the
    Match-Maker
  • All principal entities in the Condor system are
    represented by ClassAds
  • Machines, Jobs, Submitters

42
ClassAds in Condor: Requirements and
Rank (Example for Machines)
  • Friend = Owner == "tannenba" || Owner == "wright"
  • ResearchGroup = Owner == "jbasney" || Owner ==
    "raman"
  • Trusted = Owner != "rival" && Owner !=
    "riffraff"
  • Requirements = Trusted && ( ResearchGroup ||
    (LoadAvg < 0.3 && KeyboardIdle > 15*60) )
  • Rank = Friend + ResearchGroup*10

43
Requirements for Machine Example Described
  • Machine will never start a job submitted by
    rival or riffraff
  • If someone from ResearchGroup (jbasney or
    raman) submits a job, it will always run,
    regardless of keyboard activity or load average
  • If anyone else submits a job, it will only run
    here if the keyboard has been idle for more than
    15 minutes and the load average is less than 0.3

44
Machine Rank Example Described
  • If the machine is running a job submitted by
    owner foo, it will give this a Rank of 0, since
    foo is neither a friend nor in the same research
    group
  • If wright or tannenba submits a job, it will
    be ranked at 1 (since Friend will evaluate to 1
    and ResearchGroup is 0)
  • If raman or jbasney submits a job, it will
    have a rank of 10
  • While a machine is running a job, it will be
    preempted for a higher ranked job
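
The machine policy described on the last two slides can be checked with a small Python model. It is a sketch under stated assumptions: the function machine_policy is hypothetical, the expressions are read as ordinary boolean logic (Trusted AND (ResearchGroup OR (low load AND idle keyboard))), and KeyboardIdle is taken to be in seconds.

```python
def machine_policy(owner, load_avg, keyboard_idle):
    """Return (requirements, rank) for a job submitted by `owner`."""
    friend = owner in ("tannenba", "wright")
    research_group = owner in ("jbasney", "raman")
    trusted = owner not in ("rival", "riffraff")
    # Research group jobs always run; others need an idle, lightly loaded machine.
    requirements = trusted and (
        research_group or (load_avg < 0.3 and keyboard_idle > 15 * 60))
    # Booleans count as 0 or 1, so friends rank 1 and the research group 10.
    rank = int(friend) + int(research_group) * 10
    return requirements, rank

print(machine_policy("raman", 2.0, 0))      # (True, 10)
print(machine_policy("wright", 0.1, 1200))  # (True, 1)
print(machine_policy("foo", 0.1, 60))       # (False, 0)
print(machine_policy("rival", 0.0, 99999))  # (False, 0)
```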

45
ClassAds in Condor: Requirements and
Rank (Example for Jobs)
  • Requirements = Arch == "INTEL" && OpSys ==
    "LINUX" && Memory > 20
  • Rank = (Memory > 32) * ( (Memory * 100) +
    (IsDedicated * 10000) + Mips )

46
Job Example Described
  • The job must run on an Intel CPU, running Linux,
    with at least 20 megs of RAM
  • All machines with 32 megs of RAM or less are
    Ranked at 0
  • Machines with more than 32 megs of RAM are ranked
    according to how much RAM they have, if the
    machine is dedicated (which counts a lot to this
    job!), and how fast the machine is, as measured
    in Million Instructions Per Second
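
To make the arithmetic of the job's Rank concrete, here is a small Python model. It assumes the Rank expression multiplies the (Memory > 32) test by the weighted sum of Memory, IsDedicated and Mips, which is the reading that agrees with the description above; the function names are hypothetical.

```python
def job_requirements(arch, opsys, memory):
    """True if the machine is an Intel Linux box with more than 20 MB of RAM."""
    return arch == "INTEL" and opsys == "LINUX" and memory > 20

def job_rank(memory, mips, is_dedicated):
    """Rank a machine; anything with 32 MB or less scores 0."""
    # (memory > 32) is 0 or 1, so it switches the whole sum on or off.
    return (memory > 32) * ((memory * 100) + (is_dedicated * 10000) + mips)

print(job_rank(32, 103, True))    # 0
print(job_rank(64, 103, False))   # 6503
print(job_rank(64, 103, True))    # 16503
```

The 10000-point bonus for a dedicated machine outweighs 100 MB of extra memory, which is what "counts a lot to this job" means in practice.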

47
Finding and Using the ClassAd Attributes in your
Pool
  • Condor defines a number of attributes by default,
    which are listed in the User Manual (About
    Requirements and Rank)
  • To see if machines in your pool have other
    attributes defined, use
  • condor_status -long <hostname>
  • A custom-defined attribute might not be defined
    on all machines in your pool, so you'll probably
    want to use meta-operators

48
ClassAd Meta-Operators
  • Meta operators allow you to compare against
    UNDEFINED as if it were a real value
  • =?= is meta-equal-to
  • =!= is meta-not-equal-to
  • Color != "Red" (non-meta) would evaluate to
    UNDEFINED if Color is not defined
  • Color =!= "Red" would evaluate to True if Color
    is not defined, since UNDEFINED is not "Red"
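
The difference between the ordinary and the meta comparison can be modeled in a few lines of Python. This is a simplified sketch, not the ClassAd implementation: UNDEFINED is represented by a sentinel object, the function names are hypothetical, and details such as type strictness are ignored.

```python
class _Undefined:
    def __repr__(self):
        return "UNDEFINED"

UNDEFINED = _Undefined()

def not_equal(a, b):
    """Model of the ordinary != : UNDEFINED in, UNDEFINED out."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a != b

def meta_equal(a, b):
    """Model of =?= : UNDEFINED is compared as if it were a real value."""
    if a is UNDEFINED or b is UNDEFINED:
        return a is b
    return a == b

def meta_not_equal(a, b):
    """Model of =!= : the negation of =?=."""
    return not meta_equal(a, b)

machine_ad = {"Arch": "INTEL"}              # this machine defines no Color
color = machine_ad.get("Color", UNDEFINED)
print(not_equal(color, "Red"))       # UNDEFINED
print(meta_not_equal(color, "Red"))  # True
```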

49
Hands-On Exercise 2: Submitting Jobs with
Requirements and Rank
50
Hands-On Exercise 2
  • Please point your browser to the new
    instructions
  • Go back to the tutorial homepage
  • Click on Requirements and Rank
  • Again, read the instructions carefully and
    execute any commands on a line beginning with
    in your xterm
  • If you exited Netscape, just click on Tutorial
    from your Start menu

51
Priorities In Condor
  • Two kinds of priorities
  • User Priorities
  • Priorities between users in the pool to ensure
    fairness
  • The lower the value, the better the priority
  • Job Priorities
  • Priorities that users give to their own jobs to
    determine the order in which they will run
  • The higher the value, the better the priority
  • Only matters within a given user's jobs

52
User Priorities in Condor
  • Each active user in the pool has a user priority
  • Viewed or changed with condor_userprio
  • The lower the number, the better
  • A given user's share of available machines is
    inversely related to the ratio between user
    priorities.
  • Example: Fred's priority is 10, Joe's is 20.
    Fred will be allocated twice as many machines as
    Joe.
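
The Fred and Joe example works out as follows if each user's share is taken to be proportional to 1/priority. That proportional split is an assumption that reproduces the two-to-one ratio on the slide; it ignores how priorities change over time, and the function name machine_shares and the pool size of 30 are made up for the illustration.

```python
from fractions import Fraction

def machine_shares(priorities, machines):
    """Split `machines` in proportion to 1/priority (lower value, bigger share).

    `priorities` maps user name to an integer priority value.
    """
    weights = {user: Fraction(1, prio) for user, prio in priorities.items()}
    total = sum(weights.values())
    return {user: float(machines * w / total) for user, w in weights.items()}

shares = machine_shares({"fred": 10, "joe": 20}, machines=30)
print(shares)  # {'fred': 20.0, 'joe': 10.0}
```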

53
User Priorities in Condor, cont.
  • Condor continuously adjusts user priorities over
    time
  • machines allocated > priority, priority worsens
  • machines allocated < priority, priority improves
  • Priority Preemption
  • Higher priority users will grab machines away
    from lower priority users (thanks to
    Checkpointing)
  • Starvation is prevented
  • Priority thrashing is prevented

54
Job Priorities in Condor
  • Can be set at submit-time in your description
    file with
  • prio = <number>
  • Can be viewed with condor_q
  • Can be changed at any time with condor_prio
  • The higher the number, the more likely the job
    will run (only among the jobs of an individual
    user)

55
Managing a Large Cluster of Jobs
  • Condor can manage huge numbers of jobs
  • Special features of the submit description file
    make this easier
  • Condor can also manage inter-job dependencies
    with condor_dagman
  • For example: job A should run first, then run
    jobs B and C; when those finish, submit D, etc.
  • We'll discuss DAGMan later

56
Submitting a Large Cluster
  • Anywhere in your submit file, if you use
    $(Process), that will expand to the process
    number of each job in the cluster
  • input = my_input.$(process)
  • arguments = $(process)
  • It is common to use $(Process) to specify
    InitialDir, so that each process runs in its own
    directory
  • InitialDir = dir.$(process)

57
Submitting a Large Cluster (cont'd)
  • Can either have multiple Queue entries, or put a
    number after Queue to tell Condor how many to
    submit
  • Queue 1000
  • A cluster is more efficient: your jobs will run
    faster, and they'll use less space
  • Can only have one executable per cluster.
    Different executables must be different clusters!

58
Hands-On Exercise 3: Submitting a Large Cluster
of Jobs
59
Hands-On Exercise 3
  • Please point your browser to the new
    instructions
  • Go back to the tutorial homepage
  • Click on Large Clusters
  • Again, read the instructions carefully and
    execute any commands on a line beginning with
    in your xterm
  • If you exited Netscape, just click on Tutorial
    from your Start menu

60
10 Minute Break
  • Questions are welcome.

61
Inter-Job Dependencies with DAGMan
  • DAGMan can be used to handle a set of jobs that
    must be run in a certain order
  • Also provides pre and post operations, so you
    can have a program or script run before each job
    is submitted and after it completes
  • Robust: handles errors and submit-machine crashes

62
Using DAGMan
  • You define a DAG description file, which is
    similar in function to the submit file you give
    to condor_submit
  • DAGMan restrictions
  • Each job in the DAG must be in its own cluster
    (this is a limitation we will remove in future
    versions)
  • All jobs in the DAG must have a User Log and must
    share the same file

63
Format of the DAGMan Description File
  • # is a comment
  • First section names the jobs in your DAG and
    associates a submit description file with each
    job
  • Second (optional) section defines PRE and POST
    scripts to run
  • Final section defines the job dependencies

64
Example DAGMan Description File
# Example DAGMan input file
Job A A.submit
Job B B.submit
Job C C.submit
Job D D.submit
Script PRE D d_input_checker
Script POST A a_output_processor A.out
PARENT A CHILD B C
PARENT B C CHILD D
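
To see the order this example DAG implies, the Job and PARENT/CHILD lines can be fed to a topological sort. The Python sketch below is not part of DAGMan: parse_dag and run_order are hypothetical helpers, the Script lines are ignored, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

DAG_TEXT = """\
# Example DAGMan input file
Job A A.submit
Job B B.submit
Job C C.submit
Job D D.submit
Script PRE D d_input_checker
Script POST A a_output_processor A.out
PARENT A CHILD B C
PARENT B C CHILD D
"""

def parse_dag(text):
    """Return {job: set_of_parents} built from Job and PARENT/CHILD lines."""
    parents_of = {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = words[0].upper()
        if keyword == "JOB":
            parents_of.setdefault(words[1], set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents_of.setdefault(child, set()).update(words[1:split])
    return parents_of

def run_order(parents_of):
    """Group jobs into waves; a job appears only after all of its parents."""
    sorter = TopologicalSorter(parents_of)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(run_order(parse_dag(DAG_TEXT)))  # [['A'], ['B', 'C'], ['D']]
```

Job A runs alone, B and C can then run side by side, and D waits for both of them.
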
65
Setting up a DAG for Condor
  • Must create the DAG description file
  • Must create all the submit description files for
    the individual jobs
  • Must prepare any executables you plan to use
  • If you want, you can have a mix of Vanilla and
    Standard jobs
  • Must set up any PRE/POST commands or scripts you
    wish to use

66
Submitting a DAG to Condor
  • Once you have everything in place, to submit a
    DAG, you use condor_submit_dag and give it the
    name of your DAG description file
  • This will check your input file for errors and
    submit a copy of condor_dagman as a scheduler
    universe job with all the necessary command-line
    arguments

67
Removing a DAG
  • Removing a DAG is easy
  • Just use condor_rm on the scheduler universe job
    (condor_dagman)
  • On shutdown, DAGMan will remove any jobs that are
    currently in the queue that are associated with
    its DAG
  • Once all jobs are gone, DAGMan itself will exit,
    and the scheduler universe job will be removed
    from the queue

68
Hands-On Exercise 4: Using DAGMan
69
Hands-On Exercise 4
  • Please point your browser to the new
    instructions
  • Go back to the tutorial homepage
  • Click on Using_DAGMan
  • Again, read the instructions carefully and
    execute any commands on a line beginning with
    in your xterm
  • If you exited Netscape, just click on Tutorial
    from your Start menu

70
What's Wrong with my Vanilla Job?
  • Special requirements expressions for vanilla jobs
  • You didn't submit it from a directory that is
    shared
  • Condor isn't running as root (more on this later)
  • You don't have your file permissions set up
    correctly (more on this later)

71
Special Requirements Expressions for Vanilla Jobs
  • When you submit a vanilla job, Condor
    automatically appends two extra Requirements
  • UID_DOMAIN == <submit_uid_domain>
  • FILESYSTEM_DOMAIN == <submit_fs>
  • Since there are no remote system calls with
    Vanilla jobs, they depend on a shared file system
    and a common UID space to run as you and access
    your files

72
Special Requirements Expressions for Vanilla Jobs
  • By default, each machine in your pool is in its
    own UID_DOMAIN and FILESYSTEM_DOMAIN, so your
    pool administrator has to configure your pool
    specially if there really is a common UID space
    and a network file system
  • If you don't have an account on the remote
    system, Vanilla jobs won't work

73
Shared File Systems for Vanilla Jobs
  • Just because you have AFS or NFS doesn't mean ALL
    files are shared
  • Initialdir = /tmp will probably cause trouble for
    Vanilla jobs!
  • You must be sure to set Initialdir to a shared
    directory (or cd into it to run condor_submit)
    for Vanilla jobs

74
Why Don't My Jobs Run?
  • Try using condor_q -analyze
  • Try specifying a User Log for your job
  • Look at condor_userprio: maybe you have a bad
    priority and higher priority users are being
    served
  • Problems with file permissions or network file
    systems
  • Look at the SchedLog

75
Using condor_q -analyze
  • condor_q -analyze will analyze your job's
    ClassAd, get all the ClassAds of the machines in
    the pool, and tell you what's going on
  • Will report errors in your Requirements
    expression (impossible to match, etc.)
  • Will tell you about user priorities in the pool
    (other people have better priority)

76
Looking at condor_userprio
  • You can look at condor_userprio yourself
  • If your priority value is a really high number
    (because you've been running a lot of Condor
    jobs), other users will have priority to run jobs
    in your pool

77
File Permissions in Condor
  • If Condor isn't running as root, the
    condor_shadow process runs as the user the
    condor_schedd is running as (usually condor)
  • You must grant this user write access to your
    output files, and read access to your input files
    (both STDOUT, STDIN from your submit file, as
    well as files your job explicitly opens)

78
File Permissions in Condor (cont'd)
  • Often, there will be a condor group and you can
    make your files owned and write-able by this
    group
  • For vanilla jobs, even if the UID_DOMAIN setting
    is correct, and they match for your submit and
    execute machines, if Condor isn't running as
    root, your job will be started as user Condor,
    not as you!

79
Problems with NFS in Condor
  • For NFS, sometimes the administrators will set up
    read-only mounts, or have UIDs remapped for
    certain partitions (the classic example is root
    -> nobody, but modern NFS can do arbitrary
    remappings)

80
Problems with NFS in Condor (cont'd)
  • If your pool uses NFS automounting, the directory
    that Condor thinks is your InitialDir (the
    directory you were in when you ran condor_submit)
    might not exist on a remote machine
  • E.g. you're in /mnt/tmp/home/me/...
  • With automounting, you always need to specify
    InitialDir explicitly
  • InitialDir = /home/me/...

81
Problems with AFS in Condor
  • If your pool uses AFS, the condor_shadow, even if
    it's running with your UID, will not have your
    AFS token
  • You must grant an unauthenticated AFS user the
    appropriate access to your files
  • Some sites provide a better alternative than
    world-writable files
  • Host ACLs
  • Network-specific ACLs

82
Looking at the SchedLog
  • Looking at the log file of the condor_schedd, the
    SchedLog file can possibly give you a clue if
    there are problems
  • Find it with
  • condor_config_val schedd_log
  • You might need your pool administrator to turn on
    a higher debugging level to see more verbose
    output

83
Other User Features
  • Submit-Only installation
  • Heterogeneous Submit
  • PVM jobs

84
Submit-Only Installation
  • Can install just a condor_master and
    condor_schedd on your machine
  • Can submit jobs into a remote pool
  • Special option to condor_install

85
Heterogeneous Submit
  • The job you submit doesn't have to be the same
    platform as the machine you submit from
  • Maybe you have access to a pool that's full of
    Alphas, but you have a Sparc on your desk, and
    moving all your data is a pain
  • You can take an Alpha binary, copy it to your
    Sparc, and submit it with a requirements
    expression that says you need to run on ALPHA/OSF1

86
Parallel Jobs in Condor
  • Condor can run parallel applications
  • Written to the popular PVM message passing
    library
  • Future work includes support for MPI
  • Master-Worker Paradigm
  • What does Condor-PVM do?
  • How to compile and submit Condor-PVM jobs

87
Master-Worker Paradigm
  • Condor-PVM is designed to run PVM applications
    which follow the master-worker paradigm.
  • Master
  • has a pool of work, sends pieces of work to the
    workers, manages the work and the workers
  • Worker
  • gets a piece of work, does the computation, sends
    the result back

88
What does Condor-PVM do?
  • Condor acts as the PVM resource manager.
  • All pvm_addhost requests get re-mapped to Condor.
  • Condor dynamically constructs PVM virtual
    machines out of non-dedicated desktop machines.
  • When a machine leaves the pool, the user gets
    notified via the normal PVM notification
    mechanisms.

89
How to compile and submit Condor-PVM jobs
  • Binary Compatible
  • Compile and link with PVM library just as normal
    PVM applications. No need to link with Condor.
  • Submit
  • In the submit description file, set
  • universe = PVM
  • machine_count = <min>..<max>

90
Obtaining Condor
  • Condor can be downloaded from the Condor web site
    at
  • http://www.cs.wisc.edu/condor
  • Complete User's and Administrator's manual
    available
  • http://www.cs.wisc.edu/condor/manual
  • Contracted Support is available
  • Questions? Email
  • condor-admin@cs.wisc.edu