Title: Lessons Learned in the Purdue TeraGrid Condor Pools, With an Adventure in Lightweight Adaptation

1. Lessons Learned in the Purdue TeraGrid Condor Pools, With an Adventure in Lightweight Adaptation
P. A. Cheeseman (aai_at_purdue.edu), Preston Smith (psmith_at_purdue.edu)
Rosen Center for Advanced Computing, Purdue University
2. Purdue Condor Pools
- Rosen Center Clusters
  - Condor backfills among idle nodes in PBS clusters
  - Provided 5.5 million CPU-hours in 2006, all from idle nodes in clusters
  - Nature of Purdue pools makes for a non-trivial chance of job eviction. More on this later.
- Campus
  - Idle labs
  - Departments around campus
3. Purdue TeraGrid
- All in all, 6400 CPUs available!
- Use on TeraGrid
  - 2.4 million hours in 2006 spent
    - Building a database of hypothetical zeolite structures
    - Solving the Football Pool Problem
  - Already in 2007, 5.5 million hours allocated
    - 4th largest single award in the March allocations meeting
- Condor provides TeraGrid unparalleled price/cycle
  - Similar throughput, in terms of hours serviced, as Cray XT3, DataStar, etc., for much less cost
4. Purdue TeraGrid - Challenges
- Usage reporting
  - TeraGrid uploads per-job usage nightly. This proved challenging to collect with Condor.
  - Perl scripting and data massaging to process history files and inject data into a database (see the sketch below).
  - Usage reporting infrastructure (AMIE) unable to keep up with the deluge of job records.
    - But that's TeraGrid's issue, not Condor's.
  - TG implemented a temporary solution; usage reporting is now up to date.
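The production scripts were Perl; the following is a minimal Python sketch of the same idea, with a hypothetical output database and table layout and only a handful of the attributes actually collected.

    import sqlite3
    import subprocess

    # Attributes pulled from each job record; all are standard Condor ClassAd names.
    ATTRS = ("ClusterId", "ProcId", "Owner", "RemoteWallClockTime", "CumulativeSuspensionTime")

    def parse_long_history(text):
        """Split `condor_history -l` output into per-job attribute dictionaries."""
        for record in text.strip().split("\n\n"):
            job = {}
            for line in record.splitlines():
                if " = " in line:
                    key, value = line.split(" = ", 1)
                    job[key.strip()] = value.strip().strip('"')
            if job:
                yield job

    def load_history(db_path="usage.db"):
        """Dump the local schedd's history and stash selected attributes in SQLite."""
        out = subprocess.run(["condor_history", "-l"],
                             capture_output=True, text=True, check=True)
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE IF NOT EXISTS jobs (cluster, proc, owner, wall, susp)")
        for job in parse_long_history(out.stdout):
            conn.execute("INSERT INTO jobs VALUES (?, ?, ?, ?, ?)",
                         [job.get(a) for a in ATTRS])
        conn.commit()
        conn.close()

    if __name__ == "__main__":
        load_history()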
5. Purdue TeraGrid - Usage Reporting
- Detective work: learning how to determine accurate job time
  - RemoteWallClockTime - CumulativeSuspensionTime
    - Not so useful for charged time (such as on TeraGrid)
    - Requires manually computing the difference of completion time and last start time (see the sketch below).
  - Occasional bugs
    - Negative walltime numbers (or really large ones)
    - Usually in a job that has been condor_rm'd
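A sketch of that calculation over one job record (a dict of ClassAd attributes such as those parsed above); the attribute names are standard Condor ones, but the fallback order and the sanity bounds are assumptions.

    def charged_seconds(job):
        """Best-effort charged wall time for one job, in seconds, or None if suspect."""
        completion = int(float(job.get("CompletionDate", 0)))
        last_start = int(float(job.get("JobCurrentStartDate", job.get("JobStartDate", 0))))
        if completion and last_start:
            seconds = completion - last_start        # completion time minus last start time
        else:
            seconds = (int(float(job.get("RemoteWallClockTime", 0)))
                       - int(float(job.get("CumulativeSuspensionTime", 0))))
        # Occasional bugs: negative (or absurdly large) walltimes, usually condor_rm'd jobs.
        if seconds < 0 or seconds > 365 * 24 * 3600:
            return None
        return seconds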
6. More on Usage Reporting
- RCAC tracks job-level history. Similar history processing scripts are used in the campus grid as for TeraGrid.
- Difficult to locate every schedd and grab history from it
  - Even more complicated when some schedds whose usage we want to account for are under different administration.
  - Skate around with ssh keys to collect history files (see the sketch below).
- We would love a centralized method to gather or record job history
  - Or condor_history outputting XML or GGF usage records...
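Lacking a central history service, collection amounted to walking a list of schedd hosts; a sketch assuming password-less ssh keys are already in place (hostnames and output directory hypothetical).

    import os
    import subprocess

    SCHEDD_HOSTS = ["schedd01.example.edu", "schedd02.example.edu"]  # hypothetical

    def collect_histories(outdir="history_dumps"):
        """Run condor_history -l on each schedd host over ssh and save the raw dump."""
        os.makedirs(outdir, exist_ok=True)
        for host in SCHEDD_HOSTS:
            result = subprocess.run(["ssh", host, "condor_history", "-l"],
                                    capture_output=True, text=True, check=True)
            with open(os.path.join(outdir, host + ".history"), "w") as fh:
                fh.write(result.stdout)

    if __name__ == "__main__":
        collect_histories()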
7. TeraGrid Projects
- Prof. Keith Cherkauer (Purdue University): Hydrologic simulations; continuing enterprise, reasonably predictable impact. I/O to CPU on the order of 50-200 MB/hour. File system saturation a strong possibility.
- Prof. M. W. Deem (Rice University) / Prof. D. J. Earl (University of Pittsburgh): Hypothetical zeolite structures; Monte Carlo computation. Average time per set of 1 hour with broad variance. I/O to time on the order of 1-2 MB/hour on average.
8. Lesson Learned Early - Cherkauer
- Reality of leverage
  - 500 jobs at 50-200 MB/hour can keep a single file system very busy. The Cherkauer application was identifiably a problem for a particular parallel filesystem that shall not be named, at more than 200 jobs in simultaneous execution (10 GB/hour minimum).
  - Problems were resolved by conversion to standard universe to enable longer duration and fewer jobs.
    - Eliminated system() calls.
    - Added code to locate data files per search path (see the sketch below).
    - Resulting code was usable under both vanilla and standard universes. Production runs presently being done in standard universe with file transfer.
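The search-path change itself lives inside the simulation code; the idea, sketched here in Python with a hypothetical DATA_PATH environment variable, is simply to resolve data files against a list of directories instead of shelling out.

    import os

    def find_data_file(name, search_path=None):
        """Return the first match for `name` along a colon-separated search path."""
        dirs = (search_path or os.environ.get("DATA_PATH", ".")).split(":")
        for d in dirs:
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate):
                return candidate
        raise FileNotFoundError(name + " not found on search path " + ":".join(dirs))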
9. Outcome - Cherkauer
- Procedures developed to set up submissions allow for jobs to be queued in digestible batches.
- Procedures in full automatic mode could be used to complete an entire problem while handling remote archive of results to avoid file system issues.
- Computation known to require a month or more of time now completes in less than a day.
10. Database of Hypothetical Zeolite Structures
- Prototyping
  - Trial group of 100 parameter sets was used to prototype.
  - Initial live data group was 6707 parameter sets.
  - Set processed by executing program 100 times (cycles).
  - Execution of application performed by script in vanilla universe. Script allowed self-checkpoint capability and duration control.
- Prototyping Observations
  - Early delivery rates of 7200 hours/day easily achieved.
  - Ultimate number of sets to process was not well known. First estimate of 500,000 grew to 2,900,000 by 2007/02.
  - Eviction rates were unacceptably high (see Figure 1).
11. Figure 1
12. Database of Hypothetical Zeolite Structures
- Prototyping Observations
  - Compute times per set variable, from minutes to several hours (see Figures 2 and 3).
  - Execution speed strongly related to compiler. Intel compilers were known in advance to produce significantly faster code.
13. Database of Hypothetical Zeolite Structures
- Adaptation Issues
  - Limiting job duration to eliminate runaways and limit eviction.
  - Increasing small job duration to lower overhead of handling.
  - Preemption tolerance (self-checkpoint).
  - Fault tolerance.
  - Many issues were initially addressed via the execution script. Adaptation to standard universe was thought to be a must.
14. Database of Hypothetical Zeolite Structures
- Workflow
  - Groups delivered via HTTP from Prof. Earl's web site.
  - Sets per group ranged from 7000 to 30,000.
  - Results returned to Prof. Earl via a drop zone in archival storage for post-analysis until approximately 10/2006. Post-analysis was subsequently handled at Purdue in Condor.
- Processing at Purdue
  - Steward procedures developed to feed jobs to Condor, monitor progress, validate results, resubmit unanticipated failure cases, and archive results for the group.
  - Stewards were designed to process a group in batches of 2000 sets to allow processing within 6-8 GB of volatile storage (see the sketch below).
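A minimal sketch of that feeding loop; the submit file contents, executable name, and the validate/resubmit step are hypothetical stand-ins for the real steward procedures, and the `queue ... from` syntax assumes a reasonably recent HTCondor.

    import subprocess
    from pathlib import Path

    BATCH_SIZE = 2000   # sets per batch, sized so a batch fits in 6-8 GB of volatile storage

    SUBMIT_LINES = [
        "universe   = vanilla",
        "executable = zeolite_mc",          # hypothetical executable name
        "arguments  = $(set_id)",
        "log        = batch.log",
        "output     = $(set_id).out",
        "error      = $(set_id).err",
        "queue set_id from sets.txt",       # needs a reasonably recent HTCondor
    ]

    def run_batches(parameter_sets, workdir="work"):
        """Feed a group to Condor in digestible batches and wait for each to drain."""
        for start in range(0, len(parameter_sets), BATCH_SIZE):
            batch = parameter_sets[start:start + BATCH_SIZE]
            batch_dir = Path(workdir) / ("batch_%04d" % (start // BATCH_SIZE))
            batch_dir.mkdir(parents=True, exist_ok=True)
            (batch_dir / "sets.txt").write_text("\n".join(batch) + "\n")
            (batch_dir / "batch.sub").write_text("\n".join(SUBMIT_LINES) + "\n")
            subprocess.run(["condor_submit", "batch.sub"], cwd=batch_dir, check=True)
            subprocess.run(["condor_wait", "batch.log"], cwd=batch_dir, check=True)
            # Stand-in validation: any set without an output file needs resubmission.
            missing = [s for s in batch if not (batch_dir / (s + ".out")).exists()]
            if missing:
                print("%s: %d sets need resubmission" % (batch_dir, len(missing)))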
15. Database of Hypothetical Zeolite Structures
- Adapting for the Purdue Condor Pools
  - Eliminate need for execution-side script, to pave the way for standard universe execution.
  - Incorporate repetitive execution within the core application. Addresses overhead of the execution-side script and multiple loads of the core application; enables transition to standard universe.
  - Introduce self-imposed timing controls. Addresses inability to identify runaways among 1000s of jobs.
  - Embed reasonable self-checkpoint capability. Addresses both preemption and fault tolerance.
  - Introduce ability to tune average job duration to Condor pool conditions. Addresses the eviction rate problem. (The checkpoint/duration pattern is sketched below.)
  - Any other code work required to achieve the points above. Some memory management work was expected.
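The real changes were made inside the core application; the pattern is simple enough to sketch in Python, with file names, the time budget, and the exit-code convention all hypothetical: loop over cycles, checkpoint after each one, stop cleanly when a self-imposed time budget expires, and resume from the checkpoint on the next start.

    import json
    import os
    import time

    CHECKPOINT   = "state.json"     # hypothetical checkpoint file, shipped back with the job
    TIME_BUDGET  = 4 * 3600         # tunable knob for average job duration (seconds)
    TOTAL_CYCLES = 100              # cycles per parameter set

    def do_one_cycle(state):
        """Stand-in for one Monte Carlo cycle of the real application."""
        time.sleep(1)

    def load_state():
        if os.path.exists(CHECKPOINT):
            with open(CHECKPOINT) as fh:
                return json.load(fh)
        return {"cycle": 0}

    def save_state(state):
        # Write-then-rename so an eviction mid-write cannot corrupt the checkpoint.
        with open(CHECKPOINT + ".tmp", "w") as fh:
            json.dump(state, fh)
        os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

    def run():
        start = time.time()
        state = load_state()
        while state["cycle"] < TOTAL_CYCLES:
            do_one_cycle(state)
            state["cycle"] += 1
            save_state(state)
            if time.time() - start > TIME_BUDGET:   # self-imposed duration control
                return 1    # hypothetical "not finished, resubmit me" exit status
        return 0            # all cycles complete

    if __name__ == "__main__":
        raise SystemExit(run())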
16. Database of Hypothetical Zeolite Structures
- Notes on Condor Adaptation
  - Written adaptation plan reviewed by all concerned parties.
  - Adaptation work undertaken while production continued.
  - Modifications to existing code plus new code: ~30 routines.
  - Several hundred lines of non-commentary code written.
  - Code revisions validated periodically by textual comparison of result files for 100 parameter sets from a control case (see the sketch below).
  - Adaptation period spanned compiler version changes.
  - Adapted code became the production version 09/2006.
    - Approximately 325,000 sets were completed before the adapted code entered production.
  - Code adaptation mandated changes to steward procedures.
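A sketch of that comparison: result files produced by the revised code are compared byte-for-byte against the saved control-case outputs (directory names hypothetical).

    import filecmp
    from pathlib import Path

    def validate_against_control(new_dir="results_new", control_dir="results_control"):
        """Return the control-case result files that are missing or differ textually."""
        mismatches = []
        for control_file in sorted(Path(control_dir).iterdir()):
            candidate = Path(new_dir) / control_file.name
            if not candidate.exists() or not filecmp.cmp(control_file, candidate, shallow=False):
                mismatches.append(control_file.name)
        return mismatches

    if __name__ == "__main__":
        bad = validate_against_control()
        print("OK" if not bad else "%d files differ: %s" % (len(bad), bad[:5]))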
17. Database of Hypothetical Zeolite Structures
- After adapting to Condor
  - Execution times became manageable (see Figures 4 and 5).
  - Eviction rates fell to more controllable ratios (Figure 6).
  - Workflow became more automatic with the ability to limit job duration (and exposure to various system hiccups). Recovering the loss of a few hundred short jobs was easier than recovering the loss of the same number of long jobs.
  - Application could run equally well in either the standard or the vanilla universe, due to duration control.
  - Ultimate choice was to remain in the vanilla universe to continue using the Intel V9 compiler suite.
  - Front-end load due to steward procedures was reduced, owing to less handling of intermediate semaphore and lock files.
18. Figure 2, Figure 3, Figure 4, Figure 5
19. Database of Hypothetical Zeolite Structures
Figure 6
20. Database of Hypothetical Zeolite Structures
- Project to Date
  - 1.5 million sets processed since 2006/02, including dry spells due to delays in workflow, exhaustion of allocation, and processing of renewal.
  - 2.4 million hours officially delivered to the project since 2006/02, or 250 hours per hour excluding dry spells.
  - Most recent throughput delivered 96,000 hours in a 226-hour time span, or 400 processor-hours per hour.
  - Entire collaboration continues exclusively via e-mail.
  - Approximately 1.4 million sets remain.
21. Database of Hypothetical Zeolite Structures
- Miscellany
  - Throughout the project, emphasis was given to designing stewards to return results to Prof. Earl in the same arrangement as they were delivered, to ease post-analysis. Revision of the data structure was never seriously undertaken, nor seen to be necessary.
  - While getting jobs into execution was the initial primary concern, the bulk of the work in the stewards ultimately centered on automating the handling of results.
  - Various working file systems were tried during production. The present procedures operate using volatile storage for active computation, high-capacity storage for staging, and long-term (tape robot) storage for archival.
22. Database of Hypothetical Zeolite Structures
- More Miscellany
  - Core of the steward procedures is composed of fewer than 10 scripts.
  - Many additional scripts written to gather post-mortem data w.r.t. job cost, fault statistics, exhaustive result validation, and summaries.
  - DAGs were explored as a job metering tool but deferred due to problems not well understood and the demands of production. Since the workflow didn't implicitly require DAG features, the production methods were retained until a solid reason for using DAGs could be discerned. Additionally, pre and post procedures were known to be an undertaking as demanding as developing the stewards.
  - Adaptation of the stewards to other batch systems was done more easily than expected. Batch systems for which prototypes were done included PBS, LSF, and LoadLeveler.
23. Required Plug
- TeraGrid '07
  - Right here at UW!
  - June 4-8, 2007
  - Full analysis of the zeolite application, plus other Condor work from Purdue, in the proceedings
  - Condor tutorial and demonstrations
  - Come join us, or even help!