1
Lessons Learned in the Purdue TeraGrid Condor Pools
With An Adventure in Light Weight Adaptation
P. A. Cheeseman (aai@purdue.edu)
Preston Smith (psmith@purdue.edu)
Rosen Center for Advanced Computing
Purdue University
2
Purdue Condor Pools
  • Rosen Center Clusters
  • Condor backfills among idle nodes in PBS clusters
  • Provided 5.5 million CPU-hours in 2006, all from
    idle nodes in clusters
  • Nature of Purdue pools makes for non-trivial
    chance of job eviction. More on this later.
  • Campus
  • Idle labs
  • Departments around campus

3
Purdue TeraGrid
  • All in all, 6400 CPUs available!
  • Use on TeraGrid
  • 2.4 million hours in 2006 spent building a
    database of hypothetical zeolite structures
  • Solving the Football Pool Problem
  • Already in 2007 5.5 million hours allocated
  • 4th largest single award in March allocations
    meeting
  • Condor provides TeraGrid unparalleled price/cycle
  • Similar throughput, in terms of hours serviced,
    to the Cray XT3, DataStar, etc., at much lower cost

4
Purdue TeraGrid - Challenges
  • Usage reporting
  • TeraGrid uploads per-job usage nightly. This
    proved challenging to collect with Condor
  • Perl scripting and data massaging to process
    history files and inject data into a database (a
    sketch of this kind of processing follows below).
  • Usage reporting infrastructure (AMIE) unable to
    keep up with the deluge of job records.
  • But that's TeraGrid's issue, not Condor's.
  • TG implemented a temporary solution - usage
    reporting is now up to date
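
As an illustration of what that nightly processing involves (the production scripts were Perl and are not reproduced in the talk), here is a minimal Python sketch, assuming condor_history's long (-l) ClassAd output and a hypothetical SQLite table named jobs:

# Sketch (not the production Perl): parse "condor_history -l" output and
# load a few per-job attributes into a SQLite table for nightly reporting.
# Table name and attribute selection are illustrative assumptions.
import sqlite3, subprocess

ATTRS = ("ClusterId", "ProcId", "Owner", "RemoteWallClockTime",
         "CumulativeSuspensionTime", "JobCurrentStartDate", "CompletionDate")

def parse_classads(text):
    """Yield one dict per job ad; ads in -l output are blank-line separated."""
    ad = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if ad:
                yield ad
                ad = {}
            continue
        if "=" in line:
            key, _, val = line.partition("=")
            ad[key.strip()] = val.strip().strip('"')
    if ad:
        yield ad

def main():
    out = subprocess.run(["condor_history", "-l"],
                         capture_output=True, text=True, check=True).stdout
    db = sqlite3.connect("usage.db")
    db.execute("CREATE TABLE IF NOT EXISTS jobs (%s)" % ",".join(ATTRS))
    for ad in parse_classads(out):
        db.execute("INSERT INTO jobs VALUES (%s)" % ",".join("?" * len(ATTRS)),
                   [ad.get(a) for a in ATTRS])
    db.commit()

if __name__ == "__main__":
    main()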

5
Purdue Teragrid - Usage Reporting
  • Detective work - learning how to determine accurate
    job time
  • RemoteWallClockTime - CumulativeSuspensionTime
  • Not so useful for charged time on allocated
    resources (such as TeraGrid)
  • Requires manually computing the difference between
    completion time and last start time (see the
    sketch after this list).
  • Occasional bugs
  • Negative walltime numbers (or really large ones)
  • Usually in a job that has been condor_rm'd
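
A minimal Python sketch of the two accounting rules above: RemoteWallClockTime and CumulativeSuspensionTime are the attributes named on this slide, while CompletionDate and JobCurrentStartDate are standard job-ad attributes used here illustratively, and the sanity guard reflects the negative or oversized walltimes mentioned above.

# Sketch: two ways to reckon a job's charged time from its history ClassAd.
# The dict 'ad' is assumed to hold integer epoch-second attributes.

def condor_walltime(ad):
    """Wallclock net of suspensions - fine for pool statistics."""
    return ad["RemoteWallClockTime"] - ad["CumulativeSuspensionTime"]

def charged_time(ad, max_sane=30 * 86400):
    """Charged time for allocation accounting (e.g. TeraGrid):
    completion time minus last start time, with a sanity guard
    against the negative/huge values seen on condor_rm'd jobs."""
    secs = ad["CompletionDate"] - ad["JobCurrentStartDate"]
    if secs < 0 or secs > max_sane:
        return None   # flag for manual inspection rather than charging it
    return secs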

6
More on Usage Reporting
  • RCAC tracks job-level history. Similar history
    processing scripts used in campus grid as for
    TeraGrid
  • Difficult to locate every schedd and grab history
    from it
  • Even more complicated when some of the schedds
    whose usage we want to account for are under
    different administration.
  • Skate around this with ssh keys to collect history
    files (see the sketch below).
  • We would love a centralized method to gather or
    record job history
  • Or condor_history outputting XML or GGF usage
    records...
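
A sketch of the ssh-key workaround, assuming passwordless keys to each submit host; the host names and output file are placeholders.

# Sketch: pull per-job history from several schedd hosts over ssh and
# concatenate it locally for the nightly processing pass.
import subprocess

SCHEDD_HOSTS = ["submit1.example.edu", "submit2.example.edu"]  # placeholders

def collect_history(hosts, outfile="all_schedds.history"):
    with open(outfile, "w") as out:
        for host in hosts:
            # Relies on an ssh key so the cron job can run unattended.
            result = subprocess.run(
                ["ssh", host, "condor_history", "-l"],
                capture_output=True, text=True, check=True)
            out.write(result.stdout)

if __name__ == "__main__":
    collect_history(SCHEDD_HOSTS)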

7
TeraGrid Projects
Prof. Keith Cherkauer (Purdue University)
Hydrologic simulations, continuing enterprise, reasonably
predictable impact. I/O to CPU on the order of 50-200
MB/hour. File system saturation a strong possibility.

Prof. M. W. Deem (Rice University)
Prof. D. J. Earl (University of Pittsburgh)
Hypothetical Zeolite Structures - Monte Carlo computation.
Average time per set of 1 hour with broad variance. I/O
to time on the order of 1-2 MB/hour on average.
8
Lesson Learned Early - Cherkauer
  • Reality of leverage.
  • 500 jobs at 50-200 MB/hour can keep a single
    file system very busy. The Cherkauer application
    was identifiably a problem for a particular
    parallel filesystem that shall not be named, at
    more than 200 jobs in simultaneous execution (10
    GB/hour minimum). Problems were resolved by
    conversion to standard universe to enable longer
    duration and fewer jobs.
  • Eliminated system() calls.
  • Added code to locate data files per search path
    (sketched below).
  • Resulting code was usable under both vanilla and
    standard universes. Production runs presently
    being done in standard universe with file
    transfer.
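
The actual search-path change was made inside the simulation code itself; purely as an illustration of the idea, a short Python sketch with hypothetical directory names:

# Sketch of a search-path lookup for input data files, so a job can find
# its inputs whether they arrive by Condor file transfer (the job's
# scratch directory) or sit on a shared file system. Names are illustrative.
import os

DATA_SEARCH_PATH = [".", os.environ.get("_CONDOR_SCRATCH_DIR", "."),
                    "/shared/cherkauer/forcing"]   # hypothetical locations

def locate_data_file(name, search_path=DATA_SEARCH_PATH):
    """Return the first existing copy of 'name' along the search path."""
    for directory in search_path:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError("%s not found on search path" % name)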

9
Outcome - Cherkauer
  • Procedures developed to set up submissions allow
    for jobs to be queued in digestible batches.
  • Procedures in full automatic mode could be
    used to complete an entire problem while
    handling remote archive of results to avoid file
    system issues.
  • Computation known to require a month or more of
    time now completes in less than a day.

10
Database of Hypothetical Zeolite Structures
  • Prototyping
  • Trial group of 100 parameter sets was used to
    prototype.
  • Initial live data group was 6707 parameter
    sets.
  • Set processed by executing program 100 times
    (cycles).
  • Execution of application performed by script
    in vanilla universe. Script allowed self-checkpoint
    capability and duration control (a wrapper of this
    flavor is sketched after this list).
  • Prototyping Observations
  • Early delivery rates of 7200 hours/day easily
    achieved.
  • Ultimate number of sets to process was not well
    known. First estimate of 500,000 grew to
    2,900,000 by 2007/02.
  • Eviction rates were unacceptably high (see
    Figure 1).
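
The execution-side script is described only at this level of detail; a minimal sketch of such a wrapper, written here in Python with illustrative program, parameter, and checkpoint names (the real script's language is not given in the talk):

# Sketch of a vanilla-universe wrapper: run the zeolite code over the cycles
# of one parameter set, stop before a self-imposed wall-clock budget, and
# record progress so a restarted (or evicted-and-rescheduled) job resumes.
import os, subprocess, time

TIME_BUDGET = 4 * 3600        # stop starting new cycles after ~4 hours
TOTAL_CYCLES = 100
CKPT = "cycles.done"          # simple self-checkpoint marker

def done_cycles():
    return int(open(CKPT).read()) if os.path.exists(CKPT) else 0

def main():
    start = time.time()
    for cycle in range(done_cycles(), TOTAL_CYCLES):
        if time.time() - start > TIME_BUDGET:
            break                              # leave the rest for the next job
        subprocess.run(["./zeolite_mc", "paramset.in", str(cycle)], check=True)
        with open(CKPT, "w") as f:             # record completed work
            f.write(str(cycle + 1))

if __name__ == "__main__":
    main()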

11

Figure 1
12
Database of Hypothetical Zeolite Structures
  • Prototyping Observations
  • Compute times per set varied from minutes to
    several hours (see Figures 2 and 3).
  • Execution speed strongly related to compiler.
    Intel compilers were known in advance to
    produce significantly faster code.

13
Database of Hypothetical Zeolite Structures
  • Adaptation Issues
  • Limiting job duration to eliminate runaways and
    limit eviction.
  • Increasing small job duration to lower overhead
    of handling.
  • Preemption tolerance (self checkpoint).
  • Fault tolerance.
  • Many issues were initially addressed via
    execution script. Adaptation to standard
    universe was thought to be a must.

14
Database of Hypothetical Zeolite Structures
  • Workflow
  • Groups delivered via HTTP from Prof. Earl's web
    site.
  • Sets per group ranged from 7000 to 30,000.
  • Results returned to Prof. Earl via drop zone
    in archival storage for post-analysis until
    approximately 10/2006. Post-analysis was
    subsequently handled at Purdue in Condor.
  • Processing at Purdue
  • Steward procedures developed to feed jobs to
    Condor, monitor progress, validate results,
    resubmit unanticipated failure cases, and
    archive results for group.
  • Stewards were designed to process a group in
    batches of 2000 sets to allow processing
    within 6-8 GB of volatile storage (a batching
    sketch follows this list).
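
The steward procedures themselves are not shown in the talk; a minimal sketch of the batching step, assuming a list of parameter-set files and a per-batch vanilla-universe submit file (all names illustrative):

# Sketch of the steward's batching loop: feed a delivered group of parameter
# sets to Condor in slices of 2000 so the working (volatile) area stays
# within a few GB. Paths and the submit template are assumptions.
import os, subprocess

BATCH_SIZE = 2000

def write_submit_file(batch_dir, set_files):
    """Write one vanilla-universe submit description covering this batch."""
    path = os.path.join(batch_dir, "batch.sub")
    with open(path, "w") as sub:
        sub.write("universe   = vanilla\n"
                  "executable = zeolite_mc\n"
                  "should_transfer_files   = YES\n"
                  "when_to_transfer_output = ON_EXIT\n"
                  "log = batch.log\n")
        for f in set_files:
            sub.write("arguments = %s\n" % os.path.basename(f))
            sub.write("transfer_input_files = %s\n" % f)
            sub.write("queue\n")
    return path

def run_group(all_sets):
    """Feed the delivered group to Condor one batch at a time."""
    for i in range(0, len(all_sets), BATCH_SIZE):
        batch_dir = "batch_%04d" % (i // BATCH_SIZE)
        os.makedirs(batch_dir, exist_ok=True)
        submit = write_submit_file(batch_dir, all_sets[i:i + BATCH_SIZE])
        subprocess.run(["condor_submit", submit], check=True)
        # The real stewards also monitor progress, validate results,
        # resubmit failures, and archive before moving to the next batch.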

15
Database of Hypothetical Zeolite Structures
  • Adapting for the Purdue Condor Pools
  • Eliminate need for execution side script to
    pave the way for standard universe execution.
  • Incorporate repetitive execution within core
    application. Address overhead of execution
    side script, multiple loads of core
    application, enable transition to standard
    universe.
  • Introduce self imposed timing controls.
    Address inability to identify runaways among
    1000s of jobs.
  • Embed reasonable self checkpoint capability.
    Address both preemption and fault tolerance.
  • Introduce ability to tune average job duration
    to Condor pool conditions. Address eviction
    rate problem (a small tuning sketch follows this
    list).
  • Any other code work required to achieve the
    points above. Some memory management work was
    expected.
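
As a hedged illustration of what tuning average job duration to pool conditions might look like (the numbers below are placeholders, not measurements from the talk):

# Sketch of duration tuning: choose how many parameter sets to pack into one
# Condor job so the average job stays well under the pool's typical
# eviction horizon, while still amortizing per-job handling overhead.

def sets_per_job(mean_set_hours, eviction_horizon_hours, safety=0.5):
    """Keep the expected job length below a fraction of the typical
    time-to-eviction observed in the pool."""
    target_hours = eviction_horizon_hours * safety
    return max(1, int(target_hours / mean_set_hours))

# Example with placeholder inputs: ~1 hour per set, evictions after ~8 hours
print(sets_per_job(mean_set_hours=1.0, eviction_horizon_hours=8.0))  # -> 4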

16
Database of Hypothetical Zeolite Structures
  • Notes on Condor Adaptation
  • Written adaptation plan reviewed by all
    concerned parties.
  • Adaptation work undertaken while production
    continued.
  • Modifications to existing code plus new code:
    30 routines.
  • Several hundred lines of non-commentary code
    written.
  • Code revisions validated periodically by
    textual comparison of result files for 100
    parameter sets from the control case (a
    comparison sketch follows this list).
  • Adaptation period spanned compiler version
    changes.
  • Adapted code became production version 09/2006.
  • Approximately 325,000 sets were completed before
    the adapted code entered production.
  • Code adaptation mandated changes to steward
    procedures.
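
A minimal sketch of that validation step, assuming one result file per parameter set in a control directory and a candidate directory (the layout and names are assumptions):

# Sketch: textual comparison of result files for the 100-set control case,
# reporting any set whose output differs from the reference run.
import filecmp, glob, os

def validate(control_dir="control_results", candidate_dir="new_results"):
    mismatches = []
    for ref in sorted(glob.glob(os.path.join(control_dir, "set_*.out"))):
        cand = os.path.join(candidate_dir, os.path.basename(ref))
        # shallow=False forces a byte-for-byte comparison of file contents
        if not os.path.isfile(cand) or not filecmp.cmp(ref, cand, shallow=False):
            mismatches.append(os.path.basename(ref))
    return mismatches

if __name__ == "__main__":
    bad = validate()
    print("all results match" if not bad else "mismatched sets: %s" % bad)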

17
Database of Hypothetical Zeolite Structures
  • After adapting to Condor?
  • Execution times became manageable (see Figures
    4 and 5).
  • Eviction rates fell to more controllable ratios
    (Figure 6).
  • Workflow became more automatic with ability to
    limit job duration (and exposure to various
    system hiccups). Recovering loss of a few
    hundred short jobs was easier than recovering
    loss of the same number of long jobs.
  • Application could run equally well in either the
    standard or the vanilla universe due to duration
    control.
  • Ultimate choice was to remain in the vanilla
    universe to continue using the Intel V9 compiler
    suite.
  • Front-end load from the steward procedures was
    reduced because there was less handling of
    intermediate semaphore and lock files.

18
Figures 2, 3, 4, and 5
19
Database of Hypothetical Zeolite Structures
Figure 6
20
Database of Hypothetical Zeolite Structures
  • Project to-date
  • 1.5 million sets processed since 2006/02
    including dry spells due to delays in
    workflow, exhaustion of allocation, and
    processing of renewal.
  • 2.4 million hours officially delivered to the
    project since 2006/02, or 250 hours per hour
    excluding dry spells.
  • Most recent throughput delivered 96,000 hours
    in a 226-hour time span, or over 400 processor
    hours per hour.
  • Entire collaboration continues exclusively via
    e-mail.
  • Approximately 1.4 million sets remain.

21
Database of Hypothetical Zeolite Structures
  • Miscellany
  • Throughout the project, emphasis was given to
    designing stewards to return results to Prof.
    Earl in the same arrangement as they were
    delivered to ease post-analysis. Revision of the
    data structure was never seriously undertaken,
    nor seen to be necessary.
  • While getting jobs into execution was the initial
    primary concern, the bulk of the work in the
    stewards ultimately centered on automating the
    handling of results.
  • Various working file systems were tried during
    production. The present procedures operate
    using volatile storage for active computation,
    high capacity storage for staging, and long term
    (tape robot) storage for archival.

22
Database of Hypothetical Zeolite Structures
  • More Miscellany
  • Core of steward procedures composed of fewer
    than 10 scripts.
  • Many additional scripts written to gather post
    mortem data w.r.t. job cost, fault statistics,
    exhaustive result validation, and summaries.
  • DAGs were explored as a job metering tool but
    deferred due to problems not well understood
    and demands of production. Since the workflow
    didn't implicitly require DAG features, the
    production methods were retained until a solid
    reason for using DAGs could be discerned.
    Additionally, PRE and POST procedures were
    known to be an undertaking as demanding as
    developing the stewards.
  • Adaptation of the stewards to other batch
    systems was done more easily than expected.
    Batch systems for which prototypes were done
    included PBS, LSF, and LoadLeveler.

23
Required Plug
  • TeraGrid '07
  • Right here at UW!
  • June 4-8, 2007
  • Full analysis of Zeolite application, plus other
    Condor work from Purdue in proceedings
  • Condor tutorial and demonstrations
  • Come join us or even help!