Condor - PowerPoint PPT Presentation

Title: Condor


1
Condor's use of Metronome and the NMI Lab
  • NMI Build and Test Workshop
  • Madison, WI
  • April 29, 2008

2
The Dark Ages
  • Back in the day, I used to be the build and test
    system for Condor.
  • 20 xterms always open on a separate virtual
    desktop on my workstation.
  • Every morning, I'd launch "make clean" followed by
    "make release" in each window.
  • The test suite was only run before a release.

3
The Dawn of Metronome
  • Went through a period trying a few other partial
    solutions, none of which were fully satisfactory.
  • Peter and I ended up writing what's now known as
    Metronome based on the lessons learned.
  • Important to keep the framework cleanly separate
    from the specific code for any given project's
    build or test process.

4
Why we love Metronome
  • Reproducibility
  • Dealing with resource contention
  • Decouple builds from tests
  • Flexible framework that handles our
    customizations for specific needs

5
Reproducibility
  • Explicit prereq versions are designed into the
    foundations of Metronome.
  • Scripted machine management for the hardware in
    the NMI lab itself.
  • We used to have machines with a strangely
    configured gcc install which was the only way to
    build on a platform, and no one knew how to
    reconstruct it.

6
Resource Contention: problems and solutions
  • The downside of a shared lab is dealing with
    contention for build/test resources.
  • The good part of a shared lab is that there's
    more funding and support for resources, and we
    don't have to manage it all ourselves.
  • Sometimes we even face contention within the
    Condor team itself.

7
Resource Contention
  • Metronome makes it relatively trivial to throw
    more hardware at the problem when a given
    platform is a bottleneck.
  • There's a real batch scheduling system under the
    hood that can use the resources.
  • The lab's scripted prereq and OS deployment makes
    it easy to bring up an identical machine image.
  • Someday, we hope to flock build jobs to wider
    Condor pools (e.g. the CS pool)

8
Decoupled Builds and Tests
  • Makes it clear where a failure lies.
  • Makes it possible to build once and run multiple
    sets of tests against the same build results.
  • Makes cross-platform testing possible (build on
    one Linux distro, see if those binaries can run
    the test suite on another distro).
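As a rough illustration of the decoupling described above, the sketch below plans several test runs against one set of build results, including runs on a platform other than the one that produced the build. The names (`Build`, `PlannedRun`, `plan_test_runs`, the platform strings) are hypothetical and are not part of Metronome or Condor.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Build:
    """One finished build whose results can be reused by many test runs."""
    build_id: str
    platform: str


@dataclass(frozen=True)
class PlannedRun:
    """A test run that consumes an existing build instead of rebuilding."""
    build_id: str
    build_platform: str
    test_platform: str
    test_class: str


def plan_test_runs(build, test_platforms, test_classes):
    """Pair every test platform with every test class for a single build."""
    return [
        PlannedRun(build.build_id, build.platform, platform, test_class)
        for platform, test_class in product(test_platforms, test_classes)
    ]


# Hypothetical identifiers, for illustration only.
build = Build("build-0001", "linux-distro-a")
runs = plan_test_runs(build, ["linux-distro-a", "linux-distro-b"], ["quick", "lib"])

# Runs whose test platform differs from the build platform are the
# cross-platform checks mentioned on the slide.
cross_platform = [r for r in runs if r.test_platform != r.build_platform]

for run in runs:
    print(run)
```

Because each planned run only carries a reference to the build, a failure in any one of them points at the tests or the test platform rather than at the build step.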

9
Test classes
  • Condor makefiles and NMI test glue provide a
    mechanism to define sets of tests for different
    things.
  • E.g. a "quick" class for very short-running
    tests, a "lib" class for tests of libraries, etc.
  • Can just pass an argument in to the test run to
    select which test class to use.
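The idea of selecting a test class by argument can be sketched as follows. This is not the actual Condor makefile or NMI glue code; the test names, the `--test-class` flag, and the `TESTS` table are assumptions made up for the example.

```python
import argparse

# Hypothetical test names, each tagged with the classes it belongs to.
# A test may belong to more than one class.
TESTS = {
    "example_submit_smoke": {"quick"},
    "example_classad_parse": {"quick", "lib"},
    "example_userlog_read": {"lib"},
    "example_long_soak": {"long"},
}


def select_tests(test_class, tests=TESTS):
    """Return the sorted names of all tests tagged with test_class."""
    return sorted(name for name, classes in tests.items() if test_class in classes)


def main(argv=None):
    known = sorted({c for classes in TESTS.values() for c in classes})
    parser = argparse.ArgumentParser(description="Select tests by class (sketch).")
    parser.add_argument("--test-class", default="quick", choices=known)
    args = parser.parse_args(argv)
    selected = select_tests(args.test_class)
    for name in selected:
        print(name)
    return selected


# Example invocation with an explicit argument list.
lib_tests = main(["--test-class", "lib"])
```

Tagging tests with a set of classes, rather than keeping one list per class, lets a short library test show up in both the "quick" and "lib" runs without being listed twice.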

10
Condor Customizations
  • condor_nmi_submit
  • Condor-specific web pages to view build results
  • Unmanaged build/test machine

11
condor_nmi_submit
  • Condor build is so complicated that we can't
    manually make an nmi_submit file.
  • Platforms we support change across branches
    (major versions) of Condor.
  • Prereqs sometimes have to change, too.
  • Wrote a script that parses an input file that's
    tagged/branched with the source, and generates
    the nmi_submit file.
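A minimal sketch of that generate-from-a-description approach is shown below. It is not condor_nmi_submit itself: the input format, the output keys, and the `key = value` layout are all assumptions for illustration and should not be read as Metronome's real submit-file syntax. The platform and prereq names are placeholders.

```python
def parse_build_description(text):
    """Parse 'key: item, item' lines; '#' starts a comment."""
    description = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        description[key.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return description


def render_submit_file(description, branch):
    """Render a submit file as key = value lines (hypothetical layout)."""
    lines = [
        f"description = build of branch {branch}",
        "platforms = " + ", ".join(description.get("platforms", [])),
        "prereqs = " + ", ".join(description.get("prereqs", [])),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical description file. Because it lives in the source tree,
# each branch carries its own list of platforms and prereqs.
EXAMPLE_DESCRIPTION = """\
# platforms and prereqs for this branch
platforms: example_linux_a, example_linux_b
prereqs: example-compiler-1.2, example-tool-3.4
"""

parsed = parse_build_description(EXAMPLE_DESCRIPTION)
submit_text = render_submit_file(parsed, "EXAMPLE_BRANCH")
print(submit_text)
```

Keeping the description next to the source means that checking out an older branch automatically yields the platform and prereq list that branch was meant to build with.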

12
Condor Web Pages for Build and Test Results
  • The generic pages provided by Metronome are nice,
    but we want to see different views of the data.
  • Because everything is in the database, we can
    write our own code to query it and serve it up
    how we want it.
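To give a feel for querying the results database directly, here is a small self-contained sketch. The table name, columns, and rows are invented for the example and do not reflect Metronome's actual schema; an in-memory SQLite database stands in for whatever database the lab really uses.

```python
import sqlite3

# In-memory stand-in database with a hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE run_results (
        run_id   INTEGER PRIMARY KEY,
        branch   TEXT NOT NULL,
        platform TEXT NOT NULL,
        kind     TEXT NOT NULL,      -- 'build' or 'test'
        passed   INTEGER NOT NULL    -- 1 = passed, 0 = failed
    );
    INSERT INTO run_results (branch, platform, kind, passed) VALUES
        ('branch_a', 'platform_1', 'build', 1),
        ('branch_a', 'platform_2', 'build', 0),
        ('branch_b', 'platform_1', 'build', 1),
        ('branch_a', 'platform_1', 'test',  1);
    """
)


def branch_summary(connection, kind):
    """Return (branch, passed_count, failed_count) rows for one kind of run."""
    query = """
        SELECT branch, SUM(passed), COUNT(*) - SUM(passed)
        FROM run_results
        WHERE kind = ?
        GROUP BY branch
        ORDER BY branch
    """
    return connection.execute(query, (kind,)).fetchall()


build_summary = branch_summary(conn, "build")
for branch, passed, failed in build_summary:
    print(f"{branch}: {passed} passed, {failed} failed")
```

A per-branch page could be rendered from rows like these; grouping by platform instead of branch would give a platform-oriented view of the same data.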

13
Overview listing

14
Per-branch listing

15
Platform summary for a specific build

16
Unmanaged hardware
  • Sometimes Condor's compatibility with a given
    Linux distro is broken via a security patch
    that breaks library APIs or contains other
    far-reaching changes.
  • Auto-updating breaks reproducibility.
  • A special unmanaged box is configured to
    auto-update anyway, so we can still test
    against the patched distro.

17
Gripes
  • Debugging can sometimes be a challenge (as much
    our problem as Metronome's). E.g. when a test
    fails, is it:
      • A test written non-deterministically?
      • An actual Condor bug we're testing for?
      • A temporary NMI lab problem?
      • A Condor bug under the hood in the lab?
  • What does error -1003 mean?

18
More Gripes
  • Resource contention (the NMI pool's arteries are
    clogged with Bacon).
  • No real API to get the data (yet). We just have
    to write our own SQL directly.
  • Build or test jobs that hang (from other
    projects; we never do this ourselves).

19
Thanks
  • Questions?
  • Comments?