1
Challenges in a Production Grid
  • Leigh Grundhoefer, Operations Coordinator
  • Alain Roy, Software Coordinator

2
What is Open Science Grid?
  • It is a US grid computing infrastructure that
    supports scientific computing via an open
    collaboration of science researchers, software
    developers and computing, storage and network
    providers.
  • The OSG Consortium builds and operates the OSG,
    bringing resources and researchers from
    universities and national laboratories together
    and cooperating with other national and
    international infrastructures to give scientists
    from many fields access to shared resources
    worldwide.

3
Current OSG deployment
  • 96 Resources
  • 27 Virtual Organizations

4
Running a Production Grid
5
Running a Production Grid
6
Alain?
  • Can we have a couple of statistics on how many
    CPU hours have been consumed by OSG jobs?
  • Too much?
  • How about now? Calling out one VO may upset
    another VO.
  • Note the difference in cumulative CPU; I think
    that is local usage?

7
The OSG Function Set
  • The goal of an OSG Function Set is to provide a
    well-defined set of services and functionality
    against which OSG applications can be built.
  • We are introducing the term "Function Set"
    (previously described as "Release") to help
    clarify the distinction between the functional
    specification and the software release of
    reference implementations of that specification.

8
Challenges
  • A good way to look at challenges is in the
    context of the entire process
  • First challenge: What is our process for moving
    from ideas to deployment?
  • We'll look at other challenges in this context

9
Release Process (Subway Map)
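[Figure: release process drawn as a subway map along a time axis. Stages, in the order listed on the slide: Gather requirements, Build software, Test, Validation test bed, VDT Release, ITB Release Candidate, Integration test bed, OSG Release]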
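
The stages on the map can be written down as an ordered sequence. This is a minimal sketch, assuming the stages run strictly in the order listed on the slide (the original map may branch); the names Stage, next_stage and remaining are made up for illustration.

```python
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """Release stages, numbered in the order they are listed on the slide."""
    GATHER_REQUIREMENTS = 1
    BUILD_SOFTWARE = 2
    TEST = 3
    VALIDATION_TEST_BED = 4
    VDT_RELEASE = 5
    ITB_RELEASE_CANDIDATE = 6
    INTEGRATION_TEST_BED = 7
    OSG_RELEASE = 8


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows `stage`, or None after the final release."""
    if stage is Stage.OSG_RELEASE:
        return None
    return Stage(stage + 1)


def remaining(stage: Stage) -> list[Stage]:
    """List every stage still ahead of `stage`, in order."""
    return [s for s in Stage if s > stage]
```

For example, remaining(Stage.VDT_RELEASE) lists the three stages between a VDT release and an OSG release.
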
10
Gather Requirements
  • Applications Coordinators meet with each Virtual
    Organization to define issues with current
    Release and requests for additional services.
  • The Software Coordinator evaluates each request.

Gather Requirements
11
Gather Requirements
Example Gathering
Gather Requirements
12
Challenge: What Software?
  • How do we decide what software is in a release?
  • Everyone has their own idea about what should be
    in
  • We don't have time to put in and support
    everything
  • General policies:
      • Is it needed by more than one user community?
      • Can we legally distribute the software?
      • Do we think it's stable and secure?

Gather Requirements
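
The three general policies can be read as a checklist applied to each request. This is a rough sketch only: the Candidate record and its field names are hypothetical, and in practice the decision is a judgment call by the Software Coordinator, not an automatic test.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A piece of software proposed for a release (all fields are hypothetical)."""
    name: str
    requesting_communities: int   # how many user communities asked for it
    legally_distributable: bool   # can we legally distribute it?
    stable_and_secure: bool       # do we judge it stable and secure?


def unmet_policies(c: Candidate) -> list[str]:
    """Return the policy questions this candidate fails; empty means it passes."""
    failures = []
    if c.requesting_communities < 2:
        failures.append("needed by more than one user community")
    if not c.legally_distributable:
        failures.append("legally distributable")
    if not c.stable_and_secure:
        failures.append("stable and secure")
    return failures
```

A candidate requested by a single community but otherwise fine would return only the first item, which makes the reason for a rejection explicit.
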
13
What is the VDT?
  • A collection of software
      • Grid software (Condor, Globus and lots more)
      • Virtual Data System (origin of the name VDT)
      • Utilities
  • An easy installation
      • Goal: Push a button, everything just works
      • Two methods:
          • Pacman installs and configures it all
          • RPM installs some of the software, no
            configuration
  • A support infrastructure

Build Software
14
Why have the VDT?
  • Everyone could download the software from the
    providers
  • But the VDT:
      • Figures out dependencies between software
      • Works with providers for bug fixes
      • Provides automatic configuration
      • Packages it
      • Tests everything on thirteen platforms (and
        growing)

Build Software
15
How much software?
Build Software
16
Build Software
  • The VDT distributes binaries
  • And writes configuration scripts
  • And brings along dependencies (build and
    configure)
  • We build on fewer platforms than we support
    (binary re-use works)
  • Building and configuring software for
    installation is often hard
  • The VDT does all the steps (make config, make,
    make install) plus more

Build Software
17
Builds in NMI Build & Test
  • We use NMI Build & Test
  • A Condor-based system for managing builds and
    tests
  • Solves problems:
      • How do I manage a build for multiple platforms?
      • How do I replicate a build I did a long time
        ago?
  • All binaries in the VDT are built this way

Build Software
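
Both questions above (many platforms, replicating an old build) come down to recording every build input explicitly. This is a minimal sketch of that idea and not how NMI Build & Test is implemented; BuildSpec, its fields and build_matrix are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from itertools import product


@dataclass(frozen=True)
class BuildSpec:
    """Everything needed to repeat one build (hypothetical record)."""
    package: str
    source_tag: str             # e.g. a CVS tag naming the exact sources
    platform: str
    configure_args: tuple = ()

    def fingerprint(self) -> str:
        """Stable hash of the build inputs; equal inputs give equal hashes."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_matrix(sources: dict, platforms: list) -> list:
    """One BuildSpec per (package, platform) pair.

    `sources` maps a package name to the source tag to build from.
    """
    return [
        BuildSpec(package, tag, platform)
        for (package, tag), platform in product(sources.items(), platforms)
    ]
```

Storing the fingerprint next to each binary would let a later rebuild confirm that it used the same inputs as the original.
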
18
Using NMI
[Figure: VDT build flow using NMI. Labels: Sources (CVS), Contributors, Patching, Build, Binaries, NMI Build & Test Condor pool (100 computers), Test, Package, RPMs, Pacman Cache, Users]
Build Software
19
Challenge: Building VOMS
  • VOMS can authorize people in a VO
  • VOMS has a web interface
  • We:
      • Install Tomcat
      • Install Apache
          • Built with Globus SSL
          • Patched so GSI pass-through to Apache works
      • Install VOMS
      • Install VOMS Admin
      • Install Perl modules needed by VOMS Admin
      • Install MySQL and set up database (with
        command-line tool)
      • Configure all software
      • Configure rotation of log files

Build Software
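
An install sequence like the one above can be derived from a dependency map instead of being maintained by hand. This is a sketch using Python's standard graphlib module (Python 3.9+); the edges in DEPENDS_ON are illustrative guesses, not the VDT's real package metadata.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key lists what must be installed first.
# The edges are assumptions made for this example only.
DEPENDS_ON = {
    "Globus SSL": set(),
    "Apache": {"Globus SSL"},
    "Tomcat": {"Apache"},
    "MySQL": set(),
    "Perl modules": set(),
    "VOMS": {"MySQL"},
    "VOMS Admin": {"Tomcat", "Perl modules", "VOMS"},
}


def install_order(depends_on: dict) -> list:
    """Return the packages ordered so that dependencies always come first.

    Raises graphlib.CycleError if the map contains a dependency cycle.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

Calling install_order(DEPENDS_ON) returns one valid order; several orders may satisfy the same map, so callers should not rely on the exact sequence.
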
20
Some VDT Challenges
  • How should we smoothly update a production
    service?
      • In-place vs. on-the-side
      • Preserve old configuration while making big
        changes
      • As easy as we try to make it, it still takes
        hours to fully install and set up from scratch
  • How do we support more platforms?
      • It's a struggle to keep up with the onslaught
        of Linux distributions
      • AIX? Mac OS X? Solaris?

[Figure: Linux distributions shown: Fedora Core 3, 4, 5 and 6; RHEL 3, 4 and 5; Debian; Gentoo; BCCD]
Build Software
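
One common way to do an "on-the-side" update is to install each version in its own directory and switch a symbolic link, which also gives a quick rollback. This is a sketch assuming a POSIX file system; the directory layout and the activate, active_version and demo names are hypothetical, and this is not a description of how Pacman or the VDT perform updates.

```python
import os
import tempfile


def activate(root: str, version: str) -> None:
    """Point root/current at root/<version> by swapping a symlink."""
    target = os.path.join(root, version)
    if not os.path.isdir(target):
        raise FileNotFoundError(target)
    link = os.path.join(root, "current")
    tmp_link = link + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)          # clear a leftover from an earlier attempt
    os.symlink(version, tmp_link)    # relative link, so root can be moved
    os.replace(tmp_link, link)       # rename over the old link in one step


def active_version(root: str) -> str:
    """Return the version name that root/current points to."""
    return os.readlink(os.path.join(root, "current"))


def demo() -> list:
    """Install v1, upgrade to v2, then roll back to v1 in a scratch directory."""
    root = tempfile.mkdtemp()
    for version in ("v1", "v2"):
        os.mkdir(os.path.join(root, version))
    seen = []
    for version in ("v1", "v2", "v1"):
        activate(root, version)
        seen.append(active_version(root))
    return seen
```

Because the old directory is left untouched, its configuration is still available after the switch, at the cost of extra disk space.
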
21
Testing the VDT
  • At least one week of testing on VDT testbed
    before release
  • Run nightly tests on each supported platform
    until they are clean

Test
22
Testing
  • We have a local testbed
      • One computer for each supported platform
  • Nightly tests
      • Run tests on a single computer only (not a real
        grid)
      • Test current version and in-development version
          • It's surprising how easy it is to break the
            current release
      • Cover as much as we can
  • Daily results
      • Email at 8:00 am every morning to greet us
      • Web pages to show results

Test
23
VDT Testing Challenges
  • How do we get better coverage?
  • How do we simulate a grid?
      • Hint: See the validation testbed, next
  • How do we understand results from 13 platforms
    running hundreds of tests a night?

Test
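
One way to make a night of results readable is to group failures by test and separate those that fail everywhere from those that fail on only some platforms. This is a sketch under assumptions: the (platform, test, passed) tuple format and the summarize name are invented here, and every test is assumed to run on every platform.

```python
from collections import defaultdict


def summarize(results):
    """Group failures by test.

    `results` is an iterable of (platform, test_name, passed) tuples.
    Returns tests that failed on every platform seen, and a map from each
    other failing test to the sorted platforms where it failed.
    """
    platforms = set()
    failed_on = defaultdict(set)
    for platform, test, passed in results:
        platforms.add(platform)
        if not passed:
            failed_on[test].add(platform)
    everywhere = sorted(t for t, p in failed_on.items() if p == platforms)
    on_some = {t: sorted(p) for t, p in failed_on.items() if p != platforms}
    return {"fails_everywhere": everywhere, "fails_on_some": on_some}
```

A test that fails everywhere points at the software itself, while a test that fails on one platform points at a porting or packaging problem.
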
24
Validation Test Bed Roles
  • A middleware systems-level specification and
    implied architecture of an OSG grid release.
  • Rapid feedback and spark testing of pre-VDT
    releases as deployed in grid systems.
  • First-pass functional testing of integrated
    services based on VDT software.
  • Provision/adoption of systems-level testing and
    validation frameworks
  • Validation tests, with focus on functional
    testing.
  • An ITB release candidate for deployment on the
    full ITB for broad, pre-deployment testing and VO
    validation and sign-off.
  • A physical infrastructure to support all of the
    above

Validation test bed
25
Validation Test Bed
  • Composed of dedicated testing sites, which are
    located at:
      • University of Chicago
      • NERSC
      • Caltech
  • The OSG Grid Operations Center provides service
    validation and monitoring services

Validation test bed
26
Challenges
  • Gaining and maintaining effort by experts in
    production deployment
  • Development of a complete set of validation and
    service function tests
  • Effort spent here yields 100x savings compared
    with finding correctable problems after release.

Validation test bed
27
VDT Release
  • After the VDT is released, anyone is free to
    install it
  • OSG will proceed to package it for release on the
    ITB

VDT Release
28
Integration Testbed Release
  • Develop configuration scripts for OSG environment
  • Functional Elements created from VDT Release
      • Compute Element
          • Globus/VDS/MonALISA/CEMon/etc.
      • Worker Client
      • Client - for Submit Hosts
      • VOMS
      • GUMS
      • Squid

These are subsets of the VDT, tailored to OSG
ITB Release
29
ITB Deployment Activities
  • Work with interested ITB site admins to deploy
    new release
  • Deploy new versions of grid services
      • GridCat, MonALISA, etc.
  • Middleware interoperability testing between OSG
    Releases
  • Test with focus on expected functionality and
    scalability
  • Application validation is done by each VO
  • Develop Installation Guides and OSG Release
    Documentation

Integration Testbed
30
ITB
  • Between 10 and 30 sites participate in the ITB
    activity
  • Validation is done by the application
    administrators for each VO after sites have
    passed all functional service validations
  • The community-based TWiki and the mailing list
    provide the mechanisms for questions and problem
    resolution
  • A weekly meeting is held to direct the activities
    and work toward the release deadline

Integration Testbed
31
ITB Validations
  • GridEX - application validation (pilot job
    submissions)
  • Site Policy template and publication
  • Generic Information Provider validation
  • Monitoring validation: MonALISA client status
    (VO Jobs I/O)
  • GridCat

Integration Testbed
32
OSG Release
  • After testing and validation, a production
    release is made.
  • Sites are free to upgrade when ready, not in
    lock-step
  • We support the two most recent releases

OSG Release
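
The "two most recent releases" rule can be expressed as a small version comparison. This is a sketch assuming dotted numeric version strings; the function names and the version numbers used in examples are made up for illustration.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted string such as '1.10.0' into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def supported_releases(released: list, keep: int = 2) -> list:
    """Return the `keep` most recent releases, newest first."""
    return sorted(released, key=parse_version, reverse=True)[:keep]


def is_supported(site_version: str, released: list) -> bool:
    """True if a site's installed release is still within the support window."""
    return site_version in supported_releases(released)
```

Comparing integer tuples rather than raw strings matters once a component reaches two digits, since the string "1.10.0" sorts before "1.9.0".
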
33
Challenge: Heterogeneity
  • How do we support a grid where:
      • Different software versions are deployed?
      • People are free to install functional
        equivalents?
  • It is essential that we do so!
      • Sites simply cannot be expected to do
        simultaneous upgrades
      • A site may be doing local scientific production
        runs
      • A site may have other priorities (not everything
        is the grid)

OSG Release
34
Overall Challenge: Security
  • Security (and other "need now" updates) cross the
    entire process
  • It's the entire process, sped up
      • Gather requirements: "We need a security update
        NOW"
      • Build in VDT on all platforms
      • Build, test and validate
      • Deploy
  • It's essential to be able to:
      • Do builds quickly and replicate old builds (NMI)
      • Have a mechanism for easy updates and rollback
        in case of failure

35
Questions?
  • Open Science Grid
      • http://www.opensciencegrid.org
  • VDT Support
      • vdt-support_at_ivdgl.org
      • http://vdt.cs.wisc.edu
  • Alain Roy
      • roy_at_cs.wisc.edu
  • Leigh Grundhoefer
      • leighg_at_indiana.edu