Title: The Open Science Grid Experience: A Partner in the Global Distributed Cyber-Infrastructure that supports International Computational Science
1 The Open Science Grid Experience: A Partner in the Global Distributed Cyber-Infrastructure that supports International Computational Science
Ruth Pordes, Fermilab
Supported by the Department of Energy Office of Science SciDAC-2 program, from the High Energy Physics, Nuclear Physics, and Advanced Software and Computing Research programs, and by the National Science Foundation Mathematical and Physical Sciences Directorate, Office of CyberInfrastructure, and Office of International Science and Engineering.
2 Who am I?
- Background:
  - Associate Head of the Fermilab Computing Division.
  - US CMS Grid Services Coordinator.
  - 20 years working on data acquisition and offline systems at Fermilab, leading projects to develop common solutions.
  - PPDG coordinator and PI; member of the iVDGL management team.
  - Coined "Trillium" for the grass-roots DOE-NSF project collaboration.
- OSG Executive Director:
  - Elected by the Council.
  - Day-to-day responsibility for all aspects of the Project.
  - Communicate with the funding agencies.
  - Respond to the needs of the Consortium and Council.
  - Attend to external communications and relationships.
  - Make sure small, concrete steps are taken and followed up.
3 OSG's Family History
(International) relationships are essential!
[Timeline figure, 2000-2010: the grass-roots projects GriPhyN (NSF), iVDGL (NSF), PPDG (DOE) and the DOE Science Grid (DOE), coordinated as Trillium, ran Grid3 and led to OSG (DOE and NSF). The period spans LIGO preparation and operation, LHC construction, preparation and operations, and increasing non-physics science; related infrastructures include the European grid / Worldwide LHC Computing Grid and campus and regional grids.]
4 OSG is diverse, heterogeneous, includes very large and very small organizations, and is geographically distributed
- 17 scientific communities: particle physics, the Laser Interferometer Gravitational Wave Observatory (LIGO), nuclear physics, molecular dynamics (JHU-CHARMM), mathematical optimization (the football pool problem), NanoHub, and diverse application groups at RENCI.
- 70 resource (processing and storage) providers: DOE national labs, university facilities, department clusters; 3 sites in Brazil and 1 site in Mexico (CINVESTAV).
- 10 software providers: Condor, Globus, gLite, DESY/Fermilab storage management, UltraLight, Internet2, ESnet, SciDAC-2 CEDPS, Fermilab accounting.
- 8 partners (peer organizations): EGEE, TeraGrid, GridUNESP, NorduGrid, TWGrid, APAC, NYSGrid.
5 OSG characteristics
- 10,000 CPU-days per day of processing (some days 15,000 CPU-days per day); roughly 75% of this is physics.
- 100,000 jobs running per day.
- Provides 10 Terabytes per day in data transport.
- CPU usage averages about 75%.
- Starting to offer support for MPI, especially for non-physics applications.
- Support for opportunistic use: currently about 10% of resources are used by groups that don't own them.
(A back-of-envelope conversion of these headline rates follows below.)
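The headline numbers above can be turned into rough derived rates. The small sketch below does the arithmetic under stated assumptions (decimal terabytes, 24-hour days); it is an illustration, not an official OSG metric.

```python
# Back-of-envelope conversion of the headline OSG rates quoted above.
# Inputs come from this slide; derived figures are rough illustrations only.

cpu_days_per_day = 10_000   # processing delivered per calendar day
jobs_per_day = 100_000      # jobs running per day
data_tb_per_day = 10        # data transported per day (decimal TB assumed)
utilization = 0.75          # quoted average CPU usage

cpu_hours_per_job = cpu_days_per_day * 24 / jobs_per_day    # ~2.4 CPU-hours per job
implied_cores = cpu_days_per_day / utilization              # ~13,300 busy-equivalent cores
avg_gbit_per_s = data_tb_per_day * 1e12 * 8 / 86_400 / 1e9  # ~0.93 Gbit/s sustained

print(f"~{cpu_hours_per_job:.1f} CPU-hours/job, "
      f"~{implied_cores:,.0f} cores, ~{avg_gbit_per_s:.2f} Gbit/s")
```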
6 How we actually get something done: results of D0 reprocessing
- OSG responded to an unexpected request for a significant amount of resources, typical of the discovery process.
- Through teamwork:
  - OSG members provided processing cycles,
  - OSG and D0 worked together to troubleshoot problems,
  - with daily attention to availability and throughput,
  - the reprocessing was finished in time for the summer conferences.
- 12 OSG sites contributed up to 1,000 jobs a day and 2M CPU hours.
- 286 million events, 286M jobs on OSG; 48 TB of input data and 22 TB of output data transferred.
[Charts: D0 OSG CPU hours per week; D0 event throughput.]
(Rough per-event figures implied by these totals are sketched below.)
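For a sense of scale, the quoted totals imply rough per-event costs. The sketch below derives them from the numbers on this slide (approximate, assuming decimal terabytes).

```python
# Rough per-event figures implied by the D0 reprocessing totals above.
# Quantities are from the slide; derived values are approximate.

events = 286e6                 # events reprocessed
cpu_hours = 2e6                # total CPU hours used on OSG
input_tb, output_tb = 48, 22   # data transferred (decimal TB assumed)

cpu_sec_per_event = cpu_hours * 3600 / events    # ~25 CPU-seconds per event
input_kb_per_event = input_tb * 1e9 / events     # ~170 kB of input per event
output_kb_per_event = output_tb * 1e9 / events   # ~77 kB of output per event

print(f"~{cpu_sec_per_event:.0f} s/event, ~{input_kb_per_event:.0f} kB in, "
      f"~{output_kb_per_event:.0f} kB out")
```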
7 The Three Building Blocks
- The OSG organization, management and operation are structured around three components:
  - the Consortium,
  - the Project,
  - the Facility.
8 OSG Consortium Partners
Academia Sinica; Argonne National Laboratory (ANL); Boston University; Brookhaven National Laboratory (BNL); California Institute of Technology; Center for Advanced Computing Research; Center for Computation and Technology at Louisiana State University; Center for Computational Research, The State University of New York at Buffalo; Center for High Performance Computing at the University of New Mexico; Columbia University; Computation Institute at the University of Chicago; Cornell University; DZero Collaboration; Dartmouth College; Fermi National Accelerator Laboratory (FNAL); Florida International University; Georgetown University; Hampton University; Indiana University; Indiana University-Purdue University, Indianapolis; International Virtual Data Grid Laboratory (iVDGL); Thomas Jefferson National Accelerator Facility; University of Arkansas; Universidade de São Paulo; Universidade do Estado do Rio de Janeiro; University of Birmingham; University of California, San Diego; University of Chicago; University of Florida; University of Illinois at Chicago; University of Iowa; University of Michigan; University of Nebraska-Lincoln; University of New Mexico; University of North Carolina/Renaissance Computing Institute; University of Northern Iowa; University of Oklahoma; University of South Florida; University of Texas at Arlington; University of Virginia; University of Wisconsin-Madison; University of Wisconsin-Milwaukee Center for Gravitation and Cosmology; Vanderbilt University; Wayne State University; Kyungpook National University; Laser Interferometer Gravitational Wave Observatory (LIGO); Lawrence Berkeley National Laboratory (LBL); Lehigh University; Massachusetts Institute of Technology; National Energy Research Scientific Computing Center (NERSC); National Taiwan University; New York University; Northwest Indiana Computational Grid; Notre Dame University; Pennsylvania State University; Purdue University; Rice University; Rochester Institute of Technology; Sloan Digital Sky Survey (SDSS); Southern Methodist University; Stanford Linear Accelerator Center (SLAC); State University of New York at Albany; State University of New York at Binghamton; State University of New York at Buffalo; Syracuse University; T2 HEPGrid Brazil; Texas Advanced Computing Center; Texas Tech University
9 OSG Project Resources
- $30M over 5 years, jointly funded by DOE and NSF.
- Responsible to the Consortium for specific activities, through funding of 33 FTEs of effort.
- The Project staff are distributed across 17 institutions.
10 Organization
- The OSG Facility has six groups:
  - Engagement: identify and support new groups.
  - Integration: transition the OSG software stack to deployment.
  - Operations: monitor activities and support VOs and sites.
  - Security: define, implement and monitor the security plan of the OSG.
  - Software: evolve, package and support the VDT.
  - Troubleshooting: work with sites and VOs to resolve problems in end-to-end functionality.
- Education and Training.
- Extensions: it takes real effort to integrate and deploy new capabilities and technologies in the end-to-end system.
  - Scalability and system tests.
  - Additions to the software as delivered.
  - Work with the users on their upcoming needs.
11 Networks are not enough!
- End-to-end throughput and usability are essential.
- Middleware and distributed infrastructures are the middle tier.
- Joint work is needed across the interfaces.
12 An International Science Community
Common goals, shared data, collaborative work.
[Diagram: national infrastructures, each with its own boundaries in policies, funding, culture and physical components, joined by the international science community.]
13 OSG Community Structure: Virtual Organizations (VOs)
- The OSG community shares and trades in groups (VOs), not individuals.
- VO management services allow registration, administration and control of members within VOs.
- Facilities trust and authorize VOs.
- Compute and storage services prioritize according to VO group membership (a hypothetical sketch of this mapping follows below).
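As a purely hypothetical illustration of that last point, the sketch below shows how a site might map the VO asserted in a user's credential to a local account and fair-share. The VO names, accounts and shares are invented for the example and are not an actual OSG or site policy.

```python
# Hypothetical sketch: a site trusts VOs rather than individuals, and maps the
# VO asserted in a user's credential to a local account and a scheduling share.
# The VO names, accounts and shares below are invented for illustration only.

VO_POLICY = {
    # VO name ->  (local account, relative fair-share)
    "cms":      ("cmsprod", 40),
    "atlas":    ("atlasprod", 40),
    "ligo":     ("ligo", 15),
    "engage":   ("guest", 5),   # opportunistic / engagement users
}

def authorize(vo_name: str):
    """Return (local account, share) for a member of the given VO,
    or None if this site does not support that VO."""
    return VO_POLICY.get(vo_name)

print(authorize("ligo"))     # ('ligo', 15)
print(authorize("unknown"))  # None: the job is not accepted at this site
```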
14 OSG's Eco-System
[Diagram: national, campus and community infrastructures harmonized into a well-integrated whole.]
15 Working across the Boundaries
- Global communities of users on the OSG require OSG attention to working together with Europe, Asia, the rest of the Americas and Africa.
- International needs are immersed in all OSG activities: security, software, operations, monitoring/measurement, data, computation.
- OSG acts to bridge independent, autonomous infrastructures; it is not one size fits all.
16 Examples of work areas between EGEE and OSG
There are 17 different work areas, each with its own activities, plans, meetings, organization, communication, consensus building and collaboration needs:
- OSG Blueprint - EGEE gLite design.
- Middleware Security Working Group.
- Joint Security Policy Group.
- Joint operations meetings and workshops.
- Virtual Data Toolkit common software base release and support.
- Software build and test infrastructure: ETICS - Metronome.
- Automated problem-ticket exchange and cooperating support processes.
- Resource Service Validation (RSV) - Site Availability Monitoring (SAM).
- Common Storage Resource Management (SRM) interface and API.
- Consistent resource information publishing: GLUE schema.
- International Grid Trust Federation (identity trust).
- Joint monitoring group: consistent monitoring information.
- Accounting information transfer.
- International training and education school.
- Grid Interoperability Now / Open Grid Forum multi-infrastructure contributions.
- Contributions to the Worldwide LHC Computing Grid.
- Planning for support of new cross-grid communities, e.g. ITER, Dark Energy Survey, LIGO II - VIRGO.
17 Campus and Regional Grids
- Campus and regional grids are a fundamental building block of the OSG.
- The multi-institutional, multi-disciplinary nature of the OSG is a macrocosm of many campus IT cyberinfrastructure coordination issues.
- OSG currently has four operational campus grids on board (FermiGrid, Purdue, Clemson, Wisconsin) and is working to add Harvard and Lehigh.
- OSG currently interfaces to regional grid infrastructures: GridUNESP, NYSGrid, TIGRE.
- Elevation of jobs from campus CI to OSG is transparent.
- Campus and regional scale brings value through:
  - efficiencies of a common software stack with common interfaces,
  - a higher common denominator that makes sharing easier,
  - greater collective buying power with vendors,
  - synergy through common goals and achievements.
18 Training and Education
- The training program consists of 2-3 day schools of lectures and hands-on training for administrators, users and software developers. The Education Coordinator follows up with alumni to help them use their knowledge.
- OSG sponsors some of the students.
- We have contributed to schools in Colombia, Brazil and Sweden, and soon South Africa.
- The Engagement program works with and helps new communities (user and campus) directly for several months each.
- Campus Cyberinfrastructure (CI) Days (together with EDUCAUSE, Internet2 and TeraGrid) help planning and communication across CIOs, faculty and IT organizations.
- What about some sustained international ties?
19 The OSG Virtual Data Toolkit
- A collection of software: the VDT doesn't write software, but gets it from providers: Condor, Globus, EGEE components, community software (e.g. Fermilab accounting) and open-source utilities (Apache, MySQL, etc.).
- An easy installation.
- A support infrastructure.
- The VDT provides a middleware foundation for the software stack of several production grids, including EGEE in Europe and TeraGrid components in the US.
- The VDT:
  - figures out dependencies between software (see the dependency-ordering sketch after this list),
  - works with providers on bug fixes and feature enhancements,
  - facilitates interoperability,
  - offers automatic configuration,
  - packages it all,
  - tests everything on 15 platforms (and growing).
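As an illustration of the dependency-ordering part of that job, the sketch below computes an install order with a topological sort. The component names and dependency edges are hypothetical and do not describe the VDT's actual tooling.

```python
# Hypothetical sketch of the dependency-ordering step a packaging effort like
# the VDT must perform when assembling components from many providers.
# The component names and dependency edges are illustrative only.
from graphlib import TopologicalSorter  # Python 3.9+

# component -> the components it depends on
deps = {
    "osg-client":   {"condor", "globus", "voms-clients"},
    "voms-clients": {"openssl"},
    "globus":       {"openssl"},
    "condor":       set(),
    "openssl":      set(),
}

# An install order that satisfies every dependency.
print(list(TopologicalSorter(deps).static_order()))
# e.g. ['condor', 'openssl', 'voms-clients', 'globus', 'osg-client']
```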
20 OSG Middleware Is Driven by the User Groups
[Flow diagram: domain science requirements and joint projects between OSG stakeholders and middleware developers (Condor, Globus, Privilege, EGEE, ...) -> test on a VO-specific grid -> integrate into the VDT release and deploy on the OSG integration grid -> interoperability testing -> provision in an OSG release and deploy on OSG sites.]
21 Other areas of international collaboration that OSG has started
- An FY07 grant from the NSF Office of International Science and Engineering, providing student stipends for the international grid school in Scandinavia (the grant was constrained to Scandinavia), was a recognised success.
- Plan to submit further such grant requests in early 2008.
- At the NSF IRNC workshop this week (and in the emerging NSB report), Recommendation 8: the National Science Foundation should better publicize its practice of encouraging PIs to request modest supplemental funding through their research grants for foreign collaborators from developing countries.
22 OSG and Networks?
- OSG scale and performance are driven by the science communities that use it.
- OSG is the US infrastructure contributing to the Worldwide LHC Computing Grid, i.e. the scale and performance are initially driven by the LHC experiments, notably ATLAS and CMS.
- OSG depends on the networks being there and working well!
- Internet2 and ESnet are members of the OSG Executive Board.
- Evaluating which network tools to include in our common software stack.
23 Can OSG contribute to network activities?
- It is clear that effective network use is an end-to-end problem.
- OSG provides an infrastructure supporting multiple end-to-end communities and applications.
- OSG continues to develop a well-oiled process for coordinating the introduction of new middleware, including (high-throughput) data transport, to a large number of sites and applications.
- OSG has an Integration Grid of 15 sites which completely mimics the production infrastructure, where applications test new services before they are put into production.
- There is potential for OSG to provide a very useful end-to-end testbed for multiple science applications in acceptance testing of new networks and network protocols.
24 What's needed
- Means to contribute to cross-national joint initiatives and to work on common technical projects for all aspects of distributed infrastructures, in ways that are inclusive, open, responsive, flexible and innovative.
- International agreements and trust at all levels, enabling global science and research:
  - defend against inevitable security incidents being too destructive,
  - defend against isolation and divergence.
- Recognised commitment to sustaining the contributions already being made: infrastructures, software, support, engagement.
- Accurate recording and open communication of benefits and costs.
- Support for international face-to-face work exchanges, balanced with support for easy-to-use, complete collaboration tools.