1
  • Trillium, Tier2 Centers and Grid3

NSF Tier2 Meeting, Arlington, VA, July 9, 2004
Paul Avery, University of Florida, avery@phys.ufl.edu
2
U.S. Trillium Grid Partnership
  • Trillium = PPDG + GriPhyN + iVDGL
  • Particle Physics Data Grid: $12M (DOE) (1999–2004)
  • GriPhyN: $12M (NSF) (2000–2005)
  • iVDGL: $14M (NSF) (2001–2006)
  • Basic composition (~150 people)
  • PPDG: 4 universities, 6 labs
  • GriPhyN: 12 universities, SDSC, 3 labs
  • iVDGL: 18 universities, SDSC, 4 labs, foreign
    partners
  • Experiments: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO,
    SDSS/NVO
  • Complementarity of projects
  • GriPhyN: CS research, Virtual Data Toolkit (VDT)
    development
  • PPDG: end-to-end Grid services, monitoring,
    analysis
  • iVDGL: Grid laboratory deployment using VDT
  • Experiments provide frontier challenges
  • Unified entity when collaborating internationally

3
Trillium Science Drivers
  • ATLAS & CMS experiments @ CERN LHC
  • 100s of Petabytes, 2007–?
  • High Energy & Nuclear Physics experiments
  • ~1 Petabyte (1000 TB), 1997–present
  • LIGO (gravity wave search)
  • 100s of Terabytes, 2002–present
  • Sloan Digital Sky Survey
  • 10s of Terabytes, 2001–present
  • Future Grid resources
  • Massive CPU (PetaOps)
  • Large distributed datasets (>100 PB)
  • Global communities (1000s)

4
Goal: Peta-scale Virtual-Data Grids for Global Science
[Diagram: GriPhyN virtual-data Grid architecture. Single researchers, workgroups, and production teams work through interactive user tools; virtual data tools, request planning & scheduling tools, and request execution management tools sit above resource management services, security & policy services, and other Grid services; beneath these are distributed resources (code, storage, CPUs, networks) and the raw data source. Targets: PetaOps, Petabytes, Performance.]
5
LHC Petascale Global Science
  • Complexity: Millions of individual detector
    channels
  • Scale: PetaOps (CPU), 100s of Petabytes (data)
  • Distribution: Global distribution of people &
    resources

BaBar/D0 example (2004): 700 physicists, 100 institutes, 35 countries
CMS example (2007): 5000 physicists, 250 institutes, 60 countries
6
Global LHC Data Grid Hierarchy
10s of Petabytes/yr by 2007–08; ~1000 Petabytes in
< 10 yrs?
  • CMS experiment online system → Tier 0 (CERN Computer Center): 0.1–1.5 GBytes/s
  • Tier 0 → Tier 1: 10–40 Gb/s
  • Tier 1 → Tier 2: 2.5–10 Gb/s
  • Tier 2 → Tier 3 (physics caches): 1–2.5 Gb/s
  • Tier 3 → Tier 4 (PCs)
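As a rough consistency check of the rates above (illustrative arithmetic only; a real detector records for only part of each year, so these are upper bounds), the sketch below converts the online-system rate into an annual data volume:

```python
# Back-of-the-envelope check: does 0.1-1.5 GBytes/s out of the online system
# plausibly give "10s of Petabytes per year"? Illustrative arithmetic only.
SECONDS_PER_YEAR = 365 * 24 * 3600      # ~3.15e7 s, assuming continuous running
GB_PER_PB = 1_000_000                   # decimal units: 10^15 / 10^9

for rate_gb_per_s in (0.1, 0.5, 1.5):   # low, mid, high rates from the diagram
    pb_per_year = rate_gb_per_s * SECONDS_PER_YEAR / GB_PER_PB
    print(f"{rate_gb_per_s:4.1f} GB/s  ->  ~{pb_per_year:5.1f} PB/yr")

# Approximate output:
#  0.1 GB/s  ->  ~  3.2 PB/yr
#  0.5 GB/s  ->  ~ 15.8 PB/yr
#  1.5 GB/s  ->  ~ 47.3 PB/yr
```

Even with a partial-year duty cycle, the mid-to-high end of the quoted rate lands in the tens of petabytes per year, consistent with the figure above.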
7
Tier2 Centers
  • Tier2 facility
  • ~20–40% of Tier1?
  • ~1 FTE support; commodity CPU + disk, no
    hierarchical storage
  • Essential university role in extended computing
    infrastructure
  • Validated by 3 years of experience with
    proto-Tier2 sites
  • Functions
  • Physics analysis
  • Simulation
  • Experiment software
  • Support smaller institutions
  • Official role in Grid hierarchy (U.S.)
  • Sanctioned by MOU (ATLAS, CMS, LIGO)
  • Local P.I. with reporting responsibilities
  • Selection by collaboration via careful process

8
Analysis by Globally Distributed Teams
  • Non-hierarchical: chaotic analyses + productions
  • Superimpose significant random data flows

9
International Virtual Data Grid Laboratory
[Map of iVDGL sites across the U.S.: SKC, Boston U, Buffalo, UW Milwaukee, Michigan, UW Madison, BNL, Fermilab, LBL, Argonne, PSU, Iowa, Chicago, J. Hopkins, Indiana, Hampton, Caltech, ISI, Vanderbilt, UCSD, UF, Austin, FIU, Brownsville]
  • Partners
  • EU
  • Brazil
  • Korea
10
Roles of iVDGL Institutions
  • U Florida: CMS (Tier2), Management
  • Caltech: CMS (Tier2), LIGO (Management)
  • UC San Diego: CMS (Tier2), CS
  • Indiana U: ATLAS (Tier2), iGOC (operations)
  • Boston U: ATLAS (Tier2)
  • Harvard: ATLAS (Management)
  • Wisconsin, Milwaukee: LIGO (Tier2)
  • Penn State: LIGO (Tier2)
  • Johns Hopkins: SDSS (Tier2), NVO
  • Chicago: CS, Coord./Management, ATLAS (Tier2)
  • Vanderbilt: BTeV (Tier2, unfunded)
  • Southern California: CS
  • Wisconsin, Madison: CS
  • Texas, Austin: CS
  • Salish Kootenai: LIGO (Outreach, Tier3)
  • Hampton U: ATLAS (Outreach, Tier3)
  • Texas, Brownsville: LIGO (Outreach, Tier3)
  • Fermilab: CMS (Tier1), SDSS, NVO
  • Brookhaven: ATLAS (Tier1)

11
iVDGL Goals
  • Deploy and operate a Grid laboratory
  • Support research mission of data-intensive
    experiments
  • Provide computing resources & people at university
    proto-Tier2 sites
  • Operate Grid laboratory for CS technology
    development
  • Prototype and deploy a Grid Operations Center
    (iGOC)
  • Integrate Grid software tools
  • Into computing infrastructures of the experiments
  • Support delivery of Grid technologies
  • Harden the Virtual Data Toolkit (VDT) and
    middleware technologies developed by GriPhyN and
    other Grid projects
  • Education and Outreach
  • Provide tools and mechanisms for underrepresented
    groups and remote regions to participate in
    international science projects
  • Collaborate on joint projects with other E/O
    efforts

12
Trillium Grid Tools: Virtual Data Toolkit
[Diagram: VDT build & test flow ("use NMI processes later"). Sources from CVS and contributor software (VDS, etc.) are built and tested on an NMI Build & Test Condor pool (37 computers); builds produce binaries, GPT source bundles, and RPMs, which are tested, patched, and packaged into a Pacman cache.]
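The flow above is, in essence, a sequential build, test, and package pipeline applied to many component source bundles. The sketch below illustrates that flow only; the component names, cache directory, and steps are hypothetical placeholders, not the actual NMI/VDT build tooling.

```python
# Illustrative build -> test -> package pipeline, in the spirit of the diagram
# above. Component names and the cache path are hypothetical; this is not the
# real NMI/VDT build system.
from pathlib import Path

COMPONENTS = ["globus", "condor", "vds"]       # hypothetical source bundles
CACHE_DIR = Path("pacman-cache")               # stand-in for a Pacman cache

def build(component: str) -> None:
    print(f"[build]   {component}")            # compile from checked-out sources

def test(component: str) -> None:
    print(f"[test]    {component}")            # run tests, e.g. fanned out over a Condor pool

def package(component: str) -> None:
    CACHE_DIR.mkdir(exist_ok=True)
    (CACHE_DIR / f"{component}.tar.gz").touch()    # record a packaged artifact
    print(f"[package] {component} -> {CACHE_DIR}")

for c in COMPONENTS:                           # each bundle goes through the same steps
    build(c)
    test(c)
    package(c)
```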
13
Trillium Collaborative Relationships: Internal and External
[Diagram: Computer science research supplies techniques & software to the Virtual Data Toolkit; the toolkit flows out to partner physics and outreach projects through prototyping & experiments and then production deployment, while requirements flow back in; tech transfer extends to the larger science community (Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC Experiments, QuarkNet, CHEPREO, U.S. Grids, international Grids, outreach). Other linkages: work force, CS researchers, industry.]
14
  • Grid2003: An Operational National Grid
  • 28 sites: universities + national labs
  • 2800 CPUs, 400–1300 jobs
  • Running since October 2003
  • Applications in HEP, LIGO, SDSS, Genomics

[Map: Grid2003 sites, including Korea]
http://www.ivdgl.org/grid2003
15
Grid2003: Three Months' Usage
16
Production Simulations on Grid2003
US-CMS Monte Carlo Simulation
Used 1.5 × US-CMS resources
[Chart legend: USCMS, Non-USCMS]
USCMS
17
Education and Outreach
18
Grids and the Digital Divide, Rio de Janeiro, Feb. 16–20, 2004
  • Background
  • World Summit on Information Society
  • HEP Standing Committee on Inter-regional
    Connectivity (SCIC)
  • Themes
  • Global collaborations, Grids and addressing the
    Digital Divide
  • Next meeting 2005 (Korea)

http://www.uerj.br/lishep2004
19
iVDGL, GriPhyN Education / Outreach
  • Basics
  • $200K/yr
  • Led by UT Brownsville
  • Workshops, portals
  • Partnerships with CHEPREO, QuarkNet,

20
36 students!
21
CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
  • Physics Learning Center
  • CMS Research
  • iVDGL Grid Activities
  • AMPATH network (S. America)

Funded September 2003: $4M initially (3 years)
22
UUEO: A New Initiative
  • Meeting April 8 in Washington, DC
  • Brought together 40 outreach leaders (including
    NSF)
  • Proposed Grid-based framework for common E/O
    effort

23
Extra Slides
24
Outreach: QuarkNet-Trillium Virtual Data Portal
  • More than a web site
  • Organize datasets
  • Perform simple computations
  • Create new computations & analyses
  • View & share results
  • Annotate & enquire (metadata)
  • Communicate and collaborate
  • Easy to use, ubiquitous,
  • No tools to install
  • Open to the community
  • Grow & extend

Initial prototype implemented by graduate student
Yong Zhao and M. Wilde (U. of Chicago)
25
Large Hadron Collider (LHC) @ CERN
  • 27 km tunnel in Switzerland & France

[Diagram: LHC ring with the ATLAS, CMS, ALICE, LHCb, and TOTEM experiments]
Search for Origin of Mass & Supersymmetry (2007–?)
26
GriPhyN Achievements
  • Virtual Data paradigm to express science
    processes
  • Unified language (VDL) to express general data
    transformations (see the sketch after this list)
  • Advanced planners, executors, monitors,
    predictors, fault recovery ... to make the Grid
    like a workstation
  • Virtual Data Toolkit (VDT)
  • Tremendously simplified installation &
    configuration of Grids
  • Close partnership with and adoption by multiple
    sciences: ATLAS, CMS, LIGO, SDSS, bioinformatics,
    EU projects
  • Broad education & outreach program (UT
    Brownsville)
  • 25 graduate & 2 undergraduate students; 3 CS PhDs
    by end of 2004
  • Virtual Data for QuarkNet Cosmic Ray project
  • Grid Summer School 2004, 3 MSIs participating
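To make the virtual-data idea above concrete, here is a minimal Python sketch of the bookkeeping it implies (this is not VDL/Chimera syntax; the dataset and transformation names are hypothetical): a derived dataset is recorded as a transformation plus its inputs, so it can be re-derived on demand instead of being stored.

```python
# Illustrative sketch of the virtual-data idea (not real VDL/Chimera code).
# A derivation records which transformation and inputs produce a dataset, so the
# dataset can be materialized on demand rather than kept permanently.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Derivation:
    transformation: Callable[..., str]   # code that produces the data
    inputs: List[str]                    # logical names of the input datasets

catalog: Dict[str, Derivation] = {}      # logical name -> how to derive it
materialized: Dict[str, str] = {}        # logical name -> already-produced value

def register(name: str, transformation, inputs) -> None:
    catalog[name] = Derivation(transformation, inputs)

def materialize(name: str) -> str:
    """Return the dataset, recursively re-deriving it if it is not already stored."""
    if name not in materialized:
        d = catalog[name]
        args = [materialize(i) for i in d.inputs]      # derive inputs first
        materialized[name] = d.transformation(*args)   # then run the transformation
    return materialized[name]

# Hypothetical chain: raw events -> reconstructed events -> histogram.
materialized["raw.evts"] = "RAW"
register("reco.evts", lambda raw: f"reco({raw})", ["raw.evts"])
register("mass.hist", lambda reco: f"hist({reco})", ["reco.evts"])
print(materialize("mass.hist"))   # -> hist(reco(RAW))
```

A planner in this setting decides whether fetching an existing copy or re-running the derivation is cheaper, which is the role the advanced planners and executors above play on a real Grid.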

27
Analysis by Globally Distributed Teams
  • Non-hierarchical: chaotic analyses + productions
  • Superimpose significant random data flows

28
Virtual Data Toolkit: Tools in VDT 1.1.12
  • Globus Alliance
  • Grid Security Infrastructure (GSI)
  • Job submission (GRAM)
  • Information service (MDS)
  • Data transfer (GridFTP)
  • Replica Location (RLS)
  • Condor Group
  • Condor/Condor-G
  • DAGMan
  • Fault Tolerant Shell
  • ClassAds
  • EDG / LCG
  • Make Gridmap
  • Cert. Revocation List Updater
  • Glue Schema/Info provider
  • ISI & UC
  • Chimera & related tools
  • Pegasus
  • NCSA
  • MyProxy
  • GSI OpenSSH
  • LBL
  • PyGlobus
  • Netlogger
  • Caltech
  • MonaLisa
  • VDT
  • VDT System Profiler
  • Configuration software
  • Others
  • KX509 (U. Mich.)
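As one illustration of how the Condor tools listed above fit together, the minimal sketch below writes two Condor submit description files and a DAGMan file that chains them, so the second job runs only after the first succeeds. File names and executables are hypothetical, and the plain vanilla universe is used for simplicity (Grid2003 jobs typically went through Condor-G to remote gatekeepers instead).

```python
# Minimal two-step Condor/DAGMan workflow sketch (hypothetical names and paths).
from pathlib import Path

SUBMIT_TEMPLATE = """\
universe   = vanilla
executable = {exe}
output     = {name}.out
error      = {name}.err
log        = workflow.log
queue
"""

def write_submit(name: str, exe: str) -> str:
    """Write a Condor submit description for one job and return its file name."""
    fname = f"{name}.sub"
    Path(fname).write_text(SUBMIT_TEMPLATE.format(name=name, exe=exe))
    return fname

# Step 1 simulates events, step 2 analyzes them (placeholder executables).
sim = write_submit("simulate", "/bin/echo")
ana = write_submit("analyze", "/bin/echo")

# DAGMan description: run "analyze" only after "simulate" completes successfully.
Path("workflow.dag").write_text(
    f"JOB simulate {sim}\n"
    f"JOB analyze {ana}\n"
    "PARENT simulate CHILD analyze\n"
)
# Submit the whole workflow with:  condor_submit_dag workflow.dag
```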

29
VDT Growth (1.1.14 Currently)
[Chart: VDT release timeline]
  • VDT 1.0: Globus 2.0b, Condor 6.3.1
  • VDT 1.1.3, 1.1.4 & 1.1.5: pre-SC 2002
  • VDT 1.1.7: switch to Globus 2.2
  • VDT 1.1.8: first real use by LCG
  • VDT 1.1.11: Grid2003
  • VDT 1.1.14: May 10
30
Grid2003 Broad Lessons
  • Careful planning and coordination essential to
    build Grids
  • Community investment of time/resources
  • Operations team needed to operate Grid as a
    facility
  • Tools, services, procedures, documentation,
    organization
  • Security, account management, multiple
    organizations
  • Strategies needed to cope with increasingly large
    scale
  • Interesting failure modes as scale increases
  • Delegation of responsibilities to conserve human
    resources
  • Project, Virtual Org., Grid service, site,
    application
  • Better services, documentation, packaging
  • Grid2003 experience critical for building
    useful Grids
  • Frank discussion in Grid2003 Project Lessons doc

31
Grid2003 → Open Science Grid
  • Build on Grid2003 experience
  • Persistent, production-quality Grid, national &
    international scope
  • Ensure U.S. leading role in international science
  • Grid infrastructure for large-scale collaborative
    scientific research
  • Create large computing infrastructure
  • Combine resources at DOE labs and universities to
    effectively become a single national computing
    infrastructure for science
  • Provide opportunities for educators and students
  • Participate in building and exploiting this grid
    infrastructure
  • Develop and train scientific and technical
    workforce
  • Transform the integration of education and
    research at all levels

http://www.opensciencegrid.org
32
Grid References
  • Grid2003
  • www.ivdgl.org/grid2003
  • Globus
  • www.globus.org
  • PPDG
  • www.ppdg.net
  • GriPhyN
  • www.griphyn.org
  • iVDGL
  • www.ivdgl.org
  • LCG
  • www.cern.ch/lcg
  • EU DataGrid
  • www.eu-datagrid.org
  • EGEE
  • egee-ei.web.cern.ch

2nd Edition www.mkp.com/grid2
33
2004 Grid Summer School
  • First of its kind in the U.S.
  • (EU had one in Summer 2003)
  • Marks new direction for Trillium
  • First attempt to systematically teach Grid
    technologies
  • First attempt to gather relevant materials in one
    place
  • Today: students in CS and Physics
  • Later: students, postdocs, junior & senior
    scientists
  • Reaching a wider audience
  • Put materials on the web for direct access
  • Build online Grid courses (www.cnx.rice.edu)
  • Create Grid book (online & print) with Georgia
    Tech
  • New funding opportunities
  • NSF: new large-scale training & education programs