Integrating Universities and Laboratories in National Cyberinfrastructure
Transcript of a PowerPoint presentation by Paul Avery (PASI Lecture, Mendoza, Argentina, May 17, 2005)
1
  • Integrating Universities and Laboratories In
    National Cyberinfrastructure

Paul Avery, University of Florida
avery@phys.ufl.edu
PASI Lecture, Mendoza, Argentina, May 17, 2005
2
Outline of Talk
  • Cyberinfrastructure and Grids
  • Data intensive disciplines and Data Grids
  • The Trillium Grid collaboration
  • GriPhyN, iVDGL, PPDG
  • The LHC and its computing challenges
  • Grid3 and the Open Science Grid
  • A bit on networks
  • Education and Outreach
  • Challenges for the future
  • Summary

Presented from a physicist's perspective!
3
Cyberinfrastructure (cont)
  • Software programs, services, instruments, data, information, knowledge, applicable to specific projects, disciplines, and communities
  • Cyberinfrastructure: layer of enabling hardware, algorithms, software, communications, institutions, and personnel. A platform that empowers researchers to innovate and eventually revolutionize what they do, how they do it, and who participates
  • Base technologies: computation, storage, and communication components that continue to advance in raw capacity at exponential rates

Paraphrased from NSF Blue Ribbon Panel report, 2003
Challenge: creating and operating advanced cyberinfrastructure and integrating it in science and engineering applications.
4
Cyberinfrastructure and Grids
  • Grid: geographically distributed computing resources configured for coordinated use
  • Fabric: physical resources and networks provide raw capability
  • Ownership: resources controlled by owners and shared with others
  • Middleware: software that ties it all together: tools, services, etc.
  • Enhancing collaboration via transparent resource sharing

US-CMS Virtual Organization
5
Data Grids Collaborative Research
  • Team-based 21st century scientific discovery
  • Strongly dependent on advanced information technology
  • People and resources distributed internationally
  • Dominant factor: data growth (1 Petabyte = 1000 TB)
  • 2000: 0.5 Petabyte
  • 2005: 10 Petabytes
  • 2010: 100 Petabytes
  • 2015-17: 1000 Petabytes?
  • Drives need for powerful linked resources: Data Grids
  • Computation: massive, distributed CPU
  • Data storage and access: distributed high-speed disk and tape
  • Data movement: international optical networks
  • Collaborative research and Data Grids
  • Data discovery, resource sharing, distributed analysis, etc.

How to collect, manage, access and interpret
this quantity of data?
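A rough check of the implied growth rate in the projections above (my own arithmetic, not from the slide: 0.5 PB in 2000 rising to about 1000 PB around 2015):

growth = (1000 / 0.5) ** (1 / 15)  # implied annual growth factor
print(f"~{(growth - 1) * 100:.0f}% per year, i.e. x{growth**5:.0f} every 5 years")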
6
Examples of Data Intensive Disciplines
  • High energy and nuclear physics
  • Belle, BaBar, Tevatron, RHIC, JLab
  • Large Hadron Collider (LHC)
  • Astronomy
  • Digital sky surveys, Virtual Observatories
  • VLBI arrays: multiple Gb/s data streams
  • Gravity wave searches
  • LIGO, GEO, VIRGO, TAMA, ACIGA, ...
  • Earth and climate systems
  • Earth observation, climate modeling, oceanography, ...
  • Biology, medicine, imaging
  • Genome databases
  • Proteomics (protein structure and interactions, drug delivery, ...)
  • High-resolution brain scans (1-10 µm, time dependent)

Primary driver
7
Our Vision and Goals
  • Develop the technologies and tools needed to exploit a Grid-based cyberinfrastructure
  • Apply and evaluate those technologies and tools in challenging scientific problems
  • Develop the technologies and procedures to support a permanent Grid-based cyberinfrastructure
  • Create and operate a persistent Grid-based cyberinfrastructure in support of discipline-specific research goals

End-to-end
GriPhyN + iVDGL + DOE Particle Physics Data Grid (PPDG) = Trillium
8
Our Science Drivers
  • Experiments at the Large Hadron Collider
  • New fundamental particles and forces
  • 100s of Petabytes, 2007 - ?
  • High Energy and Nuclear Physics experiments
  • Top quark, nuclear matter at extreme density
  • 1 Petabyte (1000 TB), 1997 - present
  • LIGO (gravity wave search)
  • Search for gravitational waves
  • 100s of Terabytes, 2002 - present
  • Sloan Digital Sky Survey
  • Systematic survey of astronomical objects
  • 10s of Terabytes, 2001 - present

9
Grid Middleware: Virtual Data Toolkit

[Diagram: the VDT build-and-test process. Sources (CVS) from many contributors are built and tested on an NMI Build & Test Condor pool spanning 22 operating systems, then packaged (RPMs, GPT source bundles, Pacman cache) with patching.]

A unique laboratory for testing, supporting, deploying, packaging, upgrading, and troubleshooting complex sets of software!
10
VDT Growth Over 3 Years
[Chart: growth in the number of VDT components over three years. Milestones: VDT 1.0 (Globus 2.0b, Condor 6.3.1); VDT 1.1.7 (switch to Globus 2.2); VDT 1.1.8 (first real use by LCG); VDT 1.1.11 (Grid3). See www.griphyn.org/vdt/]
11
Components of VDT 1.3.5
  • Globus 3.2.1
  • Condor 6.7.6
  • RLS 3.0
  • ClassAds 0.9.7
  • Replica 2.2.4
  • DOE/EDG CA certs
  • ftsh 2.0.5
  • EDG mkgridmap
  • EDG CRL Update
  • GLUE Schema 1.0
  • VDS 1.3.5b
  • Java
  • Netlogger 3.2.4
  • Gatekeeper-Authz
  • MyProxy 1.11
  • KX509
  • System Profiler
  • GSI OpenSSH 3.4
  • Monalisa 1.2.32
  • PyGlobus 1.0.6
  • MySQL
  • UberFTP 1.11
  • DRM 1.2.6a
  • VOMS 1.4.0
  • VOMS Admin 0.7.5
  • Tomcat
  • PRIMA 0.2
  • Certificate Scripts
  • Apache
  • jClarens 0.5.3
  • New GridFTP Server
  • GUMS 1.0.1

12
Collaborative Relationships: A CS + VDT Perspective

[Diagram: Computer Science research feeds techniques and software into the Virtual Data Toolkit; prototyping, experiments, and production deployment carry it to the larger science community (tech transfer). Requirements flow back from partner science, networking, and outreach projects: Globus, Condor, NMI, iVDGL, PPDG, EU DataGrid, LHC experiments, QuarkNet, CHEPREO, Digital Divide, U.S. and international Grids, outreach. Other linkages: work force, CS researchers, industry.]
13
U.S. Trillium Grid Partnership
  • Trillium = PPDG + GriPhyN + iVDGL
  • Particle Physics Data Grid: $12M (DOE) (1999-2006)
  • GriPhyN: $12M (NSF) (2000-2005)
  • iVDGL: $14M (NSF) (2001-2006)
  • Basic composition (~150 people)
  • PPDG: 4 universities, 6 labs
  • GriPhyN: 12 universities, SDSC, 3 labs
  • iVDGL: 18 universities, SDSC, 4 labs, foreign partners
  • Expts: BaBar, D0, STAR, JLab, CMS, ATLAS, LIGO, SDSS/NVO
  • Coordinated internally to meet broad goals
  • GriPhyN: CS research, Virtual Data Toolkit (VDT) development
  • iVDGL: Grid laboratory deployment using VDT, applications
  • PPDG: end-to-end Grid services, monitoring, analysis
  • Common use of VDT for underlying Grid middleware
  • Unified entity when collaborating internationally

14
Goal: Peta-scale Data Grids for Global Science

[Diagram: production teams, workgroups, and single researchers use interactive user tools built on request execution management, request planning and scheduling, and virtual data tools; these rest on resource management, security and policy, and other Grid services, running over distributed resources (code, storage, CPUs, networks) and raw data sources. Targets: PetaOps, Petabytes, performance.]
15
Sloan Digital Sky Survey (SDSS): Using Virtual Data in GriPhyN
16
The LIGO Scientific Collaboration (LSC) and the LIGO Grid
  • LIGO Grid: 6 US sites

iVDGL has enabled LSC to establish a persistent
production grid
17
Large Hadron Collider and its Frontier Computing Challenges
18
Large Hadron Collider (LHC) at CERN
  • 27 km tunnel in Switzerland and France

TOTEM
CMS
ALICE
LHCb
  • Search for:
  • Origin of mass
  • New fundamental forces
  • Supersymmetry
  • Other new particles
  • 2007 - ?

ATLAS
19
CMS: Compact Muon Solenoid
Inconsequential humans
20
LHC Data Rates: Detector to Storage
  • Collision rate: 40 MHz, TBytes/sec off the detector, reduced by physics filtering
  • Level 1 Trigger (special hardware): 75 kHz, 75 GB/sec
  • Level 2 Trigger (commodity CPUs): 5 kHz, 5 GB/sec
  • Level 3 Trigger (commodity CPUs): 100 Hz, 0.15-1.5 GB/sec of raw data to storage (+ simulated data); a rough consistency check follows below
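A back-of-the-envelope Python check of the cascade above; the per-event sizes are implied by the quoted rates rather than stated on the slide.

# Sanity check of the trigger cascade: data rate / accept rate = event size
MB = 1e6
stages = [
    ("Level 1 (hardware)",       75_000, 75e9),    # accept rate [Hz], data rate [B/s]
    ("Level 2 (commodity CPUs)",  5_000,  5e9),
    ("Level 3 (commodity CPUs)",    100,  0.15e9),
]
for name, rate_hz, bytes_per_s in stages:
    print(f"{name}: {rate_hz} Hz, {bytes_per_s/1e9:.2f} GB/s -> ~{bytes_per_s/rate_hz/MB:.1f} MB/event")
print(f"online rejection: 40 MHz -> 100 Hz, factor ~{40e6/100:.0e}")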
21
Complexity: Higgs Decay to 4 Muons
(+ 30 minimum bias events)
All charged tracks with pT > 2 GeV
Reconstructed tracks with pT > 25 GeV
10^9 collisions/sec, selectivity: 1 in 10^13
22
LHC: Petascale Global Science
  • Complexity: millions of individual detector channels
  • Scale: PetaOps (CPU), 100s of Petabytes (data)
  • Distribution: global distribution of people and resources

BaBar/D0 example (2004): 700 physicists, 100 institutes, 35 countries
CMS example (2007): 5000 physicists, 250 institutes, 60 countries
23
LHC Global Data Grid (2007)
  • 5000 physicists, 60 countries
  • 10s of Petabytes/yr by 2008
  • 1000 Petabytes in < 10 yrs?

[Diagram: CMS tiered data grid. The online system feeds Tier 0 (CERN Computer Center) at 150-1500 MB/s; Tier 0 links to Tier 1 centers at 10-40 Gb/s; Tier 1 to Tier 2 at >10 Gb/s; Tier 2 to Tier 3 at 2.5-10 Gb/s; Tier 3 and Tier 4 hold physics caches and PCs. Rough transfer-time estimates follow below.]
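For scale, a small sketch (my own illustration, assuming the nominal link rate is fully available with no protocol overhead) of how long a terabyte takes to cross the tier-to-tier links quoted above.

def hours_to_move(terabytes, gbps):
    # 1 TB = 8e12 bits; link rate in Gb/s
    return terabytes * 8e12 / (gbps * 1e9) / 3600

for gbps in (2.5, 10, 40):
    print(f"1 TB over a {gbps} Gb/s link: {hours_to_move(1, gbps):.2f} h")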
24
University Tier2 Centers
  • Tier2 facility
  • Essential university role in extended computing
    infrastructure
  • ~20-25% of a Tier1 national laboratory, supported by NSF
  • Validated by 3 years of experience (CMS, ATLAS,
    LIGO)
  • Functions
  • Perform physics analysis, simulations
  • Support experiment software
  • Support smaller institutions
  • Official role in Grid hierarchy (U.S.)
  • Sanctioned by MOU with parent organization
    (ATLAS, CMS, LIGO)
  • Selection by collaboration via careful process
  • Local P.I. with reporting responsibilities

25
Grids and Globally Distributed Teams
  • Non-hierarchical: chaotic analyses and productions
  • Superimpose significant random data flows

26
Grid3 and Open Science Grid
27
  • Grid3: A National Grid Infrastructure
  • 32 sites, 4000 CPUs: universities + 4 national labs
  • Part of the LHC Grid; running since October 2003
  • Sites in US, Korea, Brazil, Taiwan
  • Applications in HEP, LIGO, SDSS, genomics, fMRI, CS

http://www.ivdgl.org/grid3
28
Grid3 World Map
29
Grid3 Components
  • Computers and storage at 30 sites: 4000 CPUs
  • Uniform service environment at each site
  • Globus Toolkit: provides basic authentication, execution management, data movement (see the sketch below)
  • Pacman: installs numerous other VDT and application services
  • Global and virtual organization services
  • Certification and registration authorities, VO membership services, monitoring services
  • Client-side tools for data access and analysis
  • Virtual data, execution planning, DAG management, execution management, monitoring
  • IGOC: iVDGL Grid Operations Center
  • Grid testbed: Grid3dev
  • Middleware development and testing, new VDT versions, etc.
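A minimal sketch of the three Globus Toolkit functions named above, driven from Python through the standard command-line clients. It assumes a VDT/Globus client installation on the PATH; the host names and file paths are placeholders, not real Grid3 sites.

import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Authentication: create a short-lived proxy from the user's grid certificate
run(["grid-proxy-init", "-valid", "12:00"])

# 2. Execution management: run a command on a remote gatekeeper via GRAM
run(["globus-job-run", "gatekeeper.example.edu/jobmanager-condor", "/bin/hostname"])

# 3. Data movement: third-party transfer between two GridFTP servers
run(["globus-url-copy",
     "gsiftp://se1.example.edu/data/input.root",
     "gsiftp://se2.example.edu/data/input.root"])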

30
Grid3 Applications
CMS experiment: p-p collision simulations and analysis
ATLAS experiment: p-p collision simulations and analysis
BTeV experiment: p-p collision simulations and analysis
LIGO: search for gravitational wave sources
SDSS: galaxy cluster finding
Bio-molecular analysis: Shake-and-Bake (SnB) (Buffalo)
Genome analysis: GADU/Gnare
fMRI: functional MRI (Dartmouth)
CS demonstrators: job exerciser, GridFTP, NetLogger
www.ivdgl.org/grid3/applications
31
Grid3 Shared Use Over 6 months
[Chart: Grid3 shared CPU usage over six months, through Sep 10.]
32
Grid3 Production Over 13 Months
33
U.S. CMS 2003 Production
  • 10M p-p collisions: largest ever
  • 2x simulation sample
  • ½ manpower
  • Multi-VO sharing

34
Grid3 as CS Research Lab: e.g., Adaptive Scheduling
  • Adaptive data placement in a realistic environment (K. Ranganathan)
  • Enables comparisons with simulations

35
Grid3 Lessons Learned
  • How to operate a Grid as a facility
  • Tools, services, error recovery, procedures, docs, organization
  • Delegation of responsibilities (project, VO, service, site, ...)
  • Crucial role of Grid Operations Center (GOC)
  • How to support people-to-people relations
  • Face-to-face meetings, phone conferences, 1-1 interactions, mail lists, etc.
  • How to test and validate Grid tools and
    applications
  • Vital role of testbeds
  • How to scale algorithms, software, process
  • Some successes, but interesting failure modes
    still occur
  • How to apply distributed cyberinfrastructure
  • Successful production runs for several
    applications

36
Grid3 → Open Science Grid
  • Iteratively build and extend Grid3
  • Grid3 → OSG-0 → OSG-1 → OSG-2 → ...
  • Shared resources, benefiting a broad set of disciplines
  • Grid middleware based on the Virtual Data Toolkit (VDT)
  • Emphasis on end-to-end services for applications
  • OSG collaboration
  • Computer and application scientists
  • Facility, technology and resource providers
    (labs, universities)
  • Further develop OSG
  • Partnerships and contributions from other
    sciences, universities
  • Incorporation of advanced networking
  • Focus on general services, operations, end-to-end
    performance
  • Aim for Summer 2005 deployment

37
http://www.opensciencegrid.org
38
OSG Organization
[Diagram: OSG organization. The OSG Council (all members above a certain threshold; chair, officers) and an Executive Board (8-15 representatives; chair, officers) oversee a core OSG staff (a few FTEs, manager) and the Technical Groups, advised by an Advisory Committee. Participants: universities, labs, sites, service providers, researchers, VOs, research grid projects, and enterprise partners.]
39
OSG Technical Groups Activities
  • Technical Groups address and coordinate technical areas
  • Propose and carry out activities related to their given areas
  • Liaise and collaborate with other peer projects (U.S. and international)
  • Participate in relevant standards organizations
  • Chairs participate in Blueprint, Integration and Deployment activities
  • Activities are well-defined, scoped tasks contributing to OSG
  • Each Activity has deliverables and a plan
  • is self-organized and operated
  • is overseen and sponsored by one or more Technical Groups

TGs and Activities are where the real work gets
done
40
OSG Technical Groups
Governance: charter, organization, by-laws, agreements, formal processes
Policy: VO and site policy, authorization, priorities, privilege and access rights
Security: common security principles, security infrastructure
Monitoring and Information Services: resource monitoring, information services, auditing, troubleshooting
Storage: storage services at remote sites, interfaces, interoperability
Support Centers: infrastructure and services for user support, helpdesk, trouble tickets
Education / Outreach: training, interface with various E/O projects
Networks (new): including interfacing with various networking projects
41
OSG Activities
Blueprint: defining principles and best practices for OSG
Deployment: deployment of resources and services
Provisioning: connected to deployment
Incident response: plans and procedures for responding to security incidents
Integration: testing, validating and integrating new services and technologies
Data Resource Management (DRM): deployment of specific Storage Resource Management technology
Documentation: organizing the documentation infrastructure
Accounting: accounting and auditing use of OSG resources
Interoperability: primarily interoperability between
Operations: operating Grid-wide services
42
Connections to European Projects: LCG and EGEE
43
The Path to the OSG Operating Grid
[Diagram: the path to the OSG operating grid. The OSG Integration Activity produces a readiness plan (effort, resources); once the plan is adopted, VO application software installation and software packaging feed the OSG Deployment Activity (service deployment) and the OSG Operations-Provisioning Activity. Release candidates go through application validation, middleware interoperability, functionality and scalability tests, metrics and certification, with feedback, leading to a release description.]
44
OSG Integration Testbed
[Map: OSG Integration Testbed sites, including Brazil.]
45
Status of OSG Deployment
  • OSG infrastructure release accepted for
    deployment.
  • US CMS MOP flood testing successful
  • D0 simulation and reprocessing jobs running on selected OSG sites
  • Others in various stages of readying applications and infrastructure (ATLAS, CMS, STAR, CDF, BaBar, fMRI)
  • Deployment process underway: end of July?
  • Open OSG and transition resources from Grid3
  • Applications will use growing ITB and OSG resources during the transition

http://osg.ivdgl.org/twiki/bin/view/Integration/WebHome
46
Interoperability and Federation
  • Transparent use of federated Grid infrastructures is a goal
  • There are sites that appear as part of LCG as
    well as part of OSG/Grid3
  • D0 bringing reprocessing to LCG sites through
    adaptor node
  • CMS and ATLAS can run their jobs on both LCG and
    OSG
  • Increasing interaction with TeraGrid
  • CMS and ATLAS sample simulation jobs are running
    on TeraGrid
  • Plans for a TeraGrid allocation for jobs running in the Grid3 model, with group accounts, binary distributions, external data management, etc.

47
Networks
48
Evolving Science Requirements for Networks (DOE High Performance Network Workshop)

Science Area | Today End2End Throughput | 5 Years End2End Throughput | 5-10 Years End2End Throughput | Remarks
High Energy Physics | 0.5 Gb/s | 100 Gb/s | 1000 Gb/s | High bulk throughput
Climate (data and computation) | 0.5 Gb/s | 160-200 Gb/s | N x 1000 Gb/s | High bulk throughput
SNS NanoScience | Not yet started | 1 Gb/s | 1000 Gb/s + QoS for control channel | Remote control and time-critical throughput
Fusion Energy | 0.066 Gb/s (500 MB/s burst) | 0.2 Gb/s (500 MB per 20 sec burst) | N x 1000 Gb/s | Time-critical throughput
Astrophysics | 0.013 Gb/s (1 TB/week) | N x N multicast | 1000 Gb/s | Computational steering and collaborations
Genomics Data and Computation | 0.091 Gb/s (1 TB/day) | 100s of users | 1000 Gb/s + QoS for control channel | High throughput and steering
See http://www.doecollaboratory.org/meetings/hpnpw/
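A quick cross-check of the "Today" column above (my own arithmetic, not part of the workshop table), converting the quoted dataset flows into average bit rates:

def tb_per_interval_to_gbps(terabytes, seconds):
    # 1 TB taken as 10**12 bytes
    return terabytes * 1e12 * 8 / seconds / 1e9

day, week = 86400, 7 * 86400
print(f"1 TB/day  ~ {tb_per_interval_to_gbps(1, day):.3f} Gb/s")   # table: 0.091 Gb/s
print(f"1 TB/week ~ {tb_per_interval_to_gbps(1, week):.3f} Gb/s")  # table: 0.013 Gb/s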
49
UltraLight: Advanced Networking in Applications
Funded by ITR 2004
  • 10 Gb/s network
  • Caltech, UF, FIU, UM, MIT
  • SLAC, FNAL
  • Int'l partners
  • Level(3), Cisco, NLR

50
UltraLight New Information System
  • A new class of integrated information systems
  • Includes networking as a managed resource for the
    first time
  • Uses hybrid packet-switched and circuit-switched optical network infrastructure
  • Monitor, manage and optimize network and Grid systems in real time
  • Flagship applications: HEP, eVLBI, burst imaging
  • Terabyte-scale data transactions in minutes
  • Extend real-time eVLBI to the 10-100 Gb/s range
  • Powerful testbed
  • Significant storage, optical networks for testing
    new Grid services
  • Strong vendor partnerships
  • Cisco, Calient, NLR, CENIC, Internet2/Abilene


51
Education and Outreach
52
iVDGL, GriPhyN Education/Outreach
  • Basics
  • $200K/yr
  • Led by UT Brownsville
  • Workshops, portals, tutorials
  • New partnerships with QuarkNet, CHEPREO, LIGO E/O, ...

53
June 2004 Grid Summer School
  • First of its kind in the U.S. (South Padre Island, Texas)
  • 36 students, diverse origins and types (M, F, MSIs, etc.)
  • Marks a new direction for U.S. Grid efforts
  • First attempt to systematically train people in Grid technologies
  • First attempt to gather relevant materials in one place
  • Today: students in CS and physics
  • Next: students, postdocs, junior and senior scientists
  • Reaching a wider audience
  • Put lectures, exercises, video, etc. on the web
  • More tutorials, perhaps 2-3/year
  • Dedicated resources for remote tutorials
  • Create a Grid Cookbook, e.g. Georgia Tech
  • Second workshop: July 11-15, 2005
  • South Padre Island again

54
QuarkNet/GriPhyN e-Lab Project
http://quarknet.uchicago.edu/elab/cosmic/home.jsp
55
Student Muon Lifetime Analysis in GriPhyN/QuarkNet
56
CHEPREO: Center for High Energy Physics Research and Educational Outreach, Florida International University
  • Physics Learning Center
  • CMS research
  • iVDGL Grid activities
  • AMPATH network (S. America)
  • Funded September 2003
  • $4M initially (3 years)
  • MPS, CISE, EHR, INT

57
Grids and the Digital Divide: Rio de Janeiro, Feb. 16-20, 2004
  • Background
  • World Summit on the Information Society
  • HEP Standing Committee on Inter-regional Connectivity (SCIC)
  • Themes
  • Global collaborations, Grids and addressing the Digital Divide
  • Focus on poorly connected regions
  • Next meeting: Daegu, Korea, May 23-27, 2005

58
Partnerships Drive Success
  • Integrating Grids in scientific research
  • Lab-centric: activities center around a large facility
  • Team-centric: resources shared by distributed teams
  • Knowledge-centric: knowledge generated and used by a community
  • Strengthening the role of universities in frontier research
  • Couples universities to frontier data-intensive research
  • Brings front-line research and resources to students
  • Exploits intellectual resources at minority or remote institutions
  • Driving advances in IT/science/engineering
  • Domain sciences ↔ Computer Science
  • Universities ↔ Laboratories
  • Scientists ↔ Students
  • NSF projects ↔ NSF projects
  • NSF ↔ DOE
  • Research communities ↔ IT industry

59
Fulfilling the Promise of Next Generation Science
  • Supporting permanent, national-scale Grid
    infrastructure
  • Large CPU, storage and network capability crucial
    for science
  • Support personnel, equipment maintenance,
    replacement, upgrade
  • Tier1 and Tier2 resources a vital part of
    infrastructure
  • Open Science Grid: a unique national infrastructure for science
  • Supporting the maintenance, testing and
    dissemination of advanced middleware
  • Long-term support of the Virtual Data Toolkit
  • Vital for reaching new disciplines and for supporting large international collaborations
  • Continuing support for HEP as a frontier
    challenge driver
  • Huge challenges posed by LHC global interactive
    analysis
  • New challenges posed by remote operation of
    Global Accelerator Network

60
Fulfilling the Promise (2)
  • Creating even more advanced cyberinfrastructure
  • Integrating databases in large-scale Grid
    environments
  • Interactive analysis with distributed teams
  • Partnerships involving CS research with
    application drivers
  • Supporting the emerging role of advanced networks
  • Reliable, high performance LANs and WANs
    necessary for advanced Grid applications
  • Partnering to enable stronger, more diverse
    programs
  • Programs supported by multiple Directorates, a la
    CHEPREO
  • NSF-DOE joint initiatives
  • Strengthen ability of universities and labs to
    work together
  • Providing opportunities for cyberinfrastructure training, education and outreach
  • Grid tutorials, Grid Cookbook
  • Collaborative tools for student-led projects and research

61
Summary
  • Grids enable 21st century collaborative science
  • Linking research communities and resources for
    scientific discovery
  • Needed by global collaborations pursuing
    petascale science
  • Grid3 was an important first step in developing
    US Grids
  • Value of planning, coordination, testbeds, rapid
    feedback
  • Value of learning how to operate a Grid as a
    facility
  • Value of building and sustaining community relationships
  • Grids drive the need for advanced optical networks
  • Grids impact education and outreach
  • Providing technologies and resources for training, education, outreach
  • Addressing the Digital Divide
  • OSG: a scalable computing infrastructure for science?
  • Strategies needed to cope with increasingly large scale

62
Grid Project References
  • Open Science Grid
  • www.opensciencegrid.org
  • Grid3
  • www.ivdgl.org/grid3
  • Virtual Data Toolkit
  • www.griphyn.org/vdt
  • GriPhyN
  • www.griphyn.org
  • iVDGL
  • www.ivdgl.org
  • PPDG
  • www.ppdg.net
  • CHEPREO
  • www.chepreo.org
  • UltraLight
  • ultralight.cacr.caltech.edu
  • Globus
  • www.globus.org
  • Condor
  • www.cs.wisc.edu/condor
  • LCG
  • www.cern.ch/lcg
  • EU DataGrid
  • www.eu-datagrid.org
  • EGEE
  • www.eu-egee.org

63
Extra Slides
64
GriPhyN Goals
  • Conduct CS research to achieve vision
  • Virtual Data as unifying principle
  • Planning, execution, performance monitoring
  • Disseminate through Virtual Data Toolkit
  • A concrete deliverable
  • Integrate into GriPhyN science experiments
  • Common Grid tools, services
  • Educate, involve, train students in IT research
  • Undergrads, grads, postdocs,
  • Underrepresented groups

65
iVDGL Goals
  • Deploy a Grid laboratory
  • Support research mission of data intensive
    experiments
  • Provide computing and personnel resources at
    university sites
  • Provide platform for computer science technology
    development
  • Prototype and deploy a Grid Operations Center
    (iGOC)
  • Integrate Grid software tools
  • Into computing infrastructures of the experiments
  • Support delivery of Grid technologies
  • Hardening of the Virtual Data Toolkit (VDT) and
    other middleware technologies developed by
    GriPhyN and other Grid projects
  • Education and Outreach
  • Lead and collaborate with Education and Outreach
    efforts
  • Provide tools and mechanisms for underrepresented
    groups and remote regions to participate in
    international science projects

66
CMS Grid Enabled Analysis Architecture
  • ROOT (analysis tool)
  • Python
  • Cojac (detector viz)/IGUANA (cms viz)

Analysis Client
  • Clients talk standard protocols to the Clarens Grid Services Web Server (see the sketch below)
  • Simple Web service API allows simple or complex analysis clients
  • Typical clients: ROOT, Web Browser, ...
  • Clarens portal hides complexity
  • Key features: global scheduler, catalogs, monitoring, Grid-wide execution service
  • Discovery
  • ACL management
  • Cert.-based access

[Diagram: analysis clients speak HTTP, SOAP, and XML-RPC to the Clarens Grid Services Web Server, which fronts a scheduler (Sphinx), catalogs and metadata (RefDB, MOPDB, replica catalog), fully abstract, partially abstract, and fully concrete planners (Chimera virtual data, MCRunjob), monitoring (MonALISA, BOSS), applications (ORCA, FAMOS, ROOT, POOL), data management, and a Grid-wide execution service with an execution priority manager on VDT servers.]
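An illustrative client-side sketch only: the URL and the catalog/scheduler method names are hypothetical placeholders, not the actual Clarens API (system.listMethods is just standard XML-RPC introspection).

import xmlrpc.client

# A thin analysis client talking XML-RPC to a Clarens-style web service
server = xmlrpc.client.ServerProxy("https://clarens.example.edu:8443/clarens/")

services = server.system.listMethods()        # standard XML-RPC introspection
datasets = server.catalog.find("Higgs*4mu*")  # hypothetical catalog method
job_id = server.scheduler.submit({"dataset": datasets[0],          # hypothetical
                                  "executable": "analyze_4mu.C"})
print("submitted job:", job_id)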
67
Virtual Data Derivation Provenance
  • Most scientific data are not simple measurements
  • They are computationally corrected/reconstructed
  • They can be produced by numerical simulation
  • Science and engineering projects are more CPU and data intensive
  • Programs are significant community resources (transformations)
  • So are the executions of those programs (derivations)
  • Management of dataset dependencies is critical!
  • Derivation: instantiation of a potential data product
  • Provenance: complete history of any existing data product
  • Previously: manual methods
  • GriPhyN: automated, robust tools

68
Virtual Data Example HEP Analysis
Scientist adds a new derived data branch
continues analysis
decay bb
decay WW WW ? leptons
decay ZZ
mass 160
decay WW WW ? e??? Pt gt 20
decay WW WW ? e???
decay WW
69
Packaging of Grid Software Pacman
  • Language: define software environments
  • Interpreter: create, install, configure, update, verify environments
  • Version 3.0.2 released Jan. 2005

Combine and manage software from arbitrary
sources.
  • LCG/Scram
  • ATLAS/CMT
  • CMS DPE/tar/make
  • LIGO/tar/make
  • OpenSource/tar/make
  • Globus/GPT
  • NPACI/TeraGrid/tar/make
  • D0/UPS-UPD
  • Commercial/tar/make

"1-button install": reduces the burden on administrators

pacman -get iVDGL:Grid3

Remote experts define installation/config/updating for everyone at once
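A conceptual illustration (not actual Pacman syntax): a toy interpreter that reads declarative package definitions published by remote experts and installs them with their dependencies, in the spirit of the pacman command above.

packages = {  # hypothetical cache contents
    "iVDGL:Grid3": {"requires": ["Globus", "Condor"], "install": "setup_grid3.sh"},
    "Globus":      {"requires": [], "install": "install_globus.sh"},
    "Condor":      {"requires": [], "install": "install_condor.sh"},
}

def get(name, done=None):
    """Install a package after its dependencies (depth-first)."""
    done = set() if done is None else done
    if name in done:
        return
    for dep in packages[name]["requires"]:
        get(dep, done)
    print("installing", name, "via", packages[name]["install"])
    done.add(name)

get("iVDGL:Grid3")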
70
Virtual Data Motivations
"I've detected a muon calibration error and want to know which derived data products need to be recomputed."
"I've found some interesting data, but I need to know exactly what corrections were applied before I can trust it."
"I want to search a database for 3-muon events. If a program that does this analysis exists, I won't have to write one from scratch."
"I want to apply a forward jet analysis to 100M events. If the results already exist, I'll save weeks of computation."
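A minimal sketch (hypothetical product names, not Chimera/VDC syntax) of how a derivation catalog answers the first question above:

# Toy derivation catalog: each derived product records its transformation and inputs
derivations = {
    "muon_ntuple":    {"transform": "reco_muons",  "inputs": ["raw_events", "muon_calib_v3"]},
    "jet_ntuple":     {"transform": "reco_jets",   "inputs": ["raw_events", "jet_calib_v1"]},
    "higgs_4mu_hist": {"transform": "analyze_4mu", "inputs": ["muon_ntuple"]},
}

def needs_recompute(changed_input, catalog):
    """Return every product that transitively depends on changed_input."""
    stale, changed = set(), True
    while changed:
        changed = False
        for product, rec in catalog.items():
            if product not in stale and any(
                    i == changed_input or i in stale for i in rec["inputs"]):
                stale.add(product)
                changed = True
    return stale

print(needs_recompute("muon_calib_v3", derivations))
# -> {'muon_ntuple', 'higgs_4mu_hist'}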
71
Background Data Grid Projects
Driven primarily by HEP applications
  • U.S. Funded Projects
  • GriPhyN (NSF)
  • iVDGL (NSF)
  • Particle Physics Data Grid (DOE)
  • UltraLight
  • TeraGrid (NSF)
  • DOE Science Grid (DOE)
  • NEESgrid (NSF)
  • NSF Middleware Initiative (NSF)
  • EU, Asia projects
  • EGEE (EU)
  • LCG (CERN)
  • DataGrid
  • EU national Projects
  • DataTAG (EU)
  • CrossGrid (EU)
  • GridLab (EU)
  • Japanese, Korean projects
  • Many projects driven/led by HEP + CS
  • Many 10s of $M brought into the field
  • Large impact on other sciences, education

73
Muon Lifetime Analysis Workflow
74
(Early) Virtual Data Language
begin v /usr/local/demo/scripts/cmkin_input.csh
  file i ntpl_file_path   file i template_file   file i num_events
  stdout cmkin_param_file
end
begin v /usr/local/demo/binaries/kine_make_ntpl_pyt_cms121.exe
  pre cms_env_var   stdin cmkin_param_file   stdout cmkin_log   file o ntpl_file
end
begin v /usr/local/demo/scripts/cmsim_input.csh
  file i ntpl_file   file i fz_file_path   file i hbook_file_path   file i num_trigs
  stdout cmsim_param_file
end
begin v /usr/local/demo/binaries/cms121.exe
  condor copy_to_spool=false   condor getenv=true
  stdin cmsim_param_file   stdout cmsim_log
  file o fz_file   file o hbook_file
end
begin v /usr/local/demo/binaries/writeHits.sh
  condor getenv=true   pre orca_hits
  file i fz_file   file i detinput   file i condor_writeHits_log
  file i oo_fd_boot   file i datasetname
  stdout writeHits_log   file o hits_db
end
begin v /usr/local/demo/binaries/writeDigis.sh
  pre orca_digis
  file i hits_db   file i oo_fd_boot
  file i carf_input_dataset_name   file i carf_output_dataset_name
  file i carf_input_owner   file i carf_output_owner
  file i condor_writeDigis_log
  stdout writeDigis_log   file o digis_db
end
[Diagram: CMS pipeline, pythia_input → pythia.exe → cmsim_input → cmsim.exe → writeHits → writeDigis.]
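A toy illustration (not Chimera/Pegasus code) of how the declared inputs and outputs above determine an execution order for the pipeline:

from graphlib import TopologicalSorter

consumes = {  # step -> files it reads
    "cmkin_input.csh": [],
    "kine_make_ntpl":  ["cmkin_param_file"],
    "cmsim_input.csh": ["ntpl_file"],
    "cms121.exe":      ["cmsim_param_file"],
    "writeHits.sh":    ["fz_file"],
    "writeDigis.sh":   ["hits_db"],
}
produces = {  # step -> files it writes
    "cmkin_input.csh": ["cmkin_param_file"],
    "kine_make_ntpl":  ["ntpl_file"],
    "cmsim_input.csh": ["cmsim_param_file"],
    "cms121.exe":      ["fz_file", "hbook_file"],
    "writeHits.sh":    ["hits_db"],
    "writeDigis.sh":   ["digis_db"],
}

producer_of = {f: step for step, files in produces.items() for f in files}
deps = {step: {producer_of[f] for f in files if f in producer_of}
        for step, files in consumes.items()}
print(list(TopologicalSorter(deps).static_order()))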
75
QuarkNet Portal Architecture
  • Simpler interface for non-experts
  • Builds on Chiron portal

76
Integration of GriPhyN and iVDGL
  • Both funded by NSF as large ITRs, with overlapping periods
  • GriPhyN: CS research, Virtual Data Toolkit (9/2000 - 9/2005)
  • iVDGL: Grid laboratory, applications (9/2001 - 9/2006)
  • Basic composition
  • GriPhyN: 12 universities, SDSC, 4 labs (~80 people)
  • iVDGL: 18 institutions, SDSC, 4 labs (~100 people)
  • Expts: CMS, ATLAS, LIGO, SDSS/NVO
  • GriPhyN (Grid research) vs iVDGL (Grid deployment)
  • GriPhyN: 2/3 CS + 1/3 physics (~0% H/W)
  • iVDGL: 1/3 CS + 2/3 physics (~20% H/W)
  • Many common elements
  • Common directors, Advisory Committee, linked management
  • Common Virtual Data Toolkit (VDT)
  • Common Grid testbeds
  • Common outreach effort

77
GriPhyN Overview
[Diagram: GriPhyN overview. A researcher's science applications pass through planning and execution services built from the Virtual Data Toolkit (Chimera virtual data system, Pegasus planner, DAGman, Globus Toolkit, Condor, Ganglia, etc.); a production manager, instruments, and data feed the Grid fabric, with virtual data and review closing the loop.]
78
Chiron/QuarkNet Architecture
79
Cyberinfrastructure
  • "A new age has dawned in scientific and engineering research, pushed by continuing progress in computing, information, and communication technology, and pulled by the expanding complexity, scope, and scale of today's challenges. The capacity of this technology has crossed thresholds that now make possible a comprehensive cyberinfrastructure on which to build new types of scientific and engineering knowledge environments and organizations and to pursue research in new ways and with increased efficacy."
  • NSF Blue Ribbon Panel report, 2003

80
Fulfilling the Promise ofNext Generation Science
Our multidisciplinary partnership of physicists,
computer scientists, engineers, networking
specialists and education experts, from
universities and laboratories, has achieved
tremendous success in creating and maintaining
general purpose cyberinfrastructure supporting
leading-edge science. But these achievements
have occurred in the context of overlapping
short-term projects. How can we ensure the
survival of valuable existing cyber-infrastructure
while continuing to address new challenges posed
by frontier scientific and engineering endeavors?
81
Production Simulations on Grid3
US-CMS Monte Carlo Simulation
Used 1.5 × US-CMS resources
Non-USCMS
USCMS