Status of Interoperability - PowerPoint PPT Presentation

1
Status of Interoperability
  • LCG-2/EGEE and OSG
  • O. Keeble, L. Field

2
Introduction
  • What is LCG?
  • A production grid
  • A software stack
  • EGEE/gLite
  • What does interoperation mean?
  • LCG/EGEE is already federated
  • Why do we want it?
  • Why do we want grids at all?
  • Efficiency through collaboration

3
Status August 2005
  • (Map: countries providing resources, countries
    anticipating joining)
  • 160 sites, 40 countries
  • 14,000 CPUs
  • 5 PB storage

4
LCG installation
  • YAIM
  • Separates installation and configuration
  • Installation
  • RPMs
  • Create meta packages for service and node types
  • Use apt-get to install nodes and services
  • Tarballs
  • For clients; no root privilege required
  • Allow us to distribute and install the middleware
    on sites
  • This is what we use for installation on OSG
  • Configuration
  • Split into bash functions for each service
  • Node types made by grouping bash functions
  • Configuration script executes bash functions
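
The grouping described above can be sketched as follows. This is an illustrative sketch of the pattern, not the real YAIM function or node-type names: one bash function per service, a node type defined as an ordered list of functions, and a driver that executes them.

```shell
#!/bin/bash
# Sketch of the YAIM configuration pattern (names are illustrative,
# not the actual YAIM functions).

config_gatekeeper() { echo "configured gatekeeper"; }
config_gris()       { echo "configured GRIS"; }
config_rgma()       { echo "configured R-GMA"; }

# A node type is simply an ordered list of configuration functions
CE_NODE_FUNCTIONS="config_gatekeeper config_gris config_rgma"

# The configuration script executes each function in turn,
# stopping on the first failure
configure_node() {
  local fn
  for fn in $1; do
    $fn || { echo "ERROR: $fn failed" >&2; return 1; }
  done
}

configure_node "$CE_NODE_FUNCTIONS"
```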

5
LCG Interoperation Nordugrid
  • Middleware stack called ARC, diverged from LCG's
    predecessor (EDG) in 2002
  • 54 Sites, 4774 CPUs
  • Ongoing discussions on interoperation, a number
    of strategies
  • Gateway nodes?
  • Mutual deployment of clients?
  • Standardise site interfaces?
  • Avoid divergence

6
LCG/OSG Interoperation Status
  • Progressing very well
  • Intensive activity over the past three weeks
  • LCG to OSG
  • Almost there: OSG appears as a single site
  • IS, monitoring, job matches, data transfer
  • A few remaining issues
  • OSG to LCG
  • Job submission works
  • Still need to check the rest
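
LCG-to-OSG submission goes through the Resource Broker with a JDL job description. A minimal sketch; the CE name in the Requirements line is an illustrative placeholder, not a real OSG endpoint:

```shell
# Minimal JDL job description for submission via the LCG Resource Broker.
# The GlueCEUniqueID below is an illustrative placeholder.
cat > hello.jdl <<'EOF'
Executable    = "/bin/hostname";
StdOutput     = "hello.out";
StdError      = "hello.err";
OutputSandbox = {"hello.out", "hello.err"};
Requirements  = other.GlueCEUniqueID == "osg-ce.example.org:2119/jobmanager-condor";
EOF

# On an LCG-2 User Interface one would then submit with:
#   edg-job-submit hello.jdl
```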

7
Interoperation Timeline
  • 12/08/2005
  • Set up OSG BDII on data.grid.iu.edu
  • Added site IU_iuatlas to the OSG BDII
  • Added this BDII to the LCG GOC DB
  • OSG BDII in the Gstat Grid Status Monitoring
  • 18/08/2005
  • LCG client tools installed as experiment software
  • 19/08/2005
  • Site Passes Gstat Grid Status tests
  • LCG to OSG job submissions working via RB
  • 20/08/2005
  • SFTs running at the OSG site but with a few
    failures
  • 23/08/2005
  • OSG to LCG job submissions working

8
Interoperations Todo
  • LCG jobs need to source the environment
  • Requires RB fix
  • Investigate the experiment software installation
  • Harmonisation?
  • VOs and their management
  • Common operations or monitoring VO?
  • CAs
  • Accounting
  • Operations
  • What happens when sites have problems?
  • EGEE has a very proactive operations policy
  • Monitoring?
  • MonALISA/R-GMA interoperation

9
Grid Monitoring
  • Monitoring is still a problem
  • Sensors
  • If we don't have common sensors
  • We need a common schema
  • Transport
  • Needs Interoperating Transport layers
  • Visualisation
  • Multiple visualisation tools are okay
  • Variety is the spice of life?

10
Grid Monitoring - Sensors
  • OSG
  • Glue info providers
  • MDS info providers
  • Grid 3 info providers
  • GridCat
  • SQLite database
  • Monalisa Sensors
  • LCG
  • Glue info providers
  • GridIce Sensors
  • Lemon fabric monitoring
  • Custom sensors
  • Job Resource Monitor
  • Job State Monitor
  • Apel
  • Accounting
  • GridFTP Monitor

11
Grid Monitoring - Transport
  • OSG
  • MDS
  • LDAP based
  • GridCat
  • Gatekeeper interface
  • SQLite DB
  • Monalisa
  • Java objects
  • Using JINI
  • LCG
  • MDS
  • GRIS only
  • BDII
  • Production quality GIIS
  • GridIce
  • Parallel GRIS
  • R-GMA
  • All data now exported
  • Bus for monitoring data
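
As a concrete example of the LDAP-based transport, a BDII can be queried with a standard LDAP client. The hostname below is illustrative; port 2170 and the `mds-vo-name=local,o=grid` base are the usual LCG conventions:

```shell
# Query a BDII (hostname illustrative) for GLUE CE entries.
ldapsearch -x -LLL \
  -H ldap://bdii.example.org:2170 \
  -b "mds-vo-name=local,o=grid" \
  '(objectClass=GlueCE)' \
  GlueCEUniqueID GlueCEStateFreeCPUs
```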

12
Grid Monitoring - Visualisation
  • OSG
  • Grid Cat
  • ACDC
  • Monalisa
  • LCG
  • Grid Status
  • GridIce
  • Flying Jobs
  • SFTs
  • Accounting
  • GridView
  • GridFTP
  • Job Monitoring

13
Hour Glass Model
  • Job Submission
  • GRAM
  • Service Discovery
  • LDAP
  • Data Management
  • SRM
  • Security
  • GSI
  • MON
  • ????? collaboration

(Diagram: Globus hourglass. At the waist, the grid-wide
services: Security, MON, SD, JS, DM; below, the site
batch system and storage system.)
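
Each interface at the waist of the hourglass corresponds to a standard client command. Hostnames and paths below are illustrative:

```shell
# One client command per hourglass interface (hosts and paths illustrative):
grid-proxy-init                                          # Security: create a GSI proxy
ldapsearch -x -H ldap://bdii.example.org:2170 \
  -b "mds-vo-name=local,o=grid" '(objectClass=GlueCE)'   # Service discovery: LDAP
globus-job-run ce.example.org /bin/hostname              # Job submission: GRAM
srmcp file:////tmp/in.dat \
  srm://se.example.org:8443/data/in.dat                  # Data management: SRM
```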
14
Monitoring and Information Services Core Infrastructure
(MIS-CI)
(Diagram: Insert API and Insert Server, Query API and
Query Server, over the Latest, History and Fwd
components.)
  • This could be solving a shared problem
  • The missing interface for Monitoring?
  • Need to have common schemas
  • Martin Swany is currently doing an audit
  • OSG and LCG information

15
Other areas for collaboration
  • MIS-CI
  • Experiment software installation
  • Operations
  • Middleware testing and configuration
  • Identify shared problems/goals and solutions
  • Interaction between VDT, OSG and LCG
  • Remove duplicated effort
  • Improve release quality

16
Summary
  • Interoperability is progressing well
  • Still work to be done
  • Keep up the momentum
  • Include more OSG sites
  • Discussions at LCG Workshop
  • Operations
  • Accounting
  • Start work on MIS-CI
  • Look for other areas of collaboration

17
LCG and Nordugrid comparison
Service/component   LCG-2, gLite                      ARC
Basis               GT2 from VDT                      GT2 own patch, GT3 pre-WS
Data transfer       GridFTP, SRM v? (DPM)             GridFTP, SRM v1.1 client
Data management     EDG RLS, Fireman Co, LFC          RC, RLS, Fireman
Information         LDAP, GLUE 1.1, MDS/BDII, R-GMA   LDAP, ARC schema, MDS-GIIS
Job description     JDL (based on ClassAds)           RSL
Job submission      Condor-G to GRAM                  GridFTP
VO management       VOMS, gLite VOMS, CAS (?)         VOMS
Try to avoid divergence!
18
Service Challenges
  • June05 - Technical Design Report
  • Sep05 - SC3 service phase
  • May06 - SC4 service phase
  • Sep06 - Initial LHC service in stable operation
  • Apr07 - LHC service commissioned
  • SC2: reliable data transfer (disk-network-disk);
    5 Tier-1s, aggregate 500 MB/sec sustained at CERN
  • SC3: reliable base service; most Tier-1s, some
    Tier-2s; basic experiment software chain; grid data
    throughput 500 MB/sec, including mass storage (25%
    of the nominal final throughput for the proton
    period)
  • SC4: all Tier-1s and major Tier-2s capable of
    supporting the full experiment software chain,
    incl. analysis; sustain the nominal final grid
    data throughput
  • LHC Service in Operation: September 2006; ramp up
    to full operational capacity by April 2007; capable
    of handling twice the nominal data throughput
19
What is LCG?
  • LHC Computing Grid.
  • Prototype and deploy computing for LHC.
  • It is also a software stack: LCG-2_6_0
  • Phase 1 (2002-2005)
  • Build a prototype, based on existing grid
    middleware.
  • Gain experience in running a production grid
    service.
  • Phase 2 (2006-2008)
  • Build the initial LHC computing environment.
  • Technical Design Report (TDR)
  • Planning for phase 2.
  • In light of experience from phase 1.

20
EGEE and LCG
  • Enabling Grids for E-science.
  • E-Science infrastructure for many apps.
  • Identical to LCG for HEP production within Europe.
  • LCG is not a development project.
  • Relies on other grid projects.
  • middleware development and support
  • EGEE2
  • Similar to EGEE but production focused.
  • Proposal currently under preparation

21
Operations Structure
  • Operations Management Centre (OMC)
  • Based at CERN; overall coordination etc.
  • Core Infrastructure Centres (CIC)
  • Manage daily grid operations.
  • Run essential infrastructure services.
  • Provide 2nd level support to ROCs.
  • Regional Operations Centres (ROC)
  • Front-line support for user and operations
    issues.
  • Provide local knowledge and adaptations.
  • One in each region; many are distributed
  • User Support Centre (GGUS)
  • Based at FZK; the support portal provides a single
    point of contact (service desk)

22
Grid Operations
  • The grid is flat.
  • A hierarchy of responsibility is essential to scale
    the operation
  • CICs act as a single Operations Centre
  • Operational oversight (grid operator) responsibility
    rotates weekly between CICs
  • Report problems to ROC/RC
  • ROC is responsible for ensuring problem is
    resolved
  • ROC oversees regional RCs
  • ROCs responsible for organising the operations in
    a region
  • Coordinate deployment of middleware, etc
  • CERN coordinates sites not associated with a ROC

It is in setting up this operational infrastructure
that we have really benefited from EGEE funding.
23
ATLAS jobs on LCG in 2005
(Chart: number of jobs per day; 10,000 concurrent jobs
in the system)