Grid Analysis Environment GAE - PowerPoint PPT Presentation


Slides: 29
Provided by: Frankva6

Transcript and Presenter's Notes


1
Grid Analysis Environment (GAE)
  • Overview

2
Outline
System View
Frameworks
Early Results
GAE
User View
Associated Projects
3
Goal
  • Provide a transparent environment in which a
    physicist can perform batch or interactive
    analysis in a distributed, dynamic environment:
    identify your data (Catalogs), submit your
    (complex) job (Scheduling, Workflow, JDL), get
    fair access to resources (Priority, Accounting),
    monitor job progress (Monitor, Steering), get
    the results (Storage, Retrieval), then repeat
    the process and refine the results
  • Support data transfers ranging from the
    (predictable) movement of large-scale (simulated)
    data to the highly dynamic analysis tasks
    initiated by rapidly changing teams of scientists

4
System View
[Diagram: layered system view — (Domain) Applications on top of a
(Domain) Portal, Global Services (High Level Services), a Service
Oriented Architecture (Frameworks), and Local Services, running over
Network, Compute, and Storage Resources, with Monitoring alongside all
layers and Interface Specifications between them. System stages:
Development, Testing, and Deployment, with Support/Feedback.]
5
System View (Details)
  • Domains
    – Virtual Organization and Role management
  • Service Oriented Architecture
    – Authorized Access
    – Access Control Management (groups/individuals)
    – Discoverable
    – Protocols (XML-RPC, SOAP, …)
    – Service Version Management
  • Frameworks: Clarens, MonALISA, …
  • Monitoring
    – End-to-end monitoring: collecting and
      disseminating information
    – Provide visualization of monitor data to users
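The SOA bullets above list XML-RPC and SOAP as the service protocols. As a minimal, hypothetical sketch of such a call (the `lookup_dataset` method and its reply are invented for illustration and are not a Clarens API), Python's standard library can expose and query an XML-RPC service:

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

def lookup_dataset(name):
    # Stand-in for a catalog-style service method.
    return {"dataset": name, "replicas": ["site_A", "site_B"]}

# Expose the method over XML-RPC on an ephemeral local port.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(lookup_dataset)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# A client needs only the endpoint URL and the method name.
client = ServerProxy("http://localhost:%d" % port)
reply = client.lookup_dataset("ttbar_2004")
print(reply["replicas"])
server.shutdown()
```

The same wire format is what a Clarens client would speak over https, with authentication and access control layered on top.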

6
System View (Details)
  • Local Services (local view)
    – Local catalogs, storage systems, task tracking
      (single-user tasks), policies, job submission
  • Global Services (global view)
    – Discovery service, global catalogs, policies
  • High Level Services (autonomous)
    – Act on monitor data with a global view:
      scheduling, data transfer, network
      optimization, task tracking (many users)

7
System View (Details)
  • (Domain) Portal
    – One-stop shop for applications/users to access
      and use Grid resources
    – Task tracking (single-user tasks)
    – Graphical user interface
    – User session logging (provides feedback when
      failures occur)
  • (Domain) Applications
    – ORCA/COBRA, IGUANA, PHYSH, …

8
Framework (MonALISA)
[Diagram: monitor sensors in Web Services (WS) and Applications (App)
(1) publish to MonALISA Station Servers (SS), which (2) disseminate
the data over the MonALISA JINI Network; MonALISA-based Application
Servers (AppS) (3) subscribe to it and (4) steer/retrieve the
monitored Web Services and Applications.]
  • Service/Software Discovery
  • Policy Dissemination
  • Supporting Global and High Level Services
  • ..
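The four-step flow in the diagram is a publish/subscribe pattern. A minimal in-process sketch, with invented class and metric names (not the MonALISA API):

```python
from collections import defaultdict

class StationServer:
    """Toy stand-in for a MonALISA station server."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # metric -> callbacks
        self.latest = {}                       # metric -> last value

    def publish(self, metric, value):
        # (1) a sensor publishes; (2) the server disseminates.
        self.latest[metric] = value
        for callback in self.subscribers[metric]:
            callback(metric, value)

    def subscribe(self, metric, callback):
        # (3) an application server subscribes to a metric stream.
        self.subscribers[metric].append(callback)

ss = StationServer()
seen = []
ss.subscribe("farm1.load", lambda m, v: seen.append((m, v)))
ss.publish("farm1.load", 0.42)   # (4) the subscriber can now act on it
print(seen)
```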

9
Framework (Clarens)
  • Authentication (X.509)
  • Access control on Web Services
  • Remote file access (and access control on files)
  • Discovery of Web Services and software
  • Shell service: shell-like access to remote
    machines (managed by access control lists)
  • Proxy certificate functionality
  • Group management: VO and role management
  • Good performance of the Web Service framework
  • Integration with MonALISA

[Diagram: a 3rd-party application or Clarens client talks to a Clarens
web server over http/https using XML-RPC, SOAP, Java RMI, JSON-RPC, …,
behind which the services run.]
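Access control on Web Services, as listed above, amounts to checking the caller's VO groups against a per-method ACL. A hypothetical sketch (the method names and groups are invented, not Clarens configuration):

```python
# ACL maps a service method to the VO groups allowed to call it.
ACL = {
    "catalog.lookup": {"cms.user", "cms.admin"},
    "shell.execute":  {"cms.admin"},
}

def authorize(method, groups):
    """Return True if any of the caller's groups may call `method`."""
    return bool(ACL.get(method, set()) & set(groups))

print(authorize("catalog.lookup", ["cms.user"]))   # a user may query
print(authorize("shell.execute", ["cms.user"]))    # but not run shells
```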
10
(Single) User View (Analysis)
[Diagram: a client application drives numbered steps (1)-(9) through
Steering, Dataset service, Discovery, Catalogs, Planner/Scheduler, and
Job Submission services.]
  • Catalogs to select datasets
  • Resource and application discovery
  • Schedulers guide jobs to resources
  • Policies enable fair access to resources
  • Robust (large-size) data(set) transfer

[Diagram continued: Execution, Storage Management, Monitor
Information, Data Transfer, and Policy components handle thousands of
user jobs in a multi-user environment.]
  • Feedback to users (e.g. status of their jobs)
  • Crash recovery of components (identify and
    restart)
  • Provide secure authorized access to resources and
    services.
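The numbered flow above — discover services, query catalogs for a dataset, submit the job, monitor it — can be sketched end to end with in-process stubs standing in for the remote Clarens services (all names and paths are illustrative):

```python
def catalog_lookup(dataset):
    # Stub catalog: return the files that make up a dataset.
    return ["/store/%s/file_%d.root" % (dataset, i) for i in range(3)]

JOBS = {}

def submit_job(files):
    # Stub job submission: in reality this is asynchronous.
    job_id = len(JOBS) + 1
    JOBS[job_id] = {"files": files, "status": "done"}
    return job_id

def discovery(name):
    # Stub discovery service: resolve a service name to an endpoint.
    services = {"catalog": catalog_lookup, "submit": submit_job}
    return services[name]

# The loop a physicist's client would drive:
files = discovery("catalog")("ttbar_2004")   # select the dataset
job = discovery("submit")(files)             # submit the (complex) job
print(JOBS[job]["status"])                   # monitor: report back
```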

11
Projects associated to Grid Enabled Analysis
  • DISUN (deployment)
    – Deployment and Support for Distributed
      Scientific Analysis
  • UltraLight (development)
    – Treating the network as a resource
    – Vertically integrated monitor information
    – Multi-user, resource-constrained view
  • MCPS (development)
    – Provides Clarens-based Web Services for batch
      analysis (workflow)
  • SPHINX (development)
    – Policy-based scheduling (global service)
      exposed as a Clarens Web Service, using
      MonALISA monitor information
  • SRM/dCache (development)
    – Service-based data transfer (local service)
  • Lambda Station (development)
    – Authorized programmability of routers using
      MonALISA and Clarens
  • PHYSH
    – Clarens-based services for command-line user
      analysis
  • CRAB
    – Client to support user analysis using the
      Clarens Web Service framework

Identify complementary features and integrate
12
Projects associated to Grid Enabled Analysis
  • Clarens_Application (development/testing)
    – Logging functionality
    – Providing Web Services for catalogs
    – Steering
    – Portal development (GUI)
    – Remote file access
    – Distributed testing environment
  • MonALISA_Application (development)
    – Monitor applications for network, compute and
      storage
    – Providing an interface to accounting systems
    – …
  • OSG (deployment/testing)
    – Privilege Project
    – Policy Project
  • PHEDEX
    – Data transfer
  • Condor
    – High-throughput computing
  • …

Identify complementary features and integrate
13
Combining Grid Projects into Grid Analysis
Environment
[Diagram: projects feeding the Grid Analysis Environment —
Development: Clarens_Applications, MonALISA_Applications, PHEDEX,
SPHINX, CRAB, UltraLight, SRM/dCache, Lambda Station, PHYSH, MCPS;
Deployment: OSG, DISUN, Policy, Privilege Project, Condor, …; all on
top of the MonALISA, Clarens, … frameworks, with Support/Feedback and
Testing stages.]
GAE focuses on integration
14
Early Results
15
GAE Deployment
  • Clarens has been deployed on 30 machines, at
    sites including Caltech, Florida, Fermilab,
    CERN, Pakistan, and INFN
  • Multiple service instances have been deployed on
    several Clarens servers. Different sets of
    service instances are deployed on each server to
    mimic a realistic distributed service
    environment.
  • Installation of CMS (ORCA, COBRA, IGUANA, …) and
    LCG (POOL, SEAL, …) software on the Caltech GAE
    testbed, which serves as an environment for
    integrating applications as web services into
    the Clarens framework.
  • Work with CERN to have the GAE components
    included in the CMS software distribution.
  • GAE components are being integrated into the DPE
    and VDT distributions used in US-CMS.
  • Demonstrated a distributed multi-user GAE
    prototype at SC03.
  • Ultimate goal: the GAE backbone (Clarens)
    deployed at all tier-N sites; associated with
    the different Clarens web servers will be (GAE)
    services that interface with CMS and LCG
    software, enabling physicists to perform
    analysis in a distributed environment.
  • PHEDEX deployed at Caltech, UFL, and UCSD, and
    transferring data
  • UFL submitting analysis jobs with CRAB

16
GAE Deployment
  • Prototype completed Jan 14 at the Caltech-FL
    workshop
  • Now extending prototype functionality
  • Now involving physicists/early adopters
  • First round of optimized data transfer (UAE
    milestone) coincides with the CMS 10 data
    challenge in 2005
  • 4 types of testbeds
    – Developers testbed
    – Network testbed (see network talk): ideally
      suited to test large-scale data movement and
      the scalability of job submissions
    – OSG Integration (beta) testbed
    – OSG Operations (alpha) testbed


17
Services
[Diagram: the service landscape — POOL catalog (provenance catalog,
CERN), PubDB (data catalog, Caltech/CERN), BOSS (job submission,
CERN/Caltech/INFN), TMDB (transfer catalog, CERN), RefDB (data and
provenance catalog, CERN/Caltech), Monte Carlo processing service
(FNAL/Caltech), MOPDB (Monte Carlo catalog, FNAL), MCrunjob (Monte
Carlo production, FNAL), Codesh (UFL), Sphinx (scheduling, UFL),
MonALISA (monitoring, Caltech), SRM (storage resource management),
GROSS (physics analysis job submission), and the Clarens core
services: service discovery, ACL management, VO management, and file
access. The legend distinguishes services on the wish list to become a
service or to interoperate with this service, services accessible
through a web service, services with a JavaScript front end, services
being developed, and Clarens core services.]
18
GAE Distributed Testing
19
Clarens Grid Portals
[Screenshots: PDA portal, job execution, catalog service, and the
Collaborative Analysis Desktop]
20
Software and Web Service Discovery Available
21
MonALISA Integration
  • Query repositories for monitor information
  • Gather and publish access patterns on
    collections of data
  • Publish web service information for discovery in
    other distribution systems
22
August 2004
[Diagram: across several hosts, runjob clients (1) discover the POOL
catalog, RefDB, and grid schedulers through discovery services, (2)
query for datasets, and (3) submit ORCA/ROOT job(s) with dataset(s)
for reconstruction/analysis. Client code has no knowledge of the
location of services, except for several URLs of discovery services.]
Multiple clients will query and submit jobs
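The location-independence claim above can be sketched as follows: the client hard-codes only the discovery endpoint (here an in-process registry with invented hostnames) and resolves catalog and scheduler locations at run time:

```python
# What the discovery service would return; hostnames are invented.
REGISTRY = {
    "pool_catalog":   "http://host3.example.org:8443/catalog",
    "refdb":          "http://host4.example.org:8443/refdb",
    "grid_scheduler": "http://host6.example.org:8443/sched",
}

def discover(service):
    # (1) The registry is the only well-known endpoint the client needs.
    return REGISTRY[service]

catalog_url = discover("pool_catalog")      # (2) then query for datasets
scheduler_url = discover("grid_scheduler")  # (3) then submit jobs there
print(catalog_url, scheduler_url)
```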
23
SC04 November 2004
Scheduling: Push Model
[Diagram: a client (1) submits job(s) with dataset(s) for
reconstruction/analysis to a scheduler service; the scheduler (2)
queries the resource status of each farm and (3) submits the job(s)
through a uniform job submission layer (BOSS, PBS) to a farm.]
The push model has limitations once the system becomes
resource-limited.
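A minimal sketch of the push model: the scheduler (2) queries each farm's resource status and (3) pushes the job to the least-loaded farm. Farm names and the load metric are invented; a real scheduler would draw this status from MonALISA. The noted limitation follows directly: once every farm is saturated, the push decision is made on stale or uniformly bad numbers.

```python
# (2) Resource status: fraction of each farm's capacity in use.
FARMS = {"farm1": 0.9, "farm2": 0.3, "farm3": 0.6}

def push_schedule(job, farms):
    # (3) Push the job to the farm with the most free capacity.
    target = min(farms, key=farms.get)
    return (target, job)

site, job = push_schedule("orca_reco_ttbar", FARMS)
print(site)   # the least-loaded farm
```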
24
Client code and the global manager have no
knowledge of the location of services, except for
several URLs of discovery services
November 2004
Similarity with other approaches (PEAC)
[Diagram: a client (1) discovers a global manager and (2) requests a
session for a dataset; the manager (3) discovers a catalog service,
(4) gets the list of farms that have this dataset, (5) reserves
process time on them, and (6) allocates time; the client then (7)
submits job(s), while access statistics are reported to MonALISA and
data is moved to the nodes; the job is (8) created, "data ready?" and
"data moved" messages (9) are exchanged, and (10) an alive signal is
sent during processing. Multiple clients query and submit jobs.]
25
PEAC test run with MonALISA
26
Lessons learned
  • Quality of (the) service(s)
    – A lot of exception handling is needed for
      robust services (graceful failure of services)
    – Time-outs are important
    – Need very good performance for composite
      services
  • Discovery service enables location-independent
    service composition
  • Semantics of services are important (different
    name, namespace, and/or WSDL)
  • Web service design: not every application is
    developed with a web service interface in mind
  • Interfaces of 3rd-party applications change:
    Rapid Application Development
  • Social engineering
    – Finding out what people want/need
  • Overlapping functionality of applications (but
    not the same interfaces!)
  • Not one single solution for CMS
  • Not every problem has a technical solution;
    conventions are also important
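The time-out and exception-handling lessons can be sketched as a retry wrapper around a composite-service call. Note the sketch only checks elapsed time after the call returns; a true pre-emptive time-out needs threads or async I/O. The flaky service is invented for illustration:

```python
import time

def call_with_retry(func, retries=3, timeout=1.0):
    """Call func(); on error or overlong call, retry up to `retries` times."""
    last_error = None
    for _ in range(retries):
        start = time.monotonic()
        try:
            result = func()
            if time.monotonic() - start > timeout:
                raise TimeoutError("call exceeded %.1fs" % timeout)
            return result
        except Exception as err:   # graceful failure: remember and retry
            last_error = err
    raise RuntimeError("service unavailable") from last_error

attempts = []
def flaky_catalog():
    # Fails twice with a transient error, then succeeds.
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return ["dataset_a"]

result = call_with_retry(flaky_catalog)
print(result)   # succeeds on the third attempt
```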

27
(Future) Work
  • Integration of runjob into the current
    deployment of services
    – Full chain of end-to-end analysis
  • Develop/deploy an accounting service (PPDG
    activity?)
  • Steering service (NUST collaboration)
    – Autonomous replication
    – Trend analysis using monitor data
  • Improve exception handling
  • Integrate/interoperate mass-storage applications
    (e.g. SRM) into/with the Clarens environment
  • E2E error trapping and diagnosis: cause and
    effect
  • Strategic workflow re-planning
  • Adaptive steering and optimization algorithms
  • Multi-user data movement using PHEDEX
  • Improved GUI interface (NUST collaboration)
  • Core Clarens (NUST collaboration)

28
Information: http://ultralight.caltech.edu
Portal: http://ultralight.caltech.edu/gaeweb/portal
WIKI: http://ultralight.caltech.edu/gaeweb/wiki/