1
Science Gateways: Progress Using the Clarens Toolkit on TeraGrid
  • Julian Bunn
  • Conrad Steenberg, Faisal Kahn, Iosif Legrand,
    Harvey Newman, Michael Thomas and Frank van
    Lingen
  • California Institute of Technology (Caltech)
  • June 2007

2
CERN's Large Hadron Collider
5000 physicists/engineers, 300 institutes, 70 countries
27 km tunnel under Switzerland and France
Experiments: ATLAS and CMS (pp, general purpose), ALICE (heavy ions), LHCb (B-physics), TOTEM
3
Complexity of LHC Events: Higgs decay
4
Black Hole Hunters at the LHC
5
LHC Data Grid Hierarchy
  • Online System: custom fast electronics, pipelined, buffered, FPGAs; ~PByte/sec from the experiment
  • Tier 0/1 (CERN Center): PBs of disk, tape robot; fed at 150-1500 MBytes/sec
  • Tier 1: >10 Tier 1 centers (FNAL, IN2P3, INFN, RAL, ...), linked to CERN at 10-40 Gbps and to each other at 10 Gbps
  • Tier 2: 150 Tier 2 centers, linked at 1-10 Gbps
  • Tier 3 (Institutes): physicists work on analysis channels; each institute has 10 physicists working on one or more channels; physics data cache, 1 to 10 Gbps
  • Tier 4: workstations/laptops
Tens of Petabytes by 2008; an Exabyte 5-7 years later; 100 Gbps data networks
6
US Grids for LHC Physics
  • Exploit synergy and location between US-LHC, OSG
    and TeraGrid sites.
  • Example: Tier2-sourced analysis relocation to TeraGrid, involving high-speed transfers of cached data over the TeraGrid backbone

Open Science Grid
7
LHC Data Analysis: Essential Components
  • Data Processing: all data needs to be reconstructed, first into fundamental components like tracks and energy depositions, and then into physics objects like electrons, muons, hadrons, neutrinos, etc.
  • Raw -> Reconstructed -> Summarized
  • Simulation follows the same path; critical to understanding the detectors and underlying physics
  • Data Discovery: we must be able to locate events of interest (Databases)
  • Data Movement: we must be able to move discovered data as needed for analysis or reprocessing (Networks)
  • Data Analysis: we must be able to apply our analysis to the reconstructed data
  • Collaborative Tools: vital to sustaining global collaborations
  • Policy and Resource Management: we must be able to share, manage and prioritise in a resource-scarce environment

8
Grid-Enabled Analysis for LHC Experiments
  • The acid test for Grids: crucial for the LHC experiments
  • Large, diverse, distributed community of users
  • Support for tens of thousands of analysis and production tasks, shared among 100 sites
  • Operates in a (compute, storage and network) resource-limited and policy-constrained environment
  • Dominated by collaboration policy and strategy
  • System efficiency and match to policy depend on agile, intelligent data placement
  • High-speed data transport and managed network use are vital
  • Requires a global intelligent system that adapts to dynamic operating conditions to hide and manage complexity
  • Autonomous agents for real-time monitoring and end-to-end tracking
  • A system-wide view for the right decisions in difficult situations
  • Still need a simple user view: the Clarens Portal

9
GAE Architecture: Clarens Portal to the Grid
  • Analysis Clients talk standard protocols (HTTP, SOAP, XML-RPC) to the Clarens data/services Portal
  • The Clarens Portal hides the complexity of the Grid
  • A simple Web service API allows simple or complex clients to benefit from this architecture and to discover and access distributed services
  • Typical clients: ROOT, CMSSW, Web browser
  • Key features: Global Scheduler, Catalogs, Monitoring, and Grid-wide Execution service

[Architecture diagram: Analysis Clients (ROOT, CMSSW, IGUANA, IE, Firefox) connect over HTTP/HTTPS to the Clarens Grid Services Web Server, behind which sit the Scheduler (fully-abstract, partially-abstract and fully-concrete planners), Catalogs (metadata, virtual data, replica), Data Management, Monitoring (MonALISA), Applications (ROOT, CMSSW), and a Grid-wide Execution Service with an Execution Priority Manager (Condor).]
10
Clarens: the GAE Toolkit
Clients: JavaScript (AJAX), Java, Python, ROOT (analysis framework)
Protocols: XML-RPC, SOAP, Java RMI, JSON-RPC over HTTP/HTTPS (a minimal client sketch follows this list)
The Clarens Toolkit supports:
  • Authentication: user certificates and proxies
  • Access control: to Web Services and remote files
  • Discovery: of Web Services and software
  • Shell: remote shells with ACLs
  • Virtual Organization: membership, management and roles
  • Services: potentially infinite, including 3rd-party applications
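Because Clarens exposes everything over these standard protocols, a client can be a few lines of scripting. The following is a minimal sketch, not the actual Clarens client API: the server URL and the file-listing method are hypothetical, certificate/proxy handling is omitted, and only system.listMethods is standard XML-RPC introspection.

    # Minimal sketch of an XML-RPC call to a Clarens-style web service.
    # The endpoint URL and the "file.ls" method are illustrative assumptions;
    # a real deployment would also require the user's grid certificate/proxy.
    import ssl
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy(
        "https://clarens.example.org:8443/clarens/",   # hypothetical endpoint
        context=ssl.create_default_context(),
    )

    # Standard XML-RPC introspection: list the methods the server exposes.
    print(server.system.listMethods())

    # Hypothetical file-listing service call.
    print(server.file.ls("/store/user/analysis"))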
11
ROOT Analysis Framework for HEP
A turbo, nutter, all-singing-and-dancing analysis framework, designed for particle physicists
12
ROOTlets
  • Basic approach
  • Allow users to execute ROOT analysis code in a sandbox: a "ROOTlet", by analogy with a servlet (see the sketch after this list)
  • Many instances of ROOT can run on a cluster
  • The service container provides
  • Input parameter vetting
  • Access control/user mapping
  • Logging
  • Job control
  • Loosely coupled components
  • ROOT client
  • Compare with PROOF, which is tightly coupled
  • One or more service hosts with vanilla ROOT installed
  • A service host may optionally be a cluster head node
  • ROOTlets run either as ordinary processes on the service host or as batch jobs on cluster nodes
  • The ROOTlet service adds value beyond simple remote job submission
  • Monitoring of running jobs
  • File upload/download to the job sandbox
  • Multiple clients: ROOT itself, browser, scripts
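The analysis code itself is ordinary ROOT code; the slides describe it as C++ macros (.C/.h files). Purely as an illustration of the kind of work a ROOTlet performs server-side, a minimal PyROOT equivalent might look like the sketch below; the tree name, branch name and file paths are assumptions.

    # Illustrative PyROOT analysis of the sort a ROOTlet would run remotely.
    # Tree name ("Events"), branch name ("mass") and paths are assumptions.
    import ROOT

    chain = ROOT.TChain("Events")
    chain.Add("/data/ntuples/*.root")        # data staged near the ROOTlet

    hist = ROOT.TH1F("mass", "Invariant mass;m [GeV];events", 100, 0.0, 200.0)
    for event in chain:
        hist.Fill(event.mass)

    out = ROOT.TFile("analysis_output.root", "RECREATE")
    hist.Write()
    out.Close()                              # the client later fetches this file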

13
ROOTlets Operationally
[Diagram: a physicist's ROOT session at a Tier-N site uses the Clarens plugin and XML-RPC to drive a ROOTlet on the TeraGrid, where tens of TBytes of data reside.]
  1. Physicist at TierN uses ROOT on GBytes of ntuples
  2. Loads the Clarens ROOT plugin, connects to Clarens, and sends the analysis code (.C/.h files)
  3. Clarens creates a ROOTlet at a TeraGrid site and sends it the .C/.h files
  4. The ROOTlet executes the analysis code on TBytes of data and creates a high-statistics output file
  5. The ROOT client at Tier3 fetches and plots the data (the full client-side sequence is sketched below)
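A minimal sketch of that client-side sequence, assuming a Clarens-style XML-RPC interface: the server URL and the rootlet.create/status/fetch method names are hypothetical, since the real interaction is handled by the Clarens plugin inside the ROOT client rather than a standalone script.

    # Hypothetical client-side ROOTlet workflow (method names are assumptions).
    import time
    import xmlrpc.client

    server = xmlrpc.client.ServerProxy("https://clarens.example.org:8443/clarens/")

    # Steps 1-3: send the analysis code; the portal creates the ROOTlet.
    sources = {name: open(name).read() for name in ("Analysis.C", "Analysis.h")}
    job_id = server.rootlet.create(sources)

    # Step 4: the ROOTlet runs remotely; poll until the output is ready.
    while server.rootlet.status(job_id) != "done":
        time.sleep(30)

    # Step 5: fetch the high-statistics histogram file for local plotting.
    with open("analysis_output.root", "wb") as f:
        f.write(server.rootlet.fetch(job_id).data)   # XML-RPC binary payload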
14
ROOTlets Demonstrated
15
ROOTlets from Clarens Webstart
[Screenshot: ROOT/ROOTlet mass plots alongside the Clarens Services panel.]
The ROOTlets service being used to analyze data, with results shown in a Java-based Grid Browser
16
ROOTlets Demonstration at SC'06
  • Two parts
  • Local integration: ROOT script (.C)
  • Remote analysis: ROOT script (.C/.h)
  • ROOTlet services running on the show floor, at Caltech, and in Brazil
  • Moved to TeraGrid at SDSC after SC'06
  • ROOT analysis script submitted to the ROOTlet service
  • Runs on the Clarens server machine or as a batch job using the batch scheduler
  • Code moved to the data
  • Analysis results take the form of a ROOT histogram file on the server
  • Results streamed to the ROOT client as they become available, or available for later retrieval
  • Continuously running demo to exercise all parts of the system

17
The Pythia Portal
  • The Pythia Portal allows particle physicists to use the Pythia particle generation software.
  • No local software is needed. JSON (JavaScript Object Notation) is used for browser/service calls, as sketched below. Powerful (TeraGrid) backend computers can handle the typically lengthy simulations.

[Screenshots: Grid Certificate, Remote File Access, Remote Execution]
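For illustration only, a JSON-RPC browser/service call of this style can be reproduced with a few lines of Python; the endpoint URL and the pythia.generate method with its parameters are assumptions rather than the portal's documented API.

    # Hypothetical JSON-RPC request of the kind the Pythia Portal page issues.
    import json
    import urllib.request

    request = {
        "method": "pythia.generate",                  # assumed method name
        "params": [{"process": "Z0", "events": 10000}],
        "id": 1,
    }
    req = urllib.request.Request(
        "https://clarens.example.org:8443/clarens/json",   # assumed endpoint
        data=json.dumps(request).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))                        # simulation job response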
18
NaradaBrokering: Beyond Services
  • A pure web services approach has some inherent drawbacks
  • It is a request/response system: no way for the server to contact the client unless polled
  • One-to-one communication only: no way to broadcast to multiple clients, unless polled
  • Started a collaboration with the Indiana University messaging group
  • Scalable, open-source messaging system called NaradaBrokering: http://www.naradabrokering.org/
  • Provides an asynchronous message service bus with multiple transport protocols: TCP (blocking, non-blocking), UDP, Multicast, SSL, HTTP and HTTPS, Parallel TCP Streams
  • Allows the various components of the ROOTlet system to communicate without continuous polling
  • Publish/subscribe system with hierarchically arranged topics
  • Project funded through a DOE STTR Research Grant
  • Phase I complete; applied for Phase II

19
Monitoring
End-to-End Monitoring and Global Views (MonALISA)
20
Control and Automation
Automatic re-routing of data transfers in the Grid
[Diagram: the MonALISA Distributed Service System monitors and controls a transfer (">bbcopy A/fileX B/path/") between sites A and B across the Internet; OS and LISA agents, talking TL1 to an optical switch, report the path available, configure the interfaces and start the data transfer over the active light path.]
Detects errors and automatically recreates the path in less than the TCP timeout (<1 second)
Fast Data Transfer - http://monalisa.cern.ch/FDT/
21
Collaboration Tools
Virtual collaboration and data sharing (VRVS and EVO toolsets)
On-demand audio/video conferencing with shared desktops etc. VRVS is in daily use by many communities in particle physics and other sciences. The new EVO system is fully distributed and supports many different clients, protocols and devices.
22
GAE Advanced Desktop: MonALISA, EVO, CMSSW, ROOT
[Screenshot panels: Collaborate, Monitor, Analyze, Operate, Discover]
  • GAE Users continue to work with familiar tools
  • Integration of Collaborative Tools
  • Automated Service Discovery
  • Remote Dataset Discovery, Movement, Processing
  • Monitoring and Control of Analysis, Data Motion
  • Graduated Access Control and Security
  • Smooth transition from small to large tasks

23
National Virtual Observatory Gateways on TeraGrid
  • Choice of the Clarens Toolkit inspired by
  • Graduated security
  • Anonymous, Registered, Known
  • Multiple interfaces
  • Fat browser, Web proxy, Scripted
  • Services, Workflows, Compatibility
  • Services can be discovered by the VO registry
  • Services can easily be Astrogrid services
  • Can be used as part of an Astrogrid workflow
  • Multiple implementations
  • cacr.caltech.edu and sdsc.teragrid.org
  • TeraGrid acceptance of the security model
  • Server runs a job as somebody else
  • Anonymous access to TeraGrid!!
  • Some useful services
  • Mosaic, Cutouts, Synoptic coaddition
  • Mashup of VO SIAP services

24
Graduated Security: the HotGrid Model
[Diagram: from the traditional route to big-iron computing (write a proposal, learn Globus, learn MPI, learn PBS, port code to Itanium, get a certificate, get logged in, wait 3 months for an account, and then do some science as a power user) to the graduated route (a web form gives some science immediately, a HotGrid certificate gives more science, and a strong certificate leads on to power use).]
  • Graduated certificate access
  • Anonymous (none): 15 CPU minutes from a community account
  • HotGrid (weak): 1 CPU hour in exchange for registering a name and valid email address
  • Strong: own certificate obtained from TeraGrid HQ
  • Power: NRAC proposal, allocation, etc.

25
Fat Browser Portal
NESSSI: the NVO Extensible Scalable Secure Service Infrastructure
[Architecture diagram: the browser makes JSON-RPC calls to Clarens/NESSSI; certificate policies from a Certificate Authority govern which user account is selected and which certificate is loaded; jobs are queued onto the nodes of a TeraGrid cluster, and results land in sandbox storage readable over open HTTP.]
26
NESSSI Mosaicking Portal for NVO
27
Summary
  • LHC data analysis starting up in late 2007.
  • Grid Analysis Environment for LHC physicists.
  • Deploying Clarens-based analysis services on TeraGrid
  • Exploiting synergy/location with US-LHC and OSG
  • Services predicated on support for novice through power users and applications
  • Graduated security with graduated resource access
  • Integration with NaradaBrokering
  • Viability of the generic approach proven: adoption by NVO and deployment of NESSSI services for Astronomy
  • Hoping to support new Clarens portals for Seismology on TeraGrid