Title: The NorduGrid project: Using Globus toolkit for building Grid infrastructure presented by Aleksandr Konstantinov
1The NorduGrid project: Using Globus toolkit for building Grid infrastructure
presented by Aleksandr Konstantinov
Mattias Ellert, Aleksandr Konstantinov, Balázs Kónya, Oxana Smirnova, Anders Wäänänen
2Introduction
- Launched in spring 2001, with the aim of creating a Grid infrastructure in the Nordic countries.
- Partners from Denmark, Norway, Sweden, and Finland.
- Powered mainly by ATLAS groups (Lund, Copenhagen, Stockholm, Uppsala, Oslo).
- Relatively short-term project: ends in October 2002.
- Relies on very limited human resources (3 full-time researchers, a few part-time ones) with funding from NorduNet2.
- More info: http://www.nordugrid.org/
3Introduction (cont.)
- The purpose of the project is to create and operate a functional testbed.
- Use approved tools: Globus Toolkit™ (developed at Argonne National Laboratory and University of Southern California) and tools developed at the European Data Grid project.
- Aim at High Energy Physics applications: take them into account when choosing what to implement first.
- No temporary solutions (it is better not to implement something than to be forced to provide backward compatibility for a limited solution).
4Globus Toolkit™ evaluation
- Widely accepted de-facto standard for Grid computing.
- Provides a collection of (mostly) robust protocols, libraries and low-level services.
- Security built-in.
- Continuously evolving (??).
- Missing a few important high-level services:
  - grid-level scheduler
  - job data stagein/stageout
  - user-friendly grid entry points (simple user interface, web portals, etc.)
  - grid-level authorization system
  - grid-level accounting and quotas
5NorduGrid requirements
- No single point of failure
- No central sandbox (unlike EDG)
- Lightweight brokering integrated into User Interface
- Job should not be Computing Element (cluster) specific
- Non grid-aware jobs allowed ("grid functionality" is provided by middleware on Computing Element)
- Job runs in as restrictive an environment as possible (do not expect network on computing nodes)
- Minimal environment is provided on Computing Element
- Adequate and full (enough) information provided by InfoSystem
  - Natural computing unit is cluster
  - Queue, job and user information
6NorduGrid architecture
7Information System
- NorduGrid operates an MDS based, hierarchically distributed Information System
- new information model for clusters, queues, jobs, users, SE, RC
- efficient providers
- all the job monitoring, resource discovery, status monitoring and brokering are exclusively built on top of the MDS
- MDS hierarchy with dynamic site registrations
8Information System (example)
cluster entry
job entry
user entry
queue entry
9Information System (hierarchy)
10Information System (interfaces)
11Grid Manager - cluster middleware
- Provide job control and data handling functionality (HEP applications requirements are first priority).
- The Grid Manager is based on Globus Toolkit™ libraries and services. The following parts of Globus are used:
  - GridFTP - fast and reliable data access for Grid
  - GASS Copy interface - support for different data access protocols
  - Replica Catalog - metadata storage
  - GRAM - resource request
  - RSL - expandable Resource Specification Language
12Grid Manager (features)
- Stage in input data and executables. Possible sources:
  - Job submission machine.
  - GridFTP (preferred), FTP, HTTP or HTTPS servers.
  - Files registered in Globus Replica Catalog. Secure authentication. Destination is chosen automatically or can be forced.
- Stage out output data. Possible destinations:
  - Keep on cluster till user downloads.
  - GridFTP, FTP, HTTP or HTTPS servers.
  - Files can be registered in Globus Replica Catalog. Destination and protocol are obtained from Location information.
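The list of stage-in sources above can be read as a dispatch on the location given for each input file. The following is a minimal sketch of that idea, not the Grid Manager's actual code. The scheme names are assumptions: "gsiftp" is the scheme commonly used for GridFTP, "rc" is taken to mean a Replica Catalog entry, and a location without any scheme is assumed to mean a file uploaded from the job submission machine.

```python
from urllib.parse import urlparse

# Assumed mapping from URL scheme to the kind of source it denotes.
REMOTE_SCHEMES = {
    "gsiftp": "GridFTP server",
    "ftp": "FTP server",
    "http": "HTTP server",
    "https": "HTTPS server",
    "rc": "Replica Catalog",
}


def classify_source(location: str) -> str:
    """Describe where an input file would be staged in from."""
    scheme = urlparse(location).scheme.lower()
    if scheme == "":
        # Assumption: no scheme (including an empty location) means the
        # file comes from the machine the job was submitted from.
        return "job submission machine"
    if scheme in REMOTE_SCHEMES:
        return REMOTE_SCHEMES[scheme]
    raise ValueError(f"unsupported source: {location!r}")
```

A real implementation would also have to handle authentication and, for Replica Catalog entries, the choice between several physical copies of the same file.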
13Grid Manager (features)
- E-mail notification of job status changes.
- Support for software runtime environment configuration.
  - Jobs will be started with the environment set up properly for the requested application
- Customizable GridFTP server
  - local access through plugins
  - certificate oriented local file system access plugin
  - job submission/access plugin - start job/upload input files/download output files through the same interface
- Limitation: Data is handled only at the beginning and end of the job. User must provide information about input and output data.
14Extensions to RSL (evaluation)
- RSL stands for Resource Specification Language. Introduced to communicate job requirements to the Globus Resource Allocation Manager (GRAM).
- Useful features:
  - Allows basic logical expressions
  - Set of attributes is expandable
  - Unknown attributes are passed through.
  - Allows different parts to be processed at different levels.
  - Can be used to assist in writing brokers or filters which refine an RSL specification
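The pass-through behaviour is what makes layered processing possible: each level rewrites the attributes it understands and forwards the rest untouched. The sketch below illustrates this on a deliberately reduced subset of RSL, a conjunction of flat (attribute=value) relations. Nested lists, multiple values, and the other logical operators are not handled. The function names and the handler table are hypothetical and not part of Globus or NorduGrid.

```python
import re

# Matches one flat relation such as (stdout=out.txt) or (arguments="Hello World").
RELATION = re.compile(r'\(\s*(\w+)\s*=\s*("[^"]*"|[^()\s]+)\s*\)')


def parse_flat_rsl(text: str) -> list[tuple[str, str]]:
    """Parse simple (attribute=value) relations into (name, value) pairs."""
    return [(name, value.strip('"')) for name, value in RELATION.findall(text)]


def refine(pairs, handlers):
    """Rewrite attributes this level knows; pass unknown ones through unchanged."""
    refined = []
    for name, value in pairs:
        handler = handlers.get(name.lower())
        refined.append((name, handler(value) if handler else value))
    return refined
```

A filter that only knows about stdout could, for example, call `refine(pairs, {"stdout": lambda v: "/session/" + v})`; every other attribute reaches the next level exactly as the user wrote it.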
15Extensions to RSL (new attributes)
- To support additional features, new attributes were introduced. The most important are:
  - inputFiles=(<file> <location>) ... - list of files to be transferred to the computing node from a given location.
  - outputFiles=(<file> <location>) ... - list of files to be preserved after the job completion and transferred to a given location.
  - executables=<file1> <file2> ... - list of files to be given executable permissions.
  - notify=<options> <email> ... - E-mail notification on job status change.
16Extensions to RSL (new attributes)
- runTimeEnvironment=<string> ... - application-specific runtime environment (e.g., ATLAS-3.2.1)
- middleware=<string> - required middleware (e.g., NorduGrid-0.3.0)
- cluster=<string> - specific cluster request
- rerun=<number> - number of attempts to re-run the job
- lifeTime=<number> - maximum time for the session directory to remain on the execution node (cannot override local policy)
- ftpThreads=<number> - number of GridFTP threads to be used for file transfers
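To show how these attributes combine into a single job description, here is a small sketch that assembles a request string from Python values. It is an illustration only, not a NorduGrid tool: the function names are hypothetical, the leading "&" is assumed from standard RSL conjunction syntax, and no escaping of quotes inside values is attempted.

```python
def quote(value) -> str:
    """Wrap a value in double quotes (no escaping of embedded quotes)."""
    return '"' + str(value) + '"'


def build_xrsl(executable, arguments=(), stdout=None,
               input_files=(), output_files=(), **extra) -> str:
    """Assemble a conjunction of (attribute=value) relations."""
    parts = [f"(executable={executable})"]
    if arguments:
        parts.append("(arguments=" + " ".join(quote(a) for a in arguments) + ")")
    if stdout:
        parts.append(f"(stdout={stdout})")
    if input_files:
        # Each entry is a (file, location) pair, as in the inputFiles attribute.
        pairs = "".join(f"({quote(f)} {quote(loc)})" for f, loc in input_files)
        parts.append(f"(inputFiles={pairs})")
    if output_files:
        pairs = "".join(f"({quote(f)} {quote(loc)})" for f, loc in output_files)
        parts.append(f"(outputFiles={pairs})")
    # Any further attribute (runTimeEnvironment, middleware, ...) is
    # emitted as a quoted string, in the order given by the caller.
    for name, value in extra.items():
        parts.append(f"({name}={quote(value)})")
    return "&" + "".join(parts)
```

For instance, `build_xrsl("/bin/echo", ["Hello World"], stdout="out.txt")` yields `&(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)`.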
17User Interface
- The NorduGrid toolkit user interface consists of a set of commands that can be executed from the command line:
  - ngsub - for job submission
  - ngstat - to obtain the status of jobs and clusters
  - ngcat - to display the stdout or stderr of a running job
  - ngget - to retrieve the result from a finished job
  - ngkill - to kill a running job
  - ngclean - to delete a job from a remote cluster
  - ngsync - to recreate local information about jobs
18User Interface
- Job request is done through xRSL
  - processes user-level xRSL request and transforms it to one suitable for GM
  - user-friendly values for some attributes
  - conditional submission and xRSL transformation
- Performs brokering
  - analyzes information about the different clusters obtained from the MDS servers
  - from all suitable queues one is chosen randomly, with a weight proportional to the amount of free computing resources
- Passes modified job request to GM through GRAM or GridFTP interface and uploads input files.
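The brokering rule (random choice, weighted by free resources) fits in a few lines. The sketch below assumes that "free computing resources" can be reduced to a single number of free CPUs per queue and that the candidate list has already been filtered for suitability. The function name and the (name, free_cpus) pairs are hypothetical stand-ins for what the information system reports.

```python
import random


def choose_queue(queues, rng=random):
    """Pick one queue at random, weighted by its number of free CPUs.

    `queues` is a list of (name, free_cpus) pairs.
    """
    # Queues without free CPUs would have weight zero; drop them so the
    # empty case can be reported explicitly.
    candidates = [(name, free) for name, free in queues if free > 0]
    if not candidates:
        raise LookupError("no suitable queue has free resources")
    names = [name for name, _ in candidates]
    weights = [free for _, free in candidates]
    return rng.choices(names, weights=weights, k=1)[0]
```

Because the choice is random rather than "always the emptiest cluster", many users submitting at the same moment from stale information are spread over several clusters instead of all landing on one.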
19User Authentication Management
- Using Globus certificates
- NorduGrid Certification Authority established
- Access control through gridmapfiles
- User access control is delegated to Virtual Organization managers
- Gridmapfiles are generated automatically from VO database
- GSI enabled secure LDAP server
  - contains the Subject Names of the users' certificates
- VO managers
- User Groups and Group Managers
- Local site administrators have total control over their gridmapfiles
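Generating a gridmapfile from a VO membership list is mostly a formatting step: one line per certificate subject name, quoted, followed by the local account it maps to. The sketch below assumes the subject names have already been fetched from the VO database and that all VO members map to one shared local account. The function name, the account name, and the local deny list are illustrative assumptions. The deny list stands in for the site administrator's final say.

```python
def render_gridmap(vo_subjects, local_account, locally_denied=()):
    """Render gridmapfile text: '"<subject name>" <local account>' per line."""
    denied = set(locally_denied)
    lines = [
        f'"{subject}" {local_account}'
        for subject in sorted(set(vo_subjects))  # de-duplicate, stable order
        if subject not in denied                 # local policy overrides the VO
    ]
    return "".join(line + "\n" for line in lines)
```

Running such a step periodically keeps the file in step with the VO database while leaving the per-site exceptions in local hands.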
20Applications
- It is possible to run any application with a predefined set of input and output data
- From as simple as "Hello World":
    ngsub '(executable=/bin/echo)(arguments="Hello World")(stdout=out.txt)'
21Applications (cont.)
- to as difficult as Atlas Data Challenge:
    ngsub '(executable=prod)(arguments="0002" "2" "100")
      (stdout=atlas.0002.log)(join=yes)
      (replicacollection=ldap://grid.uio.no/lc=ATLAS,rc=NorduGrid,dc=nordugrid,dc=org)
      (inputfiles=
        ("atlsim.makefile" "")
        ("atlas.kumac" "")
        ("gen0017_1.root" "rc:///gen0017_1.root"))
      (outputfiles=
        ("atlas.0002.zebra" "rc:///results/atlas.0002.zebra")
        ("atlas.0002.his" ""))
      (runtimeenvironment="ATLAS-3.2.0")
      (middleware="NorduGrid")'
22Conclusions
- The minimal environment for Grid computing is established.
- Globus tools alone are not enough for convenient usage, but provide a solid base.
- An additional layer of tools/services was developed to provide the required infrastructure.
- A lot of things to do:
  - Runtime data handling.
  - Accounting.
  - Better support for different LRMS.
  - Enhanced Information System - more stability, access control, better and richer information providers etc.
  - ...