1
Our Work at CERN
  • Gang CHEN, Yaodong CHENG
  • Computing center, IHEP
  • November 2, 2004

2
Outline
  • Conclusion of CHEP04
  • Computing fabric
  • New developments of CASTOR
  • Storage Resource Manager
  • Grid File System
  • GGF-WG, GFAL
  • ELFms
  • Quattor, Lemon, Leaf
  • Others
  • AFS, wireless network, Oracle, Condor, SLC, InDiCo,
  • Lyon visit, CERN Open Day (Oct. 16)

3
CHEP04
4
Conclusion of CHEP04
  • CHEP04, from Sept. 26 to Oct. 1
  • Plenary sessions every morning
  • Seven parallel sessions each afternoon
  • Online computing
  • Event processing
  • Core software
  • Distributed computing services
  • Distributed computing systems and experiences
  • Computing fabrics
  • Wide area network
  • Documents: www.chep2004.org
  • Our presentations (one talk per person)
  • Two on Sep. 27 and one on Sep. 30

5
Computing fabrics
  • Computing nodes, disk servers, tape servers,
    network bandwidth at different HEP institutes
  • Fabrics at Tier0, Tier1 and Tier2
  • Installation, configuration, maintenance and
    management of large Linux farms
  • Grid Software installation
  • Monitoring of computing fabrics
  • OS choice: move to RHES3/Scientific Linux
  • Storage observations

6
Storage stack
(layered diagram, top to bottom)
  • Expose to WAN: SRM, StoRM, SRB, gfarm
  • Expose to LAN: NFS v2/v3, Chimera/PNFS (dCache),
    Lustre, StoRM, GoogleFS, PVFS, CASTOR, gfarm, SRB
  • Local network: 1Gb eth, 10Gb eth, Infiniband
  • File systems: GPFS, ext2/3, XFS, SAN FS
  • Disk organisation: HW RAID 5, HW RAID 1, SW RAID 5,
    SW RAID 0, SATA array direct connect
  • Disks: FibreChannel/SATA SAN, EIDE/SATA in a box,
    iSCSI
  • Tape store: dCache/TSM, CASTOR, ENSTORE, JASMine,
    HPSS
7
Storage observations
  • CASTOR and dCache are in full growth
  • Growing numbers of adopters outside the
    development sites
  • SRM support for all major storage managers
  • SRB at Belle (KEK)
  • Sites are not always going for the largest disks
    (capacity driven); some already choose smaller
    disks for performance
  • Key issue for LHC
  • Cluster file system comparisons
  • SW-based solutions allow HW reuse

8
Architecture Choice
  • 64-bit is coming soon and HEP is not really
    ready for it!
  • Infiniband for HPC
  • Low latency
  • High bandwidth (>700 MB/s for CASTOR/RFIO)
  • Balance of CPU to disk resources
  • Security issues
  • Which servers are exposed to users or the WAN?
  • High-performance data access and computing
    support
  • Gfarm file system (Japan)

9
New CASTOR developments
10
Castor Current Status
  • Usage at CERN
  • 370 disk servers, 50 stagers (disk pool managers)
  • 90 tape drives, more than 3 PB in total
  • Dev team (5), Operations team (4)
  • Associated problems
  • Management is more and more difficult
  • Performance
  • Scalability
  • I/O request scheduling
  • Optimal use of resources

11
Challenge for CASTOR
  • LHC is a big challenge
  • A single stager should scale up to handle peak
    rates of 500/1000 requests per second
  • Expected system configuration
  • 4PB of disk cache, 10 PB stored on tapes per year
  • Tens of millions of disk-resident files
  • Peak rate of 4 GB/s from the online system
  • 10000 disks, 150 tape drives
  • Increase of small files
  • The current CASTOR stager cannot do it

12
Vision
  • With clusters of hundreds of disk and tape
    servers, automated storage management increasingly
    faces the same problems as CPU cluster management
  • (Storage) Resource management
  • (Storage) Resource sharing
  • (Storage) Request access scheduling
  • Configuration
  • Monitoring
  • The stager is the main gateway to all resources
    managed by CASTOR

Vision: Storage Resource Sharing Facility
13
Ideas behind new stager
  • Pluggable framework rather than total solution
  • True request scheduling: third-party schedulers,
    e.g. Maui or LSF
  • Policy attributes: externalize the policy engines
    governing resource matchmaking; move toward
    full-fledged policy languages, e.g. GUILE
  • Restricted access to storage resources
  • All requests are scheduled
  • No random rfiod eating up resources behind
    the back of the scheduling system
  • Database-centric architecture (see the sketch
    after this list)
  • Stateless components: all transactions and
    locking provided by the DB system
  • Allows for easy stopping/restarting of components
  • Facilitates development/debugging
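
As a toy illustration of the database-centric, stateless idea above, the sketch below uses SQLite in place of the Oracle/MySQL request repository; the table layout, file names and states are invented for illustration (build with -lsqlite3).

    /* Toy sketch of the database-centric idea: the request repository lives
     * in a database and the components stay stateless.  SQLite stands in
     * for the Oracle/MySQL repository; table and column names are invented. */
    #include <stdio.h>
    #include <sqlite3.h>

    static int run(sqlite3 *db, const char *sql)
    {
        char *err = NULL;
        if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "SQL error: %s\n", err);
            sqlite3_free(err);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        sqlite3 *db;
        if (sqlite3_open("stager.db", &db) != SQLITE_OK)
            return 1;

        /* Request handler: just records the request and returns. */
        run(db, "CREATE TABLE IF NOT EXISTS requests "
                "(id INTEGER PRIMARY KEY, castor_file TEXT, state TEXT)");
        run(db, "INSERT INTO requests (castor_file, state) "
                "VALUES ('/castor/cern.ch/demo/file1', 'PENDING')");

        /* Job dispatcher: claims one pending request inside a transaction,
         * so locking and consistency come entirely from the DB. */
        run(db, "BEGIN IMMEDIATE;"
                "UPDATE requests SET state = 'SCHEDULED' "
                "WHERE id = (SELECT id FROM requests "
                "            WHERE state = 'PENDING' LIMIT 1);"
                "COMMIT;");

        sqlite3_close(db);
        return 0;
    }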

14
New Stager Architecture
15
Architecture: request handling and scheduling
(diagram components: fabric authentication service, e.g. a Kerberos V server; request handler with thread pool; request repository (Oracle, MySQL); scheduler with scheduling policies; catalogue; job dispatcher)
16
Security
  • Implementing strong authentication
  • (Encryption is not planned for the moment)
  • Developed a plugin system based on GSSAPI so
    as to use the mechanisms
  • GSI, KRB5
  • And support for KRB4 for backward compatibility
  • Modifying various CASTOR components to integrate
    the security layer
  • Impact on the configuration of machines (need for
    service keys, etc.)

17
Castor GUI Client
  • A prototype was developed by LIU Aigui on the
    Kylix 3 platform
  • If possible, it will be made downloadable from the
    CASTOR web site
  • Many problems still exist
  • It needs to be optimized
  • Functionality and performance tests are still
    needed

18
Storage Resource Manager
19
Introduction of SRM
  • SRMs are middleware components that manage shared
    storage resources on the Grid and provide
  • Uniform access to heterogeneous storage
  • Protocol negotiation
  • Dynamic transfer URL allocation (see the sketch
    after this list)
  • Access to permanent and temporary types of
    storage
  • Advanced space and file reservation
  • Reliable transfer services
  • Storage resources here refer to
  • DRM: disk resource managers
  • TRM: tape resource managers
  • HRM: hierarchical resource managers
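
To make the transfer URL idea above concrete, here is a conceptual sketch of the client-side flow (site URL in; negotiated protocol and transfer URL out). The function name, host names and protocol handling are hypothetical stand-ins invented for illustration; they are not the real SRM interface, which is a web-service API.

    /* Conceptual sketch only: how a client conceptually obtains a transfer
     * URL (TURL) for a site URL (SURL) from an SRM.  srm_prepare_to_get(),
     * the host names and the protocol list are hypothetical stand-ins. */
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical stand-in: pretend the SRM negotiated "rfio" from the
     * client's preference list and returned a TURL on a disk server. */
    static int srm_prepare_to_get(const char *surl,
                                  const char *const protocols[], int nproto,
                                  char *turl, size_t len)
    {
        (void)protocols; (void)nproto;
        snprintf(turl, len, "rfio://diskserver.example.org%s",
                 strchr(surl + strlen("srm://"), '/'));
        return 0;
    }

    int main(void)
    {
        const char *protocols[] = { "rfio", "dcap", "gsiftp" };
        char turl[512];
        const char *surl =
            "srm://srm.example.org/castor/cern.ch/demo/file1";

        if (srm_prepare_to_get(surl, protocols, 3, turl, sizeof(turl)) == 0)
            printf("negotiated TURL: %s\n", turl);  /* client opens the TURL next */
        return 0;
    }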

20
SRM Collaboration
  • Jefferson Lab
  • Bryan Hess, Andy Kowalski, Chip Watson
  • Fermilab
  • Don Petravick, Timur Perelmutov
  • LBNL
  • Arie Shoshani, Alex Sim, Junmin Gu
  • EU DataGrid WP2
  • Peter Kunszt, Heinz Stockinger, Kurt Stockinger, Erwin Laure
  • EU DataGrid WP5
  • Jean-Philippe Baud, Stefano Occhetti, Jens Jensen, Emil Knezo, Owen Synge
21
SRM versions
  • Two SRM Interface specifications
  • SRM v1.1 provides
  • Data access/transfer
  • Implicit space reservation
  • SRM v2.1 adds
  • Explicit space reservation
  • Namespace discovery and manipulation
  • Access permissions manipulation
  • Fermilab SRM implements SRM v1.1 specification
  • SRM v2.1 by the end of 2004
  • Reference: http://sdm.lbl.gov/srm-wg

22
High Level View of SRM
(diagram: client users/applications and Grid middleware talk to per-system SRMs in front of Enstore, dCache, JASMine and CASTOR)
23
Role of SRM on the GRID
24
Main Advantages of using SRM
  • Provides smooth synchronization between shared
    resources
  • Eliminates unnecessary burden from the client
  • Insulates clients from storage system failures
  • Transparently deals with network failures
  • Enhances the efficiency of the grid by sharing
    files and eliminating unnecessary file transfers
  • Provides a streaming model to the client

25
Grid File System
26
Introduction
  • There can be many hundreds of petabytes of data
    in grids, a very large percentage of which is
    stored in files
  • A standard mechanism to describe and organize
    file-based data is essential for facilitating
    access to this large amount of data.
  • GGF GFS-WG
  • GFAL- Grid File Access Library

27
GGF GFS-WG
  • Global Grid Forum, Grid File System Working Group
  • Two goals (two documents)
  • File System Directory Services
  • Manage namespace for files, access control, and
    metadata management
  • Architecture for Grid File System Services
  • Provides the functionality of a virtual file
    system in a grid environment
  • Facilitates federation and sharing of virtualized
    data
  • Uses File System Directory Services and standard
    access protocols
  • They will be submitted in GGF13 and GGF14 (2005)

28
GFS view
  • Transparent access to dispersed file data in a
    Grid
  • POSIX I/O APIs
  • Applications can access the Gfarm file system
    without any modification, as if it were mounted at
    /gfs (see the example after this list)
  • Automatic and transparent replica selection for
    fault tolerance and access-concentration avoidance
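
To illustrate the point above that applications need no modification, here is a minimal sketch using only standard POSIX I/O on a path under a /gfs mount; the path is hypothetical.

    /* Illustrative only: standard POSIX I/O on a file under a Gfarm
     * mount point (/gfs); no grid-specific API is needed.  The path
     * below is hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[256];
        int fd = open("/gfs/demo/run001/header.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        /* The file system transparently picks a replica to read from. */
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        close(fd);
        return 0;
    }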

(diagram: a virtual directory tree is mapped onto the file system metadata of the Grid file system, with file replica creation underneath)
29
GFAL
  • Grid File Access Library
  • Grid storage interactions today require using
    several existing software components
  • The replica catalog services, to locate valid
    replicas of files
  • The SRM software, to ensure files exist on disk
    or space is allocated on disk for new files
  • GFAL hides these interactions and presents a
    Posix interface for the I/O operations. The
    currently supported protocols are file for local
    access, dcap (dCache access protocol) and rfio
    (CASTOR access protocol).

30
Compile and Link
  • The function names are obtained by prepending
    gfal_ to the Posix names, for example gfal_open,
    gfal_read, gfal_close ... The argument lists and
    the values returned by the functions are
    identical.
  • The header file gfal_api.h needs to be included
    in the application source code (see the sketch
    after this list)
  • Linked with libGFAL.so
  • Security libraries libcgsi_plugin_gsoap_2.3,
    libglobus_gssapi_gsi_gcc32dbg and
    libglobus_gss_assist_gcc32dbg are used internally
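
A minimal usage sketch based on the description above (gfal_open/gfal_read/gfal_close with Posix-identical argument lists, gfal_api.h included, linked against libGFAL.so); the CASTOR path form and the build command are illustrative assumptions.

    /* Minimal GFAL usage sketch: same calls as Posix, prefixed with gfal_.
     * The rfio path form and file name below are hypothetical.
     * Build (illustrative): cc gfal_demo.c -I<gfal include dir> -lGFAL */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include "gfal_api.h"

    int main(void)
    {
        char buf[4096];
        int fd = gfal_open("rfio:///castor/cern.ch/demo/data.raw", O_RDONLY, 0);
        if (fd < 0) { perror("gfal_open"); return 1; }
        ssize_t n = gfal_read(fd, buf, sizeof(buf));
        if (n < 0)
            perror("gfal_read");
        else
            printf("read %ld bytes\n", (long)n);
        gfal_close(fd);
        return 0;
    }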

31
Basic Design
(diagram: physics applications call GFAL through a VFS/Posix I/O layer; GFAL bundles an SRM client, a replica catalog client, local file I/O, and root/rfio/dCap I/O (open/read); these talk over wide-area access to the RC services, SRM services, RFIO services, dCap services, MSS services and local disk)
32
File system implementation
  • Two options have been considered to offer a file
    system view
  • A way to run standard applications without
    modifying the source and without re-linking
  • The Pluggable File System (PFS), built on top of
    Bypass and developed by the University of Wisconsin
  • The Linux Userland File System (LUFS)
  • File system view: /grid/vo/
  • CASTORfs based on LUFS
  • Developed by me
  • Available
  • Low efficiency

33
Extremely Large Fabric management system
34
ELFms
  • ELFms: Extremely Large Fabric management
    system
  • Sub-systems
  • QUATTOR: system installation and configuration
    tool suite
  • LEMON: monitoring framework
  • LEAF: hardware and state management

35
Deploy at CERN
  • ELFms manages and controls most of the nodes
    in the CERN CC
  • 2100 nodes out of 2400, to be scaled up to
    > 8000 in 2006-08 (LHC)
  • Multiple functionality and cluster sizes (batch
    nodes, disk servers, tape servers, DB, web, ...)
  • Heterogeneous hardware (CPU, memory, HD size,..)
  • Linux (RH) and Solaris (9)

36
Quattor
  • Quattor takes care of the configuration,
    installation and management of fabric nodes
  • A Configuration Database holds the desired
    state of all fabric elements
  • Node setup (CPU, HD, memory, software RPMs/PKGs,
    network, system services, location, audit info)
  • Cluster (name and type, batch system, load
    balancing info)
  • Defined in templates arranged in hierarchies:
    common properties set only once
  • Autonomous management agents running on the node
    for
  • Base installation
  • Service (re-)configuration
  • Software installation and management
  • Quattor was developed in the scope of EU
    DataGrid. Development and maintenance now
    coordinated by CERN/IT

37
Quattor Architecture
  • Configuration Management
  • Configuration Database
  • Configuration access and caching
  • Graphical and command-line interfaces
  • Node and Cluster Management
  • Automated node installation
  • Node Configuration Management
  • Software distribution and management

(diagram: node-side configuration management and node management components)
38
LEMON
  • Monitoring sensors and agent
  • Large number of metrics (10 sensors
    implementing 150 metrics)
  • Plug-in architecture: new sensors and metrics can
    easily be added
  • Asynchronous push/pull protocol between sensors
    and agent
  • Available for Linux and Solaris
  • Repository
  • Data insertion via TCP or UDP (see the sketch
    after this list)
  • Data retrieval via SOAP
  • Backend implementations for text file and Oracle
    SQL
  • Keeps current and historical samples: no aging
    out of data, but archiving on TSM and CASTOR
  • Correlation engines and self-healing fault
    recovery
  • Allows plug-in correlations accessing collected
    metrics and external information (e.g. quattor
    CDB, LSF), and can also launch configured recovery
    actions
  • E.g. average number of users on LXPLUS, total
    number of active LCG batch nodes
  • E.g. cleaning up /tmp if occupancy > x, restarting
    daemon D if dead, ...
  • LEMON is an EDG development now maintained by
    CERN/IT
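
As a conceptual sketch of the push-style data insertion mentioned in the repository bullets above, the snippet below sends one metric sample over UDP. The plain-text message format, host and port are invented for illustration and are not the actual LEMON sensor/repository protocol.

    /* Conceptual sketch only: a push-style sensor sending one metric
     * sample to a repository over UDP.  Message format, host name and
     * port are hypothetical. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        struct sockaddr_in repo;
        memset(&repo, 0, sizeof(repo));
        repo.sin_family = AF_INET;
        repo.sin_port = htons(12509);                    /* hypothetical port */
        inet_pton(AF_INET, "127.0.0.1", &repo.sin_addr); /* hypothetical repository */

        /* One sample: node, metric name, timestamp, value. */
        char msg[128];
        snprintf(msg, sizeof(msg), "lxb0001 loadavg %ld 0.42", (long)time(NULL));

        if (sendto(sock, msg, strlen(msg), 0,
                   (struct sockaddr *)&repo, sizeof(repo)) < 0)
            perror("sendto");

        close(sock);
        return 0;
    }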

39
LEMON Architecture
  • LEMON stands for LHC Era Monitoring

40
LEAF (LHC Era Automated Fabric)
  • Collection of workflows for automated node
    hardware and state management
  • HMS (Hardware Management System)
  • e.g. installation, moves, vendor calls, retirement
  • Automatically requests installs, retirements, etc.
    from technicians
  • GUI to locate equipment physically
  • SMS (State Management System)
  • Automated handling of high-level configuration
    steps, e.g.
  • Reconfigure, reboot, reallocate nodes
  • Extensible framework: plug-ins for site-specific
    operations are possible
  • Issues all necessary (re)configuration commands
    on top of quattor CDB and NCM
  • HMS and SMS interface to Quattor and LEMON for
    setting/getting node information respectively

41
LEAF screenshot
42
Other Activities
  • AFS
  • AFS documents download
  • AFS DB servers configuration
  • Wireless network deployment
  • Oracle license for LCG
  • Condor deployment at some HEP institutes
  • SLC: Scientific Linux CERN version
  • Lyon visit (Oct. 27, CHEN Gang)
  • CERN Open Day (Oct. 16)

43
(No Transcript)
44
Thank you!!