1
Our Work at CERN
  • Gang CHEN, Yaodong CHENG
  • Computing center, IHEP
  • November 2, 2004

2
Outline
  • Conclusion of CHEP04
  • Computing fabric
  • New developments of CASTOR
  • Storage Resource Manager
  • Grid File System
  • GGF-WG, GFAL
  • ELFms
  • Quattor, Lemon, Leaf
  • Others
  • AFS, wireless network, Oracle, Condor, SLC, InDiCo,
  • Lyon visit, CERN Open Day (Oct. 16)

3
CHEP04
4
Conclusion of CHEP04
  • CHEP04, from Sept. 26 to Oct. 1
  • Plenary sessions every morning
  • Seven parallel sessions each afternoon
  • Online computing
  • Event processing
  • Core software
  • Distributed computing services
  • Distributed computing systems and experiences
  • Computing fabrics
  • Wide area network
  • Documents: www.chep2004.org
  • Our presentations (one talk per person)
  • Two on Sep. 27 and one on Sep. 30

5
Computing fabrics
  • Computing nodes, disk servers, tape servers,
    network bandwidth at different HEP institutes
  • Fabrics at Tier0, Tier1 and Tier2
  • Installation, configuration, maintenance and
    management of large Linux farms
  • Grid Software installation
  • Monitoring of computing fabrics
  • OS choice: move to RHES3/Scientific Linux
  • Storage observations

6
Storage stack
(layered diagram, top to bottom)
  • Expose to WAN: SRM, StoRM, SRB, gfarm
  • Expose to LAN: NFS v2/v3, Chimera/PNFS (dCache),
    Lustre, StoRM, GoogleFS, PVFS, CASTOR, gfarm, SRB
  • Local network: 1Gb eth, 10Gb eth, Infiniband
  • File systems: GPFS, ext2/3, XFS, SAN FS
  • Disk organisation: HW RAID 5, HW RAID 1, SW RAID 5,
    SW RAID 0, SATA array direct connect
  • Disks: FibreChannel/SATA SAN, EIDE/SATA in a box,
    iSCSI
  • Tape store: dCache/TSM, CASTOR, ENSTORE, JASMine,
    HPSS
7
Storage observations
  • CASTOR and dCache are in full growth
  • Growing numbers of adopters outside the
    development sites
  • SRM support for all major storage managers
  • SRB at Belle (KEK)
  • Sites are not always going for the largest disks
    (capacity driven); some already choose smaller
    disks for performance
  • Key issue for LHC
  • Cluster file system comparisons
  • SW-based solutions allow HW reuse

8
Architecture Choice
  • 64-bit is coming soon and HEP is not really
    ready for it!
  • Infiniband for HPC
  • Low latency
  • High bandwidth (>700 MB/s for CASTOR/RFIO)
  • Balance of CPU to disk resources
  • Security issues
  • Which servers are exposed to users or the WAN?
  • High-performance data access and computing
    support
  • Gfarm file system (Japan)

9
New CASTOR developments
10
Castor Current Status
  • Usage at CERN
  • 370 disk servers, 50 stagers (disk pool managers)
  • 90 tape drives, more than 3 PB in total
  • Dev team (5), Operations team (4)
  • Associated problems
  • Management is more and more difficult
  • Performance
  • Scalability
  • I/O request scheduling
  • Optimal use of resources

11
Challenge for CASTOR
  • LHC is a big challenge
  • A single stager should scale up to handle peak
    rates of 500/1000 requests per second
  • Expected system configuration
  • 4PB of disk cache, 10 PB stored on tapes per year
  • Tens of millions of disk-resident files
  • Peak rate of 4 GB/s from the online system
  • 10000 disks, 150 tape drives
  • Increase of small files
  • The current CASTOR stager cannot do it

12
Vision
  • With clusters of hundreds of disk and tape
    servers, automated storage management increasingly
    faces the same problems as CPU cluster management
  • (Storage) Resource management
  • (Storage) Resource sharing
  • (Storage) Request access scheduling
  • Configuration
  • Monitoring
  • The stager is the main gateway to all resources
    managed by CASTOR

Vision: Storage Resource Sharing Facility
13
Ideas behind new stager
  • Pluggable framework rather than total solution
  • True request scheduling: third-party schedulers,
    e.g. Maui or LSF
  • Policy attributes: externalize the policy engines
    governing resource matchmaking; move toward
    full-fledged policy languages, e.g. GUILE
  • Restricted access to storage resources
  • All requests are scheduled
  • No random rfiod eating up resources behind
    the back of the scheduling system
  • Database-centric architecture (see the sketch
    after this list)
  • Stateless components: all transactions and
    locking provided by the DB system
  • Allows for easy stopping/restarting of components
  • Facilitates development/debugging
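
As a toy illustration of the database-centric, stateless idea above, the sketch below uses SQLite in place of the Oracle/MySQL request repository; the table layout, file names and states are invented for illustration (build with -lsqlite3).

    /* Toy sketch of the database-centric idea: the request repository lives
     * in a database and the components stay stateless.  SQLite stands in
     * for the Oracle/MySQL repository; table and column names are invented. */
    #include <stdio.h>
    #include <sqlite3.h>

    static int run(sqlite3 *db, const char *sql)
    {
        char *err = NULL;
        if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
            fprintf(stderr, "SQL error: %s\n", err);
            sqlite3_free(err);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        sqlite3 *db;
        if (sqlite3_open("stager.db", &db) != SQLITE_OK)
            return 1;

        /* Request handler: just records the request and returns. */
        run(db, "CREATE TABLE IF NOT EXISTS requests "
                "(id INTEGER PRIMARY KEY, castor_file TEXT, state TEXT)");
        run(db, "INSERT INTO requests (castor_file, state) "
                "VALUES ('/castor/cern.ch/demo/file1', 'PENDING')");

        /* Job dispatcher: claims one pending request inside a transaction,
         * so locking and consistency come entirely from the DB. */
        run(db, "BEGIN IMMEDIATE;"
                "UPDATE requests SET state = 'SCHEDULED' "
                "WHERE id = (SELECT id FROM requests "
                "            WHERE state = 'PENDING' LIMIT 1);"
                "COMMIT;");

        sqlite3_close(db);
        return 0;
    }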

14
New Stager Architecture
15
Architecture: request handling and scheduling
(diagram components: fabric authentication service, e.g. a Kerberos V server; request handler with thread pool; request repository (Oracle, MySQL); scheduler with scheduling policies; catalogue; job dispatcher)
16
Security
  • Implementing strong authentication
  • (Encryption is not planned for the moment)
  • Developed a plugin system based on GSSAPI so
    as to use the mechanisms
  • GSI, KRB5
  • And support for KRB4 for backward compatibility
  • Modifying various CASTOR components to integrate
    the security layer
  • Impact on the configuration of machines (need for
    service keys, etc.)

17
Castor GUI Client
  • A prototype was developed by LIU Aigui on the
    Kylix 3 platform
  • If possible, it will be made downloadable from the
    CASTOR web site
  • Many problems still exist
  • It needs to be optimized
  • Functionality and performance tests are still
    needed

18
Storage Resource Manager
19
Introduction of SRM
  • SRMs are middleware components that manage shared
    storage resources on the Grid and provide
  • Uniform access to heterogeneous storage
  • Protocol negotiation
  • Dynamic transfer URL allocation (see the sketch
    after this list)
  • Access to permanent and temporary types of
    storage
  • Advanced space and file reservation
  • Reliable transfer services
  • Storage resources here refer to
  • DRM: disk resource managers
  • TRM: tape resource managers
  • HRM: hierarchical resource managers
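
To make the transfer URL idea above concrete, here is a conceptual sketch of the client-side flow (site URL in; negotiated protocol and transfer URL out). The function name, host names and protocol handling are hypothetical stand-ins invented for illustration; they are not the real SRM interface, which is a web-service API.

    /* Conceptual sketch only: how a client conceptually obtains a transfer
     * URL (TURL) for a site URL (SURL) from an SRM.  srm_prepare_to_get(),
     * the host names and the protocol list are hypothetical stand-ins. */
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical stand-in: pretend the SRM negotiated "rfio" from the
     * client's preference list and returned a TURL on a disk server. */
    static int srm_prepare_to_get(const char *surl,
                                  const char *const protocols[], int nproto,
                                  char *turl, size_t len)
    {
        (void)protocols; (void)nproto;
        snprintf(turl, len, "rfio://diskserver.example.org%s",
                 strchr(surl + strlen("srm://"), '/'));
        return 0;
    }

    int main(void)
    {
        const char *protocols[] = { "rfio", "dcap", "gsiftp" };
        char turl[512];
        const char *surl =
            "srm://srm.example.org/castor/cern.ch/demo/file1";

        if (srm_prepare_to_get(surl, protocols, 3, turl, sizeof(turl)) == 0)
            printf("negotiated TURL: %s\n", turl);  /* client opens the TURL next */
        return 0;
    }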

20
SRM Collaboration
  • Jefferson Lab
  • Bryan Hess, Andy Kowalski, Chip Watson
  • Fermilab
  • Don Petravick, Timur Perelmutov
  • LBNL
  • Arie Shoshani, Alex Sim, Junmin Gu
  • EU DataGrid WP2
  • Peter Kunszt, Heinz Stockinger, Kurt Stockinger, Erwin Laure
  • EU DataGrid WP5
  • Jean-Philippe Baud, Stefano Occhetti, Jens Jensen, Emil Knezo, Owen Synge
21
SRM versions
  • Two SRM Interface specifications
  • SRM v1.1 provides
  • Data access/transfer
  • Implicit space reservation
  • SRM v2.1 adds
  • Explicit space reservation
  • Namespace discovery and manipulation
  • Access permissions manipulation
  • Fermilab SRM implements SRM v1.1 specification
  • SRM v2.1 by the end of 2004
  • Reference: http://sdm.lbl.gov/srm-wg

22
High Level View of SRM
(diagram: client users/applications and Grid middleware talk to per-system SRMs in front of Enstore, dCache, JASMine and CASTOR)
23
Role of SRM on the GRID
24
Main Advantages of using SRM
  • Provides smooth synchronization between shared
    resources
  • Eliminates unnecessary burden from the client
  • Insulates clients from storage system failures
  • Transparently deals with network failures
  • Enhances the efficiency of the grid by sharing
    files and eliminating unnecessary file transfers
  • Provides a streaming model to the client

25
Grid File System
26
Introduction
  • There can be many hundreds of petabytes of data
    in grids, a very large percentage of which is
    stored in files
  • A standard mechanism to describe and organize
    file-based data is essential for facilitating
    access to this large amount of data.
  • GGF GFS-WG
  • GFAL- Grid File Access Library

27
GGF GFS-WG
  • Global Grid Forum, Grid File System Working Group
  • Two goals (two documents)
  • File System Directory Services
  • Manage namespace for files, access control, and
    metadata management
  • Architecture for Grid File System Services
  • Provides the functionality of a virtual file
    system in a grid environment
  • Facilitates federation and sharing of virtualized
    data
  • Uses File System Directory Services and standard
    access protocols
  • They will be submitted in GGF13 and GGF14 (2005)

28
GFS view
  • Transparent access to dispersed file data in a
    Grid
  • POSIX I/O APIs
  • Applications can access the Gfarm file system
    without any modification, as if it were mounted at
    /gfs (see the example after this list)
  • Automatic and transparent replica selection for
    fault tolerance and access-concentration avoidance
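
To illustrate the point above that applications need no modification, here is a minimal sketch using only standard POSIX I/O on a path under a /gfs mount; the path is hypothetical.

    /* Illustrative only: standard POSIX I/O on a file under a Gfarm
     * mount point (/gfs); no grid-specific API is needed.  The path
     * below is hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[256];
        int fd = open("/gfs/demo/run001/header.txt", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }
        /* The file system transparently picks a replica to read from. */
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0)
            fwrite(buf, 1, (size_t)n, stdout);
        close(fd);
        return 0;
    }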

(diagram: a virtual directory tree is mapped onto the file system metadata of the Grid file system, with file replica creation underneath)
29
GFAL
  • Grid File Access Library
  • Grid storage interactions today require using
    several existing software components
  • The replica catalog services, to locate valid
    replicas of files
  • The SRM software, to ensure files exist on disk
    or space is allocated on disk for new files
  • GFAL hides these interactions and presents a
    Posix interface for the I/O operations. The
    currently supported protocols are file for local
    access, dcap (dCache access protocol) and rfio
    (CASTOR access protocol).

30
Compile and Link
  • The function names are obtained by prepending
    gfal_ to the Posix names, for example gfal_open,
    gfal_read, gfal_close ... The argument lists and
    the values returned by the functions are
    identical.
  • The header file gfal_api.h needs to be included
    in the application source code (see the sketch
    after this list)
  • Linked with libGFAL.so
  • Security libraries libcgsi_plugin_gsoap_2.3,
    libglobus_gssapi_gsi_gcc32dbg and
    libglobus_gss_assist_gcc32dbg are used internally
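
A minimal usage sketch based on the description above (gfal_open/gfal_read/gfal_close with Posix-identical argument lists, gfal_api.h included, linked against libGFAL.so); the CASTOR path form and the build command are illustrative assumptions.

    /* Minimal GFAL usage sketch: same calls as Posix, prefixed with gfal_.
     * The rfio path form and file name below are hypothetical.
     * Build (illustrative): cc gfal_demo.c -I<gfal include dir> -lGFAL */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include "gfal_api.h"

    int main(void)
    {
        char buf[4096];
        int fd = gfal_open("rfio:///castor/cern.ch/demo/data.raw", O_RDONLY, 0);
        if (fd < 0) { perror("gfal_open"); return 1; }
        ssize_t n = gfal_read(fd, buf, sizeof(buf));
        if (n < 0)
            perror("gfal_read");
        else
            printf("read %ld bytes\n", (long)n);
        gfal_close(fd);
        return 0;
    }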

31
Basic Design
(diagram: physics applications call GFAL through a VFS/Posix I/O layer; GFAL bundles an SRM client, a replica catalog client, local file I/O, and root/rfio/dCap I/O (open/read); these talk over wide-area access to the RC services, SRM services, RFIO services, dCap services, MSS services and local disk)
32
File system implementation
  • Two options have been considered to offer a file
    system view
  • A way to run standard applications without
    modifying the source and without re-linking
  • The Pluggable File System (PFS), built on top of
    Bypass and developed by the University of Wisconsin
  • The Linux Userland File System (LUFS)
  • File system view: /grid/vo/
  • CASTORfs based on LUFS
  • Developed by me
  • Available
  • Low efficiency

33
Extremely Large Fabric management system
34
ELFms
  • ELFms: Extremely Large Fabric management
    system
  • Sub-systems
  • QUATTOR: system installation and configuration
    tool suite
  • LEMON: monitoring framework
  • LEAF: hardware and state management

35
Deploy at CERN
  • ELFms manages and controls most of the nodes
    in the CERN CC
  • 2100 nodes out of 2400, to be scaled up to
    > 8000 in 2006-08 (LHC)
  • Multiple functionality and cluster sizes (batch
    nodes, disk servers, tape servers, DB, web, ...)
  • Heterogeneous hardware (CPU, memory, HD size,..)
  • Linux (RH) and Solaris (9)

36
Quattor
  • Quattor takes care of the configuration,
    installation and management of fabric nodes
  • A Configuration Database holds the desired
    state of all fabric elements
  • Node setup (CPU, HD, memory, software RPMs/PKGs,
    network, system services, location, audit info)
  • Cluster (name and type, batch system, load
    balancing info)
  • Defined in templates arranged in hierarchies:
    common properties set only once
  • Autonomous management agents running on the node
    for
  • Base installation
  • Service (re-)configuration
  • Software installation and management
  • Quattor was developed in the scope of EU
    DataGrid. Development and maintenance now
    coordinated by CERN/IT

37
Quattor Architecture
  • Configuration Management
  • Configuration Database
  • Configuration access and caching
  • Graphical and command-line interfaces
  • Node and Cluster Management
  • Automated node installation
  • Node Configuration Management
  • Software distribution and management

(diagram: node-side configuration management and node management components)
38
LEMON
  • Monitoring sensors and agent
  • Large number of metrics (10 sensors
    implementing 150 metrics)
  • Plug-in architecture: new sensors and metrics can
    easily be added
  • Asynchronous push/pull protocol between sensors
    and agent
  • Available for Linux and Solaris
  • Repository
  • Data insertion via TCP or UDP (see the sketch
    after this list)
  • Data retrieval via SOAP
  • Backend implementations for text file and Oracle
    SQL
  • Keeps current and historical samples: no aging
    out of data, but archiving on TSM and CASTOR
  • Correlation engines and self-healing fault
    recovery
  • Allows plug-in correlations accessing collected
    metrics and external information (e.g. quattor
    CDB, LSF), and can also launch configured recovery
    actions
  • E.g. average number of users on LXPLUS, total
    number of active LCG batch nodes
  • E.g. cleaning up /tmp if occupancy > x, restarting
    daemon D if dead, ...
  • LEMON is an EDG development now maintained by
    CERN/IT
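
As a conceptual sketch of the push-style data insertion mentioned in the repository bullets above, the snippet below sends one metric sample over UDP. The plain-text message format, host and port are invented for illustration and are not the actual LEMON sensor/repository protocol.

    /* Conceptual sketch only: a push-style sensor sending one metric
     * sample to a repository over UDP.  Message format, host name and
     * port are hypothetical. */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        if (sock < 0) { perror("socket"); return 1; }

        struct sockaddr_in repo;
        memset(&repo, 0, sizeof(repo));
        repo.sin_family = AF_INET;
        repo.sin_port = htons(12509);                    /* hypothetical port */
        inet_pton(AF_INET, "127.0.0.1", &repo.sin_addr); /* hypothetical repository */

        /* One sample: node, metric name, timestamp, value. */
        char msg[128];
        snprintf(msg, sizeof(msg), "lxb0001 loadavg %ld 0.42", (long)time(NULL));

        if (sendto(sock, msg, strlen(msg), 0,
                   (struct sockaddr *)&repo, sizeof(repo)) < 0)
            perror("sendto");

        close(sock);
        return 0;
    }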

39
LEMON Architecture
  • LEMON stands for LHC Era Monitoring

40
LEAF (LHC Era Automated Fabric)
  • Collection of workflows for automated node
    hardware and state management
  • HMS (Hardware Management System)
  • e.g. installation, moves, vendor calls, retirement
  • Automatically requests installs, retirements, etc.
    from technicians
  • GUI to locate equipment physically
  • SMS (State Management System)
  • Automated handling of high-level configuration
    steps, e.g.
  • Reconfigure, reboot, reallocate nodes
  • Extensible framework: plug-ins for site-specific
    operations are possible
  • Issues all necessary (re)configuration commands
    on top of quattor CDB and NCM
  • HMS and SMS interface to Quattor and LEMON for
    setting/getting node information respectively

41
LEAF screenshot
42
Other Activities
  • AFS
  • AFS documents download
  • AFS DB servers configuration
  • Wireless network deployment
  • Oracle license for LCG
  • Condor deployment at some HEP institutes
  • SLC: Scientific Linux CERN version
  • Lyon visit (Oct. 27, CHEN Gang)
  • CERN Open Day (Oct. 16)

43
(No Transcript)
44
Thank you!!