Title: The Mass Storage Challenge for the Experiments at LHC: the Solution Developed at INFN-CNAF
1. The Mass Storage Challenge for the Experiments at LHC: the Solution Developed at INFN-CNAF
IEEE NSS-MIC, Orlando, 28 Oct 2009
A. Cavalli(1), S. Dal Pra(1), L. dell'Agnello(1), A. Forti(1), D. Gregori(1), L. Li Gioi(1), B. Martelli(1), A. Prosperini(1), P. P. Ricci(1), E. Ronchieri(1), V. Sapunenko(1), V. Vagnoni(2), R. Zappi(1)
(1) INFN-CNAF, Bologna, Italy
(2) INFN, Sezione di Bologna, Bologna, Italy
2. Overview
- INFN-CNAF
- Mass Storage Challenge
- SRM layer
- Solution I
- StoRM with GPFS
- Results
- Solution II
- StoRM with GPFS and TSM
- Results
- Conclusions
4. INFN-CNAF computing center
- CNAF is the main INFN computing resource center
  - for the Large Hadron Collider (LHC) experiments at CERN: ALICE, ATLAS, CMS, LHCb
  - ... and many others, e.g. BABAR (SLAC), CDF (FNAL), VIRGO (Italy), ARGO (Tibet), AMS (satellite), PAMELA (satellite), MAGIC (Canary Islands)
- It offers the following resources:
  - CY 2009: 23K HS06 of CPU, 2.8 PB of disk space and 2.65 PB of tape space on line (2 libraries)
  - CY 2010: 68K HS06 of CPU, 8.5 PB of disk and 6.6 PB of tape space on line
5. High Energy Physics computing
- Computing at the Large Hadron Collider is based on a multi-tier model
  - Tier-0 (CERN), Tier-1s and Tier-2s form the WLCG/EGEE Grid
  - INFN-CNAF is a Tier-1
- Uniform access policies to the computing resources (CPU, storage) have been defined in the framework of the Grid (WLCG/EGEE)
6. Overview
- INFN-CNAF
- Mass Storage Challenge
- SRM layer
- Solution I
- StoRM with GPFS
- Results
- Solution II
- StoRM with GPFS and TSM
- Results
- Conclusions
7. Mass Storage Challenge
- Several petabytes of data (online and near-line) need to be accessible at any time by thousands of concurrent processes
- The required aggregated data throughput, both on the Local Area Network (LAN) and the Wide Area Network (WAN), is of the order of several GB/s
- Long-term transparent archiving of data is needed
8. Overview
- INFN-CNAF
- Mass Storage Challenge
- SRM layer
- Solution I
- StoRM with GPFS
- Results
- Solution II
- StoRM with GPFS and TSM
- Results
- Conclusions
9. Storage Resource Manager
- In WLCG/EGEE an abstraction layer, called the Storage Resource Manager (SRM), is defined to let client applications interact with the backend storage system (either disk- or tape-based) via a common interface
- SRM transparently supports several access protocols for the LAN (e.g. POSIX, RFIO, DCAP) and for the WAN (GridFTP)
- SRM allows remote space management and preparation of files for read/write access
10. A typical simplified workflow exploiting SRM services
- A remote user wants to analyze a dataset stored on tape (a minimal sketch of these steps follows below):
  - Check whether the files exist, e.g. perform a remote file listing by using the SRM service
  - Order the SRM to move the data files from tape to disk if needed
  - Remotely poll the SRM, asking for the status of the tape recalls
  - When all files have been recalled to disk, submit the analysis job
  - The analysis job contacts the SRM again in order to get the protocol needed to access the files, as well as their physical location
  - Finally, once the files have been processed, the analysis job contacts the SRM in order to release the files
  - They can then, e.g., be deleted from disk through a garbage-collector mechanism if the disk space is reclaimed for other activities
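Purely as illustration, the sketch below walks through the same recall-poll-release cycle against a toy in-memory stand-in for an SRM endpoint. The Python wrapper class, its method names and the example SURLs are invented for the example; only the underlying SRM v2.2 operations they mirror (srmLs, srmBringOnline, srmStatusOfBringOnlineRequest, srmReleaseFiles) and the status codes come from the SRM specification.

```python
import time

class ToySrm:
    """Toy in-memory stand-in for an SRM v2.2 endpoint (illustration only).

    Method names mirror srmLs, srmBringOnline,
    srmStatusOfBringOnlineRequest and srmReleaseFiles."""

    def __init__(self, files_on_tape):
        self.tape = set(files_on_tape)   # files living on tape
        self.disk = set()                # files currently staged on disk
        self.requests = {}               # request token -> files still pending

    def ls(self, surls):
        # srmLs: remote file listing
        return [s for s in surls if s in self.tape or s in self.disk]

    def bring_online(self, surls):
        # srmBringOnline: ask for a tape-to-disk recall, get a request token
        token = "req-%d" % len(self.requests)
        self.requests[token] = [s for s in surls if s not in self.disk]
        return token

    def status(self, token):
        # srmStatusOfBringOnlineRequest: the toy tape system stages one
        # file per poll, to mimic recalls completing over time
        pending = self.requests[token]
        if pending:
            self.disk.add(pending.pop())
        return "SRM_SUCCESS" if not pending else "SRM_REQUEST_INPROGRESS"

    def release(self, surls):
        # srmReleaseFiles: the pin is dropped, so a garbage collector
        # may now reclaim the disk copies
        self.disk.difference_update(surls)


def recall_analyze_release(srm, surls, poll_interval=0.1):
    missing = set(surls) - set(srm.ls(surls))   # 1. do the files exist?
    if missing:
        raise FileNotFoundError(sorted(missing))
    token = srm.bring_online(surls)             # 2. order tape recalls
    while srm.status(token) != "SRM_SUCCESS":   # 3. poll until staged
        time.sleep(poll_interval)
    # 4. here the analysis job would run, accessing the files through
    #    the negotiated protocol (e.g. POSIX on GPFS, RFIO, DCAP)
    srm.release(surls)                          # 5. release the files


if __name__ == "__main__":
    dataset = ["srm://example.infn.it/cms/run1.root",
               "srm://example.infn.it/cms/run2.root"]
    recall_analyze_release(ToySrm(dataset), dataset)
    print("files recalled, analyzed and released")
```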
11. Overview
- INFN-CNAF
- Mass Storage Challenge
- SRM layer
- Solution I
- StoRM with GPFS
- Results
- Solution II
- StoRM with GPFS and TSM
- Results
- Conclusions
12. StoRM
- StoRM, developed at INFN-CNAF, is an implementation of the SRM interface designed to leverage the advantages of cluster file systems (like GPFS) and standard POSIX file systems in a Grid environment.
13. StoRM with GPFS
- GPFS by IBM has been chosen at CNAF as the solution for disk-based storage, having demonstrated outstanding I/O performance and stability
- A large GPFS configuration with StoRM as the SRM layer has been set up at CNAF:
  - 2 PB of net disk space (to be doubled in 2010), partitioned into several GPFS clusters
  - 150 disk servers (NSD + GridFTP) connected to the Storage Area Network (SAN)
14. Some results
15. Overview
- INFN-CNAF
- Mass Storage Challenge
- SRM layer
- Solution I
- StoRM with GPFS
- Results
- Solution II
- StoRM with GPFS and TSM
- Results
- Conclusions
16. StoRM with GPFS and TSM
- We combined specific GPFS features (available from version 3.2 onwards) and TSM (also by IBM) with StoRM to provide a transparent Grid-enabled Hierarchical Storage Manager (HSM) solution
- An interface between GPFS and TSM has been implemented to allow for intelligent tape recalls: by means of this interface, files are recalled in the optimal order, i.e. in the order in which they appear on tape (see the sketch below)
- StoRM has been extended to include the SRM methods required to manage tapes
- A pre-production test-bed was built to match the scale of our largest experiment:
  - First stress tests on GPFS-TSM only
  - Verification of the GPFS-TSM integration and scalability
  - User tests on the complete system
17. HW set-up
- 8 T10KB tape drives
  - 1 TB per tape
  - 1 Gbps per drive
18. GPFS with TSM validation tests
- Concurrent read/write accesses to the storage, both for tape migrations and recalls and from the batch nodes where analysis jobs run
- StoRM not used in these tests
- 3 HSM nodes serving 8 T10KB drives
  - at most 6 drives used for recalls
  - at most 2 drives used for migrations
- Of the order of 1 GB/s of aggregated traffic (550 + 100 + 400 = 1050 MB/s):
  - 550 MB/s from tape to disk
  - 100 MB/s from disk to tape
  - 400 MB/s from disk to the computing nodes (not shown in this graph)
19. Results with StoRM
- 24 TB of data moved from tape to disk
  - Recalls corresponding to five days of typical usage by a large LHC experiment (namely CMS), compacted in one shot and completed in 19 hours
  - Files were spread over 100 tapes
  - Average throughput: 400 MB/s
  - 0 failures
- Up to 6 drives used for recalls
- Simultaneously, up to 3 drives used for migrations of new data files
20. Overview
- INFN-CNAF
- Mass Storage Challenge
- SRM layer
- Solution I
- StoRM with GPFS
- Results
- Solution II
- StoRM with GPFS and TSM
- Results
- Conclusions
21. Conclusions
- We implemented a full HSM system based on GPFS and TSM, able to satisfy the requirements of a WLCG Tier-1 centre like INFN-CNAF
- StoRM, the Storage Resource Manager software layer developed at CNAF, has been extended to provide tape support
- We tested the overall system (StoRM/GPFS/TSM) with excellent performance results
  - We were able to fully saturate the bandwidth of the 8 tape drives in use (1 Gbps each, i.e. 8 Gbps = 1 GB/s in total), reaching about 800 MB/s of aggregated throughput to/from the tape drives
- We migrated our largest HSM user (the LHC experiment CMS) to StoRM, and more will follow in the near future