1
The Mass Storage Challenge for the Experiments at
LHC: the Solution Developed at INFN-CNAF
IEEE NSS-MIC, Orlando, 28 Oct 2009
A. Cavalli (1), S. Dal Pra (1), L. dell'Agnello (1), A. Forti (1),
D. Gregori (1), L. Li Gioi (1), B. Martelli (1), A. Prosperini (1),
P. P. Ricci (1), Elisabetta Ronchieri (1), V. Sapunenko (1),
V. Vagnoni (2), R. Zappi (1)
(1) INFN-CNAF, Bologna, Italy
(2) INFN, Sezione di Bologna, Bologna, Italy
2
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

3
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

4
INFN-CNAF computing center
  • CNAF is the main INFN computing resource center
  • for the Large Hadron Collider (LHC) experiments at
    CERN: ALICE, ATLAS, CMS, LHCb
  • ... and many others, e.g. BABAR (SLAC), CDF (FNAL),
    VIRGO (Italy), ARGO (Tibet), AMS (satellite),
    PAMELA (satellite), MAGIC (Canary Islands)
  • offering the following resources:
  • CY 2009: 23K HS06, 2.8 PB of disk space and 2.65 PB
    of online tape space (2 libraries)
  • CY 2010: 68K HS06, 8.5 PB of disk space and 6.6 PB
    of online tape space

5
High Energy Physics computing
  • Computing at the Large Hadron Collider is based on
    a multi-tier model
  • Tier0 (CERN), Tier1s and Tier2s form the WLCG/EGEE
    Grid
  • INFN-CNAF is a Tier1
  • Uniform access policies to computing resources
    (CPU, storage) have been defined in the WLCG/EGEE
    Grid framework

6
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

7
Mass Storage Challenge
  • Several petabytes of data (online and near-line)
    need to be accessible at any time by thousands of
    concurrent processes
  • The required aggregate data throughput, both on the
    Local Area Network (LAN) and the Wide Area Network
    (WAN), is of the order of several GB/s
  • Long-term transparent archiving of data is needed

8
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

9
Storage Resource Manager
  • In WLCG/EGEE an abstraction layer, called Storage
    Resource Manager (SRM), is defined to let client
    applications interact with the backend storage
    system (either disk or tape) via a common
    interface
  • SRM transparently supports several access
    protocols for the LAN (e.g. POSIX, RFIO, DCAP) and
    for the WAN (GridFTP)
  • SRM allows remote space management and the
    preparation of files for read/write access

10
A typical simplified workflow exploiting SRM
services
  • A remote user wants to analyze a dataset stored
    on tape (a sketch of this workflow is shown below)
  • Check whether the files exist, e.g. perform a
    remote file listing through the SRM service
  • Ask the SRM to move the data files from tape to
    disk if needed
  • Remotely poll the SRM for the status of the tape
    recalls
  • When all files have been recalled to disk, submit
    the analysis job
  • The analysis job contacts the SRM again in order
    to obtain the access protocol for the files as
    well as their physical location
  • Finally, once the files have been processed, the
    analysis job contacts the SRM in order to release
    the files
  • They can then, e.g., be deleted from disk through a
    garbage-collection mechanism if disk space is
    reclaimed for other activities
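
The workflow above can be summarized in a short Python sketch. It is purely illustrative: the srm_client module and its functions (ls, bring_online, get_status, get_turl, release), the endpoint, the dataset path and run_analysis() are hypothetical stand-ins for whatever SRM client tools a site actually provides, not the API of any specific product.

    # Illustrative sketch of the SRM recall-and-analyze workflow.
    # srm_client is a hypothetical client library; endpoint, paths and
    # run_analysis() are placeholders, not real names from this talk.
    import time
    import srm_client  # hypothetical SRM client bindings

    ENDPOINT = "srm://storm.example.it:8444/srm/managerv2"  # assumed StoRM-style endpoint
    DATASET = "/example_vo/store/dataset01"                 # hypothetical dataset directory

    # 1. Check that the files exist via a remote SRM listing
    surls = [ENDPOINT + "?SFN=" + path
             for path in srm_client.ls(ENDPOINT, DATASET)]

    # 2. Ask the SRM to recall the files from tape to disk
    request = srm_client.bring_online(surls)

    # 3. Poll remotely until all tape recalls are complete
    while srm_client.get_status(request) != "DONE":
        time.sleep(60)

    # 4. The analysis job asks the SRM for the access protocol and the
    #    physical location (TURL) of each file, then processes it
    for surl in surls:
        turl = srm_client.get_turl(surl, protocols=["file", "rfio", "dcap"])
        run_analysis(turl)  # placeholder for the actual analysis code

    # 5. Release the files so the garbage collector may reclaim disk space
    srm_client.release(request, surls)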

11
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

12
StoRM
  • StoRM is an implementation of the SRM interface,
    developed at INFN-CNAF, designed to leverage the
    advantages of cluster file systems (like GPFS) and
    standard POSIX file systems in a Grid environment

13
StoRM with GPFS
  • GPFS by IBM has been chosen at CNAF as the
    solution for disk-based storage, having
    demonstrated outstanding I/O performance and
    stability
  • A large GPFS configuration with StoRM as the SRM
    layer has been set up at CNAF
  • 2 PB of net disk space (to be doubled in 2010),
    partitioned into several GPFS clusters
  • 150 disk servers (NSD and GridFTP) connected to
    the Storage Area Network (SAN)

14
Some results
15
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

16
StoRM with GPFS and TSM
  • We combined specific GPFS features (available
    starting from version 3.2) and TSM (also by IBM)
    with StoRM to provide a transparent Grid-enabled
    Hierarchical Storage Manager (HSM) solution
  • An interface between GPFS and TSM has been
    implemented to allow for intelligent tape recalls:
    through this interface, files are recalled in the
    order in which they appear on tape (sketched in
    the example below)
  • StoRM has been extended to include the SRM
    methods required to manage the tapes
  • A pre-production test-bed was built to match the
    scale of our largest experiment
  • First stress tests on GPFS-TSM only
  • Verification of the GPFS-TSM integration and
    scalability
  • User tests on the complete system
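
The "intelligent tape recall" idea can be illustrated with a minimal Python sketch: pending recall requests are grouped by tape cartridge and sorted by their position on the tape, so that each mounted tape is read front-to-back. The data structure and field names (tape_id, position) are assumptions made for the example, not the actual GPFS-TSM interface.

    # Minimal sketch of ordered tape recalls; field names are assumptions,
    # not the real GPFS-TSM interface.
    from collections import defaultdict

    def order_recalls(requests):
        """Group recall requests by tape and sort each group by the
        file's position on that tape, minimizing seeks and remounts."""
        by_tape = defaultdict(list)
        for req in requests:
            by_tape[req["tape_id"]].append(req)
        ordered = []
        for reqs in by_tape.values():
            ordered.extend(sorted(reqs, key=lambda r: r["position"]))
        return ordered

    # Example: three pending recalls spread over two tapes
    pending = [
        {"file": "/gpfs/cms/a.root", "tape_id": "T00123", "position": 812},
        {"file": "/gpfs/cms/b.root", "tape_id": "T00045", "position": 17},
        {"file": "/gpfs/cms/c.root", "tape_id": "T00123", "position": 3},
    ]
    for req in order_recalls(pending):
        print(req["tape_id"], req["position"], req["file"])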

17
HW set-up
8 T10KB tape drives (1 TB per tape cartridge, 1 Gbps per drive)
18
GPFS with TSM validation tests
  • Concurrent read/write access to the storage from
    tape migrations and recalls, and from the batch
    nodes where analysis jobs run
  • StoRM not used in these tests
  • 3 HSM nodes serving 8 T10KB drives
  • Up to 6 drives used for recalls
  • Up to 2 drives used for migrations
  • On the order of 1 GB/s of aggregated traffic
    (550 + 100 + 400 MB/s):
  • 550 MB/s from tape to disk
  • 100 MB/s from disk to tape
  • 400 MB/s from disk to the computing nodes (not
    shown in the graph on this slide)

19
Results with StoRM
  • 24 TB of data moved from tape to disk
  • Recalls corresponding to five days of typical
    usage by a large LHC experiment (namely CMS),
    compacted into a single burst and completed in 19
    hours
  • Files were spread over 100 tapes
  • Average throughput: 400 MB/s
  • 0 failures
  • Up to 6 drives used for recalls
  • Simultaneously, up to 3 drives used for
    migrations of new data files
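
As a back-of-the-envelope consistency check (our own arithmetic on the quoted figures, assuming 1 TB = 10^12 bytes), 24 TB moved in 19 hours corresponds to roughly 350 MB/s sustained, consistent with the quoted 400 MB/s average:

    # Rough check of the quoted recall figures (assumes 1 TB = 1e12 bytes)
    data_bytes = 24e12           # 24 TB moved from tape to disk
    duration_s = 19 * 3600       # completed in 19 hours
    print(f"{data_bytes / duration_s / 1e6:.0f} MB/s")  # ~351 MB/s average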

20
Overview
  • INFN-CNAF
  • Mass Storage Challenge
  • SRM layer
  • Solution I
  • StoRM with GPFS
  • Results
  • Solution II
  • StoRM with GPFS and TSM
  • Results
  • Conclusions

21
Conclusions
  • We implemented a full HSM system based on GPFS
    and TSM able to satisfy the requirements of a
    WLCG Tier-1 centre such as INFN-CNAF
  • StoRM, the Storage Resource Manager software
    layer developed at CNAF, has been extended to
    include tape support
  • We tested the overall system (StoRM/GPFS/TSM)
    with excellent performance results
  • We were able to fully saturate the bandwidth of
    the 8 tape drives in use (1 Gbps each), reaching
    about 800 MB/s of aggregated throughput to/from
    the tape drives
  • We migrated our largest HSM user (the LHC
    experiment CMS) to StoRM, and more will follow in
    the near future