Transcript and Presenter's Notes

Title: Data Handling System at ECMWF


1
Data Handling System at ECMWF
  • Francis Dequenne, Stephen Richards
  • HUF 2007
  • francis.dequenne@ecmwf.int
  • Stephen.Richards@ecmwf.int

2
ECMWF Computer Environment
  • Two clusters of 155 IBM p5-575 servers: 4.5 TFlops sustained, 9 TB of memory, 100 TB of disks.
  • To be replaced by two considerably more powerful clusters.
  • IBM 3584 tape library.
3
Data Handling Applications
  • MARS
  • Meteorological Archive and Retrieval System.
  • Bulk of the data, few files
  • Interfaced through an ECMWF application.
  • Depends heavily on tape get-partials.
  • ECFS
  • HSM-like service for ad-hoc files.
  • Millions of files, many very small.
  • Both services use HPSS as their underlying archival system.

4
Volume of data stored
  • Currently 6 PB of data (plus 2 PB more for the second copy).
  • Projected to reach 100 PB around 2014.
  • NB: These values do not include the second backup copy of our most critical data.
5
HPSS data growth
  • Typical daily write workload: MARS data 5.0 TB/day, ECFS data 3.2 TB/day, 2nd copy 3.4 TB/day; total HPSS 11.6 TB/day.
  • Typical daily read workload: MARS from HPSS 1.0 TB/day (MARS total 3.5 TB/day), ECFS 1.2 TB/day; total HPSS 2.2 TB/day.
6
Number of files stored
(Chart: growth in the number of files stored, showing ECFS files, ECFS files smaller than ½ MB, and MARS files.)
7
DHS Services: MARS
  • Over 4.5 petabytes of data (plus 1.6 PB of backups).
  • 4.5 million files.
  • Between 4 and 5.5 TB stored every day.
  • Data indexed by an in-house application, providing a powerful virtualisation engine.
  • Requires many tape drives able to load and position tapes quickly.
  • Medium to long term archive.
  • Comprises:
  • MARS Operational (40% of the data, WORM, backed up).
  • MARS Research (60%, WORS, no backup).
  • HPSS provides:
  • Good support for partial retrieves from tape (see the sketch after this list).
  • Good metadata query tools.
  • Good scaling.
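MARS requests typically need only small sections of much larger archive files, which is why partial retrieves from tape matter so much. Below is a minimal sketch of the idea in Python; the file name, offset and length are made up for illustration, and the real system reads such ranges directly from tape rather than from a local file.

  def read_partial(path, offset, length):
      """Read only the requested byte range, instead of recalling the whole file."""
      with open(path, "rb") as f:
          f.seek(offset)
          return f.read(length)

  # Hypothetical example: one field buried inside a multi-gigabyte archive file.
  # data = read_partial("/cache/od.oper.fc.19890717.grib", offset=1234567, length=250000)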

8
MARS
(Diagram: MARS architecture. Clients send requests such as "Get pressures and temperatures over the Atlantic on July 17th, 1989" to MARS, which keeps its own metadata and a disk cache. The large files holding forecast results, observations, etc. are stored in HPSS through the HPSS API; a single user request may need multiple file parts and access to hundreds of tapes.)
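To make the "virtualisation engine" idea concrete, here is a rough, hypothetical sketch of how a meteorological request can be resolved into the file parts that have to be fetched. The keys, index layout and tape names are invented for illustration and do not reflect the real MARS request language or metadata schema.

  # All names and values below are illustrative, not the real MARS schema.
  request = {
      "date": "1989-07-17",
      "params": ["temperature", "pressure"],
      "area": "North Atlantic",
  }

  # Hypothetical index: each (date, param) maps to parts of archived files,
  # given as (tape volume, byte offset, length).
  index = {
      ("1989-07-17", "temperature"): [("TAPE0123", 10485760, 524288)],
      ("1989-07-17", "pressure"):    [("TAPE4567", 2097152, 524288)],
  }

  def resolve(req, idx):
      """Collect every (tape, offset, length) part needed to satisfy the request."""
      parts = []
      for param in req["params"]:
          parts.extend(idx.get((req["date"], param), []))
      return parts

  print(resolve(request, index))   # one user request may touch many tapes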
9
DHS Services: ECFS
  • 1.3 petabytes of data (plus a 350 TB backup copy).
  • 35 million files.
  • 2.5 TB of data added daily (with peaks of more than 150 GB/hour).
  • Volatile data.
  • On average 35,000 retrieves/day; peaks > 70,000 have been observed.
  • A lot of small files:
  • 10.5 million files < 512 KB.
  • These represent 30-50% of the ECFS retrieval activity.
  • HPSS provides:
  • The ability to support tens of millions of files.
  • Customisation allowing some types of files to stay cached for long periods of time.
  • Good levels of performance.

10
ECFS User View
  • Users have a logical view of a few remote virtual file systems (ec/syf, ec/rdx, ...), accessed through rcp-like commands.
  • Clients use local commands: ecp, els, ecd, ...
(Diagram: example sessions such as "ecd /syf/dir1" then "ecp local ecremote" against ec/syf/dir1, and "els /rdx/dir2" then "ecp ec/rdx/dir2/remote local2" against ec/rdx/dir2/remote.)
11
ECFS implementation
  • Behind the scenes, one virtual file system is
    mapped onto several HPSS filesets.
  • The back-end archiving-system interfaces are
    hidden from the clients.
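A toy sketch of that mapping: the ECFS server picks an HPSS fileset for a logical path, for example from a hash of its top-level directory, and the client never needs to know which fileset was chosen. The fileset names and the hashing rule here are assumptions made for illustration; the real ECFS placement logic is not shown.

  import hashlib

  # Hypothetical fileset names; the real mapping is internal to the ECFS server.
  FILESETS = ["ecfs_fileset_a", "ecfs_fileset_b", "ecfs_fileset_c"]

  def fileset_for(logical_path):
      """Deterministically map a logical ECFS path to one of the HPSS filesets."""
      top = logical_path.strip("/").split("/")[0]        # e.g. "syf" or "rdx"
      digest = hashlib.md5(top.encode()).hexdigest()     # stable hash of the top dir
      return FILESETS[int(digest, 16) % len(FILESETS)]

  print(fileset_for("/syf/dir1/somefile"))   # /syf/... always lands in the same fileset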

(Diagram: the ECFS Client asks the ECFS Server "Where should this file be?"; the ECFS Mover then gets/puts the file to the appropriate HPSS fileset (DHS access).)
12
(Diagram: hardware configuration. Clients connect over a SAN to the HPSS Core server (p570, 4 CPUs), two HPSS Movers (6M2, 8 CPUs and 6 CPUs) and the ECFS Server (6M2, 6 CPUs, also used as a mover). Disk storage: 3 TB high performance, 80 TB high capacity, 30 TB SATA.)
13
What is new since the last HUF?
  • Version 6.2 of the code was installed.
  • Bye bye, DCE! (eeeeehhhhhhhaaaaaa)
  • A few issues were encountered initially; it is now pretty stable.
  • A new IBM TS3500 library has been installed, with LTO/3 drives:
  • "CONAN the Librarian".
  • Replaces our old ADIC robot.
  • Used for writing secondary tape copies.
  • A few problems discovered with the PVR, especially check-in processing.
  • Discovered some issues when moving shelved tapes from one robot to another.
  • Usage of 3592-XL tapes on TS1120 drives (aka 3592-E05):
  • 700 GB/tape.
  • Some problems with the latest versions of microcode resulted in a few files becoming unreadable.

14
Issues to be addressed real soon now
  • Poor performance when writing small files to tape. We are eager to see the new mechanisms that do not require tape marks between each file, expected in version 7.1.
  • Need to stop the PVL/PVR/Movers to add or modify devices/drives.
  • Some administrative functions still require the GUI.
  • Adding 40 disks or modifying the configuration of 30 drives through the GUI is:
  • Error-prone,
  • NOT fun.

15
Issue: DB2 maintenance tools
  • Suggested practice is that DB2 runstats should be run on a regular basis.
  • This updates table access statistics, which are used to optimise table access.
  • It cannot be done on a busy system: errors are reported and the statistics are incorrect.
  • Tables should also be reorganised from time to time.
  • Reorgs need to be run offline.
  • At ECMWF this requires an HPSS downtime of 5 to 10 hours.
  • We need to find an efficient, non-disruptive, reliable way to do these operations.
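For illustration, a minimal sketch of the kind of maintenance pass meant here, generated as DB2 command-line processor statements from Python. The database name and table list are placeholders, not the actual HPSS subsystem schema, and this does not solve the offline-reorg problem described above.

  # Placeholder names: substitute the real HPSS subsystem database and tables.
  DATABASE = "HSUBSYS1"
  TABLES = ["HPSS.BITFILE", "HPSS.STORAGESEGTAPE"]

  def maintenance_script(database, tables):
      """Emit a DB2 CLP script that refreshes statistics and reorganises each table."""
      lines = ["CONNECT TO %s;" % database]
      for table in tables:
          lines.append("RUNSTATS ON TABLE %s WITH DISTRIBUTION AND INDEXES ALL;" % table)
          lines.append("REORG TABLE %s;" % table)
      lines.append("TERMINATE;")
      return "\n".join(lines)

  print(maintenance_script(DATABASE, TABLES))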

16
More issues
  • In a tape-to-tape hierarchy, there is no automatic attempt to access the secondary tape copy if the first copy is down.
  • Migration optimisation in disk-to-tape hierarchies:
  • Especially in a family-rich environment.
  • Migration does not seem to handle all files from one family at the same time.
  • This seems to result in tapes being dismounted/remounted unnecessarily.
  • Some tapes are mounted 40 times a day (over 100 times a day in thrashing conditions) for migration purposes.

17
HPSS needs a checksum facility
  • The data on your disk subsystem is probably OK...
  • Errors undetected by hardware are much more common than expected.
  • Some studies have shown that one undetected error appears per 100 TB moved.
  • A technology migration of 10 PB implies that 100 errors could be generated!
  • We need HPSS to support optional bitfile check-summing (see the sketch after this list):
  • The checksum is generated either by the user application or by the clients when the data is first introduced into the system.
  • The checksum is validated every time the bitfile is moved.
  • Checksums can be used to validate disk contents.
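A minimal sketch of the requested workflow, with SHA-256 as an example algorithm; the algorithm choice, the file names and where the reference checksum would be stored are assumptions, not something HPSS provides today.

  import hashlib

  def file_checksum(path, chunk_size=4 * 1024 * 1024):
      """Stream the file in chunks and return its SHA-256 digest."""
      digest = hashlib.sha256()
      with open(path, "rb") as f:
          for chunk in iter(lambda: f.read(chunk_size), b""):
              digest.update(chunk)
      return digest.hexdigest()

  # At ingest: compute the checksum and keep it with the bitfile's metadata.
  # reference = file_checksum("/scratch/result.grib")
  #
  # After every move (migration, staging, repack): recompute and compare.
  # assert file_checksum("/hpss_cache/result.grib") == reference, "silent corruption"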

18
Other things that we would like to see
  • Ability to balance the allocation of scratch tapes across multiple tape robots.
  • This could become a significant issue if we have to start using smaller robots.
  • Ability for movers to claim ownership of tape drives as needed.
  • E.g. if a mover shares its platform with an application which writes data straight to tape.
  • An undelete facility.
  • New methods of indexing the bitfiles (Content-Addressed Storage?).
  • Think Google, Longhorn, ...

19
Future developments
  • Test and deploy Copan MAIDs??
  • SAN-3P deployment.
  • AIX 5.3:
  • Keep being supported.
  • Support for LUNs larger than 1 TB.
  • 64-bit applications.
  • STK silo replacement:
  • The silos will not be supported after the end of 2010.
  • What should we replace them with?

20
show_hpss_tapevols
  • List the HPSS status of volumes for a given
    library.
  • Can be used for example to find out which tapes
    are shelved.

show_hpss_tapevols -m mode -p pvr_id -h

mode can be one of:
  mount_pending     - volume is waiting to be mounted
  mounted           - volume is mounted
  dismount_pending  - volume is waiting for a dismount to complete
  dismounted        - volume is dismounted
  shelf             - volume is shelved
  shelf_pending     - volume is waiting to be checked out
  checkin           - volume has been checked in
  checkin_pending   - volume is waiting to be checked in
  displayed         - volume is in the checkin or mount display window
  move_pending      - volume is waiting to be checked into a new library
  eject_pending     - volume is waiting to be ejected

pvr_id is the HPSS server name for a PVR.

Example of output: show_hpss_tapevols -p CONAN

  Cartridge  Status      PVR
  X00189     dismounted  CONAN
  X00190     dismounted  CONAN
  X00191     dismounted  CONAN
  X00192     shelf       CONAN
  X00193     shelf       CONAN
  X00194     shelf       CONAN
  X00195     shelf       CONAN
  X00196     dismounted  CONAN
  X00197     dismounted  CONAN
  X00198     dismounted  CONAN
  X00199     dismounted  CONAN
21
tape_recover
  • A utility able to read bitfiles from tapes without using DB2 or HPSS.
  • Could be used for:
  • Last-chance recovery scenarios (e.g. DB2 cannot be restored).
  • Recovery of files accidentally deleted from HPSS.
  • Migration to another archiving system.
  • Requires a list of files and their attached tape segments to be generated.
  • At ECMWF this is done every week, piggy-backing on the calls to list_file_subsys done to generate user file lists.
  • Requires tapes to be mounted manually, e.g. through acsls, mtlib, ...
  • Developed for non-striped tapes.

Hopefully not needed for many, many years!
WORK IN PROGRESS! A working prototype is currently being tested.
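A very rough sketch of the recovery idea, assuming the weekly list gives, for each bitfile, the tape volume and its file-section number on that tape, and assuming the tape has already been mounted manually (e.g. via acsls or mtlib) on a locally attached drive. The device name, the list format and the one-section-per-bitfile simplification are all assumptions; the real prototype has to understand HPSS's on-tape layout, which is not reproduced here.

  import subprocess

  DEVICE = "/dev/rmt0.1"   # hypothetical no-rewind tape device, loaded by hand

  def recover_section(file_section, out_path, device=DEVICE):
      """Position the tape at a given file section and copy its contents to disk."""
      subprocess.run(["mt", "-f", device, "rewind"], check=True)
      if file_section > 0:
          # Skip forward over the preceding tape file marks.
          subprocess.run(["mt", "-f", device, "fsf", str(file_section)], check=True)
      with open(device, "rb") as tape, open(out_path, "wb") as out:
          for block in iter(lambda: tape.read(256 * 1024), b""):
              out.write(block)

  # recover_section(3, "/tmp/recovered.bitfile")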