Title: HPSS at ECMWF
1. HPSS at ECMWF
- F. Dequenne
- June 2005
- francis.dequenne_at_ecmwf.int
2. What is ECMWF?
We predict the weather, using big machines.
- European Centre for Medium-Range Weather Forecasts.
- European meteorological organisation supported by 25 European States.
- Develops medium-range and seasonal forecasting through numerical methods.
- Provides its Member States, on a daily basis, with a 10-day worldwide weather forecast and other weather-related products.
- Provides its Member States with organised on-line access to 27 years of weather forecasts and observations.
- Provides the world meteorological and research community with retrieval facilities for specific datasets.
- Provides extensive facilities for weather-modelling research.
3. Member States
Austria, Belgium, Denmark, Finland, France, Germany, Greece, Ireland, Italy, Luxembourg, The Netherlands, Norway, Portugal, Spain, Sweden, Switzerland, Turkey, United Kingdom.
Co-operation agreements or working arrangements with: Czech Republic, Romania, Croatia, Serbia and Montenegro, Iceland, Slovenia, Hungary, ACMAD, EUMETSAT, WMO, JRC, CTBTO.
4. ECMWF Computer Environment
- 4224 x 1.9 GHz P690 CPUs
- Around 30 Tflops peak
- 50 TB of disk cache
5. Data Handling Applications
- MARS
- Meteorological Archive and Retrieval System.
- Bulk of the data, few files
- Interfaced through an ECMWF application.
- Depends heavily on tape get-partials.
- ECFS
- HSM-like service for ad-hoc files.
- Millions of files, many very small.
6. Volume of data stored
NB These values do not include the second backup
copy of our most critical data.
7. MARS
Diagram: clients send MARS requests such as "Get pressures and temperatures over the Atlantic between July 17th, 1989 and December 5th, 2000". MARS resolves each request against its metadata, then retrieves the large files (forecast results, observations, ...) from HPSS through the HPSS API, staging them through a disk cache. One user request may map to multiple file parts and require access to hundreds of tapes.
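A request like the one quoted above is expressed in MARS's own request language. The sketch below is illustrative only: the exact keyword spellings, parameter names, area notation and target file are assumptions, not taken from this talk.

```
retrieve,
    param  = temperature/pressure,
    date   = 1989-07-17/to/2000-12-05,
    area   = <atlantic bounding box>,
    target = "atlantic.grib"
```

It is requests of this shape that the in-house indexing application virtualises into the hundreds of per-tape file-part accesses mentioned in the diagram.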
8. DHS Services: MARS
- Over 1.6 petabytes of data.
- 2.4 million files.
- 2.3 TB stored every day.
- Data indexed by an in-house application, providing a powerful virtualisation engine.
- Requires many tape drives able to load and position tapes quickly.
- Medium- to long-term archive.
- Comprises:
  - MARS Operational (40% of the data, WORM, backed up)
  - MARS Research (60%, WORS, no backup)
9. ECFS User View
Diagram: clients, using local commands (ecp, els, ecd, ...), see a logical view of a few remote virtual file systems, accessed through rcp-like commands. Example commands from the diagram:
- ecd /syf/dir1
- ecp local ec:remote
- els /rdx/dir2
- ecp ec:/rdx/dir2/remote local2
(The illustrated remote trees are ec:/syf/dir1 and ec:/rdx/dir2/remote.)
10. ECFS implementation
Diagram: the ECFS Command Line Interface, on ECFS client machines, sends commands to the ECFS Server, which holds the metadata and the junction mapping and drives the HPSS core through HPSS API calls. File data moves between the ECFS Data Mover and the HPSS Movers over an internal protocol. The Class of Service (COS) is selected based on file size and the number of tape copies; files are kept longer in COSes reserved for small files.
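The size-based COS selection described above can be sketched as follows. This is a hypothetical illustration: the thresholds and COS names are invented for the sketch, not ECMWF's actual configuration.

```shell
# Hypothetical sketch of ECFS-style COS selection by file size and
# number of tape copies. Thresholds and COS names are invented.
select_cos() {
    size_bytes=$1
    tape_copies=$2
    if [ "$size_bytes" -lt 524288 ]; then          # < 512 KB: small-file COS
        echo "cos-small-${tape_copies}copy"
    elif [ "$size_bytes" -lt 1073741824 ]; then    # < 1 GB: medium COS
        echo "cos-medium-${tape_copies}copy"
    else                                           # >= 1 GB: large COS
        echo "cos-large-${tape_copies}copy"
    fi
}

select_cos 4096 2         # -> cos-small-2copy
select_cos 2147483648 1   # -> cos-large-1copy
```

Routing small files to COSes with a longer disk residency, as the slide describes, lets many of them be bundled before they reach tape.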
11. DHS Services: ECFS
- ½ PB of data (plus a backup copy).
- 17 million files.
- 2 TB of data added daily (with peaks of up to 50 GB/hour).
- Volatile data (600 GB are deleted or overwritten each day).
- A lot of small files: 6 million files are smaller than 512 KB.
12. HPSS configuration (2005)
- 2 Gb Brocade switches
- IBM P660-6H1 mover: 8 GB memory, 6 CPUs
- IBM P660-6M1 core server: 8 GB memory, 6 CPUs
- IBM P570 core server: 12 GB memory, 4 CPUs
- IBM P650-6M2 MARS application: 12 GB memory, 6 CPUs
13. HPSS Data Growth
Daily write workload:
- MARS data: 2.7 TB/day
- ECFS data: 2.0 TB/day (1.5 TB net growth)
- Total workload: 4.7 TB/day
Daily read workload:
- MARS: 0.1 to 0.4 TB/day (1 to 1.5 TB/day with cache)
- ECFS: 0.2 to 0.8 TB/day
14. What happened since HUF 2004
- ECFS migration from TSM is completed.
- Various upgrades (CPU, memory, disks, tape drives).
- ECFS now uses the API to talk to HPSS, instead of ftp.
- Various consolidations.
- Disaster recovery testing.
- Continued development of monitoring tools, this time mostly for ECFS usage patterns.
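The kind of usage-pattern monitoring mentioned above can be done with ordinary shell tools over transfer logs. The log format below is invented for this sketch; it is not ECFS's actual log format.

```shell
# Fabricated sample of a transfer log: "<operation> <size_in_bytes>".
cat > /tmp/ecfs_ops.log <<'EOF'
put 1024
put 904857600
get 512
put 2048
get 734003200
EOF

# Summarise operation counts and bytes moved per operation type.
summary=$(awk '{ n[$1]++; bytes[$1] += $2 }
               END { for (op in n) printf "%s: %d ops, %d bytes\n", op, n[op], bytes[op] }' \
              /tmp/ecfs_ops.log)
echo "$summary"
```

A report like this is enough to spot, for example, the small-file-heavy write pattern that slides 11 and 16 are concerned with.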
15. ECFS Disk retrieval
16. Issues / Wishlist
- Core server is reaching capacity.
- How to cope with the unavailability of a set of tapes or of a robot? How to provide service with the remaining assets?
- SAN-3P for tape drives:
  - in the context of MARS (data stored in tape-only hierarchies);
  - to reduce the need for dedicated mover platforms;
  - to reduce network traffic.
- Performance of writing small files to tapes.
- Need to stop the PVL/PVR to add devices.
- A CLI for all administrative functions.
17. Future developments
- Complete the consolidation of our disaster recovery plans.
- Respond to the increasing load.
- Install AIX 5.2.
- Install HPSS 6.2.
- Get rid of DCE.
- Test Copan MAIDs.
- Evaluate and install new drives and new robotics.
- SAN-3P:
  - for ECFS;
  - for MARS, if tape drive support is provided.
YABEDABEDOOO !