AHM04: Sep 2004 Nottingham - PowerPoint PPT Presentation

Description:

e-Science Data Management Group, CCLRC Daresbury Laboratory, UK. AHM04: Sep 2004, Nottingham ... provides virtual file system each user has a home directory ...


Transcript and Presenter's Notes



1
eMinerals: Environment from the Molecular Level
Managing simulation data
  • Lisa Blanshard
  • e-Science Data Management Group
  • CCLRC Daresbury Laboratory, UK

2
University of Reading
Royal Institution
3
(No Transcript)
4
Discover
Search for a crystal structure
Retrieve
Download crystal structure data
Transform
Convert crystal data into a format suitable for the application
Transfer
Transfer crystal structure to compute node
Analysis
Run a calculation to perform some analysis on the crystal
Transform Results
Convert results into format suitable for storage
Results Storage
Transfer results to permanent archive
Catalogue
Catalogue the results
Publish
Make results available online
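The nine stages above form a fixed sequence. A minimal sketch of that ordering, assuming nothing beyond the stage names on the slide (the `Stage` enum and the helper functions are illustrative names, not part of any eMinerals software):

```python
from enum import IntEnum


class Stage(IntEnum):
    """Workflow stages, numbered in the order listed on the slide."""
    DISCOVER = 1
    RETRIEVE = 2
    TRANSFORM = 3
    TRANSFER = 4
    ANALYSIS = 5
    TRANSFORM_RESULTS = 6
    RESULTS_STORAGE = 7
    CATALOGUE = 8
    PUBLISH = 9


def next_stage(stage):
    """Return the stage that follows `stage`, or None after PUBLISH."""
    if stage is Stage.PUBLISH:
        return None
    return Stage(stage + 1)


def remaining(stage):
    """List the stages still to run after `stage`, in order."""
    return [s for s in Stage if s > stage]
```

For example, `remaining(Stage.ANALYSIS)` lists the four stages from Transform Results to Publish.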
5
CCLRC Data Portal
  • search many data resources simultaneously
  • uses CCLRC standard for scientific metadata (XML on the wire)
  • download scientific datasets directly to own machine for preparation
  • transfer datasets to compute node
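Searching many data resources simultaneously can be pictured as a fan-out query. The sketch below is only an illustration of that idea: the `search_all` function, the resource callables and the in-memory catalogues are all made up for this example and do not reflect the Data Portal's real interface or its XML metadata format.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(resources, query):
    """Send `query` to every resource in parallel and merge the hits.

    `resources` maps a resource name to a callable that takes the query
    and returns a list of hits. Both are hypothetical stand-ins.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(resources))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in resources.items()}
        # Keep track of which resource each hit came from.
        return [(name, hit) for name, fut in futures.items() for hit in fut.result()]


# Two made-up in-memory "resources", for illustration only.
catalogue_a = ["quartz structure", "calcite structure"]
catalogue_b = ["quartz simulation output"]

resources = {
    "a": lambda q: [x for x in catalogue_a if q in x],
    "b": lambda q: [x for x in catalogue_b if q in x],
}
hits = search_all(resources, "quartz")
```

Each hit is returned together with the name of the resource that produced it, so a caller can tell where to retrieve the dataset from.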
6
  • data not necessarily in correct format for application
  • cut and paste
  • conversion code
  • common format, e.g. CML
  • some codes on project now produce CML
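As an illustration of "conversion code" targeting a common format, here is a minimal sketch that turns plain "element x y z" text lines into a simplified CML-style document. The input layout and the function name are assumptions made for this example; the output uses the CML `molecule`, `atomArray` and `atom` elements but is not validated against the CML schema.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"


def xyz_lines_to_cml(lines, molecule_id="m1"):
    """Convert 'Element x y z' text lines into a simplified CML string."""
    molecule = ET.Element("molecule", {"xmlns": CML_NS, "id": molecule_id})
    atom_array = ET.SubElement(molecule, "atomArray")
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        element, x, y, z = parts
        ET.SubElement(atom_array, "atom", {
            "id": f"a{len(atom_array) + 1}",  # number only the atoms kept
            "elementType": element,
            "x3": x, "y3": y, "z3": z,
        })
    return ET.tostring(molecule, encoding="unicode")
```

A real converter would also need to carry cell parameters and units, which this sketch leaves out.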
7
Storage Resource Broker
  • each institution wants to manage its own files
  • however, shared access is desirable within the project
  • deployed SRB vaults at several locations
  • coordinating SRB and database at CCLRC
  • provides a virtual file system in which each user has a home directory
  • many different interfaces and APIs
  • provides a personal workspace, independent of the computational grid, where users can upload their input files
  • SRB vaults are professionally managed and backed up, preventing loss of data
  • SRB provides a sophisticated access control system
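The idea of a virtual file system over several vaults, with per-user home directories and access control, can be sketched as a catalogue that maps logical paths to physical locations. This is a toy model only: the `Catalogue` class, the `/home/<user>` layout and the vault names are assumptions for illustration and are not the SRB API.

```python
from dataclasses import dataclass, field


@dataclass
class Catalogue:
    """Toy central catalogue: logical path -> vault, physical path, owner, readers."""
    entries: dict = field(default_factory=dict)

    def home(self, user):
        # Assumed layout: one home collection per user.
        return f"/home/{user}"

    def register(self, user, name, vault, physical_path):
        """Record a file stored in `vault` under the user's home collection."""
        logical = f"{self.home(user)}/{name}"
        self.entries[logical] = {
            "vault": vault, "physical": physical_path,
            "owner": user, "readers": {user},
        }
        return logical

    def grant_read(self, owner, logical, other_user):
        """Let the owner share a file with another project member."""
        entry = self.entries[logical]
        if entry["owner"] != owner:
            raise PermissionError("only the owner may grant access")
        entry["readers"].add(other_user)

    def resolve(self, user, logical):
        """Return (vault, physical path) if `user` may read the file."""
        entry = self.entries[logical]
        if user not in entry["readers"]:
            raise PermissionError(f"{user} may not read {logical}")
        return entry["vault"], entry["physical"]
```

Users only ever see the logical path; which institution's vault holds the bytes is a detail held by the coordinating catalogue.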
8
  • designated job submission nodes
  • allows users to create simple scripts to download input from SRB, run the job on the minigrid and transfer results back to SRB
  • uses Condor-G as client to Globus running on compute clusters
  • uses SRB S-commands to download and upload files
  • so results are automatically in the permanent archive
  • however, results are stored as generated
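The download, run, upload pattern of such a script can be sketched as follows. `Sget` and `Sput` are SRB S-commands, but the argument order shown here is an assumption, session setup and teardown (`Sinit`, `Sexit`) are omitted, and the middle step is a hypothetical placeholder for the Condor-G submission and wait, which is not modelled.

```python
import subprocess


def build_job_commands(srb_input, local_input, run_command, local_output, srb_collection):
    """Return the command sequence: fetch input, run the job, store the results."""
    return [
        ["Sget", srb_input, local_input],        # download input from SRB (assumed argument order)
        list(run_command),                       # placeholder for the job itself
        ["Sput", local_output, srb_collection],  # upload results back to SRB (assumed argument order)
    ]


def run_job(commands, runner=subprocess.run):
    """Run each command in order; with check=True the first failure raises."""
    for command in commands:
        runner(command, check=True)
```

Passing a different `runner` makes it possible to dry-run the sequence without SRB or a grid installed.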
9
Metadata Editor
  • forms-based web application to manually create annotation for groups of files
  • files are grouped into datasets and datasets into studies
  • each study holds details of investigators, description of study, dates, key words or topics
  • datasets hold the location of a directory of files in SRB or on another file system
  • once entered, metadata and files are available via Data Portal
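The grouping described above (files into datasets, datasets into studies) can be sketched as a small data model. The class and field names below are assumptions chosen to mirror the slide's wording; they are not the Metadata Editor's actual schema.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Dataset:
    name: str
    location: str        # directory of files, in SRB or on another file system
    annotation: str = ""


@dataclass
class Study:
    title: str
    investigators: list
    description: str
    start_date: date
    keywords: list = field(default_factory=list)
    datasets: list = field(default_factory=list)

    def add_dataset(self, dataset):
        """Attach a dataset (a directory of files) to this study."""
        self.datasets.append(dataset)

    def locations(self):
        """Directories a portal would need to reach to serve this study's files."""
        return [d.location for d in self.datasets]
```

A study with two datasets would report two directory locations, one per dataset, regardless of how many files each directory holds.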
10
Discover
  • Data Portal searches across data resources simultaneously
  • but more resources have to be linked up
Retrieve
  • download data from Data Portal or Storage Resource Broker
Transform
  • probably have to change format of file manually
  • some codes use CML for input/output to address this
Transfer
  • transfer file to SRB manually via SRB tools
Analysis
  • script downloads input from SRB, runs job on grid using Condor-G
Transform Results
  • not yet tackled
  • some output in CML though
Results Storage
  • script transfers results to SRB
  • results stored as they are, either in text files or CML
Catalogue
  • Metadata Editor: catalogue the results via web form
  • need to generate metadata automatically
Publish
  • results then available online via Data Portal
11
Particular successes
  • deployment of distributed data resources via SRB
  • set up project RDBMS for metadata/catalogue info, with interfaces to add/edit metadata and to search
  • used CCLRC Multi-disciplinary Scientific Metadata Format for transport of metadata
  • use of CML to format input/output to some codes
  • integration of data and computation: dedicated nodes to submit jobs via Condor-G, with scripts to download input from and upload results to SRB

12
Issues to overcome
  • many codes still read and write proprietary text formats
  • auto-generate metadata
  • results are stored as generated; need to consider more sophisticated data storage
  • further use of CML
  • integrated portals for compute and data to manage whole workflow
  • integrate more data resources for discovery

13
Further Information
Environment from the Molecular Level
http://www.e-science.clrc.ac.uk/web/projects/eminerals
http://eminerals.org/
UK CCLRC e-Science Centre
http://www.e-science.clrc.ac.uk