The Data Management Requirements at SNS

Learn more at: https://sdm.lbl.gov
1
The Data Management Requirements at SNS
  • Shelly Ren, Steve Miller
  • Scientific Computing Group, SNS-ORNL

December 11, 2006
2
SNS Neutron Scattering User Facility
3
Neutron Scattering Science Areas
  • Chemistry - microstructures
  • Complex Fluids - fluid properties
  • Crystalline Materials - molecular structure
  • Disordered Materials - structure characterization
  • Engineering - study material stress/strain
  • Magnetism and Superconductivity - material
    properties
  • Polymers - studying giant molecules
  • Structural Biology - proteins

4
SNS Instrument Commissioning Schedule
5
SNS Potential Data Volume
Production Data Rate
Just Instrument Data Here
Total Stored Data
6
Integrating Computation with Experimentation
[Diagram. Key: web browser, hardware, metadata, access and authorization control, data, portal.
Labels: control portal, data portal, analysis portal; software; HPC support; high performance computing; interactive feedback; Decision Support Intelligent Control; automation; acquisition, treatment, analysis, visualization (Vis); diagnostics; controls; instrument; sample environment; instrument simulation; materials simulation; electronic notebook; database; raw, intermediate, and scientific data; simulation data; publications; repository; proposal; documentation.]
7
Creating, Processing and Storing Data
  • Event Histogramming
  • Detector to Pixel mapping
  • Instrument Geometry
  • Metadata extraction
  • Create NeXus file
  • Catalog and Store
  • Reduce Data
  • All subsystems functional to some degree
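The first two steps above (event histogramming and detector-to-pixel mapping) can be illustrated with a minimal sketch. This is not the SNS implementation: the event format (detector id, time of flight), the dictionary-based pixel map, and the names `histogram_events`, `pixel_map`, and `tof_bin_width` are all assumptions made for illustration.

```python
from collections import Counter

def histogram_events(events, pixel_map, tof_bin_width):
    """Bin (detector_id, time_of_flight) events into (pixel, tof_bin) counts.

    Hypothetical sketch: the real event format and mapping are not
    described in the slides.
    """
    counts = Counter()
    for detector_id, tof in events:
        pixel = pixel_map.get(detector_id)
        if pixel is None:
            continue  # skip events from detector ids with no pixel mapping
        tof_bin = int(tof // tof_bin_width)
        counts[(pixel, tof_bin)] += 1
    return counts

# Made-up events: (detector id, time of flight in arbitrary units)
events = [(0, 120.0), (0, 180.0), (1, 950.0), (7, 10.0), (1, 1020.0)]
pixel_map = {0: 10, 1: 11}  # detector id -> pixel id (illustrative values)
hist = histogram_events(events, pixel_map, tof_bin_width=500.0)
```

The resulting counts are what a later step would write into a NeXus file together with the instrument geometry and extracted metadata.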

8
Current SNS Data Hierarchy
  • SNS data are stored on an NFS-mounted file system
  • Direct Attached Storage (DAS) - incrementally
    growing the storage resources based upon need
  • A data server for DAS - terabytes of internal
    hard drive storage
  • SNS metadata are stored in an Oracle database

Data Hierarchy
/facility/instrument/proposalID/experimentId/runNumber
    /NeXus/NeXus files
    /preNeXus/metadata files
    /analysis
e.g.
/SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs
/SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvinfo.xml
/SNS/BSS/2006_1_2_SCI/1/100/preNeXus/cvbeam.xml
icat-search
user-workspace
ICAT metadata -- Oracle DB
ICAT Appl Server -- JBoss
live-catalog
sns-checkin
data browser
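The /facility/instrument/proposalID/experimentId/runNumber layout on this slide can be expressed as a small path builder. This is a sketch only: the function names are hypothetical, and the file naming rule (instrument name, underscore, run number, `.nxs`) is inferred from the single BSS_100.nxs example rather than from a stated convention.

```python
from pathlib import PurePosixPath

def run_directory(facility, instrument, proposal_id, experiment_id, run_number):
    """Build the run directory in the data hierarchy."""
    return PurePosixPath(
        "/", facility, instrument, proposal_id, str(experiment_id), str(run_number)
    )

def nexus_file(facility, instrument, proposal_id, experiment_id, run_number):
    """Path of the NeXus file for a run (naming rule assumed from the example)."""
    run_dir = run_directory(facility, instrument, proposal_id, experiment_id, run_number)
    return run_dir / "NeXus" / f"{instrument}_{run_number}.nxs"

example_path = str(nexus_file("SNS", "BSS", "2006_1_2_SCI", 1, 100))
```

With the values from the slide, `example_path` reproduces the example NeXus path shown in the hierarchy.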
9
SNS Data Access Through Unix Shell
  • Symbolic links are created in the user's home
    directory, pointing to the proposal directories
    of which he/she is a member
  • Symbolic links are also created in the user's
    home directory, pointing to the public directory
    where public data reside
  • Disk quota may be allocated for users to perform
    analysis and simulation

User Workspace
/facility/users/neutron_boy
    /workspace (write)
    /proposalID (read only)
    /proposalID (read only)
    /public (read only)
/facility/users/public
    /proposalID
    /proposalID
Gray names are symbolic links to data hierarchy
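A minimal sketch of how such a workspace could be populated with symbolic links is shown here. It runs in a temporary directory, and the function name `link_user_proposals` and the directory names are illustrative assumptions; read-only enforcement would come from file permissions on the data hierarchy, which this sketch does not model.

```python
import os
import tempfile

def link_user_proposals(data_root, home_dir, proposal_dirs):
    """Create a writable workspace and one symlink per proposal directory."""
    os.makedirs(os.path.join(home_dir, "workspace"), exist_ok=True)
    links = []
    for proposal_dir in proposal_dirs:
        target = os.path.join(data_root, proposal_dir)
        link = os.path.join(home_dir, os.path.basename(proposal_dir))
        if not os.path.islink(link):
            os.symlink(target, link)  # link in home dir -> data hierarchy
        links.append(link)
    return links

with tempfile.TemporaryDirectory() as root:
    # Stand-in for the real data hierarchy and user home directory
    data_root = os.path.join(root, "SNS", "BSS")
    os.makedirs(os.path.join(data_root, "2006_1_2_SCI"))
    home = os.path.join(root, "users", "neutron_boy")
    links = link_user_proposals(data_root, home, ["2006_1_2_SCI"])
    link_is_symlink = os.path.islink(links[0])
    link_target = os.readlink(links[0])
    expected_target = os.path.join(data_root, "2006_1_2_SCI")
```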
10
SNS Data Access Through Portal
First SNS Data
ISAW Plot
NeXus tags
NeXus Files
metadata
11
Search Your Data via the Web
  • Enter search text
  • Select search fields
  • Select files of interest to browse or to download
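The search flow above (enter text, choose fields, pick matching files) can be sketched against an in-memory list of metadata records. This is only a stand-in: the slides state that the metadata live in ICAT on an Oracle database, and the record fields, titles, and the `search_runs` function here are hypothetical.

```python
def search_runs(records, text, fields):
    """Return records whose selected fields contain `text` (case-insensitive)."""
    needle = text.lower()
    return [
        record
        for record in records
        if any(needle in str(record.get(field, "")).lower() for field in fields)
    ]

# Made-up metadata records for illustration
records = [
    {"instrument": "BSS", "proposal": "2006_1_2_SCI", "run": 100,
     "title": "vanadium calibration"},
    {"instrument": "BSS", "proposal": "2006_1_2_SCI", "run": 101,
     "title": "empty can"},
]
matches = search_runs(records, "vanadium", ["title"])
```

Restricting the search to the selected fields keeps a term such as an instrument name from matching unrelated title text.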

12
Monte Carlo Simulation via the Portal
13
SNS Data Management Requirements
  • Archive, catalog, and maintain data produced by
    SNS instruments so users can access it from
    anywhere at any time without worrying about data
    storage issues
  • Grant authorized access to SNS data and metadata
    for both shell and portal users (ensure data is
    private to the experiment team)
  • Provide services to efficiently search, browse,
    and download SNS data and metadata
  • Allow users to share datasets with their
    collaborators or access datasets that have been
    made public, in a scalable fashion
  • Provide data management service to HFIR, LUJAN,
    IPNS and other interested neutron facilities.
  • Extend dataset storage to spinning disk, HPSS,
    and other archival systems
  • Manage distributed dataset storage and perform
    data transport for the end users
  • Federate data storage with partner neutron
    facilities such as ISIS so that users can see
    all their experiment data by logging into one
    facility.
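The access requirement above (data private to the experiment team, with public datasets open to everyone) can be sketched as a simple membership check. The function `can_read`, the team table, and the `PUBLIC_DEMO` proposal id are hypothetical; the slides do not describe how authorization is actually implemented.

```python
def can_read(user, proposal_id, team_members, public_proposals):
    """Return True if `user` may read data belonging to `proposal_id`."""
    if proposal_id in public_proposals:
        return True  # public datasets are readable by everyone
    # otherwise only members of the experiment team may read
    return user in team_members.get(proposal_id, set())

# Illustrative data only
team_members = {"2006_1_2_SCI": {"neutron_boy"}}
public_proposals = {"PUBLIC_DEMO"}

member_ok = can_read("neutron_boy", "2006_1_2_SCI", team_members, public_proposals)
outsider_ok = can_read("other_user", "2006_1_2_SCI", team_members, public_proposals)
public_ok = can_read("other_user", "PUBLIC_DEMO", team_members, public_proposals)
```

The same check would need to apply to both shell and portal access for the two routes to stay consistent.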

14
SNS Long Term Data Management Needs
  • Create a single file hierarchy for accessing data
    distributed across multiple storage systems and
    multiple facilities even extending beyond neutron
    scattering facilities
  • Support the management, collaboration, controlled
    sharing, replication, transfer, and preservation
    of distributed data
  • Capture metadata for user produced data
  • Automate data transfer
  • Improve data processing -- parallel and scalable
  • Search large volumes of data for patterns so
    users can find certain structures within their
    data -- data mining
  • Establish a unified user authentication service
    across neutron facilities
  • Provide users with an easy-to-use portal service
    to search, browse, download, and upload data and
    to search, annotate, and update metadata
  • Integrate experiment with simulation, and launch
    simulation jobs that need programmatic access to
    the distributed data resources.
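One way to picture the single file hierarchy over multiple storage systems is a table that maps logical path prefixes to storage backends, resolved by longest matching prefix. This is an illustrative assumption, not a design from the slides; the backend labels and the `resolve` function are hypothetical.

```python
def resolve(logical_path, mounts):
    """Map a logical path to (backend, relative path) by longest prefix match."""
    best = None
    for prefix, backend in mounts.items():
        is_match = (
            logical_path == prefix
            or logical_path.startswith(prefix.rstrip("/") + "/")
        )
        if is_match and (best is None or len(prefix) > len(best[0])):
            best = (prefix, backend)
    if best is None:
        raise KeyError(f"no storage backend for {logical_path}")
    prefix, backend = best
    return backend, logical_path[len(prefix):].lstrip("/")

# Hypothetical backend labels for disk, archive, and a partner facility
mounts = {
    "/SNS": "sns-nfs-disk",
    "/SNS/archive": "sns-hpss-archive",
    "/ISIS": "isis-remote",
}
on_disk = resolve("/SNS/BSS/2006_1_2_SCI/1/100/NeXus/BSS_100.nxs", mounts)
archived = resolve("/SNS/archive/BSS/2006_1_2_SCI", mounts)
```

Users and simulation jobs would then address data by logical path only, leaving placement and transport to the data management layer.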

15
Summary
  • As more instruments go through the commissioning
    phase and enter a new era of scientific
    discovery, we face the emerging challenge of
    managing scientific data that can grow to
    petabyte scale in a few years
  • As a user facility, SNS will have a steady stream
    of users running experiments and generating raw
    and analysis data files, so we will need not only
    disk cache but also a long-term storage system
    such as HPSS
  • Promise to search and retrieve SNS data and
    metadata for end users anywhere, at any time, in
    a timely fashion
  • Grow our data management resources and
    collaborate with the community
  • Looking for opportunities to work with and
    leverage resources beyond our facility
  • Eager to reach out to, learn from, and
    collaborate with data management experts in all
    domain areas
  • Wish to understand and utilize new software
    applications that manage distributed data storage
    and transport, search, and retrieve data more
    effectively and efficiently