Title: A Data Analysis Framework for the Neutron Community
1 DANSE (Distributed Data Analysis for Neutron Scattering Experiments)
Michael M. McKerns, Materials Science and Applied Physics; Center for Advanced Computing Research; California Institute of Technology
2 Serving a Growing Community
- With the availability of OPAL, SNS, and J-PARC fast approaching, the neutron community has the potential to undergo a large growth spurt.
- Software is a vital part of neutron scattering, and unless the software is both robust and easy to use, that growth may be limited.
- Mature packages do exist (McStas, ISAW, DAVE, …), and commercial packages (Matlab, IDL, Abaqus, IGOR Pro, …) are also used in the analysis process. However, groups often rely on cryptic legacy code for at least one step.
- To grow as a community, we need
  - a way to cultivate and maintain the valuable portions of these legacy codes
  - to make legacy and community-standard codes interoperable
    - define common data structures and interfaces
    - stop duplication of effort
  - to allow scientists to concentrate more on science by lowering the barriers to software engineering
3 There is much to do
- Software is needed to handle the massive quantity of data that will be produced at modern neutron facilities.
- Existing software may be incapable of exploiting the full richness of that data.
- Although the barrier to developing new software must be lowered, it is also critical to enable more complex software technologies (e.g., high-performance and grid-based computing).
- Time is short: we must use the best existing tools to provide a robust solution, yet remain flexible enough to allow the easy substitution of better future solutions.
4 Software User Stereotypes I
- Instrument Scientist
  - author of prepackaged and specialized tools
  - wants
    - portable building and debugging tools
    - large toolkit of robust modules and support code
    - rapid application development
    - GUI builder to compose interactive widgets, forms, and wizards
    - to focus on supporting the instrument, not writing software
- Visiting Scientist
  - user of prepackaged and specialized tools
  - wants
    - UI that is simple to understand and easy to use
    - reasonable defaults for most choices
    - well-diagnosed and clearly explained error messages
    - intelligently concealed complexity
5 Software User Stereotypes II
- Established Researcher
  - coordinator/author/reviewer, designer of new applications
  - wants
    - flexible UI that enables interactive exploration
    - access to a comprehensive set of data transformations
    - access to modeling and simulation packages
    - tools to compare outputs of different analyses
    - casually usable high-end graphics
- Beginning Student
  - user of tools and documentation as a learning environment
  - wants
    - well-documented interface and modules
    - access to a set of standard applications
    - flexible UI that enables interactive exploration
6 Software User Stereotypes III
- Analysis Expert
  - author of analysis, modeling, or simulation software
  - wants
    - portable building and debugging tools
    - large toolkit of robust modules and support code
    - easy access to sample data
    - to solve physics problems, not software engineering problems
- Software Engineer
  - binds software to the common environment and extends software to fit the framework
  - wants
    - portable building and debugging tools
    - large toolkit of robust modules and support code
    - well-documented access to the software and framework integration layer
    - validation, verification, and regression testing
- Framework Maintainer
  - maintains and extends the software infrastructure
7 What is DANSE?
- a five-year NSF IMR-MIP software construction project
- a collaborative effort among software professionals, neutron scattering scientists, and facilities
- a software engineering effort
  - open-source development environment
  - framework for the interoperability of modular components
  - integration of legacy codes and community-standard software
  - connectivity to facility databases and software repositories
- a scientific endeavor
  - to develop software modules for different subfields of neutron scattering
  - to enhance neutron scattering research and facilitate new science
  - to build tools for education, collaboration, and plausibility assessment
- an integration framework for building data analysis, visualization, modeling, and instrument simulation tools for all areas of neutron scattering
8 The Power of Python
- The fundamental commodity for neutron scattering software is found within the cores of time-tested community-standard software. Rather than rewrite or duplicate this software, we can use Python to provide an integration path into a common language.
- Python is
  - a modern object-oriented language
  - robust, portable, mature, well-supported, and well-documented
  - easily extendable
  - well suited to rapid application development
- Python scripting enables us to
  - compose computations at runtime and discover capabilities without recompilation or relinking
  - organize large numbers of user-tunable parameters
- Binding Python to other languages (C, Fortran, …) allows integration without significant impact on performance or scalability, because the numerically intensive work remains in compiled code
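The two scripting claims above (composing computations at runtime and organizing user-tunable parameters) can be illustrated with plain Python. This is a minimal sketch, not DANSE or Pyre code: the step functions, the `STEPS` registry, and `compose` are hypothetical. A recipe of named steps and parameter dictionaries is turned into a callable at runtime, so a new combination needs no recompilation or relinking.

```python
import math

# Hypothetical processing steps; in practice each might wrap a compiled core.
def scale(data, factor=1.0):
    """Multiply every value by a constant factor."""
    return [x * factor for x in data]

def offset(data, shift=0.0):
    """Subtract a constant background level."""
    return [x - shift for x in data]

def log10(data, floor=1e-12):
    """Take log10, clamping non-positive values to a small floor."""
    return [math.log10(max(x, floor)) for x in data]

# Registry of available steps; a script or shell session can extend it at runtime.
STEPS = {"scale": scale, "offset": offset, "log10": log10}

def compose(recipe):
    """Build a callable from a list of (step name, parameter dict) pairs."""
    def pipeline(data):
        for name, params in recipe:
            data = STEPS[name](data, **params)
        return data
    return pipeline

# The recipe is plain data, so it could equally be read from a user's config file.
recipe = [("offset", {"shift": 2.0}), ("scale", {"factor": 10.0})]
run = compose(recipe)
print(run([3.0, 4.0, 12.0]))   # [10.0, 20.0, 100.0]
```

Because the recipe is ordinary data, the user-tunable parameters live in one place and can be changed without touching the step implementations.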
9 Building a Scientific Toolkit
- Through Python, DANSE will have access to many tools
  - basic data structures, optimization algorithms, numerical libraries
  - basic data reduction library to obtain I(Q), S(Q), S(E), S(Q,E)
  - graphical/plotting environments
    - IDL, Matlab, Matplotlib, Gnuplot, Grace, ParaView, ACIS (AutoCAD), …
  - instrument simulation
    - McStas, VITESS, sample simulation framework, …
  - materials simulation
    - ABINIT, VASP, GAMESS, NWChem, NAMD, CHARMM, …
  - crystallography
    - cctbx, FOX, ObjCryst, …
  - molecular viewers and format translators
    - OpenBabel, Molden, PyMOL, ViewMol, DRAWxtl, VMD, AtomEye, …
  - and MORE!
    - ISAW, texture analysis (MAUD), SLD calculator, scattering intensity, …
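As an illustration of the kind of routine a basic data reduction library provides, the sketch below bins detector counts into I(Q) using the elastic relation Q = 4π sin θ / λ, where 2θ is the scattering angle. It assumes a single known wavelength (in Å), elastic scattering, and no normalization, efficiency, or solid-angle corrections; the function names are hypothetical and do not describe the actual DANSE reduction library.

```python
import math

def momentum_transfer(two_theta_deg, wavelength):
    """Elastic momentum transfer Q = 4*pi*sin(theta)/lambda; 2*theta is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength

def reduce_to_iq(events, wavelength, q_min, q_max, n_bins):
    """Histogram (two_theta, counts) pairs into n_bins equal-width Q bins.

    Assumption: no corrections are applied; this only shows the binning step.
    """
    width = (q_max - q_min) / n_bins
    intensity = [0.0] * n_bins
    for two_theta, counts in events:
        q = momentum_transfer(two_theta, wavelength)
        if q_min <= q < q_max:
            # min() guards against rounding pushing q into a non-existent bin
            index = min(int((q - q_min) / width), n_bins - 1)
            intensity[index] += counts
    centers = [q_min + (i + 0.5) * width for i in range(n_bins)]
    return centers, intensity

centers, intensity = reduce_to_iq([(30.0, 3), (90.0, 5)], wavelength=1.0,
                                  q_min=0.0, q_max=10.0, n_bins=5)
print(centers)    # [1.0, 3.0, 5.0, 7.0, 9.0]
print(intensity)  # [0.0, 3.0, 0.0, 0.0, 5.0]
```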
10 The Power of a Framework
- While a single application can be built relatively quickly without a framework, much of the effort goes into error handling, logging, UI construction, and other services.
- A software framework provides
  - a specification for the organization of the software
  - a description of the crucial structural elements and their interfaces
  - a specification of the possible collaborations among these elements
  - a strategy for the composition of new elements
  - flexibility and robustness under evolutionary pressures
  - services
    - life cycle management, logging and monitoring
    - network client and server support, authentication
    - should not be rewritten for every application, but simply reused
- A framework increases reusability and shortens the development cycle
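The point about reusable services can be sketched as a base class that supplies life-cycle hooks, logging, and error handling once, so that each application writes only its science. The class and method names (`Application`, `init`, `main`, `fini`, `run`, `Normalize`) are hypothetical and are not Pyre's API; the standard `logging` module stands in for a framework logging service.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(name)s: %(message)s")

class Application:
    """Hypothetical base class: services are written once and inherited."""

    def __init__(self, name):
        self.name = name
        self.log = logging.getLogger(name)   # shared logging service

    def init(self):
        """Life-cycle hook: acquire resources (default does nothing)."""

    def fini(self):
        """Life-cycle hook: release resources (default does nothing)."""

    def main(self):
        """Subclasses put their science here."""
        raise NotImplementedError

    def run(self):
        """Life-cycle management and error handling, reused by every app."""
        self.log.info("starting")
        self.init()
        try:
            return self.main()
        except Exception as error:
            # Errors are logged centrally instead of crashing the session.
            self.log.error("failed: %s", error)
            return None
        finally:
            self.fini()
            self.log.info("finished")

class Normalize(Application):
    """Divide counts by a monitor value; only the science is written here."""

    def __init__(self, counts, monitor):
        super().__init__("normalize")
        self.counts, self.monitor = counts, monitor

    def main(self):
        return [c / self.monitor for c in self.counts]

print(Normalize([10, 20], monitor=5).run())   # [2.0, 4.0]
print(Normalize([10], monitor=0).run())       # None (error logged, not raised)
```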
11 DANSE uses the Pyre Framework
- Pyre software architecture
  - robust, stable, open-source foundation
  - >75,000 lines of Python and 30,000 lines of C
- component-based runtime environment
  - components are pre-compiled and connected by the user at runtime
  - user directs component interconnections using visual, script-based, or shell programming
- a set of co-operating abstract services
  - framework provides the structural girdle
  - executive layer manages the application life cycle
  - applications are built from modular components
  - components tie software cores to data streams
  - UI is independent of the underlying framework
12 Modularity of Components
- granularity allows reusability of object-oriented components
  - example: rebinning application
- modularity provides flexibility and extensibility
Figure: rebinning application assembled from the components NeXusReader, Selector (three instances), Energy, Bckgrnd, and NeXusWriter; connections carry filename, instrument info, times, time interval, raw counts, and energy bins.
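The rebinning example can be mimicked with small, interchangeable components. This sketch assumes in-memory dictionaries instead of NeXus files and a toy time-of-flight conversion E ∝ 1/t²; the classes `Reader`, `Background`, `EnergyRebin`, and `Writer` are hypothetical stand-ins for NeXusReader, Bckgrnd, Energy, and NeXusWriter, and the Selector components are omitted for brevity.

```python
class Reader:
    """Stand-in for NeXusReader: yields raw counts keyed by time channel."""
    def __init__(self, raw_counts):
        self.raw_counts = raw_counts
    def __call__(self, _upstream):
        return dict(self.raw_counts)

class Background:
    """Stand-in for Bckgrnd: subtracts a flat background, clamping at zero."""
    def __init__(self, level):
        self.level = level
    def __call__(self, counts):
        return {t: max(c - self.level, 0) for t, c in counts.items()}

class EnergyRebin:
    """Stand-in for Energy: converts time channels to energy and sums per bin."""
    def __init__(self, to_energy, edges):
        self.to_energy, self.edges = to_energy, edges
    def __call__(self, counts):
        sums = [0] * (len(self.edges) - 1)
        for t, c in counts.items():
            e = self.to_energy(t)
            for i in range(len(sums)):
                if self.edges[i] <= e < self.edges[i + 1]:
                    sums[i] += c
                    break
        return sums

class Writer:
    """Stand-in for NeXusWriter: keeps results in memory instead of a file."""
    def __init__(self):
        self.written = []
    def __call__(self, data):
        self.written.append(data)
        return data

def run_pipeline(components):
    """Push data through the components in order; each one is a black box."""
    data = None
    for component in components:
        data = component(data)
    return data

def toy_energy(t):
    # Hypothetical conversion: E proportional to 1/t**2 for a fixed flight path
    return 1000.0 / t ** 2

raw = {10: 50, 20: 30, 40: 12}
edges = [0.0, 1.0, 5.0, 20.0]
print(run_pipeline([Reader(raw), Background(2), EnergyRebin(toy_energy, edges), Writer()]))
# [10, 28, 48]
print(run_pipeline([Reader(raw), Background(0), EnergyRebin(toy_energy, edges), Writer()]))
# [12, 30, 50]
```

Replacing `Background(2)` with `Background(0)` alters one stage while the reader, rebinner, and writer are reused unchanged, which is the flexibility the slide attributes to modularity.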
13 Component Data Flow Paradigm
- scientific analysis codes constitute the cores of software components
- components mediate interaction between cores and environment
  - inherit methods (such as message passing and error handling) from the environment
  - are responsible for initializing programs within their component core
  - access a centralized mechanism for logging status, errors, and history
  - negotiate data exchanges with XML-based data exchange protocols
- components utilize data streams to pass information between ports
  - interact with the executive layer to negotiate execution flow
  - facilitate physical decoupling of computation among distributed resources
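The idea of negotiating a data exchange through an XML description of a port can be sketched with the standard library's `xml.etree.ElementTree`. The element names (`port`, `quantity`, `units`, `shape`) and the matching rule are invented for illustration; the actual DANSE exchange protocol is not specified here.

```python
import xml.etree.ElementTree as ET

def describe_port(name, quantity, units, shape):
    """Build the XML description an upstream component offers for one port."""
    port = ET.Element("port", name=name)
    ET.SubElement(port, "quantity").text = quantity
    ET.SubElement(port, "units").text = units
    ET.SubElement(port, "shape").text = " ".join(str(n) for n in shape)
    return ET.tostring(port, encoding="unicode")

def negotiate(offer_xml, quantity, units):
    """Downstream side: accept the stream only if quantity and units match.

    Returns the array shape to allocate, or None to refuse the connection.
    """
    port = ET.fromstring(offer_xml)
    if port.findtext("quantity") != quantity or port.findtext("units") != units:
        return None
    return tuple(int(n) for n in port.findtext("shape").split())

offer = describe_port("counts", "intensity", "counts", (128, 1024))
print(offer)
print(negotiate(offer, "intensity", "counts"))   # (128, 1024)
print(negotiate(offer, "intensity", "meV"))      # None
```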
14 Component Implementation
- build core engine (Python, Fortran, C, Java, Matlab, IDL, …)
  - legacy or custom code and third-party libraries
  - provide a life-cycle management and exception handling strategy
- construct Python bindings
  - select entry points to expose to Python
  - provide modular entry points into monolithic compiled libraries
- cast as a component
  - extend and leverage framework services
  - describe user-configurable parameters
  - provide metadata that specify the IO port characteristics
- test code
  - satisfy functional requirements with concurrent test development
  - utilize interactive runtime testing within the Python interpreter
  - demonstrate integration with other components
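The "cast as a component" and "test code" steps can be sketched as a base class that reads declared parameters (with defaults and allowed ranges) and port metadata, plus a test written alongside the component. `Parameter`, `Component`, `Smooth`, and `test_smooth` are hypothetical names, and a boxcar smoother stands in for a wrapped legacy core.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Parameter:
    """A user-configurable property: name, default, and allowed range."""
    name: str
    default: float
    minimum: float = float("-inf")
    maximum: float = float("inf")

class Component:
    """Hypothetical base class: subclasses declare parameters and port metadata."""
    parameters = ()
    inputs = {}
    outputs = {}

    def __init__(self, **settings):
        known = {p.name for p in self.parameters}
        unknown = set(settings) - known
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        self.config = {}
        for p in self.parameters:
            value = settings.get(p.name, p.default)
            if not p.minimum <= value <= p.maximum:
                raise ValueError(f"{p.name}={value} outside [{p.minimum}, {p.maximum}]")
            self.config[p.name] = value

class Smooth(Component):
    """A boxcar smoother standing in for a wrapped legacy core."""
    parameters = (Parameter("width", default=3, minimum=1, maximum=101),)
    inputs = {"spectrum": "sequence of float"}
    outputs = {"smoothed": "list of float"}

    def run(self, spectrum):
        half = int(self.config["width"]) // 2
        n = len(spectrum)
        result = []
        for i in range(n):
            lo, hi = max(0, i - half), min(n, i + half + 1)
            result.append(sum(spectrum[lo:hi]) / (hi - lo))
        return result

def test_smooth():
    """Written alongside the component; runnable from the interpreter."""
    assert Smooth(width=1).run([1.0, 2.0, 3.0]) == [1.0, 2.0, 3.0]
    assert Smooth().run([0.0, 3.0, 0.0]) == [1.5, 1.0, 1.5]
    for bad in ({"width": 0}, {"size": 3}):
        try:
            Smooth(**bad)
        except ValueError:
            pass
        else:
            raise AssertionError(f"accepted {bad}")

test_smooth()
```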
15 Building Abstract Applications
- DANSE uses a design pattern that enables the assembly of components at runtime under user control
  - Facilities are named abstract application requirements
  - Components are concrete named engines that satisfy the requirements
- Power of an API
  - the application author provides
    - a specification of the application facilities as part of the application definition
    - a component to be used as the default
  - the application user can construct scripts that create alternative components that comply with the facility interface
  - the end user can
    - configure the properties of the component
    - select which component is to be bound to a given facility
- Abstraction is required for dynamic and distributed applications
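The facility/component pattern can be sketched in a few classes. Here a `Facility` names an abstract requirement (any component providing an `estimate` method), the application author supplies a default, a user script registers an alternative, and the end user selects and configures the bound component. All names (`REGISTRY`, `component`, `Facility`, `Reduction`, and the background classes) are hypothetical and do not reproduce Pyre's own facility API.

```python
REGISTRY = {}   # hypothetical catalogue of named, concrete components

def component(name):
    """Class decorator that registers a concrete engine under a name."""
    def register(cls):
        REGISTRY[name] = cls
        return cls
    return register

class Facility:
    """A named abstract requirement: any component providing `interface`."""
    def __init__(self, name, default, interface):
        self.name, self.default, self.interface = name, default, interface

    def bind(self, choice=None, **properties):
        """Instantiate the chosen (or default) component after checking it."""
        cls = REGISTRY[choice or self.default]
        if not callable(getattr(cls, self.interface, None)):
            raise TypeError(f"{cls.__name__} does not provide {self.interface}()")
        return cls(**properties)

@component("mean")
class MeanBackground:
    def estimate(self, data):
        return sum(data) / len(data)

@component("constant")
class ConstantBackground:
    def __init__(self, level=0.0):
        self.level = level
    def estimate(self, data):
        return self.level

class Reduction:
    """The application author declares the facility and its default."""
    background = Facility("background", default="mean", interface="estimate")

    def __init__(self, choice=None, **properties):
        # The end user picks the component and configures its properties.
        self.engine = self.background.bind(choice, **properties)

    def run(self, data):
        level = self.engine.estimate(data)
        return [x - level for x in data]

# A user script can add an alternative that complies with the interface.
@component("median")
class MedianBackground:
    def estimate(self, data):
        ordered = sorted(data)
        return ordered[len(ordered) // 2]

print(Reduction().run([1.0, 2.0, 3.0]))                       # [-1.0, 0.0, 1.0]
print(Reduction("constant", level=0.5).run([1.0, 2.0, 3.0]))  # [0.5, 1.5, 2.5]
```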
16 Visual Programming Interface
- Workflow graphs are a naturally dynamic interface due to the correspondence between the logical and physical descriptions of the computation.
- There are multiple views of each computation
  - data flow
  - control flow
  - deployment of distributed components
- Should allow interactive editing of component state
  - access to modify component properties
  - dynamic interface generation from component-supplied specifications
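The relationship between the data-flow and control-flow views can be made concrete: a valid execution order is a topological sort of the data-flow graph, and components whose inputs become ready together could be deployed concurrently. This sketch uses the standard library's `graphlib` (Python 3.9 or later); the component names loosely follow the rebinning figure and are hypothetical.

```python
from graphlib import TopologicalSorter   # Python 3.9+

# Data-flow view: each component maps to the components whose output it reads.
data_flow = {
    "reader": set(),
    "background": {"reader"},
    "selector": {"reader"},
    "energy": {"background", "selector"},
    "writer": {"energy"},
}

def control_flow(graph):
    """One valid sequential execution order derived from the data flow."""
    return list(TopologicalSorter(graph).static_order())

def stages(graph):
    """Waves of components whose inputs are ready and could run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(control_flow(data_flow))
print(stages(data_flow))
# [['reader'], ['background', 'selector'], ['energy'], ['writer']]
```

A wiring that contains a cycle makes `static_order()` or `prepare()` raise `graphlib.CycleError`, which a visual editor could report as an invalid connection.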
17 Distributed/Parallel Computing
- Enabled by design
  - component framework utilizing data streams
  - requirements for building distributed and parallel computations are nearly the same as those for building applications in a visual programming interface
- Pyre was originally designed to compose and control parallel applications
  - bindings to MPI
  - encapsulation of the Python interpreter in MPI
- Enable distributed computing with currently available technologies
  - initial authentication and deployment based on ssh and scp
  - authentication and security using Pyre services
  - access constrained to user space
- Take advantage of Grid services as they become available
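Data-stream decomposition is what makes these computations straightforward to distribute: each worker processes a chunk independently and the partial results are combined. The sketch below histograms events in chunks and sums the partial histograms, mirroring an MPI scatter/reduce. A thread pool from `concurrent.futures` is swapped in for MPI only to keep the example self-contained; with MPI, each chunk would live on a separate rank. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def histogram(values, lo, hi, n_bins):
    """Serial kernel: count values into n_bins equal-width bins on [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts

def parallel_histogram(values, lo, hi, n_bins, n_workers=4):
    """Scatter chunks to workers, histogram each, then reduce by summing."""
    chunks = [values[i::n_workers] for i in range(n_workers)]        # "scatter"
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda chunk: histogram(chunk, lo, hi, n_bins), chunks))
    return [sum(column) for column in zip(*partials)]                 # "reduce"

events = list(range(100))
print(parallel_histogram(events, 0, 100, 10))   # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```

Under CPython's global interpreter lock the threads give little speedup for a pure-Python kernel; the point is the structure (independent chunks, then a reduction), which carries over unchanged to separate processes or nodes.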
18 Broad Scientific Scope
- data reduction and experiment simulation
  - diffraction, engineering diffraction, and inelastic scattering data reduction
  - SANS/USANS and neutron reflectometry data reduction
  - instrument and microstructure simulation
- modeling
  - full profile modeling in real and reciprocal space (GSAS, FullProf, PDFFIT)
  - finite element modeling (ABAQUS) and self-consistent modeling
  - constrained fitting using data from other experimental techniques
  - 1D/2D model fitting and model-independent peak fitting
  - direct modeling of physical systems and ab-initio modeling
- scattering kernel and multiple scattering
  - neutron weight correction and separation of nuclear and spin scattering
  - micromagnetic simulations (OOMMF) and disordered spin dynamics
  - chemical spectroscopy dynamics (CLIMAX)
19 Facilitates New and Better Science
- better data analysis
  - FEM calculations of strains in microstructures
  - Monte Carlo inversions of S(Q,E) to obtain parameters of structure and dynamics models
  - model refinements with multiple data sets
- integration of theory
  - micromechanics using correlations of local strains
  - phase diagrams from thermodynamic functions
  - ab-initio calculations of spin interactions
  - soft matter structure using atomic force fields guided by diffraction
- experiment planning and execution
  - single crystals on chopper spectrometers
  - feedback control and real-time assessment
  - plausibility testing and contingency planning
  - assessment of science/data trends from previous data
20 Goals for DANSE
- enable the non-expert, while not hindering the expert
- enable distributed and parallel computing as a framework service
- create a community-supported open-source neutron scattering software framework
- lower the barrier to software development
- provide powerful applications for analysis, modeling, and simulation
21 DANSE Project Information
- milestones for the DANSE software
  - project start: 2006
  - beta release: 2008
  - release 1.0: 2009
  - transition to the SNS: 2010
- documentation, tutorials, and further information
  - the DANSE wiki at http://wiki.cacr.caltech.edu/danse
  - the Pyre homepage at http://www.cacr.caltech.edu/projects/pyre
- contacts
  - Brent Fultz (btf_at_caltech.edu), Michael Aivazis (aivazis_at_caltech.edu)
  - Simon Billinge, Ersan Üstündag, Paul Butler, Paul Kienzle, Ian Anderson
  - Michael McKerns (mmckerns_at_caltech.edu)