A Data Analysis Framework for the Neutron Community

1

(Distributed Data Analysis for Neutron
Scattering Experiments)
A Data Analysis Framework for the Neutron
Community
Michael M. McKerns
Materials Science and Applied Physics
Center for Advanced Computing Research
California Institute of Technology
2
Serving a Growing Community
  • With the availability of OPAL, SNS, and J-PARC
    fast approaching, the neutron community has the
    potential to undergo a large growth spurt.
  • Software is a vital part of neutron scattering,
    and unless the software is both robust and easy
    to use, that growth may be limited.
  • Mature packages do exist (McStas, ISAW, DAVE, ...),
    and commercial packages are also used (Matlab,
    IDL, Abaqus, IGOR Pro, ...) in the analysis
    process. However, groups often use cryptic
    legacy code for at least one step.
  • To grow as a community, we need to
    • cultivate and maintain the valuable portions of
      these legacy codes
    • make legacy and community-standard codes
      interoperable
    • define common data structures and interfaces
    • stop duplication of effort
    • let scientists concentrate more on science by
      lowering the barriers to software engineering

3
There is much to do
  • Software is needed to support the massive
    quantity of data that will be produced at modern
    neutron facilities.
  • Existing software may be incapable of utilizing
    the full richness of the data that will be
    produced.
  • Although the barrier to developing new software
    must be reduced, it is also critical that more
    complex software technologies (e.g.
    high-performance and grid-based computing) are
    enabled.
  • Time is short: we must use the best existing
    tools to provide a robust solution, yet remain
    flexible enough to allow for the easy
    substitution of better future solutions.

4
Software User Stereotypes I
  • Instrument Scientist
    • author of prepackaged and specialized tools
    • wants
      • portable building and debugging tools
      • large toolkit of robust modules and support code
      • rapid application development
      • GUI builder to compose interactive widgets,
        forms, and wizards
      • to focus on supporting the instrument, not
        writing software
  • Visiting Scientist
    • user of prepackaged and specialized tools
    • wants
      • UI that is simple to understand and easy to use
      • reasonable defaults for most choices
      • well-diagnosed and well-explained error messages
      • intelligently concealed complexity

5
Software User Stereotypes II
  • Established Researcher
    • coordinator/author/reviewer, designer of new
      applications
    • wants
      • flexible UI that enables interactive exploration
      • access to a comprehensive set of data
        transformations
      • access to modeling and simulation packages
      • tools to compare outputs of different analyses
      • casually usable high-end graphics
  • Beginning Student
    • user of tools and documentation as a learning
      environment
    • wants
      • well-documented interface and modules
      • access to a set of standard applications
      • flexible UI that enables interactive exploration

6
Software User Stereotypes III
  • Analysis Expert
    • author of analysis, modeling, or simulation
      software
    • wants
      • portable building and debugging tools
      • large toolkit of robust modules and support code
      • easy access to sample data
      • to solve physics problems, not software
        engineering problems
  • Software Engineer
    • binds software to common environment, extends
      software to the framework
    • wants
      • portable building and debugging tools
      • large toolkit of robust modules and support code
      • well-documented access to the software and
        framework integration layer
      • validation, verification, and regression testing
  • Framework Maintainer
    • maintains and extends the software infrastructure

7
What is DANSE?
  • a five-year NSF IMR-MIP software construction
    project
  • a collaborative effort between software
    professionals, neutron scattering scientists, and
    facilities
  • a software engineering effort
    • open-source development environment
    • framework for the interoperability of modular
      components
    • integration of legacy codes and
      community-standard software
    • connectivity to facility databases and software
      repositories
  • a scientific endeavor
    • to develop software modules for different
      subfields of neutron scattering
    • to enhance neutron scattering research and
      facilitate new science
    • to build tools for education, collaboration, and
      plausibility assessment
  • an integration framework for building data
    analysis, visualization, modeling, and instrument
    simulation tools for all areas of neutron
    scattering

8
The Power of Python
  • The fundamental commodity for neutron scattering
    software is found within the cores of time-tested
    community-standard software. Rather than rewrite
    or duplicate this software, we can use Python to
    provide an integration path into a common
    language.
  • Python is
    • a modern object-oriented language
    • robust, portable, mature, well-supported, and
      well-documented
    • easily extensible
    • well suited to rapid application development
  • Python scripting enables us to
    • compose computations at runtime and discover
      capabilities without recompilation or relinking
    • organize large numbers of user-tunable parameters
  • Binding Python to other languages (C, Fortran,
    ...) allows integration without measurable impact
    on performance or scalability
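
Presenter's note: the idea of composing computations at runtime and discovering capabilities without recompiling can be illustrated with a minimal sketch. It uses only the Python standard library; the helper names `resolve` and `compose` are hypothetical and are not part of DANSE or Pyre.

```python
import importlib


def resolve(dotted_name):
    """Look up 'module.attribute' at runtime and return the attribute."""
    module_name, _, attribute = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


def compose(step_names):
    """Build one callable that applies the named steps in order."""
    steps = [resolve(name) for name in step_names]

    def pipeline(value):
        for step in steps:
            value = step(value)
        return value

    return pipeline


# The step list is ordinary data, so it could come from a user script
# or a configuration file instead of being compiled into the program.
pipeline = compose(["math.sqrt", "math.floor"])
result = pipeline(17.0)

# Capabilities are discovered by introspection rather than hard-coded.
math_module = importlib.import_module("math")
math_functions = [name for name in dir(math_module) if not name.startswith("_")]
```

Because the steps are named by strings, swapping one analysis step for another needs only an edit to the list, not a rebuild.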

9
Building a Scientific Toolkit
  • Through Python, DANSE will have access to many
    tools
    • basic data structures, optimization algorithms,
      numerical libraries
    • basic data reduction library: obtain I(Q), S(Q),
      S(E), S(Q,E)
    • graphical/plotting environments
      • IDL, Matlab, Matplotlib, Gnuplot, Grace,
        ParaView, ACIS (AutoCAD), ...
    • instrument simulation
      • McStas, VITESS, sample simulation framework, ...
    • materials simulation
      • ABINIT, VASP, GAMESS, NWChem, NAMD, CHARMM, ...
    • crystallography
      • cctbx, FOX, ObjCryst, ...
    • molecular viewers and format translators
      • OpenBabel, Molden, PyMol, ViewMol, DRAWxtl, VMD,
        AtomEye, ...
    • and MORE!
      • ISAW, texture analysis (MAUD), SLD calculator,
        scattering intensity, ...

10
The Power of a Framework
  • While a single application can be built
    relatively quickly without using a framework,
    much effort will be spent on error handling,
    logging, UI construction, and other services.
  • A software framework provides
    • a specification for organization of the software
    • a description of the crucial structural elements
      and their interfaces
    • a specification of the possible collaborations of
      these elements
    • a strategy for the composition of new elements
    • flexibility and robustness under evolutionary
      pressures
  • services
    • life cycle management, logging and monitoring
    • network client and server support, authentication
    • should not be rewritten for every application,
      but simply reused
  • A framework increases reusability and shortens
    the development cycle

11
DANSE uses Pyre Framework
  • Pyre software architecture
  • robust, stable, open-source foundation
  • >75,000 lines of Python and 30,000 lines of C
  • component-based runtime environment
  • components are pre-compiled and connected by the
    user at runtime
  • user directs component interconnections using
    visual, script-based, or shell programming
  • a set of co-operating abstract services
  • framework provides structural girdle
  • executive layer manages application life cycle
  • applications built from modular components
  • components tie software cores to data streams
  • UI independent of underlying framework

12
Modularity of Components
  • granularity allows reusability of object-oriented
    components
  • rebinning application
  • modularity provides flexibility and
    extensibility

[Figure: rebinning application assembled from components NeXusReader, Selector (three instances), Energy, Bckgrnd, and NeXusWriter. Labeled data: filename, instrument info, times, time interval, energy bins, raw counts.]
13
Component Data Flow Paradigm
  • scientific analysis codes constitute the cores of
    software components
  • components mediate interaction between cores and
    environment
    • inherit methods (such as message passing and
      error handling) from environment
    • responsible for initialization of programs within
      their component core
    • access centralized mechanism for logging status,
      errors, and history
    • negotiate data exchanges using XML-based
      protocols
  • components utilize data streams to pass
    information between ports
    • interact with executive layer to negotiate
      execution flow
    • facilitate physical decoupling of computation
      among distributed resources
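
Presenter's note: the data flow paradigm on this slide can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Pyre's API: the `Component` class, its `connect` and `receive` methods, and the two core functions are all hypothetical, and the XML-based exchange negotiation and the executive layer are left out.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dataflow")  # one central logger shared by all components


class Component:
    """Wraps a core function and mediates between it and its environment."""

    def __init__(self, name, core):
        self.name = name
        self.core = core        # the scientific code being wrapped
        self.downstream = []    # components connected to the output port

    def connect(self, other):
        """Attach another component's input port to this output port."""
        self.downstream.append(other)
        return other            # returning it allows chained connections

    def receive(self, data):
        """Run the core on incoming data and push the result downstream."""
        try:
            result = self.core(data)
        except Exception:
            log.exception("component %s failed", self.name)
            raise
        log.info("component %s produced %r", self.name, result)
        for component in self.downstream:
            component.receive(result)


# Hypothetical cores: subtract a flat background, then sum neighbouring bins.
def subtract_background(counts, background=2):
    return [max(c - background, 0) for c in counts]


def rebin_pairs(counts):
    return [sum(counts[i:i + 2]) for i in range(0, len(counts), 2)]


collected = []
source = Component("background", subtract_background)
source.connect(Component("rebin", rebin_pairs)).connect(
    Component("sink", collected.append))

source.receive([5, 7, 1, 9, 4])
```

The cores know nothing about logging, error reporting, or who consumes their output; the component wrapper supplies all three, which is what allows a core to be moved to another process or machine later.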

14
Component Implementation
  • build core engine (Python, Fortran, C, Java,
    Matlab, IDL, ...)
    • legacy or custom code and third-party libraries
    • provide life-cycle management and exception
      handling strategy
  • construct Python bindings
    • select entry points to expose to Python
    • modularize entry points to monolithic compiled
      libraries
  • cast as a component
    • extend and leverage framework services
    • describe user-configurable parameters
    • provide meta-data that specify the IO port
      characteristics
  • test code
    • satisfy functional requirements with concurrent
      test development
    • utilize interactive runtime testing within Python
      interpreter
    • demonstrate integration with other components
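
Presenter's note: as a rough sketch of the "cast as a component" and "test code" steps, the example below wraps a small core function in a class that declares its user-configurable parameters and IO port metadata. The class layout, the `parameters` and `ports` attributes, and the `normalize` core are assumptions made for illustration; the real framework has its own mechanisms for these declarations.

```python
def normalize(counts, monitor):
    """Core engine: scale raw counts by a monitor count (stand-in for legacy code)."""
    if monitor <= 0:
        raise ValueError("monitor must be positive")
    return [c / monitor for c in counts]


class NormalizeComponent:
    """Casts the core as a component by describing its parameters and ports."""

    # User-configurable parameters with their defaults (hypothetical layout).
    parameters = {"monitor": 1.0}

    # Metadata describing the IO ports (hypothetical layout).
    ports = {
        "input": {"counts": "list of float"},
        "output": {"normalized": "list of float"},
    }

    def __init__(self, **overrides):
        unknown = set(overrides) - set(self.parameters)
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        self.settings = {**self.parameters, **overrides}

    def run(self, counts):
        """Call the core with the configured settings and label the output port."""
        return {"normalized": normalize(counts, self.settings["monitor"])}


# Interactive-style check, as one might type into the Python interpreter.
component = NormalizeComponent(monitor=4.0)
output = component.run([2.0, 8.0])
```

Keeping the core a plain function means it can be tested on its own, while the wrapper is tested for parameter handling and port labeling.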

15
Building Abstract Applications
  • DANSE uses a design pattern that enables the
    assembly of components at runtime under user
    control
    • Facilities are named abstract application
      requirements
    • Components are concrete named engines that
      satisfy the requirements
  • Power of an API
    • the application author provides
      • a specification of the application facilities as
        part of the application definition
      • a component to be used as the default
    • the application user can construct scripts that
      create alternative components that comply with
      the facility interface
    • the end user can
      • configure the properties of the component
      • select which component is to be bound to a given
        facility
  • Abstraction is required for dynamic and
    distributed applications
16
Visual Programming Interface
  • Workflow graphs are a naturally dynamic interface
    due to the correspondence between logical and
    physical descriptions of the computation.
  • There are multiple views of each computation
    • data flow
    • control flow
    • deployment of distributed components
  • Should allow interactive editing of component
    state
    • access to modify component properties
    • dynamic interface generation from
      component-supplied specifications
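
Presenter's note: dynamic interface generation from a component-supplied specification can be illustrated with a command-line interface standing in for a graphical one. The specification format, the property names, and the meV units below are assumptions made for the sketch; only the standard-library `argparse` module is used.

```python
import argparse

# Hypothetical specification a component might supply about its properties.
REBIN_SPEC = {
    "name": "rebin",
    "properties": [
        {"name": "bins", "type": int, "default": 100,
         "help": "number of energy bins"},
        {"name": "emin", "type": float, "default": -50.0,
         "help": "lower energy bound (meV)"},
        {"name": "emax", "type": float, "default": 50.0,
         "help": "upper energy bound (meV)"},
    ],
}


def build_interface(spec):
    """Generate an argument parser from the component's own description."""
    parser = argparse.ArgumentParser(prog=spec["name"])
    for prop in spec["properties"]:
        parser.add_argument(
            f"--{prop['name']}",
            type=prop["type"],
            default=prop["default"],
            help=prop["help"],
        )
    return parser


parser = build_interface(REBIN_SPEC)

# Simulated user input: override two properties, keep one default.
settings = vars(parser.parse_args(["--bins", "50", "--emax", "80"]))
```

A graphical front end could walk the same specification to create one widget per property, which is why the interface can stay independent of the component it edits.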

17
Distributed/Parallel Computing
  • Enabled by design
    • component framework utilizing data streams
    • requirements for building distributed and
      parallel computations are nearly the same as
      those for building applications in a visual
      programming interface
  • Pyre was originally designed to compose and
    control parallel applications
    • bindings to MPI
    • encapsulation of the Python interpreter in MPI
  • Enable distributed computing with currently
    available technologies
    • initial authentication and deployment based on
      ssh and scp
    • authentication and security using Pyre services
    • access constrained to user space
  • Take advantage of Grid services as they become
    available

18
Broad Scientific Scope
  • data reduction and experiment simulation
    • diffraction, engineering diffraction, and
      inelastic scattering data reduction
    • SANS/USANS and neutron reflectometry data
      reduction
    • instrument and microstructure simulation
  • modeling
    • full profile modeling in real and reciprocal
      space (GSAS, FullProf, PDFFIT)
    • finite element modeling (ABAQUS); self-consistent
      modeling
    • constrained fitting by use of data from other
      experimental techniques
    • 1D/2D model fitting; model-independent peak
      fitting
    • direct modeling of physical systems; ab-initio
      modeling
    • scattering kernel; multiple scattering
    • neutron weight correction; separation of nuclear
      and spin scattering
    • micromagnetic simulations (OOMMF); disordered
      spin dynamics
    • chemical spectroscopy dynamics (CLIMAX)

19
Facilitates New and Better Science
  • better data analysis
    • FEM calculations of strains in microstructures
    • Monte-Carlo inversions of S(Q,E) to obtain
      parameters of structure and dynamics models
    • model refinements with multiple data sets
  • integration of theory
    • micromechanics using correlations of local
      strains
    • phase diagrams from thermodynamic functions
    • ab-initio calculations of spin interactions
    • soft matter structure using atomic force fields
      guided by diffraction
  • experiment planning and execution
    • single crystals on chopper spectrometers
    • feedback control and real-time assessment
    • plausibility testing and contingency planning
    • assessment of science/data trends from previous
      data

20
Goals for DANSE
  • enable the non-expert, while not hindering the
    expert
  • enable distributed and parallel computing as a
    framework service
  • create a community-supported open-source neutron
    scattering software framework
  • lower the barrier to software development
  • provide powerful applications for analysis,
    modeling, and simulation

21
DANSE Project Information
  • milestones for the DANSE software
    • project start: 2006
    • beta release: 2008
    • release 1.0: 2009
    • transition to the SNS: 2010
  • documentation, tutorials, and further information
    • the DANSE wiki at http://wiki.cacr.caltech.edu/danse
    • the Pyre homepage at http://www.cacr.caltech.edu/projects/pyre
  • contacts
    • Brent Fultz (btf_at_caltech.edu), Michael Aivazis
      (aivazis_at_caltech.edu)
    • Simon Billinge, Ersan Üstündag, Paul Butler, Paul
      Kienzle, Ian Anderson
    • Michael McKerns (mmckerns_at_caltech.edu)