Domain Specific Software Architectures for Science: Lecture for Software Architectures, USC 578




Author: Jet Propulsion Laboratory. Last modified by: crichton. Created: 4/10/2010 3:36:28 PM. Presentation format: On-screen Show (4:3).


Domain Specific Software Architectures for
Science Lecture for Software Architectures USC 578
  • Dan Crichton
  • April 2010

  • Introduction: who am I?
  • Architecture: what it means to me
  • Challenges in Developing Architectures
  • Reference Architecture vs Domain Specific
    Software Architectures
  • Experience in Science
  • Lessons Learned
  • Q&A

Who am I?
  • Employed by Jet Propulsion Laboratory since 1995;
    prior software engineering positions at Hughes
    Aircraft Company and in private industry
  • MS in Computer Science, USC 20 years of
  • Program Manager and Principal Computer Scientist
  • Planetary Data System Engineering in Solar System
    Exploration Directorate
  • Data Systems and Technology in Earth and
    Technology Directorate
  • Principal Investigator for
  • Informatics Center, Early Detection Research
    Network, National Cancer Institute
  • Facilitating Integration of NASA and Earth System
    Grid, NASA
  • Object Oriented Data Technology
  • Several co-Investigator Tasks

Architecture: why do I care?
  • Architecture is a game changer in our business
  • Enable scientific discovery, novel engineering,
  • Coordination across multiple enterprises
  • Data system cost per mission, project,
    investigation, etc. is high
  • Technology infusion is limited
  • Experience and knowledge reuse

But, there are challenges
  • Lack of true architects
  • Most think of point solutions or confuse
    architecture and implementation
  • Abstracting is difficult
  • Governance is often at a project level, with
    little view at an enterprise level
  • Limited planning and understanding of the
    reference requirements

Architects: what are they?
  • Effective Architects have
  • Years of experience
  • Holistic view of domain
  • Look at both aesthetics and practical details
  • Variable technical depth
  • Lifecycle roles
  • Strong involvement up-front
  • May oversee development
  • Chooses stable steps in development
  • Effective Architects are not
  • Lone inventors or scientists
  • The architect is a good communicator and
    politician -- architectures must be sold and
    explained and their integrity maintained
  • Architecting is not a science, but depends on
  • Purely technologists
  • Architecture is a strategy
  • Top level only designers
  • Details are often critical
  • Collaborators
  • A coherent vision is critical; they drive it

Architecture: what is it?
  • The fundamental organization of a system embodied
    in its components, their relationships to each
    other, and to the environment, and the principles
    guiding its design and evolution. (ANSI/IEEE
    Std. 1471-2000)

Communicating an architecture
  • A good architecture is one that can be
    communicated to the stakeholders
  • A good architecture presents viewpoints of the
    system that address stakeholder concerns
  • A good architecture uses models and descriptions
    that are relevant to the stakeholders
  • Different models may be used to present different
    viewpoints (e.g., A UML model of the system may
    be appropriate for some but not all stakeholders)

Viewpoints and views
  • A viewpoint is a template for constructing a view
  • Enterprise, Functional, Informational, etc
  • A view is a description of the entire system from
    the perspective of a set of related concerns. A
    view is composed of one or more models.
  • A model is an abstraction or representation of
    some aspect of a thing
  • Examples: RM-ODP, FEAF, TOGAF, etc.
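
The relationship between viewpoints, views, and models can be sketched as a small data structure. This is only an illustration of the definitions above, not part of any framework named on the slide; the class names, the `views_for` helper, and the example concerns are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Viewpoint:
    """Template for constructing a view: names the concerns it addresses."""
    name: str
    concerns: tuple


@dataclass(frozen=True)
class Model:
    """An abstraction or representation of some aspect of the system."""
    name: str
    notation: str


@dataclass
class View:
    """A description of the whole system from one viewpoint."""
    viewpoint: Viewpoint
    models: list = field(default_factory=list)

    def is_valid(self) -> bool:
        # A view is composed of one or more models.
        return len(self.models) >= 1


def views_for(concern: str, views: list) -> list:
    """Return the views whose viewpoint addresses a stakeholder concern."""
    return [v for v in views if concern in v.viewpoint.concerns]


# Hypothetical viewpoints and concerns, chosen only for illustration.
informational = Viewpoint("Informational", ("data semantics", "metadata"))
functional = Viewpoint("Functional", ("services", "interfaces"))

info_view = View(informational, [Model("Domain information model", "UML class diagram")])
func_view = View(functional, [Model("Component diagram", "boxes and arrows")])

matching = views_for("metadata", [info_view, func_view])
print([v.viewpoint.name for v in matching])
```

Selecting views by concern mirrors the point that different stakeholders need different models of the same system.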

(Project Managers, Engineers, Scientists, Business Analysts, ...)

Reference Architectures
  • Show components, functions, and interfaces at a
    high level of abstraction
  • Likewise, we consider information models to also
    be part of a reference architecture (at a
    sufficiently abstract level)
  • In observing systems, the information model
    patterns are highly compatible as a reference
    information model
  • Implementation neutral architectural frameworks
    can be useful in defining a structure for a
    reference architecture
  • We use Reference Architectures to give us a
    strategic advantage as well as improve enterprise
    scale software

Domain Specific Software Architectures
  • Domain model
  • Leverage experts who have the holistic view and
    can drive the need for product lines
  • An unambiguous view is critical (in fact, this
    has been a problem in science arenas)
  • Reference requirements
  • Drives the reference architecture
  • However, it is critical to map domain models to
    reference requirements in order to understand the
    solution space
  • Reference architecture
  • Satisfies an abstracted set of functions from the
    reference requirements
  • It's engineered for the ilities: reusability,
    extensibility and configurability
  • It demonstrates the separation of functional
    elements of the architecture

Tracz, Will. Domain-Specific Software Architecture. ACM SIGSOFT, 1995.

RAs vs DSSAs in Science
  • In science data systems, construction of multiple
    architecture viewpoints of a system is critical
  • Process/Enterprise
  • Information/Data
  • Technology
  • We find the viewpoints are similar, but models
    can be domain specific
  • This is the opportunity to develop a reusable
    reference architecture if the patterns can be

Scientific data systems
  • Covers a wide variety of disciplines
  • Solar system exploration
  • Astrophysics
  • Earth science
  • Biomedicine
  • etc
  • Each has its own communities, standards and
  • But, there is an underlying reference
    architecture and discipline software
    architectures in each!

The e-science trend
  • Highly distributed, multi-organizational systems
  • Systems are moving towards loosely coupled
    systems or federations in order to solve science
    problems which span center and institutional
  • Sharing of data and services which allow for the
    discovery, access, and transformation of data
  • Systems are moving towards publishing of services
    and data in order to address data and
    computationally-intensive problems
  • Infrastructures which are being built to handle
    future demand
  • Address complex modeling, inter-disciplinary
    science and decision support needs
  • Need a dynamic environment where data and
    services can be used quickly as the building
    blocks for constructing predictive models and
    answering critical science questions
  • Changing the way in which data analysis is
  • Moving towards analysis of distributed data to
    increase the study power
  • Enabling greater collaboration across centers

Context: Space data systems
[Figure: end-to-end space data system context. Elements: Spacecraft and Scientific Instruments, Spacecraft / lander, Relay Satellite, Data Acquisition and Command, Mission Operations, Instrument/Sensor Operations, Science Data Processing, Science Data Archive, Data Analysis and Modeling, Science Team, External Science Community. Exchanged objects: Primitive and Simple Information Objects, Telemetry and Science Information Packages, Planning and Instrument Planning Information Objects, Science Products.]
  • Common Meta Models for Describing Space
    Information Objects
  • Common Data Dictionary end-to-end

Earth Science Data Systems
[Figure: SMAP, DESDynI]

Infrastructure to support Analysis of Distributed
Cancer research

Patterns in scientific data systems
  • Instrument and Spacecraft Commands
  • Instruments that capture observations
  • Generation of Engineering and Science Data
  • Data Processing
  • Data Management
  • Data Distribution
  • Distributed Facilities
  • Data Movement

Finding the reference architecture
  • Simple SOA-style pattern
  • Data/Information Architecture
  • Components, middleware, and communication
  • NOTE: Process is implicit here

Ilities in science data systems
  • Usability
  • Diversity within the domain
  • Scalability
  • Reliability
  • Portability
  • NOTE: Our reference architecture must address
    these ilities long term

Specialization within domains
  • Domain information models
  • Planetary Science Ontology
  • Cancer Biomarker Ontology
  • Etc
  • Specific services and domain implementations are
    derived from the reference architecture
  • Reference Architecture -> Domain Specific Software
    Architecture -> Domain Implementations
  • In these science domains, the architectures need
    to be long-lived (20 years)

Derived Planetary Data System Architecture

Software product lines
  • This is about strategy more than technology
  • Goal is a software product line that:
  • Implements our reference architecture
  • Allows for construction of core software
    components that can be reused across projects and
    science disciplines
  • Can demonstrate sufficient cost and schedule
    benefits without sacrificing flexibility in
    meeting requirements and adapting to technology
  • Extensions can be applied at the discipline level

Object Oriented Data Technology
  • Represents both a reference architecture AND a
    software product line for science data systems
  • Exploits common patterns
  • Delivers reusable software components as building
    blocks for construction of higher order data
  • Applied to multiple science disciplines
  • Funded originally in 1998; runner-up for
    NASA Software of the Year in 2003
  • Heavily used by NASA and NIH projects

Architectural principles
  • Separate the technology and the information
  • Encapsulate the messaging layer to support
    different messaging implementations
  • Encapsulate individual data systems to hide
  • Provide data system location independence
  • Require that communication between distributed
    systems use metadata
  • Define a model for describing systems and their
  • Provide scalability in linking both number of
    nodes and size of data sets
  • Allow systems using different data dictionaries
    and metadata implementations to be integrated
  • Leverage existing software, where possible (e.g.,
    open source, etc)
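
Two of these principles, encapsulating the messaging layer and providing location independence, can be sketched as follows. This is an illustrative sketch only, not OODT's actual API: `Transport`, `InProcessTransport`, `DataSystemClient`, the endpoint string, and the query fields are all assumptions.

```python
from typing import Callable, Dict, Protocol


class Transport(Protocol):
    """All that callers may assume about any messaging implementation."""

    def send(self, endpoint: str, message: dict) -> dict: ...


class InProcessTransport:
    """Stand-in transport that dispatches to local handlers. A real one
    could wrap HTTP, RMI, or CORBA behind the same send() signature."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], dict]] = {}

    def register(self, endpoint: str, handler: Callable[[dict], dict]) -> None:
        self._handlers[endpoint] = handler

    def send(self, endpoint: str, message: dict) -> dict:
        return self._handlers[endpoint](message)


class DataSystemClient:
    """Callers name a data system; they never see where it lives."""

    def __init__(self, transport: Transport, name_server: Dict[str, str]) -> None:
        self._transport = transport
        self._names = name_server

    def query(self, system: str, metadata: dict) -> dict:
        endpoint = self._names[system]  # logical name -> current location
        # Communication between systems is expressed as metadata.
        return self._transport.send(endpoint, {"query": metadata})


def imaging_handler(message: dict) -> dict:
    """Hypothetical remote data system answering a metadata query."""
    hit = message["query"].get("target") == "Mars"
    return {"matches": ["image-001"] if hit else []}


transport = InProcessTransport()
transport.register("node-1/products", imaging_handler)
client = DataSystemClient(transport, {"imaging-archive": "node-1/products"})
result = client.query("imaging-archive", {"target": "Mars"})
print(result)
```

Swapping the transport or moving the archive to another node changes only the objects passed to `DataSystemClient`, not the calling code.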

Crichton, D., Hughes, J. S., Hyon, J., Kelly, S. Science Search and Retrieval using XML. Proceedings of the 2nd National Conference on Scientific and Technical Data, National Academy of Science, Washington DC, 2000.

Architectural focus
  • Consistent distributed capabilities
  • Resource discovery (data, metadata, services,
    etc), grid-ing loosely coupled science system,
    workflow management
  • On-demand, shared services (e.g., processing,
    translation, etc.)
  • Processing
  • Translation
  • Deploy high throughput data movement mechanisms
  • End-to-end capabilities across the science
  • Reduce local software solutions that do not scale
  • Increasing importance in developing an
    enterprise approach with common services
  • Build value-added services and capabilities on
    top of the infrastructure

Exploiting common patterns
  • How data is managed (registry/repository,
    information objects themselves)
  • How data is generated, captured, etc (e.g.,
    workflow and data processing)
  • How data is accessed (metadata, data)
  • How information is discovered
  • How data is distributed (e.g., transformed)
  • How data is visualized

What does OODT do?
  • Tie together loosely coupled distributed
    heterogeneous data systems into a virtual data
  • Support critical functions
  • Data Production and workflow
  • Data Distribution
  • Data Discovery (including query optimization
    across highly distributed systems)
  • Data Access
  • An architectural approach first, an
    implementation second
  • Adapt to different distributed computing
  • Promotes a REST-style architectural pattern for
    search and retrieval
  • Scalability in linking together large,
    distributed data sets

OODT data architecture focus
  • On the types of, and relationships among, a
    software system's data
  • Decomposition of data within a software system to
    its logical components and interactions
  • Components: Data Elements, Data Dictionary, Data
    Models of individual data sources
  • Interactions: Mappings between Data Dictionary
    and Data Models, Data Element structural
  • Some standards currently exist for data
  • ISO/IEC 11179: Specification and Standardization
    of Data Elements
  • Dublin Core Metadata Initiative: Dublin Core
    Data Elements to describe any electronic resource
  • Specifications for the Data Architecture
  • Common XML schema for managing information about
    data resources
  • Common XML schema for messaging between
    distributed services
  • Methods for integrating existing domain models
    within architecture
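
The mapping between a common data dictionary and the data models of individual sources can be sketched as a simple translation step. The fifteen element names below are the Dublin Core Metadata Element Set; treating them as the common dictionary, along with the source names and local field names, is an assumption made only for this illustration.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

# Hypothetical mappings from common elements to each source's own fields.
LOCAL_MAPPINGS = {
    "archive_a": {"title": "PRODUCT_NAME", "creator": "PI_NAME", "date": "START_TIME"},
    "archive_b": {"title": "name", "creator": "investigator", "date": "collected_on"},
}


def translate_query(query: dict, source: str) -> dict:
    """Rewrite a query in common terms into one source's local field names."""
    mapping = LOCAL_MAPPINGS[source]
    translated = {}
    for element, value in query.items():
        if element not in DUBLIN_CORE_ELEMENTS:
            raise KeyError(f"{element!r} is not in the common data dictionary")
        if element in mapping:  # skip elements this source does not carry
            translated[mapping[element]] = value
    return translated


query = {"creator": "Smith", "date": "2010-04-10", "language": "en"}
for source in sorted(LOCAL_MAPPINGS):
    print(source, translate_query(query, source))
```

Keeping the mappings as data rather than code is one way existing domain models could be integrated without changing the services that use them.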

OODT data architecture models
[Figure: Request/Response Model and Resource Metadata Model, annotated "Based on Dublin Core" and "Based on ISO/IEC 11179"]

OODT software components
  • Profile Service: A server-based registry that is
    able to either serve local XML profiles or
    plug into an existing catalog. This component
    provides resource discovery.
  • Product Service: A server component that plugs
    into existing repositories and serves products.
    This includes translation services, etc.
  • Catalog and Archive Service: Transaction-based
    server that catalogs and archives products,
    providing profile and product servers for
    discovery and distribution
  • Query Service: Provides query management across
    distributed services to enable discovery.
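
How these components might cooperate can be sketched as a federated lookup: the query service asks each profile server which resources match, then retrieves the matches from the product server on the same node. The classes and methods here are hypothetical stand-ins written for this sketch and do not reproduce the real OODT interfaces.

```python
class ProfileServer:
    """Registry of resource profiles held at one node (discovery only)."""

    def __init__(self, node: str, profiles: list) -> None:
        self.node = node
        self._profiles = profiles

    def discover(self, keyword: str) -> list:
        return [(self.node, p["id"]) for p in self._profiles
                if keyword in p["keywords"]]


class ProductServer:
    """Serves products out of an existing repository at one node."""

    def __init__(self, products: dict) -> None:
        self._products = products

    def retrieve(self, product_id: str) -> bytes:
        return self._products[product_id]


class QueryService:
    """Fans a query out across nodes, then fetches the matching products."""

    def __init__(self, profile_servers: list, product_servers: dict) -> None:
        self._profile_servers = profile_servers
        self._product_servers = product_servers

    def search(self, keyword: str) -> list:
        hits = []
        for server in self._profile_servers:
            hits.extend(server.discover(keyword))
        return hits

    def fetch(self, keyword: str) -> dict:
        return {pid: self._product_servers[node].retrieve(pid)
                for node, pid in self.search(keyword)}


# Two hypothetical nodes with made-up profiles and products.
node1 = ProfileServer("node1", [{"id": "img-1", "keywords": {"mars", "image"}}])
node2 = ProfileServer("node2", [{"id": "spec-7", "keywords": {"mars", "spectra"}},
                                {"id": "img-9", "keywords": {"venus", "image"}}])
products = {
    "node1": ProductServer({"img-1": b"image bytes"}),
    "node2": ProductServer({"spec-7": b"spectrum bytes", "img-9": b"image bytes"}),
}
service = QueryService([node1, node2], products)
print(service.search("mars"))
print(sorted(service.fetch("image")))
```

Discovery and retrieval stay separate, so a node can expose profiles for data it serves through an existing repository.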

Distributed architecture
1. Science data tools and applications use APIs to connect to a virtual data repository
2. Middleware creates the data grid infrastructure connecting distributed heterogeneous systems and data
3. Repositories for storing and retrieving many types of data
[Figure: Visualization Tools, Web Search Tools, Analysis Tools; OODT Reusable Data Grid Framework; Mission, Biomedical, and Engineering Data Repositories]

Technology architecture
[Figure labels: Service Registry, Name Server, Registry Server, Web I/F, Desktop I/F, Query Integration, Node Profile Servers, Product Catalogs, Repository Product Server, Repository/Archive Server, Science Products; messages: XML Request, Information Object]
  • Common Meta Models for Describing Space
    Information Objects
  • Common Data Dictionary end-to-end

OODT software implementation
  • OODT is Open Source
  • Developed using open source software (i.e.
    Java/J2EE and XML)
  • Implemented reusable, extensible Java-based
    software components
  • Core software for building and connecting data
    management systems
  • Provided messaging as a plug-in component that
    can be replaced independent of the other core
    components. Messaging components include:
  • CORBA, Java RMI, JXTA, Web Services, etc
  • REST seems to have prevailed
  • Provided client APIs in Java, C, HTTP, Python,
  • Simple installation on a variety of platforms
    (Windows, Unix, Mac OS X, etc)
  • Used international data architecture standards
  • ISO/IEC 11179 Specification and Standardization
    of Data Elements
  • Dublin Core Metadata Initiative
  • W3C's Resource Description Framework (RDF) from
    Semantic Web Community

EDRN Knowledge Environment
  • EDRN has been a pioneer in the use of informatics
    technologies to support biomarker research
  • EDRN has developed a comprehensive
    infrastructure to support biomarker data
    management across EDRN's distributed cancer
  • Twelve institutions are sharing data
  • Same architectural framework as planetary science
  • It supports capture and access to a diverse set
    of information and results
  • Biomarkers
  • Proteomics
  • Biospecimens
  • Various technologies and data products (image,
    micro-satellite, etc.)
  • Study Management

Deployed EDRN System

Application to planetary science
  • Often unique, one-of-a-kind missions
  • Can drive technological changes
  • Instruments are competed and developed by
    academic, industry and industrial partners
  • Highly distributed acquisition and processing
    across partner organizations
  • Highly diverse data sets given heterogeneity of
    the instruments and the targets (i.e. solar
  • Missions are required to share science data
    results with the research community requiring
  • Common domain information model used to drive
    system implementations
  • Expert scientific help to the user community on
    using the data
  • Peer-review of data results to ensure quality
  • Distribution of data to the community
  • Planetary science data from NASA (and some
    international) missions is deposited into the
    Planetary Data System

Earth Science Data Systems
[Figure labels: Airborne Instruments, Surface Instruments, Data Acquisition/Ingestion, Data Production/Processing, Data Integration, Distributed Data Analysis, Special Product Processing Environment / Computational Infra, Modeling and Visualization Facility, Local Storage (Models, Data, etc), Multi-mission Policies Rules, Web Portal, Other Data Systems, (Testbed and Operational Deployed Environments)]

Application to Climate Research
  • Highly distributed modeling and observational
  • Heterogeneous implementations
  • Different purposes
  • But, brought together as a virtual system,
    provides new science discovery opportunities

NASA Earth System Grid

Lessons Learned
  • A reference architecture is critical for driving
    a strategy and support large-scale/enterprise
  • However, limited experience in organizations to
    build reference architectures
  • Useful ways to represent the architecture can be
  • How detailed to make the reference architecture
    is an art! (Don't let the implementation drive
    the RA)
  • Product lines are useful for providing reusable
    components based on the reference architecture

More Lessons Learned.
  • Distributed service architectures
  • Not anything new (my experience with them goes
    back to the early 1990s)
  • But, often, newer technologies and approaches are
    seen as a panacea
  • Technology is not a replacement for a conceptual
  • My experience is that definition of the
    architecture independent of technology is
  • The goal should be stability in the architecture
    model; the selection of appropriate technology
    will change over time
  • This is why an architect is much more of a
    strategist than a technologist

Final Thoughts
  • Software architecture in science is critical to:
  • Reducing cost of building science data systems
  • Building virtual organizations
  • Constructing software product lines
  • Driving standards
  • Supporting new paradigms in mission operations
    and scientific research
  • Science is still learning how to best leverage
    technology in a collaborative discovery
    environment, but significant progress is being
    made

  • (1) Tracz, Will. Domain-Specific Software
    Architecture. ACM SIGSOFT, 1995.
  • (2) D. Crichton, S. Kelly, C. Mattmann, Q. Xiao,
    J. S. Hughes, J. Oh, M. Thornquist, D. Johnsey,
    S. Srivastava, L. Esserman, and B. Bigbee. A
    Distributed Information Services Architecture to
    Support Biomarker Discovery in Early Detection of
    Cancer. In Proceedings of the 2nd IEEE
    International Conference on e-Science and Grid
    Computing, pp. 44, Amsterdam, the Netherlands,
    December 4th-6th, 2006.
  • (3) C. Mattmann, D. Crichton, N. Medvidovic and
    S. Hughes. A Software Architecture-Based
    Framework for Highly Distributed and Data
    Intensive Scientific Applications. In Proceedings
    of the 28th International Conference on Software
    Engineering (ICSE '06), pp. 721-730, Shanghai,
    China, May 20th-28th, 2006.

EDRN's Ontology Model
  • EDRN has developed a high-level ontology model
    for biomarker research which provides standards
    for the capture of biomarker information across
    the enterprise
  • Specific models are derived from this high-level
    model
  • Model of biospecimens
  • Model for each class of science data
  • EDRN is specifically focusing on a granular
    model for annotating biomarkers, studies and
    scientific results
  • EDRN has a set of EDRN Common Data Elements
    which is used to provide standard data elements
    and values for the capture and exchange of data

EDRN Biomarker Ontology Model