
Knowledge-based Information/Data-Driven Science
and The Cyberinfrastructure Opportunities for
Collaboration at Rutgers
  • Manish Parashar
  • The Applied Software Systems Laboratory
  • ECE/CAIP, Rutgers University
  • http//

Outline of this presentation
  • Cyberinfrastructure - Unprecedented Opportunities
  • seamless access, aggregation, interactions
  • Cyberinfrastructure - Unprecedented Complexity
  • scale, heterogeneity, dynamism, uncertainty
  • Knowledge-based, data-driven science and the
  • research challenges and opportunities
  • Enabling opportunities at Rutgers University
  • seeding, supporting multidisciplinary
    computational science
  • Concluding remarks

Cyberinfrastructure - Unprecedented Opportunities
  • Cyberinfrastructure integrates hardware for
    computing, data and networks, digitally-enabled
    sensors, observatories and experimental
    facilities, and an interoperable suite of
    software and middleware services and tools
  • - NSF's Cyberinfrastructure Vision for 21st
    Century Discovery
  • A global phenomenon
  • Strategic Roadmap for the National Collaborative
    Research Infrastructure Strategy
  • Australian Ministry for Education, Science and
  • Grids and Basic Research Programs
  • Organization for Economic Co-operation and
    Development (OECD), Paris, France.
  • A new paradigm for planetary-scale investigations
  • seamless access
  • resources, services, data, information,
  • seamless aggregation
  • seamless (opportunistic) interactions/couplings

Emerging Computational Science Engineering
  • If you want to understand life, don't think
    about vibrant, throbbing gels and oozes, think
    about information technology
  • - Richard Dawkins, The Blind Watchmaker, 1986
  • Science and engineering research and education
    are foundational drivers of cyberinfrastructure.
  • - NSF's Cyberinfrastructure Vision for 21st
    Century Discovery
  • Information/systems technology and high
    performance applications have become critical
    research modalities in science and engineering
  • Conceptual and numerical models
  • Computational and computer science
  • Information and infrastructure
  • Knowledge-based, data-driven scientific/engineering
    investigation can provide dramatic insights
  • symbiotically and opportunistically combine
    computations, experiments, observations, and
    real-time data to model, manage, control, adapt,

Knowledge-based, Information/Data-driven
Scientific Investigation:
Examples of Application Areas
  • Hazard prevention, mitigation and response
  • Earthquakes, hurricanes, tornados, wild fires,
    floods, landslides, tsunamis, terrorist attacks
  • Critical infrastructure systems
  • Condition monitoring and prediction of future
  • Transportation of humans and goods
  • Safe, speedy, and cost effective transportation
    networks and vehicles (air, ground, space)
  • Energy and environment
  • Safe and efficient power grids, safe and
    efficient operation of regional collections of
  • Health
  • Reliable and cost effective health care systems
    with improved outcomes
  • Enterprise-wide decision making
  • Coordination of dynamic distributed decisions for
    supply chains under uncertainty
  • Next generation communication systems
  • Reliable wireless networks for homes and
  • Report of the Workshop on Dynamic Data Driven
    Applications Systems, F. Darema et al., March

Source: M. Rotea, NSF
The Cyber Infrastructure Unprecedented
  • Unprecedented complexity, challenges
  • Very large scales
  • Ad hoc (amorphous) structures/behaviors of
    virtual organizations
  • p2p/hierarchical architecture
  • Dynamic
  • entities join, leave, move, change behavior
  • Heterogeneous
  • capability, connectivity, reliability,
    guarantees, QoS
  • Unreliable, lack of guarantees
  • components, communication
  • Lack of common/complete knowledge
  • number, type, location, availability,
    connectivity, protocols, semantics, etc.

Grid Computing: An Evolving Vision, M. Parashar
and C. Lee, Proceedings of the IEEE, Special
Issue on Grid Computing, IEEE Press, Vol. 93, No.
3, March 2005.
Computational Modeling of Natural Phenomena
  • Realistic, physically accurate computational
  • Enormous computation requirements
  • turbulent flow simulations using active flow
    control for biomedical engineering require
    5000 x 1000 x 500 = 2.5 x 10^9 grid points and
    approximately 10^7 time steps, i.e. with 1 GFlop
    processors, a runtime of 7 x 10^6 CPU
    hours, or about one month on 10,000 CPUs (with
    perfect speedup). Also, with 700 B/pt the memory
    requirement is 1.75 TB of run time memory and
    800 TB of storage.
  • simulation of the core-collapse of supernovae in
    3D with reasonable resolution (500^3) would
    require 20 teraflops for 1.5 months and about
    200 terabytes of storage
  • Dynamic and complex couplings
  • multi-physics, multi-model, multi-resolution, ...
  • Dynamic and complex (ad hoc, opportunistic)
  • application <-> application, application <->
    resource, application <-> data, application <-> user,
  • Software/systems engineering issue
  • volume and complexity of code, community of
  • scores of models, hundreds of components,
    millions of lines of code,
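The turbulent-flow estimate above can be checked with a few lines of arithmetic. This sketch only re-derives the quoted figures from the numbers on the slide. The variable names are illustrative, and the flops-per-point value is inferred from those numbers rather than stated in the source.

```python
# Re-derive the turbulent-flow resource estimate quoted on the slide.
nx, ny, nz = 5000, 1000, 500
grid_points = nx * ny * nz                 # 2.5e9 grid points
time_steps = 10**7
bytes_per_point = 700

memory_tb = grid_points * bytes_per_point / 1e12   # decimal terabytes

cpu_hours = 7e6                            # quoted total at 1 GFlop per processor
cpus = 10_000
wall_days = cpu_hours / cpus / 24          # assumes perfect speedup

# Inferred, not stated on the slide: work per grid point per time step
# implied by 7e6 CPU hours at 1e9 floating-point operations per second.
total_flops = cpu_hours * 3600 * 1e9
flops_per_point_step = total_flops / (grid_points * time_steps)

print(f"{grid_points:.2e} points, {memory_tb:.2f} TB, "
      f"{wall_days:.1f} days, ~{flops_per_point_step:.0f} flops/point/step")
```

The quoted figures are mutually consistent: 1.75 TB of memory, a little over 29 days of wall-clock time on 10,000 CPUs, and roughly a thousand floating-point operations per grid point per step.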

Knowledge-based, Information/Data-driven
Investigation: Basic Building Blocks
  • A hierarchy of heterogeneous simulation models
  • A system to gather data from archival and dynamic
  • Algorithms to analyze/predict system behavior by
    blending simulation models and data
  • Algorithms to steer and control the data
    gathering and model validation processes
  • The software infrastructure supporting model
    execution, data gathering, analysis/prediction
    and control algorithms
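How these building blocks fit together can be shown with a toy loop. The sketch below is a minimal illustration, not the architecture from the talk: the decay model, the fixed blending gain, the mismatch threshold, and all function names are assumptions chosen to keep the example short.

```python
import random

def simulate(state, rate, dt):
    """Simulation model: one explicit Euler step of dx/dt = -rate * x."""
    return state * (1.0 - rate * dt)

def gather(true_state, noise, rng):
    """Data gathering: a noisy measurement of the real system."""
    return true_state + rng.gauss(0.0, noise)

def blend(predicted, measured, gain):
    """Analysis: pull the model prediction toward the measurement."""
    return predicted + gain * (measured - predicted)

def steer(mismatch, threshold):
    """Control: measure every step while model and data disagree,
    otherwise only every fifth step."""
    return 1 if abs(mismatch) > threshold else 5

def run(steps=50, seed=0):
    rng = random.Random(seed)
    truth, estimate = 10.0, 8.0          # the model starts from a wrong state
    interval, measurements = 1, 0
    for step in range(steps):
        truth = simulate(truth, 0.05, 1.0)         # the "real" system
        estimate = simulate(estimate, 0.05, 1.0)   # the model
        if step % interval == 0:
            measured = gather(truth, 0.05, rng)
            mismatch = measured - estimate
            estimate = blend(estimate, measured, 0.5)
            interval = steer(mismatch, 0.2)
            measurements += 1
    return truth, estimate, measurements

truth, estimate, measurements = run()
print(f"truth={truth:.3f} estimate={estimate:.3f} measurements={measurements}")
```

The model is corrected by data, and the data gathering is in turn throttled by how well the model already agrees with the measurements.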

Source: M. Rotea, NSF
Enabling Computational Science on CI: A
Convergence of Biology and Information Technology
  • Computing has evolved and matured to provide
    specialized solutions to satisfy relatively
    narrow and well defined requirements in isolation
  • performance, security, dependability,
    reliability, availability, throughput,
    pervasive/amorphous, automation, reasoning, etc.
  • current programming paradigms, middleware/systems
    software, management tools, are inadequate to
    handle the scale, complexity, dynamism and
    heterogeneity of emerging systems
  • In case of emerging applications/environments,
    requirements, objectives, execution contexts are
    dynamic and not known a priori
  • requirements, objectives and choice of specific
    solutions (algorithms, behaviors, interactions,
    etc.) depend on runtime state, context, and
  • applications should be aware of changing
    requirements and execution contexts and
    respond to these changes at runtime
  • Convergence of Biology and Information Technology:
    Autonomic Computing
  • nature has evolved to cope with scale,
    complexity, heterogeneity, dynamism and
    unpredictability, lack of guarantees
  • self configuring, self adapting, self optimizing,
    self healing, self protecting, highly
    decentralized, heterogeneous architectures that
    work !!!
  • The goal of autonomic computing is to build
    systems/applications that self-manage using
    high-level guidance from humans
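A self-managing element driven by high-level human guidance might look like the sketch below. It is an illustrative toy, not code from AutoMate: the Policy and AutonomicElement classes, the utilization band, and the one-worker-at-a-time scaling rule are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Policy:
    """High-level guidance from a human: keep per-worker utilization in a band."""
    low: float = 0.3
    high: float = 0.8

class AutonomicElement:
    """Hypothetical self-optimizing element that resizes its own worker pool."""

    def __init__(self, policy, workers=2):
        self.policy = policy
        self.workers = workers

    def monitor(self, load):
        """Observe the current utilization per worker."""
        return load / self.workers

    def plan(self, utilization):
        """Decide on an adaptation that moves utilization back into the band."""
        if utilization > self.policy.high:
            return +1
        if utilization < self.policy.low and self.workers > 1:
            return -1
        return 0

    def step(self, load):
        """One monitor, plan, execute cycle."""
        self.workers += self.plan(self.monitor(load))
        return self.workers

element = AutonomicElement(Policy())
history = [element.step(load) for load in [1.0, 2.0, 4.0, 4.0, 4.0, 1.0, 1.0]]
print(history)
```

The human supplies only the utilization band. The element decides when and how to reconfigure itself as the load changes.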

Project AutoMate Enabling Autonomic
  • Conceptual models and implementation
    architectures for Autonomic Computing
  • programming systems based on popular programming
  • object, component and service based prototypes
  • content-based coordination and messaging
  • amorphous and emergent overlays

AutoMate: Enabling Autonomic Grid
Applications, M. Parashar et al., Cluster
Computing: The Journal of Networks, Software
Tools, and Applications, Special Issue on
Autonomic Computing, Kluwer Academic Publishers,
Vol. 9, No. 2, pp. 161-174, 2006.
Project AutoMate Components
  • Accord: A Programming System for Autonomic Grid
  • Rudder/Comet: Decentralized Coordination
  • ACE: Autonomic Composition Engine
  • Meteor: Content-based Middleware
  • Squid: Decentralized Information Discovery and
    Content-based Routing
  • SESAME: Context-Aware Access Management
  • DAIS: Cooperative Protection against Network
  • More information/Papers http//automate.rutgers.

SciDIT: Scientific Data-In-Transit
Knowledge-based Data-driven Management of
Subsurface Geosystems: The Instrumented Oil Field
Closing the loop with optimization
The Instrumented Oil Field of the Future (UT-CSM,
UT-IG, RU, OSU, UMD, ANL) (NSF ITR 01, 04)
  • Production of oil and gas can take advantage of
    permanently installed sensors that will monitor
    the reservoir's state as fluids are extracted
  • Knowledge of the reservoir's state during
    production can result in better engineering
  • economic evaluation; physical characteristics
    (bypassed oil, high-pressure zones); production
    techniques for safe operating conditions in
    complex and difficult areas

Application of Grid-Enabled Technologies for
Solving Optimization Problems in Data-Driven
Reservoir Studies, M. Parashar, H. Klie, U.
Catalyurek, T. Kurc, V. Matossian, J. Saltz and M.
Wheeler, FGCS: The International Journal of Grid
Computing: Theory, Methods and Applications,
Elsevier Science Publishers, Vol. 21,
Issue 1, pp. 19-26, 2005.
Effective Oil Reservoir Management Well
  • Why is it important
  • Better utilization of existing reservoirs
  • Discovering new reservoirs
  • Minimizing adverse effects to the environment

Less Bypassed Oil
Much Bypassed Oil
Effective Oil Reservoir Management Well
  • What needs to be done
  • Exploration of possible well placements and
    configurations for optimized production
  • Understanding field properties and interactions
    between and across subdomains
  • Tracking and understanding long term changes in
    field characteristics
  • Challenges
  • Geologic uncertainty: key engineering properties
  • Large search space: infinitely many production
    strategies possible
  • Complex physical properties and interactions;
    complex numerical models

Autonomic Well Placement/Configuration
Autonomic Oil Well Placement/Configuration
Contours of NEval(y,z,500)(10)
Pressure contours 3 wells, 2D profile
Requires NYxNZ (450) evaluations. Minimum appears
VFSA solution walk found after 20 (81)
Autonomic Oil Well Placement/Configuration (VFSA)
An Autonomic Reservoir Framework for the Stochastic
Optimization of Well Placement, V. Matossian, M.
Parashar, W. Bangerth, H. Klie, M. F. Wheeler,
Cluster Computing: The Journal of Networks,
Software Tools, and Applications, Kluwer Academic
Publishers (to appear).
Autonomic Oil Reservoir
Optimization on the Grid, V. Matossian, V. Bhat,
M. Parashar, M. Peszynska, M. Sen, P. Stoffa and
M. F. Wheeler, Concurrency and Computation:
Practice and Experience, John Wiley and Sons,
Volume 17, Issue 1, pp. 1-26, 2005.
Autonomic Oil Well Placement/Configuration (SPSA)
Solution for 7 different initial guesses
Convergence history
Optimal Well Placement
Comparison of optimization approaches
Optimal solution: F = -1.098E8
  • Lessons learned
  • Robust stochastic algorithms increase the
    chances of finding (near) optimal solutions (VFSA)
  • Several trials of a fast algorithm pay off
    against sophisticated algorithms (SPSA)
  • Need to develop hybrid strategies

Knowledge-based Data-driven Management of
Subsurface Geosystems: The Instrumented Oil Field
  • Detect and track changes in data during
    production.
  • Invert data for reservoir properties.
  • Detect and track reservoir changes.
  • Assimilate data and reservoir properties
    into the evolving reservoir model.
  • Use simulation and optimization to guide future
Data Driven
Model Driven
Management of the Ruby Gulch Waste Repository
(DoE SciDAC, with UT-CSM, INL, OU, RU)
Adaptive Fusion of Stochastic Information for
Imaging Fractured Vadose Zones (NSF SEIII, with U
of AZ, OSU, U of IA)
  • Near-Real Time Monitoring, Characterization and
    Prediction of Flow Through Fractured Rocks

[Figure: modeling loop. Labels: parameters, boundary and initial conditions; system responses; inverse modeling; forward modeling; comparison with observations; network design]
Data-Driven Forest Fire Simulation (DoE SCIDAC,
with U of AZ)
  • Predict the behavior and spread of wildfires
    (intensity, propagation speed and direction,
    modes of spread)
  • based on both dynamic and static environmental
    and vegetation conditions
  • factors include fuel characteristics and
    configurations, chemical reactions, balances
    between different modes of heat transfer,
    topography, and fire/atmosphere interactions.

Self-Optimizing of Large Scale Wild Fire
Simulations, J. Yang, H. Chen, S. Hariri and
M. Parashar, Proceedings of the 5th International
Conference on Computational Science (ICCS 2005),
Atlanta, GA, USA, Springer-Verlag, May 2005.
System for Laser Treatment of Cancer - UT Austin
Source: L. Demkowicz, UT Austin
Synthetic Environment for Continuous
Experimentation - Purdue University
Source: A. Chaturvedi, Purdue Univ.
Integrated Wireless Phone Based Emergency
Response System - Notre Dame
  • Detect abnormal patterns in mobile call activity
    and locations
  • Initiate dynamic data driven simulations to
    predict the evolution of the abnormality
  • Initiate higher resolution data collection in
    localities of interest
  • Interface with emergency response Decision
    Support Systems
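The first and third steps above (detect abnormal call activity, then raise the data collection resolution where it matters) can be sketched as follows. This is not the Notre Dame system: the z-score test, the threshold, the cell names, the call counts, and the sampling intervals are all illustrative assumptions.

```python
from statistics import mean, stdev

def abnormal_cells(history, current, threshold=3.0):
    """Flag cells whose current call count deviates strongly from their history."""
    flagged = []
    for cell, counts in history.items():
        mu, sigma = mean(counts), stdev(counts)
        if sigma == 0:
            continue                      # no variation recorded, nothing to compare
        z = (current[cell] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(cell)
    return flagged

def plan_collection(history, current, normal_s=300, fine_s=10):
    """Sample flagged cells at a finer interval (seconds) than the rest."""
    hot = set(abnormal_cells(history, current))
    return {cell: (fine_s if cell in hot else normal_s) for cell in history}

# Hypothetical per-cell call counts for past intervals and for the current one.
history = {"cell_A": [100, 104, 98, 101, 97], "cell_B": [50, 52, 49, 51, 48]}
current = {"cell_A": 102, "cell_B": 90}

plan = plan_collection(history, current)
print(plan)
```

Only the cell with the abnormal spike is switched to fine-grained collection, which keeps the extra data volume local to the area of interest.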

Source: G. Madey, ND
Adaptive Cyberinfrastructure for Threat
Management in Urban Water Distribution Systems -
NC State Univ.
Source: K. Mahinthakumar, NCSU
Structural Health Monitoring and Critical Event
Prediction - Stanford University
Source: C. Farhat, SU
Auto-Steered Information-Decision Processes for
Electrical Systems Asset Management - Iowa State
  • Develop a hardware-software prototype for
    auto-steering the information-decision cycles
    inherent to managing operations, maintenance,
    and planning of high-voltage electric power
    transmission systems.

Source: J. McCalley, Iowa State
Scientific Investigation and CI Challenges and
  • Applications
  • Algorithms
  • Measurement systems
  • Systems software
  • Community testbeds
  • Report of the Workshop on Dynamic Data Driven
    Applications Systems, F. Darema et al., March

Challenges and Opportunities: Applications (I)
  • Dynamic and continuous validation of models,
    algorithms, systems, and (emergent) system of
  • Uncertainty quantification
  • novel methods are needed to effectively quantify
    the end-to-end measures of uncertainty, as
    models, data, and algorithms are cycled in and
  • Dynamic data driven, on demand scaling and
  • application driven on demand scaling and
    configuration of measurement systems for dynamic
    multi-resolution data inputs
  • Observability, identifiability and tractability
  • establishing the pedigree of data and
    applications that contributed to the result is
    critical and challenging

Challenges and Opportunities: Applications (II)
  • Self-organization of measurement systems, models,
    applications, workflows, and organizations
  • Human in the loop decision-making and
    distribution of tasks among human and non-human
  • new theories and paradigms are needed to
    distribute intelligence and tasks among human and
    non-human actors
  • Community frameworks/tools for rapid
  • Accommodation of plurality of system demands
  • (thoughts social, economic impact vs. rapid
    response, and view points outside the system or
    inside the system)

Challenges and Opportunities: Algorithms (I)
  • Methods related to the measurement/data feedback
    on the computational model
  • mechanisms in which the computational models
    respond to the information coming from the
  • examples include new ways of data assimilation
    (ensemble Kalman-like or hybrid stochastic
    estimation for messy, incomplete, and possibly
    out of time order data) and uncertainty
    estimation, adaptive parameter updating, dynamic
    model selection, and fast nonlinear update
    methods combined with multiscale interpolation of
    sensor data to form self correction methods, etc.
  • Methods related to the computational model
    feedback on the measurement/data
  • mechanisms in which the data collection and
    instrumentation respond to the results from the
    computational models
  • examples include instrument placement and
    control, data relevance assessment, noise
    quantification and qualification, and robust
    dynamic optimization, sensor steering, and
    targeted observations/data refinement.
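The measurement-to-model feedback described above can be illustrated with a scalar, ensemble Kalman-like analysis step. This is a minimal sketch under strong assumptions: a single state variable observed directly, Gaussian errors, and perturbed observations. The function name and all numbers are illustrative, not taken from any of the projects in the talk.

```python
import random
from statistics import mean, variance

def enkf_update(ensemble, observation, obs_var, rng):
    """Stochastic ensemble Kalman analysis for a directly observed scalar state."""
    prior_var = variance(ensemble)                 # spread of the forecast ensemble
    gain = prior_var / (prior_var + obs_var)       # scalar Kalman gain
    obs_sd = obs_var ** 0.5
    # Each member assimilates its own perturbed copy of the observation.
    return [x + gain * (observation + rng.gauss(0.0, obs_sd) - x)
            for x in ensemble]

rng = random.Random(1)
prior = [rng.gauss(5.0, 2.0) for _ in range(200)]  # uncertain model forecast
posterior = enkf_update(prior, observation=8.0, obs_var=0.25, rng=rng)

print(f"prior mean {mean(prior):.2f}, var {variance(prior):.2f}")
print(f"posterior mean {mean(posterior):.2f}, var {variance(posterior):.2f}")
```

Because the observation is much more certain than the forecast, the posterior ensemble moves close to the measured value and its spread shrinks. That spread is also a simple uncertainty estimate of the kind the slide calls for.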

Challenges and Opportunities: Algorithms (II)
  • Advances in computational simulation tools for
  • new advances in existing computational
  • uncertainty estimation and propagation, adaptive
    discretization schemes, fast linear and nonlinear
    solvers that enable high-fidelity real-time
    simulations, fast error estimation, dynamic
    validation and verification, appropriate
    couplings for multiscale, multiphysics
    simulations, and system identification in the
    context of dynamic data assimilation, are
    examples of simulation-level tools that need to
    be developed
  • System and integration of related algorithms
  • real time algorithms for managing large dataware,
    managing processing, parallel prototyping,
    guaranteed and consistent resources (including
    networks), dynamic scheduling of processors,
    networks, Grids, and changes in allocations,
    process migration, check-pointing, fast
    visualization techniques, algorithms appropriate
    for sensor embedding, and secure transmission

Challenges and Opportunities: Measurement Systems
  • Components
  • instruments, sensors, databases, human inputs and
    other devices for taking measurements; data
    quality assessment, data formatting, feature
    extraction and data measurement steering
  • Issues
  • on-demand data collection and management, data
    streaming into the simulation, data
    representation, data models and congruence of
    data, real-time constraints, data
    processing/preprocessing, data collection rates,
    consumption rates, available bandwidths and other
    resources and how to discover, obtain, establish
    the authenticity and correctness and how to
    maintain an audit trail
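One common way to keep an audit trail whose integrity can be checked later is a hash chain, where each record commits to the one before it. The sketch below assumes that approach. The record layout, function names, and sample payloads are hypothetical, and a production system would also need signatures and secure storage.

```python
import hashlib
import json

GENESIS = "0" * 64   # placeholder digest that precedes the first record

def _digest(prev, payload):
    """SHA-256 over the previous digest and a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_record(trail, payload):
    """Append a measurement or processing step, chained to the previous record."""
    prev = trail[-1]["digest"] if trail else GENESIS
    trail.append({"prev": prev, "payload": payload,
                  "digest": _digest(prev, payload)})

def verify(trail):
    """Recompute the chain; any edited or reordered record breaks it."""
    prev = GENESIS
    for record in trail:
        if record["prev"] != prev or _digest(prev, record["payload"]) != record["digest"]:
            return False
        prev = record["digest"]
    return True

trail = []
append_record(trail, {"sensor": "well-7", "pressure_kpa": 2310})
append_record(trail, {"model_run": "forecast-12", "inputs": ["well-7"]})
print(verify(trail))
```

Changing any stored payload after the fact makes verify return False, so the pedigree of the data behind a result can be checked.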

Challenges and Opportunities: Systems Software (I)
  • Programming models and system
  • support the formulation of application components
    and applications that are capable of correctly
    and consistently adapting their behaviors,
    interactions and compositions in real time in
    response to dynamic data and application/system
    state, while satisfying real time, space,
    functional, performance, reliability, security,
    and quality of service constraints
  • Data management (acquisition, assimilation,
    transport, manipulation, exploration) mechanisms
    and services
  • enable seamless acquisition of high-rate,
    high-volume data from varied, distributed and
    possibly unreliable data sources, while
    addressing stringent real-time, space and data
    quality constraints, and enabling the seamless
    and dynamic integration of this data with
    computation and resources

Challenges and Opportunities: Systems Software (II)
  • Runtime execution and middleware services
  • support dynamic, data and knowledge driven and
    time constrained executions, adaptations,
    interactions, compositions for application
    elements, while guaranteeing reliable and
    resilient execution and predictable and
    controllable response times
  • Computation infrastructures
  • support immediate on-demand and anticipatory
    resource co-allocations and co-scheduling,
    dynamic, secure and seamless aggregation of and
    interactions between distributed resources and
    varied data sources, dynamic configuration and
    instantiation and reliable and traceable
    execution of application workflows, real time and
    soft real time QoS guarantees, pervasive remote
    monitoring and access and human-in-the-loop
  • Community testbeds

Building Collaborations at Rutgers (IMHO)
  • Applications are naturally large and complex
  • Computational science is the critical
  • applied computer (computational ...) science
    is now playing the role which mathematics did
    from the seventeenth through the twentieth
    centuries: providing an orderly, formal
    framework and exploratory apparatus for other
    sciences
  • - G. Djorgovski, Director, CACR, Caltech,
  • Need to seed, catalyze and nurture
    multi-disciplinary collaborations
  • productive, lasting collaborations are
  • built bottom-up
  • must be win-win
  • need time and effort
  • Resources and infrastructure are crucial
  • compute (capability, capacity), communication
    (NLR), collaboration, data, visualization, etc.
  • Home for Multidisciplinary Computational Science
    at Rutgers !

  • Next Generation Scientific Investigation
    Knowledge-based, data and information driven,
    dynamically adaptive applications on the
  • Unprecedented opportunities for global scientific
  • can enable accurate solutions to complex
    applications and provide dramatic insights into
    complex phenomena
  • Unprecedented research challenges
  • scale, complexity, heterogeneity, dynamism,
    reliability, uncertainty,
  • applications, algorithms, measurements,
    data/information, software
  • Project AutoMate: Autonomic Computational Science
    on the Grid
  • Accord, Rudder/Comet, Meteor, Squid, Topos,
  • More Information, publications, software

The Team
  • TASSL, CAIP/ECE Rutgers University 
  • Viraj Bhat  
  • Sumir Chandra
  • Andres Q. Hernandez
  • Nanyan Jiang
  • Zhen Li (Jenny)  
  • Vincent Matossian 
  • Cristina Schmidt    
  • Mingliang Wang
  • Li Zhang  
  • Key CE/CS Collaborators
  • Rutgers Univ.
  • D. Silver, D. Raychaudhuri, P. Meer, M. Bushnell,
  • Univ. of Arizona
  • S. Hariri
  • Ohio State Univ.
  • T. Kurc, J. Saltz
  • GA Tech
  • Key Applications Collaborators
  • Rutgers Univ.
  • R. Levy, S. Garofalini
  • D. Foran, M. Reisse
  • CSM/IG, Univ. of Texas at Austin
  • H. Klie, M. Wheeler, M. Sen, P. Stoffa
  • S. Klasky, C.S. Chang
  • CRL, Sandia National Lab., Livermore
  • J. Ray, J. Steensland
  • Univ. of Arizona/Univ. of Iowa, OSU
  • T. C. J. Yeh, J. Daniels, A. Kruger
  • Idaho National Laboratory
  • R. Versteeg
  • PPPL
  • R. Samtaney
  • ASCI/CACR, Caltech