Title: Cyberinfrastructure As A Critical Component in Advancing Research and Education

1. Cyberinfrastructure As A Critical Component in Advancing Research and Education
Bob Wilhelmson
National Center for Supercomputing Applications: Director of Cyber Applications and Communities; Chief Science Officer
Department of Atmospheric Sciences: Professor
bw_at_ncsa.uiuc.edu
2. It's All About Enabling Science, Engineering, Humanities, and the Arts
3. In a Multi World!
- Multidisciplinary (e.g., earth science, virtual lung, ...)
- Multiscale
- Multipurpose (research, prediction, education)
- Multiservices (simulation, data collection, mining, archiving, visualization)
- Multiresources (shared and distributed memory, innovative hardware, high-speed networking, rotating disk and archival store, desktop-to-petascale functionality)
- Multiagency (NSF, DOE, NIH, NASA, NOAA, USGS, DOD, ...)
- Multicommunity (LEAD, GEON, SEEK, LTER, CLEANER, CUAHSI, NEON, ...)
Collaboration Needed
4. In a Multi World!
(Same list as the previous slide.)
Innovation Needed

5. In a Multi World!
(Same list as the previous slide.)
Leadership Needed
6. Cyberinfrastructure: Beyond Just Computing
- Today's computer is a coordinated set of hardware, software, and services providing an end-to-end resource.
- Cyberinfrastructure captures how the science and engineering community has redefined the computer: the computer as an integrated set of resources.
Source: Fran Berman
7. Cyberinfrastructure
Cyberinfrastructure is the coordinated aggregate of software, hardware and other technologies, as well as human expertise, required to support current and future discoveries in science and engineering.
The NSF Blue Ribbon Panel (Atkins) Report provided a compelling and comprehensive vision of an integrated cyberinfrastructure.
"Thanks to cyberinfrastructure and information systems, today's scientific tool kit includes distributed systems of hardware, software, databases and expertise that can be accessed in person or remotely." Arden Bement, NSF Director, February 2005
Source: Adapted from Fran Berman
8. When Did CI Emerge?
[Timeline, 1985-2010: prior computing investments → Supercomputing Centers → NSF networking → PACI (NPACI and the Alliance) → Terascale (TCS, DTF, ETF) → Cyberinfrastructure]
Source: NSF CISE
9. Background Reports
10. CI Needs Driven By
- From simple to complex
- From prototype CI to reliable, robust software that dramatically increases productivity, enables new science, and requires a very stable middleware base
- From the desktop (high-resolution stereo) to the petaflop (10,000 to 100,000 processors just for you)
- From focused systems to balanced systems
- From observations, theory, and modeling to new knowledge
- From interactive to on-demand to real-time to batch
- From hundreds/thousands of capability (most) simulations to large (in both space and time) leadership computations
- From 10s of gigabytes to petabytes of data
- From disciplinary to interdisciplinary collaboration
- From small to large communities
11. General Principles for CI in the Geosciences
- Cyberinfrastructure must serve geoscience. Therefore, it must be developed in response to the community, not imposed on the community.
- Much of the development must be done in partnership with computer scientists, because it involves substantial computer science innovation.
- Cyberinfrastructure development is expensive; therefore we should encourage as much re-use of developments as possible.
- Work specific to individual fields should be reviewed by those fields, as they are the ones who will ultimately use the infrastructure.
Source: Leinen
12. CI-Enhanced Knowledge Communities: CI Two Years after the Blue Ribbon Panel Report (Dan Atkins)
Elevator speech: Advanced CI is critical to innovation. Innovation is critical to leadership in global, knowledge-based economies.
- We must now invest in IT as institutionalized, sustained, evolving but robust infrastructure that researchers will bet their careers on.
- My history-of-technology and civil-engineering friends are quick to remind me that infrastructure is among the most complex and costly undertakings of modern society.
- There exists a stew bubbling with activities called cyberinfrastructure, e-science, grids, and collaboratories: complementary visions and activities.
- We need to cooperate in new ways in order to compete ("co-opetition").
13. Virtual Organization: Conceptual View of Information Infrastructure
[Diagram: scientists reach science portals and task services through identity management and a registration service, which broker access to local resources, information resources, hardware resources, and observing systems]
Credit: Leinen and Meacham
14. NSF'S CYBERINFRASTRUCTURE VISION FOR 21ST CENTURY DISCOVERY
- Call to Action
- Strategic Plan for High Performance Computing (2006-2010)
- Strategic Plan for Data, Data Analysis and Visualization (2006-2010)
- Strategic Plan for Collaboratories, Observatories and Virtual Organizations (2006-2010)
- Strategic Plan for Education and Workforce (2006-2010)
15. It's a Cyber World!
16. Definitions
- A cybercommunity is a distributed group of people with common goals; it ranges from a few individuals to an interdisciplinary or international group. These groups can include researchers, policy makers, responders, educators, and citizens, and often have a long-term identity and purpose.
- A cyberenvironment is a subset of general CI capabilities and functionality that is designed and built to meet the needs of a particular community. It includes the use of broadly used middleware and networks as well as community-specific facilities, software frameworks, networks, and people. Further, it is persistent, robust, and supported.
- A cyberservice is a web or grid service, a software tool or toolkit, a model or collection of models, etc.
17. Definitions
Cyberenvironments, composed of cyberservices, enable cyberscience within cybercommunities.
(Definitions as on the previous slide.)
18. NCSA's Strategic Directions
- Cyber-resources: enabling discovery at the leading edge
  - Leading-edge computing and data storage resources as well as network connectivity
  - User services to help make effective use of high-end computing resources
- Cyberenvironments: harnessing the power of the national cyberinfrastructure
  - Integrated, end-to-end software environments to provide access and the ability to coordinate, automate, and apply high-end resources and capabilities
  - Cyberservices and cybertechnologies needed to build cyberenvironments
- Innovative Computing Systems: defining the path to petascale computing
  - Innovative computing systems that promise to significantly decrease the cost and/or extend the range of computational science and engineering
19. Why Cyberenvironments?
- Mosaic
  - By the early 1990s, the internet had a wealth of resources, but they were inaccessible to most scientists.
  - Mosaic facilitated the use of the internet by all scientists (and, eventually, by laymen!).
- Cyberenvironments
  - Cyberenvironments will facilitate the use of cyberinfrastructure by all scientists.
20. Cyberenvironments: Beyond Web Portals
- Web Portals
  - Reduce the barrier to accessing cyberinfrastructure by providing a convenient point-and-click interface
  - Broaden access to and use of cyberinfrastructure
- Cyberenvironments
  - Help manage large-scale, complex and interdependent projects and processes
  - Help manage diverse experimental, computational and data resources
  - Bridge local, institutional, national and international cyberinfrastructure to create a seamless environment
  - Assist in the bi-directional connection between raw research artifacts and published artifacts
21. Environmental Cyberenvironment Prototype
- CLEANER: Collaborative Large-scale Engineering Analysis Network for Environmental Research
- Human-dominated, complex environmental systems, e.g.:
  - River basins
  - Coastal margins
- What researchers requested:
  - Access to live and archived sensor data
  - Analyze, visualize and compare data
  - Link to computational models
  - Collaborate with colleagues
  - Organize, automate and share cyber-research processes
Users can simultaneously view and discuss data and analyses.
22. Cyberenvironments: MAEViz Earthquake Engineering
Workflow: Hazard Definition → Inventory Aggregation → Fragility Models → Damage Prediction → Decision Support
- Elements of MAEViz
  - State-of-the-art engineering
  - Distributed data/metadata sources
  - GIS with visual overlays
  - Collaboration environment
  - Builds on some NEESgrid technologies
23. Building Cyberenvironments
[Diagram: scientific pathfinders and scientific communities feed scientific and technology roadmaps; integrated project teams with partners (SDSC, PSC, TeraGrid working groups, advisory committees, industrial and international partners) carry out requirements analysis and specification; development and system integration draw on portals, GUIs, workflow management, applications, data mining, analysis, visualization, web services, collaboratories, middleware, and security; a cyberarchitecture working group oversees the path to prototype or production cyberenvironments, research applications, and scientific discoveries]
24. Engaging and Enabling Communities
Scientists, engineers, decision makers, policy makers, media and citizens engage in discovery, analysis, discussion, deliberation, decisions, policy formulation and communication.
- The Collaboration Framework facilitates idea and knowledge sharing, eLearning and multi-objective decision support processes.
- The Analysis Framework facilitates data and model discovery, exploration, and analysis via the Collaboration Framework.
- The Data Management Framework builds logical maps of distributed, heterogeneous information resources (data, models, tools, etc.) and facilitates their use via the Analysis and Collaboration Frameworks.
- All three frameworks rest on the physical infrastructure. (A toy sketch of this layering follows.)
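The layering described above can be made concrete with a small sketch. This is a toy illustration, not CLEANER code: every class, method, and dataset name here is invented, and a real data management framework would resolve logical names against distributed catalogs rather than a dictionary.

```python
# Toy sketch of the three-framework layering (all names invented).
class DataManagementFramework:
    """Maps logical names to physical locations, hiding distribution."""
    def __init__(self):
        self.catalog = {"rainfall_2005": "srb://siteA/hydro/rainfall_2005.nc"}

    def resolve(self, logical_name):
        return self.catalog[logical_name]

class AnalysisFramework:
    """Discovers and analyzes data through the data management layer."""
    def __init__(self, dm):
        self.dm = dm

    def explore(self, dataset):
        return f"summary statistics for {self.dm.resolve(dataset)}"

class CollaborationFramework:
    """Shares analyses with a group; sits on top of the analysis layer."""
    def __init__(self, analysis):
        self.analysis = analysis

    def share(self, dataset, group):
        print(f"posting '{self.analysis.explore(dataset)}' to {group}")

cf = CollaborationFramework(AnalysisFramework(DataManagementFramework()))
cf.share("rainfall_2005", "watershed working group")
```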
25. Cyberenvironments: Structure of Cyberenvironments
26. An Example: Linked Environments for Atmospheric Discovery (NSF Large ITR Project)
27. My View of the World
[Diagram contrasting two views of the partnership: meteorology driving CS, and CS driving meteorology]
28. BIRN Core Software Infrastructure
Distributed Resources
Courtesy: Mark Ellisman
29. The Basic Idea of Web Services
A stock quote service:
getLastTradePrice( tickerSymbol ) returns price
getLastTradePrice( "SGI" ) returns 1.73
- A Web Service is:
  - A network service that provides a functional interface for remote clients.
  - An interface is a set of operations the service performs.
- Big collaboration between big players
  - IBM, Microsoft, Oracle, HP, Sun... everybody
  - Central to MS .NET and IBM services plans
- Provides a better way to factor complex distributed applications into basic, reusable, reliable services.
Credit: Dennis Gannon
30. So Why Web Services?
- The web works by a simple set of HTTP commands: Get and Post/Put.
- Complex requests like "Get me a non-smoking double room at the special rate at the Hilton and bill it to the company" must be encoded in awkward URL strings. This is very limited.
- A web service declares in a WSDL doc: "Here are the services I provide. Use this XML language and interface definition to send me requests, and here is how I will respond in XML."
Credit: Dennis Gannon
31. Web Services Description Language (WSDL)
- A standard
- A description of types and ports
  - A port is a set of operations the port can perform.
  - WSDL has XML descriptions of these operations and their arguments and response types.
- WSDL is an XML document that:
  - Describes the interface types for each port
  - Describes the contents of the messages it receives and sends
  - Describes the bindings of interfaces to protocols (SOAP, XML over HTTP, is the default; others are possible)
  - Describes the access points (host/port) for the protocol bindings
A minimal client sketch follows.
Credit: Dennis Gannon
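To make the stock-quote example concrete, here is a minimal Python sketch of the SOAP exchange such a WSDL would describe. The namespace and element names are hypothetical stand-ins for values a real WSDL document would declare, and the response is canned so the sketch runs offline; a real client would POST the request to the access point named in the WSDL binding.

```python
# Sketch of the stock-quote exchange (hypothetical namespace and names).
import xml.etree.ElementTree as ET

NS = "http://example.org/stockquote.wsdl"   # hypothetical target namespace

def build_request(ticker: str) -> str:
    """The XML message a WSDL <message>/<operation> pair would describe."""
    return f"""<soap:Envelope
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:getLastTradePrice xmlns:m="{NS}">
      <m:tickerSymbol>{ticker}</m:tickerSymbol>
    </m:getLastTradePrice>
  </soap:Body>
</soap:Envelope>"""

# A real client would POST build_request("SGI") over HTTP (the SOAP
# binding); here we parse a canned reply of the declared response type.
canned_response = f"""<soap:Envelope
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:getLastTradePriceResponse xmlns:m="{NS}">
      <m:price>1.73</m:price>
    </m:getLastTradePriceResponse>
  </soap:Body>
</soap:Envelope>"""

print(build_request("SGI"))
price = ET.fromstring(canned_response).find(f".//{{{NS}}}price")
print(float(price.text))   # -> 1.73
```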
32. NCSA's ALG Research, Development, and Technology Transfer Model (limited web services)
33. Knowledge Discovery Process
34. Advantages of a Framework for Analytics Such as D2K
- Scalable: desktop, web services, grid services
- Visual programming system employing a data/workflow paradigm
- Integrated environment for models and visualization
- Capability to access data management tools transparently from multiple sources
- Capability to build custom applications rapidly
- Data mining algorithms and complex data paradigms
35. D2K Infrastructure, Modules, Itineraries, and Applications
- D2K Infrastructure: D2K API, data flow environment, distributed computing framework and runtime system
- D2K Modules: computational units written in Java that follow the D2K API
- D2K Itineraries: modules that are connected to form an application
- D2K Toolkit: user interface for specifying and executing itineraries that provides the rapid application development environment
- D2K-Driven Applications: applications that use D2K modules but do not need to run in the D2K Toolkit
- D2KSL Web/Grid Services: applications that provide a user-specific GUI or application service
A sketch of the module/itinerary idea follows.
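The module/itinerary pattern can be illustrated with a small, hypothetical sketch. The real D2K API is written in Java and is far richer; the class names, the linear pipeline, and the data here are all invented to show only the core idea: modules are computational units with inputs and outputs, and an itinerary wires them into a dataflow application.

```python
# Hypothetical sketch of the module/itinerary idea (not the D2K API).

class Module:
    """A computational unit: consumes inputs, produces outputs."""
    def execute(self, *inputs):
        raise NotImplementedError

class LoadData(Module):
    def execute(self):
        return [1.0, 2.0, 3.0, 4.0]        # stand-in for a data source

class Normalize(Module):
    def execute(self, data):
        hi = max(data)
        return [x / hi for x in data]

class Report(Module):
    def execute(self, data):
        print("normalized:", data)

def run_itinerary(modules):
    """An itinerary connects modules into a dataflow application.
    (D2K itineraries are general graphs; a list keeps the sketch short.)"""
    result = ()
    for m in modules:
        out = m.execute(*result)
        result = (out,) if out is not None else ()

run_itinerary([LoadData(), Normalize(), Report()])
# -> normalized: [0.25, 0.5, 0.75, 1.0]
```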
36. A Value of Cyberenvironments (CEs): The 80/20 Flip
- What if a graduate student or a researcher could spend 80% of their time on science and only 20% on grunt work, through technology and software innovation?
- Answer: great things!
- Examples:
  - Gaskins: finding subsequence series in a protein through analytics
  - Lewin: new advances through information visualization
37. Evolution Highway in D2K
- Uses the D2K web service to deliver the visualization application
- Provides a visual means for simultaneously comparing the mammalian genomes of humans, horses, cats, dogs, pigs, cattle, rats, and mice
- Removes the burden of manually aligning these maps
- Allows cognitive skills to be used on something more valuable than preparation and transformation of data
- evolutionhighway.ncsa.uiuc.edu went live on July 22, 2005
- Science, Vol. 309, Issue 5734, pp. 613-617, 22 July 2005
38. First Simulation of an Entire Life Form: Satellite Tobacco Mosaic Virus
- Klaus Schulten and collaborators at UIUC
- Up to 1 million atoms for 50 nanoseconds
- NAMD, a modern, scalable molecular dynamics code, was used
39. Unprecedented Planning/Vision
40. Survey Findings / Recommendations
Survey findings:
- Expected requirement: ten to one thousand times the current ITI hardware capacity over the next five to ten years, with the most critical bottlenecks occurring in the availability of CPU cycles, memory and mass-storage capacity, and network bandwidth
- Software systems: the need to re-engineer models, and data analysis and assimilation packages, for efficient use on massively parallel computers; advances in visualization techniques to deal effectively with increasing volumes of observations and model output; well-designed, documented and tested community models of all types
- Extreme shortage of skilled ITI technical personnel accessible to the ocean sciences community
Recommendations:
- Improve access to high-performance computational resources across the ocean sciences
- Provide technical support for maintenance and upgrade of local ITI resources
- Provide model, data and software curatorship
- Facilitate advanced applications programming
41. Cyberinfrastructure for the Atmospheric Sciences in the 21st Century (2004): General Recommendations
- Give human resource issues top priority
  - Academic reward structure
  - Investments in CI personnel
- Encourage CI education of AS students, educators, support staff and scientists
- Support mechanisms for communication and dissemination of ideas, technologies and processes to promote interdisciplinary understanding
42. Cyberinfrastructure for the Atmospheric Sciences in the 21st Century (2004): General Recommendations
- Fund the entire software life cycle, including development, testing, hardening, deployment, training, support and maintenance
- Invest in computing infrastructure and capacity building at all levels, including centers, campuses, and departments
- Support development of geosciences cyberenvironments that allow the seamless transport of work from the desktop to supercomputers to the Grid
43. Cyberinfrastructure for the Atmospheric Sciences in the 21st Century (2004): General Recommendations
- Help organize and coordinate CI across GEO
  - Geoinformatics Steering Committee
  - Geosciences Technology Forum
- Coordinate with NASA, NOAA, and other agencies to enable finding, using, publishing, and distributing geoscience data
- Establish appropriate standards for metadata
44. Petascale Collaboratory
Overarching Recommendation: Establish a Petascale Collaboratory for the Geosciences with the mission to provide leadership-class computational resources that will make it possible to address, and minimize the time to solution of, the most challenging problems facing the geosciences.
DRAFT
45. LEAD: A CI Research and Development Effort Funded by NSF (http://lead.ou.edu)
47. LEAD Project Motivation
- Each year, mesoscale weather (floods, tornadoes, hail, strong winds, lightning, and winter storms) causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses greater than $13B.
Source: Kelvin Droegemeier
48. The LEAD Goal
To create a grid-based, integrated, scalable framework in which analysis tools, forecast models, and data repositories can be used as dynamically adaptive, on-demand systems that can:
- operate independent of data formats and the physical location of data or computing resources
- change configuration rapidly and automatically in response to the weather
- continually be steered by new data (i.e., the weather)
- respond to decision-driven inputs from users
- initiate other processes automatically, and
- steer remote observing technologies to optimize data collection for the problem at hand
A conceptual sketch of such an adaptive loop follows.
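The following sketch illustrates the "dynamically adaptive, on-demand" idea in miniature. It is not LEAD code: the function names, the reflectivity threshold, and the grid spacings are invented stand-ins for the real ingest, orchestration, and forecast components.

```python
# Conceptual sketch (not LEAD code) of a forecast loop that is
# continually steered by incoming observations.
import random

def ingest_observations():
    """Stand-in for streaming radar/surface data."""
    return {"max_reflectivity_dbz": random.uniform(10, 65)}

def configure_forecast(obs):
    """Reconfigure the run in response to the weather: request a
    high-resolution, on-demand run only when storms are detected."""
    severe = obs["max_reflectivity_dbz"] > 50
    return {"grid_km": 1 if severe else 9, "on_demand": severe}

def launch(config):
    mode = "on-demand" if config["on_demand"] else "scheduled"
    print(f"launching forecast: {config['grid_km']} km grid, {mode} run")

for _cycle in range(3):          # each cycle is steered by new data
    obs = ingest_observations()
    launch(configure_forecast(obs))
```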
49. Sample Problem Scenario
Streaming Observations
50. Why Is LEAD a Collaboration?
LEAD is working to develop a comprehensive national cyberinfrastructure for mesoscale meteorology research, education, and prediction. It is addressing the fundamental information technology (IT) research challenges needed to create an integrated, scalable environment for:
- identifying,
- accessing,
- preparing,
- assimilating,
- predicting,
- managing,
- analyzing,
- mining, and
- visualizing
a broad array of meteorological data and model output, independent of format and physical location and having dynamically adaptive, on-demand response.
51. Other Geoscience ITRs and Projects
- GEON
- SCEC
- LTER
- CUAHSI
- CLEANER
- NEON
- LOOKING
- MAEViz
- SEEK
- EARTHSCOPE
- ORION
- BIRN
- CHRONOS
- NEESGRID
52. First Remote Interactive High-Definition Video Exploration of Deep Sea Vents
A Canadian-U.S. collaboration
Source: John Delaney and Deborah Kelley, UWash
53. Chemistry
- Science Drivers
  - Multiscale modeling, including high-dimension, chemical-accuracy potential energy surfaces
  - Real-time feedback to control of reacting systems monitored by sensor technology
  - Prediction of optimal experiments (lower cost of discovery and process design)
  - Validation of computational models vs. experimental data, and vice versa
- Cyber-Enabled Chemistry
  - New paradigm for information flow (transparent resource sharing such as data grids rather than centrally stored databases; workflow management tools)
  - New paradigm for shared instrumentation (remote chemistry), including broadening participation
  - Interfacing data and software across disciplines (interoperability), and development of cyber collaboration tools
Source: Rohlfing
54. IT Challenges from a Scientific Perspective
[Diagram: integration across scales of data and algorithms, from atomic-level structure through macromolecular structure and dynamics, molecular assemblies, cellular components, and cells/cell interactions up to the organ(ism), spanning mechanism, property, and response]
55. Cosmic Simulator with Billion-Zone and Gigaparticle Resolution
[Images: simulation output compared with the Sloan Survey; SDSC Blue Horizon]
Source: Mike Norman, UCSD
56. Why Does the Cosmic Simulator Need Cyberinfrastructure?
- One gigazone run generates 10 terabytes of output
  - A snapshot is 100 GB
- Need to visually analyze as we create spacetimes
- Visual analysis is daunting
  - A single frame is about 8 GB
  - A smooth animation of 1000 frames is 1000 x 8 GB = 8 TB
  - Stage on rotating storage to high-resolution displays
- Can run evolutions faster than we can archive them
  - File transport over the shared Internet at 50 Mbit/s: 4 hours to move ONE snapshot!
- Many scientists will need access for analysis
Source: Mike Norman, UCSD
The arithmetic behind these figures is spelled out below.
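The numbers on this slide follow directly from the quantities it cites; the short script below just checks them.

```python
# Checking the data-movement figures quoted on the slide.
snapshot_gb = 100                    # one snapshot of the gigazone run
link_mbit_s = 50                     # shared-Internet transfer rate cited

seconds = snapshot_gb * 8 * 1000 / link_mbit_s    # GB -> Gbit -> Mbit
print(f"{seconds / 3600:.1f} hours per snapshot")  # ~4.4 hours

frames = 1000                        # a smooth animation
frame_gb = 8                         # size of a single frame
print(f"animation: {frames * frame_gb / 1000:.0f} TB")  # 8 TB
```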
57. Limitations of Uniform Grids for Complex Scientific and Engineering Problems
512x512x512 run on a 512-node CM-5
Source: Greg Bryan and Mike Norman, NCSA
58. Develop Adaptive Mesh Refinement (AMR) to Resolve Mass Concentrations
64x64x64 run with seven levels of adaptation on an SGI Power Challenge, locally equivalent to 8192x8192x8192 resolution
Source: Greg Bryan, Mike Norman, and John Shalf, NCSA
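The "locally equivalent" figure follows from the slide's own parameters, assuming each refinement level doubles the resolution (the usual factor for this kind of AMR):

```python
# Effective resolution of a 64^3 base grid with seven 2x refinements.
base = 64
levels = 7
effective = base * 2 ** levels
print(effective)                     # -> 8192, matching the slide

# A uniform grid at that resolution would need ~5.5e11 zones;
# AMR spends zones only where mass concentrates.
print(f"{effective ** 3:.1e} zones if uniform")
```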
59. Cosmic Simulator: Thresholds of Capability and Discovery
- 2000: Formation of galaxy cluster cores (1 TFLOP/s)
- 2006: Properties of first galaxies (40 TFLOP/s)
- 2010: Emergence of Hubble types (150 TFLOP/s)
- 2014: Large-scale distribution of galaxies by luminosity and morphology (500 TFLOP/s)
Source: Mike Norman, UCSD
60. Biomedical Information Research Network
- Enable new understanding of neurological disease by integrating data across multiple scales, from macroscopic brain function to its molecular and cellular underpinnings
- Federate distributed multiscale brain data
- Accommodate the associated large-scale computational requirements
- Provide infrastructure for a next-generation collaboratory
Scales-of-NS figure from Maryann Martone
61. What BIRN Is Doing
- Integrating the activities of the most advanced biomedical imaging and clinical research centers in the U.S., serving as a model for programs everywhere
- Establishing distributed and linked data collections with partnering groups, creating a data grid for the BIRN
- Facilitating the use of grid-based computational infrastructure and integrating BIRN with other grid middleware projects
- Enabling data mining from multiple distributed data collections or databases on neuroimaging and bioinformatics
- Building a stable software and hardware infrastructure that will allow centers to coordinate efforts to accumulate larger studies than can be carried out at one site
62. What BIRN Is Doing
- Changing the use pattern for research data from the individual laboratory/project to shared use
- Defining processes and procedures and establishing best practices so that the BIRN is reliable, scalable and extensible to biomedical research programs outside of the pioneering neuroimaging test-beds, able to support the work of thousands of researchers
- Pushing the envelope of biomedical informatics and computer science by driving the development of new techniques in databases, information retrieval, visualization and computational processing
63. BIRN Network
[Map of sites, including Yale (New Haven), MIT, UCSF (San Francisco), and Memphis, Tenn.]
64. CARMA
Combined Array for Research in Millimeter-wave Astronomy (CARMA)
- Long history of important contributions at NCSA in radio astronomy
- The MIRIAD software package originated at NCSA and contributed to other community codes
- The BIMA archive (1.25 TB) was developed and is based at NCSA
- The BIMA pipeline was developed and is deployed at NCSA
- Major NCSA CARMA involvement is in the data reduction pipeline, archiving and databases
Effort led by Dick Crutcher and Athol Kimball
65. Building Prototypes for LSST
- LSST: a new telescope for exploring the variable universe and dark energy
  - Its 3.2 GPixel camera will image the entire available sky every 3 days
  - At first light in 2013, the telescope will produce 15 TB/night of raw data and 130 TB/night of processed products
- Goal for 2006: build and test a working automated processing system as input to the LSST construction proposal
- Deploy the NOAO Science Archive at NCSA as the foundation for the LSST archive
  - Use it as a testbed for advanced data access mechanisms while serving an existing user base
  - We will deploy an NVO-interoperable security framework based on grid security tools
  - In 2005, we deployed an automated data mirroring system based on SRB and the NCSA BIMA Archive system; it is currently mirroring data from NOAO telescopes
- Deploy an intelligent data access system that uses grid tools to efficiently distribute data across a cluster
- Integrate grid-based data workflow systems to automatically create processed data products using TeraGrid
  - In 2005 we deployed and demonstrated an early version for processing simulated data on TeraGrid
- Apply the automated system to the first LSST Data Challenge using precursor data
  - In 2005 we created the LSST Precursor Data Archive for LSST developers nationwide
66. Large Environmental Observatories Needing Cyberenvironments
- CUAHSI (Consortium of Universities for the Advancement of Hydrologic Science, Inc.) for hydrology
- NEON (National Ecological Observatory Network) for ecology
- LOOKING (Laboratory for the Ocean Observatory Knowledge Integration Grid)
- CLEANER (Collaborative Large-scale Engineering Analysis Network for Environmental Research) for environmental engineering
- LTER (U.S. Long-Term Ecological Research Network), investigating ecological processes over long temporal and broad spatial scales
67. CLEANER-Hydrologic Observatories MISSION STATEMENT
To transform our understanding of the earth's water and related biogeochemical cycles across spatial and temporal scales, to enable forecasting of critical water-related processes that affect and are affected by human activities, and to develop scientific and engineering tools that enable more effective adaptive management approaches for large-scale, human-dominated environments.
68. The Need, and Why Now?
"Nothing is more fundamental to life than water. Not only is water a basic need, but adequate safe water underpins the nation's health, economy, security, and ecology." NRC (2004), Confronting the Nation's Water Problems: The Role of Research.
Three critical deficiencies in current abilities:
(1) We lack basic data and the infrastructure to collect them at the needed resolution.
(2) Even if we could collect them, we lack the means to integrate data across scales from different media and sources (observations, experiments, simulations).
(3) We lack sufficiently accurate modeling and decision-support tools to predict underlying processes, let alone forecast the effects of different management strategies.
69. Critical Environmental Grand Challenges
- Understanding and forecasting hydrologic cycle processes
- Designing ecologically sustainable cities
- Assessing effects of climate change on water resources (droughts/floods)
- Understanding human impacts on major biogeochemical cycles and the incidence of water-borne communicable diseases
- Quantifying the relationship of land use/cover to aquatic ecosystem quality
- Reinventing the use of materials (that become pollutants)
References: NRC (2001), Grand Challenges in the Environmental Sciences; NAE (2002), Engineering and Environmental Challenges
70. NCSA CLEANER Efforts
- NCSA has developed a prototype of the CLEANER CyberCollaboratory (http://cleaner.ncsa.uiuc.edu)
- NCSA is leading the CLEANER Project Office activities
- Major requirements-gathering initiative
  - Create prototypes
  - Community surveys and interviews to assess needs
  - Report of recommendations on CLEANER needs
- NCSA is creating prototypes that build on a common CI architecture across communities
- Two environmental testbeds
  - Illinois River Basin testbed
  - Corpus Christi Bay testbed
Effort led by Barbara Minsker
71. Environmental CI Architecture: Research Services
[Diagram: an integrated CI of supporting technology (data services, workflows and model services, knowledge services, meta-workflows, collaboration services, digital library) underpins the research process: create hypothesis → obtain data → analyze data and/or assimilate into model(s) → link and/or run analyses and/or model(s) → discuss results → publish]
72. National Institute of General Medical Sciences Mission Statement
- In ten years, we want every person involved in the biomedical enterprise (basic researcher, clinical researcher, practitioner, student, teacher, policy maker) to have at their fingertips, through their keyboard, instant access to all the data sources, analysis tools, modeling tools, visualization tools, and interpretative materials necessary to do their jobs, with no inefficiencies in computation or information technology being a rate-limiting step.
- In twenty years, we want intelligent computational agents to do complex query and modeling tasks in the biomedical computing environment, freeing humans for creative hypothesis construction and high-level analysis and interpretation.
Source: Jakobsson
73. Some Important Problems with Biomedical Computing Tools
- They are difficult to use.
- They are fragile.
- They lack interoperability of different components.
- They suffer limitations on dissemination.
- They often work in a one-program/one-function mode, as opposed to being part of an integrated computational environment.
- There are not sufficient personnel to meet the needs for creating better biological computing tools and user environments.
Source: Jakobsson
74. Computation Holds Great Promise for Future Progress in Biomedical Science
- Cataloguing and analyzing individual genome-based variations to permit customized diagnosis and therapy
- Building comprehensive pathway models for human and pathogen cells to provide a framework for understanding normal function and disease at the subcellular level
- Building and deploying dynamic models of disease epidemics as a tool for responding to natural pandemics and bioterrorist attacks
- Using biomimetic principles to construct computer-aided design systems for molecular devices
- Predicting protein structures and functional properties from sequences
Source: Jakobsson
75. The Paradox of Computational Biology: Its Successes Are the Flip Side of Its Deficiencies
- The success of computational biology is shown by the fact that computation has become integral and critical to modern biomedical research.
- Because computation is integral to biomedical research, its deficiencies have become significant rate-limiting factors in the progress of biomedical research.
Source: Jakobsson
76. Biological Computing Challenges
- Simulation and prediction: structures and dynamics
- Multilevel networks: signaling, metabolic, protein interaction, gene regulatory
[Plot axes: temporal scale (seconds) vs. spatial scale (nm³)]
Source: Reed
77. High-Performance Computing Is a Major GTL Partner
[Chart plotting biological complexity against computing capability, from current U.S. computing for open science (1 TF) toward 10, 100, and 1000 TF: comparative genomics → genome-scale protein threading → constrained rigid docking → constraint-based flexible docking → protein machine interactions → molecular machine classical simulation → cell, pathway, and network simulation → community metabolic, regulatory, signaling simulations → molecule-based cell simulation. An OBER/OASCR partnership]
Source: Patrinos
78. Companies Are Not Using HPC as Aggressively as Possible
- Education and training barriers
  - Lack of computational scientists (internal or external)
  - Not enough people in the pipeline
  - Poor match between skills taught and skills needed
Source: Tichenor, Council on Competitiveness
79. Grand Challenge Case Studies
Five currently intractable problems that could profoundly advance industrial productivity and national competitiveness if petaflop or greater compute capability can be made available to solve them.
Source: Tichenor
80. Grand Challenge Case Studies
- Auto Crash Safety: It's Not Just for Dummies
- Full Vehicle Design Optimization for Global Market Dominance
- Keeping the Lifeblood Flowing: Boosting Oil and Gas Recovery from the Earth
- Customized Catalysts to Improve Crude Oil Yields: Getting More Bang from Each Barrel
- Spin Fiber Faster to Gain a Competitive Edge for U.S. Textile Manufacturing
Source: Tichenor
81. Impact of 10X Easier-to-Use Computers
- 10X easier-to-use machines deliver strategic benefits:
  - Develop more powerful applications, or fundamentally rewrite current applications
  - Shorten design cycles and speed time-to-market
  - Make HPC available to researchers who don't understand programming
  - Increase R&D efficiency and reduce costs
"We would look to rewrite the entire science underlying the current technology and methodology we are using."
Source: Tichenor
82. TeraGrid Is One of the First Broad Instantiations of CI
83. The Grid Today
- Common middleware:
  - abstracts independent hardware, software, and user IDs into a service layer with defined APIs
  - adds comprehensive security
  - allows for site autonomy
  - provides a common infrastructure based on middleware
[Diagram: user and application sit above the grid middleware, which spans the infrastructure and network at Site A and Site B]
The underlying infrastructure is abstracted into defined APIs, thereby simplifying developer and user access to resources; however, this layer is not intelligent.
Source: NASA
84. Hope for the Future
- Customizable grid services built on defined infrastructure APIs:
  - automatic selection of resources
  - information products tailored to users
  - account-less processing
  - flexible interfaces: web-based, command line, APIs
[Diagram: intelligent, customized middleware above service-oriented grid middleware infrastructure APIs, spanning the infrastructure and network at Site A and Site B]
Resources are accessed via various intelligent services that access the infrastructure APIs. The result: the scientist and application developer can focus on science and not on systems management.
Source: NASA
A toy sketch of such automatic resource selection follows.
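The "automatic selection of resources" idea above can be sketched in a few lines. This uses assumed interfaces, not any real grid API: the Resource fields, the discovery function, and the selection policy are all invented to show only the layering, in which the intelligent service, not the scientist, picks the site.

```python
# Sketch (assumed interfaces, not a real grid API) of an intelligent
# service layer choosing resources through a uniform information API.
from dataclasses import dataclass

@dataclass
class Resource:
    site: str
    free_cpus: int
    queue_wait_s: int

def discover():
    """Stand-in for the middleware's information service."""
    return [Resource("Site A", 128, 600), Resource("Site B", 512, 60)]

def select(resources, cpus_needed):
    """Automatic selection: the layer, not the scientist, chooses the
    site with enough free CPUs and the shortest queue wait."""
    candidates = [r for r in resources if r.free_cpus >= cpus_needed]
    return min(candidates, key=lambda r: r.queue_wait_s)

job = select(discover(), cpus_needed=256)
print(f"submitting to {job.site}")    # -> Site B
```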
85. More Than Grid Computing
- Cyberinfrastructure is:
  - a shared responsibility among scientists, communities, agencies, and even nations
  - research as well as deployed systems for production research (the trick is to do both well and keep everyone happy!)
- Because CI is inherently distributed, all of us will have a greater role in bringing it about, moving it forward and sustaining it
Credit: Droegemeier
86. A New Era in the US
- More interagency involvement in the provision of services
- Stronger linkages with industry to develop next-generation capabilities
- A broader, balanced portfolio among multiple elements (HPC, data, visualization, networking, software, people, tools)
- Emphasis on sustained services and diversity of provision
Credit: Droegemeier
87. A New Era in the World
Courtesy: I. Foster