Data- and Compute-Driven Transformation of Modern Science



Edward Seidel, Assistant Director, Mathematical and Physical Sciences, NSF (Director, Office of Cyberinfrastructure)


Data- and Compute-Driven Transformation of Modern Science
Edward Seidel, Assistant Director, Mathematical and Physical Sciences, NSF
(Director, Office of Cyberinfrastructure)

Profound Transformation of Science: Gravitational
  • Galileo, Newton usher in birth of modern science
    c. 1600
  • Problem: single particle (apple) in a
    gravitational field (general 2-body problem
    already too hard)
  • Methods
  • Data: notebooks (Kbytes)
  • Theory driven by data
  • Computation: calculus by hand (1 Flop/s)
  • Collaboration
  • 1 brilliant scientist, 1-2 students

Profound Transformation of Science: Collision of
Two Black Holes
  • Science Result: The Pair of Pants
  • Year: 1972
  • Team size: 1 person (S. Hawking)
  • Computation: Flop/s
  • Data produced: Kbytes (text, hand-drawn sketch)
  • 400 years later: same!
  • Science Result: The Pair of Pants
  • Year: 1994
  • Team size: 10
  • Data produced: 50 Mbytes

Move to 3D: 1000x more data!
  • 3D Collision
  • Science Result
  • Year: 1998
  • Team size: 15
  • Data produced: 50 Gbytes
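The jump in output across these slides can be sanity-checked with quick arithmetic; a minimal sketch using the talk's round numbers (Kbytes in 1972, 50 Mbytes in 1994, 50 Gbytes in 1998):

```python
# Growth factors for the black-hole-collision data sizes quoted
# on the preceding slides (round numbers from the talk).
sizes_bytes = {
    1972: 1e3,   # ~Kbytes: text and a hand-drawn sketch
    1994: 50e6,  # 50 Mbytes
    1998: 50e9,  # 50 Gbytes: the "1000x more data" of the 3D move
}

years = sorted(sizes_bytes)
for prev, curr in zip(years, years[1:]):
    factor = sizes_bytes[curr] / sizes_bytes[prev]
    print(f"{prev} -> {curr}: {factor:,.0f}x more data")
```

The 1994-to-1998 step reproduces the slide's 1000x figure; the 1972 baseline is taken as roughly a kilobyte of notes, which is an assumption from the "Kbytes" bullet.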

Just ahead: Complexity of the Universe. LHC,
Gamma-ray bursts!
  • Gamma-ray bursts!
  • GR now soluble: complex problems in relativistic
  • All the energy emitted in the lifetime of the Sun
    bursts out in a few seconds. What are they?!
    Colliding BH-NS? SN?
  • GR, hydrodynamics, nuclear physics, radiation
    transport, neutrinos, magnetic fields; globally
    distributed collaboration!
  • Scalable algorithms, complex AMR codes, viz,
    PFlops-weeks, PB output!
  • LHC: Higgs particle?
  • 10K scientists, 33 countries, 25 PB
  • Planetary lab for scientific discovery!

Remote Instrument
Grand Challenge Communities: Combine it
All... Where is it going to go?
Same CI useful for black holes, hurricanes
Framing the Question: Science is Radically
Revolutionized by CI
  • Modern science
  • Data- and compute-intensive
  • Integrative
  • Multiscale
  • Collaborations for Complexity
  • Individuals, groups
  • Teams, communities
  • Must transition NSF CI approach to support:
  • Integrative, multiscale
  • 4 centuries of constancy, 4 decades of
    10^9-10^12x change

But such radical change cannot be adequately
addressed with (our current) incremental approach!
We still think like this
Students take note!
Data Crisis: Information Big Bang
Scientific Computing and Imaging Institute,
University of Utah
Explosive Trends in Data Growth
  • Comparative Metagenomics
  • DNA sequencing of entire families of organisms
  • Already hundreds of TB, thousands of users
  • HD Collaborations and Optiportals
  • Multichannel HD, gigapixel visualizations
  • Petascale-Exascale simulation
  • They generate peta-exabytes per simulation!
  • Square Kilometer Array
  • 3000 radio receivers, 1 km² area!
  • 19 countries! Possibly beginning in 201X,
    operational 202X
  • Data: exabyte per week! Analysis: Exaflops!
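The SKA figure above implies a remarkable sustained data rate; a back-of-envelope check, assuming the slide's rough "exabyte per week":

```python
# Back-of-envelope: sustained data rate implied by ~1 exabyte/week,
# the rough SKA figure quoted on this slide.
EXABYTE = 10**18              # bytes (decimal definition)
SECONDS_PER_WEEK = 7 * 24 * 3600

rate = EXABYTE / SECONDS_PER_WEEK
print(f"~{rate / 10**12:.2f} TB/s sustained")  # prints ~1.65 TB/s sustained
```

That is on the order of a terabyte and a half every second, around the clock, which is why the slide pairs the data figure with an Exaflops analysis requirement.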

Provenance in Science
Source: Juliana Freire, U. of Utah
  • Provenance is as important as the result
  • Not a new issue
  • Lab notebooks have been used for a long time
  • What is new?
  • Large volumes of data
  • Complex analyses, computational processes
  • Writing notes is no longer an option
  • GC Communities require open, sharable data,
    standards, metadata
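The point above, that hand-written notes can no longer capture complex computational analyses, is what machine-readable provenance records address. A minimal sketch in Python; the record fields here are illustrative choices, not any NSF or community standard:

```python
import hashlib
import platform
from datetime import datetime, timezone

def provenance_record(input_path: str, params: dict, code_version: str) -> dict:
    """Capture a minimal machine-readable record of one analysis step.

    The fields below are hypothetical examples of what a provenance
    record might carry; real community standards define richer schemas.
    """
    with open(input_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {
        "input": input_path,
        "input_sha256": digest,        # ties the result to the exact data used
        "parameters": params,          # everything needed to re-run the step
        "code_version": code_version,  # e.g. a version-control commit hash
        "platform": platform.platform(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```

Attaching a record like this to every derived dataset is what makes results re-runnable and shareable at volumes where a lab notebook cannot keep up.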

(Figure: provenance of observed data; DNA recombination, by Lederberg)
Data Deluge Drives Change at NSF
There is a major shift in science towards
data-intensive methods. NSF is responding
  • Data issues resonate the most across NSF units!
  • DataNet: $100M investment in Sustainable Archive
    Access Partners; development of a widely
    accessible network of interactive data archives,
    driven by today's grand challenges, integrating
    multiple disciplines
  • INTEROP: Community-led interoperability;
    interdisciplinary, community approaches to
    combine and re-use data in ways not envisioned by
    their creators
  • Data-intensive computing: SDSC Gordon facility
  • NSF Data Policy: The Data Working Group
    (NSF-wide group of Program Directors) working to
    assure that data are shared within and across

NSF Vision and National CI Blueprint
Science is becoming unreproducible in this
environment. Validation? Provenance?
Track 1
The Shift Towards Data: Implications
  • All science is becoming data-dominated
  • Experiment, computation, theory
  • Totally new methodologies
  • Algorithms, mathematics
  • All disciplines from science and engineering to
    arts and humanities
  • End-to-end networking becomes critical part of CI
  • Campuses, please note!
  • How do we train data-intensive scientists?
  • Data policy becomes critical!

Recent NSF Activities on Data Policy and
Fundamental points on data and publication policy
Who pays? The NSF? The Institution? What is
the cost model? What is reasonable?
Where is it placed? Author web site? Library?
NSF sites?
  • Publicly funded scientific data and publications
    should be available, and science benefits
  • There has to be a place to keep data, and a way
    to access it
  • There needs to be an affordable, sustainable
    cost model for this

What data must be made available? Raw data?
Peer reviewed? When is it available? 6 months?
1 year? After publication?
How long is it made available? How do we enforce
it post-award?
There is great variability in requirements across
science communities; peer review can help guide
this process.
Changes Coming for Data!
  • Long-standing NSF Policy on Data (Proposal and
    Award Policies and Procedures Guide)
  • "Investigators are expected to share with other
    researchers, at no more than incremental cost and
    within a reasonable time, the primary data...
    created or gathered in the course of work under
    NSF grants"
  • NSF will soon require a Data Management Plan
    (DMP), subject to peer review criterion for
  • The DMP will be in the form of a 2-page
    supplementary document to the proposal
  • It will not be possible to submit proposals
    without such a document
  • Customization by discipline, program necessary

Upcoming Implementation of NSF Data Policy
  • Directorate-Specific Issues: Peer Review
  • Many details are implemented/enforced via peer
    review and Program Officer discretion, including
    things like embargo period, standards, etc.
  • The challenge at NSF is that no one size fits
    all, so each Directorate will be responsible for
    its own recommendations for DMP content,
    appropriate institutional repositories, etc.
  • This does not address Open Access as applied

Electronic Access to Scientific Publications
Why is this Important?
  • Science requires it
  • Science progress accelerated by making
    publications available and searchable
  • Results in one community need to easily propagate
    to another for multidisciplinary complex problems
  • Search technologies can be brought to bear
  • Publications need to be associated with rich
    information: videos of simulations, supporting
    data, simulation and analysis codes
  • Equality and Broadening participation
  • Young scientists at smaller universities are at a
    needless disadvantage without it. They may lose
    journal access. This hurts science and puts
    talent at risk
  • US Administration focus on transparency and

Current Activities
  • We have begun serious discussions within NSF on
    these issues
  • National Science Board Committee on Data started
  • Goals similar to those for Data
  • We have had numerous visits from funding agencies
    from around the world
  • Primary topic: what is NSF doing on OA?
  • Discussion with various publishers, libraries to
    explore options
  • Quality of science relies on the peer review
    systems of the best journals; need a way to
    support OA

On Working with Publishers
  • Quality of science, identification of talented
    scientists: we rely on the peer review systems of
    the best journals
  • NSF receives an assurance that the work done on a
    grant meets a standard
  • Universities use impact factors as part of their
    tenure and promotion process
  • "I believe it is in the interests of science, and
    hence the public interest, to help journals find
    a viable OA business model."
  • Bernard Schutz, Presentation to NSF, May 2009

Final Remarks
  • Science is becoming collaborative and
    data-intensive
  • We are accelerating efforts to advance NSF in all
    aspects of data
  • Science requires that data be open and
    accessible; we are working to achieve this
  • All forms of data are important, and must be more
    tightly connected in the future
  • Collections, software, publications
  • Time is of the essence