Semantics - PowerPoint PPT Presentation

About This Presentation
Title:

Semantics

Description:

Semantics – PowerPoint PPT presentation

Slides: 22
Provided by: Dagg3
Category:
Tags: issa | semantics


Transcript and Presenter's Notes

Title: Semantics


1
eScience Opportunities for Applied Computing
Dan Fay www.microsoft.com/science
2
Science @ Microsoft
Life Sciences
Social Sciences
Earth Sciences
Accelerating Discovery
New Materials, Technologies, Processes
Multidisciplinary Research

Computer and Information Sciences
Math and Physical Science
3
A Data Deluge in Science
  • Data collection
  • Sensor networks, satellite surveys, high
    throughput laboratory instruments, astronomical
    telescopes, supercomputers, LHC
  • Data processing, analysis, visualization
  • Legacy codes, workflows, data mining, indexing,
    searching, graphics
  • Archiving
  • Digital repositories, libraries, preservation,

SensorMap
Functionality: map navigation
Data: sensor-generated temperature, video camera feed,
traffic feeds, etc.
Scientific visualizations (NSF Cyberinfrastructure
report, March 2007)
4
Emergence of a New Research Paradigm?
  • Thousand years ago: Experimental Science
  • Description of natural phenomena
  • Last few hundred years: Theoretical Science
  • Newton's Laws, Maxwell's Equations
  • Last few decades: Computational Science
  • Simulation of complex phenomena
  • Today: eScience or Data-centric Science
  • Unify theory, experiment, and simulation
  • Using data exploration and data mining
  • Data captured by instruments
  • Data generated by simulations
  • Data generated by sensor networks
  • Scientists overwhelmed with data
  • Computer Science and IT companies have
    technologies that will help
  • (With thanks to Jim Gray)

5
The Perfect Data Storm
  • The era of remote sensing, cheap ground-based
    sensors and web service access to agency
    repositories is here
  • Extracting and deriving the data needed for the
    science remains problematic
  • Specialized knowledge
  • Finding the right needle in the haystack

6
The Data Pipeline
7
Dynameomics
High-throughput molecular dynamics to simulate
representative proteins from all known folds
  • Valerie Daggett, University of Washington
  • Perform MD simulations of representatives of all
    folds (41K structures in PDB → 1130 fold
    families)
  • Top 30 folds - Many are potential biomedical
    targets
  • Current Status
  • > 650 proteins simulated
  • > 4744 simulations
  • > 64 TB of data
  • > 1.26 × 10^8 structures
  • Housed in novel hybrid SQL/OLAP database using
    SQL Server
  • We invite you to experience it!
    www.dynameomics.org

8
The Cosmic Genome Project
  • The Sloan Digital Sky Survey is the first major
    astronomical survey project
  • 5 color images of ¼ of the sky
  • Pictures of 300 million celestial objects
  • Distances to the closest 1 million galaxies
  • Jim Gray from Microsoft Research worked with
    astronomer Alex Szalay to build the public
    SkyServer archive for the survey
  • New model of scientific publishing
  • Have to publish the data before astronomers
    publish their analysis

9
Public Use of the SkyServer
  • Poster child for 21st-century data publishing
  • 380 million web hits in 6 years
  • 930,000 distinct users vs. 10,000 astronomers
  • 1600 scientific papers
  • Delivered 50,000 hours of lectures to high
    schools
  • Delivered 100B rows of data
  • Citizen Science: GalaxyZoo
  • Goal of 1 million visual galaxy classifications
    by the public
  • Allows general public to search for photographs
    and classify different types of galaxies

10
Hanny van Arkel's Voorwerp
11
World Wide Telescope
Seamless Rich Social Media Virtual Sky Web
application for science and education
  • Participants
  • Alyssa Goodman, Harvard University
  • Alex Szalay, Johns Hopkins University
  • Curtis Wong, Jonathan Fay, Microsoft Research
  • Goals
  • Integration of data sets and one-click contextual
    access
  • Easy access and use
  • In a little more than two months, a
    million users have downloaded, installed and
    launched the application (2,206,497 unique
    sessions)
  • We invite you to experience it!
    www.worldwidetelescope.org

12
Berkeley Water Center
Understanding regional hydrology
  • Project Organization
  • Jim Hunt, Dennis Baldocchi, UC Berkeley
  • Deb Agarwal, Lawrence Berkeley Laboratory
  • Catharine van Ingen, MSR
  • Goals
  • Enable rapid scientific data browsing for
    availability and applicability
  • Enable environmental science via data synthesis
    from multiple sources
  • Progress
  • Environmental Data Server, www.fluxdata.org
    (SharePoint), serves 921 site years of
    carbon-climate field data from 160 field teams
    to 60 paper writing teams (800M values)
  • Multiple projects now leveraging same SQL Server
    database and data cube approach
  • CUAHSI consortium: 100 universities collaborating
    on hydrology
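
The data cube approach mentioned above can be pictured as rolling many fine-grained measurements up into one summary cell per site and year. The following is only a minimal Python sketch of that idea, not the project's SQL Server implementation. The function name roll_up_site_years, the record layout (site, year, value), and the site labels are all assumptions made for illustration.

```python
from collections import defaultdict


def roll_up_site_years(records):
    """Average daily values into one cell per (site, year).

    `records` is an iterable of (site, year, value) tuples; this layout
    is a hypothetical stand-in for the real field data.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (site, year) -> [sum, count]
    for site, year, value in records:
        cell = totals[(site, year)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical daily readings from two made-up sites.
records = [
    ("site_A", 2005, 200.0),
    ("site_A", 2005, 220.0),
    ("site_A", 2006, 210.0),
    ("site_B", 2005, 150.0),
]

cube = roll_up_site_years(records)
site_years = len(cube)  # each (site, year) cell counts as one site-year
```

Each key of the resulting dictionary is one site-year, so counting the keys gives a site-year total of the kind quoted on the slide, and the per-cell means are the sort of summary data a browser could show before anyone downloads the full record.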

13
Carbon-Climate Synthesis (BWC, Dennis Baldocchi et
al.)
  • SharePoint site: www.fluxnet.org
  • 921 site-years of data from 240 sites around the
    world; 80 site-years now being added
  • American data subset is public and served more
    widely
  • Summary data greatly simplify initial data
    discovery
  • Communal field science: each investigator acts
    independently.
  • Cross site studies and integration with modeling
    increasingly important

14
Browsing For Data Availability, Applicability,
Early Science
15
Browsing the Whole Dataset
Daily Rg 2005, 72 sites
Daily Rg 2000-2006, 200 sites
18
Data Depot: Social Data Aggregation and Analysis
  • http://datadepot.msresearch.us

Removal of CO2 from the air, by latitude over the
course of a year.
Sensors
Phones
Applications
Internet
Web: datadepot.msresearch.us
Contact: counts@microsoft.com
19
Trident Scientific Workflow Workbench
Univ. of Washington and Monterey Bay Aquarium Research
Institute
Scientific workflow workbench to automate the
data processing pipelines of the world's first
plate-scale undersea observatory
  • Goals
  • From raw data to useable data products
  • Focusing on cleaning, analysis, re-gridding,
    interpolation
  • Support real time, on-demand visualizations
  • Custom activities and workflow libraries for
    authoring
  • Visual programming accessible via a browser
  • Trial Cloud Services for science
  • Proof Points
  • A scientific workflow workbench for a number of
    science projects, reusable workflows, automatic
    provenance capture.
  • Demonstrate scientific use of Windows WF, HPCS,
    SQL Server and Cloud Service SSDS
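
To make the cleaning, re-gridding, and interpolation goals concrete, here is a minimal Python sketch of such a pipeline stage. It is not Trident code. The function names clean and regrid, the (time, value) sample layout, and the choice of linear interpolation are assumptions made for illustration.

```python
import math
from bisect import bisect_left


def clean(samples):
    """Drop samples whose value is missing (None or NaN)."""
    return [(t, v) for t, v in samples
            if v is not None and not math.isnan(v)]


def regrid(samples, start, stop, step):
    """Linearly interpolate irregular (time, value) samples onto a
    regular grid from start to stop. Grid points outside the sampled
    range are skipped rather than extrapolated."""
    samples = sorted(samples)
    if not samples:
        return []
    times = [t for t, _ in samples]
    out = []
    n = int(round((stop - start) / step))
    for i in range(n + 1):
        t = start + i * step
        if t < times[0] or t > times[-1]:
            continue
        j = bisect_left(times, t)
        if times[j] == t:
            out.append((t, samples[j][1]))  # exact hit, no interpolation
        else:
            (t0, v0), (t1, v1) = samples[j - 1], samples[j]
            out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return out


# Hypothetical raw sensor readings with two bad values.
raw = [(0.0, 10.0), (1.5, None), (2.0, 14.0),
       (5.0, float("nan")), (6.0, 22.0)]

gridded = regrid(clean(raw), start=0.0, stop=6.0, step=1.0)
```

Keeping the cleaning and re-gridding steps as separate functions mirrors the workflow idea of small reusable activities that can be chained, although a real workbench would also record provenance for each step.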

20
Resources
  • Microsoft Research
  • http://research.microsoft.com
  • Microsoft Research downloads:
    http://research.microsoft.com/research/downloads
  • Science at Microsoft
  • http://www.microsoft.com/science
  • Scholarly Communications
  • http://www.microsoft.com/scholarlycomm
  • CodePlex
  • http://www.codeplex.com
  • The Faculty Connection
  • http://www.microsoft.com/education/facultyconnection
  • MSDN Academic Alliance
  • http://msdn.microsoft.com/en-us/academic
