Title: Grids and Web 2.0 supporting eScience
1Grids and Web 2.0 supporting eScience
- STEM Scholars Seminar, Indiana University Memorial Union
- August 1 2007
- Geoffrey Fox
- Computer Science, Informatics, Physics
- Pervasive Technology Laboratories
- Indiana University Bloomington IN 47401
- gcf_at_indiana.edu
- http://www.infomall.org
2Community Grids Laboratory Technology Expertise
- Web Service and Web 2.0 technologies for large scale distributed systems -- largely to support science
- Web Services: Integrate ideas in Enterprise Software into science
- Web 2.0: Integrate ideas in Flickr, Connotea, Slideshare, Scribd and YouTube into science
- Geographical Information Systems (e.g. Google Maps)
- Streaming Sensor data (including audio-video streams)
- Portals (User Interfaces)
- Parallel computing to make computers fast
- Technologies built as part of applications
3Community Grids Laboratory Projects
- Funded by NSF, NASA, NIH, DoE and DoD
- Cheminformatics: High Throughput Screening data and filtering; PubChem, PubMed including document analysis
- Interactive Particle Physics Data Analysis
- Earthquake Science: predicting earthquakes using simulations and satellite and GPS (global positioning system) Sensor Grid
- eSports: collaboration for real time trainers and sportsmen with HPER (IU School of Health, Physical Education, and Recreation)
- Ice Sheet Dynamics: melting of Glaciers
- Navajo Nation Grid: Education (Science Gateways) and Healthcare
- Web 2.0 tutorial and distance education course, spring 2007
- Architecture of Air Force Sensor and Decision support systems
4Why Cyberinfrastructure Useful
- Supports distributed science: data, people, computers
- Exploits Internet technology (Web 2.0) adding (via Grid technology) management, security, supercomputers etc.
- It has two aspects: parallel, with low latency (microseconds) between nodes, and distributed, with highish latency (milliseconds) between nodes
- Parallel needed to get high performance on individual 3D simulations, data analysis etc.; must decompose problem
- Distributed aspect integrates already distinct components
- Cyberinfrastructure is in general a distributed collection of parallel systems
- Cyberinfrastructure is made of services (usually Web services) that are just programs or data sources packaged for distributed access
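The "must decompose problem" point can be illustrated with a minimal sketch. It assumes a toy problem (summing a list) and uses threads as stand-ins for the nodes of a parallel machine; the function names `decompose`, `partial_sum` and `parallel_sum` are hypothetical and chosen only for illustration. It shows the decomposition pattern, not real parallel performance.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(n_points, n_workers):
    """Split indices 0..n_points-1 into contiguous, near-equal blocks."""
    base, extra = divmod(n_points, n_workers)
    blocks, start = [], 0
    for rank in range(n_workers):
        # The first `extra` workers take one additional point each.
        size = base + (1 if rank < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def partial_sum(values, block):
    """Work done by one worker on its own block of the data."""
    lo, hi = block
    return sum(values[lo:hi])


def parallel_sum(values, n_workers=4):
    """Decompose, compute each block independently, then combine."""
    blocks = decompose(len(values), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda b: partial_sum(values, b), blocks))
    return sum(partials)


if __name__ == "__main__":
    print(decompose(10, 4))
    print(parallel_sum(list(range(101))))
```

In a real 3D simulation the blocks would be regions of the mesh and the combine step would involve low latency communication between neighbouring nodes, which is where the microsecond requirement comes from.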
5e-moreorlessanything and Cyberinfrastructure
- "e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it." From its inventor John Taylor, Director General of Research Councils UK, Office of Science and Technology
- e-Science is about developing tools and technologies that allow scientists to do faster, better or different research
- Similarly e-Business captures an emerging view of corporations as dynamic virtual organizations linking employees, customers and stakeholders across the world.
- The growing use of outsourcing is one example
- The Grid or Web 2.0 (Enterprise 2.0) provides the information technology e-infrastructure for e-moreorlessanything.
- A deluge of data of unprecedented and inevitable size must be managed and understood.
- People (see Web 2.0), computers, data and instruments must be linked.
- On demand assignment of experts, computers, networks and storage resources must be supported
6TeraGrid Integrating NSF Cyberinfrastructure
TeraGrid is a facility that integrates
computational, information, and analysis
resources at the San Diego Supercomputer Center,
the Texas Advanced Computing Center, the
University of Chicago / Argonne National
Laboratory, the National Center for
Supercomputing Applications, Purdue University,
Indiana University, Oak Ridge National
Laboratory, the Pittsburgh Supercomputing Center,
and the National Center for Atmospheric
Research. Today 250 teraflops; tomorrow a
petaflop. Indiana has 20 teraflops today, becoming 30
teraflops.
7Virtual Observatory Astronomy Grid: Integrate Experiments
[Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible X-ray, Galaxy Density Map]
8Grid Capabilities for Science
- Open technologies for any large scale distributed system that are adopted by industry, many sciences and many countries (including UK, EU, USA, Asia)
- Security, Reliability, Management and state standards
- Service and messaging specifications
- User interfaces via portals and portlets virtualizing to desktops, email, PDAs etc.
- 20 TeraGrid Science Gateways (their name for portals)
- OGCE Portal technology effort led by Indiana
- Uniform approach to access distributed (super)computers supporting single (large) jobs and spawning lots of related jobs
- Data and meta-data architecture supporting real-time and archives as well as federation
- Links to Semantic web and annotation
- Grid (Web service) workflow with standards and several successful instantiations (such as Taverna and MyLead)
- Many Earth science grids including ESG (DoE), GEON, LEAD, SCEC, SERVO; LTER and NEON for Environment
- http://www.nsf.gov/od/oci/ci-v7.pdf
9Old and New (Web 2.0) Community Tools
- e-mail and list-serves are oldest and best used
- Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration: text, audio-video conferencing, files
- del.icio.us, Connotea, Citeulike, Bibsonomy, Biolicious manage shared bookmarks
- MySpace, YouTube, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks
- http://en.wikipedia.org/wiki/List_of_social_networking_websites
- Writely, Wikis and Blogs are powerful specialized shared document systems
- ConferenceXP and WebEx share general applications
- Google Scholar tells you who has cited your papers while publisher sites tell you about co-authors
- Windows Live Academic Search has similar goals
- Note sharing resources creates (implicit) communities
- Social network tools study graphs to both define communities and extract their properties
10Best Web 2.0 Sites -- 2006
- Extracted from http://web2.wsj2.com/
- Social Networking
- Start Pages
- Social Bookmarking
- Peer Production News
- Social Media Sharing
- Online Storage (Computing)
11Web 2.0 Systems are Portals, Services, Resources
- Captures the incredible development of
interactive Web sites enabling people to create
and collaborate
12Mashups v Workflow?
- Mashup Tools are reviewed at http://blogs.zdnet.com/Hinchcliffe/?p=63
- Workflow Tools are reviewed by Gannon and Fox http://grids.ucs.indiana.edu/ptliupages/publications/Workflow-overview.pdf
- Both include scripting in PHP, Python, sh etc. as both implement distributed programming at level of services
- Mashups use all types of service interfaces and do not have the potential robustness (security) of Grid service approach
- Typically pure HTTP (REST)
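"Distributed programming at level of services" can be sketched as a short script that chains services together. In this minimal illustration the three services are hypothetical local functions standing in for remote endpoints (a mashup would typically reach them over plain HTTP, a Grid workflow through Web service interfaces); the names, data and coordinates are assumptions made up for the example.

```python
def geocode_service(place):
    """Hypothetical geocoding service: place name -> (lat, lon)."""
    table = {"Bloomington IN": (39.17, -86.53)}  # illustrative data only
    return {"place": place, "latlon": table[place]}


def sensor_service(latlon):
    """Hypothetical data service: readings near a point."""
    lat, lon = latlon
    return [
        {"latlon": (lat, lon), "value": 1.5},
        {"latlon": (lat + 0.1, lon), "value": 2.5},
    ]


def map_service(readings):
    """Hypothetical mapping service: one marker string per reading."""
    return [
        "marker@{:.2f},{:.2f}={}".format(r["latlon"][0], r["latlon"][1], r["value"])
        for r in readings
    ]


def run_workflow(place):
    """The script is the workflow: each step feeds the next service."""
    located = geocode_service(place)
    readings = sensor_service(located["latlon"])
    return map_service(readings)


if __name__ == "__main__":
    for marker in run_workflow("Bloomington IN"):
        print(marker)
```

The script itself is the same in both styles; what differs is how each call is carried out and secured once the stand-in functions are replaced by real service requests.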
13Grid Workflow Datamining in Earth Science
- Work with Scripps Institute
- Grid services controlled by workflow process real
time data from 70 GPS Sensors in Southern
California
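As a rough illustration of services processing a real time sensor stream, the sketch below chains two generator stages: one keeps a short per-station baseline, the next flags readings that jump away from it. This is an assumed toy filter, not the datamining method used in the project; the station names, window and threshold are hypothetical.

```python
from collections import defaultdict, deque


def smooth(readings, window=3):
    """Attach to each reading the mean of that station's previous readings."""
    history = defaultdict(lambda: deque(maxlen=window))
    for station, value in readings:
        buf = history[station]
        # With no history yet, the reading is its own baseline.
        baseline = sum(buf) / len(buf) if buf else value
        buf.append(value)
        yield station, value, baseline


def flag_jumps(smoothed, threshold=5.0):
    """Pass on only readings that differ from their baseline by more than threshold."""
    for station, value, baseline in smoothed:
        if abs(value - baseline) > threshold:
            yield station, value


if __name__ == "__main__":
    stream = [("S1", 1.0), ("S2", 10.0), ("S1", 1.2), ("S1", 9.0), ("S2", 10.1)]
    print(list(flag_jumps(smooth(stream))))
```

Because each stage consumes and produces a stream, stages can be placed on different machines and connected by messaging, which is the role the workflow plays here.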
[Figure labels: NASA GPS, Earthquake]
14Web 2.0 uses all types of Services
- Here a Gadget Mashup uses a 3 service workflow
with a JavaScript Gadget Client
15Web 2.0 APIs
- http://www.programmableweb.com/apis has (May 14 2007) 431 Web 2.0 APIs with GoogleMaps the most often used in Mashups
- This site acts as a UDDI for Web 2.0
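The registry role ("a UDDI for Web 2.0") amounts to publish and find operations over service descriptions. A minimal in-memory sketch follows; the `ApiEntry` fields, the `Registry` class and the example entries are hypothetical and do not reflect the actual UDDI data model or the programmableweb site.

```python
from dataclasses import dataclass


@dataclass
class ApiEntry:
    """Hypothetical description of one published API."""
    name: str
    category: str
    endpoint: str
    mashup_count: int = 0


class Registry:
    """Toy registry: providers publish entries, developers find them by category."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry):
        self._entries[entry.name] = entry

    def find(self, category):
        """Return matching entries, most used in mashups first."""
        matches = [e for e in self._entries.values() if e.category == category]
        return sorted(matches, key=lambda e: e.mashup_count, reverse=True)


if __name__ == "__main__":
    registry = Registry()
    registry.publish(ApiEntry("maps-a", "mapping", "http://example.org/maps-a", 12))
    registry.publish(ApiEntry("maps-b", "mapping", "http://example.org/maps-b", 40))
    registry.publish(ApiEntry("store-a", "storage", "http://example.org/store-a", 3))
    print([e.name for e in registry.find("mapping")])
```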
16The List of Web 2.0 APIs
- Each site has API and its features
- Divided into broad categories
- Only a few used a lot (42 APIs used in more than 10 mashups)
- RSS feed of new APIs
- Amazon S3 growing in popularity
17Four more Mashups each day
- For a total of 1906 on April 17 2007 (4.0 a day over last month)
- Note ClearForest runs Semantic Web Services Mashup competitions (not workflow competitions)
- Some Mashup types: aggregators, search aggregators, visualizers, mobile, maps, games
18Mash Planet Web 2.0 Architecture
http://www.imagine-it.org/mashplanet
Display too large to be a Gadget
19Searched on Transit/Transportation
21Grid-style portal as used in Earthquake Grid
- The Portal is built from portlets, which provide user interface fragments for each service and are composed into the full interface
- Uses OGCE technology, as does the planetary science VLAB portal with University of Minnesota
Now to Portals
22Portlets v. Google Gadgets
- Portals for Grid Systems are built using portlets with software like GridSphere integrating these on the server-side into a single web-page
- Google (at least) offers the Google sidebar and Google home page which support Web 2.0 services and do not use a server side aggregator
- Google is more user friendly!
- The many Web 2.0 competitions are an interesting model for promoting development in the world-wide distributed collection of Web 2.0 developers
- I guess Web 2.0 model will win!
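Server-side aggregation can be sketched as follows: each portlet returns a user interface fragment for one service, and the portal composes the fragments into a single page before sending it to the browser. This is an assumed toy model, not the GridSphere or portlet standard API; the portlet names and HTML are hypothetical.

```python
from html import escape


def job_status_portlet(user):
    """Hypothetical portlet: fragment for a job submission service."""
    return "<div class='portlet'><h2>Jobs</h2><p>No jobs for {}</p></div>".format(
        escape(user)
    )


def data_portlet(user):
    """Hypothetical portlet: fragment for a data service."""
    return "<div class='portlet'><h2>Data</h2><p>0 files</p></div>"


def aggregate(portlets, user):
    """Server-side aggregator: compose all fragments into one web page."""
    body = "\n".join(portlet(user) for portlet in portlets)
    return "<html><body>\n{}\n</body></html>".format(body)


if __name__ == "__main__":
    print(aggregate([job_status_portlet, data_portlet], "alice"))
```

In the gadget model there is no `aggregate` step on the server: the browser loads each gadget separately and each one calls its own service.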
23Building Distributed Systems or
Cyberinfrastructure for Science
- One can use Web 2.0, which is more intuitive and has a lower barrier to entry
- Typically uses PHP
- Or Web Service technology, which is more powerful (e.g. for security) but has a high learning and infrastructure overhead
- Typically uses Java
- One can use Grid resources like TeraGrid and/or
- Web 2.0 capabilities like MySpace, Google Maps
- We try to use best of both worlds!
25Workflows - Taverna (taverna.sourceforge.net)
26The first particle physics experiment: The Big Bang
- A Brief History of Time
- 10^-43 secs, 10^-37 secs: Gravity, Strong forces separate
- 10^-35 secs: Inflation
- 10^-10 seconds: Quark-AntiQuark Annihilation (CP Violation)
- 10 microseconds: Quarks form protons, neutrons
- 380,000 years (last scatter): Nuclei capture electrons, form atoms; universe transparent to light
- 1.0 Gigayear: Galaxies begin to form
- 13.7 Gigayears: Today
[Figure labels: CMB, LHC]
27Closing CMS for the first time (July)
28Higgs diphoton Analysis using Rootlets
29Ice Sheet Dynamics
30My Tags Menu Opened up. My Account also opens
up to show account and profile information
31Add To CITeam button opens new window
Clicking the Add To CITeam button opens up this
box to add information about this page (tags,
description, etc.), which will be added to our
database and to Connotea