Title: Building Science Gateways
1. Building Science Gateways
- Marlon Pierce
- Community Grids Laboratory
- Indiana University
2. Tutorial Overview
Type | Title | Presenter
Talk | Gateways overview | Marlon
Talk | OGCE overview | Marlon
Talk | TeraGrid Resources Overview | Simms
Break
Demo | LEAD Portal and workflows | Suresh
Demo | GridChem Workflow | Suresh
Demo | OGCE and TGUP Portals | Marlon
Lunch
3. There's More
Type | Title | Presenter
Hands On | OGCE, LEAD, and TGUP portals and workflows | Marlon, Suresh
Talk/HO | Building the OGCE Portal | Marlon
Talk/HO | Building gadgets with GTLAB | Marlon
Break (2:00-2:30)
Talk | Web 2.0 for Science Gateways (Optional) | Marlon
HO | Continue hands-on work | Suresh, Marlon
4. Slides and Demo Site
- Tutorial slides are available from http://www.collab-ogce.org/ogce/index.php/Tutorials
- We run a permanent demo portal at https://community.ucs.indiana.edu:8443/gridsphere/
- Also aliased as https://ogceportal.iu.teragrid.org:8443/gridsphere
- Portal accounts train01-train30 have been created for the workshop. The password is the same as the account name.
- Also train31-train49 from the TG08 workshop.
- We also have TeraGrid training accounts with names train01-train30 that can be used to retrieve TG proxy credentials.
- These should be active all week.
- You can also log into the TeraGrid User Portal with this account and the secret password.
5. Concept 1: Web Portal
- Web container that aggregates content from multiple sources into a single display.
- Start Pages
- Typically consume RSS/Atom news feeds.
- More powerful versions these days support Flickr, calendars, games, etc.
- Gadgets, widgets
- Examples: iGoogle, Netvibes, My Yahoo!
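To make the start-page idea concrete, here is a minimal Python sketch of the aggregation step: parse several RSS 2.0 feeds and merge their items into one display list. The feed contents, titles and example.org links are made up for illustration; a real start page would fetch the feeds over HTTP and would also need to handle Atom.

```python
import xml.etree.ElementTree as ET

# Two tiny, made-up RSS 2.0 documents standing in for remote news feeds.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Allocation deadline</title><link>http://example.org/a1</link></item>
</channel></rss>"""
FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>New gateway launched</title><link>http://example.org/b1</link></item>
<item><title>Workshop slides posted</title><link>http://example.org/b2</link></item>
</channel></rss>"""


def parse_items(rss_text):
    """Return (channel title, [(item title, link), ...]) for one RSS 2.0 document."""
    channel = ET.fromstring(rss_text).find("channel")
    items = [(item.findtext("title"), item.findtext("link"))
             for item in channel.findall("item")]
    return channel.findtext("title"), items


def aggregate(feeds):
    """Merge several feeds into one list of display lines, as a start page would."""
    lines = []
    for rss_text in feeds:
        source, items = parse_items(rss_text)
        for title, link in items:
            lines.append(f"[{source}] {title} <{link}>")
    return lines


if __name__ == "__main__":
    for line in aggregate([FEED_A, FEED_B]):
        print(line)
```

A gadget is then just one rendering of one such source inside the page.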
6. [Figure: start page screenshot with callouts labeled "Gadget" and "RSS Feeds"]
7. Concept 2: Grid Computing
- Grid computing software is designed to integrate large supercomputing facilities.
- TeraGrid, Open Science Grid, EGEE, etc.
- This is done via network services.
- Software providers in the US include Globus and Condor.
- Key Service Components (and example services):
- Authentication and authorization framework (MyProxy)
- Remote process access and control (GRAM, Condor)
- Remote file, I/O access (GridFTP, SRB, RFT)
- Additional Services
- Information services, replica management, database federation, storage management, schedulers, etc.
- Example Grid Software Stacks: CTSS and VDT
- For TeraGrid and Open Science Grid, respectively
- Being pushed by Cloud Computing (Amazon, Google, Microsoft, others)
8. (No transcript)
9. Science Portals and Gateways
- Science Gateways adapt Web portal technology to build user interfaces to the Grid.
- Science portals resemble standard portals, but must also:
- Support access to computing and storage resources.
- Allow users remote, direct access to these resources.
- You often want to run applications and access data that you own directly.
- Provide access to science applications and data sets.
- And we must provide value-added services as well as user interfaces.
10. Example Science Gateways
- Many listed here:
- http://www.teragrid.org/programs/sci_gateways/
- Cover many different scientific fields
- Atmospheric science, geophysics, computational chemistry, bioinformatics, etc.
- See also the GCE08 workshop at SC08 and earlier proceedings:
- http://www.collab-ogce.org/gce08/index.php/Main_Page
- GCE05-07 also linked.
11. TeraGrid Science Gateways Program
- Slides courtesy of Nancy Wilkins-Diehr
- TeraGrid Area Director for Science Gateways
- wilkinsn_at_sdsc.edu
12. Today, there are approximately 29 gateways using the TeraGrid
13. Does a gateway have to use TeraGrid to be a gateway?
- No, but the TeraGrid does fund the development and support of these gateways.
- Using high-end resources is more work and is not recommended unless it serves a demonstrated need.
- Gateways are an excellent way to extend the impact of high-end resources.
- Are they all funded by TeraGrid?
- Can TeraGrid claim success for all gateways?
- No, we don't make the gateways you use, we make the gateways you use better.
- TeraGrid does fund a small number of developers to provide advanced support.
- More later.
14. Why are gateways worth the effort?
Condor-G submit file:
    # Full path to executable
    executable = /users/wilkinsn/tutorial/bin/mcell
    # Working directory, where Condor-G will write its output and error files on the local machine.
    initialdir = /users/wilkinsn/tutorial/exercise_3
    # To set the working directory of the remote job, we specify it in this globus RSL,
    # which will be appended to the RSL that Condor-G generates
    globusrsl = (directory='/users/wilkinsn/tutorial/exercise_3')
    # Arguments to pass to executable.
    arguments = nmj_recon.main.mdl
    # Condor-G can stage the executable
    transfer_executable = false
    # Specify the globus resource to execute the job
    globusscheduler = tg-login1.sdsc.teragrid.org/jobmanager-pbs
    # Condor has multiple universes, but Condor-G always uses globus
    universe = globus
    # Files to receive stdout and stderr.
    output = condor.out
    error = condor.err
    # Specify the number of copies of the job to submit to the condor queue.
    queue 1
- Increasing range of expertise needed to tackle the most challenging scientific problems
- How many details do you want each individual scientist to need to know?
- PBS, RSL, Condor
- Coupling multi-scale codes
- Assembling data from multiple sources
- Collaboration frameworks
PBS script:
    #!/bin/sh
    #PBS -q dque
    #PBS -l nodes=1:ppn=2
    #PBS -l walltime=00:02:00
    #PBS -o pbs.out
    #PBS -e pbs.err
    #PBS -V
    cd /users/wilkinsn/tutorial/exercise_3
    ../bin/mcell nmj_recon.main.mdl
Globus RSL:
    ( (resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
      (executable="/users/birnbaum/tutorial/bin/mcell")
      (arguments=nmj_recon.main.mdl)
      (count=128) (hostCount=10) (maxtime=2)
      (directory="/users/birnbaum/tutorial/exercise_3")
      (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
      (stderr="/users/birnbaum/tutorial/exercise_3/globus.err") )
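One way a gateway hides these details is to fill in a batch-script template from a few form fields. The Python sketch below uses a hypothetical make_pbs_script helper that regenerates the PBS script shown on this slide; its defaults (queue dque, 1 node with 2 processors per node, 2 minutes of wall time) are copied from that example, not recommendations. Because form values come from users, paths and arguments are shell-quoted.

```python
import shlex


def make_pbs_script(executable, args, workdir, queue="dque",
                    nodes=1, ppn=2, walltime="00:02:00"):
    """Render a PBS batch script from a few values a web form could collect."""
    command = " ".join(shlex.quote(part) for part in [executable, *args])
    lines = [
        "#!/bin/sh",
        f"#PBS -q {queue}",
        f"#PBS -l nodes={nodes}:ppn={ppn}",
        f"#PBS -l walltime={walltime}",
        "#PBS -o pbs.out",
        "#PBS -e pbs.err",
        "#PBS -V",                      # export the submitting environment to the job
        f"cd {shlex.quote(workdir)}",
        command,                        # e.g. ../bin/mcell nmj_recon.main.mdl
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pbs_script("../bin/mcell", ["nmj_recon.main.mdl"],
                          "/users/wilkinsn/tutorial/exercise_3"), end="")
```

A portlet would hand the resulting text to a job submission service instead of asking the scientist to write it.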
15. Not just ease of use: What can scientists do that they couldn't do previously?
- LEAD: access to radar data
- NVO: access to sky surveys
- OOI: access to sensor data
- PolarGrid: access to polar ice sheet data
- SIDGrid: analysis tools
- GridChem: developing multiscale coupling
- How would this have been done before gateways?
16. Gateways Greatly Expand Access
- Almost anyone can investigate scientific questions using high-end resources.
- Not just those in the research groups of those who request allocations
- Gateways allow anyone with a web browser to explore.
- Opportunities can be uncovered via Google.
- Nancy's 11-year-old son discovered nanoHUB.org himself while his class was studying buckyballs.
- Fosters new ideas, cross-disciplinary approaches
- Encourages students to experiment
- But used in production too
- Significant number of papers resulting from gateways, including GridChem and nanoHUB
- Scientists can focus on challenging science problems rather than challenging infrastructure problems.
17. TeraGrid Pathways Activities
- Program funding to involve MSI communities
- 2 Gateway components:
- Adapt gateways for educational use by underrepresented communities
- GEON: SDSC, Navajo Tech
- Teach participants from underrepresented communities how to build gateways
- PolarGrid: IU, ECSU
18. Navajo Technical College and gateways
- Incorporating the use of gateways in their curricula
- GEON, GISolve areas of initial interest
19. PolarGrid
- Cyberinfrastructure Center for Polar Science (CICPS)
- Experts in polar science, remote sensing and cyberinfrastructure
- Indiana, ECSU, CReSIS
- Satellite observations show disintegration of ice shelves in West Antarctica and speed-up of several glaciers in southern Greenland.
- Most existing ice sheet models, including those used by the IPCC, cannot explain the rapid changes.
http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v
Source: Geoffrey Fox
20. Components of PolarGrid
- Expedition grid consisting of ruggedized laptops in a field grid linked to a low-power multi-core base camp cluster
- Prototype and two production expedition grids feed into a 17 Teraflops "lower 48" system at Indiana University and Elizabeth City State University (ECSU), split between research, education and training.
- Gives ECSU a top-ranked 5 Teraflop MSI high-performance computing system
- Access to expensive data
- High-end resources for analysis
- MSI student involvement
Source: Geoffrey Fox
21. Recent Gateways using TeraGrid Significantly
22. SCEC using a gateway to produce a hazard map
- PSHA (probabilistic seismic hazard analysis) map for California using the newly released Earthquake Rupture Forecast (UCERF2.0), calculated using the SCEC Science Gateway
- Warm colors indicate regions with a high probability of experiencing strong ground motion in the next 50 years.
- High-resolution map, significant CPU use
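The "probability in the next 50 years" on hazard maps usually rests on the standard PSHA assumption that exceedances of a ground-motion level follow a Poisson process with annual rate λ, so P = 1 - exp(-λt). The slide does not say which formulation SCEC used; this Python sketch only illustrates the conversion, using the commonly quoted 10%-in-50-years level as an example.

```python
import math


def exceedance_probability(annual_rate, years):
    """P(at least one exceedance within `years`), assuming a Poisson process."""
    return 1.0 - math.exp(-annual_rate * years)


def annual_rate_for(probability, years):
    """Invert the relation: the annual rate that yields `probability` in `years`."""
    return -math.log(1.0 - probability) / years


if __name__ == "__main__":
    # 10% in 50 years corresponds to a return period of roughly 475 years.
    rate = annual_rate_for(0.10, 50)
    print(f"annual rate {rate:.5f}, return period {1 / rate:.0f} years")
    print(f"check: {exceedance_probability(rate, 50):.3f}")
```

Each map cell needs this for many sources and ground-motion levels, which is part of why such maps use significant CPU time.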
23. Social Informatics Data Grid
- Heavy use of multimodal data.
- A subject might be viewing a video while a researcher collects heart rate and eye movement data.
- Events must be synchronized for analysis; large datasets result.
- Extensive analysis capabilities are not something that each researcher should have to create for themselves.
http://www.ci.uchicago.edu/research/files/sidgrid.mov
24.
- Social scientists have traditionally worked in isolated labs without the capability to share data or insights with others.
- SIDGrid enables a number of capabilities:
- Data that is expensive to collect can now be shared with others, increasing the potential for scientific impact.
- Geographically distant researchers can collaborate on the analysis of the same data set.
- Complex analysis tools and workflows are now available for all to use, rather than having each lab duplicate efforts.
- All researchers now have access to the highest-quality computational resources.
- SIDGrid uses TeraGrid resources for computationally intensive tasks such as media transcoding, algorithms for pitch analysis of audio tracks, and fMRI image analysis.
- SIDGrid is unique among social science data archive projects:
- Focused on streaming data which change over time
- Provides the ability to investigate multiple datasets, collected at different time scales, simultaneously
- Active users of the SIDGrid system include a human neuroscience group and linguistic research groups from the University of Chicago and the University of Nottingham, UK.
25.
- 40 institutional members
- 9 foreign affiliates
- Researchers request synthetic seismograms for any given earthquake.
- Allows scientists to understand the ground motion associated with any given earthquake
- Requested and received advanced support from TeraGrid
26. Talks at E-Science
- See the PSE Workshop: http://escience2008.iu.edu/workshops/innovative/index.shtml
- Friday, 10:00 am-4:30 pm
- Nancy Wilkins-Diehr will have more to say about some of these gateways.
- See also Rich Wolski's keynote on cloud computing. Next-generation gateways will (need to) support cloud computing and virtual machine-based backends.
- Purdue's nanoHUB and HUBzero software have done this for some time.
27. Getting Started Building a Gateway
- Should you? And how can you get help?
28. When might a gateway be appropriate?
- Researchers using defined sets of tools in different ways
- Same executables, different input
- GridChem, CHARMM
- Creating multi-scale or complex workflows
- Datasets
- Common data formats
- National Virtual Observatory
- Earth System Grid
- Some groups have invested significant efforts here
- caBIG: extensive discussions to develop common terminology and formats
- BIRN: extensive data sharing agreements
- Difficult to access data/advanced workflows
- Sensor/radar input
- LEAD, GEON
29. Advanced Support for OCI Resources, Including Gateway Integration
- Same peer review process used to request resources
- 30,000 CPUs
- 6 months of Nancy
- Reviews based on appropriate use of resources; science is not reviewed if already funded.
- Petascale
- Multisite workflows
- Gateways
- Domain expertise
Or someone really talented
30. Support is Very Targeted
- Start with well-defined objectives
- Focus on efficient or novel use of OCI resources
- Access to a minimum of 0.25 FTE for months to a year
- Enough investment to really understand and help solve complex problems
- Must have commitment from PIs
- Want to make sure work is incorporated into production codes and gateways
- Good candidates for targeted support include:
- Large, high-impact projects
- Ability to influence new communities
- Lessons learned move into training and documentation
31. GATEWAYS UNDER THE HOOD
32. My 2002 octopus SOA diagram, from the archives.
[Figure: a browser interface connects over HTTP(S) to portlets acting as client stubs; these call a DB service, job submission/monitoring and file services, and a visualization service via SOAP/HTTP, each described by WSDL. Backends include JDBC connections to databases and the operating and queuing systems of Hosts 1-3.]
33. Terminology
- Portlet: a standard Java component that generates HTML and can also act as a client to a remote service.
- Lives in a portal container.
- I will also use this term generically.
- Web Service: a remotely invocable function on the Internet.
- SOAP: the XML message envelope for carrying commands over HTTP.
- WSDL: describes the service's API in XML.
- REST: a variation of this approach.
- Lots more info: http://grids.ucs.indiana.edu/ptliupages/presentations/I590WebService.ppt
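To show what "XML message envelope" means in practice, this Python sketch builds a SOAP 1.1 request with the standard library's ElementTree. The envelope namespace is the real SOAP 1.1 one; the service namespace and the submitJob operation with its parameters are hypothetical, since in a real deployment they would come from the service's WSDL. The resulting text would be sent as the body of an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/jobservice"  # hypothetical; normally taken from the WSDL

# Prefixes used when the XML is serialized.
ET.register_namespace("soapenv", SOAP_ENV)
ET.register_namespace("svc", SERVICE_NS)


def build_envelope(operation, params):
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_envelope("submitJob", {"executable": "mcell", "count": 128}))
```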
34. But Why?
- Three-tiered Service Oriented Architecture is the network equivalent of the famous Model-View-Controller design pattern.
- View: the user interface components.
- Controller: Web service middleware.
- Model: the backend resources.
- Independence of tiers gives flexibility.
- Services can be reused with alternative user interfaces.
- Workflow composers like Taverna, XBaya, Kepler
- User interfaces can work with different service implementations.
- Drawback: reliability and robustness are issues.
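The tier independence above can be sketched in Python: two different views (a portlet fragment and a workflow step) depend only on a service interface, so either can run against any implementation of it. JobService, InMemoryJobService and both view functions are hypothetical names; the in-memory class stands in for middleware that would really forward requests to a queuing system.

```python
from typing import Protocol


class JobService(Protocol):
    """Middle-tier contract: the only thing user interfaces may call."""

    def submit(self, executable: str, args: list) -> str: ...

    def status(self, job_id: str) -> str: ...


class InMemoryJobService:
    """Hypothetical implementation; a real one would talk to GRAM, PBS, etc."""

    def __init__(self) -> None:
        self.jobs = {}

    def submit(self, executable: str, args: list) -> str:
        job_id = f"job{len(self.jobs) + 1}"
        self.jobs[job_id] = "QUEUED"
        return job_id

    def status(self, job_id: str) -> str:
        return self.jobs.get(job_id, "UNKNOWN")


def portlet_view(service: JobService) -> str:
    """View 1: an HTML fragment, as a portlet might render it."""
    job_id = service.submit("mcell", ["nmj_recon.main.mdl"])
    return f"<p>{job_id}: {service.status(job_id)}</p>"


def workflow_step(service: JobService) -> dict:
    """View 2: the same service driven by a workflow engine instead of a person."""
    job_id = service.submit("mcell", ["nmj_recon.main.mdl"])
    return {"job": job_id, "state": service.status(job_id)}
```

Swapping InMemoryJobService for another class with the same two methods changes neither view, which is the flexibility the slide describes.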
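35. Two Approaches to the Middle Tier
[Figure: two stacks side by side. Fat client: the portal component embeds a grid client that speaks a grid protocol (SOAP) directly to a grid service in front of the backend resource. Thin client: the portal component calls a Web service over HTTP/SOAP; the Web service holds the grid client, which speaks the grid protocol (SOAP) to the grid service and backend resource.]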
36. Managing Scientific Workflows
- A Preview for Suresh's Talks and Demos
37. Scientific Workflows
- Portal interfaces encode scientific use cases.
- If you have a rich set of services, it is a lot of work to make portlets for all possible use cases.
- And power users will always want something more.
- Example: our CICC project has dozens of chemical informatics Web services.
- http://www.chembiogrid.org.wiki
- Workflow composers can simplify this.
- Allow users to encode and execute their own use cases.
38. Web Services and Workflows
- Perform a similarity search on the NIH DTP Human Tumor data.
- Filter the results based on pharmacokinetic properties (FILTER).
- Convert to 3D (OMEGA).
- Dock into a pre-defined protein (FRED).
- Visualize (JMOL).
Taverna workflow connects remote services.
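A workflow composer essentially chains service calls so that each step's output feeds the next. The Python sketch below mimics the Taverna example with local stub functions; the stubs, the compound identifier, and the placeholder filtering rule are all made up, and a real engine (Taverna, XBaya, Kepler) would invoke remote Web services and handle failures and data types. The visualization step is left out.

```python
def similarity_search(query):
    """Stub for the NIH DTP similarity search: returns fake hit identifiers."""
    return [f"{query}-hit{i}" for i in range(1, 5)]


def filter_pk(molecules):
    """Stub for FILTER: placeholder rule pretending hit3 fails the property check."""
    return [m for m in molecules if not m.endswith("hit3")]


def convert_to_3d(molecules):
    """Stub for OMEGA: tag each molecule as converted to 3D."""
    return [m + ":3d" for m in molecules]


def dock(molecules):
    """Stub for FRED: tag each molecule as docked into the target protein."""
    return [m + ":docked" for m in molecules]


def run_workflow(initial, steps):
    """Feed each step's output into the next step, in order."""
    data = initial
    for step in steps:
        data = step(data)
    return data


if __name__ == "__main__":
    print(run_workflow("NSC123", [similarity_search, filter_pk, convert_to_3d, dock]))
```

Letting users reorder or replace entries in the steps list is, in miniature, what lets them encode their own use cases.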
39. OGCE's XBaya Workflow Composer
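40. Updating the Octopus
[Figure: the 2002 octopus diagram revised for Web 2.0. The browser interface connects over HTTP(S) to social gadgets (AJAX), which exchange RSS and JSON over HTTP with the DB service, job submission/monitoring and file services, and visualization service, now mostly exposed through REST interfaces with one remaining WSDL interface. Backends again include JDBC connections to databases and the operating and queuing systems of Hosts 1-3.]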
41.
Enterprise Approach | Web 2.0 Approach
JSR 168 Portlets | Gadgets, Widgets
Server-side integration and processing | AJAX, client-side integration and processing, JavaScript
SOAP | RSS, Atom, JSON
WSDL | REST (GET, PUT, DELETE, POST)
Portlet Containers | Open Social Containers (Orkut, LinkedIn, Shindig), Facebook, StartPages
User Centric Gateways | Social Networking Portals
Workflow managers (Taverna, Kepler, etc.) | Mash-ups
Grid computing: Globus, Condor, etc. | Cloud computing: Amazon WS Suite, Xen Virtualization
Semantic Web: RDF, OWL, ontologies | Microformats, folksonomies
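The WSDL/REST row of the table is easier to see with a concrete mapping. This Python sketch dispatches the four HTTP verbs onto create, read, update and delete for an in-memory collection that exchanges JSON. The ResourceCollection class and its handle method are hypothetical; a real service would sit behind an HTTP server and choose its own URLs, and the status codes here are just the conventional ones.

```python
import json


class ResourceCollection:
    """Hypothetical in-memory stand-in for a REST resource such as /jobs."""

    ALLOWED = ("GET", "PUT", "DELETE", "POST")

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def handle(self, method, item_id=None, body=None):
        """Map an HTTP verb onto create/read/update/delete; return (status, JSON text)."""
        if method not in self.ALLOWED:
            return 405, json.dumps({"error": "method not allowed"})
        if method == "POST":  # create: the server assigns the new member's id
            new_id = str(self.next_id)
            self.next_id += 1
            self.items[new_id] = json.loads(body)
            return 201, json.dumps({"id": new_id})
        if item_id not in self.items:
            return 404, json.dumps({"error": "not found"})
        if method == "GET":  # read
            return 200, json.dumps(self.items[item_id])
        if method == "PUT":  # update by replacing the stored representation
            self.items[item_id] = json.loads(body)
            return 200, json.dumps(self.items[item_id])
        del self.items[item_id]  # DELETE
        return 204, ""
```

Unlike a WSDL interface, the operations are fixed by HTTP itself; what varies between services is the set of resources and their JSON representations.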
42. Sample Grid Gadgets in iGoogle
43. Microformats, KML, and GeoRSS feeds used to deliver SAR data to multiple clients.
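GeoRSS-Simple adds location to an ordinary feed item with a georss:point element holding "latitude longitude". This Python sketch builds such an item with ElementTree; the title, link and coordinates are made up, and the slide does not say which GeoRSS encoding the SAR feeds actually used.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)


def georss_item(title, link, lat, lon):
    """Build an RSS <item> whose location is a GeoRSS-Simple point ("lat lon")."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"
    return item


if __name__ == "__main__":
    # Made-up scene metadata, for illustration only.
    item = georss_item("SAR scene 1", "http://example.org/sar/1", 34.05, -118.25)
    print(ET.tostring(item, encoding="unicode"))
```

Because the item is still plain RSS, a feed reader can ignore the location while a map client plots it.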
44. More Information
- Contact me: mpierce_at_cs.indiana.edu
- See what I'm up to: http://communitygrids.blogspot.com/
- OGCE software: http://collab-ogce.org/
- Lots of people worked on all of these.
45. Tremendous Opportunities Using the Largest Shared Resources
- Challenges too!
- What's different when the resource doesn't belong just to me?
- Resource discovery
- Accounting
- Security
- Proposal-based requests for resources (peer-reviewed access)
- Code scaling and performance numbers
- Justification of resources
- Gateway citations
- Tremendous benefits at the high end, but even more work for the developers
- Potential impact on science is huge
- Small number of developers can impact thousands of scientists
- But need a way to train and fund those developers and provide them with appropriate tools
46. Gateways can further investments in other projects
- Increase access
- To instruments
- Increase capabilities
- To analyze data
- Improve workforce development
- For underserved populations
- Increase outreach
- Increase public awareness
- Public sees value in investments in large
facilities