1
Building Science Gateways
  • Marlon Pierce
  • Community Grids Laboratory
  • Indiana University

2
Tutorial Overview
Type | Title | Presenter
Talk | Gateways overview | Marlon
Talk | OGCE overview | Marlon
Talk | TeraGrid Resources Overview | Simms
Break
Demo | LEAD Portal and workflows | Suresh
Demo | GridChem Workflow | Suresh
Demo | OGCE and TGUP Portals | Marlon
Lunch
3
There's More
Type | Title | Presenter
Hands On | OGCE, LEAD, and TGUP portals and workflows | Marlon, Suresh
Talk/HO | Building the OGCE Portal | Marlon
Talk/HO | Building gadgets with GTLAB | Marlon
Break (2:00-2:30)
Talk | Web 2.0 for Science Gateways (Optional) | Marlon
HO | Continue hands on work | Suresh, Marlon
4
Slides and Demo Site
  • Tutorial slides are available from
    http://www.collab-ogce.org/ogce/index.php/Tutorials
  • We run a permanent demo portal at
    https://community.ucs.indiana.edu:8443/gridsphere/
  • Also aliased as
    https://ogceportal.iu.teragrid.org:8443/gridsphere
  • Portal accounts train01-train30 have been created
    for the workshop. Password is the same as the
    account name.
  • Also train31-train49 from TG08 workshop.
  • We also have TeraGrid training accounts with
    names train01-train30 that can be used to
    retrieve TG proxy credentials.
  • These should be active all week.
  • You can also log into the TeraGrid User Portal
    with this account and the secret password.

5
Concept 1: Web Portal
  • Web container that aggregates content from
    multiple sources into a single display.
  • Start Pages
  • Typically consume RSS/Atom news feeds.
  • More powerful versions these days support Flickr,
    calendars, games, etc.
  • Gadgets, widgets
  • Examples: iGoogle, Netvibes, My Yahoo!
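
To make the "aggregates content from multiple sources" idea concrete, here is a minimal sketch of what a start page does with RSS feeds. The two feed documents, their titles, and the example.org links are made up for illustration; a real portal would fetch the XML over HTTP instead of using inline strings.

```python
import xml.etree.ElementTree as ET

# Two tiny RSS 2.0 documents standing in for remote feeds (made-up content).
FEED_A = """<rss version="2.0"><channel><title>Gateway News</title>
<item><title>Portal upgraded</title><link>http://example.org/a1</link></item>
<item><title>New workflow demo</title><link>http://example.org/a2</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Grid Status</title>
<item><title>Maintenance window</title><link>http://example.org/b1</link></item>
</channel></rss>"""


def parse_feed(xml_text):
    """Return (channel title, [(item title, link), ...]) for one RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    items = [(i.findtext("title"), i.findtext("link"))
             for i in channel.findall("item")]
    return channel.findtext("title"), items


def aggregate(feeds):
    """Merge several feeds into one list of display rows, as a start page would."""
    rows = []
    for xml_text in feeds:
        source, items = parse_feed(xml_text)
        rows.extend({"source": source, "title": t, "link": l} for t, l in items)
    return rows


page = aggregate([FEED_A, FEED_B])
for row in page:
    print(f"[{row['source']}] {row['title']} -> {row['link']}")
```

Each gadget on a start page is roughly one such feed consumer; the container's job is to lay the results out in a single display.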

6
Gadget
RSS Feeds
7
Concept 2: Grid Computing
  • Grid computing software is designed to integrate
    large supercomputing facilities.
  • TeraGrid, Open Science Grid, EGEE, etc.
  • This is done via network services
  • Software providers in the US include Globus and
    Condor
  • Key Service Components (and example services)
  • Authentication and authorization framework
    (MyProxy)
  • Remote process access and control (GRAM, Condor)
  • Remote file, I/O access (GridFTP, SRB, RFT)
  • Additional Services
  • Information services, replica management,
    database federation, storage management,
    schedulers, etc.
  • Example Grid software stacks: CTSS and VDT
  • For TeraGrid and Open Science Grid, respectively
  • Being pushed by Cloud Computing (Amazon, Google,
    Microsoft, others)

9
Science Portals and Gateways
  • Science Gateways adapt Web portal technology to
    build user interfaces to the Grid.
  • Science portals resemble standard portals, but
    must also
  • Support access to computing and storage
    resources.
  • Allow users remote, direct access to these
    resources.
  • You often want to run applications and access
    data that you own directly.
  • Provide access to science applications and data
    sets.
  • And we must provide value added services as well
    as user interfaces.

10
Example Science Gateways
  • Many listed here
  • http://www.teragrid.org/programs/sci_gateways/
  • Cover many different scientific fields
  • Atmospheric science, geophysics, computational
    chemistry, bioinformatics, etc
  • See also GCE08 workshop at SC08 and earlier
    proceedings
  • http://www.collab-ogce.org/gce08/index.php/Main_Page
  • GCE05-07 also linked.

11
TeraGrid Science Gateways Program
  • Slides courtesy of Nancy Wilkins-Diehr
  • TeraGrid Area Director for Science Gateways
  • wilkinsn_at_sdsc.edu

12
Today, there are approximately 29 gateways using
the TeraGrid
13
Does a gateway have to use TeraGrid to be a
gateway?
  • No, but the TeraGrid does fund the development
    and support of these gateways
  • Using high end resources is more work and is not
    recommended unless it serves a demonstrated need
  • Gateways are an excellent way to extend the
    impact of high-end resources
  • Are they all funded by TeraGrid?
  • Can TeraGrid claim success for all gateways?
  • No, we don't make the gateways you use, we make
    the gateways you use better
  • TeraGrid does fund a small number of developers
    to provide advanced support.
  • More later.

14
Why are gateways worth the effort?
Condor-G submit file:
    # Full path to executable
    executable=/users/wilkinsn/tutorial/bin/mcell
    # Working directory, where Condor-G will write its
    # output and error files on the local machine.
    initialdir=/users/wilkinsn/tutorial/exercise_3
    # To set the working directory of the remote job, we
    # specify it in this globus RSL, which will be appended
    # to the RSL that Condor-G generates
    globusrsl=(directory='/users/wilkinsn/tutorial/exercise_3')
    # Arguments to pass to executable.
    arguments=nmj_recon.main.mdl
    # Condor-G can stage the executable
    transfer_executable=false
    # Specify the globus resource to execute the job
    globusscheduler=tg-login1.sdsc.teragrid.org/jobmanager-pbs
    # Condor has multiple universes, but Condor-G always uses globus
    universe=globus
    # Files to receive stdout and stderr.
    output=condor.out
    error=condor.err
    # Specify the number of copies of the job to submit
    # to the condor queue.
    queue 1
  • Increasing range of expertise needed to tackle
    the most challenging scientific problems
  • How many details do you want each individual
    scientist to need to know?
  • PBS, RSL, Condor
  • Coupling multi-scale codes
  • Assembling data from multiple sources
  • Collaboration frameworks

PBS script:
    #!/bin/sh
    #PBS -q dque
    #PBS -l nodes=1:ppn=2
    #PBS -l walltime=00:02:00
    #PBS -o pbs.out
    #PBS -e pbs.err
    #PBS -V
    cd /users/wilkinsn/tutorial/exercise_3
    ../bin/mcell nmj_recon.main.mdl

Globus RSL:
    ( (resourceManagerContact="tg-login1.sdsc.teragrid.org/jobmanager-pbs")
      (executable="/users/birnbaum/tutorial/bin/mcell")
      (arguments=nmj_recon.main.mdl)
      (count=128)
      (hostCount=10)
      (maxtime=2)
      (directory="/users/birnbaum/tutorial/exercise_3")
      (stdout="/users/birnbaum/tutorial/exercise_3/globus.out")
      (stderr="/users/birnbaum/tutorial/exercise_3/globus.err")
    )
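
One way to picture what a gateway does with the details on this slide: the scientist supplies an executable, a directory, and an input file, and the gateway fills in the rest. The sketch below renders a Condor-G style submit description from those few inputs. The function name and parameters are hypothetical, and the keywords simply mirror the ones shown on the slide; they have not been checked against any particular Condor release.

```python
def condor_g_submit(executable, workdir, arguments, scheduler,
                    output="condor.out", error="condor.err", copies=1):
    """Render a Condor-G style submit description from user-level inputs.

    Hypothetical helper: keywords follow the slide's example, nothing more.
    """
    lines = [
        f"executable = {executable}",
        f"initialdir = {workdir}",
        # Working directory of the remote job, passed through as extra RSL.
        f"globusrsl = (directory='{workdir}')",
        f"arguments = {arguments}",
        "transfer_executable = false",
        f"globusscheduler = {scheduler}",
        "universe = globus",
        f"output = {output}",
        f"error = {error}",
        f"queue {copies}",
    ]
    return "\n".join(lines) + "\n"


submit_text = condor_g_submit(
    executable="/users/wilkinsn/tutorial/bin/mcell",
    workdir="/users/wilkinsn/tutorial/exercise_3",
    arguments="nmj_recon.main.mdl",
    scheduler="tg-login1.sdsc.teragrid.org/jobmanager-pbs",
)
print(submit_text)
```

The point is not the template itself but who has to know it: here only the gateway developer does.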
15
Not just ease of use: What can scientists do that
they couldn't do previously?
  • LEAD - access to radar data
  • NVO - access to sky surveys
  • OOI - access to sensor data
  • PolarGrid - access to polar ice sheet data
  • SIDGrid analysis tools
  • GridChem developing multiscale coupling
  • How would this have been done before gateways?

16
Gateways Greatly Expand Access
  • Almost anyone can investigate scientific
    questions using high end resources
  • Not just those in the research groups of those
    who request allocations
  • Gateways allow anyone with a web browser to
    explore
  • Opportunities can be uncovered via google
  • Nancy's 11-year-old son discovered nanoHUB.org
    himself while his class was studying Bucky Balls
  • Fosters new ideas, cross-disciplinary approaches
  • Encourages students to experiment
  • But used in production too
  • Significant number of papers resulting from
    gateways including GridChem, nanoHUB
  • Scientists can focus on challenging science
    problems rather than challenging infrastructure
    problems

17
TeraGrid Pathways Activities
  • Program funding to involve MSI communities
  • 2 Gateway components
  • Adapt gateways for educational use by
    underrepresented communities
  • GEON: SDSC, Navajo Tech
  • Teach participants from underrepresented
    communities how to build gateways
  • PolarGrid: IU, ECSU

18
Navajo Technical College and gateways
  • Incorporating the use of gateways in their
    curricula
  • GEON, GISolve: areas of initial interest

19
PolarGrid
  • Cyberinfrastructure Center for Polar Science
    (CICPS)
  • Experts in polar science, remote sensing and
    cyberinfrastructure
  • Indiana, ECSU, CReSIS
  • Satellite observations show disintegration of ice
    shelves in West Antarctica and speed-up of
    several glaciers in southern Greenland
  • Most existing ice sheet models, including those
    used by IPCC cannot explain the rapid changes

http://www.polargrid.org/polargrid/images/4/42/C0050-polargrid-big.m4v
Source: Geoffrey Fox
20
  • Components of PolarGrid
  • Expedition grid consisting of ruggedized laptops
    in a field grid linked to a low power multi-core
    base camp cluster
  • Prototype and two production expedition grids
    feed into a 17 Teraflops "lower 48" system at
    Indiana University and Elizabeth City State
    (ECSU) split between research, education and
    training.
  • Gives ECSU a top-ranked 5 Teraflop MSI high
    performance computing system
  • Access to expensive data
  • High-end resources for analysis
  • MSI student involvement

Source: Geoffrey Fox
21
Recent Gateways using TeraGrid Significantly
  • SCEC
  • SIDGrid
  • CIG

22
SCEC using gateway to produce hazard map
  • PSHA hazard map for California using newly
    released Earthquake Rupture Forecast (UCERF2.0)
    calculated using SCEC Science Gateway
  • Warm colors indicate regions with a high
    probability of experiencing strong ground motion
    in the next 50 years.
  • High resolution map, significant CPU use

23
Social Informatics Data Grid
  • Heavy use of multimodal data.
  • Subject might be viewing a video, while a
    researcher collects heart rate and eye movement
    data.
  • Events must be synchronized for analysis; large
    datasets result
  • Extensive analysis capabilities are not something
    that each researcher should have to create for
    themselves.

http://www.ci.uchicago.edu/research/files/sidgrid.mov
24
  • Social scientists have traditionally worked in
    isolated labs without the capability to share
    data or insights with others.
  • SIDGrid enables a number of capabilities.
  • Data that is expensive to collect can now be
    shared with others, increasing the potential for
    scientific impact.
  • Geographically distant researchers can
    collaborate on the analysis of the same data set.
  • Complex analysis tools and workflows are now
    available for all to use, rather than having each
    lab duplicate efforts.
  • All researchers now have access to the highest
    quality computational resources
  • SIDGrid uses TeraGrid resources for
    computationally intensive tasks such as media
    transcoding, algorithms for pitch analysis of
    audio tracks, and fMRI image analysis
  • SIDGrid is unique among social science data
    archive projects
  • Focused on streaming data which change over time
  • Provides the ability to investigate multiple
    datasets, collected at different time scales,
    simultaneously
  • Active users of the SIDGrid system include a
    human neuroscience group and linguistic research
    groups from the University of Chicago and the
    University of Nottingham, UK

25
  • 40 institutional members
  • 9 foreign affiliates
  • Researchers request synthetic seismograms for any
    given earthquake
  • Allows scientists to understand the ground motion
    associated with any given earthquake
  • Requested and received advanced support from
    TeraGrid

26
Talks at E-Science
  • See the PSE Workshop:
    http://escience2008.iu.edu/workshops/innovative/index.shtml
  • Friday, 10:00 am-4:30 pm
  • Nancy Wilkins-Diehr will have more to say about
    some of these gateways.
  • See also Rich Wolski's keynote on cloud
    computing. Next generation gateways will (need
    to) support cloud computing and virtual
    machine-based backends.
  • Purdue's nanoHUB and HUBzero software have done this
    for some time.

27
Getting Started Building a Gateway
  • Should you? And how can you get help?

28
When might a gateway be appropriate?
  • Researchers using defined sets of tools in
    different ways
  • Same executables, different input
  • GridChem, CHARMM
  • Creating multi-scale or complex workflows
  • Datasets
  • Common data formats
  • National Virtual Observatory
  • Earth System Grid
  • Some groups have invested significant efforts
    here
  • caBIG, extensive discussions to develop common
    terminology and formats
  • BIRN, extensive data sharing agreements
  • Difficult to access data/advanced workflows
  • Sensor/radar input
  • LEAD, GEON

29
Advanced support for OCI resources: Including
gateway integration
  • Same peer review process used to request
    resources
  • 30,000 CPUs
  • 6 months of Nancy
  • Reviews based on appropriate use of resources,
    science is not reviewed if already funded
  • Petascale
  • Multisite workflows
  • Gateways
  • Domain expertise

Or someone really talented
30
Support is Very Targeted
  • Start with well-defined objectives
  • Focus on efficient or novel use of OCI resources
  • Access to minimum 0.25 FTE for months to a year
  • Enough investment to really understand and help
    solve complex problems
  • Must have commitment from PIs
  • Want to make sure work is incorporated into
    production codes and gateways
  • Good candidates for targeted support include
  • Large, high impact projects
  • Ability to influence new communities
  • Lessons learned move into training and
    documentation

31
GATEWAYS UNDER THE HOOD
32
My 2002 octopus SOA diagram, from the archives.
[Diagram, top to bottom: Browser Interface; HTTP(S); Portlets with Client Stubs; SOAP/HTTP calls to services described by WSDL; DB Service, Job Sub/Mon and File Services, Visualization Service; JDBC to the DBs; Operating and Queuing Systems on Hosts 1, 2, and 3.]
33
Terminology
  • Portlet: a standard Java component that
    generates HTML and can also act as a client to a
    remote service.
  • Lives in a portal container.
  • I will also use this term generically.
  • Web Service: a remotely invocable function on
    the Internet.
  • SOAP: the XML message envelope for carrying
    commands over HTTP.
  • WSDL: describes the service's API in XML.
  • REST: a variation of this approach.
  • Lots more info:
    http://grids.ucs.indiana.edu/ptliupages/presentations/I590WebService.ppt
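
To show what "XML message envelope" means in practice, here is a minimal sketch that builds a SOAP 1.1 envelope for a remote call. The envelope namespace is the standard SOAP 1.1 one; the service namespace, the submitJob operation, and its parameters are hypothetical and only illustrate the shape of the message that a portlet's client stub would send over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/jobservice"           # hypothetical service namespace


def build_envelope(operation, params):
    """Wrap one operation call and its parameters in a SOAP Envelope/Body."""
    ET.register_namespace("soapenv", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and arguments, for illustration only.
message = build_envelope("submitJob", {"executable": "/bin/date", "count": 1})
print(message)
```

In a WSDL-based stack this envelope is normally generated for you from the service description, which is why portlets usually talk to client stubs rather than to raw XML.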

34
But Why?
  • Three-tiered Service Oriented Architecture is the
    network equivalent of the famous
    Model-View-Controller design pattern.
  • View: the user interface components.
  • Controller: Web service middleware.
  • Model: the backend resources.
  • Independence of tiers gives flexibility
  • Services can be reused with alternative user
    interfaces
  • Workflow composers like Taverna, XBaya, Kepler
  • User interfaces can work with different service
    implementations.
  • Drawback: reliability and robustness are issues.
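
The independence of tiers can be sketched in a few lines. All class and function names below are hypothetical stand-ins: BackendResource plays the model, JobService plays the Web service middleware, and the two view functions play alternative user interfaces. Nothing here touches a real Grid service.

```python
class BackendResource:
    """Model tier: stands in for a queuing system on a compute host."""

    def __init__(self, name):
        self.name = name
        self.queue = []

    def enqueue(self, executable):
        self.queue.append(executable)
        return f"{self.name}-{len(self.queue)}"


class JobService:
    """Controller tier: the middleware a Web service would expose."""

    def __init__(self, backend):
        self.backend = backend

    def submit(self, executable):
        return {"job_id": self.backend.enqueue(executable), "status": "QUEUED"}


def html_view(service, executable):
    """View tier, portlet flavor: renders the result as an HTML fragment."""
    result = service.submit(executable)
    return f"<p>Job {result['job_id']}: {result['status']}</p>"


def text_view(service, executable):
    """View tier, command-line flavor: same service, different interface."""
    result = service.submit(executable)
    return f"{result['job_id']} {result['status']}"


service_a = JobService(BackendResource("host1"))
service_b = JobService(BackendResource("host2"))

portlet_output = html_view(service_a, "/bin/date")   # one service, first UI
cli_output = text_view(service_a, "/bin/hostname")   # same service, second UI
other_output = html_view(service_b, "/bin/date")     # same UI, other backend
print(portlet_output, cli_output, other_output, sep="\n")
```

The drawback on the slide shows up here too: every call crosses a tier boundary, and in a real deployment each crossing is a network hop that can fail.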

35
Two Approaches to the Middle Tier
[Diagram: two stacks side by side. Fat Client: Portal Component with embedded Grid Client; Grid Protocol (SOAP); Grid Service; Backend Resource. Thin Client: Portal Component; HTTP/SOAP; Web Service with Grid Client; Grid Protocol (SOAP); Grid Service; Backend Resource.]
36
Managing Scientific Workflows
  • A Preview for Suresh's Talks and Demos

37
Scientific Workflows
  • Portal interfaces encode scientific use cases.
  • If you have a rich set of services, it is a lot
    of work to make portlets for all possible use
    cases.
  • And power users will always want something
    more.
  • Example: our CICC project has dozens of chemical
    informatics Web services.
  • http://www.chembiogrid.org.wiki
  • Workflow composers can simplify this.
  • Allow users to encode and execute their own use
    cases.

38
Web Services and Workflows
  • Perform a similarity search on the NIH DTP Human
    Tumor data.
  • Filter the results based on Pharmacokinetic
    properties (FILTER)
  • Convert to 3D (OMEGA)
  • Docking into a pre-defined protein (FRED)
  • Visualize (JMOL).

Taverna workflow connects remote services.
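
The five steps above form a simple pipeline, which is the basic thing a workflow composer lets users assemble. The sketch below chains stub functions in the same order. The stubs, compound names, and numbers are invented for illustration and do not call FILTER, OMEGA, FRED, JMOL, or any real service; the logP cutoff is likewise an arbitrary stand-in for the pharmacokinetic filter.

```python
from functools import reduce


def similarity_search(query):
    """Stub: return candidate compounds for a query structure."""
    return [
        {"name": "cmpd-1", "similarity": 0.91, "logp": 2.1},
        {"name": "cmpd-2", "similarity": 0.84, "logp": 6.3},
        {"name": "cmpd-3", "similarity": 0.80, "logp": 3.4},
    ]


def property_filter(compounds):
    """Stub for the filtering step: keep compounds with logP at or below 5."""
    return [c for c in compounds if c["logp"] <= 5.0]


def to_3d(compounds):
    """Stub for 3D conversion: attach a placeholder conformer label."""
    return [dict(c, conformer=f"{c['name']}-3d") for c in compounds]


def dock(compounds):
    """Stub for docking: derive a fake score from the similarity value."""
    return [dict(c, score=round(-10 * c["similarity"], 1)) for c in compounds]


def visualize(compounds):
    """Stub for visualization: return one text line per compound, best first."""
    ranked = sorted(compounds, key=lambda c: c["score"])
    return [f"{c['name']}: score {c['score']}" for c in ranked]


def run_workflow(steps, initial):
    """Feed the output of each step into the next, in order."""
    return reduce(lambda data, step: step(data), steps, initial)


report = run_workflow(
    [similarity_search, property_filter, to_3d, dock, visualize],
    "query-structure",
)
print("\n".join(report))
```

Because the workflow is just a list of steps, a power user can reorder, drop, or swap steps without anyone writing a new portlet for that use case.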
39
OGCE's XBaya Workflow Composer
40
Updating the Octopus
[Diagram, top to bottom: Browser Interface; HTTP(S); Social Gadgets with AJAX; RSS and JSON/HTTP calls to services, now mostly REST with one remaining WSDL interface; DB Service, Job Sub/Mon and File Services, Visualization Service; JDBC to the DBs; Operating and Queuing Systems on Hosts 1, 2, and 3.]
41
Enterprise Approach | Web 2.0 Approach
JSR 168 Portlets | Gadgets, Widgets
Server-side integration and processing | AJAX, client-side integration and processing, JavaScript
SOAP | RSS, Atom, JSON
WSDL | REST (GET, PUT, DELETE, POST)
Portlet Containers | Open Social Containers (Orkut, LinkedIn, Shindig), Facebook, StartPages
User Centric Gateways | Social Networking Portals
Workflow managers (Taverna, Kepler, etc.) | Mash-ups
Grid computing: Globus, Condor, etc. | Cloud computing: Amazon WS Suite, Xen Virtualization
Semantic Web: RDF, OWL, ontologies | Microformats, folksonomies
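
The "REST (GET, PUT, DELETE, POST)" row is easiest to see with a toy resource. The sketch below dispatches the four verbs against an in-memory collection of jobs and exchanges JSON, with no network involved. The /jobs path, the JobResource class, and the status codes chosen for each case are illustrative assumptions, not the API of any OGCE or TeraGrid service.

```python
import json


class JobResource:
    """In-memory stand-in for a REST collection at /jobs (hypothetical)."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def handle(self, method, path, body=None):
        """Dispatch one request; returns (status code, response text)."""
        parts = [p for p in path.split("/") if p]  # "/jobs/1" -> ["jobs", "1"]
        if not parts or parts[0] != "jobs" or len(parts) > 2:
            return 404, json.dumps({"error": "not found"})
        if len(parts) == 1:
            if method == "GET":      # list the collection
                return 200, json.dumps(sorted(self._jobs))
            if method == "POST":     # create a new job, server picks the id
                job_id = str(self._next_id)
                self._next_id += 1
                self._jobs[job_id] = json.loads(body)
                return 201, json.dumps({"id": job_id})
            return 405, json.dumps({"error": "method not allowed"})
        job_id = parts[1]
        if job_id not in self._jobs:
            return 404, json.dumps({"error": "not found"})
        if method == "GET":          # read one job
            return 200, json.dumps(self._jobs[job_id])
        if method == "PUT":          # replace one job
            self._jobs[job_id] = json.loads(body)
            return 200, json.dumps(self._jobs[job_id])
        if method == "DELETE":       # remove one job
            del self._jobs[job_id]
            return 204, ""
        return 405, json.dumps({"error": "method not allowed"})


jobs = JobResource()
created = jobs.handle("POST", "/jobs", json.dumps({"executable": "/bin/date"}))
fetched = jobs.handle("GET", "/jobs/1")
updated = jobs.handle("PUT", "/jobs/1", json.dumps({"executable": "/bin/hostname"}))
deleted = jobs.handle("DELETE", "/jobs/1")
missing = jobs.handle("GET", "/jobs/1")
print(created, fetched, updated, deleted, missing, sep="\n")
```

Compared with the SOAP/WSDL column, the operation name moves out of the message body and into the HTTP verb plus the URL, which is what makes these services easy to call from browser-side JavaScript.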
42
Sample Grid Gadgets in iGoogle
43
Microformats, KML, and GeoRSS feeds used to
deliver SAR data to multiple clients.
44
More Information
  • Contact me: mpierce_at_cs.indiana.edu
  • See what I'm up to:
    http://communitygrids.blogspot.com/
  • OGCE software: http://collab-ogce.org/
  • Lots of people worked on all of these.

45
Tremendous Opportunities Using the Largest Shared
Resources - Challenges too!
  • What's different when the resource doesn't belong
    just to me?
  • Resource discovery
  • Accounting
  • Security
  • Proposal-based requests for resources
    (peer-reviewed access)
  • Code scaling and performance numbers
  • Justification of resources
  • Gateway citations
  • Tremendous benefits at the high end, but even
    more work for the developers
  • Potential impact on science is huge
  • Small number of developers can impact thousands
    of scientists
  • But need a way to train and fund those developers
    and provide them with appropriate tools

46
Gateways can further investments in other projects
  • Increase access
  • To instruments
  • Increase capabilities
  • To analyze data
  • Improve workforce development
  • For underserved populations
  • Increase outreach
  • Increase public awareness
  • Public sees value in investments in large
    facilities