Title: CLADE Review 2003-2008
1 CLADE Review 2003-2008
- Nancy Wilkins-Diehr
- wilkinsn_at_sdsc.edu
2 The Origin of CLADE
- The CLADE workshop began with a discussion at HPDC-11, July 24-26, 2002, at Edinburgh International Conference Center in Scotland.
- Salim Hariri, C.S. Raghavendra, and I, and likely a couple of others, got to talking about the state of Grid applications.
- At that time quite a lot of progress had been made with tools and technologies for distributed applications, but we were not seeing many applications papers at HPDC, or in other forums either.
- So Salim suggested that we put together a workshop to focus attention on applications, and he asked me to help organize it.
- Ray Bair
3 Keys to the Success of CLADE
- Complements the HPDC program
- Focus on real applications that demonstrate the use of Grid approaches on a significant scale.
- CLADE's association with HPDC still distinguishes it from other conferences
- Bringing together cutting edge computer science and applications
- Support of the HPDC Steering Committee
- Strong Program Committee chairs
- Good advice from CLADE's Steering Committee
- Engaged Program Committee members
- Peer-review system has been important in selecting good papers that are timely and interesting
- Distribution of the CLADE proceedings at the workshop increases the value and usefulness of the papers to the participants
4 2008 CLADE Organization
- STEERING COMMITTEE
- Raymond Bair, ANL
- Ioana Banicescu, Mississippi State Univ.
- Francine Berman, Univ. of Calif., San Diego
- Jack Dongarra, Univ. of Tenn., Knoxville
- Salim Hariri, University of Arizona
- Manish Parashar, Rutgers University
- Viktor Prasanna, Univ. of Southern Calif.
- Joel Saltz, Ohio State University
- Edward Seidel, Louisiana State University
- Alan Sussman, University of Maryland
- PROGRAM COMMITTEE
- Henrique Andrade, IBM Research
- David Bernholdt, ORNL
- Jiannong Cao, HK PolyU
- Umit Catalyurek, Ohio State U.
- Kenneth Chiu, U. Binghamton
- Jose Cunha, U. Nova de Lisboa
- Ewa Deelman, ISI
- Frederic Desprez, ENS Lyon
- Hai Jin, HUST
- Tevfik Kosar, Louisiana State U.
- Tahsin Kurc, Ohio State U.
- Jysoo Lee, Calit2
- Sang Boem Lim, KonKuk U.
- David Lowenthal, U. Georgia
- Malika Mahoui, IUPUI
- James Myers, NCSA
- Gregory Newby, Arctic Region Supercomputing Center
- Jun Ni, U. Iowa
- Yoonho Park, IBM Research
- Marlon Pierce, Indiana U.
- Ilkyun Ra, U. Colorado Denver
- Thomas Rauber, U. Bayreuth
- Gudula Rünger, TU Chemnitz
- Edward Walker, TACC
- Shaowen Wang, UIUC
5 Today's Talk
- Overview of CLADE keynotes 2003-2007
- 2003: Dynamic Data Driven Application Systems, Frederica Darema
- 2004: A Grid based Diagnostics and Prognosis System for Rolls Royce Aero Engines: The DAME Project, Jim Austin
- 2005: Enabling Science and Engineering Applications on the Grid, Ed Seidel
- 2006: Gridcast - a Next Generation Broadcasting Infrastructure?, Terry Harmer
- 2007: The Cancer Biomedical Informatics Grid: Connecting the Cancer Research Community, Scott Oster
- TeraGrid Science Gateways
6 CLADE 2003, Seattle
- Keynote Presentation
- Frederica Darema, Senior Science and Technology Advisor and Director of the Next Generation Software Program, National Science Foundation
- Dynamic Data Driven Application Systems
- Highlighted the relationship between theory, simulation and experiment or field data
- Dynamic feedback and control loop between simulation and experimental data
- DDDAS has potential for significant impact on science, engineering, and the commercial world, akin to the transformation effected since the 50s by the advent of computers
7 Example DDDAS Applications
- Generalized methodology for state estimation and prediction
- Predictor-Corrector methods
- Advanced Driving Assistance Systems for automobiles
- Tracking algorithms for Air Traffic Control
- Enhancing oil exploration methods and capabilities
- Enhanced manufacturing supply chains through sensor information
Source: Frederica Darema
8
- Virtual operations re-planning and control
- Event-driven simulations for systems subject to unplanned outages
- Earthquake tolerant buildings and bridges
- Fire propagation prediction and management
Source: Frederica Darema
9
- Integrated Image-Guided Interventions
- Real-time, three-dimensional (3D) imaging needs of surgeons.
- Biodiversity and bio-complexity
- Dramatic changes due to habitat transformation, invasions of exotic species, chemical contamination, diseases and epidemics, climate change, and floods and drought
Source: Frederica Darema
10
- Hydro-complexity: Weather, Water and Pollution
- Design and configuration methodologies for sensor networks
- The oceanographic community at large has interests in DDDAS in order to help optimize observing systems for important scientific studies.
Source: Frederica Darema
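The predictor-corrector idea listed among the DDDAS examples can be illustrated with a very small feedback loop: a model predicts the next state, and each incoming measurement pulls that prediction back toward the observed data. This is only a sketch under stated assumptions, not anything shown in the keynote: the scalar state, the linear-growth model, the fixed gain, and all function names are hypothetical.

```python
def predict(state, rate, dt):
    """Predictor: advance the simulated state with the model (assumed linear growth)."""
    return state + rate * dt


def correct(predicted, measurement, gain):
    """Corrector: move the prediction toward the measurement by a fixed gain (assumed)."""
    return predicted + gain * (measurement - predicted)


def run_dddas_loop(initial, rate, dt, measurements, gain=0.5):
    """Alternate prediction and correction, one cycle per injected measurement."""
    state = initial
    history = []
    for z in measurements:
        state = correct(predict(state, rate, dt), z, gain)
        history.append(state)
    return history


# Hypothetical sensor readings injected into the running simulation.
estimates = run_dddas_loop(initial=0.0, rate=1.0, dt=1.0, measurements=[1.2, 2.1, 2.9])
print(estimates)
```

A gain of 0 would ignore the data entirely and a gain of 1 would ignore the model; the dynamic feedback described in the talk lives between those two extremes.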
11 CLADE 2004, Honolulu
- Keynote Presentation
- Jim Austin, University of York
- A Grid based Diagnostics and Prognosis System for Rolls Royce Aero Engines: The DAME Project
- Very practical engineering application
- Using a distributed, data intensive Grid application for diagnosis and prognosis of Rolls-Royce Aero Engines
12 Distributed Aircraft Maintenance Environment (DAME)
- UK e-Science pilot project
- Quote
- Neural network-based techniques for real-time monitoring
- Compare stored vibration data with instantaneous snapshots
- Each flight produces 1 GB of data, TBs per year of distributed data for a fleet.
- AURA
- Advanced Uncertain Reasoning Architecture for Pattern Matching
- Pattern matching among terascale datasets, distributed for speed
- CBR
- Case Based Reasoning systems for intelligent decision support
- Correlates engine anomalies with root cause
- Combine into scalable system using grid middleware
- Utilising large amounts of vibration and performance data available from modern aero-engines for fleet based diagnostics
Source: Jim Austin
13
- Fault diagnosis and prognosis integrated with predictive maintenance
- Detect that engine has deviated from normal (QUOTE)
- Diagnose why (AURA)
- Form a prognosis (CBR)
- Plan remedial actions
- Common components of all fault diagnosis and prognosis systems
Source: Jim Austin
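The four steps above (detect, diagnose, form a prognosis, plan) can be sketched as a small pipeline. This is an illustration only and does not reflect how QUOTE, AURA, or the CBR system are actually implemented: the threshold test, the nearest-pattern match, the lookup table, and all names and values below are hypothetical stand-ins.

```python
from statistics import mean

# Hypothetical library of known fault signatures (illustrative values only).
KNOWN_PATTERNS = {
    "bearing wear": [0.9, 1.4, 0.9],
    "blade imbalance": [1.6, 0.8, 1.6],
}

# Hypothetical case base: diagnosis -> (prognosis, remedial action).
CASE_BASE = {
    "bearing wear": ("degrades over coming flights", "schedule bearing inspection"),
    "blade imbalance": ("may worsen quickly", "inspect fan blades before next flight"),
}


def detect(snapshot, baseline, tolerance=0.2):
    """Step 1: flag a deviation from normal (the role QUOTE plays in DAME)."""
    return any(abs(s - b) > tolerance for s, b in zip(snapshot, baseline))


def diagnose(snapshot):
    """Step 2: pick the nearest stored pattern (a stand-in for AURA matching)."""
    def distance(pattern):
        return mean(abs(s - p) for s, p in zip(snapshot, pattern))
    return min(KNOWN_PATTERNS, key=lambda name: distance(KNOWN_PATTERNS[name]))


def assess(snapshot, baseline):
    """Steps 3 and 4: look up prognosis and action (a stand-in for CBR)."""
    if not detect(snapshot, baseline):
        return None  # engine is within normal limits
    cause = diagnose(snapshot)
    prognosis, action = CASE_BASE[cause]
    return {"cause": cause, "prognosis": prognosis, "action": action}


baseline = [1.0, 1.0, 1.0]
report = assess([0.9, 1.5, 1.0], baseline)
print(report)
```

In the real project each stage is a separate distributed service tied together by grid middleware; here they are plain functions so the control flow is easy to follow.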
14
- Quality of Service and Security are the two most important project concerns
- QoS critical for commercial deployment, SLAs will likely be a necessity
- Workgroup formed to focus on security
- Future directions
- Base services can be used with many other apps
- Put core services into a portal
- More flexible workflow configurations
- Current project considered a demonstration project
- Commercial implementation will need high availability, reliability, data integrity, confidentiality
Source: Jim Austin
15 CLADE 2005, Research Triangle Park, NC
- Keynote Presentation
- Ed Seidel, Louisiana State University
- Enabling Science and Engineering Applications on the Grid
- Ed Seidel, recently named Office of Cyberinfrastructure director at NSF, reporting to Dr. Bement
- Many years' experience with distributed applications and high performance computing
16 Optical Networks 1000x faster than regional
What are people doing with this?
- Collaboration
- Distributed communities (NEES, GEON), shared CI data, code, tools, resources, simulations
- Standard things
- Task farming, resource brokering, remote steering
- New scenarios
- Apps abstracted, dynamic apps find their own services, resources, people; distributed apps spawned, monitored
- Grids bring it all together, but worries in the US about DOE, NSF CI funding
Source: Ed Seidel
17 Distributed computation the old way
- Why?
- Capacity: computers can't keep up with needs
- Throughput
- Issues
- Bandwidth (increasing faster than computation)
- Latency
- Communication needs, Topology
- Communication/computation
- Techniques to be developed
- Overlapping communication/computation
- Extra ghost zones to reduce latency
- Compression
- Algorithms to do this for the scientist
- Gridlab.org, cactuscode.org
Source: Ed Seidel
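The "extra ghost zones to reduce latency" technique can be illustrated on a one-dimensional grid. If each subdomain keeps g ghost cells copied from its neighbour, it can take g time steps between exchanges instead of one, so fewer (larger) messages cross the slow wide-area link. The sketch below is a toy under stated assumptions, not the Cactus implementation: plain Python lists stand in for distributed arrays, a list slice stands in for a network exchange, and the 3-point averaging stencil and all names are hypothetical.

```python
def smooth(u):
    """One 3-point averaging step; the two end points are held fixed."""
    inner = [(u[i - 1] + u[i] + u[i + 1]) / 3 for i in range(1, len(u) - 1)]
    return [u[0]] + inner + [u[-1]]


def run_single(u, steps):
    """Reference: evolve the whole grid on one machine."""
    for _ in range(steps):
        u = smooth(u)
    return u


def run_split(u, steps, ghost):
    """Evolve two subdomains, exchanging `ghost` cells once every `ghost` steps."""
    half = len(u) // 2
    exchanges = 0
    done = 0
    while done < steps:
        batch = min(ghost, steps - done)
        # "Communication": each side receives its neighbour's edge cells.
        left = u[:half + ghost]
        right = u[half - ghost:]
        exchanges += 1
        # Each local step invalidates one more ghost cell, so `ghost` cells
        # allow `ghost` steps before the owned cells would be affected.
        for _ in range(batch):
            left = smooth(left)
            right = smooth(right)
        u = left[:half] + right[ghost:]  # keep only the cells each side owns
        done += batch
    return u, exchanges


grid = [0.0] * 8 + [9.0] + [0.0] * 7
narrow, narrow_msgs = run_split(grid, steps=4, ghost=1)
wide, wide_msgs = run_split(grid, steps=4, ghost=2)
print(narrow_msgs, wide_msgs)
```

Both runs reproduce the single-machine result; the wider ghost zone simply trades some redundant computation on the ghost cells for half as many exchanges, which is the trade that pays off when latency dominates.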
18 Distributed computation the new way
- Intelligent parameter surveys, Monte Carlos
- May control other simulations
- Dynamic staging: move to faster/cheaper/bigger machine (Grid Worm)
- Need more memory? Need less?
- Multiple universe: clone to investigate steered parameter (Grid Virus)
- Automatic component loading
- Needs of process change, discover/load/execute new component somewhere
- Automatic look ahead, convergence testing
- Spawn off and run coarser resolution to predict likely future, study convergence
- Routine profiling
- Best machine/queue, choose resolution parameters based on queue
- Dynamic load balancing: inhomogeneous loads, multiple grids
- DDDAS: injecting data into the above, feed back to experiment
Source: Ed Seidel
19 GridLab: 5M EU Project
- Code/User/Infrastructure should be aware of environment
- Discover resources available NOW, and their current state
- What is my allocation?
- What is the bandwidth/latency between sites?
- Code/User/Infrastructure should be able to make decisions
- A slow part of my simulation can run asynchronously: spawn it off!
- New, more powerful resources just became available: migrate there!
- Machine went down: reconfigure and recover!
- Need more memory (or less!): get it by adding (dropping) machines!
- Code/User/Infrastructure should be able to publish to central server for tracking, monitoring, steering
- Unexpected event: notify users!
- Collaborators from around the world all connect, examine simulation.
- Rethink algorithms: task farming, vectors, pipelines, etc. all apply on Grids. The Grid IS your Computer!
Source: Ed Seidel
20 Ed's Conclusions
- Optical networks, grids promise new ways of computing
- Networks need application toolkits, reasonable cost model
- Standards developing
- 15 years ago parallel computing drove interconnects, HPF, MPI
- Now 2 levels: OGSA grid services, SAGA for apps
- GridLab: www.gridlab.org
- Grid Application Toolkit: www.gridlab.org/GAT
- Documentation, publications, software download
- Cactus Computational Toolkit: www.cactuscode.org
- GGF Simple API for Grid Applications (SAGA)
- Today, SAGA continues as an active research group in the Open Grid Forum (OGF)
- Paper presentation on GAT/SAGA at TeraGrid 08 last week
Source: Ed Seidel
21 CLADE 2006, Paris
- Keynote Presentation
- Terry Harmer, Technical Director of the Belfast e-Science Centre (BeSC)
- Gridcast - a Next Generation Broadcasting Infrastructure?
- Media broadcasting
- BBC has offices in most world capitals
- Large scale, distributed, dynamic, highly reactive management of broadcast content
- Prototype broadcasting grid has been deployed since 2004
- UK e-Science project
- 50% of funding for UK e-Science centers must come from industry
22 Broadcasting is distributed
Undergoing rapid technical change
- Grid can potentially address technical challenges
- Secure, wide area distribution of high volume content
- Secure remote access to high value technical resources
- Advanced editing suites
- Integration of devices, equipment, applications
- Economic challenges to deliver cost-effective, resilient, extensible infrastructure in rapidly changing environment
- BBC wanted to move to commodity infrastructure
- 280 gig per hour in data movement
- Grid as integration framework
- Tie together various platforms
- Deploy software
- Not really for computing at this stage
- 13 May, 2008
- BeSC awarded over 900,000 to continue its role in developing the successor to the world wide web
- Use of grid via Gridcast provides greater programming autonomy among BBC sites
Source: Terry Harmer
23 CLADE 2007, Monterey, CA
- Keynote Presentation
- Scott Oster, Ohio State University
- The Cancer Biomedical Informatics Grid: Connecting the Cancer Research Community
- Goal: Relieve suffering due to cancer by 2015
- 61 cancer labs supported by the National Cancer Institute (NCI)
- More than 50 of these, 30 organizations, 800 people involved in caBIG
- Create scalable, actively managed organization that will connect members of the NCI-supported cancer enterprise by building a biomedical informatics network
24 caBIG Motivation
- This year there will be approximately 1,400,000 Americans diagnosed with cancer
- More than 500,000 Americans are expected to die from cancer this year
- In 2005, the NIH estimated costs for cancer at $209.9 billion, with direct medical costs of $74 billion
Source: Scott Oster
25 What is caBIG?
- Common, widely distributed infrastructure that permits the cancer research community to focus on innovation
- Shared, harmonized set of terminology, data elements, and data models that facilitate information exchange
- Collection of interoperable applications developed to common standards
- Cancer research data available for mining and integration
Source: Scott Oster
26 Driving Needs
- A multitude of legacy information systems, most of which cannot be readily shared between institutions
- Difficulty in identifying and accessing available resources
- Approach: standards-based grid, WSRF web services, Introduce
- But standards in Web/Grid service domain are turbulent at best
- Competing interests of big business and multiple standards bodies
- An absence of tools to connect different databases
- An absence of common data formats
- Approach: Adopt XML as data exchange format
- Cancer Data Standards Repository (caDSR) captures logical model with annotations, facilitates reuse and formal definition
- A huge and growing volume of data must be collected, analyzed, and made accessible
- GridFTP, move services to data
- Few common vocabularies, making it difficult, if not impossible, to interlink diverse research and clinical results
Source: Scott Oster
27
- An absence of information infrastructure to share data within an institution, or among different institutions
- If cancer is cured, and caBIG resources play a role, there will be much interest in knowing who contributed what (and who funded them)
- Technical Approach
- Single sign on, Grid Authentication and Authorization with Reliably Distributed Services (GAARDS)
- Federated Identity Management (Dorian)
- Authorization solutions
- GridGrouper for group-based
- CSM for local policy
- Globus PDPs for complex rules
- Institutional Review Boards (IRB) involved for any protected health information (PHI), even for de-identified data
- Grid is multi-institutional, which means IRBs must reach agreements (read: separately employed lawyers working together)
- Socio-Cultural Approach
- Whole workspace in caBIG dedicated to it (DSIC)
- NCI in a good position to encourage it
- Large percentage of institutions' cancer research funding comes from NCI
- Hope is motivation will be value-based once initially primed
Source: Scott Oster
28 Scott's Summary
- The bad news
- Large-scale, distributed knowledge sharing is hard
- The good news
- The potential rewards are large
- The good news (for computer scientists)
- There are lots of unsolved problems (and interest in getting them solved)
- Disparate Systems
- Lack of Common Data Formats
- Data Interoperability
- Finding Resources
- Data Size
- User Accounting
- Data Privacy
- Intellectual Capital
- Complicated Trust Arrangements
- Computationally Intensive
- Evolving Infrastructure
Source: Scott Oster
29 TeraGrid Science Gateways
30 Phenomenal Impact of the Internet on Worldwide Communication and Information Retrieval
Only 16 years since the release of Mosaic!
- Implications on the conduct of science are still evolving
- 1980s: Early gateways, National Center for Biotechnology Information BLAST server, search results sent by email, still a working portal today
- 1992: Mosaic web browser developed
- 1995: International Protein Data Bank Enhanced by Computer Browser
- 2004: TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program
- Simultaneous explosion of digital information
- Analysis needs in a variety of scientific areas
- Sensors, telescopes, satellites, digital images and video
- #1 machine on Top500 today is more powerful than all combined entries on the first list in 1993
31 1998 Workshop Highlights Early Impact of Internet on Science
- Shared access to geographically dispersed resources
- Assembling the best minds to tackle the toughest problems regardless of location
- Tackling the same problems differently, but also tackling different problems
- Not only the scope, but the process of scientific investigation is changed
- As the chemical applications and capabilities provided by collaboratories become more familiar, researchers will move significantly beyond current practice to exciting new paradigms for scientific work
Requirements for future success include
- Development of interdisciplinary partnerships of chemists and computer scientists
- Flexible and extensible frameworks for collaboratories
- Means to deploy, support, and evaluate collaboratories in the field
32 Rapid Advances in Web Usability
- First generation
- Static Web pages
- Second generation
- Dynamic, database interfaces, CGI
- Lacked the ease of use of desktop applications
- Third generation
- True networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web.
- Remarkable new uses of the Web in the organizational workplace and on the Internet
Source: Screen Porch White Paper, The University of Western Ontario (1996)
33 The Internet as a Resource for News and Information about Science: Summary of Findings at a Glance
40 million Americans rely on the internet as their primary source for news and information about science.
For home broadband users, the internet and television are equally popular as sources for science news, and the internet leads the way for young broadband users.
The internet is the source to which people would turn first if they need information on a specific scientific topic.
The internet is a research tool for 87% of online users. That translates to 128 million adults.
Consumers of online science information are fact-checkers of scientific claims. Sometimes they use the internet for this, other times they use offline sources.
Convenience plays a large role in drawing people to the internet for science information.
Happenstance also plays a role in users' experience with online science resources. Two-thirds of internet users say they have come upon news and information about science when they went online for another reason.
Those who seek out science news or information on the internet are more likely than others to believe that scientific pursuits have a positive impact on society.
Internet users who have sought science information online are more likely to report that they have higher levels of understanding of science.
Between 40% and 50% of internet users say they get information about a specific topic using the internet or through email.
Search engines are far and away the most popular source for beginning science research among users who say they would turn first to the internet to get more information about a specific topic.
Half of all internet users have been to a website which specializes in scientific content.
Fully 59% of Americans have been to a science museum in the past year.
Science websites and science museums may serve effectively as portals to one another.
The convenience of getting scientific material on the web opens doors to better attitudes and understanding of science.
November 20, 2006, John B. Horrigan, Associate Director
http://www.pewinternet.org/pdfs/PIP_Exploratorium_Science.pdf
34 NSF (my sponsor) has long recognized the importance of science and technology interactions
- Interdisciplinary programs did much to facilitate application-technology integration and develop standard tools
- 1997 PACI Program
- Shotgun marriages of technologists and application scientists
- A few groups served as path finders and benefited tremendously
- NPACI neuroscience thrust in 1997 leads to Telescience portal and BIRN in 2001
- Information Technology Research (ITR)
- NSF Middleware Initiative (NMI)
- Plug and play tools so more groups can benefit
35 NSF Continues Its Leadership Today
What Will Lead to Transformative Science?
- Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.
- In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.
Gateways are a terrific example of interfaces that can support transformative science
36 Evolution of the Gateway Program
- 2004: TeraGrid Science Gateway term originates
- We will help them build gateway portals that leverage TeraGrid capabilities and provide web-based interfaces to community tools
- 2005: Gateway requirements analysis team
- Areas of identified commonality include
- Web services, auditing, community accounts, flexible allocations, scheduling, outreach
- Needs of command-line supercomputing users fairly well defined
- Ssh to tg-login
- Data transfer to and from supercomputer
- Software
- MPI, math libraries, domain software
- Compilers
- Batch queue submission
- Help desk
- Need to address Gateway developer needs just as efficiently
37 Tremendous Opportunities Using the Largest Shared Resources - Challenges too!
- What's different when the resource doesn't belong just to me?
- Resource discovery
- Accounting
- Security
- Proposal-based requests for resources (peer-reviewed access)
- Code scaling and performance numbers
- Justification of resources
- Gateway citations
- Tremendous benefits at the high end, but even more work for the developers
- Potential impact on science is huge
- Small number of developers can impact thousands of scientists
- But need a way to train and fund those developers and provide them with appropriate tools
38 Ongoing Work to Meet Common Needs
- Web Services
- GT4 deployment, identification of remaining capabilities
- Information services, MDS
- Registry of Gateway services
- TG-specific: where can I run soonest, with QBETS
- Auditing
- GRAM audit to retrieve usage information for individual compute jobs
- GridShib
- Counting gateway users, individualized accounting, increased security
- Community Accounts
- Policy finalized, security approaches being tested by RPs
- GridShib development, testing with gateways
- Resource requests
- Collaboration with reviewers to develop guidelines for Gateway PIs
- Adapt to usage uncertainties, ability to assess impact, Gateway management structure
- Scheduling
- Metascheduling
- On-demand via SPRUCE framework
- Outreach
- Pathways project
- Gateway use by educators
- Training MSI students to build Gateways
- Documentation
- Extensive wiki information transformed into navigable documentation
- Gateway Hosting
- Available at IU through peer review
- Staff Support
- Targeted support, general capabilities, production coordinator
39 Variety of Gateways Available Today
Title | Discipline
Open Science Grid (OSG) | Advanced Scientific Computing
Special PRiority and Urgent Computing Environment (SPRUCE) | Advanced Scientific Computing
Massive Pulsar Surveys using the Arecibo L-band Feed Array (ALFA) | Astronomical Sciences
National Virtual Observatory (NVO) | Astronomical Sciences
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric Sciences
Computational Chemistry Grid (GridChem) | Chemistry
Computational Science and Engineering Online (CSE-Online) | Chemistry
Network for Earthquake Engineering Simulation (NEES) | Earthquake Hazard Mitigation
GEOsciences Network (GEON) | Earth Sciences
Network for Computational Nanotechnology and nanoHUB | Emerging Technologies Initiation
TeraGrid Geographic Information Science Gateway (GISolve) | Geography and Regional Science
CIG Science Gateway for the Geodynamics Community | Geophysics
QuakeSim | Geophysics
The Earth System Grid (ESG) | Global Atmospheric Research
National Biomedical Computation Resource (NBCR) | Integrative Biology and Neuroscience
Developing Social Informatics Data Grid (SIDGrid) | Language, Cognition, and Social Behavior
Neutron Science TeraGrid Gateway (NSTG) | Materials Research
Biology and Biomedicine Science Gateway | Molecular Biosciences
Open Life Sciences Gateway (OLSG) | Molecular Biosciences
The Telescience Project | Neuroscience Biology
Grid Analysis Environment (GAE) | Physics
SCEC Earthworks Project | Seismology
TeraGrid Visualization Gateway | Visualization, Graphics, and Image Processing
40 Easy Gateway True and False Test
Answers Provided
- TeraGrid selects all gateways (F)
- TeraGrid designs all gateways (F)
- TeraGrid limits the number of gateways (F)
- All gateways need TeraGrid funding to exist (F)
- Any PI can request an allocation and use it to develop a gateway (T)
- Gateway design is community-developed and that is the core strength of the program (T)
- TeraGrid staff are alerted to gateway work when a proposal is reviewed or when a community account is requested (T)
- Limited TeraGrid support can be provided for targeted assistance to integrate an existing gateway with TeraGrid (T)
41 Gateway Idea Resonates with Scientists
- Capabilities provided by the Web are easy to envision because we use them in everyday life
- Researchers can imagine scientific capabilities provided through a familiar interface
- Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities
- But also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities
- Scientists know they can undertake more complex analyses and that's all they want to focus on
- But this seamless access doesn't come for free. It all hinges on very capable developers.
42 Gateways Greatly Expand Access
- Almost anyone can investigate scientific questions using high end resources
- Not just those in the research groups of those who request allocations
- Fosters new ideas, cross-disciplinary approaches
- Encourages students to experiment
- But used in production too
- Increasing number of papers resulting from the use of gateways
- Scientists can focus on challenging science problems rather than challenging infrastructure problems
43 Highlights: NanoHub Explosive User Growth
- In past 12 months
- 68,975 users
- 43% from U.S.
- 25,187 course downloads
- 8,287 podcast downloads
- 371 online meetings
- Full featured gateway
- Simulation tools, curricula, multimedia, user contributions, collaborations
44 Highlights: LEAD Inspires Students
Advanced capabilities regardless of location
- A student gets excited about what he was able to do with LEAD
- Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February 2007. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!
- Eric (email, March 2007)
45 Highlights: GridChem Employs a Client-Server Approach
46 for Production Science
- Chemical Reactivity of the Biradicaloid (HO...ONO) Singlet States of Peroxynitrous Acid. The Oxidation of Hydrocarbons, Sulfides, and Selenides. Bach, R. D. et al. J. Am. Chem. Soc. 2005, 127, 3140-3155.
- The "Somersault" Mechanism for the P-450 Hydroxylation of Hydrocarbons. The Intervention of Transient Inverted Metastable Hydroperoxides. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(5), 1474-1488.
- The Effect of Carbonyl Substitution on the Strain Energy of Small Ring Compounds and their Six-member Ring Reference Compounds. Bach, R. D.; Dmitrenko, O. J. Am. Chem. Soc. 2006, 128(14), 4598.
- Azide Reactions for Controlling Clean Silicon Surface Chemistry: Benzylazide on Si(100)-2 x 1. Semyon Bocharov et al. J. Am. Chem. Soc., 128 (29), 9300-9301, 2006.
- Chemistry of Diffusion Barrier Film Formation: Adsorption and Dissociation of Tetrakis(dimethylamino)titanium on Si(100)-2 x 1. Rodriguez-Reyes, J. C. F.; Teplyakov, A. V. J. Phys. Chem. C. 2007, 111(12), 4800-4808.
- Computational Studies of [2+2] and [4+2] Pericyclic Reactions between Phosphinoboranes and Alkenes. Steric and Electronic Effects in Identifying a Reactive Phosphinoborane that Should Avoid Dimerization. Thomas M. Gilbert and Steven M. Bachrach. Organometallics, 26 (10), 2672-2678, 2007.
47 cancer Bioinformatics Grid: Addressing today's challenges in cancer research and treatment
- The mission of caBIG is to develop a truly collaborative information network that accelerates the discovery of new approaches for the detection, diagnosis, treatment, and prevention of cancer, ultimately improving patient outcomes.
- The goals of caBIG are to
- Connect scientists and practitioners through a shareable and interoperable infrastructure
- Develop standard rules and a common language to more easily share information
- Build or adapt tools for collecting, analyzing, integrating, and disseminating information associated with cancer research and care.
Source: cabig.cancer.gov
48 caBIG and TeraGrid
- caBIG conducted study of all Gateways
- Pleased to discover that community accounts and web services will exactly meet their requirements
- TeraGrid resources incorporated into geWorkbench
- An open source platform for integrated genomics used to
- Load data from local or remote data sources.
- Visualize gene expression and sequence data in a variety of ways.
- Provide access to client- and server-side computational analysis tools such as t-test analysis, hierarchical clustering, self organizing maps, regulatory networks reconstruction, BLAST searches, pattern/motif discovery, etc.
- Clustering is used to build groups of genes with related expression patterns which may contain functionally related proteins, such as enzymes for a specific pathway
- Validate computational hypothesis through the integration of gene and pathway annotation information from curated sources as well as through Gene Ontology enrichment analysis.
49 geWorkbench Integrates TeraGrid Resources
Although the new service is TeraGrid-aware, the perspective from geWorkbench does not change. As far as geWorkbench is concerned, it is still connecting to a Hierarchical Clustering caGrid service. The difference is now the caGrid service is a gateway service that submits a TeraGrid job on behalf of geWorkbench. geWorkbench, however, does not notice this difference.
Source: http://wiki.c2b2.columbia.edu/informatics/index.php/GeWorkbench_Example
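The point of this slide, that the client keeps calling the same clustering service while the service quietly forwards the work elsewhere, can be sketched as two classes sharing one interface. This is a hypothetical illustration, not the geWorkbench or caGrid API: the class names, the community account name, the audit log, and the `fake_remote_submit` stand-in for a batch submission are all assumptions, and the toy 1-D single-linkage routine merely stands in for real hierarchical clustering.

```python
def single_linkage(values, k):
    """Agglomerative clustering of 1-D values down to k clusters (assumes k >= 1)."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # In sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1] for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


class LocalClusteringService:
    """Runs the analysis on the service host itself."""

    def cluster(self, values, k):
        return single_linkage(values, k)


class GatewayClusteringService:
    """Same interface, but forwards the work under a community account."""

    def __init__(self, submit_job, community_account="gateway-community"):
        self.submit_job = submit_job
        self.account = community_account
        self.audit_log = []  # who asked for what, for per-user accounting

    def cluster(self, values, k, end_user="anonymous"):
        self.audit_log.append((end_user, len(values)))
        return self.submit_job(self.account, single_linkage, values, k)


def fake_remote_submit(account, func, *args):
    """Stand-in for a real batch submission; it simply runs the function."""
    return func(*args)


def client_workflow(service):
    """The client code is identical whichever service it is handed."""
    return service.cluster([1.0, 1.2, 5.0, 5.1, 9.0], 3)


print(client_workflow(LocalClusteringService()))
print(client_workflow(GatewayClusteringService(fake_remote_submit)))
```

The audit log hints at why community accounts and auditing appear together in the gateway requirements: the back-end job runs under one shared account, so the gateway itself has to remember which end user asked for it.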
50 Hide the C in CLADE with a Gateway
When is a gateway appropriate?
- Researchers using defined sets of tools in different ways
- Same executables, different input
- GridChem, CHARMM
- Creating multi-scale or complex workflows
- Datasets
- Common data formats
- National Virtual Observatory
- Earth System Grid
- Some groups have invested significant efforts here
- caBIG, extensive discussions to develop common terminology and formats
- BIRN, extensive data sharing agreements
- Difficult to access data/advanced workflows
- Sensor/radar input
- LEAD, GEON
51 Tremendous Potential for Gateways
- In only 16 years, the Web has fundamentally changed human communication
- Science Gateways can leverage this amazingly powerful tool to
- Transform the way scientists collaborate
- Streamline conduct of science
- Influence the public's perception of science
- Reliability, trust, continuity are fundamental to truly change the conduct of science through the use of gateways
- High end resources can have a profound impact
- The future is very exciting!
52 Thank you for your attention
- wilkinsn_at_sdsc.edu
- www.teragrid.org