Title: The Emerging Global Collaboratory for Microbial Metagenomics Researchers
1. The Emerging Global Collaboratory for Microbial
Metagenomics Researchers
- Keynote
- Pacific Rim Applications and Grid Middleware Assembly (PRAGMA)
- Advanced Computing Applications and Technologies Institute
- NCSA
- September 26, 2007
Dr. Larry Smarr
Director, California Institute for Telecommunications and Information Technology
Harry E. Gruber Professor, Dept. of Computer Science and Engineering
Jacobs School of Engineering, UCSD
2. Abstract
Calit2, the J. Craig Venter Institute, and UCSD's
SDSC and Scripps Institution of Oceanography are
creating a metagenomic Community
Cyberinfrastructure for Advanced Marine Microbial
Ecology Research and Analysis (CAMERA), funded by
the Gordon and Betty Moore Foundation. The
CAMERA computational and storage cluster, which
contains multiple ocean microbial metagenomic
datasets, as well as the full genomes of 166
marine microbes, is actively in use. End users
can access the metagenomic data either via the
web or over novel dedicated 10 Gb/s light paths
(termed "lambdas") through the National
LambdaRail. The end user clusters are
reconfigured as "OptIPortals," providing the
end user with local scalable visualization,
computing, and storage. Currently, CAMERA has over
1200 registered users from over 45 countries, with
over a dozen remote OptIPortal sites becoming
active. I will review
the status of users from PRAGMA countries and
discuss the possibility of PRAGMA becoming a
global LambdaGrid "living laboratory" for this
emerging high performance collaboratory.
3. Most of Evolutionary Time Was in the Microbial
World
Tree of Life Derived from 16S rRNA Sequences
Source: Carl Woese, et al.
4. The New Science of Metagenomics
"The emerging field of metagenomics, where the
DNA of entire communities of microbes is studied
simultaneously, presents the greatest opportunity
-- perhaps since the invention of the microscope
-- to revolutionize understanding of the
microbial world." National Research
Council, March 27, 2007
NRC Report: "Metagenomic data should be made
publicly available in international archives as
rapidly as possible."
5. The Sargasso Sea Experiment: The Power of
Environmental Metagenomics
- Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
- Displayed the Gene Content, Diversity, and Relative Abundance of the Organisms
- Sequences from at Least 1800 Genomic Species, including 148 Previously Unknown
- Identified over 1.2 Million Unknown Genes
J. Craig Venter, et al. Science, 2 April 2004, Vol. 304, pp. 66-74
MODIS-Aqua satellite image of ocean chlorophyll
in the Sargasso Sea grid about the BATS site from
22 February 2003
6. Marine Genome Sequencing Project: Measuring the
Genetic Diversity of Ocean Microbes
One Million Microbes and Ten Million Viruses Per
Cubic Centimeter of Ocean Water
Sorcerer II Data Will Double Number of Proteins
in GenBank!
7. Environmental Metadata: Beyond Data Collected at
Sampling Site
NASA AQUA-MODIS Images covering GOS sites 8-12,
mid-November 2003
8. For More In-Depth Research on Marine Microbial
Metagenomics
Published March 13, 2007
Special Issue: A Sea of Microbes, June 2007
9. Enormous Increase in Scale of Known Genes Over
Last Decade
1.8 Million Bases, 1749 Genes
6.3 Billion Bases, 5.6 Million Genes
3300x
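As a quick check of the scale-up figures on this slide, the two ratios can be computed directly. This is a minimal sketch: the constant and function names are illustrative, and the slide does not state which quantity its multiplier refers to.

```python
# Figures quoted on the slide; names are illustrative, not from the talk.
BASES_EARLIER = 1.8e6   # 1.8 million bases
GENES_EARLIER = 1749    # 1749 genes
BASES_LATER = 6.3e9     # 6.3 billion bases
GENES_LATER = 5.6e6     # 5.6 million genes


def fold_increase(earlier, later):
    """Return how many times larger `later` is than `earlier`."""
    return later / earlier


base_ratio = fold_increase(BASES_EARLIER, BASES_LATER)
gene_ratio = fold_increase(GENES_EARLIER, GENES_LATER)

print(f"bases: about {base_ratio:,.0f}x")
print(f"genes: about {gene_ratio:,.0f}x")
```

The quoted multiplier falls between the two results (roughly 3200x for genes and 3500x for bases), so it reads as a rounded summary of both.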
10. Moore Foundation Funded the Venter Institute to
Provide the Full Genome Sequence of 155 Marine
Microbes
Phylogenetic Trees Created by Uli Stingl, Oregon State
Blue Means Contains One of the Moore 155 Genomes
www.moore.org/microgenome/trees.aspx
11. Paul Gilna, Ex. Dir.
PI: Larry Smarr
Announced January 17, 2006: $24.5M Over Seven Years
13. The Calit2 CAMERA Microbial Metagenomics Server
is Open to the Community
Launched March 17, 2007
PLOS Biology March 2007
14. Calit2 CAMERA Production Compute and Storage
Complex
512 Processors, 5 Teraflops, 200 Terabytes Storage
15. CAMERA 1.2 is Here Next Week!
http://camera.calit2.net/
16. Marine Microbial Metagenomics is a Global
Scientific Research Cyber-Community
Over 1300 Registered Users From 48 Countries
17. Calit2's Direct Access Core Architecture Creates
a SuperNetwork Metagenomics Server
Sargasso Sea Data
Sorcerer II Expedition (GOS)
JGI Community Sequencing Project
Moore Marine Microbial Project
NASA and NOAA Satellite Data
Community Microbial Metagenomics Data
Traditional User
Request
Response
Web Services
Source: Phil Papadopoulos, SDSC, Calit2
18. The OptIPuter Project: Creating High Resolution
Portals Over Dedicated Optical Channels to
Global Science Data
$13.5M Over Five Years; Finishes Fifth Year Next Week!
Picture Source: Mark Ellisman, David Lee, Jason Leigh
Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr PI
Univ. Partners: SDSC, USC, SDSU, NW, TAM, UvA, SARA, KISTI, AIST
Industry: IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
19. Shared Internet Bandwidth: Unpredictable, Widely
Varying, Jitter, Asymmetric
Computers In: Australia, Canada, Czech Rep., India, Japan, Korea, Mexico,
Moorea, Netherlands, Poland, Taiwan, United States
UCSD
PRAGMA Bandwidth Challenge: Email Teri Simas
Your Results From Home and Office
Source: Larry Smarr and Friends
Measured Bandwidth from User Computer to
Stanford Gigabit Server in Megabits/sec
http://netspeed.stanford.edu/
20. Dedicated Optical Channels Make High
Performance Cyberinfrastructure Possible
Parallel Lambdas are Driving Optical Networking
The Way Parallel Processors Drove 1990s Computing
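To make the contrast between shared Internet paths and dedicated lambdas concrete, here is a rough back-of-the-envelope sketch. It uses the 10 Gb/s light-path rate and the 200-terabyte storage figure quoted elsewhere in this talk. The 10 Mb/s shared-Internet rate is a hypothetical value chosen only for illustration, and protocol overhead is ignored.

```python
def transfer_hours(size_bytes, rate_bits_per_second):
    """Idealized transfer time in hours, ignoring protocol overhead."""
    return size_bytes * 8 / rate_bits_per_second / 3600


DATA_BYTES = 200e12   # 200 terabytes (decimal units assumed)
LAMBDA_RATE = 10e9    # 10 Gb/s dedicated light path
SHARED_RATE = 10e6    # hypothetical 10 Mb/s shared Internet path

lambda_hours = transfer_hours(DATA_BYTES, LAMBDA_RATE)
shared_hours = transfer_hours(DATA_BYTES, SHARED_RATE)

print(f"10 Gb/s lambda: {lambda_hours:.1f} hours")
print(f"10 Mb/s shared: {shared_hours / 24 / 365:.1f} years")
```

Under these assumptions the dedicated path moves the full archive in under two days, while the hypothetical shared path would take years. Real transfers would be slower on both paths.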
21. OptIPuter Software Architecture--a
Service-Oriented Architecture Integrating Lambdas
Into the Grid
Source: Andrew Chien, UCSD
Globus
XIO
GSI
GRAM
GTP
XCP
UDT
LambdaStream
CEP
RBUDP
22. My OptIPortal™--Affordable Termination Device
for the OptIPuter Global Backplane
- 20 Dual CPU Nodes, Twenty 24" Monitors, $50,000
- 1/4 Teraflop, 5 Terabyte Storage, 45 Mega Pixels--Nice PC!
- Scalable Adaptive Graphics Environment (SAGE), Jason Leigh, EVL-UIC
Source: Phil Papadopoulos, SDSC, Calit2
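The OptIPortal figures on this slide can be sanity-checked with a little arithmetic. The sketch below assumes each of the twenty monitors is a 1920 x 1200 panel (a common resolution for 24-inch displays, not stated on the slide) and treats the 50,000 figure as a cost in US dollars, which is also an assumption.

```python
MONITORS = 20                 # from the slide
QUOTED_MEGAPIXELS = 45        # from the slide
ASSUMED_COST_USD = 50_000     # assumption: the 50,000 figure is a dollar cost
ASSUMED_PANEL = (1920, 1200)  # assumption: resolution of each 24-inch monitor

# Pixel count of one panel and of the whole tiled wall.
pixels_per_panel = ASSUMED_PANEL[0] * ASSUMED_PANEL[1]
wall_megapixels = MONITORS * pixels_per_panel / 1e6

# Cost per megapixel, using the pixel count quoted on the slide.
cost_per_megapixel = ASSUMED_COST_USD / QUOTED_MEGAPIXELS

print(f"wall size: {wall_megapixels:.2f} megapixels")
print(f"cost: about ${cost_per_megapixel:,.0f} per megapixel")
```

With the assumed panel resolution the wall comes to about 46 megapixels, which is consistent with the 45 megapixels quoted on the slide.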
23. Use of Tiled Display Wall OptIPortal to
Interactively View Microbial Genome
Acidobacteria bacterium Ellin345 (Soil Bacterium, 5.6 Mb)
Source: Raj Singh, UCSD
24. Use of Tiled Display Wall OptIPortal to
Interactively View Microbial Genome
Source: Raj Singh, UCSD
25. Use of Tiled Display Wall OptIPortal to
Interactively View Microbial Genome
Source: Raj Singh, UCSD
26. Interactive Exploration of Marine Genomes Using
100 Million Pixels
Ginger Armbrust (UW), Terry Gaasterland (UCSD SIO)
27. CAMERA is Partnering to Port Metagenomic
Community Software to the OptIPortal
Collaboration Between Microbial Genomics Group,
Max Planck Institute for Marine Microbiology in
Bremen, Germany and CAMERA / Rocks Group
28. Nearly One Half Billion Pixels in Calit2 Extreme
Visualization Project!
UC San Diego
Connected at 2,000 Megabits/s!
UC Irvine
UCI HIPerWall Analyzing Pre- and Post-Katrina
Falko Kuester, UCSD; Steven Jenks, UCI
29. e-Science Collaboratory Without Walls Enabled by
Uncompressed HD Telepresence
iHDTV: 1500 Mbits/sec, Calit2 to UW Research
Channel Over NLR
May 23, 2007
John Delaney, PI LOOKING, Neptune
Photo: Harry Ammons, SDSC
30. Goal for SC07: iHDTV Integrated into OptIPortal
Source: Michael Wellings, Research Channel, Univ.
Washington
31. An Emerging High Performance Collaboratory for
Microbial Metagenomics
32. CAMERA is Helping Design an OptIPuter Between
Univ. of Hawaii and MIT
The Convergence of Biology, Earth Sciences, and
Computer Sciences
33. Moving PRAGMA to the OptIPuter LambdaGrid, Step
One: Join Global Lambda Integrated Facility
(GLIF)
Visualization courtesy of Bob Patterson, NCSA.
www.glif.is Created in Reykjavik, Iceland 2003
34. Step Two: Build a Rocks / SAGE OptIPortal
UZurich
CNIC
NCHC
Osaka U
35. Step Three: Get to Know Your Local Microbial
Metagenomicist; Add to CAMERA Userbase
Over 1300 Registered Users From 48 Countries
36. Step Four: Calit2 / PRAGMA CAMERA LambdaGrid
Collaborations
- Add CAMERA Server to PRAGMA Grid Testbed
- Ad hoc Supercomputing
- NIMROD?
- New Bioinformatics Apps
- Set up PRAGMA OptIPortal LambdaGrid for a Few International PRAGMA Sites
- KISTI, Konkuk U
- AIST, Osaka U
- CNIC
- NCHC
- APAC (UMelbourne, Monash, U Queensland)
- CICESE
- UZurich
- Plus Other Volunteers!
PRAGMA Countries with CAMERA Registered Users
Source: Paul Gilna, Kayo Arima, Calit2