Title: Legume Information Network: A Component of the Virtual Plant Information Network
- National Center for Genome Resources
- University of Minnesota Center for Computational Genomics and Bioinformatics
- United States Department of Agriculture, Agricultural Research Service
Gregory D. May, Atlanta, October 2007
2. Current State of Bioinformatics Resources
- Hundreds of project websites and databases
- Project databases are distributed, autonomous and ephemeral
- Inconsistent user interfaces
TIGR Gene Indices
- Stein et al. (2006). Plant Biology Databases: A Needs Assessment by the NSF-USDA Working Group on Long-Lived Databases.
3. The promises of 30 high-throughput omics technologies
- Improved crops
  - nutrition, novel traits, resistance, yield, sustainability
- Improved animal production
- Improved human health
  - biomarker diagnostics
  - personalized medicines and therapies
- Improved environment
  - bioremediation
  - carbon sequestration
  - energy independence
4. The Need
- Legume biologists must still navigate multiple information resources to answer many research questions.
- Develop a virtual, easy-to-navigate, one-stop legume information network. By one-stop we refer by analogy to Google and how it can be seen as a single, yet non-exclusive, information resource.
  - Gepts et al., Report from the CATG meeting, Plant Physiology 137:1228 (2005).
6. Virtual Plant Information Network
- Establish an architecture based on semantic web technologies to support an interoperable (database) network
- Standardize data formats and user interfaces to support machine-readable representation of genomes, genetic maps, polymorphisms, QTL, expression, proteins, metabolites and phenotypes
- Develop breeders' toolboxes with visual interfaces similar to the one depicted in GEYSIR
7. Goals
- Design a solution for integrating disparate data sources
- Develop a prototype, the Legume Information Network, that demonstrates the capabilities of semantic web technologies
- Have the legume community take a leadership role in data and tool integration using semantic-MOBY
8. The Requirements
- Devise a way in which resources can be described, discovered, and invoked on the web, using
  - a common syntax, so machines can parse one another's data and services
  - a public semantics, so machines can determine suitability for purpose
  - a discovery service, so machines can find data and services across the web based on the semantics of the resources being offered and the needs of the task at hand
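To make "suitability for purpose" concrete, here is a minimal sketch of semantic matching, assuming a shared ontology that publishes simple is-a relations between data types. The type names (`ESTSequence`, `BlastReport`, etc.), the `IS_A` table and the `discover` function are hypothetical; the service names and providers are taken from the service list in this deck, but their input and output types are assumptions, and this is not the semantic-MOBY protocol itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical "is-a" hierarchy (child -> parent) that a public ontology might publish.
IS_A: Dict[str, Optional[str]] = {
    "ESTSequence": "NucleotideSequence",
    "BACSequence": "NucleotideSequence",
    "NucleotideSequence": "Sequence",
    "ProteinSequence": "Sequence",
    "Sequence": None,
    "BlastReport": None,
    "MultipleAlignment": None,
}


def is_a(cls: str, ancestor: str) -> bool:
    """True if cls equals ancestor or inherits from it through IS_A."""
    current: Optional[str] = cls
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


@dataclass(frozen=True)
class ServiceDescription:
    name: str
    provider: str
    consumes: str  # most general input type the service accepts (assumed)
    produces: str  # output type (assumed)


# Simplified registry; the type assignments are illustrative assumptions.
REGISTRY: List[ServiceDescription] = [
    ServiceDescription("BlastN Legume BACs", "LIS", "NucleotideSequence", "BlastReport"),
    ServiceDescription("Blast sequences against KEGG Genes", "CCGB", "Sequence", "BlastReport"),
    ServiceDescription("ClustalW Multiple Sequence Alignment", "CCGB", "Sequence", "MultipleAlignment"),
]


def discover(have: str, want: str) -> List[str]:
    """Services that accept what the client has and return what it needs."""
    return [
        s.name
        for s in REGISTRY
        if is_a(have, s.consumes) and is_a(s.produces, want)
    ]


if __name__ == "__main__":
    # An EST is a nucleotide sequence, so both BLAST services qualify.
    print(discover("ESTSequence", "BlastReport"))
```

The point of the shared hierarchy is that no provider has to list `ESTSequence` explicitly: a service declared on the more general `NucleotideSequence` type is still found, because the semantics are public.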
9. The Approach: Keep It Simple
Clients, providers, and even discovery servers all read and contribute to the same set of statements.
All actors understand a single, mutable graph that embeds the explicit logic necessary and sufficient to describe, query, discover, invoke, and satisfy resources and requests.
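A minimal sketch of the "single, mutable graph" idea, assuming statements are plain (subject, predicate, object) triples held in memory. The class `SharedGraph`, the predicates (`type`, `providedBy`, `consumes`, `offers`, `satisfiedBy`) and the identifiers are hypothetical stand-ins, not the semantic-MOBY vocabulary; the sketch only shows providers, clients and a discovery step all reading and writing the same set of statements.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class SharedGraph:
    """One mutable set of (subject, predicate, object) statements shared by all actors."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        """Return matching triples as a sorted list; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def describe_service(g: SharedGraph, service: str, provider: str, consumes: str) -> None:
    """Provider role: publish a service description as statements."""
    g.add(service, "type", "Service")
    g.add(service, "providedBy", provider)
    g.add(service, "consumes", consumes)


def post_request(g: SharedGraph, request: str, offers: str) -> None:
    """Client role: publish a request as statements in the same graph."""
    g.add(request, "type", "Request")
    g.add(request, "offers", offers)


def run_discovery(g: SharedGraph) -> None:
    """Discovery role: link each request to services that consume what it offers."""
    for req, _, _ in g.match(p="type", o="Request"):
        for _, _, offered in g.match(s=req, p="offers"):
            for svc, _, _ in g.match(p="consumes", o=offered):
                g.add(req, "satisfiedBy", svc)


if __name__ == "__main__":
    g = SharedGraph()
    describe_service(g, "svc:BlastNLegumeBACs", "LIS", "NucleotideSequence")
    describe_service(g, "svc:ClustalW", "CCGB", "SequenceSet")
    post_request(g, "req:1", "NucleotideSequence")
    run_discovery(g)
    print(g.match(s="req:1", p="satisfiedBy"))
    # [('req:1', 'satisfiedBy', 'svc:BlastNLegumeBACs')]
```

Because the answer is itself written back as statements, a client discovers matches with the same `match` query it would use for anything else; no separate message format is needed.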
17. Services
Data Provider Services (provider in parentheses)
- GO Annotated Transcript Sequences (LIS)
- Medicago IMGAG Annotations (CCGB)
- Precomputed BlastX against NCBI's NR (LIS)
- Blocks precomputed analysis retrieval (LIS)
- GenScan precomputed gene predictions (LIS)
- Sequence Text Retrieval (LIS)
- GO Annotations Retrieval (LIS)
- InterPro precomputed analysis retrieval (LIS)
Visualization Services (provider in parentheses)
- Comparative Map and Trait Viewer (LIS)
- ISYS TableViewer (LIS)
- Alignment visualization using PFAAT (CCGB)
Analysis Services (provider in parentheses)
- ClustalW Multiple Sequence Alignment (CCGB)
- BlastN LIS Transcript Contigs (LIS)
- Blast sequences against KEGG Genes (CCGB)
- Blast sequences against TIGR TOG Sequence (CCGB)
- BlastN Legume BACs (LIS)
- BlastN Lotus finished BACs (LIS)
18. LIN Partners
19. Resources
- A running Discovery Server: www.semanticMoby.org
- The project web site: vpin.ncgr.org
- Discussion forum: vpin.ncgr.org/mvnforum/forum
- Collection of ontologies: ontologies.ncgr.org
- Protocol documentation: ontologies.ncgr.org/OWLDocs/moby
- Publications and other docs: vpin.ncgr.org/links.shtml
- Developers resources: www.semanticmoby.org/developer/index.jsp
- Provider Developer Kit: vpin.ncgr.org/provider.shtml
- Client Developer Kit: vpin.ncgr.org/client.shtml
20. Generation of DNA Sequence Data
Cost per 1,000 bp by year:
- 1990: 10.00
- 2000: 3.00
- 2005: 1.00
- 2006: 0.10
- 2007: 0.03
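The cost figures for DNA sequence generation can be turned into fold reductions with simple arithmetic. This sketch uses only the five values on the slide (in whatever currency unit the slide used); `COST_PER_KB`, `fold_reductions` and `overall_fold` are illustrative names.

```python
from typing import Dict, List, Tuple

# Cost per 1,000 bp by year, as listed on the slide (currency unit as given there).
COST_PER_KB: Dict[int, float] = {1990: 10.00, 2000: 3.00, 2005: 1.00, 2006: 0.10, 2007: 0.03}


def fold_reductions(costs: Dict[int, float]) -> List[Tuple[int, int, float]]:
    """(start_year, end_year, fold) for each pair of consecutive listed years."""
    years = sorted(costs)
    return [(a, b, costs[a] / costs[b]) for a, b in zip(years, years[1:])]


def overall_fold(costs: Dict[int, float]) -> float:
    """Fold reduction from the earliest to the latest listed year."""
    years = sorted(costs)
    return costs[years[0]] / costs[years[-1]]


if __name__ == "__main__":
    for start, end, fold in fold_reductions(COST_PER_KB):
        print(f"{start}->{end}: {fold:.1f}-fold cheaper")
    print(f"1990->2007 overall: {overall_fold(COST_PER_KB):.0f}-fold cheaper")
```

On these numbers the single largest step is 2005 to 2006 (10-fold), and the overall drop from 1990 to 2007 is roughly 333-fold.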
21. Sequencing Platform Comparison
22. Alpheus: Cyberinfrastructure for Medical and Agricultural Resequencing
- Nucleotide variant and splice isoform detection
- Resequencing projects at the scale of hundreds of Gb
- Short reads (454, Solexa, SOLiD), plus Sanger
- Paired and unpaired reads
- Alignments to genomic and transcriptomic references
- In Greek mythology, the river Alpheus cleansed the Augean stables and restored life to the soil
23. Pileup Visualization
Annotated screenshot: a slidable window over an overview of the transcript, with the coding domain, nsSNPs, SNPs, in/dels and 454 reads marked
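The pileup labels distinguish nsSNPs (nonsynonymous SNPs) from other SNPs in the coding domain. A minimal sketch of that classification, assuming a forward-strand coding sequence that is in frame from position 0 and the standard genetic code (NCBI translation table 1); `classify_coding_snp` is a hypothetical helper, not the Alpheus implementation.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_coding_snp(cds: str, position: int, alt: str) -> str:
    """Classify a substitution at 0-based `position` of an in-frame coding sequence."""
    cds = cds.upper()
    frame = position % 3
    start = position - frame
    codon = cds[start:start + 3]
    if len(codon) < 3:
        raise ValueError("position falls in an incomplete trailing codon")
    mutated = codon[:frame] + alt.upper() + codon[frame + 1:]
    # Nonsynonymous if the encoded amino acid (or stop, '*') changes.
    if CODON_TABLE[codon] == CODON_TABLE[mutated]:
        return "synonymous"
    return "nonsynonymous"


if __name__ == "__main__":
    cds = "ATGGCTTAA"  # Met-Ala-Stop
    print(classify_coding_snp(cds, 5, "C"))  # GCT -> GCC, still Ala
    print(classify_coding_snp(cds, 3, "A"))  # GCT -> ACT, Ala -> Thr
```

Real pipelines also have to handle reverse-strand transcripts, splice junctions and in/dels, which this sketch deliberately ignores.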
24. Dynamic Filtering
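Dynamic filtering means re-evaluating variant calls as thresholds change. A minimal sketch of that idea, assuming gapless read alignments given as (start, sequence) pairs, no base or mapping qualities, and two hypothetical thresholds (`min_depth`, `min_alt_fraction`); it is a toy pileup caller for illustration, not the Alpheus algorithm.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VariantCall:
    position: int        # 0-based reference coordinate
    ref: str
    alt: str
    depth: int           # number of reads covering the position
    alt_fraction: float  # share of covering reads showing `alt`


def build_pileup(reference: str, reads: List[Tuple[int, str]]) -> List[Counter]:
    """Count observed bases at each reference position from gapless alignments."""
    columns = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns


def call_snvs(reference: str, columns: List[Counter],
              min_depth: int = 2, min_alt_fraction: float = 0.5) -> List[VariantCall]:
    """Report the most frequent non-reference base wherever it passes both filters."""
    calls = []
    for pos, counts in enumerate(columns):
        depth = sum(counts.values())
        if depth < min_depth:
            continue
        ref_base = reference[pos]
        alts = sorted(((n, b) for b, n in counts.items() if b != ref_base), reverse=True)
        if not alts:
            continue
        alt_count, alt_base = alts[0]
        fraction = alt_count / depth
        if fraction >= min_alt_fraction:
            calls.append(VariantCall(pos, ref_base, alt_base, depth, fraction))
    return calls


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = [(0, "ACGTTCGT"), (2, "GTTCGTAC"), (4, "TCGTAC"), (6, "GAAC")]
    columns = build_pileup(reference, reads)
    print(call_snvs(reference, columns))                        # strict filter
    print(call_snvs(reference, columns, min_alt_fraction=0.2))  # relaxed filter
```

Building the pileup once and re-running only `call_snvs` is what makes filtering cheap enough to be interactive: with the relaxed threshold, a single-read mismatch at position 7 reappears alongside the well-supported variant at position 4.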
30. Summary of Medicago Ecotype F83005.5 Solexa Resequencing
- 1x coverage of a 540 Mb genome
- One SNP per 600 bp (no filtering)
- 45,000 high-stringency SNPs
32. Application of Next-Generation Sequencing Technologies for Variant Detection in Crop Plants and Pathogens
- Whole-transcriptome shotgun resequencing
  - Expressed portions (or gene space) of the genome across populations, in the absence of a reference genome
- Whole-genome shotgun resequencing
  - Sequence across populations with available reference genomes
  - WGS skimming of transformation events
- Targeted genome resequencing across populations
  - Area under the QTL
  - Pooled long-PCR products to walk between markers
  - Restriction enzyme-anchored
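One way to read "restriction enzyme-anchored" targeting is reduced-representation sequencing: digest the genome, keep fragments in a size window, and sequence only those, so the same loci are sampled across individuals. A minimal in-silico digest sketch, assuming EcoRI (recognition site GAATTC, cut between G and A) as an example enzyme; the slide names no enzyme, and `digest` and `size_select` are hypothetical helpers.

```python
import re
from typing import List


def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> List[str]:
    """Cut `seq` at every occurrence of `site`, `cut_offset` bases into the site.

    The defaults model EcoRI (G^AATTC) on one strand; the enzyme is an
    illustrative assumption.
    """
    seq = seq.upper()
    # Lookahead so that overlapping occurrences would also be found.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def size_select(fragments: List[str], min_len: int, max_len: int) -> List[str]:
    """Keep fragments whose length falls inside the chosen window."""
    return [f for f in fragments if min_len <= len(f) <= max_len]


if __name__ == "__main__":
    fragments = digest("AAGAATTCTTTTGAATTCCC")
    print(fragments)                      # ['AAG', 'AATTCTTTTG', 'AATTCCC']
    print(size_select(fragments, 5, 10))  # ['AATTCTTTTG', 'AATTCCC']
```

In practice the size window would be chosen so that the selected fragments fit the read length of the platform and cover the region of interest, such as the interval under a QTL.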
33. GEYSIR (Genomic Explorer y Survey of Immune Response)
geysir.ncgr.org
Annotated screenshot:
- Clickable LOD scores move the selection windows
- Map region selection windows (grab and slide)
- Markers on the linkage map (cM) and on the physical map (Mb)
- Zoom and pan buttons
- View selected studies (across all chromosomes); sample study 1
- Chromosome map; click on chromosome 22
- Marker titles visible in this 1.5 Mb region
- Candidate genes in blue; CTRL-left mouse click takes you to the gene detail page
- Slidable feature neighborhood window
- Nucleotide slider window view, with exons in green and SNP markers
- Clickable SNP bubbles take you to dbSNP
34. Acknowledgements
- NCGR LIS
  - Greg May
  - Kamal Gajendran
  - Andrew Farmer
  - Michael Gonzales
  - Selene Virk
  - Bill Beavis
- USDA-ARS LIS
  - Randy Shoemaker
  - David Grant
  - Rich Wilson
- NCGR GEYSIR
  - Susan Baxter
  - Faye Schilkey
  - Neil Miller
  - Dan Weems
  - Lar Mader
- USDA-ARS LIN
  - Randy Shoemaker
  - Michelle Graham
- CCGB/U. Minn LIN
  - Ernest Retzel
  - Jim Johnson
  - Michael Heuer
  - John Crow
- NCGR VPIN/LIN
  - Damian Gessler
  - Gary Schiltz
  - Bill Beavis
  - Andrew Farmer
  - S. Knapp
  - N. Young
- Funding
  - LIS/LIN USDA-ARS
    - SCA 3625-21000-038-01
  - GEYSIR NIH-NIAID HHSN266200400064C
  - VPIN NSF-BDI 0516487
- LIS Steering Committee
  - Mark Burow
  - Doug Cook
  - Perry Cregan
  - Rebecca Dickstein
  - David Grant
  - Randy Shoemaker
  - Michael Udvardi
  - Nevin Young