Title: GreenPhylDB: Large-scale phylogenomic analyses for gene function prediction in GCP crops
1 GreenPhylDB: Large-scale phylogenomic analyses for gene function prediction in GCP crops
Matthieu CONTE
Christophe PERIN
Marie-Angélique LAPORTE
Mathieu ROUARD (PI)
2 Objectives: plant comparative genomics
- Why?
  - There are many sequencing projects. Bioinformatics databases and tools are needed to drive functional genomics experiments.
- How?
  - Using phylogenomic methods. The only way to identify orthologous genes (which probably share the same function).
- What data?
  - Use model plants with complete genomes.
  - The only way to obtain complete and correct ortholog predictions (or to conclude that no ortholog exists).
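As an illustration of how orthology can be read off a gene tree, here is a minimal Python sketch of the species-overlap rule: a node whose two subtrees share no species is treated as a speciation, and the genes on either side of it are reported as orthologs. This is a generic textbook technique, not necessarily the method implemented in GreenPhylDB. The tree, the `OS`/`AT` species prefixes and the gene names are invented for the example.

```python
from itertools import product

# Hypothetical gene tree as nested tuples; leaves are "SPECIES|gene" labels.
# OS and AT stand for rice and Arabidopsis; the gene names are made up.
TREE = (
    ("OS|g1", "AT|g1"),
    (("OS|g2", "AT|g2"), "AT|g3"),
)

def leaves(node):
    """Return all leaf labels found under a node."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def species(node):
    """Return the set of species represented under a node."""
    return {leaf.split("|")[0] for leaf in leaves(node)}

def orthologs(node, pairs=None):
    """Collect gene pairs whose last common ancestor is a speciation node."""
    if pairs is None:
        pairs = set()
    if isinstance(node, str):
        return pairs
    left, right = node
    # Species-overlap rule: a species present on both sides implies a duplication.
    if not (species(left) & species(right)):
        for a, b in product(leaves(left), leaves(right)):
            pairs.add(tuple(sorted((a, b))))
    orthologs(left, pairs)
    orthologs(right, pairs)
    return pairs

for pair in sorted(orthologs(TREE)):
    print(pair)
```

In this toy tree the root is a duplication, so `OS|g1` is paired only with `AT|g1`, and `AT|g3` has no rice ortholog. A missing ortholog can only be trusted when the rice gene set is complete, which is why complete genomes matter.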
3 GreenPhylDB V1.0: http://greenphyl.cines.fr/
- Oryza sativa and Arabidopsis thaliana model plants
  - Full genome available.
  - Gene annotation quality (TAIR release 8, TIGR release 5).
  - Most of the available functional evidence.
4 GreenPhylDB V1.0: Statistics
- 81,000 genes
- 6,420 manually annotated gene families
- 4,400 phylogenetically analysed gene families
- 24,000 ortholog relationships between rice and Arabidopsis (confidence score > 90)
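To make the confidence cut-off concrete, the following sketch keeps only ortholog pairs with a score strictly above 90. The CSV layout, the column names and the gene identifiers are assumptions made for illustration. They do not describe an actual GreenPhylDB export format.

```python
import csv
import io

# Hypothetical table of predicted ortholog pairs; names and scores are invented.
SAMPLE = """rice_gene,arabidopsis_gene,confidence
Os_example_1,At_example_1,97
Os_example_2,At_example_2,90
Os_example_3,At_example_3,64
"""

def confident_pairs(handle, threshold=90.0):
    """Return (rice, arabidopsis) pairs whose confidence is strictly above threshold."""
    reader = csv.DictReader(handle)
    return [
        (row["rice_gene"], row["arabidopsis_gene"])
        for row in reader
        if float(row["confidence"]) > threshold
    ]

pairs = confident_pairs(io.StringIO(SAMPLE))
print(pairs)
```

A pair scoring exactly 90 is excluded here because the slide states "greater than 90".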
5 GreenPhylDB V1.0: Easy to use and to find information
Gene ID (TAIR, TIGR)
Family name
Gene name (alias)
Gene annotation
KEGG ID
InterPro ID
6 GreenPhylDB V1.0: If you have only the sequence.
7 GreenPhylDB V1.0: If you have only a sequence.
8 GreenPhylDB V1.0: Ortholog prediction
9 GOST (GreenPhyl Ortholog Search Tool): How to study a gene from another species?
10 GOST (GreenPhyl Ortholog Search Tool): 2 objectives
11 GOST (GreenPhyl Ortholog Search Tool): Paste ONE complete protein sequence
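Before pasting a sequence, it can help to check locally that the input really is one protein sequence and looks complete. The sketch below is a hypothetical pre-check, not GOST's own validation. It assumes FASTA input and uses a leading methionine as a rough proxy for completeness.

```python
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")

def parse_fasta(text):
    """Return a list of (header, sequence) records from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_single_protein(text):
    """Return a list of problems; an empty list means the input looks usable."""
    records = parse_fasta(text)
    problems = []
    if len(records) != 1:
        problems.append(f"expected exactly 1 sequence, found {len(records)}")
    for header, seq in records:
        seq = seq.rstrip("*")  # tolerate a trailing stop symbol
        if not seq.startswith("M"):
            problems.append(f"{header}: no start methionine, may be incomplete")
        if set(seq) - AMINO_ACIDS:
            problems.append(f"{header}: contains non-protein characters")
    return problems

print(check_single_protein(">query\nMKTAYIAKQR\n"))
```

A nucleotide sequence will usually fail the start-methionine test, which is a useful reminder that a protein sequence is expected.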
12 GOST (GreenPhyl Ortholog Search Tool): Output
13 GOST (GreenPhyl Ortholog Search Tool): Output
Your gene
14 i-GOST beta (Iterative GreenPhyl Ortholog Search Tool): How to integrate more genes?
15 i-GOST beta (Iterative GreenPhyl Ortholog Search Tool): Paste a maximum of 20 complete protein sequences
16 i-GOST beta (Iterative GreenPhyl Ortholog Search Tool): Step 1, gene classification and species selection
17 i-GOST beta (Iterative GreenPhyl Ortholog Search Tool): Step 2, phylogenomic predictions
18 i-GOST beta (Iterative GreenPhyl Ortholog Search Tool): Usual errors
- Integration of genes from different families.
- Integration of incomplete sequences.
- We noticed some problems during the analysis of some sequences.
- Please note that i-GOST is a BETA version.
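Some of the usual errors listed above can be caught before submission. The following sketch is a hypothetical local pre-check for a batch: it enforces the 20-sequence limit and flags sequences that look incomplete. The length-ratio heuristic and its default of 0.5 are assumptions, not rules applied by i-GOST. Detecting genes from different families would need the database itself and is not attempted here.

```python
from statistics import median

MAX_SEQUENCES = 20  # limit stated for i-GOST

def precheck_batch(sequences, min_length_ratio=0.5):
    """sequences maps a name to a protein sequence; returns a list of warnings."""
    warnings = []
    if len(sequences) > MAX_SEQUENCES:
        warnings.append(
            f"{len(sequences)} sequences submitted, maximum is {MAX_SEQUENCES}"
        )
    if sequences:
        typical = median(len(seq) for seq in sequences.values())
        for name, seq in sequences.items():
            if not seq.startswith("M"):
                warnings.append(f"{name}: no start methionine, possibly incomplete")
            elif len(seq) < min_length_ratio * typical:
                warnings.append(f"{name}: much shorter than the others, possibly incomplete")
    return warnings

# Invented toy batch: "c" is truncated and "d" lacks its start methionine.
batch = {
    "a": "M" + "A" * 99,
    "b": "M" + "A" * 99,
    "c": "M" + "A" * 19,
    "d": "A" * 100,
}
for warning in precheck_batch(batch):
    print(warning)
```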
19 GreenPhylDB V2.0 (in progress): Objectives
- 10 new fully sequenced genomes are now available (Populus alba, Glycine max, Sorghum bicolor, Medicago truncatula, Vitis vinifera, Selaginella moellendorffii, Physcomitrella patens, Ostreococcus tauri, Chlamydomonas reinhardtii, Cyanidioschyzon merolae)
- Why do you integrate these species and not GCP crops?
  - Complete sequencing and gene prediction
  - Will provide the complete list of plant gene families!
  - Use functional information available on these species
  - Reinforce the phylogenomic signal and therefore the ortholog predictions
  - Have a taxonomy sampling close to the GCP crop targets
20 Taxonomy sampling: GCP crops
21 GreenPhylDB V2.0: Step 1, gene clustering
GreenPhyl Database V1.0: 2 species, 81,000 genes, 21,400 clusters, 6,400 gene families
10 new species: 300,000 sequences
390,000 sequences, 25,000 clusters
Family assignment
GreenPhyl Database V2.0
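The slide does not say which clustering algorithm is used for this step. As a rough illustration of what grouping sequences into clusters means, here is a minimal single-linkage sketch: sequences connected by any chain of similarity hits end up in the same cluster. The identifiers and similarity pairs are invented, and a production pipeline would more likely rely on a dedicated clustering tool.

```python
def cluster(sequence_ids, similar_pairs):
    """Group sequences into connected components of the similarity graph."""
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for s in sequence_ids:
        clusters.setdefault(find(s), set()).add(s)
    # Largest clusters first, ties broken by member names for stable output.
    return sorted(clusters.values(), key=lambda c: (-len(c), sorted(c)))

# Invented example: hits such as those kept from an all-against-all comparison.
ids = ["a", "b", "c", "d", "e", "f"]
hits = [("a", "b"), ("b", "c"), ("d", "e")]
for members in cluster(ids, hits):
    print(sorted(members))
```

Single linkage tends to merge unrelated families through multi-domain proteins, which is one reason a family assignment and annotation step follows the clustering.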
22 GreenPhylDB V2.0: Family annotation platform
1./ An essential step for optimal phylogenomic analysis
- Development of a family annotation platform with
  - Annotator registration system
  - Annotation and standardisation rules
  - Statistics on the data available for each gene member of the group
  - User-friendly annotation procedure
2./ Useful information for future gene annotation
23 GreenPhylDB and i-GOST
- A database and tool developed for your comparative genomic analyses
- We need your feedback and comments.
Matthieu CONTE M.CONTE_at_CGIAR.ORG
Mathieu ROUARD M.ROUARD_at_CGIAR.ORG